-
Notifications
You must be signed in to change notification settings - Fork 259
Description
External data source naming conflict when multiple storage accounts have same container names
Problem
The current implementation of source_SetupExternalDataSource uses only the container name for the external data source name:
set @externalds_name = (select value from string_split(@StorageDS, '/', 1) where ordinal = 4)This causes naming conflicts when different storage accounts have containers with identical names, resulting in failed deployments or overwritten external data sources.
Real-World Use Case
I encountered this issue while using Synapse Link for D365 feature, integrating FinOps data with Data Lake. In my setup:
- Profile 1: Configured for Parquet export → Storage account with auto-generated container
- Profile 2: Configured for CSV export → Different storage account with identical auto-generated container name
Both profiles automatically created containers with the same name pattern, causing the stored procedure to fail when trying to create external data sources for both configurations.
Example from my environment:
Profile 1 (Parquet): https://stdlsd365syndev001.blob.core.windows.net/dataverse-XXXXXX-unq57e32b3f934df011be530022486db
Profile 2 (CSV): https://stdlsd365syndev002.blob.core.windows.net/dataverse-XXXXXX-unq57e32b3f934df011be530022486db
Both would try to create an external data source named dataverse-XXXXXX-unq57e32b3f934df011be530022486db, causing conflicts where:
- The second call overwrites/duplicate the first external data source
- Subsequent queries may access the wrong storage account/format
- Database scoped credentials get mixed up between different storage accounts
Proposed Solution
Include the storage account name in the external data source name to ensure uniqueness:
in script : Step1_EDL_To_SynapseLinkDV_CreateUpdate_SetupScript.sql
-- Extract storage account name (first part of the hostname before the first dot)
set @storageAccountName = (select LEFT(value, CHARINDEX('.', value) - 1)
from string_split(@StorageDS, '/', 1)
where ordinal = 3)
-- Extract container name (4th segment)
set @containerName = (select value from string_split(@StorageDS, '/', 1) where ordinal = 4)
-- Combine storage account name and container name for unique external data source name
set @externalds_name = @storageAccountName + '_' + @containerNameExample Results
With the proposed fix:
https://account1.blob.core.windows.net/mycontainer→ External data source:account1_mycontainerhttps://account2.blob.core.windows.net/mycontainer→ External data source:account2_mycontainer
Benefits
- ✅ Eliminates naming conflicts between storage accounts
- ✅ Maintains backward compatibility - existing calling code unchanged
- ✅ No breaking changes - OUTPUT parameter still works the same way
- ✅ More descriptive names - easier to identify which storage account is being accessed
- ✅ Supports enterprise scenarios with multiple storage accounts
Backward Compatibility
This change is fully backward compatible:
- The stored procedure signature remains unchanged
- The OUTPUT parameter
@externalds_namestill returns a valid name - Existing calling scripts continue to work without modification
- No impact on existing external data sources (they continue to function)
Alternative Approaches Considered
- Hash-based naming: Would make names less readable
- GUID suffixes: Would make names unnecessarily long and non-descriptive
- Manual name parameters: Would require breaking changes to calling code
The proposed solution balances uniqueness, readability, and compatibility.
Implementation
I'm happy to provide a pull request with the complete fix if the maintainers are interested in this enhancement.