Debugging Sitecore 9.1.1 xConnect Anonymous Contacts Indexing on Azure PaaS and Solr Cloud 7.2.1
Our project uses Sitecore 9.1.1 and Solr Cloud 7.2.1 hosted in Azure PaaS, and I wanted to enable Anonymous Contact Indexing so that we could customize tracked Contacts in the future. I followed this LinkedIn blog post. However, when I reached the part where I needed to run the command that requests a rebuild from the xConnect Search Indexer:
XConnectSearchIndexer -requestrebuild
I encountered this error:
Unhandled Exception: System.AggregateException: One or more errors occurred. ---> System.Net.Http.HttpRequestException: Response status code does not indicate success: 404 (Not Found).
   at System.Net.Http.HttpResponseMessage.EnsureSuccessStatusCode()
   at Sitecore.Xdb.Collection.Search.Solr.SolrWriter.<ExecutePostRequest>d__18.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at Sitecore.Xdb.Collection.Search.Solr.SolrWriter.<ExecuteWithRetryAsync>d__21.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at Sitecore.Xdb.Collection.Search.Solr.SolrWriter.<ExecuteWithRetryAsync>d__21.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Sitecore.Xdb.Collection.Search.Solr.SolrWriter.<RunWithThrottle>d__20`1.MoveNext()
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at System.Threading.Tasks.Task.Wait(TimeSpan timeout)
   at Sitecore.XConnect.Search.Indexer.IndexerAppDomainHost.IndexerHostAppDomainBridge.RequestRebuild()
   at Sitecore.XConnect.Search.Indexer.IndexerAppDomainHost.IndexerHostAppDomainBridge.RequestRebuild()
   at Sitecore.XConnect.Search.Indexer.Program.RequestRebuild()
   at Sitecore.XConnect.Search.Indexer.Program.Main(String[] args)
The command above is needed to make sure that previously collected Anonymous Contacts are indexed and show up in the Experience Profile. It is not the only way to trigger an xDB index rebuild, as documented in this Sitecore page. In fact, the command only creates a new document in the index, which signals that indexing should start again for all Contacts.
Debugging Journey
Together with Sitecore Support, we tracked down the cause. Our journey is documented below:
1. Ensure Anonymous Indexing is Enabled in both SearchIndexer and Jobs
- Open the *xc-search Azure App Service in the App Service Editor and navigate to App_Data/Config/Sitecore/SearchIndexer/sc.Xdb.Collection.IndexerSettings.xml.
- Ensure that this setting has been set to true:
<IndexAnonymousContactData>true</IndexAnonymousContactData>
- Also apply the same setting in App_Data/jobs/continuous/IndexWorker/App_data/config/sitecore/SearchIndexer/sc.Xdb.Collection.IndexerSettings.xml
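Because the same flag has to be set in two copies of the file, a small check can help confirm that both agree. The following is a minimal sketch in Python, assuming you have copied the XML content of each file locally; the element nesting in the sample is a simplified assumption, and only the IndexAnonymousContactData element name comes from the setting above.

```python
import xml.etree.ElementTree as ET


def anonymous_indexing_enabled(xml_text: str) -> bool:
    """Return True only if every <IndexAnonymousContactData> element is 'true'."""
    root = ET.fromstring(xml_text)
    values = [
        el.text.strip().lower()
        for el in root.iter("IndexAnonymousContactData")
        if el.text
    ]
    # No element at all counts as "not enabled".
    return bool(values) and all(v == "true" for v in values)


# Simplified sample; the real file has more nesting and more options.
sample_enabled = """
<Settings>
  <Options>
    <IndexAnonymousContactData>true</IndexAnonymousContactData>
  </Options>
</Settings>
"""
sample_disabled = sample_enabled.replace("true", "false")

print(anonymous_indexing_enabled(sample_enabled))
print(anonymous_indexing_enabled(sample_disabled))
```

Running the function against the content of both the SearchIndexer file and the IndexWorker job file should give the same answer for each.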
2. Remove Extra Trailing Slash in xDB Connection String
- Check all connection strings that connect to xDB and remove any extra trailing slash from their URLs.
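To spot trailing slashes without reading every entry by eye, something like the sketch below could be run against a local copy of ConnectionStrings.config. It assumes the usual `<add name="..." connectionString="..."/>` layout; the connection names and hosts in the sample are hypothetical placeholders, not the real Sitecore names.

```python
import xml.etree.ElementTree as ET


def names_with_trailing_slash(config_xml: str) -> list:
    """List the names of URL-style connection strings that end with '/'."""
    root = ET.fromstring(config_xml)
    flagged = []
    for add in root.iter("add"):
        value = (add.get("connectionString") or "").strip()
        # Only URL-style values are checked; SQL connection strings are skipped.
        if value.lower().startswith(("http://", "https://")) and value.endswith("/"):
            flagged.append(add.get("name"))
    return flagged


# Hypothetical sample; names and hosts are placeholders.
sample_config = """
<connectionStrings>
  <add name="example.one" connectionString="https://xc-collection.example.invalid/" />
  <add name="example.two" connectionString="https://xc-search.example.invalid" />
  <add name="example.sql" connectionString="Data Source=example;User ID=someone" />
</connectionStrings>
"""

print(names_with_trailing_slash(sample_config))
```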
3. Check if xDB Core Exists in Solr Cloud
- While inside the *-xc-search App Service, open ConnectionStrings.config under App_Config and look for the connection string named “solrCore”.
- Copy the URL from the domain up to and including /solr.
- Open another browser tab and paste the copied URL. Enter credentials if necessary (in our case, they are defined under the /zzz patch folder).
- Under the Core Selector dropdown in the side panel, type in xdb.
- Select one of the cores (e.g. xdb_internal_shard1_replica_n1) and, in its Statistics, check the Num Docs field to see whether the core contains any documents. In our case, it was zero.
- At this point, we could hypothesize that since our index was empty, indexing might not have started at all.
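The Num Docs check can also be done without clicking through each core. Solr's CoreAdmin API returns per-core index statistics at /solr/admin/cores?action=STATUS, and the sketch below pulls numDocs out of such a response. The sample JSON is heavily trimmed and hand-written for illustration; fetching it from your own instance (with credentials) is left out.

```python
import json


def num_docs_by_core(status_json: str, prefix: str = "xdb") -> dict:
    """Map each core whose name starts with `prefix` to its numDocs value."""
    status = json.loads(status_json).get("status", {})
    return {
        name: info.get("index", {}).get("numDocs")
        for name, info in status.items()
        if name.startswith(prefix)
    }


# Trimmed, illustrative response shape for action=STATUS.
sample_status = json.dumps({
    "responseHeader": {"status": 0, "QTime": 1},
    "status": {
        "xdb_internal_shard1_replica_n1": {"index": {"numDocs": 0, "maxDoc": 0}},
        "some_other_core": {"index": {"numDocs": 1234, "maxDoc": 1300}},
    },
})

print(num_docs_by_core(sample_status))
```

A value of 0 for every xdb core matches what we saw in the admin panel.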
4. Check if IndexWorker WebJob is Running
- Head to the *-xc-search app in Azure and open the WebJobs section from the side panel. WebJobs are similar to Windows Services.
- Verify that the IndexWorker is running. If it is, try restarting it.
5. Check if IndexWorker Logs are Generated
The logs under App_Data/Logs of the xc-search App Service, which the App Service itself writes, show an exception.
But we also wanted to investigate the IndexWorker WebJob logs, which are generated on the fly in a temporary folder.
- Go to Kudu (Advanced Tools) in the side panel of the App Service.
- Open the system drive by choosing CMD from the Debug Console menu.
- Go to the directory D:\local\Temp\jobs\continuous\IndexWorker\
- You will notice a folder with a randomly generated name.
- Inside this folder, the logs are under App_data/Logs. If the Logs folder cannot be found, proceed to the next step.
6. Enable IndexWorker Serilog Logging
- Going back to the xc-search App Service Editor, open the file App_Data/jobs/continuous/IndexWorker/App_data/config/sitecore/CoreServices/sc.Serilog.xml. Note that this is the file where logging must be enabled, not the copy in the temporary folder.
- Under the Serilog node, change the MinimumLevel from Warning to Information. Your setting should look like this:
<Settings>
  <Serilog>
    <Using>
      <FileSinkAssembly>Serilog.Sinks.File</FileSinkAssembly>
      <RollingFileSinkAssembly>Serilog.Sinks.RollingFile</RollingFileSinkAssembly>
    </Using>
    <MinimumLevel>
      <Default>Information</Default>
    </MinimumLevel>
    <WriteTo>
      <FileSink>
        <Name>RollingFile</Name>
        <Args>
          <pathFormat>App_Data\\Logs\\xconnect-log-${MachineName}-${InstanceName}-{Date}.txt</pathFormat>
          <retainedFileCountLimit>7</retainedFileCountLimit>
          <buffered>False</buffered>
        </Args>
      </FileSink>
    </WriteTo>
    <Properties>
      <Application>XConnect</Application>
    </Properties>
  </Serilog>
</Settings>
- In the same folder, disable the following configs by appending .disabled to the file name:
- sc.Serilog.AppInsights.xml
- sc.Serilog.TraceSink.xml
- Restart the IndexWorker WebJob so that it regenerates the temporary folder, this time with logs. It can take a while for data to accumulate. You can then verify that the config changes have been applied by opening the newly generated temporary folder and navigating to App_data/config/sitecore/CoreServices. If you can see the disabled configs there, the change has taken effect.
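If you prefer to script the log-level change rather than edit the file by hand, the sketch below shows one way to do it in Python on a local copy of sc.Serilog.xml. It assumes the Settings/Serilog/MinimumLevel/Default structure shown in this step; the sample here is cut down to just that path.

```python
import xml.etree.ElementTree as ET


def set_minimum_level(xml_text: str, level: str = "Information") -> str:
    """Return the XML with Serilog/MinimumLevel/Default set to `level`."""
    root = ET.fromstring(xml_text)  # root is the <Settings> element
    node = root.find("Serilog/MinimumLevel/Default")
    if node is None:
        raise ValueError("Serilog/MinimumLevel/Default not found")
    node.text = level
    return ET.tostring(root, encoding="unicode")


# Cut-down sample containing only the path that matters here.
sample_serilog = (
    "<Settings><Serilog><MinimumLevel><Default>Warning</Default>"
    "</MinimumLevel></Serilog></Settings>"
)

updated_serilog = set_minimum_level(sample_serilog)
print(updated_serilog)
```

Note that ElementTree drops XML comments when parsing with its default parser, so keep a backup of the original file if it contains comments you want to preserve.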
7. Investigate if there are Errors in the IndexWorker Logs
- While in the temporary folder, under App_data/Logs, open the file named in this format:
xconnect-log-RD<randomnumbers>-XConnectIndexer_AppDomain-<yyyymmdd>.txt
- Look for errors. In our case, we found a connection error:
2021-01-11 23:58:51.051 +00:00 [Error] The attempt to recover from previous failure has not been successful. There will be another attempt. Attempts count: 949 Microsoft.Azure.SqlDatabase.ElasticScale.ShardManagement.ShardManagementException: Store Error: Login failed for user 'admin'.. The error occurred while attempting to perform the underlying storage operation during 'Microsoft.Azure.SqlDatabase.ElasticScale.ShardManagement.StoreException: Error occurred while performing store operation. See the inner SqlException for details. ---> System.Data.SqlClient.SqlException: Login failed for user 'admin'. at System.Data.SqlClient.SqlInternalConnectionTds..ctor(DbConnectionPoolIdentity identity, SqlConnectionString connectionOptions, SqlCredential credential, Object providerInfo, String newPassword, SecureString newSecurePassword, Boolean redirectedUserInstance, SqlConnectionString userConnectionOptions, SessionData reconnectSessionData, DbConnectionPool pool, String accessToken, Boolean applyTransientFaultHandling, SqlAuthenticationProviderManager sqlAuthProviderManager) at System.Data.SqlClient.SqlConnectionFactory.CreateConnection(DbConnectionOptions options, DbConnectionPoolKey poolKey, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningConnection, DbConnectionOptions userOptions) at System.Data.ProviderBase.DbConnectionFactory.CreatePooledConnection(DbConnectionPool pool, DbConnection owningObject, DbConnectionOptions options, DbConnectionPoolKey poolKey, DbConnectionOptions userOptions) at System.Data.ProviderBase.DbConnectionPool.CreateObject(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection) at System.Data.ProviderBase.DbConnectionPool.UserCreateRequest(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection) at System.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, 
Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection) at System.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection) at System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection) at System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions) at System.Data.SqlClient.SqlConnection.TryOpenInner(TaskCompletionSource`1 retry) at System.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry) at System.Data.SqlClient.SqlConnection.Open() at Microsoft.Azure.SqlDatabase.ElasticScale.ShardManagement.SqlUtils.WithSqlExceptionHandling(Action operation) --- End of inner exception stack trace --- at Microsoft.Azure.SqlDatabase.ElasticScale.ShardManagement.SqlUtils.WithSqlExceptionHandling(Action operation) at Microsoft.Azure.SqlDatabase.ElasticScale.ShardManagement.StoreOperationGlobal.<Do>b__0() at Microsoft.Azure.SqlDatabase.ElasticScale.TransientFaultHandling.RetryPolicy.ExecuteAction[TResult](Func`1 func) at Microsoft.Azure.SqlDatabase.ElasticScale.ShardManagement.StoreOperationGlobal.Do()' operation on the shard map manager database. See the inner StoreException for details. 
at Microsoft.Azure.SqlDatabase.ElasticScale.ShardManagement.StoreOperationGlobal.Do() at Microsoft.Azure.SqlDatabase.ElasticScale.ShardManagement.ShardMapManagerFactory.GetSqlShardMapManager(String connectionString, ShardMapManagerLoadPolicy loadPolicy, RetryBehavior retryBehavior, EventHandler`1 retryEventHandler, Boolean throwOnFailure) at Microsoft.Azure.SqlDatabase.ElasticScale.ShardManagement.ShardMapManagerFactory.TryGetSqlShardMapManager(String connectionString, ShardMapManagerLoadPolicy loadPolicy, RetryBehavior retryBehavior, ShardMapManager& shardMapManager) at Sitecore.Xdb.Collection.Data.SqlServer.Configuration.ElasticScaleConnectionManager.InitializeShardMapManager() at Sitecore.Xdb.Collection.Data.SqlServer.Configuration.ElasticScaleConnectionManager.get_ShardMapManager() at Sitecore.Xdb.Collection.Data.SqlServer.Configuration.ElasticScaleConnectionManager.GetRangeShardMap(String rangeShardMapName) at Sitecore.Xdb.Collection.Data.SqlServer.Configuration.ElasticScaleConnectionManager.<>c__DisplayClass36_0.<GetShards>b__0() at System.Lazy`1.CreateValue() at System.Lazy`1.LazyInitValue() at Sitecore.Xdb.Collection.Data.SqlServer.Configuration.MemoryCacheManager.AddOrGetExistingFromCache[T](String key, Func`1 valueFactory) at Sitecore.Xdb.Collection.Data.SqlServer.Configuration.ElasticScaleConnectionManager.GetShardLocations(String rangeShardMapName) at Sitecore.Xdb.Collection.Data.SqlServer.Managers.SqlDataRecordsManager`2.<GetChanges>d__51.MoveNext()
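These log files can get long, and the stack traces make them hard to skim. A small filter such as the sketch below could pull out only the entries logged at Error level. It assumes each entry starts with a timestamp in the format seen above (for example 2021-01-11 23:58:51.051 +00:00) followed by the level in square brackets; the Information line in the sample is made up for contrast.

```python
import re

# Timestamp, level in brackets, then the message (format taken from the log above).
ENTRY_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{2}:\d{2}) \[(\w+)\] (.*)$"
)


def error_entries(lines):
    """Return (timestamp, first line of message) for every [Error] entry."""
    results = []
    for line in lines:
        match = ENTRY_RE.match(line.rstrip("\n"))
        if match and match.group(2) == "Error":
            results.append((match.group(1), match.group(3)))
    return results


sample_log = [
    "2021-01-11 23:58:50.000 +00:00 [Information] Hypothetical informational entry",
    "2021-01-11 23:58:51.051 +00:00 [Error] The attempt to recover from previous "
    "failure has not been successful. There will be another attempt. Attempts count: 949",
    "   at System.Data.SqlClient.SqlConnection.Open()",
]

errors = error_entries(sample_log)
for timestamp, message in errors:
    print(timestamp, message)
```

Stack trace continuation lines do not start with a timestamp, so they are skipped; open the file at the reported timestamp to read the full trace.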
8. Ensure the IndexWorker Connection String Credentials are Correct
- While under the temporary folder, open App_Config/ConnectionStrings.config
- Under the connection string named “collection”, copy the credentials.
- Try using them in a new SSMS database connection to verify that the credentials work.
- Then go back to the original IndexWorker connection strings under the xc-search App Service's /App_Data/jobs/continuous/IndexWorker/App_Config and enter the correct “collection” credentials.
- Restart the IndexWorker WebJob.
9. Look for Errors related to Indexing
- Now that the credentials issue has been fixed, look for possible errors related to indexing. The IndexWorker WebJob generates a new, randomly named temporary folder again, so give it some time.
- After a few minutes, check the App_data/Logs folder and open the latest xConnect log.
- We found one:
[Error] Failed indexing next set of changes. There will be an attempt to recover from the failure. System.Net.Http.HttpRequestException: Response status code does not indicate success: 400 (Bad Request). at System.Net.Http.HttpResponseMessage.EnsureSuccessStatusCode() at Sitecore.Xdb.Collection.Search.Solr.SolrWriter.<ExecutePostRequest>d__18.MoveNext()
- The SolrWriter frame in this stack trace is an important finding, because the focus of the investigation now moves to Solr.
10. Look for Solr Admin Errors in Logs
- Open another browser tab and navigate to the Logging page of the Solr admin panel.
- We found the same ERROR repeated over and over:
RequestHandlerBase org.apache.solr.common.SolrException: Document is missing mandatory uniqueKey field: _uniqueid
- At this point, we can see that the index field called _uniqueid is what causes the error. In theory, it should be present for content indexes but NOT for the xdb index. So our way forward is to create a new xDB Solr Collection from scratch and then remove this field from its schema.
11. Create a new xDB Solr Collection in Solr Admin Panel
- While in Solr admin panel, click Collections from the side panel
- Since we already have a collection named xdb_internal, we can add a similar name. In our case, we named it xdb_index instead. You can name it anything you want.
- This Sitecore documentation on creating a new Solr Collection was the initial reference used for creating a new Collection. However, this documentation was intended for Sitecore 9.3 and Solr 8.1.1. We are using Sitecore 9.1.1 and Solr 7.2.1 so this needs a bit of tweaking. The screenshot below shows the difference:
- NOTE: Putting a value of 2 in replicationFactor results in an error.
- Once the creation succeeds, you can see the new Collection in the Collections list.
- Follow the same steps above and create its rebuild index: xdb_rebuild_index
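The same two collections could also be created through Solr's Collections API instead of the admin panel. The sketch below only builds the request URLs so they can be reviewed before anything is sent. The host is a placeholder, and numShards=1 is an assumption on my part (the screenshot values are not reproduced here); replicationFactor=1 follows the note above, and collection.configName=xdb is the configSet used by the existing xdb collection.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, config_name="xdb",
                          num_shards=1, replication_factor=1):
    """Build a Collections API CREATE URL (does not send the request)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,                 # assumed value
        "replicationFactor": replication_factor,  # 2 failed in our case
        "collection.configName": config_name,
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


BASE_URL = "https://solr.example.invalid/solr"  # placeholder host

create_urls = [
    create_collection_url(BASE_URL, name)
    for name in ("xdb_index", "xdb_rebuild_index")
]
for url in create_urls:
    print(url)
```

Opening each URL in the browser session that is already authenticated against Solr should have the same effect as the Add Collection dialog, but check the parameters against your own cluster first.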
12. Assign Aliases to the new Solr Collections
- Assign aliases to both of the newly created xdb Collections, as you must use these aliases in your connection strings.
- While on the same screen, click the Create Alias button. In the dialog box that pops up, enter the new alias name and assign it to the newly created Collection:
- To verify the aliases assigned to the Solr Collections, append this path to the domain URL:
/solr/admin/collections?action=LISTALIASES
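The LISTALIASES response is a small JSON document with an "aliases" map from alias name to collection name, so the check can be automated. Here is a minimal sketch that compares the response against the mapping you expect. The alias names xdb and xdb_rebuild are assumptions based on the collection names used in this post, and the sample response is hand-written.

```python
import json


def wrong_or_missing_aliases(response_json: str, expected: dict) -> dict:
    """Return the expected alias mappings that the response does not contain."""
    aliases = json.loads(response_json).get("aliases", {})
    return {
        alias: collection
        for alias, collection in expected.items()
        if aliases.get(alias) != collection
    }


# Assumed alias names; adjust to whatever you entered in the Create Alias dialog.
expected_aliases = {"xdb": "xdb_index", "xdb_rebuild": "xdb_rebuild_index"}

# Hand-written sample in which the rebuild alias has not been created yet.
sample_aliases = json.dumps({
    "responseHeader": {"status": 0, "QTime": 1},
    "aliases": {"xdb": "xdb_index"},
})

print(wrong_or_missing_aliases(sample_aliases, expected_aliases))
```

An empty result means every expected alias points at the expected collection.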
12. Adjust the XC-Search Connection Strings
- Go back to the ConnectionStrings.config of the xc-search App Service and change the solrCore connectionString:
- Restart the XC-Search WebJob.
- After the temporary folder has been recreated, verify again that its ConnectionStrings.config reflects the “xdb2” changes.
- Check whether the error still happens. In our case, the xConnect logs showed that the issue was still there, so creating a new collection did not help. However, both the default xdb_internal collection and the newly created xdb_index use the configSet called xdb. So we can further deduce that the xdb configSet may be what is causing the error. We need to find where this configSet lives in the Azure App Service file system.
13. Find where the Indexes are Located in the Solr Cloud File System
- In non-Solr Cloud instances, the indexes are normally found in the Azure App Service file system under /wwwroot/server/solr/<index_name>/, but in a Solr Cloud instance the location is different.
- Open the file system of the Solr Cloud Azure App Service instance.
NOTE: We usually use the App Service Editor (Preview) tool in Azure to explore the file system. However, since the Solr Cloud instance has been set up in a Docker container, that option is not available. You can use other tools instead, such as Kudu (Advanced Tools) > Bash, or FTP.
- When using Kudu Bash, the Solr Cloud indexes are located under /home/node1-solr. You can also find this clue in Deployment Center > Registry Settings > Config:
14. Locate the ConfigSet from this Solr Cloud Directory
- While in the same folder (/home/node1-solr), navigate further down to the /home/node1-solr/configsets/xdb/conf directory.
- Display the file called managed-schema using the cat command:
cat managed-schema
- Now search the output for the _uniqueid field, the one that has been causing the error. Since the console is viewed in a browser, you can use Ctrl+F to search the text.
- Next, we need to remove this line on the Solr Cloud server.
15. Edit the Managed Schema
- Since the edit and vi commands do not work here in Bash, I used FTP to download, edit, and upload the file instead. This has the added benefit of making it easy to back up the file too.
- Download the Publish Profile of this Solr Cloud App Service from the Deployment Center and connect to it using an FTP tool (I personally use FileZilla), navigate to the same configSet directory and transfer the managed-schema file. Back it up first as a best practice.
- Comment out this line where the _uniqueid is defined:
<field name="_uniqueid" type="string" indexed="true" stored="true" required="true" multiValued="false" />
- Then search for the node called <uniqueKey> and change its value from _uniqueid to id:
<uniqueKey>id</uniqueKey>
- Transfer the file back to the same folder on the server using the FTP client.
- Restart the Solr Cloud App Service.
- Visit Solr Admin Panel > Logging and check whether there are still errors related to the _uniqueid field.
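The two schema edits in this step are simple text changes, so they could be applied to the downloaded copy of managed-schema with a short script instead of by hand. The sketch below comments out the _uniqueid field definition and points uniqueKey at id. It works on plain text rather than parsed XML so that the rest of the file stays byte-for-byte the same; the sample schema is a tiny made-up excerpt, not the real file.

```python
import re

# Matches a whole line holding the self-closing _uniqueid field definition.
FIELD_RE = re.compile(
    r'^([ \t]*)(<field\s+name="_uniqueid"[^>]*/>)[ \t]*$', re.MULTILINE
)
UNIQUE_KEY_RE = re.compile(r"<uniqueKey>\s*_uniqueid\s*</uniqueKey>")


def patch_managed_schema(text: str) -> str:
    """Comment out the _uniqueid field and switch uniqueKey to id."""
    text = FIELD_RE.sub(r"\1<!-- \2 -->", text)
    return UNIQUE_KEY_RE.sub("<uniqueKey>id</uniqueKey>", text)


# Made-up excerpt for illustration only.
sample_schema = """<schema name="xdb" version="1.6">
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <field name="_uniqueid" type="string" indexed="true" stored="true" required="true" multiValued="false" />
  <uniqueKey>_uniqueid</uniqueKey>
</schema>
"""

patched_schema = patch_managed_schema(sample_schema)
print(patched_schema)
```

Running the function a second time leaves the text unchanged, since the commented-out line no longer starts with the field tag. Keep the backup copy regardless.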
16. Cancel the use of xdb2 core
17. Verify if the Solr Collection still has the _uniqueid field
- Select xdb_internal from the Collections, then search for the field called _uniqueid:
- In our case the field was still there, which means that the edited configSet was not applied to the already created xdb_internal Solr Collection.
- Delete all xdb-related Solr Collections: xdb_index, xdb_rebuild_index, xdb_internal and xdb_rebuild_internal.
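The field check in this step can also be made against Solr's Schema API, which exposes /solr/<collection>/schema/fields and /solr/<collection>/schema/uniquekey as JSON. The sketch below takes the two response bodies and reports whether the collection still depends on _uniqueid. Both sample responses are trimmed and hand-written; fetching them from your instance is not shown.

```python
import json


def still_uses_field(fields_json: str, uniquekey_json: str,
                     field: str = "_uniqueid") -> bool:
    """True if `field` is still defined or is still the collection's uniqueKey."""
    names = {f.get("name") for f in json.loads(fields_json).get("fields", [])}
    unique_key = json.loads(uniquekey_json).get("uniqueKey")
    return field in names or unique_key == field


# Hand-written samples: a collection that still carries the old schema...
old_fields = json.dumps({"fields": [{"name": "id"}, {"name": "_uniqueid"}]})
old_key = json.dumps({"uniqueKey": "_uniqueid"})

# ...and one created after the configSet was fixed.
new_fields = json.dumps({"fields": [{"name": "id"}]})
new_key = json.dumps({"uniqueKey": "id"})

print(still_uses_field(old_fields, old_key))
print(still_uses_field(new_fields, new_key))
```

Repeating the check after the collections are recreated is a quick way to confirm that the corrected configSet is the one actually in use.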
18. Recreate xDB Indexes
- Create a new xdb primary index called xdb_index with the xdb configSet selected. Note that we do not name it xdb, because that would clash with the alias, which is the name used for swapping after an index rebuild.
- Create the aliases again for both xdb and xdb rebuild:
- Under the Solr Admin Panel > Cloud > Tree, you can check the configSets:
- Restart the Solr Cloud instance (I am not sure whether this helps).
- Correct configSet location:
- Now it’s there: