Enabling and Debugging Sitecore 9.1.1 Experience Analytics in Azure PaaS

Enabling Sitecore Analytics

Sitecore xConnect is a crucial interface for levelling up the overall marketing and, ultimately, the site experience. It is a service layer that sits between the xDB Collection database, the xDB index, and any client, letting marketers focus on the right customer data, pull it into one place for analysis, and get to market faster. However, setting up xConnect in Sitecore 9.1.1 (rev. 002459) hosted in Azure PaaS can be a daunting task, especially when you run into issues like Experience Analytics not showing any results. The steps taken during the collaboration with Sitecore Support are documented here to preserve the debugging journey.

I referred to this Sitecore documentation to enable xDB and Tracker.

It is also worth mentioning that the xDB features will work and be allowed by the other Sitecore roles only when a valid license has been put in place in the following Azure App Services, usually under the /App_Data folder:

  • Content Management (*-cm)
  • Content Delivery (*-cd)
  • Cortex Processing
  • Cortex Reporting
  • Marketing Automation Operations (*-ma-ops)
  • Marketing Automation Reporting (*-ma-rep)
  • xDB Processing (*-prc)
  • xDB Reporting (*-rep)
  • Sitecore Identity (*-si) – under /sitecoreruntime folder
  • xConnect Collection (*-xc-collect)
  • xConnect Reference Data (*-xc-refdata)
  • xConnect Search (*-xc-search) – aside from App_Data, ensure to check if there is license.xml here too: App_Data/jobs/continuous/IndexWorker/App_data

Having done all these things, when I checked my Experience Analytics panel in Sitecore, I still got no results:

No interaction results from Experience Analytics

Debugging Sitecore Analytics

1. Investigate the Reporting Database Shard

The first thing to remember here is that the reports found in Experience Analytics DO NOT use the Solr index; they use aggregated data in SQL. Anonymous and known contact data and interactions are first collected as raw records stored in the Reporting Database Shards.

  • Get the xDB database credentials by opening any xConnect App Service (*-xc-collect, *-xc-refdata or *-xc-search) and finding which xDB the App is connecting to. I used the connection string named “collection” from the xConnect Collection App Service and copied the credentials, as masked below. Additionally, if the credentials are missing, double-check whether they are defined in the App_Config/Include config patches.
xDB Collection database connection string
  • Copy the database credentials, open SSMS and connect.
SSMS new connection to a Database Engine

Depending on the setup, an Azure firewall prompt like the image below may appear. Sign in with your Azure account, then for the firewall rule, just tick the “Add my client IP address” radio button.

Firewall validation
  • Once you are in, expand either Shard 0 or Shard 1 database and navigate to the xdb_collection.Interaction table. I initially checked it at Shard 0.
xDB Collection Interaction table at Shard 0
  • Check if data is being submitted by selecting the top 1000 rows of the table and filtering by a Created date greater than the desired date (see the query sketch at the end of this list).
Filter Interactions table by Created
  • Since we were getting zero results from it, we wanted to check whether xDB and Tracking were really enabled in the first place.
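
A minimal sketch of that check, assuming the xdb_collection.Interaction table and Created column referenced above (adjust the names and date if your shard schema or go-live date differs):

-- Check for recently submitted interactions in the current shard
SELECT TOP (1000) *
FROM [xdb_collection].[Interaction]
WHERE [Created] > '2021-01-01' -- replace with the date you expect data from
ORDER BY [Created] DESC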

2. Verify if the xDB and Tracking are enabled

CM Servers do not need xDB and Tracking to be enabled because we are tracking users from the CD environment, so:

  • Open the CD App Service and use any tool to access the file system, such as the Azure App Service Editor, Kudu or FTP. Navigate to the most recent logs under App_Data/logs. In the App Service Editor you can check the file names directly, but if you are a double-checker like me, you can retrieve the logs over FTP with an FTP client (I prefer FileZilla). Of course, make sure that you have the proper credentials for the CD App Service, which you can get from the App Service’s Publish Profile.
  • Read the most recent log and look for the part where it confirms if the xDB or Tracking is enabled or not.
xDB and Tracking disabled

3. Check xDB and Tracking-related patches

Improperly named patches can make it harder for other developers to locate the relevant configs, so check each of them:

  • Under App_Config/Include/zzz, investigate each related patch in the list of config files.
  • If you have any patch that looks like this, remove it:
Analytics-related patch file
  • Restart the CD Server.
  • After the restart, look at the logs again and verify whether xDB and Tracking are now enabled.
xDB and Tracking now enabled!
  • However, note that Analytics will take some time to gather data, because tracking data is only submitted on Session End.

4. Abandon Site Session to get tracked data

If you already have some sort of script on the site to end the user session, you can trigger it; if not, you can follow these steps:

  • Download and extract this file called abandon.aspx
  • Using FTP, upload this file to your CD Server; the default site root would be fine.
  • Access it via URL: <domain.com>/abandon.aspx
  • Click Abandon session

5. Recheck the Reporting Database Shard

After ensuring that a user session has ended in the CD site, revisit the Reporting Database to check if there has been a tracked interaction record.

  • Re-run the previous SQL query against the current shard.
  • If there are still no results, move to the other database shard (Shard 1 in my case) and filter the same xdb_collection.Interaction table (a quick count query is sketched after this list).
  • Fortunately, we got the results that we wanted to see!
Interaction results showing up in Shard 1
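
For a quicker check per shard, a simple count works too. This is a sketch under the same assumptions as the earlier query (xdb_collection.Interaction table, Created column):

-- How many interactions has this shard collected since the go-live date?
SELECT COUNT(*) AS RecentInteractions
FROM [xdb_collection].[Interaction]
WHERE [Created] > '2021-01-11'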

Now that we were sure we were getting the right data, we wanted to verify that it was showing up in Experience Analytics. However, even when we filtered the date range up to tomorrow, we still could not see the interactions:

But no interactions showing up in the Analytics

6. Verify if Processing Instance is Running

The Processing App Service is a separate instance that aggregates the raw analytics data into the data used by the reports.

  • Navigate to its Azure App Service (*-prc)
  • Under the Overview panel, click on the URL to verify if it is running
  • If you see the Sitecore “not found” page, the instance is running.
Processing App Service is running

7. Investigate the Processing Logs for Errors

  • Look into the logs generated by the Processing App Service. If you haven’t FTP’d to this *-prc App Service before, get the publish profile and locate the credentials under “FTP”.
  • Use those credentials to create a new connection using any FTP client
  • Navigate to /site/wwwroot/App_Data/logs/<latest folder>
  • Download the latest log file locally to investigate
  • Open using any text editor and find exceptions. This is the error that we got:
15040 10:33:07 ERROR Exception when executing agent aggregation/aggregator
Exception: Sitecore.Xdb.Processing.Queue.ProcessingPoolException
Message: Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
Source: Sitecore.Xdb.Processing.Queue.SqlServer
   at Sitecore.Xdb.Processing.Queue.SqlServer.SqlServerProcessingPool`2.<OnCheckOutAsync>d__41.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Sitecore.Xdb.Processing.Queue.ProcessingPool`1.<CheckOutAsync>d__9.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Sitecore.Analytics.Processing.AsyncPoolScheduler`2.<TryGetNextAsync>d__18.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Sitecore.Analytics.Aggregation.InteractionAggregationAgent.<ExecuteCoreAsync>d__15.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Sitecore.Analytics.Core.Agent.<ExecuteAsync>d__1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Sitecore.Analytics.Core.AsyncBackgroundService.<ExecuteAgentAsync>d__22.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at Sitecore.Analytics.Core.AsyncBackgroundService.<RunAsync>d__26.MoveNext()

Nested Exception

Exception: System.Data.SqlClient.SqlException
Message: Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
Source: .Net SqlClient Data Provider
   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at System.Data.SqlClient.SqlDataReader.TryCloseInternal(Boolean closeReader)
   at System.Data.SqlClient.SqlDataReader.Close()
   at System.Data.Common.DbDataReader.Dispose(Boolean disposing)
   at Sitecore.Xdb.Processing.Queue.SqlServer.SqlServerProcessingPool`2.<OnCheckOutAsync>d__41.MoveNext()

Nested Exception

Exception: System.ComponentModel.Win32Exception
Message: The wait operation timed out

8. Verify Processing Server Connection

  • Going back to the Processing App Service, navigate to the App Service Editor and open App_Config/ConnectionStrings.config.
  • Check the connection string named “xdb.processing.pools”
  • Ensure that the credentials are correct by trying a connection via SSMS

9. Check Live Processing Pool Attempts

  • Once verified and logged in, navigate to the xdb_processing_pools.InteractionLiveProcessingPool table
  • Select the top 1000 rows and check the column called Attempts (see the query sketch after this list)
Maximum number of attempts reached
  • One important thing to note is that xdb_processing_pools.InteractionHistoryProcessingPool is used only when you are rebuilding the reporting database, so it is usually empty. xdb_processing_pools.InteractionLiveProcessingPool, on the other hand, holds the recently collected interactions, which are queued there automatically for processing.
  • The records with 32767 attempts were never processed successfully (side story: earlier developers had tried to enable xDB and Analytics, which left these unprocessed attempts behind). The record with an attempt count of 0, on the other hand, is still waiting in the queue to be processed. Normally there are no records with a count above 0, because once a record is processed it is immediately removed from this queue.
    • On another side note, I checked the logs from the time these attempts maxed out and found some relevant WARN-level entries related to xConnect.
    • The Entity ID in the log is the same ID found in the database records:
1400 20:28:28 WARN  Failed to load 1 entities from XConnect.
Entity d4477d67-1872-0000-0000-05d435821a15 was not found.
Database record matches the xConnect Entity Id
  • One of the possible reasons for this is that the entity does not exist in xConnect, which is why the processing attempts fail over and over.
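
A sketch of the query for this check, assuming the Attempts and Created columns shown in the screenshots; the rows with the highest Attempts values are the stuck ones:

-- Surface the most-retried records in the live processing pool
SELECT TOP (1000) *
FROM [xdb_processing_pools].[InteractionLiveProcessingPool]
ORDER BY [Attempts] DESC, [Created] DESC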

10. Check the Azure SQL Resource & Performance

One reason the Processing Server may be failing its processing attempts could be that the Azure SQL database is choking on resources. To verify:

  • Go back to Azure Portal home page, and navigate to the SQL Databases.
  • In the Filter text box, filter for the database with the suffix *-pools-db and click to open it.
  • Under the Compute utilization panel, you may see something like this:
100% compute utilization for the Pools Database
  • Open the Elastic Pool itself by clicking on the Elastic Pool URL found in the upper right section of the overview.
Shared Elastic Pool URL
  • At this point, check if the elastic pool resources are getting maxed out:
Used space in Elastic Pool is just 27% so that’s fine
  • Go back to the Processing Server and stop it. The idea here is to check what will happen to the pool resources if Processing is stopped.
Usage dropped to zero, which is the ideal scenario we were hoping to see
  • Start it again. The Processing Server will now resume constantly checking this pool for new records. However, after the restart, utilization looked the same; nudging the server did nothing:
Restarting did not solve the resource choking issue
  • Also check whether the other tables inside the Pools database hold a huge number of records that may affect performance (a row-count query is sketched after this list). If that is the case, upgrading the tier may help.
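
A rough way to spot oversized tables in the Pools database is to sum the catalog row counts. This is a sketch using the standard SQL Server catalog views:

-- Approximate row counts per table in the Pools database, biggest first
SELECT s.name AS SchemaName, t.name AS TableName, SUM(p.rows) AS ApproxRows
FROM sys.tables t
JOIN sys.schemas s ON s.schema_id = t.schema_id
JOIN sys.partitions p ON p.object_id = t.object_id AND p.index_id IN (0, 1)
GROUP BY s.name, t.name
ORDER BY ApproxRows DESC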

11. Clear Out Processing Pool Attempts

Going back to the Pools DB, we needed to clear out the records that may be causing the resource choke.

  • Delete the records from xdb_processing_pools.GenericProcessingPool. You can do that easily by selecting top 1000 rows and just changing the SELECT statement to DELETE:
DELETE FROM [xdb_processing_pools].[GenericProcessingPool]
  • Also do it in xdb_processing_pools.GenericProcessingPoolDefinitions:
DELETE FROM [xdb_processing_pools].[GenericProcessingPoolDefinitions]
  • From the xdb_processing_pools.InteractionLiveProcessingPool table, remove the maxed-out attempts, in our case anything before Jan. 11, 2021:
DELETE FROM [xdb_processing_pools].[InteractionLiveProcessingPool]
WHERE Created < '2021-01-11'
  • Recheck the processing pool attempts by selecting the top 1000 rows in xdb_processing_pools.InteractionLiveProcessingPool (a quick verification query is sketched after this list)
The Attempt count now signals that the record is being picked up for processing
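
To confirm the cleanup worked, a quick aggregate (a sketch, under the same assumptions as above) should show only a small queue with low attempt counts:

-- Only recent, low-Attempts records should remain after the cleanup
SELECT COUNT(*) AS QueuedRecords, MAX([Attempts]) AS MaxAttempts, MIN([Created]) AS OldestQueued
FROM [xdb_processing_pools].[InteractionLiveProcessingPool]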

12. Now Open the Experience Analytics!

A pleasant surprise! We can finally see Experience Analytics showing some real tracked data:

  • Revisit the Pools Database:
Maximum usage fixed

The usage is now minimal, and the database can cope with the load, which makes it possible to process the queued interactions. Problem solved!
