Almost on a daily basis, I receive questions about GA4’s reporting identity. Read this article to learn about this feature and the potential impact on your business and data insights.
As with many other topics, you need to dive deep into the ‘reporting identity’ feature to learn exactly how it works and impacts your reports and data.
I will start with a basic introduction to the ‘reporting identity’ feature before discussing key concepts you need to understand when working with this feature in GA4.
Table of Contents
- What is the Reporting Identity in GA4
- Where to find the Reporting Identity
- Impact of Changing the Reporting Identity
- Exclude Google Signals Data
- Concluding Thoughts
Let’s dive right in and start with a basic introduction to GA4’s ‘reporting identity’ feature.
What is the Reporting Identity in GA4
The reporting identity in GA4 refers to how Google Analytics measures the behavior of users across multiple platforms and devices.
Here is an example of Jennifer’s purchase journey when buying a new pair of sunglasses.
- In the morning, at the office, she browses in Google Chrome (desktop) and stumbles upon your online sunglasses store.
- In the late afternoon, she checks in again on her mobile phone (Safari) and notes three pairs of sunglasses she might buy.
- In the evening, after having her coffee, she uses her personal tablet to finally buy one pair of sunglasses.
Each of these three sessions is measured separately, but depending on the reporting identity you have set and your GA4 implementation, they could all be stitched together into a single cross-device user journey.
I say ‘could’ as there are many factors impacting how these measurements come through in GA4.
Identity Spaces
GA4 comes with four so-called identity spaces to measure users across multiple platforms and devices, and to (try to) unify sessions from the same person into a single user journey.
- User-ID
- Google Signals
- Device-ID
- Modelling
I will now explain how each of these methods works.
User-ID
The User-ID is a unique identifier that you assign to your users when they authenticate during their session (i.e. when logging in to your website or app). The great thing is that this User-ID remains the same regardless of what device or platform is used when accessing your website or app.
In general, the User-ID is the most accurate method to identify unique users, but it is only available for a (small) subset of users (after authentication). Also, keep in mind that implementing the User-ID feature requires a thorough privacy review.
“Note: You’re responsible for ensuring that your use of the user ID is in accordance with the Google Analytics Terms of Service. This includes avoiding the use of impermissible personally identifiable information, and providing appropriate notice of your use of identifiers in your Privacy Policy. Your user ID must not contain information that a third party could use to determine a user’s identity.”
Google Signals
Google signals is data collected from users who are signed in to Google. When this data is available, GA4 associates event data it collects from users with the Google accounts of signed-in users who have consented to sharing this information.
For this feature too, I recommend reviewing your privacy policy and the specific requirements in your country.
Device-ID
The Device-ID is a unique identifier for the device you are using to access a website or app.
This is where the Device-ID comes from on websites and apps:
- Websites pass the client ID to GA4.
- Apps pass the app-instance ID to GA4.
The Device-ID is less reliable than the User-ID, as users can clear their cookies and/or use different devices. In that case, the same person shows up in GA4 as multiple users (which happens often).
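To make the difference concrete, here is a minimal Python sketch based on Jennifer's three sessions from earlier. The `Session` class, its field names, and the ID values are made up for illustration and are not GA4's actual schema. The sketch also assumes Jennifer logs in on all three devices, which will often not be the case in practice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Session:
    device_id: str            # client ID (web) or app-instance ID (app)
    user_id: Optional[str]    # only set when the visitor is logged in


def count_users(sessions, use_user_id):
    """Count distinct users, optionally preferring the User-ID over the Device-ID."""
    keys = set()
    for s in sessions:
        if use_user_id and s.user_id is not None:
            keys.add(("user", s.user_id))
        else:
            keys.add(("device", s.device_id))
    return len(keys)


# Hypothetical data: Jennifer logs in on all three devices.
jennifer = [
    Session(device_id="chrome-desktop-111", user_id="u-42"),
    Session(device_id="safari-mobile-222", user_id="u-42"),
    Session(device_id="tablet-333", user_id="u-42"),
]

print(count_users(jennifer, use_user_id=False))  # 3
print(count_users(jennifer, use_user_id=True))   # 1
```

Counted by Device-ID alone, Jennifer looks like three users. Once a shared User-ID is taken into account, the same sessions collapse into one.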
Modelling
The last identity space is ‘Modelling’. This is a model where Google uses machine learning to predict the behavior of users who don’t accept the analytics cookies. It helps to fill in these gaps using the data of similar users who accept the tracking cookies from the same property.
You might see in your GA4 property, under reporting identity, that modelling is not available.
This is because ‘Modelling’ comes with a long list of requirements:
- Consent mode is enabled across all pages of your sites and/or all app screens of your apps.
- Consent mode for web pages must be implemented so that tags are loaded before the consent dialog appears, and Google tags load in all cases, not only if the user consents (advanced implementation).
- The property collects at least 1,000 events per day with analytics_storage='denied' for at least 7 days.
- The property has at least 1,000 daily users sending events with analytics_storage='granted' for at least 7 of the previous 28 days.
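The requirements above can be sketched as a simple check. This is only an illustration of the logic, not how Google evaluates eligibility internally. The function names and inputs (two boolean flags and two lists of daily counts, oldest first) are my own assumptions. Because the 'denied' requirement does not state a look-back window, the sketch simply counts all days you pass in.

```python
def modelling_volume_ok(denied_events_per_day, granted_users_per_day,
                        min_count=1000, min_days=7, window=28):
    """Check the two volume requirements against lists of daily counts."""
    # Days with enough events sent with analytics_storage='denied'.
    denied_days = sum(1 for n in denied_events_per_day if n >= min_count)
    # Days within the last `window` days with enough consenting users.
    granted_days = sum(1 for n in granted_users_per_day[-window:] if n >= min_count)
    return denied_days >= min_days and granted_days >= min_days


def modelling_eligible(consent_mode_everywhere, tags_load_before_consent,
                       denied_events_per_day, granted_users_per_day):
    """Combine the consent mode requirements with the volume requirements."""
    return (consent_mode_everywhere and tags_load_before_consent
            and modelling_volume_ok(denied_events_per_day, granted_users_per_day))


# Hypothetical daily counts for the last 28 days.
denied = [1200] * 7 + [300] * 21
granted = [1500] * 10 + [800] * 18

print(modelling_eligible(True, True, denied, granted))   # True
print(modelling_eligible(True, False, denied, granted))  # False
```

In the second call the volumes are fine, but the tags do not load before the consent dialog (basic instead of advanced implementation), so the property would not qualify.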
Whether or not to implement Google Consent mode is a topic that deserves an entire blog post of its own.
Where to Find the Reporting Identity
The ‘Reporting Identity’ setting is found under the property settings.
By default, when creating a new GA4 property, the reporting identity is set to ‘Blended’.
This means that – when appropriate – Google Analytics will evaluate all four identity spaces to associate events with users.
And this is how each of the methods works:
- Blended: Google Analytics uses the user ID if it is collected. If no user ID is collected, then Analytics uses information from Google signals if available. If neither of these identity spaces yields a result, Analytics uses the device ID. If no identifier is available, Analytics uses behavioral modeling.
- Observed: Google Analytics uses the user ID if it is collected. If no user ID is collected, then Analytics uses information from Google signals if available. If neither of these identity spaces yields a result, Analytics uses the device ID.
- Device-based: Google Analytics only uses the device ID and ignores all other identity spaces.
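The fallback order of the three options can be expressed in a few lines of Python. This is a simplified sketch of the logic described above and not GA4's real implementation. The dictionary keys, the function name, and the idea of passing in a set of "available" identity spaces are assumptions made for this example.

```python
from typing import Optional

# Order in which each reporting identity tries the identity spaces,
# following the descriptions in the list above.
FALLBACK_ORDER = {
    "blended": ["user_id", "google_signals", "device_id", "modelling"],
    "observed": ["user_id", "google_signals", "device_id"],
    "device_based": ["device_id"],
}


def resolve_identity_space(reporting_identity: str, available: set) -> Optional[str]:
    """Return the first identity space that has data, or None if none applies."""
    for space in FALLBACK_ORDER[reporting_identity]:
        if space in available:
            return space
    return None


# A visitor who is not logged in, but is signed in to Google.
print(resolve_identity_space("blended", {"google_signals", "device_id"}))  # google_signals

# Device-based ignores everything except the device ID.
print(resolve_identity_space("device_based", {"user_id", "device_id"}))    # device_id

# No identifier at all: only Blended can fall back to modelling,
# and only if the property is eligible for it.
print(resolve_identity_space("observed", set()))                           # None
print(resolve_identity_space("blended", {"modelling"}))                    # modelling
```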
In the next section we will evaluate the pros and cons of changing the reporting identity.
Note: ‘Editor’ or ‘Administrator’ access to the GA4 property is required to make changes to the reporting identity setup.
Impact of Changing the Reporting Identity
It’s very important to note that changing the reporting identity doesn’t impact the underlying data in GA4.
However, it does retroactively change the data shown in reports for every user who has access to the same GA4 property. This means you can experiment with the different reporting identities to see how they affect your reports and data, but be aware that everyone accessing the property sees the change.
Especially in larger organizations, where many people have access to the same properties, this calls (in my opinion) for a careful approach when making these types of changes to GA4 settings.
What Reporting Identity is Best?
In my experience, most companies need to be very careful when taking these two steps:
- Enabling Google Signals.
- Primarily using the ‘Blended’ or ‘Observed’ reporting identity setting.
Google Signals
Whether or not to link Google Signals partly depends on whether you collect User-IDs and plan to leverage that feature.
- Go ahead with linking Google Signals if you don’t need the User-ID.
- Think twice before linking Google Signals if leveraging the insights from the User-ID within the GA4 UI is important to you.
Impact of linking Google Signals:
- To prevent individual users from being identified through Google’s signed-in user data (device graph), Google will threshold your data. Thresholding means that if a report contains rows with a small number of users (fewer than roughly 40 per row), Google will ‘hide’ those rows from your report.
Note: For your reports to include Google-signals data you need a monthly average of 500 users per day per property.
‘Blended’ or ‘Observed’
Unfortunately, once Google Signals is linked, it can cause permanent thresholding in your GA4 property when you use the User-ID-based identity methods (Blended or Observed).
Google Analytics applies ‘thresholding’ to your report if these three conditions are met:
- Google Signals is linked.
- The reporting identity is set to Blended or Observed.
- A report contains rows with small user or event numbers.
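As a rough illustration, the three conditions can be combined into a small Python function that decides which report rows stay visible. Keep in mind that this is a sketch under assumptions: Google does not publish the exact threshold, so `min_users=40` is only the approximate figure mentioned earlier in this article. The row format and the function name are invented for the example, and only user counts (not event counts) are considered.

```python
def visible_rows(rows, signals_enabled, reporting_identity, min_users=40):
    """Return the report rows that remain visible after thresholding."""
    thresholding_applies = (
        signals_enabled and reporting_identity in {"blended", "observed"}
    )
    if not thresholding_applies:
        return list(rows)
    # Hide rows whose user count falls below the (assumed) threshold.
    return [row for row in rows if row["users"] >= min_users]


# Hypothetical report rows.
report = [
    {"page": "/home", "users": 950},
    {"page": "/sale", "users": 12},
]

print(visible_rows(report, signals_enabled=True, reporting_identity="blended"))
# [{'page': '/home', 'users': 950}]

print(len(visible_rows(report, signals_enabled=True, reporting_identity="device_based")))
# 2
```

With Google Signals linked and Blended selected, the low-volume row disappears. Switching to device-based brings it back without changing the underlying data.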
In short, I recommend using ‘device-based’ in most cases and only switching to ‘Blended’ or ‘Observed’ when appropriate.
Update October 2023: you now have an option to exclude Google Signals data from your reports and explorations.
How to Avoid Thresholding in GA4
I will create a more thorough blogpost on this topic in the future.
For now, these five methods will help you to remove or limit the thresholding in GA4:
- By default, use the device-based reporting identity.
- Reduce the number of dimension values with low user/event counts by increasing your date range.
- Turn off Google Signals or exclude Google Signals data in reporting identity.
- Use BigQuery (in addition to the GA4 UI).
- Set up two GA4 properties: one with Google Signals enabled and the other one without.
Exclude Google Signals Data
In early October 2023, Google launched a new feature.
“If you’ve activated Google signals for your property, you can now turn off Include Google Signals in Reporting Identity on the Data Collection page in Admin to omit specific demographics and interest data from reports—specifically, data from signed-in, consented users. It can help to reduce the likelihood of data thresholding if your property uses Blended or Observed.”
It sounds like a great feature, but in larger organizations – with many GA4 users – switching on and off Google Signals (if needed) might lead to more confusion.
It’s great to see Google listening to the community, but in my opinion there are still some pros and cons when looking at this feature.
You can see your data from different angles, which is great, but everybody in the organization needs to be aware of which setting is active at any given point in time.
Concluding Thoughts
Many companies are unaware of the actual impact of turning on Google Signals in relation to the reporting identity feature in GA4.
By now you should have a better understanding of the reporting identity and related concepts such as Google Signals and data thresholding.
As mentioned, turning on Google Signals can have a permanent impact on the data you see in your reports.
As a ‘solution’:
- You can set up two distinct GA4 properties, one with Google Signals turned on and the other without Google Signals. Not ideal, but for some companies it is currently the best option available.
- Since early October 2023 you can go a different route and turn off Google Signals in the Reporting Identity (via data collection). A great new feature, but you need to understand the implications it has for all users of the same GA4 property if the setting is not fixed and you regularly change it over time.
Also, I recommend experimenting with the different reporting identity options available and analyzing the impact on the metrics and dimensions in your GA4 reports.
This is it from my side. As always, let me know your thoughts and any tips or questions you have in the comments below!
Kaylee says
If the new “include Google signals in reporting identity” toggle is switched off, does this prevent the data collection as well? or is it similar to the reporting identity that it just shows the data differently in the UI? Trying to understand if this change is made in the UI will it also match the data in our warehouse or is data collection still including Google Signals (and therefore thresholding?).
Paul Koks says
Hi Kaylee, it doesn’t affect the data collection process, but only impacts the reports in the UI (you can switch it on and off again). Please note that Google is making a change early next year: Google Signals will be removed from the reporting identity on February 12, 2024.