Google Analytics sampling can ruin your data and insights. In this post you will learn how sampling works and how best to deal with it.
Understanding at least the basics of sampling is crucial for everybody who is actively involved with Google Analytics.
In general, sampling is not a big issue on smaller websites with fewer than 50k sessions per month. However, it all depends on the date range that you select.
You have to be more careful when analyzing your data if millions of people visit your website each month.
Chances are high that you will run into sampling issues.
The first part of this article deals with sampling in Google Analytics and how it works.
In the second part I discuss eight strategies to solve or at least limit the impact of sampling on your data.
How Google Analytics Sampling Works
Each Google Analytics account contains one or more properties, and each property contains one or more reporting views.
Unfiltered data is stored in a property, which is identified by a unique property ID.
Each reporting view then stores a set of pre-aggregated, unsampled data tables, which are processed on a daily basis.
All the basic or standard reports that you see in Google Analytics can be displayed rather quickly because of this pre-aggregated data.
However, sometimes you need to run ad-hoc queries to retrieve custom data.
Examples of ad-hoc queries include:
- Applying segments to default reports.
- Applying secondary dimensions to default reports.
- Creating custom reports.
Let’s take a look at the report below (with a segment applied for organic traffic):
Less than 20% of the overall sessions are used to generate this report.
So watch for any message that reads “This report is based on N sessions.” It signals that the report you are looking at is sampled.
The sampling level or threshold in Google Analytics for ad-hoc reports is 500k sessions (and 25M for Premium).
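To make the threshold concrete, here is a small Python sketch that checks whether a date range would exceed the ad-hoc sampling threshold and computes the share of sessions a sampled report is based on. The 500k and 25M thresholds come from this post; the function names are hypothetical, and treating “more than the threshold” as the trigger is a simplifying assumption.

```python
# Hypothetical helpers for reasoning about ad-hoc sampling in GA.
# Thresholds are taken from the post; the exact trigger rule is an assumption.

STANDARD_THRESHOLD = 500_000      # sessions, standard Google Analytics
PREMIUM_THRESHOLD = 25_000_000    # sessions, Google Analytics Premium


def would_sample(sessions_in_range: int, premium: bool = False) -> bool:
    """Return True if an ad-hoc query over this many sessions exceeds the threshold."""
    threshold = PREMIUM_THRESHOLD if premium else STANDARD_THRESHOLD
    return sessions_in_range > threshold


def sample_share(sampled_sessions: int, total_sessions: int) -> float:
    """Percentage of sessions a sampled report is based on."""
    if total_sessions <= 0:
        raise ValueError("total_sessions must be positive")
    return 100.0 * sampled_sessions / total_sessions


if __name__ == "__main__":
    # Illustrative numbers: a date range with 3M sessions.
    print(would_sample(3_000_000))
    print(would_sample(3_000_000, premium=True))
    print(round(sample_share(500_000, 3_000_000), 1))
```

With 3M sessions in the range, a standard account would sample, Premium would not, and a 500k-session sample covers roughly 17% of the data.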
Is Sampling Really an Issue?
It depends. In my experience, reports based on 90% or more of your data are quite reliable.
However, if the share of sessions used drops far below 75%, the risk of making big decisions based on inaccurate data is a lot higher.
To make it simple (it’s not completely true statistically):
- 25% of sessions in sample -> 25% confidence in data being correct.
- 50% of sessions in sample -> 50% confidence in data being correct.
- 75% of sessions in sample -> 75% confidence in data being correct.
Just do the maths and ask yourself how high this percentage needs to be for you to feel comfortable making decisions.
And it probably differs according to the situation.
Would you need a higher percentage for a million-dollar decision than for a trend graph for a colleague? I think so!
Read this support article by Google if you want to know all the details around how sampling works.
Eight Solutions for Google Analytics Sampling
There are many different ways to approach and deal with sampling.
In the following sections I describe eight methods you could use.
Some are easy to implement and others require some extra coding.
1. Adjust Your Date Range
Let’s assume you are looking at a one-year period and notice that the report is based on only 20% of sessions.
You could simply reduce the time span to two months to get rid of the sampling.
Then use the Google Analytics API to automate the data export for consecutive two-month date ranges, or aggregate the exports in a spreadsheet yourself.
Remember that to avoid sampling, the selected date range must contain fewer than 500k sessions in the associated property.
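Splitting a long period into shorter windows is easy to automate. The Python sketch below assumes a roughly steady average number of sessions per day (a simplification) and cuts a date range into consecutive windows whose estimated session count stays at or below 500k. The function name and the 8,000 sessions per day figure are hypothetical; each window could then be exported separately and the results combined.

```python
from datetime import date, timedelta


def split_date_range(start: date, end: date, avg_daily_sessions: float,
                     threshold: int = 500_000) -> list[tuple[date, date]]:
    """Split [start, end] (inclusive) into consecutive windows whose
    estimated session count stays at or below the threshold.

    Assumes traffic is roughly constant per day; real peaks can push a
    window over the limit, so leave some margin in practice.
    """
    if avg_daily_sessions <= 0:
        raise ValueError("avg_daily_sessions must be positive")
    # Longest window (in days) that stays under the threshold, at least one day.
    max_days = max(1, int(threshold // avg_daily_sessions))
    windows = []
    current = start
    while current <= end:
        window_end = min(current + timedelta(days=max_days - 1), end)
        windows.append((current, window_end))
        current = window_end + timedelta(days=1)
    return windows


if __name__ == "__main__":
    # Illustrative assumption: roughly 8,000 sessions per day across 2019.
    for first, last in split_date_range(date(2019, 1, 1), date(2019, 12, 31), 8_000):
        print(first.isoformat(), "to", last.isoformat())
```

At 8,000 sessions per day, each window is 62 days long, so a full year needs six exports instead of one sampled report.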
2. Use Standard Reports
Google Analytics offers dozens of useful standard reports. These reports can cover millions of sessions without sampling problems.
They are pre-aggregated on a daily basis and never sampled.
I’m not saying that limiting yourself to standard reports is the perfect solution, but it works!
3. Create New Views with Filters
One of the reasons I create many different views per property is to overcome Google Analytics sampling issues.
Let’s assume you are a big spender in paid search.
Instead of applying a paid search segment to the overall data, you could create a separate view with a filter on paid search.
- The segmented standard report shows sampled data.
- The standard report in a new view with filter on paid search would not have any sampling issues.
I recommend creating new views for traffic segments that you analyze and optimize on a regular basis.
Read this ultimate guide to Google Analytics filters to further educate yourself on this topic.
4. Reduce the Amount of Traffic per Property
Collecting data from multiple sites in one property can lead to sampling as well.
Here is an example:
- You are the owner of a large soccer brand with 20 different websites.
- Each website records 50k sessions in Google Analytics each month (all in the same property).
- This makes 1M sessions in total.
Instead of recording all sessions in one property, why not use a separate property for each website?
Of course, it all depends on the structure and how they relate to each other.
However, if you can split them up, each property stays within the 500k threshold for date ranges of up to about 10 months (10 × 50k = 500k sessions). Not bad, I think!
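The arithmetic behind the soccer-brand example fits in a few lines of Python. The function name is hypothetical; the inputs (20 sites, 50k monthly sessions each, a 500k threshold) come from the example above.

```python
STANDARD_THRESHOLD = 500_000  # ad-hoc sampling threshold in standard GA


def months_before_sampling(monthly_sessions: int,
                           threshold: int = STANDARD_THRESHOLD) -> float:
    """How many months of data fit in one property before reaching the threshold."""
    return threshold / monthly_sessions


# All 20 sites in one property: 1M sessions per month.
print(months_before_sampling(20 * 50_000))
# One site per property: 50k sessions per month.
print(months_before_sampling(50_000))
```

With everything in one property, even a two-week date range reaches the threshold; with one property per site, you get about ten months of headroom.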
5. Sample Your Data by Modifying Tracking Code
Another option is to modify the GA tracking code.
By sampling the hits for your site or app, you will get reliable report results while staying within the hit limits for your account.
Data collection sampling is applied consistently per user. Once a user has been selected for data collection, all of that user’s sessions send data to GA, including future sessions.
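To illustrate why per-user sampling stays consistent across sessions, here is a minimal Python sketch of deterministic, hash-based user sampling. It is not how the GA tracking code implements its sample rate internally; the function name, the use of SHA-256, and the 10,000-bucket granularity are all assumptions chosen for illustration. The key idea is that a stable identifier (such as a client ID) always maps to the same bucket, so a user is either always in or always out of the sample.

```python
import hashlib


def in_sample(client_id: str, sample_rate_percent: float) -> bool:
    """Hypothetical deterministic user sampling.

    Hashing the client ID maps each user to a fixed bucket in [0, 10000),
    so the same user gets the same decision on every session.
    """
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 10_000
    return bucket < sample_rate_percent * 100


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    tracked = sum(in_sample(u, 10) for u in users)
    print(f"{tracked} of {len(users)} users tracked at a 10% sample rate")
```

Because the decision depends only on the identifier, raising the sample rate keeps every previously tracked user in the sample and simply adds new ones.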
Here are four useful links to help you set up sample rates:
- Universal Analytics – website tracking.
- Classic Analytics – website tracking.
- Android apps – mobile app tracking.
- iOS apps – mobile app tracking.
6. Use Google Analytics API
The Google Analytics API is a neat solution for dealing with sampling in Google Analytics.
You can make up to 50,000 requests per project per day, and this limit can be increased.
Especially if you run a highly trafficked website, you might want to consider the API for sampling issues.
For example, you could run one or more calls per day for your chosen set of metrics and dimensions, and then aggregate the results in Google Sheets.
Keep in mind that user-related metrics might be biased with this method: a user who visits on several days is counted once per day, so summing daily user counts overstates the number of unique users.
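The sketch below shows what aggregating daily exports might look like, and why user counts need care. It assumes each daily API call has already been turned into rows of (date, channel, sessions, users); the rows and function names are hypothetical, and the numbers are made up for illustration. Sessions can be summed safely across days, but summing daily users only gives an upper bound on unique users, because someone who visits on two days appears in both daily counts.

```python
from collections import defaultdict

# Hypothetical daily export rows: (date, channel, sessions, users).
# Values are invented for illustration only.
daily_rows = [
    ("2019-06-01", "Organic Search", 1200, 950),
    ("2019-06-01", "Paid Search", 800, 700),
    ("2019-06-02", "Organic Search", 1100, 900),
    ("2019-06-02", "Paid Search", 750, 640),
]


def aggregate_sessions(rows):
    """Sum sessions per channel; sessions are additive across days."""
    totals = defaultdict(int)
    for _date, channel, sessions, _users in rows:
        totals[channel] += sessions
    return dict(totals)


def sum_daily_users(rows):
    """Sum daily users per channel.

    This is only an upper bound on unique users: a user active on
    several days is counted once for each of those days.
    """
    totals = defaultdict(int)
    for _date, channel, _sessions, users in rows:
        totals[channel] += users
    return dict(totals)


if __name__ == "__main__":
    print(aggregate_sessions(daily_rows))
    print(sum_daily_users(daily_rows))
```

If you need accurate unique-user counts over a long period, query users for the full date range instead of summing daily values, and check whether that query is sampled.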
Review this page for all the configuration and reporting API limits in Google Analytics.
I also recommend watching this video, “Best Practices with Google Analytics API”:
7. Use Google Analytics Premium or Adobe Analytics
In Google Analytics Premium, sampling occurs for date ranges with more than 25M sessions (compared to 500k in the standard version).
For Google Analytics Premium, sampling occurs at the view level (instead of property level).
Keep in mind that if you are working on a smaller website, these solutions are probably not in scope.
Note: I am not affiliated with either of these tools.
8. Use BigQuery
The last solution I want to touch upon is BigQuery for GA.
LunaMetrics has written quite a few articles about BigQuery, and I recommend reading this one if you want to learn more.
If you have time, this 10-minute video is also worth watching:
Do you have to deal with sampling in Google Analytics? What methods do you use to keep your data as accurate as possible?
One last thing... Make sure to get my extensive checklist for your Google Analytics setup. It contains 50+ crucial things to take into account when setting up Google Analytics.