Exploring Google Analytics Realtime Data with Python

Google Analytics can provide a lot of insight into your website's traffic and the users visiting it. Much of this data is available in a nice format in the web console, but what if you want to build your own diagrams and visualizations, process the data further, or just work with it programmatically? That's where the Google Analytics API can help, and in this article we will look at how to use it to query and process realtime analytics data with Python.

Exploring The API

Before jumping into using some specific Google API, it might be a good idea to first play around with some of them. Using Google's API explorer, you can find out which API will be most useful for you, and it will also help you determine which API to enable in Google Cloud console.

We will start with the Real Time Reporting API, as we're interested in realtime analytics data; its API explorer is available here. To find other interesting APIs, check out the reporting landing page, from where you can navigate to other APIs and their explorers.

For this specific API to work, we need to provide at least 2 values - ids and metrics. The first is the so-called table ID, which is the ID of your analytics profile. To find it, go to your analytics dashboard, click Admin in the bottom left, then choose View Settings, where you will find the ID in the View ID field. For this API you need to provide the ID formatted as ga:<TABLE_ID>.

The other value you will need is a metric. You can choose one from metrics columns here. For the realtime API, you will want either rt:activeUsers or rt:pageviews.
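As a rough sketch, the same two values can be prepared in Python before they are sent anywhere. The helper name build_realtime_params is hypothetical (it is not part of any Google library); it only adds the ga: prefix to the table ID and checks that realtime names start with rt:.

```python
def build_realtime_params(view_id, metrics, dimensions=None):
    """Hypothetical helper: assemble the query values for a realtime request."""
    view_id = str(view_id)
    # The API expects the table ID formatted as ga:<TABLE_ID>.
    ids = view_id if view_id.startswith("ga:") else f"ga:{view_id}"

    # Realtime metrics and dimensions are prefixed with rt:.
    for name in list(metrics) + list(dimensions or []):
        if not name.startswith("rt:"):
            raise ValueError(f"expected a name starting with rt:, got {name!r}")

    # Multiple metrics or dimensions are passed as one comma-separated string.
    params = {"ids": ids, "metrics": ",".join(metrics)}
    if dimensions:
        params["dimensions"] = ",".join(dimensions)
    return params


params = build_realtime_params("123456789", ["rt:activeUsers"])
print(params)
```

The resulting dictionary holds the same values you would type into the ids and metrics fields of the API explorer.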

With those values set, we can click execute and explore the data. If the data looks good and you determine that this is the API you need, then it's time to enable it and set up the project for it...

Setting Up

To be able to access the API, we first need to create a project in Google Cloud. To do that, head over to Cloud Resource Manager and click on Create Project. Alternatively, you can do it via the CLI, with gcloud projects create $PROJECT_ID. After a few seconds you will see the new project in the list.

Next, we need to enable the API for this project. You can find all the available APIs in API Library. The one we're interested in - Google Analytics Reporting API - can be found here.

The API is now ready to be used, but we need credentials to access it. There are a couple of different types of credentials, depending on the type of application. Most of them are suited for applications that require user consent, such as client-side or Android/iOS apps. The one that fits our use-case (querying data and processing it locally) is a service account.

To create a service account, go to credentials page, click Create Credentials and choose Service Account. Give it some name and make note of service account ID (second field), we'll need it in a second. Click Create and Continue (no need to give service account accesses or permissions).

Next, on the Service Account page choose your newly created service account and go to Keys tab. Click Add Key and Create New Key. Choose JSON format and download it. Make sure to store it securely, as it can be used to access your project in Google Cloud account.

With that done, we now have a project with the API enabled and a service account with credentials to access it programmatically. This service account, however, doesn't have access to your Google Analytics view, so it cannot query your data. To fix this, you need to add the previously mentioned service account ID (XXXX@some-project-name.iam.gserviceaccount.com) as a user in Google Analytics with Read & Analyse access - a guide for adding users can be found here.
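If you didn't note the service account ID down, it can also be read from the downloaded key, since service account JSON keys carry it in the client_email field. Below is a minimal sketch; the function name service_account_email is hypothetical, and the throwaway file only stands in for the real key so the snippet runs on its own.

```python
import json
import os
import tempfile


def service_account_email(key_path):
    """Return the service account ID stored in a JSON key file."""
    with open(key_path, encoding="utf-8") as key_file:
        return json.load(key_file)["client_email"]


# Stand-in for the downloaded key; a real key contains more fields.
fake_key = {
    "type": "service_account",
    "client_email": "XXXX@some-project-name.iam.gserviceaccount.com",
}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(fake_key, tmp)

email = service_account_email(tmp.name)
os.remove(tmp.name)  # clean up the throwaway file
print(email)
```

With a real key you would call service_account_email with the path of the file you downloaded and add the printed address as a user in Google Analytics.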

Finally, we need to install Python client libraries to use the APIs. We need 2 of them, one for authentication and one for the actual Google APIs:


pip install google-auth-oauthlib
pip install google-api-python-client

Basic Queries

With all that out of the way, let's write our first query:


import os
from googleapiclient.discovery import build
from google.oauth2 import service_account

KEY_PATH = os.getenv('SA_KEY_PATH', 'path-to-secrets.json')
TABLE_ID = os.getenv('TABLE_ID', '123456789')
credentials = service_account.Credentials.from_service_account_file(KEY_PATH)
# Keep the scoped credentials under the name that is passed to build() below
credentials = credentials.with_scopes(['https://www.googleapis.com/auth/analytics.readonly'])

with build('analytics', 'v3', credentials=credentials) as service:
    realtime_data = service.data().realtime().get(
        ids=f'ga:{TABLE_ID}', metrics='rt:pageviews', dimensions='rt:pagePath').execute()

    print(realtime_data)

We begin by authenticating to the API using the JSON credentials for our service account (downloaded earlier) and limiting the scope of the credentials to the read-only analytics API. After that we build a service which is used to query the API - the build function takes the name of the API, its version and the previously created credentials object. If you want to access a different API, then see this list for the available names and versions.

Finally, we can query the API - we set ids, metrics and optionally dimensions as we did with the API explorer earlier. You might be wondering where I found the methods of the service object (.data().realtime().get(...)) - they're all documented here.

And when we run the code above, the print(...) will show us something like this (trimmed for readability):


{
  "query": {
    "ids": "ga:<TABLE_ID>",
    "dimensions": "rt:pagePath",
    "metrics": [
      "rt:pageviews"
    ]
  },
  "profileInfo": {
    "profileName": "All Web Site Data",
    ...
  },
  "totalsForAllResults": {
    "rt:pageviews": "23"
  },
  "rows": [
    ["/", "2"],
    ["/404", "1"],
    ["/blog/18", "1"],
    ["/blog/25", "3"],
    ["/blog/28", "2"],
    ["/blog/3", "3"],
    ["/blog/51", "2"],
    ...
  ]
}

That works, but considering that the result is a dictionary, you will probably want to access individual fields of the result:


print(realtime_data["profileInfo"]["profileName"])
# All Web Site Data
print(realtime_data["query"]["metrics"])
# ['rt:pageviews']
print(realtime_data["query"]["dimensions"])
# rt:pagePath
print(realtime_data["totalsForAllResults"]["rt:pageviews"])
# 23
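The rows themselves come back as lists of strings, so a small conversion step usually helps before further processing. The sketch below assumes the rt:pageviews / rt:pagePath query from above and uses a hand-copied sample of its rows; the helper name pageviews_by_path is hypothetical.

```python
# A few rows copied from the sample response above.
sample_response = {
    "rows": [
        ["/", "2"],
        ["/404", "1"],
        ["/blog/18", "1"],
        ["/blog/25", "3"],
    ]
}


def pageviews_by_path(response):
    """Map each page path to its page view count as an integer."""
    # The "rows" key may be missing when the query matched nothing,
    # so fall back to an empty list.
    return {path: int(views) for path, views in response.get("rows", [])}


views = pageviews_by_path(sample_response)
# Most viewed pages first.
top_pages = sorted(views.items(), key=lambda item: item[1], reverse=True)
print(top_pages[0])
```

The same function can be called with realtime_data from the earlier query, as long as exactly one dimension and one metric were requested.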

The previous example shows usage of the realtime() method of the API, but there are 2 more we can make use of. First of them is ga():


with build('analytics', 'v3', credentials=credentials) as service:
    ga_data = service.data().ga().get(
        ids=f'ga:{TABLE_ID}',
        metrics='ga:sessions', dimensions='ga:country',
        start_date='yesterday', end_date='today').execute()

    print(ga_data)
    # 'totalsForAllResults': {'ga:sessions': '878'}, 'rows': [['Angola', '1'], ['Argentina', '5']]

This method returns historical (non-realtime) data from Google Analytics and also has more arguments that can be used for specifying time range, sampling level, segments, etc. This API also has additional required fields - start_date and end_date.

You probably also noticed that the metrics and dimensions for this method are a bit different - that's because each API has its own set of metrics and dimensions. Those are always prefixed with the name of API - in this case ga:, instead of rt: earlier.

The third available method .mcf() is for Multi-Channel Funnels data, which is beyond scope of this article. If it sounds useful for you, check out the docs.

One last thing to mention when it comes to basic queries is pagination. If you build queries that return a lot of data, you might end up exhausting your query limits and quotas or have problems processing all the data at once. To avoid this you can use pagination:


with build('analytics', 'v3', credentials=credentials) as service:
    ga_data = service.data().ga().get(
        ids=f'ga:{TABLE_ID}',
        metrics='ga:sessions', dimensions='ga:country',
        start_index='1', max_results='2',
        start_date='yesterday', end_date='today').execute()

    print(f'Items per page  = {ga_data["itemsPerPage"]}')
    # Items per page  = 2
    print(f'Total results   = {ga_data["totalResults"]}')
    # Total results   = 73

    # These only have values if other result pages exist.
    if ga_data.get('previousLink'):
        print(f'Previous Link  = {ga_data["previousLink"]}')
    if ga_data.get('nextLink'):
        print(f'Next Link      = {ga_data["nextLink"]}')
        #       Next Link      = https://www.googleapis.com/analytics/v3/data/ga?ids=ga:<TABLE_ID>&dimensions=
        #                        ga:country&metrics=ga:sessions&start-date=yesterday&end-date=today&start-index=3&max-results=2

In the above snippet we added start_index='1' and max_results='2' to force pagination. This causes nextLink (and, on any page after the first, previousLink) to be populated; these can be used to request the next and previous pages, respectively. This however doesn't work for realtime analytics using the realtime() method, as it lacks the needed arguments.
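Instead of following the links by hand, the pages can also be walked in a loop by advancing start_index. The following is a sketch under assumptions: iter_all_rows and fake_fetch are hypothetical names, and the fake fetcher with made-up rows only stands in for the real query so the loop can run without credentials.

```python
def iter_all_rows(fetch_page, page_size=1000):
    """Yield rows from every page; fetch_page(start_index, max_results) returns one response."""
    start_index = 1  # the API counts rows starting from 1
    while True:
        page = fetch_page(start_index, page_size)
        rows = page.get("rows", [])
        yield from rows
        start_index += len(rows)
        # Stop on an empty page or once every reported row has been seen.
        if not rows or start_index > page["totalResults"]:
            break


# Stand-in for the API with illustrative data.
FAKE_ROWS = [["Angola", "1"], ["Argentina", "5"], ["Austria", "2"]]


def fake_fetch(start_index, max_results):
    first = start_index - 1
    return {
        "totalResults": len(FAKE_ROWS),
        "rows": FAKE_ROWS[first:first + max_results],
    }


all_rows = list(iter_all_rows(fake_fetch, page_size=2))
print(all_rows)

# With a real service object, fetch_page could wrap the ga() query shown above:
# fetch = lambda start, size: service.data().ga().get(
#     ids=f'ga:{TABLE_ID}', metrics='ga:sessions', dimensions='ga:country',
#     start_date='yesterday', end_date='today',
#     start_index=start, max_results=size).execute()
```

Because it is a generator, rows can be processed as each page arrives rather than after all of them are loaded.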

Metrics and Dimensions

The API itself is pretty simple. The parts that are very customizable are arguments such as metrics and dimensions. So, let's take a closer look at all the arguments and their possible values to see how we can take full advantage of this API.

Starting with metrics - the 3 most important values to choose from are rt:activeUsers, rt:pageviews and rt:screenViews:

  • rt:activeUsers gives you number of users currently browsing your website as well as their attributes
  • rt:pageviews tells you which pages are being viewed by users
  • rt:screenViews - same as page views, but only relevant within application, e.g. Android or iOS

For each metric a set of dimensions can be used to break down the data. There's way too many of them to list here, so let's instead see some combinations of metrics and dimensions that you can plug into above examples to get some interesting information about visitors of your website:

  • metrics='rt:activeUsers', dimensions='rt:userType' - Differentiate currently active users based on whether they're new or returning.
  • metrics='rt:pageviews', dimensions='rt:pagePath' - Current page views with breakdown by path.
  • metrics='rt:pageviews', dimensions='rt:medium,rt:trafficType' - Page views with breakdown by medium (e.g. email) and traffic type (e.g. organic).
  • metrics='rt:pageviews', dimensions='rt:browser,rt:operatingSystem' - Page views with breakdown by browser and operating system.
  • metrics='rt:pageviews', dimensions='rt:country,rt:city' - Page views with breakdown by country and city.

As you can see there's a lot of data that can be queried and because of the sheer amount it might be necessary to filter it. To filter the results, filters argument can be used. The syntax is quite flexible and supports arithmetic and logical operators as well as regex queries. Let's look at some examples:

  • rt:medium==ORGANIC - show only page visits from organic search
  • rt:pageviews>2 - show only results that have more than 2 page views
  • rt:country=~United.*,rt:country==Canada - show only visits from countries starting with "United" (UK, US) or Canada (, acts as OR operator, for AND use ;).

For complete documentation on filters see this page.

Finally, to make results a bit more readable or easier to process, you can also sort them using the sort argument. For ascending sorting you can use e.g. sort=rt:pagePath and for descending you prepend -, e.g. sort=-rt:pageTitle.
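Filter strings with several conditions get hard to read quickly, so it can help to assemble them from parts. This is only a sketch: any_of and all_of are hypothetical helpers that join expressions with the , and ; operators described above, and the resulting values are meant to be passed as the filters and sort arguments of realtime().get(...).

```python
def any_of(*expressions):
    """Combine filter expressions with OR (comma)."""
    return ",".join(expressions)


def all_of(*expressions):
    """Combine filter expressions with AND (semicolon)."""
    return ";".join(expressions)


# OR binds tighter than AND in the filter syntax, so this reads as
# (country starts with United OR country is Canada) AND more than 2 page views.
filters = all_of(
    any_of("rt:country=~United.*", "rt:country==Canada"),
    "rt:pageviews>2",
)

# Keyword arguments for the query; unpack with **query_args.
query_args = {
    "metrics": "rt:pageviews",
    "dimensions": "rt:country",
    "filters": filters,
    "sort": "-rt:pageviews",  # leading minus sorts in descending order
}
print(query_args["filters"])
```

With the service object from earlier, the query would then look like service.data().realtime().get(ids=f'ga:{TABLE_ID}', **query_args).execute().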

Beyond Realtime API

If you can't find some data, or you're missing some features in Realtime Analytics API, then you can try exploring other Google Analytics APIs. One of them could be Reporting API v4, which has some improvements over older APIs.

It does, however, take a slightly different approach to building queries, so let's look at an example to get you started:


with build('analyticsreporting', 'v4', credentials=credentials) as service:
    reports = service.reports().batchGet(body={
        "reportRequests": [
            {
                "viewId": f"ga:{TABLE_ID}",
                "dateRanges": [
                    {
                        "startDate": "yesterday",
                        "endDate": "today"
                    }],
                "dimensions": [
                    {
                        "name": "ga:browser"
                    }],
                "metrics": [
                    {
                        "expression": "ga:sessions"
                    }]
            }]
    }).execute()

    print(reports)

As you can see, this API doesn't provide a large number of arguments that you can populate. Instead, it has a single body argument, which takes a request body with all the values that we've seen previously.
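The response of this API is also nested more deeply than the earlier ones, so here is a sketch of flattening one report into plain dictionaries. The helper name flatten_report is hypothetical, and the sample response is hand-written with illustrative numbers; it mirrors the shape returned for the request above (one dimension, one metric, one date range).

```python
# Hand-written sample in the shape of a batchGet response (numbers are illustrative).
sample_reports = {
    "reports": [{
        "columnHeader": {
            "dimensions": ["ga:browser"],
            "metricHeader": {
                "metricHeaderEntries": [{"name": "ga:sessions", "type": "INTEGER"}]
            },
        },
        "data": {
            "rows": [
                {"dimensions": ["Chrome"], "metrics": [{"values": ["120"]}]},
                {"dimensions": ["Firefox"], "metrics": [{"values": ["45"]}]},
            ]
        },
    }]
}


def flatten_report(report):
    """Turn one report into a list of {column name: value} dictionaries."""
    header = report["columnHeader"]
    dimension_names = header.get("dimensions", [])
    metric_names = [
        entry["name"] for entry in header["metricHeader"]["metricHeaderEntries"]
    ]
    records = []
    for row in report["data"].get("rows", []):
        record = dict(zip(dimension_names, row.get("dimensions", [])))
        # "metrics" holds one entry per requested date range; we asked for one.
        record.update(zip(metric_names, row["metrics"][0]["values"]))
        records.append(record)
    return records


records = flatten_report(sample_reports["reports"][0])
print(records)
```

Values stay strings here, just as the API returns them; convert them with int() or float() where needed.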

If you want to dive deeper into this one, then you should check out the samples in documentation, which give complete overview of its features.

Closing Thoughts

Even though this article shows only usage of analytics APIs, it should give you a general idea of how to use all Google APIs with Python, as all the APIs in the client library follow the same general design. Additionally, the authentication shown earlier can be applied to any API; all you need to change is the scope.

While this article used google-api-python-client library, Google also provides lightweight libraries for individual services and APIs at https://github.com/googleapis/google-cloud-python. At the time of writing the specific library for analytics is still in beta and lacks documentation, but when it becomes GA (or more stable), you should probably consider exploring it.
