Google Analytics and Search Console API Query using Python
Previously, I explained how to query the Google Analytics and Google Search Console APIs using the R programming language. This time, I will follow the same process with Python, as a follow-up to the article titled SiteMap Control with Python.
In that article, as part of the content analysis process, I retrieved the links in the sitemap and checked their status codes.
In this article, let's take the next step and use Google Analytics and Google Search Console to collect some basic information about the pages, and therefore the content, for the time range we want to analyze.
I previously gave some details about how we can do the same job with the R programming language; you can take a look at my articles titled Google Analytics Reporting API Access with R, Google Analytics and Google Search Console Page Data and Compiling Page Data with R.
First, we can start with the installation of the relevant packages.
pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib oauth2client
If you use Colab or Jupyter Notebook, you can put an exclamation point (!) at the beginning of the above command to install packages within the notebook. Before I forget, you can run many shell commands with !, e.g. !ls -li.
Now we can import the relevant packages.
from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials
Google Analytics
We will make a request to Google Analytics API to get an idea of the current page details.
This request includes the page path and page title dimensions, which let us join the results with other tables (sitemap, etc.), along with the metrics bounce rate, pageviews, average time on page, average session duration and average page load time.
New metrics and dimensions can be added as needed, and the data can be pre-processed with the filter and segment options. A period of 1 year seems reasonable as a date range.
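The one-year date range can be computed instead of typed by hand. The following is a minimal sketch; the helper name `one_year_range` is an assumption made for illustration and is not part of any Google library. It returns the dates in the YYYY-MM-DD form used by the request.

```python
from datetime import date, timedelta


def one_year_range(end: date) -> dict:
    """Return a startDate/endDate pair covering the 365 days up to `end`."""
    start = end - timedelta(days=365)
    # isoformat() yields YYYY-MM-DD strings
    return {"startDate": start.isoformat(), "endDate": end.isoformat()}


date_range = one_year_range(date(2021, 5, 30))
print(date_range)
```

In practice, `date.today()` could be passed instead of a fixed date.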
The scope for requesting this data is https://www.googleapis.com/auth/analytics.readonly.
# Read-only scope for Google Analytics; getKeyFile is the path to the service account JSON key
credentials = ServiceAccountCredentials.from_json_keyfile_name(
    getKeyFile, 'https://www.googleapis.com/auth/analytics.readonly')
analytics = build('analyticsreporting', 'v4', credentials=credentials)
ga_response = analytics.reports().batchGet(body={
    'reportRequests': [{
        'viewId': '123456789',
        # The API documents dateRanges as a list of date range objects
        'dateRanges': [{
            'startDate': '2020-05-30',
            'endDate': '2021-05-30'
        }],
        'metrics': [{
            'expression': 'ga:bounceRate',
            'alias': 'bounceRate'
        }, {
            'expression': 'ga:pageviews',
            'alias': 'pageviews'
        }, {
            'expression': 'ga:avgTimeOnPage',
            'alias': 'avgTimeOnPage'
        }, {
            'expression': 'ga:avgSessionDuration',
            'alias': 'avgSessionDuration'
        }, {
            'expression': 'ga:avgPageLoadTime',
            'alias': 'avgPageLoadTime'
        }],
        'dimensions': [{
            'name': 'ga:pagePath'
        }, {
            'name': 'ga:pageTitle'
        }],
        # Most viewed pages first
        'orderBys': [{
            'fieldName': 'ga:pageviews',
            'orderType': 'VALUE',
            'sortOrder': 'DESCENDING'
        }],
        'samplingLevel': 'LARGE',
        'pageSize': 10000
    }]
}).execute()
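The dictionary returned by `.execute()` is nested, so it usually helps to flatten it into one record per page before joining it with other tables. The sketch below assumes the documented Reporting API v4 response layout (`reports`, `columnHeader`, `data.rows`) and a single date range. The function name `ga_report_to_rows` and the sample values are made up for illustration and are not real data.

```python
def ga_report_to_rows(response: dict) -> list:
    """Flatten a Reporting API v4 batchGet response into a list of dicts."""
    rows = []
    for report in response.get("reports", []):
        header = report.get("columnHeader", {})
        dimension_names = header.get("dimensions", [])
        metric_names = [
            entry["name"]
            for entry in header.get("metricHeader", {}).get("metricHeaderEntries", [])
        ]
        for row in report.get("data", {}).get("rows", []):
            record = dict(zip(dimension_names, row.get("dimensions", [])))
            # "metrics" holds one entry per requested date range; only the first is read
            record.update(zip(metric_names, row["metrics"][0]["values"]))
            rows.append(record)
    return rows


# Hand-written sample in the assumed response shape (not real data)
sample = {
    "reports": [{
        "columnHeader": {
            "dimensions": ["ga:pagePath", "ga:pageTitle"],
            "metricHeader": {"metricHeaderEntries": [
                {"name": "bounceRate", "type": "PERCENT"},
                {"name": "pageviews", "type": "INTEGER"},
            ]},
        },
        "data": {"rows": [{
            "dimensions": ["/blog/", "Blog"],
            "metrics": [{"values": ["45.5", "120"]}],
        }]},
    }]
}

rows = ga_report_to_rows(sample)
print(rows)
```

Metric values arrive as strings, so they may need converting to numbers before analysis.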
Google Search Console
Google Search Console is especially important for understanding queries and query-page relationships.
In the next step, we will also look at how often the GSC queries and related words appear on the page, alongside the page title we just mentioned under the Google Analytics heading.
In addition, thanks to GSC, we can obtain important information such as which queries we are listed for, how often, and, accordingly, the click-through rate.
The scope for requesting this data is https://www.googleapis.com/auth/webmasters.readonly.
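# Read-only scope for Search Console; getKeyFile is the path to the service account JSON key
credentials = ServiceAccountCredentials.from_json_keyfile_name(
    getKeyFile, 'https://www.googleapis.com/auth/webmasters.readonly')
search_console = build('webmasters', 'v3', credentials=credentials)
gsc_response = search_console.searchanalytics().query(siteUrl='https://google.com', body={
    'startDate': '2020-05-30',
    'endDate': '2021-05-30',
    'dimensions': ['query', 'page'],
    'rowLimit': 10000
}).execute()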
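The Search Console response can be flattened in a similar way. The sketch below assumes each row carries a `keys` list in the same order as the requested dimensions, plus `clicks`, `impressions`, `ctr` and `position`. The function name `gsc_rows_to_records`, the extra `pagePath` field and the sample row are assumptions for illustration only. The path is derived so that the records can later be matched against the page paths from Google Analytics.

```python
from urllib.parse import urlparse


def gsc_rows_to_records(response: dict, dimensions=("query", "page")) -> list:
    """Flatten a Search Analytics response into one dict per query-page pair."""
    records = []
    for row in response.get("rows", []):
        # "keys" follows the order of the requested dimensions
        record = dict(zip(dimensions, row["keys"]))
        for field in ("clicks", "impressions", "ctr", "position"):
            record[field] = row.get(field)
        if "page" in record:
            # Keep only the URL path so it can be compared with a page path
            record["pagePath"] = urlparse(record["page"]).path
        records.append(record)
    return records


# Hand-written sample in the assumed response shape (not real data)
sample = {
    "rows": [{
        "keys": ["python sitemap", "https://example.com/blog/post/"],
        "clicks": 5,
        "impressions": 100,
        "ctr": 0.05,
        "position": 7.2,
    }]
}

records = gsc_rows_to_records(sample)
print(records[0]["pagePath"], records[0]["ctr"])
```

Depending on how the Analytics view is configured, page paths there may also include query strings, so the matching rule may need adjusting.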
You can create the getKeyFile file mentioned in both code snippets by following the steps Settings > IAM & Admin > Service Accounts > Keys > Add Key > JSON > Create for the project you created on Google Cloud Platform.
You must add the e-mail address shown in the Email field (e.g. identity@project-name.iam.gserviceaccount.com) as a user on your Google Analytics and Google Search Console accounts.
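The service account e-mail address is also stored in the downloaded JSON key under the `client_email` field, so it can be read from the file directly. This is a minimal sketch; the helper name `service_account_email` is hypothetical, and the temporary file below stands in for a real key file purely for demonstration.

```python
import json
import os
import tempfile


def service_account_email(key_file_path: str) -> str:
    """Return the client_email stored in a service account JSON key file."""
    with open(key_file_path, encoding="utf-8") as handle:
        return json.load(handle)["client_email"]


# Demonstration with a stand-in file; a real key file contains more fields
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump({"client_email": "identity@project-name.iam.gserviceaccount.com"}, tmp)

email = service_account_email(tmp.name)
os.remove(tmp.name)
print(email)
```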
I update the code frequently, so I did not include the full code in the article, to keep it from falling out of step with the current functions.
You can use the Colab worksheet to implement your ideas and also access other files through the PageContentAnalysis repository.