Using Python and Data Science Practices in SEO Analysis of Data
At this year’s BrightonSEO in San Diego, I had the opportunity to share insights on integrating Python and data science practices into SEO analysis for faster, more scalable workflows. Traditional SEO relies on manual processes like keyword analysis, data exports, and spreadsheets, but with Python, these tasks can be automated, saving time and delivering deeper insights.
Slide Deck: Python and Data Science Practices in SEO Analysis of Data
Here is a copy of my deck as posted on SpeakerDeck.com and SlideShare.net.
It is often better to hear how a slide is presented than to just read it, so I am also sharing a partial video recording from the BrightonSEO conference. The slides in the video are a bit far from the camera, so having the deck handy should help you follow along.
Partial Video at the BrightonSEO Conference on Python SEO
Why Use Python in SEO?
The SEO landscape involves analyzing vast amounts of data from multiple tools: crawling tools, keyword research platforms, web analytics, backlink analysis, and more.
Python empowers SEO professionals to:
- Handle large datasets that would overwhelm spreadsheets
- Automate repetitive tasks, such as analyzing and categorizing keywords
- Accelerate data gathering and analysis when merging data sets from different tools
- Implement changes when tied to a CMS API or a custom extension
- Create real-world applications
Most Processes with Data Without Using Python Revolve Around a Spreadsheet
Data can come from a number of tools, and it is often exported or pulled in through an API. The data can then be merged across sources. Whatever analysis follows is done with spreadsheet formulas, filters, pivot tables, and lookup tables that combine one table with another, helping analysts understand the data at a scale that is hard to handle manually.
However, when the data becomes too large, spreadsheets struggle to process it all. This is where a Python script can help. Repeating the same spreadsheet work over and over is also time consuming, while an automated Python script can speed up the process.
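As a minimal sketch of what replacing a spreadsheet lookup can look like, the example below joins two tool exports on the keyword column using only the Python standard library. The CSV contents, column names, and URLs are made up for illustration; real exports will differ. For very large files, a library such as pandas is a common choice instead.

```python
import csv
import io

# Hypothetical exports; real files would come from your keyword and rank tools.
KEYWORD_CSV = """keyword,search_volume
running shoes,12000
trail running shoes for women,900
"""

RANKING_CSV = """keyword,url,position
running shoes,https://example.com/shoes,4
trail running shoes for women,https://example.com/trail,11
marathon socks,https://example.com/socks,7
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_on_keyword(volumes, rankings):
    """Attach search volume to each ranking row, like a spreadsheet lookup."""
    volume_by_keyword = {
        row["keyword"]: int(row["search_volume"]) for row in volumes
    }
    merged = []
    for row in rankings:
        merged.append({
            "keyword": row["keyword"],
            "url": row["url"],
            "position": int(row["position"]),
            # None marks keywords that are missing from the volume export
            "search_volume": volume_by_keyword.get(row["keyword"]),
        })
    return merged


merged = merge_on_keyword(read_rows(KEYWORD_CSV), read_rows(RANKING_CSV))
for row in merged:
    print(row)
```

Once the lookup is a function, the same merge can be rerun on every new export without rebuilding formulas.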
Vector Embedding and the Analysis of Content
In most cases, spreadsheets are used to analyze quantitative data, and in SEO there are a lot of tools that output quantitative data. Spreadsheet analysis is rarely applied to content. When it is, it is mostly for counting characters, counting words, counting uses of a specific word, or measuring keyword density. Now that vector embeddings are available, several SEO tools can compare text, from words, sentences, and paragraphs to whole-page content, and return a quantitative score that shows how related a word or group of words is to another, or to a page. This opens up more analysis and automation when using Python in SEO.
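To illustrate how an embedding comparison produces a relatedness score, here is a small sketch using cosine similarity, a common way to compare embedding vectors. The three-number vectors and page paths are toy stand-ins I made up; real embeddings come from an embedding model or an SEO tool and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as unrelated
    return dot / (norm_a * norm_b)


def most_related_page(vector, pages):
    """Return the page path whose vector scores highest against `vector`."""
    return max(pages, key=lambda url: cosine_similarity(vector, pages[url]))


# Toy vectors standing in for real page and keyword embeddings
page_vectors = {
    "/running-shoes": [0.9, 0.1, 0.0],
    "/hiking-boots": [0.2, 0.9, 0.1],
}
keyword_vector = [0.8, 0.2, 0.1]

best = most_related_page(keyword_vector, page_vectors)
print(best)
```

A score near 1 means the two texts point in nearly the same direction in embedding space, and a score near 0 means they are largely unrelated.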
Converting Manual Steps into Automated Steps in a Python Script
If you are not using Python today and you work through a variety of steps in a spreadsheet, merging data from several SEO tools, then identify those steps, document them, and let Python do them for you. Either learn to do it in Python yourself or use the documentation as a guide for the Python developer you work with. It gives you a clear set of rules for what to do with the data and gets you and your developer on the same page.
Example of Automating Analysis Rules
Although there are many examples I could show, with a limit of 20 minutes per speaker I focused on only one: automating the keyword research process. The repetitive task here is narrowing down the list of keywords produced by keyword research tools and deciding which ones are best to target.
Keyword Research with Python
Keyword research tools return a lot of keywords when the industry, the company, or its competitors are popular. Entering competitor websites into a tool like SEMRush's Keyword Gap can return anywhere from a few hundred to literally millions of keywords. The more there are, the more tedious they are to handle manually. Sometimes even opening a spreadsheet that large is impossible.
Here are some of the steps for narrowing down target keywords that I tasked Python with:
- Selecting a good balance of long tail and head term keywords
- To do this, the script counts the words in each phrase and marks every phrase with 4 or more words as a long tail. It then keeps a good balance between long tails and head terms. Both the ratio of long tails to head terms and the word-count threshold for a long tail are configurable.
- Selecting a good balance of search intent. Some transactional, some informational, some navigational.
- By default, the script tries to keep the ratio 1:1:1 for all three, but this can be adjusted too if needed.
- Balance of high KEI and high Search Volume
- Like everything else in the Python script, this ratio can be changed. Beyond the balancing itself, KEI (Keyword Effectiveness Index) needs to be computed, and the script computes it for every keyword, even if there are millions.
- Remove keywords that only 1 competitor website ranks for
- The Python script counts and records how many competitor websites rank for each keyword. If only 1 competitor site does, the keyword is most likely too specific to that competitor; it might be a branded term and not relevant to the industry. These keywords are removed from the list.
- A good balance of different topics
- Using K-Means Clustering, a machine learning algorithm, keywords are grouped into relevant clusters. Selected target keywords then come from each cluster. This ensures a balance of topics instead of keywords that are all about the same thing.
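Several of the steps above can be sketched in a few small functions. This is not the script from my talk, only an illustration under stated assumptions: the rows and column names are hypothetical, the intent label is assumed to already be in the export, and KEI is computed with one commonly cited formula (search volume squared divided by competition), which may differ from the formula your tool uses. The selection here only handles an even 1:1:1 split.

```python
from collections import defaultdict
from itertools import zip_longest

LONG_TAIL_MIN_WORDS = 4  # configurable threshold for calling a phrase long tail

# Hypothetical rows; a real run would load a keyword gap export instead.
keywords = [
    {"keyword": "running shoes", "volume": 12000, "competition": 900000,
     "intent": "transactional", "competitors_ranking": 3},
    {"keyword": "best trail running shoes for women", "volume": 900,
     "competition": 40000, "intent": "informational", "competitors_ranking": 2},
    {"keyword": "acme shoes login", "volume": 500, "competition": 1000,
     "intent": "navigational", "competitors_ranking": 1},
    {"keyword": "how to clean running shoes", "volume": 2400,
     "competition": 120000, "intent": "informational", "competitors_ranking": 2},
    {"keyword": "shoe store near me", "volume": 5000, "competition": 250000,
     "intent": "navigational", "competitors_ranking": 2},
    {"keyword": "buy running shoes online", "volume": 3000,
     "competition": 300000, "intent": "transactional", "competitors_ranking": 4},
]


def kei(volume, competition):
    """Assumed KEI formula: search volume squared over competition."""
    return volume ** 2 / competition if competition else 0.0


def prepare(rows):
    """Drop single-competitor keywords, then add a long-tail flag and KEI."""
    kept = []
    for row in rows:
        if row["competitors_ranking"] <= 1:
            continue  # likely branded or too specific to one competitor
        enriched = dict(row)
        word_count = len(row["keyword"].split())
        enriched["long_tail"] = word_count >= LONG_TAIL_MIN_WORDS
        enriched["kei"] = kei(row["volume"], row["competition"])
        kept.append(enriched)
    return kept


def select_balanced(rows, group_key, limit):
    """Pick round-robin across groups, highest KEI first, for an even mix."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    for members in groups.values():
        members.sort(key=lambda r: r["kei"], reverse=True)
    picked = []
    for tier in zip_longest(*groups.values()):
        for row in tier:
            if row is not None and len(picked) < limit:
                picked.append(row)
    return picked


prepared = prepare(keywords)
selected = select_balanced(prepared, "intent", limit=3)
for row in selected:
    print(row["keyword"], row["intent"], round(row["kei"], 2))
```

Calling `select_balanced(prepared, "long_tail", limit=...)` applies the same even split to long tails versus head terms. Supporting other ratios would need a weighted version of the round-robin.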
Giving Python distinct steps for narrowing down the keywords and letting it do the rest speeds up the keyword selection process. Since everyone may have a different process for choosing target keywords, think about how yours can be turned into mechanical commands Python can run so you do not have to.
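For the topic-balance step, the sketch below shows the idea behind K-Means on toy data: assign each keyword to its nearest centroid, recompute the centroids, repeat, then pick the top keyword from each cluster. The two-number vectors and search volumes are invented stand-ins for real keyword embeddings and tool data, and seeding the centroids with the first k points is a simplification to keep the output repeatable. In practice a library implementation such as scikit-learn's `KMeans` is the more usual route.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))


def k_means(points, k, iterations=20):
    """Plain k-means; seeds centroids with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(axis) / len(members)
                                for axis in zip(*members)]
    return labels


# Toy data: (keyword, search volume, made-up 2-D vector)
rows = [
    ("running shoes", 12000, [0.9, 0.1]),
    ("hiking boots", 8000, [0.1, 0.9]),
    ("trail running shoes", 3000, [0.8, 0.2]),
    ("waterproof hiking boots", 1500, [0.2, 0.8]),
    ("marathon running shoes", 2000, [0.85, 0.15]),
]

labels = k_means([vector for _, _, vector in rows], k=2)

# Keep the highest-volume keyword from each cluster
top_per_cluster = {}
for (keyword, volume, _), label in zip(rows, labels):
    if label not in top_per_cluster or volume > top_per_cluster[label][1]:
        top_per_cluster[label] = (keyword, volume)

for label in sorted(top_per_cluster):
    print(label, top_per_cluster[label][0])
```

Picking from every cluster is what keeps the final list from being dominated by one topic.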
Implementation of Changes with Python on Your Website
Continuing with the keyword research example, you can also use Python to automate the implementation of some action items. As with keyword research, different SEOs might do these tasks differently, and you can customize them to your liking. In this example, here are the steps where I involved Python: taking the targeted keywords from keyword research and adding them to the Title Tag of selected pages.
- Group together words that are related to each other, because most likely they will end up on the same page. With Python using the ChatGPT API, I can make it create a category and subcategory for each selected targeted keyword.
- With crawl data from Screaming Frog, using their vector embedding feature, which also uses the ChatGPT API, each category and subcategory can be assigned to the most related page.
- Python can then check whether the keyword exists in the current title and, if not, use a prompt and the ChatGPT API again to rewrite the title using the keyword with the highest search volume within the subcategory group, make it sound natural, and apply your own character length limit.
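The title check in the last step can be sketched as follows. To keep the example runnable offline, the API call is replaced by a stand-in function; `rewrite_fn`, `fake_rewrite`, the prompt wording, and the 60-character default are all assumptions for illustration, not the prompt or limits from my talk. In a real script, `rewrite_fn` would wrap your own call to the ChatGPT API.

```python
def keyword_in_title(keyword, title):
    """Case-insensitive check for the keyword phrase inside the title."""
    return keyword.lower() in title.lower()


def build_prompt(title, keyword, max_chars):
    """Compose the rewrite instruction sent to the language model."""
    return (
        f"Rewrite this page title so it naturally includes the keyword "
        f"'{keyword}'. Keep it under {max_chars} characters.\n"
        f"Current title: {title}"
    )


def propose_title(title, subcategory_keywords, rewrite_fn, max_chars=60):
    """Return a new title, or the original when no safe change is found."""
    # Target the keyword with the highest search volume in the subcategory
    target = max(subcategory_keywords, key=lambda kw: kw["volume"])["keyword"]
    if keyword_in_title(target, title):
        return title  # nothing to do
    candidate = rewrite_fn(build_prompt(title, target, max_chars)).strip()
    # Models do not always obey length limits, so verify before accepting
    if len(candidate) > max_chars or not keyword_in_title(target, candidate):
        return title
    return candidate


def fake_rewrite(prompt):
    """Stand-in for a real API call; always returns the same title."""
    return "Trail Running Shoes for Every Terrain | Example Store"


group = [
    {"keyword": "trail running shoes", "volume": 3000},
    {"keyword": "off road running shoes", "volume": 700},
]
new_title = propose_title(
    "Shop Our Outdoor Footwear | Example Store", group, fake_rewrite
)
print(new_title)
```

Falling back to the original title when the rewrite breaks the length limit or drops the keyword keeps bad output from reaching the site before a manual review.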
At this point, I suggest manually reviewing the output first to see if it looks good. If it does not, modify your prompt until you are satisfied with the result. From there you can decide to implement the changes manually, or modify your code to connect to your CMS API and update the content automatically.
Learning Python for SEO
If you do not know the basics, there are many ways to learn, including YouTube, Coursera, Udemy, and asking ChatGPT a bunch of questions. But if you want to learn Python with some SEO application, so far these are the ones I found online that have SEO specific examples:
My Key Takeaways
- Efficiency: Turn manual SEO processes into Python scripts for ongoing automation.
- Scalability: Python handles millions of data points, enabling enterprise-level optimization.
- Practical Tools: Leverage APIs, vector embeddings, and CMS integrations for seamless updates.
Final Thoughts
Python and data science practices are transforming SEO analysis into a faster, more accurate, and scalable process. The future of SEO is automated, and with Python, you can stay ahead.
Thank You
Thanks to those who approached me after my talk and said how useful and actionable it was, and to those who publicly mentioned this as well.
Thanks Parth Suba for attending my session and finding it insightful. I also saw your reply to someone that asked about this and you mentioned you took a ton of notes!
Currently attending @BenjArriola’s talk on ‘Using python and data science practices in SEO analysis of data’ at #BrightonSEO San Diego
— Parth Suba (@parthsuba77) November 19, 2024
Super insightful session! pic.twitter.com/IKuLLau2Hi
Thanks for attending my session Kelsey Libert and thanks for helping spread the word. 😊
When should you use python? 1. Handling large volumes of data 2. Repetitive processes 3. Faster data gathering, analysis, & implementation – @BenjArriola #BrightonSEO
— Kelsey Libert (@KelseyLibert) November 19, 2024
Thanks, Jeff DeBoer. I also shared the deck above for your convenience.
Thanks for mentioning my session as one of the standouts Grace Frohlich. It is flattering to be mentioned together with other known experts in our industry.
Industry Friends at the BrightonSEO Event
It was nice seeing several of my industry friends. Some of you I saw but was not able to talk to because I was busy in conversations with other SEO people. I kept my peer interaction to a minimum mainly because I felt like I had a fever at the event and had a pounding headache. But for everyone else I did see and interact with, it was nice seeing you again. Special thanks to my colleagues at 85Sixty (Karen Massey, Edith Lopez-Ramirez, and Joe McGuff), who supported me and took the photos and videos during my talk. And a special mention to Rob Delory of Ahrefs for lending a tripod to help my team shoot the video. Thanks!
Written by Benj Arriola
Senior Director SEO @ 85SIXTY