
Python for SEO: How to Automate Search Tasks for Ecommerce Stores

By Muhammad Ahmad Khan

April 2026


Python is a programming language built around readable syntax and a library ecosystem that handles everything from data processing to web scraping to API connections. It's free, open-source, and runs on every operating system. The Stack Overflow 2025 Developer Survey found that Python's adoption among developers grew by 7 percentage points from 2024 to 2025, the largest single-year increase of any language surveyed.

For SEO professionals, Python solves a specific problem. SEO tools such as Screaming Frog, Semrush, and Ahrefs each handle one part of the job well. But they don't talk to each other, they don't check your product feeds, and they don't join your crawl data with your Search Console performance data. Python reads all of those sources, connects them in a single script, and produces one output you can act on. This guide covers what Python does for SEO, how to set it up, which libraries to use, and where it fits alongside the tools you already have.

What Does Python Do for SEO?

Python automates the SEO tasks that break down when your site has more pages than you can check by hand. Screaming Frog crawls your site and flags missing title tags. Semrush tracks your keyword rankings. But neither tool can cross-reference your crawl data against your Google Merchant Center feed to find which products Google disapproved this week. Python fills those gaps by reading data from multiple sources, joining it in one place, and flagging the problems no single tool catches on its own. For sites with large product catalogs, that gap between what SaaS tools cover and what actually needs checking grows with every page you add.

How Is Python Different from SEO Tools Like Screaming Frog and Semrush?

Python handles the tasks that fall between your existing SEO tools, joining data from different sources that no single SaaS platform connects on its own. Screaming Frog tells you which pages are missing meta descriptions. It doesn't tell you whether those same pages are also losing rankings in Search Console or failing validation in your Merchant Center feed. Semrush tracks your keyword positions. It doesn't check whether your variant URLs have canonical tags pointing to the right parent page.

Python bridges those gaps by reading from multiple exports and APIs in a single script. You pull your Screaming Frog crawl, your Merchant Center product feed, and your Search Console query data into three CSV files. A Pandas script joins them on URL and produces one spreadsheet showing which pages have SEO issues, which ones have feed violations, and which ones are losing organic clicks this month.
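A minimal sketch of that join, using inline sample rows in place of the three exports (every file and column name here is a placeholder — match them to your own CSVs):

```python
import pandas as pd

# Sample rows standing in for your three exports; in practice use pd.read_csv(...)
crawl = pd.DataFrame({"url": ["/a", "/b"], "missing_meta": [True, False]})
feed = pd.DataFrame({"url": ["/a", "/c"], "feed_status": ["disapproved", "approved"]})
gsc = pd.DataFrame({
    "url": ["/a", "/b", "/c"],
    "clicks_this_month": [10, 50, 5],
    "clicks_last_month": [30, 40, 5],
})

# Outer join on URL keeps pages that appear in any of the three sources
merged = crawl.merge(feed, on="url", how="outer").merge(gsc, on="url", how="outer")

# Flag pages losing organic clicks month over month
merged["losing_clicks"] = merged["clicks_this_month"] < merged["clicks_last_month"]

merged.to_csv("combined-report.csv", index=False)
print(merged[merged["losing_clicks"]]["url"].tolist())
```

The outer join is the important choice: an inner join would silently drop any page missing from one of the three sources, which is often exactly the page you need to see.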

SaaS tools are built for the tasks their developers anticipated, while Python lets you build the specific check your site needs today. A site with thousands of product variants needs a script that checks whether every variant URL canonicalizes to its parent page. No SaaS tool has that feature built in. A short Python script does.

What SEO Tasks Can Python Automate?

Python automates the SEO tasks that involve reading large data exports, checking each row against a set of rules, and flagging the rows that fail. The tasks that matter most include the following.

Meta description and title tag audits read your full page inventory and check every URL for missing or duplicate title tags, empty meta descriptions, and blank image alt text. The script writes failures to a CSV your content team can work from directly.

Canonical tag checks fetch every URL in a set, extract the canonical tag from each page, and flag any page pointing to the wrong canonical or pointing to itself when it shouldn't be.

Feed validation parses your Google Merchant Center XML feed and checks every entry against Google's required fields such as GTIN, price, availability, and image URL. The script catches disapprovals before you submit the feed.

Redirect map building takes two lists of URLs, such as your old site and your new site during a migration. It scrapes the content from both and matches pages by content similarity so you don't spend weeks building a redirect map in a spreadsheet.

Search Console performance reporting pulls your query data from the GSC API and filters by URL pattern. The output shows which pages gained or lost clicks, impressions, and average position this month compared to last month.
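Of these, redirect-map matching is the least obvious to implement. A minimal sketch using the standard library's difflib, with hardcoded page texts standing in for scraped content (a simple character-ratio matcher — real migrations with thousands of pages may want TF-IDF or embeddings instead):

```python
from difflib import SequenceMatcher

# Sample page texts standing in for scraped content from the old and new sites
old_pages = {
    "/old/red-hiking-boot": "Red hiking boot with waterproof leather and rugged sole",
    "/old/blue-running-shoe": "Blue running shoe, lightweight mesh, cushioned heel",
}
new_pages = {
    "/products/hiking-boot-red": "Red hiking boot with waterproof leather and a rugged sole",
    "/products/running-shoe-blue": "Blue running shoe with lightweight mesh and cushioned heel",
}

redirect_map = {}
for old_url, old_text in old_pages.items():
    # Pick the new page whose text is most similar to this old page's text
    best = max(
        new_pages,
        key=lambda n: SequenceMatcher(None, old_text, new_pages[n]).ratio(),
    )
    redirect_map[old_url] = best

for src, dst in redirect_map.items():
    print(f"{src} -> {dst}")
```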

Infographic: the SEO tasks Python automates — meta audits, canonical checks, feed validation, redirect mapping, and GSC reporting — each replacing hours of manual checking with a single script.

Each of these tasks runs once you've written the script. You don't rebuild it every time. You update the input file, run the script, and read the output.

Why Should SEO Professionals Learn Python in 2026?

SEO professionals should learn Python in 2026 because it turns tasks that take 5 to 12 hours of manual work per week into scripts that finish in minutes. The gap between what SaaS tools cover and what a large site actually needs is too wide to close by hand.

The community behind Python has never been larger. SEO-specific tutorials, libraries, and forums that didn't exist five years ago are now available for every task covered in this guide.

What Does Python Save Compared to Manual SEO Work?

Python saves 5 to 12 hours per week on repetitive SEO tasks such as meta audits, canonical checks, and redirect mapping that would otherwise require clicking through a CMS or copying data between spreadsheets. The savings depend on your site's size and how many of these tasks you do regularly.

| Task | Manual approach | Python approach | Time saved |
| --- | --- | --- | --- |
| Audit thousands of pages for missing meta descriptions | Open each page in your CMS, check the meta field, note the gaps in a spreadsheet | Run a script that reads your page export and flags every empty meta field (about 8 minutes) | ~5.5 hours |
| Check canonical tags across variant URLs | Load each URL in a browser, inspect the HTML, check the canonical tag | Run a script that fetches each URL, extracts the canonical, and flags mismatches (about 12 minutes) | ~3.5 hours |
| Build a redirect map during a site migration | Match old URLs to new URLs in a spreadsheet by reading both pages and guessing the closest match | Run a content similarity script that scrapes both URL lists and matches by content (about 90 minutes) | ~8.5 hours |

Why Can't SaaS Tools Handle Everything?

SaaS tools like Screaming Frog, Semrush, and Ahrefs each cover one slice of your site's SEO data, but none of them join data across sources or check requirements that fall outside their feature set. The gaps become visible once your site passes a few thousand pages.

Cross-source data joins are the biggest gap. Your Merchant Center feed lives in one platform. Your crawl data lives in another. Your performance data lives in Search Console. No SaaS tool combines all three into one view. A Python script reads all three exports and joins them on URL.

Platform-specific validation is another gap. Shopify, WooCommerce, and Magento each handle canonical tags, URL structures, and pagination differently. SaaS crawlers are platform-agnostic by design. They don't check platform-specific patterns that cause indexation problems.

Custom reporting that combines store data with search data is the third gap. Your analytics show revenue per page. Search Console shows clicks per URL. No SaaS tool joins those into one report. Python does.

Feed compliance checking is the fourth gap. Google Merchant Center has specific field requirements for every product in your Shopping feed. A missing GTIN or a broken image URL gets a product disapproved. No SEO tool checks your feed against Google's spec.

Infographic: the gaps Python fills that Screaming Frog, Semrush, and Ahrefs don't — cross-source joins, platform-specific checks, custom revenue reports, and feed compliance.

How Do You Set Up Python for SEO in 30 Minutes?

Setting up Python for SEO takes about 30 minutes using Google Colab, three libraries, and a data export from your site. No installs. No terminal configuration. A browser tab and a Google account are the only requirements.

Which Environment Should You Start With?

Google Colab is the right starting environment for most SEO professionals because it runs entirely in a browser tab, requires zero installs, and comes with Pandas and Requests already loaded. You open a Colab notebook the same way you'd open a Google Doc. Python runs in the cloud. Your notebooks save to Google Drive.

Replit is worth considering if you prefer a code-editor interface over a notebook layout. Replit's 100 Days of Python course is the best free structured learning path for someone who has never written code. The downside is that Replit's free tier has resource limits that can slow down large file processing.

VS Code with a local Python install is the right choice once you've outgrown Colab, typically after a few months of regular use. Local Python gives you full control over your environment, access to your file system, and the ability to run scripts on a schedule. The setup takes 20 to 30 minutes and requires installing Python from python.org plus the VS Code Python extension.

Infographic: comparing Google Colab, Replit, and VS Code for your first Python SEO script — setup time, best use, and limits.

Start with Colab. Move to local Python when your scripts need to access files on your machine or run on a schedule.

What Should Your First Python Script Do?

Your first Python script should read a CSV export of your site's pages and flag every page that's missing a meta description. This is the most common SEO problem on large sites, and the script produces a file you can act on immediately. Every CMS and ecommerce platform lets you export your page data as a CSV.

The script reads every row in that CSV, checks whether the meta description column is empty, and writes the results to a new file listing every page that needs attention. Here's the complete script you can paste into Google Colab and run immediately.

Screenshot: Google Colab's in-browser notebook environment, where you can paste the script below, upload your CSV, and run it without installing anything locally.
Python
import pandas as pd

df = pd.read_csv("products.csv")

missing = df[
    df["SEO Description"].isna() | (df["SEO Description"].str.strip() == "")
]

missing[["Handle", "Title", "SEO Description"]].to_csv(
    "missing-descriptions.csv", index=False
)

print(f"{len(missing)} pages missing meta descriptions out of {len(df)} total")

Upload your CSV to Colab, update the column names to match your export (Shopify uses "SEO Description" and "Handle," WooCommerce uses different names), and run the cell. The output file downloads with one click.

Screenshot: terminal output from running a page audit script against a 1,247-row product CSV — Python prints the issue counts and writes the flagged rows to a new CSV file in seconds.

Which Python Libraries Do SEO Professionals Use?

SEO professionals use four core Python libraries on almost every project, plus a handful of specialized libraries for tasks that go beyond basic data processing and page fetching.

What Are the Four Libraries You'll Use on Every Project?

Pandas, Requests, BeautifulSoup, and the built-in csv module handle the data reading, page fetching, HTML parsing, and file output that make up the majority of SEO scripts.

Pandas reads and manipulates your data. When you load a CSV with thousands of pages into Pandas, it becomes a table you can filter, sort, group, and merge with other data. Finding every row where the meta description column is empty takes one line.

Requests fetches web pages over HTTP. When your script needs to check whether a page returns a 200 status code or a 404, Requests sends the request and brings back the response. It's how Python reads a live web page without opening a browser.

BeautifulSoup parses the HTML that Requests brings back. Once you've fetched a page, BeautifulSoup lets you extract specific elements such as the title tag, meta description, canonical tag, or image alt text. You tell it what to look for, and it pulls it from the page source.

The csv module is built into Python and handles reading and writing CSV files without installing anything. For simple scripts where you don't need Pandas' filtering and grouping features, the csv module reads your input file and writes your output.
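As a self-contained sketch of the csv module at work — the sample rows here are written by the script itself so it runs as-is; in practice you'd start from your platform's own export, and the column names are placeholders:

```python
import csv

# Write a tiny sample export so the sketch is self-contained
with open("pages.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["url", "meta_description"])
    writer.writerow(["/products/boot", "Waterproof hiking boot"])
    writer.writerow(["/products/shoe", ""])

with open("pages.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Keep only rows where the meta description is empty or whitespace
flagged = [r for r in rows if not r["meta_description"].strip()]

with open("missing-meta.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["url", "meta_description"])
    writer.writeheader()
    writer.writerows(flagged)

print(f"{len(flagged)} of {len(rows)} pages missing meta descriptions")
```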

Infographic: the core Python libraries every SEO script uses — Pandas, Requests, BeautifulSoup, and the built-in csv module.

What Specialized Libraries Matter for SEO?

Specialized libraries become useful once your scripts need to handle JavaScript-rendered pages, large-scale crawling, or API-specific data formats that the core four don't cover well.

Playwright controls a real browser from Python, loading pages the way Chrome does including all JavaScript rendering. Use it when a page loads its content through JavaScript and the raw HTML that Requests fetches doesn't contain the data you need.

advertools is a Python library built specifically for SEO tasks. It can crawl sites, parse robots.txt files, analyze URL structures, and extract structured data. Use it when you want SEO-specific crawling without building the logic from scratch.

lxml is a faster HTML and XML parser than BeautifulSoup. If your script processes large XML files such as a Merchant Center feed with tens of thousands of entries, lxml parses the file in a fraction of the time.

Scrapy is a full crawling framework for when you need to crawl an entire site, following links and extracting data from every page. Most SEO scripts don't need Scrapy because you're working from data exports. But if you're building a competitive analysis crawler, Scrapy handles the job.

Don't install all of these on day one. Start with the core four. Add a specialized library only when you hit a task the core four can't handle.

What Are the Most Useful Python Scripts for SEO?

The most useful Python scripts for SEO solve three problems that SaaS tools don't cover. Those problems are auditing pages from a data export, validating Merchant Center feeds before submission, and finding canonical tag conflicts across variant URLs.

How Do You Audit Pages for Missing SEO Elements?

A page audit script reads your site's CSV export and flags every page missing a title tag, meta description, or image alt text, producing a spreadsheet your team can work from directly. This is the first script every SEO professional should build because it replaces hours of manual CMS checking with a process that finishes in seconds.

The script's input is the page export file your CMS or ecommerce platform provides. The column names differ by platform, but the logic stays the same.

The script loads the CSV into a Pandas DataFrame, then checks each row against four conditions. Is the title tag column empty? Is the meta description column empty? Is the image alt text column empty? Does the body content column contain fewer than 50 words, indicating thin content? Every row that fails one or more conditions gets flagged with the specific issue.

The output is a new CSV with the page URL, page name, and a list of the issues found. Your team opens that file and works through it row by row.

For a site with thousands of pages, this script typically finds 15% to 30% with at least one missing field. The number is higher on sites that import content from external feeds without editing the meta fields afterward.
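A compressed sketch of the four checks described above, run against inline sample rows (the column names are hypothetical — swap in the ones from your platform's export):

```python
import pandas as pd

# Sample rows standing in for your CMS export
df = pd.DataFrame({
    "url": ["/a", "/b", "/c"],
    "title": ["Hiking Boot", "", "Running Shoe"],
    "meta_description": ["Waterproof boot", "Trail shoe", ""],
    "alt_text": ["boot photo", "", "shoe photo"],
    "body": ["word " * 120, "word " * 10, "word " * 200],
})

def issues(row):
    """Return a semicolon-separated list of the checks this row fails."""
    found = []
    if not str(row["title"]).strip():
        found.append("missing title")
    if not str(row["meta_description"]).strip():
        found.append("missing meta description")
    if not str(row["alt_text"]).strip():
        found.append("missing alt text")
    if len(str(row["body"]).split()) < 50:  # thin-content threshold
        found.append("thin content")
    return "; ".join(found)

df["issues"] = df.apply(issues, axis=1)
flagged = df[df["issues"] != ""][["url", "issues"]]
flagged.to_csv("audit-report.csv", index=False)
print(flagged.to_string(index=False))
```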

How Do You Validate Your Google Merchant Center Feed with Python?

A Merchant Center feed validation script parses your Google Shopping XML feed and checks every entry against Google's required fields, catching disapprovals before you submit the feed. Google Merchant Center disapproves products for missing or incorrectly formatted fields. Each disapproval removes that product from Google Shopping results and AI-generated shopping recommendations.

The fields that cause the most disapprovals are GTIN (the product's barcode number), price format, availability status, image URL, and product category. Each field has a specific requirement. The price must match your landing page exactly. The availability must match what the page shows. The image URL must be live and crawlable. The product category must use Google's taxonomy, not your own.

The script reads your XML feed with Python's built-in ElementTree or the lxml library. For each entry, it checks whether every required field is present and correctly formatted. The script validates GTIN length (8, 12, 13, or 14 digits), confirms the price includes the correct currency code, and fetches each image URL to verify it returns a 200 status code.

Entries that fail any check get written to an error report with the product ID, the failing field, and the specific problem. You fix those entries before submitting to Merchant Center.
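A minimal sketch of the GTIN and price checks using the built-in ElementTree parser (the image-URL fetch is omitted, and the inline XML is a tiny stand-in for a real feed — real feeds carry many more required fields):

```python
import re
import xml.etree.ElementTree as ET

# Minimal stand-in for a Merchant Center feed
feed_xml = """<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0"><channel>
  <item><g:id>SKU1</g:id><g:gtin>0123456789012</g:gtin><g:price>29.99 USD</g:price></item>
  <item><g:id>SKU2</g:id><g:gtin>12345</g:gtin><g:price>29.99</g:price></item>
</channel></rss>"""

NS = {"g": "http://base.google.com/ns/1.0"}
errors = []

for item in ET.fromstring(feed_xml).iter("item"):
    pid = item.findtext("g:id", default="?", namespaces=NS)
    gtin = item.findtext("g:gtin", default="", namespaces=NS)
    price = item.findtext("g:price", default="", namespaces=NS)

    # Valid GTINs are 8, 12, 13, or 14 digits
    if not re.fullmatch(r"\d{8}|\d{12}|\d{13}|\d{14}", gtin):
        errors.append((pid, "gtin", gtin))
    # Price needs a numeric amount plus a currency code, e.g. "29.99 USD"
    if not re.fullmatch(r"\d+(\.\d+)? [A-Z]{3}", price):
        errors.append((pid, "price", price))

for pid, field, value in errors:
    print(f"{pid}: bad {field} -> {value!r}")
```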

How Do You Find Canonical Tag Conflicts Across Variant URLs?

A canonical tag audit script fetches every variant URL on your site, extracts the canonical tag from each page's HTML, and flags any variant where the canonical doesn't point to the correct parent page. Canonical conflicts are one of the most common technical SEO problems on sites with product variants, and they're almost invisible without a script that checks systematically.

The problem happens when a product exists in multiple colors or sizes, generating variant URLs such as /products/hiking-boot?variant=red and /products/hiking-boot?variant=blue. Each variant page should have a canonical tag pointing to the main page at /products/hiking-boot. When the canonical is missing or points to the wrong URL, Google treats each variant as a separate page competing with the others.

The script takes a list of variant URLs from your data export. For each URL, it sends an HTTP request with the Requests library, parses the HTML with BeautifulSoup, and extracts the href from the canonical link element. It then compares the actual canonical to the expected parent URL.

Python
import requests
from bs4 import BeautifulSoup
import csv

# One variant URL per line; blank lines are skipped
urls = [u.strip() for u in open("variant-urls.txt") if u.strip()]
results = []

for url in urls:
    try:
        response = requests.get(url, timeout=10)
        soup = BeautifulSoup(response.text, "html.parser")
        # Extract the canonical link element, if the page declares one
        canonical = soup.find("link", {"rel": "canonical"})
        href = canonical.get("href", "MISSING") if canonical else "MISSING"
        results.append({"url": url, "canonical": href, "status": response.status_code})
    except requests.RequestException as e:
        results.append({"url": url, "canonical": "ERROR", "status": str(e)})

with open("canonical-audit.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["url", "canonical", "status"])
    writer.writeheader()
    writer.writerows(results)

print(f"Checked {len(urls)} URLs. Results saved to canonical-audit.csv")

Create a text file called variant-urls.txt with one URL per line. The script checks each page and writes the results to a CSV you can filter for mismatches and missing canonicals.

The output is a table with three columns: the variant URL, the canonical found on the page (or MISSING), and the HTTP status. Compare the canonical column against the expected parent URL for each variant to surface mismatches. Sites with thousands of variant URLs typically have 5% to 10% with canonical issues, especially after platform migrations or supplier feed imports.

How Does Python Connect to Google Search Console and the Indexing API?

Python connects to Google Search Console through the Search Analytics API and to the Indexing API through service account authentication. This gives you programmatic access to your page performance data and the ability to submit new URLs for faster indexing.

Screenshot: Google Search Console's official product page — the same product whose Search Analytics API the script below connects to.

How Do You Pull Search Console Data with Python?

The Google Search Console API lets you pull click, impression, position, and CTR data for every query and page on your site, filtered by URL pattern to isolate specific page types. The GSC web interface shows you the same data, but it limits you to 1,000 rows per export. It also doesn't let you filter by URL pattern in a way that separates product pages from blog posts or category pages.

Authentication requires a Google Cloud project with the Search Console API enabled and a service account key file. The service account email gets added as a user in your Search Console property. After authentication, you send a query specifying a date range, the dimensions you want (such as query, page, device, and country), and optional filters.

The filter is where the value lives. You can set a URL filter to include only pages matching a specific pattern such as /products/ or /blog/. The API returns every query that triggered those pages, along with click count, impression count, average position, and CTR.

Python
from google.oauth2 import service_account
from googleapiclient.discovery import build
import pandas as pd

credentials = service_account.Credentials.from_service_account_file(
    "service-account-key.json",
    scopes=["https://www.googleapis.com/auth/webmasters.readonly"]
)

service = build("searchconsole", "v1", credentials=credentials)

response = service.searchanalytics().query(
    siteUrl="https://yoursite.com",
    body={
        "startDate": "2026-03-01",
        "endDate": "2026-03-31",
        "dimensions": ["query", "page"],
        "dimensionFilterGroups": [{
            "filters": [{
                "dimension": "page",
                "operator": "contains",
                "expression": "/products/"
            }]
        }],
        "rowLimit": 5000
    }
).execute()

rows = response.get("rows", [])
data = [{
    "query": row["keys"][0],
    "page": row["keys"][1],
    "clicks": row["clicks"],
    "impressions": row["impressions"],
    "position": round(row["position"], 1)
} for row in rows]

df = pd.DataFrame(data)
df.to_csv("gsc-product-performance.csv", index=False)
print(f"{len(df)} rows exported")

Replace the siteUrl with your property URL and adjust the date range and URL filter to match your site's structure. The script requires the google-api-python-client and google-auth libraries, which you install in a Colab cell with !pip install google-api-python-client google-auth.

A Pandas script processes that response into a table sorted by impressions or position change. You can compare this month to last month by running the query twice with different date ranges and merging the results. The output shows which pages are gaining visibility, which ones are dropping, and which high-impression queries have low CTR.
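The month-over-month comparison is a straightforward Pandas merge. A sketch with inline sample rows standing in for two API pulls:

```python
import pandas as pd

# Sample rows standing in for two GSC API pulls over different date ranges
this_month = pd.DataFrame({"page": ["/products/boot", "/products/shoe"],
                           "clicks": [120, 40]})
last_month = pd.DataFrame({"page": ["/products/boot", "/products/shoe"],
                           "clicks": [90, 70]})

# Merge on page, suffix the two click columns, then compute the delta
merged = this_month.merge(last_month, on="page", suffixes=("_this", "_last"))
merged["click_change"] = merged["clicks_this"] - merged["clicks_last"]

# Biggest losers first
print(merged.sort_values("click_change").to_string(index=False))
```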

How Do You Use the Indexing API to Get New Pages Indexed Faster?

Google's Indexing API lets you submit new URLs directly to Google for crawling, and Google typically processes submitted URLs within minutes to hours instead of the days or weeks organic crawling takes. The API provides a default quota of 200 URL submissions per day, according to Google's Indexing API documentation updated in December 2025. You can request a higher quota through Google Cloud Console.

The Indexing API was originally built for job posting and livestream event pages, but Google has expanded its use. The setup requires the same service account authentication as the Search Console API. You enable the Web Search Indexing API in your Google Cloud project, download the service account key file, and add the service account email as an owner in your Search Console property.

The Python script reads a list of new URLs from a text file or CSV. For each URL, it sends a POST request to the Indexing API endpoint with the URL and the notification type set to URL_UPDATED. Google responds with an HTTP 200 if the submission succeeds. You can batch up to 100 URLs in a single HTTP request.

Python
from google.oauth2 import service_account
from google.auth.transport.requests import AuthorizedSession

credentials = service_account.Credentials.from_service_account_file(
    "service-account-key.json",
    scopes=["https://www.googleapis.com/auth/indexing"]
)

session = AuthorizedSession(credentials)

urls = [u.strip() for u in open("new-urls.txt") if u.strip()]  # one URL per line
for url in urls:
    response = session.post(
        "https://indexing.googleapis.com/v3/urlNotifications:publish",
        json={"url": url, "type": "URL_UPDATED"}
    )
    print(f"{url} -> {response.status_code}")

Add your new page URLs to a text file called new-urls.txt, one per line. The script submits each URL and prints the response code. A 200 means Google received the request and will crawl the page soon.

For sites that add new pages weekly, this script means those pages appear in Google search results the same day they go live instead of sitting in the "Discovered, currently not indexed" queue.

When Should You NOT Use Python for SEO?

Python is the wrong tool for SEO tasks that involve fewer than 50 pages, need a visual interface, or require real-time monitoring that dedicated SaaS tools already handle well. Knowing when not to use it saves you from writing a script that takes longer than doing the job by hand.

  1. Quick spot-checks on a small number of pages. If you need to check the title tags on 15 pages after a content update, opening Screaming Frog and crawling those URLs takes 30 seconds. Writing a Python script for the same job takes 10 minutes. Python wins on volume. For anything under 50 pages, the setup time costs more than the task.
  2. Ongoing rank tracking. Semrush, Ahrefs, and SE Ranking track your keyword positions daily with history, alerts, and competitor comparisons built in. Building a rank tracker in Python means scraping Google (which violates their terms of service) or paying for a SERP API. The SaaS tools do this better and cheaper.
  3. Visual site auditing. Screaming Frog's interface lets you click through crawl data, filter by issue type, and export specific segments. A spreadsheet of Python output is slower than a tool built for browsing crawl data interactively.
  4. Link prospecting and outreach. Finding link building targets, managing outreach emails, and tracking response rates are relationship tasks. Python can scrape contact pages, but tools such as BuzzStream or Pitchbox handle the workflow that Python doesn't touch.
  5. One-time tasks you'll never repeat. Python's value comes from reuse. A script you run every week saves hours over the year. A script you run once saves nothing compared to doing the job manually.

If the task involves more than 100 rows of data and you'll run it more than once, Python is probably the right tool. If the task is small, visual, or one-time, use the tool built for it.

How Does AI Change Python for SEO in 2026?

AI tools such as ChatGPT and Claude have collapsed the learning curve for Python by letting non-developers describe what they want a script to do in plain English and receive working code in response. For SEO professionals without a programming background, this means the barrier to writing Python scripts is lower than it has ever been.

The adoption numbers back this up. The Stack Overflow 2024 Developer Survey found that 76% of developers were using or planning to use AI tools in their development process, up from 70% the year before.

How Do LLMs Like ChatGPT and Claude Help Non-Developers Write Python Scripts?

LLMs help non-developers write Python scripts by translating a plain-language task description into working code, explaining what each line does, and debugging errors when the script doesn't run correctly. You don't need to memorize Python syntax. You need to know what you want the script to do.

Here's a practical example. You type "Write a Python script that reads a CSV file called pages.csv, checks whether the column called Meta Description is empty for each row, and saves a new CSV called missing-descriptions.csv with only the rows that have empty descriptions" into ChatGPT or Claude. The AI generates the script, typically using Pandas, with comments explaining each step.

If the script throws an error when you run it in Google Colab, you paste the error message back into the AI. It explains what went wrong, such as a column name mismatch between your actual CSV and the script, and gives you the corrected version.

This loop of describing, generating, running, and debugging replaces the weeks of tutorial-watching that learning Python used to require. You still learn Python in the process because you read the code and start recognizing patterns. But you produce useful output from day one instead of week three.

What Can't AI Do for Python SEO Work?

AI writes the code, but it can't tell you which audit to run first, how to interpret the results, or what to do with the spreadsheet your script produces. Deciding that canonical tag conflicts matter more than missing alt text for your site this quarter is a judgment call based on your crawl data, your rankings, and your business priorities. No LLM makes that call for you.

AI-generated code also needs verification. An LLM might write a script that parses your Merchant Center feed using the wrong XML namespace, or it might generate a Search Console API query with a deprecated endpoint. You need to run the script, check whether the output makes sense, and catch errors the AI introduced.

The right model is using AI for the mechanical half, such as generating scripts, fixing syntax errors, and explaining library documentation. Strategy, prioritization, interpretation, and quality control stay with the person who understands the site's SEO.


Want Us to Build Python Automation Into Your SEO Workflow?

This guide explains the methodology. If you want us to build the scripts that audit your store, validate your feed, and connect your data sources into one revenue report, start with a free audit.
