PromptZone - Leading AI Community for Prompt Engineering and AI Enthusiasts

Prompt2Tool


Python Script to Process Semrush Backlink CSV

Why You Need This Script
Semrush provides powerful backlink reports, but its CSV exports are often filled with duplicates and lengthy page-level URLs. For SEO work on prompt2tool.com, it is more useful to see a clean list of unique domains instead of dozens of repeated links. For example, the URL https://prompt2tool.com/tools/utilities/report-card-generator should be standardized into https://prompt2tool.com. This Python script automates that process and delivers a simple, sorted output.

Reading the Semrush Export
The script works directly with the CSV file exported from Semrush and reads only two fields: Source URL and Page ascore. Each backlink row pairs a page-level link with its score, which is all the processing needs. Ignoring the other columns keeps the script simple, even for large exports.
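Here is a minimal sketch of this step. It assumes the export has headers spelled exactly `Source URL` and `Page ascore`, as described above. Real Semrush exports may spell them differently, so the sketch makes them parameters. `read_backlinks` is a hypothetical helper name. It uses only the standard-library `csv` module and skips rows whose URL is empty or whose score is missing or not a number.

```python
import csv
import io
from typing import TextIO

# Header names as described in the post (an assumption); adjust if your export differs.
URL_COLUMN = "Source URL"
SCORE_COLUMN = "Page ascore"


def read_backlinks(
    handle: TextIO,
    url_column: str = URL_COLUMN,
    score_column: str = SCORE_COLUMN,
) -> list[tuple[str, float]]:
    """Return (source_url, page_ascore) pairs, ignoring every other column."""
    rows = []
    for row in csv.DictReader(handle):
        url = (row.get(url_column) or "").strip()
        score_text = (row.get(score_column) or "").strip()
        if not url:
            continue  # no Source URL, nothing to process
        try:
            score = float(score_text)
        except ValueError:
            continue  # skip rows with a missing or non-numeric score
        rows.append((url, score))
    return rows


# Small in-memory sample standing in for a real export (hypothetical data).
sample = io.StringIO(
    "Page ascore,Source title,Source URL\n"
    "12,Post A,https://example.com/a\n"
    "40,Post B,https://blog.example.org/b?x=1\n"
)
print(read_backlinks(sample))
# [('https://example.com/a', 12.0), ('https://blog.example.org/b?x=1', 40.0)]
```

For a real file, open it with `open(path, newline="", encoding="utf-8-sig")`. The `utf-8-sig` codec drops a byte-order mark if the export includes one.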

Sorting by Page ascore
After reading the CSV, the script sorts backlinks by Page ascore in descending order and treats the score as a number rather than text. The most authoritative pages are therefore listed first. This makes high-value backlinks easy to identify and shows which domains provide the strongest signals. The prioritization is particularly useful when deciding where to focus outreach or link-building efforts.
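The built-in `sorted` function can handle the descending sort. This sketch assumes the scores have already been converted to numbers; the hypothetical `sort_by_ascore` takes `(url, score)` pairs. The last line shows why that conversion matters: comparing the raw CSV text character by character ranks "9" above "100".

```python
def sort_by_ascore(rows: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Return (url, score) pairs ordered from highest to lowest Page ascore."""
    # Python's sort is stable (also with reverse=True), so ties keep their CSV order.
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical parsed rows; scores are floats, not strings.
backlinks = [
    ("https://example.com/a", 12.0),
    ("https://blog.example.org/b", 40.0),
    ("https://example.com/c", 27.0),
]
print(sort_by_ascore(backlinks))
# [('https://blog.example.org/b', 40.0), ('https://example.com/c', 27.0), ('https://example.com/a', 12.0)]

# Sorting the raw text instead compares character by character:
print(sorted(["9", "40", "100"], reverse=True))  # ['9', '40', '100']
```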

Extracting and Standardizing Domains
Once sorted, the script extracts the root domain from each Source URL. Long addresses with paths or query parameters are reduced to a clean root, so every entry appears in the same standard HTTPS root-domain format. That consistency makes the output easy to analyze and far more practical for SEO workflows.
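The standard library's `urllib.parse.urlsplit` is one way to do the normalization. In this sketch, the hypothetical `normalize_domain` keeps only the hostname and rebuilds it as `https://<host>`. It assumes "root domain" means the full hostname, so `www.` and other subdomains are kept as-is. Collapsing `blog.example.com` into `example.com` would require the Public Suffix List (for example, via the third-party `tldextract` package), which the post does not mention.

```python
from typing import Optional
from urllib.parse import urlsplit


def normalize_domain(url: str) -> Optional[str]:
    """Reduce a page-level URL to 'https://<host>', or None if no host is found."""
    url = url.strip()
    if "://" not in url:
        # Without a scheme, urlsplit reads 'example.com/page' as a path;
        # a leading '//' makes it parse the first segment as the host.
        url = "//" + url
    try:
        # .hostname drops path, query, fragment, port and credentials, and lowercases.
        host = urlsplit(url).hostname
    except ValueError:
        return None  # e.g. malformed IPv6 brackets
    return f"https://{host}" if host else None


print(normalize_domain("https://prompt2tool.com/tools/utilities/report-card-generator"))
# https://prompt2tool.com
print(normalize_domain("HTTP://Blog.Example.com:8080/post?ref=1"))
# https://blog.example.com
```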

Removing Duplicates
To avoid clutter, the script automatically removes duplicate domains. Backlink CSVs often contain multiple links from the same site, which can distort analysis if left unfiltered. Deduplication ensures that each root domain appears only once in the final output. Because the list is already sorted, keeping the first occurrence of each domain preserves the ordering by Page ascore. The result is a concise dataset that highlights the true diversity of backlink sources.
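Order-preserving deduplication needs only a set of domains already seen. This sketch assumes its input is already sorted by Page ascore, so the first copy kept for each domain is its highest-scoring one. The name `unique_domains` is hypothetical. `list(dict.fromkeys(...))` is an equivalent one-liner when there are no empty entries to filter out.

```python
from typing import Iterable, Optional


def unique_domains(domains: Iterable[Optional[str]]) -> list[str]:
    """Keep the first occurrence of each domain and drop empty entries, preserving order."""
    seen: set[str] = set()
    result: list[str] = []
    for domain in domains:
        if not domain or domain in seen:
            continue
        seen.add(domain)
        result.append(domain)
    return result


# Already sorted by Page ascore, highest first (hypothetical data).
ranked = ["https://b.com", "https://a.com", "https://b.com", None, "https://c.com", "https://a.com"]
print(unique_domains(ranked))
# ['https://b.com', 'https://a.com', 'https://c.com']
```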

Generating the Final TXT File
The cleaned and sorted results are saved into a TXT file. Each line contains one unique root domain in HTTPS format, ordered from the highest to lowest Page ascore. The output is lightweight, easy to review, and ready for SEO reporting or integration into other tools. This makes it much easier to turn raw Semrush data into actionable insights for prompt2tool.com.
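Putting the steps together, a minimal end-to-end sketch might look like the following. It is an illustration, not the author's downloadable script, and it uses only the standard library. The function `process_backlinks` is hypothetical, and the header spellings `Source URL` and `Page ascore` are assumptions.

```python
import csv
from pathlib import Path
from typing import Optional
from urllib.parse import urlsplit

# Assumed header names; adjust to match the actual Semrush export.
URL_COLUMN = "Source URL"
SCORE_COLUMN = "Page ascore"


def normalize_domain(url: str) -> Optional[str]:
    """Reduce a URL to 'https://<host>', or None if no host can be parsed."""
    url = url.strip()
    if "://" not in url:
        url = "//" + url  # treat a scheme-less 'example.com/x' as a host
    try:
        host = urlsplit(url).hostname
    except ValueError:
        return None
    return f"https://{host}" if host else None


def process_backlinks(csv_path: str, txt_path: str) -> list[str]:
    """Read a backlink CSV, sort by Page ascore, dedupe domains, write one per line."""
    rows = []
    # utf-8-sig strips a byte-order mark if the export has one.
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        for row in csv.DictReader(handle):
            try:
                score = float((row.get(SCORE_COLUMN) or "").strip())
            except ValueError:
                continue  # skip rows without a numeric score
            rows.append((row.get(URL_COLUMN) or "", score))

    # Highest Page ascore first; a domain's first appearance is its best score.
    rows.sort(key=lambda item: item[1], reverse=True)

    domains: list[str] = []
    seen: set[str] = set()
    for url, _score in rows:
        domain = normalize_domain(url)
        if domain and domain not in seen:
            seen.add(domain)
            domains.append(domain)

    text = "\n".join(domains) + "\n" if domains else ""
    Path(txt_path).write_text(text, encoding="utf-8")
    return domains
```

Called as `process_backlinks("semrush_backlinks.csv", "unique_domains.txt")` (hypothetical file names), it writes one HTTPS domain per line and returns the same list. That makes a quick count with `len()` easy.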

Check this to download the Python script for Semrush.
