How to Build an Amazon "Share of Search" Dashboard with Python and Streamlit

In e-commerce, appearing on page one is the difference between a thriving brand and a ghost town. Historically, brands focused on price tracking to stay competitive, but price is only half the battle. If your product is priced perfectly but appears at the bottom of the search results, no one will find it.

This is where Share of Search (SoS) comes in. Just as traditional retailers measure "Share of Shelf" in physical stores, digital brands use SoS to measure visibility on the "Digital Shelf." It represents the percentage of search results occupied by a specific brand for a given keyword.
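As a formula, a brand's SoS for a keyword is (result slots held by the brand ÷ total result slots) × 100. Here is a minimal plain-Python sketch of that calculation. The `share_of_search` helper is our own illustrative name, and the brand list is placeholder data.

```python
from collections import Counter


def share_of_search(brands):
    """Return {brand: % of result slots} for one keyword's results.

    Every slot counts, organic or sponsored, so the shares add up to 100
    (up to floating-point rounding).
    """
    counts = Counter(brands)
    total = len(brands)
    return {brand: count / total * 100 for brand, count in counts.items()}


# Placeholder input: the brand occupying each of four result slots
print(share_of_search(["BrandA", "BrandB", "BrandA", "BrandC"]))
# {'BrandA': 50.0, 'BrandB': 25.0, 'BrandC': 25.0}
```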

This guide shows how to build a functional Share of Search dashboard in Python. The tool scrapes Amazon search results, separates organic listings from sponsored ones, and uses Streamlit to visualize which brands are winning each keyword. You can also explore the Amazon Scrapers GitHub repository.

Phase 1: Setup and Strategy

To build this dashboard, we need a stack that is fast and makes visualization easy. We'll use httpx for HTTP requests and selectolax for parsing HTML. selectolax is typically much faster than BeautifulSoup, which helps when processing multiple pages of results.

For the data engine we'll use pandas, and for the frontend we'll use streamlit, with plotly for the charts.

Prerequisites

You should have Python 3.9 or newer installed, because recent pandas and Streamlit releases no longer support 3.8. Install the necessary libraries using pip:

pip install httpx selectolax pandas streamlit plotly

For every search result, we capture four core data points. The parser also records the price, but price isn't used in the SoS calculation.

Rank Position: Where the product sits on the page (1, 2, 3...).
Product Title: To identify the item.
Brand Name: To categorize the owner of that slot.
Is Sponsored: To distinguish between earned organic rank and paid advertising.
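If you prefer an explicit schema to loose dictionaries, the four fields map naturally onto a small dataclass. `SearchResult` is a hypothetical name. The parser in Phase 3 builds plain dicts with the same keys, and `asdict()` converts one form into the other.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SearchResult:
    rank: int            # position on the results page (1, 2, 3...)
    title: str           # product title as shown in the listing
    brand: str           # owner of the slot
    is_sponsored: bool   # True for paid placements
    price: Optional[float] = None  # extra field, not used for SoS


row = SearchResult(rank=1, title="Keychron K2 Wireless Mechanical Keyboard",
                   brand="Keychron", is_sponsored=True)
print(asdict(row))
```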

Phase 2: Building the SERP Scraper

Amazon's Search Engine Results Page (SERP) is dynamic. To scrape it effectively, we must construct a URL that mimics a real user search. Amazon uses the k parameter for the search query and the page parameter for pagination.
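The following sketch shows the URL format. The standard library's `urlencode` escapes the `k` value for you, including characters such as `&` that would otherwise split the query string. `build_search_url` is an illustrative helper, not part of any library.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.amazon.com/s"


def build_search_url(keyword, page=1):
    """Build a search URL using the k (query) and page parameters."""
    # urlencode turns spaces into '+' and percent-encodes reserved characters like '&'
    return f"{BASE_URL}?{urlencode({'k': keyword, 'page': page})}"


print(build_search_url("mechanical keyboard", page=2))
# https://www.amazon.com/s?k=mechanical+keyboard&page=2
print(build_search_url("usb-c & hdmi"))
# https://www.amazon.com/s?k=usb-c+%26+hdmi&page=1
```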

The biggest hurdle is Amazon's anti-bot system. A bare request will likely return a 503 error, so we must include realistic headers.

import httpx

def fetch_amazon_page(keyword, page=1):
    """Fetch one page of Amazon search results; return the HTML or None."""
    url = "https://www.amazon.com/s"
    # Let httpx URL-encode the query (handles spaces, '&', '#', etc.)
    params = {"k": keyword, "page": page}

    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
        "Accept-Language": "en-US,en;q=0.9",
        # No "br": httpx only decodes Brotli when the brotli package is installed
        "Accept-Encoding": "gzip, deflate",
        "Referer": "https://www.google.com/",
    }

    try:
        # httpx does not follow redirects unless asked to
        response = httpx.get(url, params=params, headers=headers,
                             timeout=10, follow_redirects=True)
    except httpx.HTTPError as e:
        print(f"Request failed: {e}")
        return None

    if response.status_code == 200:
        return response.text
    print(f"Failed to fetch: {response.status_code}")
    return None

In a production environment, you should rotate these headers or use a proxy service to avoid IP bans. For this tutorial, we will focus on the dashboard logic.

Phase 3: Parsing the Digital Shelf

Once we have the HTML, we need to extract specific product details. Amazon marks search results with the attribute data-component-type="s-search-result".

One challenge is that Amazon doesn't always expose a dedicated brand field in search results. A simple workaround is a "Brand Guessing" heuristic: take the first word of the product title, since most Amazon listings lead with the brand name. The heuristic is imperfect. A multi-word brand such as "Das Keyboard" would be counted as "Das", so treat brand totals as approximate.
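If you know the main brands for your keyword in advance, a slightly sturdier variant checks the title against that list before falling back to the first word. `guess_brand` and its `known_brands` parameter are illustrative names, and you would supply the brand list yourself.

```python
def guess_brand(title, known_brands=()):
    """Guess the brand behind a listing title.

    Checks an optional list of known (possibly multi-word) brands first,
    then falls back to the first word of the title.
    """
    cleaned = title.strip()
    lowered = cleaned.lower()
    # Longest names first so "Das Keyboard" wins over a shorter overlapping entry
    for brand in sorted(known_brands, key=len, reverse=True):
        name = brand.lower()
        # Require a word boundary so a short brand doesn't match a longer word
        if lowered == name or lowered.startswith(name + " "):
            return brand
    words = cleaned.split()
    if not words:
        return "Unknown"
    # Strip punctuation that often trails the brand, e.g. "Logitech," -> "Logitech"
    return words[0].strip(",:;|-")


print(guess_brand("Das Keyboard 4 Professional", known_brands=["Das Keyboard", "Keychron"]))
print(guess_brand("Logitech, G915 Wireless Keyboard"))
```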

from selectolax.lexbor import LexborHTMLParser

def parse_search_results(html):
    parser = LexborHTMLParser(html)
    products = []

    # Find all search result containers
    results = parser.css('div[data-component-type="s-search-result"]')

    # rank = position on this page (1, 2, 3...); it restarts on every page
    for rank, node in enumerate(results, start=1):
        # Extract title
        title_node = node.css_first('h2 span')
        title = title_node.text().strip() if title_node is not None else "Unknown"

        # Heuristic: sponsored cards display a "Sponsored" label inside the card
        is_sponsored = "Sponsored" in node.text()

        # Brand guess: first word of the title (multi-word brands get truncated)
        words = title.split()
        brand = words[0] if words else "Unknown"

        # Price: e.g. "1,299." -> 1299.0; None when missing so it isn't read as $0
        price = None
        price_node = node.css_first('.a-price-whole')
        if price_node is not None:
            try:
                price = float(price_node.text().strip().replace(',', ''))
            except ValueError:
                price = None

        products.append({
            "rank": rank,
            "title": title,
            "brand": brand,
            "is_sponsored": is_sponsored,
            "price": price,
        })

    return products

Phase 4: Analyzing the Data with Pandas

With our list of dictionaries, we can use pandas to calculate Share of Search metrics: the percentage of result slots each brand occupies and its average organic rank.

import pandas as pd

def analyze_data(product_list):
    df = pd.DataFrame(product_list)

    # Share of Search: % of all result slots (organic + sponsored) held by each brand
    sos = df['brand'].value_counts(normalize=True) * 100

    # Average rank across organic (non-sponsored) listings only
    organic = df[~df['is_sponsored']]
    avg_rank = organic.groupby('brand')['rank'].mean()

    # Combine into a summary table. Brands with only sponsored listings keep a
    # NaN rank: filling with 0 would make them look like the #1 organic result.
    summary = pd.DataFrame({
        'Share of Search (%)': sos,
        'Avg Organic Rank': avg_rank,
    }).sort_values('Share of Search (%)', ascending=False)

    return df, summary

Passing normalize=True to value_counts returns each brand's fraction of all result slots. Multiplying by 100 turns those fractions into percentages.
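That single percentage mixes organic and sponsored slots. To see how much of each brand's visibility is paid, you can split the counts with `pd.crosstab`. Here is a minimal sketch that assumes the `brand` and boolean `is_sponsored` columns produced by the parser. `sponsored_breakdown` and its column names are our own.

```python
import pandas as pd


def sponsored_breakdown(df):
    """Split each brand's result slots into organic vs sponsored."""
    counts = pd.crosstab(df["brand"], df["is_sponsored"])
    # Guarantee both columns exist even if a search had no ads (or no organic results)
    counts = counts.reindex(columns=[False, True], fill_value=0)
    counts.columns = ["Organic slots", "Sponsored slots"]

    total_slots = len(df)
    brand_slots = counts["Organic slots"] + counts["Sponsored slots"]
    counts["Organic share (%)"] = counts["Organic slots"] / total_slots * 100
    counts["Sponsored share (%)"] = counts["Sponsored slots"] / total_slots * 100
    # Portion of the brand's own visibility that is paid
    counts["Ad reliance (%)"] = counts["Sponsored slots"] / brand_slots * 100
    return counts.sort_values("Sponsored share (%)", ascending=False)


sample = pd.DataFrame({
    "brand": ["BrandA", "BrandA", "BrandB", "BrandB"],
    "is_sponsored": [False, True, True, True],
})
print(sponsored_breakdown(sample))
```

A brand with a high "Ad reliance (%)" is a candidate for the Ad-Heavy Challenger pattern.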

Phase 5: Visualizing the Dashboard

We'll use Streamlit to create an interactive UI. This allows users to enter a keyword and see the competitive landscape immediately.

import streamlit as st
import plotly.express as px

# Assumes fetch_amazon_page, parse_search_results and analyze_data
# (Phases 2-4) are defined above in the same app.py file.

st.set_page_config(page_title="Amazon Share of Search", layout="wide")
st.title("🛒 Amazon Share of Search Dashboard")

keyword = st.text_input("Enter Search Keyword", "mechanical keyboard")

if st.button("Analyze Digital Shelf"):
    with st.spinner("Scraping Amazon..."):
        # Scrape the first 2 pages
        all_data = []
        for p in range(1, 3):
            html = fetch_amazon_page(keyword, p)
            if html:
                page_results = parse_search_results(html)
                # Ranks restart at 1 on each page; offset them so page 2 continues from page 1
                for item in page_results:
                    item["rank"] += len(all_data)
                all_data.extend(page_results)

    if all_data:
        df, summary = analyze_data(all_data)

        col1, col2 = st.columns(2)

        with col1:
            st.subheader("Visibility Share")
            fig_pie = px.pie(df, names='brand', title="Total Share of Results")
            st.plotly_chart(fig_pie)

        with col2:
            st.subheader("Top Brands")
            top10 = summary.sort_values('Share of Search (%)', ascending=False).head(10)
            fig_bar = px.bar(top10, y='Share of Search (%)', title="Top 10 Brands by Share of Search")
            st.plotly_chart(fig_bar)

        # Shows Avg Organic Rank alongside share, used in "Interpreting the Results"
        st.subheader("Brand Summary")
        st.dataframe(summary)

        st.subheader("Raw Competitive Data")
        st.dataframe(df)
    else:
        st.error("Could not fetch data. Amazon might be blocking the request.")

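To run it, put the code from Phases 2 through 5 in a single file named app.py. The Streamlit section goes last because it calls the functions defined above it. Then run streamlit run app.py in your terminal. You now have a live dashboard that turns raw HTML into market intelligence.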

Interpreting the Results

When analyzing the dashboard, look for three key patterns:

The Ad-Heavy Challenger: A brand with a high total Share of Search but a poor (high-numbered) Average Organic Rank, or no organic listings at all. This brand is buying its way onto the shelf, and if it stops its ad spend, much of its visibility will likely disappear.
The Organic Authority: A brand with a high Share of Search and a low (strong) Average Organic Rank, e.g., 3.5. This brand dominates organic placement and is likely the market leader.
The Fragmented Market: If no brand holds more than roughly 5–10% share, the keyword is highly competitive and may be open to disruption.
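These rules of thumb can be applied automatically to the Phase 4 summary table. The sketch below is illustrative only. The function name and all three thresholds (`high_share`, `strong_rank`, `fragmented_max`) are our own assumptions, not established benchmarks, so tune them per category.

```python
import pandas as pd


def label_patterns(summary, high_share=15.0, strong_rank=10.0, fragmented_max=10.0):
    """Tag brands with the patterns above; thresholds are illustrative defaults."""
    labels = {}
    for brand, row in summary.iterrows():
        share = row["Share of Search (%)"]
        rank = row["Avg Organic Rank"]
        # Real ranks start at 1, so NaN (or a filled-in 0) means no organic listings
        no_organic = pd.isna(rank) or rank <= 0
        if share >= high_share and (no_organic or rank > strong_rank):
            labels[brand] = "Ad-Heavy Challenger"
        elif share >= high_share:
            labels[brand] = "Organic Authority"
        else:
            labels[brand] = "Other"
    is_fragmented = bool(summary["Share of Search (%)"].max() <= fragmented_max)
    return labels, is_fragmented


example = pd.DataFrame(
    {"Share of Search (%)": [30.0, 20.0, 5.0], "Avg Organic Rank": [3.5, float("nan"), 12.0]},
    index=["BrandA", "BrandB", "BrandC"],
)
print(label_patterns(example))
```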

Next Steps

This dashboard serves as a foundation. To make it production-ready, consider these improvements:

Historical Tracking: Save results to a SQLite database daily to track how Share of Search trends over time.
Ratings and Reviews: Scrape review counts and star ratings to see whether high visibility correlates with customer satisfaction.
Proxy Integration: Use a proxy provider to reduce the risk of being blocked during high-volume searches.
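For the historical-tracking idea, here is a minimal sketch using the standard library's `sqlite3` with pandas' `to_sql` and `read_sql_query`. It assumes the two-column summary table from Phase 4. The table name `sos_history`, its column names, and both helper functions are hypothetical choices.

```python
import sqlite3
from datetime import date

import pandas as pd


def save_snapshot(summary, keyword, con, snapshot_date=None):
    """Append one run of the Phase 4 summary table to a SQLite history table."""
    snapshot = summary.rename_axis("brand").reset_index()
    snapshot = snapshot.rename(columns={
        "Share of Search (%)": "sos_pct",
        "Avg Organic Rank": "avg_organic_rank",
    })
    snapshot["keyword"] = keyword
    snapshot["snapshot_date"] = (snapshot_date or date.today()).isoformat()
    # Creates the table on the first call, appends afterwards
    snapshot.to_sql("sos_history", con, if_exists="append", index=False)


def load_trend(keyword, brand, con):
    """Return one brand's Share of Search over time for a keyword."""
    return pd.read_sql_query(
        "SELECT snapshot_date, sos_pct FROM sos_history "
        "WHERE keyword = ? AND brand = ? ORDER BY snapshot_date",
        con,
        params=(keyword, brand),
    )


# In-memory demo; pass a file path such as "sos.db" to persist between runs
con = sqlite3.connect(":memory:")
day = pd.DataFrame({"Share of Search (%)": [40.0, 60.0], "Avg Organic Rank": [2.0, 5.0]},
                   index=["BrandA", "BrandB"])
save_snapshot(day, "mechanical keyboard", con, date(2024, 1, 1))
print(load_trend("mechanical keyboard", "BrandA", con))
```

Scheduling the scrape and `save_snapshot` once a day (e.g., via cron) builds up the time series. `load_trend` returns a frame you can pass straight to a Plotly line chart.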

Monitoring Share of Search removes the guesswork from market positioning, allowing for data-driven decisions regarding advertising and SEO strategy.
