How to Build an Amazon "Share of Search" Dashboard with Python and Streamlit

In e-commerce, appearing on page one is the difference between a thriving brand and a ghost town. Historically, brands focused on price tracking to stay competitive, but price is only half the battle. If your product is priced perfectly but appears at the bottom of the search results, no one will find it.
This is where Share of Search (SoS) comes in. Just as traditional retailers measure "Share of Shelf" in physical stores, digital brands use SoS to measure visibility on the "Digital Shelf." It represents the percentage of search results occupied by a specific brand for a given keyword.
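To make the definition concrete, here is a minimal sketch of the calculation in plain Python. The brand names and the `share_of_search` helper are invented for illustration; the dashboard built later does the same computation with pandas.

```python
from collections import Counter

def share_of_search(brands):
    """Return {brand: percentage of result slots} for one keyword."""
    if not brands:
        return {}
    counts = Counter(brands)
    total = len(brands)
    return {brand: 100 * n / total for brand, n in counts.items()}

# Hypothetical page of 8 result slots for a single keyword
slots = ["Acme", "Acme", "Zenith", "Acme", "Orbit", "Zenith", "Acme", "Orbit"]
shares = share_of_search(slots)
print(shares)
```

Here "Acme" holds 4 of the 8 slots, so its Share of Search for this keyword is 50%.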
This guide shows how to build a working Share of Search dashboard in Python. The tool scrapes Amazon search results, distinguishes organic from sponsored listings, and uses Streamlit to visualize which brands hold the most result slots. You can also explore the Amazon Scrapers GitHub repository.
Phase 1: Setup and Strategy
To build this dashboard, we need a stack that is fast and easy to visualize. We'll use httpx for making HTTP requests and selectolax for parsing HTML. selectolax is significantly faster than BeautifulSoup, which helps when processing multiple pages of results.
For the data engine, we'll use pandas. For the frontend, we'll use streamlit, with plotly drawing the charts.
Prerequisites
You should have Python 3.8+ installed. Install the necessary libraries using pip:
pip install httpx selectolax pandas streamlit plotly
Our strategy involves capturing four data points for every search result:
Rank Position: Where the product sits on the page (1, 2, 3...).
Product Title: To identify the item.
Brand Name: To categorize the owner of that slot.
Is Sponsored: To distinguish between earned organic rank and paid advertising.
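One way to picture a single captured record is as a small data class. This is only a sketch: the `SearchResult` name and the sample values are assumptions for illustration, and the scraper in this guide stores plain dictionaries with the same keys.

```python
from dataclasses import dataclass, asdict

@dataclass
class SearchResult:
    rank: int           # position on the page, starting at 1
    title: str          # product title as shown in the listing
    brand: str          # brand that owns this slot
    is_sponsored: bool  # True for paid placements, False for organic

# Hypothetical first result for some keyword
example = SearchResult(
    rank=1,
    title="Acme K2 Wireless Mechanical Keyboard",
    brand="Acme",
    is_sponsored=True,
)
record = asdict(example)
print(record)
```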
Phase 2: Building the SERP Scraper
Amazon's Search Engine Results Page (SERP) is dynamic. To scrape it effectively, we must construct a URL that mimics a real user search. Amazon uses the k parameter for the search query and the page parameter for pagination.
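As a small sketch of the URL construction, the standard library's `urlencode` can build the query string and escape characters such as spaces and ampersands. The `build_search_url` helper is a hypothetical name used only for this example.

```python
from urllib.parse import urlencode

def build_search_url(keyword, page=1):
    """Build an Amazon search URL from the k (query) and page parameters."""
    query = urlencode({"k": keyword, "page": page})
    return f"https://www.amazon.com/s?{query}"

url = build_search_url("mechanical keyboard & mouse", page=2)
print(url)
# https://www.amazon.com/s?k=mechanical+keyboard+%26+mouse&page=2
```

Encoding the keyword this way matters for queries containing `&` or `#`, which would otherwise be read as URL syntax rather than as part of the search term.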
The biggest hurdle is Amazon's anti-bot system. A bare request will likely return a 503 error, so we must include realistic headers.
import httpx

def fetch_amazon_page(keyword, page=1):
    """Return the HTML of one Amazon search results page, or None on failure."""
    url = "https://www.amazon.com/s"
    # Let httpx URL-encode the query so spaces and symbols are handled safely
    params = {"k": keyword, "page": page}
    # Accept-Encoding is left to httpx, which only advertises encodings it can decode
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
        "Accept-Language": "en-US,en;q=0.9",
        "Referer": "https://www.google.com/",
    }
    try:
        response = httpx.get(url, params=params, headers=headers, timeout=10)
    except httpx.HTTPError as e:
        print(f"Error occurred: {e}")
        return None
    if response.status_code == 200:
        return response.text
    print(f"Failed to fetch: {response.status_code}")
    return None
In a production environment, you should rotate these headers or use a proxy service to avoid IP bans. For this tutorial, we will focus on the dashboard logic.
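A minimal sketch of what header rotation could look like, using only the standard library. The User-Agent pool, the `next_headers` and `backoff_delay` helpers, and the delay values are all assumptions for illustration; proxy configuration depends on your provider and HTTP client and is not shown.

```python
import itertools
import random

# Illustrative pool of browser User-Agent strings (assumed values)
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.1 Safari/605.1.15",
]
_ua_cycle = itertools.cycle(USER_AGENTS)

def next_headers():
    """Return request headers using the next User-Agent in the pool."""
    return {
        "User-Agent": next(_ua_cycle),
        "Accept-Language": "en-US,en;q=0.9",
    }

def backoff_delay(attempt, base=2.0, jitter=1.0):
    """Seconds to wait before retry number `attempt` (0-based).

    Exponential growth plus a random jitter so retries are not evenly spaced.
    """
    return base * (2 ** attempt) + random.uniform(0, jitter)
```

Each call to `next_headers()` could supply the `headers` argument of a request, and `time.sleep(backoff_delay(attempt))` could space out retries after a 503 response.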
Phase 3: Parsing the Digital Shelf
Once we have the HTML, we need to extract specific product details. Amazon marks search results with the attribute data-component-type="s-search-result".
One challenge is that Amazon doesn't always provide a dedicated brand tag in search results. A simple workaround is a "brand guessing" heuristic: we take the first word of the product title, since most Amazon listings lead with the brand name. The heuristic misfires on multi-word brands and on titles that open with a generic word, so treat the brand column as an approximation.
from selectolax.lexbor import LexborHTMLParser

def parse_search_results(html):
    """Extract rank, title, brand, sponsored flag, and price from one results page."""
    parser = LexborHTMLParser(html)
    products = []
    # Each search result sits in a container with this attribute
    results = parser.css('div[data-component-type="s-search-result"]')
    for rank, node in enumerate(results, start=1):
        # Title
        title_node = node.css_first('h2 span')
        title = title_node.text().strip() if title_node else "Unknown"
        # Sponsored listings carry a visible "Sponsored" label
        is_sponsored = "Sponsored" in node.text()
        # Brand heuristic: first word of the title
        words = title.split()
        brand = words[0] if words else "Unknown"
        # Price (whole-number part only); 0.0 means no price was found
        price_node = node.css_first('.a-price-whole')
        price_text = price_node.text().strip().replace(',', '').rstrip('.') if price_node else ""
        products.append({
            "rank": rank,  # position within this page, starting at 1
            "title": title,
            "brand": brand,
            "is_sponsored": is_sponsored,
            "price": float(price_text) if price_text.isdigit() else 0.0,
        })
    return products
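The first-word heuristic splits multi-word brands in two. One possible refinement, sketched below, checks the title against a short list of known brands before falling back to the first word. The `guess_brand` helper and the contents of `KNOWN_BRANDS` are assumptions for illustration; you would maintain your own list for the category you track.

```python
# Illustrative list of multi-word brands to recognize (assumed, not exhaustive)
KNOWN_BRANDS = ["Royal Kludge", "Logitech G", "Amazon Basics"]

def guess_brand(title, known_brands=KNOWN_BRANDS):
    """Guess the brand from a product title."""
    cleaned = title.strip()
    if not cleaned:
        return "Unknown"
    lowered = cleaned.lower()
    # Prefer the longest known brand that the title starts with,
    # matching on a word boundary to avoid partial-word hits
    for brand in sorted(known_brands, key=len, reverse=True):
        candidate = brand.lower()
        if lowered == candidate or lowered.startswith(candidate + " "):
            return brand
    # Fallback: first word of the title
    return cleaned.split()[0]

print(guess_brand("Royal Kludge RK61 Wireless Keyboard"))
print(guess_brand("Keychron K2 Mechanical Keyboard"))
```

The first call returns "Royal Kludge" from the list, while the second falls back to the first word, "Keychron".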
Phase 4: Analyzing the Data with Pandas
With our list of dictionaries, we can use pandas to calculate Share of Search metrics. We want to determine the percentage of the page each brand owns and their average organic rank.
import pandas as pd

def analyze_data(product_list):
    df = pd.DataFrame(product_list)
    # Share of Search: percentage of all result slots held by each brand
    sos = df['brand'].value_counts(normalize=True) * 100
    # Average rank across organic listings only
    avg_rank = df[~df['is_sponsored']].groupby('brand')['rank'].mean()
    # Combine into a summary table aligned on brand.
    # Brands with only sponsored listings keep NaN as their organic rank,
    # because filling with 0 would make them look like the top-ranked brand.
    summary = pd.DataFrame({
        'Share of Search (%)': sos,
        'Avg Organic Rank': avg_rank
    }).sort_values('Share of Search (%)', ascending=False)
    return df, summary
Passing normalize=True to value_counts returns each brand's proportion of the results rather than a raw count; multiplying by 100 turns that proportion into a percentage.
Phase 5: Visualizing the Dashboard
We'll use Streamlit to create an interactive UI. This allows users to enter a keyword and see the competitive landscape immediately.
import streamlit as st
import plotly.express as px

st.set_page_config(page_title="Amazon Share of Search", layout="wide")
st.title("🛒 Amazon Share of Search Dashboard")
keyword = st.text_input("Enter Search Keyword", "mechanical keyboard")

if st.button("Analyze Digital Shelf"):
    with st.spinner("Scraping Amazon..."):
        # Scrape the first 2 pages
        all_data = []
        for p in range(1, 3):
            html = fetch_amazon_page(keyword, p)
            if html:
                page_results = parse_search_results(html)
                # Ranks restart at 1 on each page, so offset them by the
                # number of results already collected
                for item in page_results:
                    item["rank"] += len(all_data)
                all_data.extend(page_results)
    if all_data:
        df, summary = analyze_data(all_data)
        col1, col2 = st.columns(2)
        with col1:
            st.subheader("Visibility Share")
            fig_pie = px.pie(df, names='brand', title="Total Share of Results")
            st.plotly_chart(fig_pie)
        with col2:
            st.subheader("Top Brands by Share")
            fig_bar = px.bar(summary.sort_values('Share of Search (%)', ascending=False).head(10),
                             y='Share of Search (%)', title="Top 10 Brands by Share")
            st.plotly_chart(fig_bar)
        st.subheader("Raw Competitive Data")
        st.dataframe(df)
    else:
        st.error("Could not fetch data. Amazon might be blocking the request.")
To run this, save all of the code from Phases 2 through 5 in a single file named app.py, then run streamlit run app.py in your terminal. You now have a live dashboard that converts raw HTML into market intelligence.
Interpreting the Results
When analyzing the dashboard, look for three key patterns:
The Ad-Heavy Challenger: A brand with a high total Share of Search but a poor Average Organic Rank, or no organic listings at all. This brand is buying its way onto the shelf. If they stop their ad spend, they disappear.
The Organic Authority: A brand with a high Share of Search and a low Average Organic Rank number (e.g., 3.5, meaning it typically sits near the top of the page). This brand dominates the SEO landscape and is likely the market leader.
The Fragmented Market: If no brand has more than 5–10% share, the keyword is highly competitive and open for disruption.
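These patterns can be turned into a rough rule-based labeler. The sketch below is one possible reading: the `classify_brand` and `is_fragmented` helpers, the "Unclassified" label, and every threshold are assumptions you would tune per keyword, not values taken from the dashboard.

```python
def classify_brand(share_pct, avg_organic_rank, high_share=15.0, good_rank=10.0):
    """Label one brand from its share (%) and average organic rank.

    avg_organic_rank is None when the brand has no organic listings.
    Thresholds are illustrative assumptions.
    """
    if share_pct < high_share:
        return "Unclassified"
    if avg_organic_rank is not None and avg_organic_rank <= good_rank:
        return "Organic Authority"
    return "Ad-Heavy Challenger"

def is_fragmented(shares, threshold=10.0):
    """True when no brand holds more than `threshold` percent of the results."""
    return all(share <= threshold for share in shares)

print(classify_brand(30.0, 3.5))    # high share, ranks near the top organically
print(classify_brand(25.0, None))   # high share, sponsored listings only
print(is_fragmented([8.0, 6.5, 5.0, 4.0]))
```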
Next Steps
This dashboard serves as a foundation. To make it production-ready, consider these improvements:
Historical Tracking: Save results to a SQLite database daily to track how Share of Search trends over time.
Sentiment Analysis: Scrape review counts and ratings to see if high visibility correlates with customer satisfaction.
Proxy Integration: Use a proxy provider to ensure the scraper remains undetected during high-volume searches.
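For the historical tracking idea, a minimal sketch with the standard library's sqlite3 module might look like the following. The `sos_history` table, its columns, the `save_snapshot` helper, and the sample numbers are all assumptions for illustration; the example uses an in-memory database, where a real deployment would pass a file path.

```python
import sqlite3
from datetime import date

def save_snapshot(conn, keyword, shares, snapshot_date=None):
    """Store one day's {brand: share_pct} mapping for a keyword."""
    snapshot_date = snapshot_date or date.today().isoformat()
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sos_history (
               snapshot_date TEXT NOT NULL,
               keyword TEXT NOT NULL,
               brand TEXT NOT NULL,
               share_pct REAL NOT NULL,
               PRIMARY KEY (snapshot_date, keyword, brand)
           )"""
    )
    # Re-running on the same day overwrites that day's rows
    conn.executemany(
        "INSERT OR REPLACE INTO sos_history VALUES (?, ?, ?, ?)",
        [(snapshot_date, keyword, brand, pct) for brand, pct in shares.items()],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
save_snapshot(conn, "mechanical keyboard", {"Acme": 50.0, "Zenith": 25.0}, "2024-01-01")
save_snapshot(conn, "mechanical keyboard", {"Acme": 40.0, "Zenith": 35.0}, "2024-01-02")

# Trend for one brand over time
trend = conn.execute(
    "SELECT snapshot_date, share_pct FROM sos_history "
    "WHERE brand = ? ORDER BY snapshot_date",
    ("Acme",),
).fetchall()
print(trend)
```

Keying rows on date, keyword, and brand keeps one row per brand per day, so a daily job can be re-run safely without duplicating data.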
Monitoring Share of Search removes the guesswork from market positioning, allowing for data-driven decisions regarding advertising and SEO strategy.




