Building a 'SaaS Competitor Radar': From Scraper to Streamlit Dashboard

Technical Writing for Modern Development

In the fast-moving SaaS world, missing a competitor’s launch is a lost opportunity to adapt. While platforms like Product Hunt provide a firehose of new tools daily, manually checking them is a recipe for burnout. You need a way to filter the noise and track "Launch Velocity" automatically.

This guide covers how to build a SaaS Competitor Radar. This system scrapes Product Hunt, stores the results in a local database, and visualizes the findings in a Streamlit dashboard. By the end, you’ll have an automated pipeline that monitors market trends while you sleep.

Prerequisites

To follow along, you’ll need a basic grasp of Python and a terminal. We will use the following stack:

  • Python 3.8+

  • Playwright: For data extraction from dynamic pages.

  • Streamlit: To build the data visualization UI.

  • SQLite: For lightweight, zero-config data storage.

Set up your environment:

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

# Install dependencies
pip install playwright streamlit pandas
playwright install chromium

Phase 1: Building the Scraper

Product Hunt is a dynamic web application. Unlike static sites, it loads content using JavaScript, so a simple requests call won't work. We'll use Playwright because it controls a real browser instance, allowing us to wait for elements to render and handle complex interactions.

We want to capture the product name, tagline, and current upvote count. Here is the scraper script; save it as main.py, since that is the file the automation workflow in Phase 4 runs:

from playwright.sync_api import sync_playwright
from datetime import datetime

def scrape_product_hunt():
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()

        print("Navigating to Product Hunt...")
        page.goto("https://www.producthunt.com/topics/developer-tools")

        # Wait for the product cards to load
        page.wait_for_selector('[data-test="post-item"]')

        products = []
        items = page.locator('[data-test="post-item"]').all()

        for item in items[:15]:  # Grab the top 15
            try:
                name = item.locator('[data-test="post-name"]').inner_text()
                tagline = item.locator('[data-test="post-tagline"]').inner_text()

                # Upvotes are often formatted as '1.2k'; we need to clean this
                upvotes_raw = item.locator('[data-test="vote-button"]').inner_text()
                upvotes = parse_upvotes(upvotes_raw)

                products.append({
                    "name": name,
                    "tagline": tagline,
                    "upvotes": upvotes,
                    "scraped_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
                })
            except Exception as e:
                print(f"Skipping an item due to error: {e}")

        browser.close()
        return products

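def parse_upvotes(vote_str):
    # Strip the arrow glyph and thousands separators before converting
    vote_str = vote_str.lower().replace('▲', '').replace(',', '').strip()
    if vote_str.endswith('k'):
        # round() avoids off-by-one results from float multiplication
        return int(round(float(vote_str[:-1]) * 1000))
    return int(vote_str)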

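if __name__ == "__main__":
    # database.py is the module created in Phase 2
    from database import init_db, save_to_db

    data = scrape_product_hunt()
    init_db()
    save_to_db(data)
    print(f"Successfully scraped and saved {len(data)} products.")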

How the Scraper Works

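We use page.locator() to find specific elements based on their data-test attributes. Attributes like these are usually added for a site's own automated tests, so they tend to be more stable than auto-generated CSS class names. They can still change without notice, so if the scraper starts returning nothing, inspect the page and update the selectors. The parse_upvotes function converts strings like "▲ 1.1k" into the integer 1100 so we can perform calculations later.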
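To make the cleanup step concrete, here is a minimal standalone sketch of the same idea. The helper name normalize_votes and the regex approach are illustrative assumptions rather than part of the scraper above, and the sample strings are guesses at what a vote button might render.

```python
import re

# Hypothetical helper: extracts a vote count from button text.
# Matches a number, an optional "k" or "m" suffix, then a word boundary.
_VOTE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*([km]?)\b")
_MULTIPLIER = {"": 1, "k": 1_000, "m": 1_000_000}


def normalize_votes(text):
    """Return the vote count found in `text`, or None if there is no number."""
    cleaned = text.lower().replace(",", "")  # drop thousands separators
    match = _VOTE_RE.search(cleaned)
    if match is None:
        return None
    number, suffix = match.groups()
    # round() avoids off-by-one results from float multiplication
    return int(round(float(number) * _MULTIPLIER[suffix]))


if __name__ == "__main__":
    for sample in ["▲ 1.1k", "842", "1,234", "Upvote"]:
        print(repr(sample), "->", normalize_votes(sample))
```

Returning None for text without a number lets the caller decide whether to skip the item or log it, instead of raising in the middle of a scrape.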

Phase 2: The Data Layer

Scraping data is only half the battle. To track trends, we need to see how upvotes grow over time. We'll use SQLite because it’s a single file that lives in your project folder, requiring no server setup.

Create a file named database.py:

import sqlite3

def init_db():
    conn = sqlite3.connect('radar.db')
    c = conn.cursor()
    c.execute('''CREATE TABLE IF NOT EXISTS products
                 (id INTEGER PRIMARY KEY AUTOINCREMENT,
                  name TEXT,
                  tagline TEXT,
                  upvotes INTEGER,
                  scraped_at TIMESTAMP)''')
    conn.commit()
    conn.close()

def save_to_db(products):
    conn = sqlite3.connect('radar.db')
    c = conn.cursor()
    for p in products:
        c.execute('''INSERT INTO products (name, tagline, upvotes, scraped_at)
                     VALUES (?, ?, ?, ?)''', 
                  (p['name'], p['tagline'], p['upvotes'], p['scraped_at']))
    conn.commit()
    conn.close()

By saving a timestamp (scraped_at) with every entry, we can query the database to see which products are gaining traction the fastest. This approach maintains a historical record rather than just overwriting the current state.
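As a rough sketch of what such a query could look like, the function below ranks products by how many upvotes they gained between their first and last scrape. The function name upvote_velocity and the demo rows are invented for illustration, and the MAX minus MIN shortcut assumes upvote counts never go down.

```python
import sqlite3


def upvote_velocity(conn):
    """Return (name, gained, first_seen, last_seen) rows, fastest risers first.

    Products scraped only once are skipped because they have no trend yet.
    """
    query = """
        SELECT name,
               MAX(upvotes) - MIN(upvotes) AS gained,
               MIN(scraped_at) AS first_seen,
               MAX(scraped_at) AS last_seen
        FROM products
        GROUP BY name
        HAVING COUNT(*) > 1
        ORDER BY gained DESC
    """
    return conn.execute(query).fetchall()


def demo_connection():
    """Build an in-memory database with made-up rows (not real scrape data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE products
           (id INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT, tagline TEXT, upvotes INTEGER, scraped_at TIMESTAMP)"""
    )
    rows = [
        ("ToolA", "demo", 100, "2024-01-01 00:00:00"),
        ("ToolA", "demo", 150, "2024-01-02 00:00:00"),
        ("ToolB", "demo", 40, "2024-01-01 00:00:00"),
        ("ToolB", "demo", 400, "2024-01-02 00:00:00"),
        ("ToolC", "demo", 10, "2024-01-01 00:00:00"),
    ]
    conn.executemany(
        "INSERT INTO products (name, tagline, upvotes, scraped_at) VALUES (?, ?, ?, ?)",
        rows,
    )
    return conn


if __name__ == "__main__":
    for row in upvote_velocity(demo_connection()):
        print(row)
```

Against the real file, you would pass sqlite3.connect('radar.db') instead of the demo connection.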

Phase 3: The Streamlit Dashboard

Next, we'll use Streamlit to turn our SQLite data into an interactive dashboard. Streamlit is a Python framework that lets you build web apps without writing HTML or CSS.

Create app.py:

import streamlit as st
import pandas as pd
import sqlite3

st.set_page_config(page_title="SaaS Competitor Radar", layout="wide")

st.title("🚀 SaaS Competitor Radar")
st.write("Tracking the latest developer tools and their growth velocity.")

# Connect to DB and load data
conn = sqlite3.connect('radar.db')
df = pd.read_sql_query("SELECT * FROM products ORDER BY scraped_at DESC", conn)
conn.close()

if not df.empty:
    # Sidebar Filters
    selected_product = st.sidebar.selectbox("Select a Product to Track", df['name'].unique())

    # Main Metrics
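    col1, col2 = st.columns(2)
    # df is sorted newest-first, so first() keeps each product's latest row;
    # sort by upvotes so row 0 really is the top product
    latest_data = (df.groupby('name').first().reset_index()
                     .sort_values('upvotes', ascending=False))

    with col1:
        top_product = latest_data.iloc[0]
        st.metric(label="Current #1 Product", value=top_product['name'], delta=f"{top_product['upvotes']} upvotes")
    with col2:
        st.metric(label="Products Tracked", value=len(latest_data))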

    # Growth Chart
    st.subheader(f"Upvote Trend: {selected_product}")
    product_history = df[df['name'] == selected_product].sort_values('scraped_at')
    st.line_chart(data=product_history, x='scraped_at', y='upvotes')

    # Raw Data Table
    st.subheader("Latest Scraped Data")
    st.dataframe(latest_data[['name', 'tagline', 'upvotes', 'scraped_at']], use_container_width=True)
else:
    st.warning("No data found. Run the scraper first!")

To run the dashboard, use the command: streamlit run app.py. You’ll see a clean interface showing the top products and a line chart of their growth.
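The "top products" view depends on two steps: picking each product's most recent row, then ranking those rows by upvotes. One way to do both in SQL before the data reaches pandas is sketched below. The function name latest_leaderboard is hypothetical, the demo rows are invented, and the query assumes scraped_at strings sort chronologically, which holds for the "%Y-%m-%d %H:%M:%S" format the scraper writes.

```python
import sqlite3


def latest_leaderboard(conn):
    """Return each product's most recent row, highest upvotes first."""
    query = """
        SELECT p.name, p.tagline, p.upvotes, p.scraped_at
        FROM products AS p
        JOIN (SELECT name, MAX(scraped_at) AS latest
              FROM products
              GROUP BY name) AS m
          ON p.name = m.name AND p.scraped_at = m.latest
        ORDER BY p.upvotes DESC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    # Made-up rows for illustration only
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (name TEXT, tagline TEXT, upvotes INTEGER, scraped_at TIMESTAMP)"
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?, ?)",
        [
            ("ToolA", "demo", 100, "2024-01-01 00:00:00"),
            ("ToolA", "demo", 150, "2024-01-02 00:00:00"),
            ("ToolB", "demo", 400, "2024-01-02 00:00:00"),
        ],
    )
    for row in latest_leaderboard(conn):
        print(row)
```

In app.py, the same query string could be handed to pd.read_sql_query in place of the SELECT * statement if you only need the leaderboard.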

Phase 4: Automation

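A radar isn't useful if you have to trigger it manually. You can use GitHub Actions to run the scraper once a day. Scheduled workflows are free for public repositories and count against your included minutes for private ones, and GitHub may start them a little late during busy periods.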

Create a folder .github/workflows/ and add a file named daily_scrape.yml:

name: Daily Product Hunt Scrape

on:
  schedule:
    - cron: '0 0 * * *' # Runs at 00:00 UTC every day
  workflow_dispatch: # Allows manual trigger

jobs:
  scrape:
    runs-on: ubuntu-latest
    permissions:
      contents: write # Lets the job push radar.db back to the repo
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.9'
      - name: Install dependencies
        run: |
          pip install playwright pandas
          playwright install --with-deps chromium
      - name: Run Scraper
        run: python main.py
      - name: Commit and Push changes
        run: |
          git config --global user.name 'RadarBot'
          git config --global user.email 'bot@radar.com'
          git add radar.db
          # Commit only when the database actually changed
          git diff --cached --quiet || git commit -m "Update daily scrape data"
          git push

This workflow automates the entire lifecycle: it wakes up, installs the environment, scrapes the data, saves it to the SQLite file, and commits that file back to your repository.

To Wrap Up

You’ve just built a working end-to-end data pipeline. We moved from raw web data to a persistent database and finally to a visual dashboard, all while automating the repetitive parts.

Key Takeaways:

  • Playwright handles modern, JavaScript-heavy sites where plain HTTP libraries such as requests fall short.

  • SQLite is the simplest way to add "memory" to your scripts without managing database infrastructure.

  • Streamlit turns data scripts into shareable tools instantly.

  • GitHub Actions transforms a local script into a reliable, automated service.

To take this further, try adding Slack notifications for when a product hits a certain upvote threshold or use a sentiment analysis library to monitor the comments section. The foundation is now in place; what you track next is up to you.
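As a starting point for the Slack idea, here is a hedged sketch using only the standard library. The function names, the threshold value, and the snapshot dictionaries are all hypothetical. The one external fact it relies on is that Slack incoming webhooks accept a JSON body with a "text" field; the webhook URL itself is something you would create in your own workspace.

```python
import json
import urllib.request


def newly_over_threshold(previous, current, threshold):
    """Names whose upvotes crossed `threshold` between two snapshots.

    `previous` and `current` map product name -> upvotes. Products missing
    from `previous` are treated as having started at zero.
    """
    return sorted(
        name
        for name, votes in current.items()
        if votes >= threshold and previous.get(name, 0) < threshold
    )


def build_payload(names, threshold):
    """Build the JSON-serializable message body for the webhook."""
    text = f"{len(names)} product(s) passed {threshold} upvotes: " + ", ".join(names)
    return {"text": text}


def send_to_slack(webhook_url, payload):
    """POST the payload to an incoming-webhook URL and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


if __name__ == "__main__":
    # Invented snapshots; in practice these would come from radar.db
    before = {"ToolA": 90, "ToolB": 200}
    after = {"ToolA": 120, "ToolB": 250, "ToolC": 30}
    crossed = newly_over_threshold(before, after, threshold=100)
    if crossed:
        print(build_payload(crossed, 100))  # pass this to send_to_slack(...)
```

Comparing two snapshots, rather than checking the latest one alone, keeps the bot from re-announcing the same product on every run.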
