
Web Scraper Platform


Production-ready web scraping platform with PostgreSQL, WebUI, REST API, and price monitoring.


Features

| Feature | Description |
| --- | --- |
| Multi-site scraping | Scrape any e-commerce site with CSS selectors (example below) |
| Auto-detect | Detect selectors automatically from a URL |
| Stealth mode | Bypass Akamai, Cloudflare, and PerimeterX bot protection |
| WebUI | Configure and monitor via web interface |
| REST API | Programmatic data access with API key authentication |
| Price alerts | Discord notifications for price drops |
| PostgreSQL | Production-grade database with connection pooling |
| Docker | Run everything with a single docker compose up |
| Proxy support | Optional SOCKS5/HTTP proxy |
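
A per-site configuration pairs a product-container selector with field selectors (stored in the scraper_config table, see Database Schema below). The values here, and the container/field split, are illustrative assumptions for a shop that renders products as cards:

product_selector: div.product-card     # hypothetical: one match per product listing
title_selector:   h2.product-title     # hypothetical: product name inside the container
price_selector:   span.price           # hypothetical: price text inside the container
link_selector:    a.product-link       # hypothetical: link to the product detail page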

Quick Start

# 1. Create a minimal .env file
cat > .env <<'EOF'
DOCKER=/path/to/docker/data
DOMAIN=example.com
TZ=Europe/Stockholm
EOF

# 2. Create required directories
mkdir -p /path/to/docker/data/scraper/{postgres,logs,playwright-cache,credentials}

# 3. (Optional) Add Discord webhook for price alerts
echo "https://discord.com/api/webhooks/..." \
  > /path/to/docker/data/scraper/credentials/discord_webhook

# 4. Start
docker compose up -d

# 5. Check logs for generated credentials (first start only)
docker compose logs postgres   # → database password
docker compose logs scraper    # → API key

# 6. Open WebUI
# http://localhost:3000
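
To verify the webhook before the first alert fires, you can post a test message to it yourself; Discord webhooks accept a simple JSON payload with a content field:

# Optional: send a test message to the Discord webhook
curl -H "Content-Type: application/json" \
  -d '{"content": "scraper: webhook test"}' \
  "$(cat /path/to/docker/data/scraper/credentials/discord_webhook)"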

Services (Docker)

| Container | Port | Description |
| --- | --- | --- |
| postgres | 5432 (internal) | PostgreSQL database |
| scraper | 3000 (WebUI), 8000 (API) | Web UI, API, scraper engine, alerts |

Credentials

The database password and API key are auto-generated on first startup; all credential files are stored in ${DOCKER}/scraper/credentials/:

| File | Generated by | Description |
| --- | --- | --- |
| db_password | postgres container | Database password (logged once on first start) |
| api_key | scraper container | API key for REST access (logged once on first start) |
| discord_webhook | you | Webhook URL from Discord; create manually if you want alerts |

Credentials can be changed at any time under Configuration → Advanced settings → Database credentials in the WebUI.

# Retrieve the API key after first start
cat /path/to/docker/data/scraper/credentials/api_key
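
The API examples below assume this key is exported into the shell:

# Export the key for the curl examples below
export API_KEY=$(cat /path/to/docker/data/scraper/credentials/api_key)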

API Examples

All endpoints except /health and /docs require an X-API-Key header.

# Get all products
curl -H "X-API-Key: ${API_KEY}" http://localhost:8000/products

# Search products
curl -H "X-API-Key: ${API_KEY}" "http://localhost:8000/products?search=RTX"

# Get price drops
curl -H "X-API-Key: ${API_KEY}" "http://localhost:8000/deals?min_drop_percent=10"

# Export to CSV
curl -H "X-API-Key: ${API_KEY}" http://localhost:8000/export/csv > products.csv

API Documentation: http://localhost:8000/docs
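
For quick inspection on the command line, responses can be piped through jq. The field names below are assumptions based on the products table (see Database Schema); the actual response shape may differ:

# List titles and prices, cheapest first (field names assumed from the schema)
curl -s -H "X-API-Key: ${API_KEY}" http://localhost:8000/products \
  | jq -r '.[] | "\(.current_price)\t\(.title)"' | sort -n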


Configuration (.env)

Only three variables are required:

DOCKER=/path/to/docker/data   # where volumes are stored
DOMAIN=example.com             # used for reverse proxy labels
TZ=Europe/Stockholm            # timezone

All other settings (scrape interval, alert thresholds, proxy, stealth, etc.) are configured in the WebUI under Advanced settings and stored in the database.


Optional: Scheduled Database Backups

Add this service to your docker-compose.yml for automatic daily pg_dump backups (kept 7 days):

  pgdump:
    image: postgres:16-alpine
    container_name: scraper_pgdump
    restart: unless-stopped
    entrypoint: ["/bin/sh", "-c"]
    command:
      - |
        while true; do
          # PGPASSWORD must be exported, or pg_dump will not see it
          export PGPASSWORD=$(cat /run/secrets/scraper_password)
          if pg_dump -h postgres -U scraper scraper -Fc \
            -f "/backup/scraper_$(date +%Y%m%d_%H%M).dump"; then
            find /backup -name '*.dump' -mtime +7 -delete
            echo "[$(date '+%T')] pg_dump ok"
            sleep 86400
          else
            echo "[$(date '+%T')] pg_dump failed"
            sleep 3600   # retry sooner after a failure
          fi
        done
    secrets:
      - scraper_password
    volumes:
      - ${DOCKER}/scraper/backup:/backup
    depends_on:
      postgres:
        condition: service_healthy
    logging:
      driver: json-file
      options:
        max-size: "5m"
        max-file: "2"
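
To restore a dump, the same image, secret, and network can be reused through docker compose run (a sketch; substitute a real file name from ${DOCKER}/scraper/backup):

# Restore a backup into the running postgres service
docker compose run --rm pgdump '
  export PGPASSWORD=$(cat /run/secrets/scraper_password)
  pg_restore -h postgres -U scraper -d scraper --clean /backup/scraper_YYYYMMDD_HHMM.dump
'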

Troubleshooting

Postgres won't start

The data directory must be owned by the Postgres container user (UID 999 in the official image):

sudo chown -R 999:999 ${DOCKER}/scraper/postgres

API returns 401 Unauthorized

Check that the X-API-Key header matches the key stored in ${DOCKER}/scraper/credentials/api_key:

curl -H "X-API-Key: ${API_KEY}" http://localhost:8000/products

No products are scraped

Test the selectors with the Detect button in the WebUI, then check the scraper logs:

docker compose logs scraper --tail 50

Database Schema

products (
  id SERIAL PRIMARY KEY,
  url TEXT UNIQUE,
  title TEXT,
  current_price INTEGER,
  first_seen TIMESTAMP,
  last_updated TIMESTAMP,
  site_config_id INTEGER
)

price_history (
  id SERIAL PRIMARY KEY,
  product_id INTEGER REFERENCES products(id),
  price INTEGER,
  timestamp TIMESTAMP
)

scraper_config (
  id SERIAL PRIMARY KEY,
  name TEXT UNIQUE,
  base_url TEXT,
  product_selector TEXT,
  title_selector TEXT,
  price_selector TEXT,
  link_selector TEXT,
  enabled INTEGER DEFAULT 1,
  use_stealth INTEGER DEFAULT 0,
  max_pages INTEGER DEFAULT 10,
  min_price INTEGER DEFAULT 0,
  max_price INTEGER DEFAULT 999999
)
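
Because price_history keeps one row per observation, price drops can be computed directly in SQL; a sketch, run via psql inside the postgres container:

# Top 10 products by drop from their historical peak
docker compose exec postgres psql -U scraper -d scraper -c "
  SELECT p.title, p.current_price,
         MIN(h.price) AS lowest, MAX(h.price) AS highest
  FROM products p
  JOIN price_history h ON h.product_id = p.id
  GROUP BY p.id, p.title, p.current_price
  ORDER BY MAX(h.price) - p.current_price DESC
  LIMIT 10;"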

License

MIT - see LICENSE
