Production-ready web scraping platform with PostgreSQL, WebUI, REST API, and price monitoring.
| Feature | Description |
|---|---|
| Multi-site scraping | Scrape any e-commerce site with CSS selectors |
| Auto-detect | Detect selectors automatically from a URL |
| Stealth mode | Bypass Akamai, Cloudflare, and PerimeterX bot protection |
| WebUI | Configure and monitor via web interface |
| REST API | Programmatic data access with API key authentication |
| Price alerts | Discord notifications for price drops |
| PostgreSQL | Production-grade database with connection pooling |
| Docker | Run everything with a single docker compose up |
| Proxy support | Optional SOCKS5/HTTP proxy |
```bash
# 1. Create a minimal .env file
cat > .env <<'EOF'
DOCKER=/path/to/docker/data
DOMAIN=example.com
TZ=Europe/Stockholm
EOF
# 2. Create required directories
mkdir -p /path/to/docker/data/scraper/{postgres,logs,playwright-cache,credentials}
# 3. (Optional) Add Discord webhook for price alerts
echo "https://discord.com/api/webhooks/..." \
> /path/to/docker/data/scraper/credentials/discord_webhook
# 4. Start
docker compose up -d
# 5. Check logs for generated credentials (first start only)
docker compose logs postgres # → database password
docker compose logs scraper # → API key
# 6. Open WebUI
# http://localhost:3000
```
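Once the stack is up, a quick sanity check (a sketch assuming the API port 8000 used in the examples below; /health is the one endpoint that needs no API key):

```bash
# Confirm both containers are running and healthy
docker compose ps

# /health requires no X-API-Key header
curl -fsS http://localhost:8000/health && echo "API is up"
```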
The stack runs two containers:

| Container | Port | Description |
|---|---|---|
| postgres | 5432 (internal) | PostgreSQL database |
| scraper | 3000 | Web UI, API, scraper engine, alerts |
All credentials are auto-generated on first startup and stored in ${DOCKER}/scraper/credentials/:
| File | Generated by | Description |
|---|---|---|
| db_password | postgres container | Database password (logged once on first start) |
| api_key | scraper container | API key for REST access (logged once on first start) |
| discord_webhook | you | Webhook URL from Discord; create manually if you want alerts |
Credentials can be changed at any time under Configuration → Advanced settings → Database credentials in the WebUI.
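For direct database access, a minimal sketch assuming the postgres service and the scraper user/database names used by the backup job below:

```bash
# Open a psql shell inside the postgres container (the official image
# trusts local socket connections, so typically no password prompt)
docker compose exec postgres psql -U scraper scraper
```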
```bash
# Retrieve the API key after first start
cat /path/to/docker/data/scraper/credentials/api_key
```

All endpoints except /health and /docs require an X-API-Key header.
```bash
# Get all products
curl -H "X-API-Key: ${API_KEY}" http://localhost:8000/products

# Search products
curl -H "X-API-Key: ${API_KEY}" "http://localhost:8000/products?search=RTX"

# Get price drops
curl -H "X-API-Key: ${API_KEY}" "http://localhost:8000/deals?min_drop_percent=10"

# Export to CSV
curl -H "X-API-Key: ${API_KEY}" http://localhost:8000/export/csv > products.csv
```

API Documentation: http://localhost:8000/docs
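The JSON endpoints compose well with jq. A sketch that prints current deals, assuming the /deals response is an array of objects with title and drop_percent fields (field names are an assumption; check /docs for the real schema):

```bash
#!/bin/sh
# List price drops of 10% or more as "title: -N%"
# (response field names are assumed; see /docs)
API_KEY=$(cat /path/to/docker/data/scraper/credentials/api_key)
curl -s -H "X-API-Key: ${API_KEY}" \
  "http://localhost:8000/deals?min_drop_percent=10" |
  jq -r '.[] | "\(.title): -\(.drop_percent)%"'
```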
Only three variables are required:
```bash
DOCKER=/path/to/docker/data   # where volumes are stored
DOMAIN=example.com            # used for reverse proxy labels
TZ=Europe/Stockholm           # timezone
```

All other settings (scrape interval, alert thresholds, proxy, stealth, etc.) are configured in the WebUI under Advanced settings and stored in the database.
Add this service to your docker-compose.yml for automatic daily pg_dump backups (kept 7 days):
```yaml
pgdump:
  image: postgres:16-alpine
  container_name: scraper_pgdump
  restart: unless-stopped
  entrypoint: ["/bin/sh", "-c"]
  # "$$" escapes "$" from compose interpolation so the shell sees "$(...)"
  command: |
    while true; do
      if PGPASSWORD="$$(cat /run/secrets/scraper_password)" \
         pg_dump -h postgres -U scraper scraper -Fc \
           -f "/backup/scraper_$$(date +%Y%m%d_%H%M).dump"; then
        find /backup -name '*.dump' -mtime +7 -delete
        echo "[$$(date '+%T')] pg_dump ok"
        sleep 86400   # next backup in 24 h
      else
        echo "[$$(date '+%T')] pg_dump failed"
        sleep 3600    # retry in 1 h
      fi
    done
  # assumes scraper_password is declared in the compose file's top-level secrets block
  secrets:
    - scraper_password
  volumes:
    - ${DOCKER}/scraper/backup:/backup
  depends_on:
    postgres:
      condition: service_healthy
  logging:
    driver: json-file
    options:
      max-size: "5m"
      max-file: "2"
```
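To restore one of these dumps, a sketch assuming the service, user, and database names above (the custom-format dump is streamed to pg_restore over stdin):

```bash
# Restore the newest dump into the running postgres container
LATEST=$(ls -t ${DOCKER}/scraper/backup/*.dump | head -n 1)
docker compose exec -T postgres \
  pg_restore -U scraper -d scraper --clean --if-exists < "$LATEST"
```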
If postgres cannot start because it cannot write to its data directory, fix ownership (the official image runs as UID/GID 999):

```bash
sudo chown -R 999:999 ${DOCKER}/scraper/postgres
```

Verify that the API accepts your key:

```bash
curl -H "X-API-Key: ${API_KEY}" http://localhost:8000/products
```

If a site stops returning products:

```bash
# Test selectors via WebUI (Detect button)
docker compose logs scraper --tail 50
```

Database schema:

```sql
products (
id SERIAL PRIMARY KEY,
url TEXT UNIQUE,
title TEXT,
current_price INTEGER,
first_seen TIMESTAMP,
last_updated TIMESTAMP,
site_config_id INTEGER
)
price_history (
id SERIAL PRIMARY KEY,
product_id INTEGER REFERENCES products(id),
price INTEGER,
timestamp TIMESTAMP
)
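
-- Example query (a sketch against the two tables above): widest price
-- swing per product over the last 7 days. Prices are stored as INTEGER,
-- so adjust if your sites use minor currency units.
SELECT p.title, min(h.price) AS low, max(h.price) AS high
FROM price_history h
JOIN products p ON p.id = h.product_id
WHERE h.timestamp > now() - interval '7 days'
GROUP BY p.title
ORDER BY max(h.price) - min(h.price) DESC
LIMIT 10;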
scraper_config (
id SERIAL PRIMARY KEY,
name TEXT UNIQUE,
base_url TEXT,
product_selector TEXT,
title_selector TEXT,
price_selector TEXT,
link_selector TEXT,
enabled INTEGER DEFAULT 1,
use_stealth INTEGER DEFAULT 0,
max_pages INTEGER DEFAULT 10,
min_price INTEGER DEFAULT 0,
max_price INTEGER DEFAULT 999999
)
```

MIT - see LICENSE