fuel_history
A small Go service that periodically scrapes the GOV.UK Fuel Finder PFS fuel-prices API and writes the results as flattened NDJSON to a daily-rotated file for later analysis.
- One row per (station × fuel type) per scrape — denormalised for time-series analysis.
- Daily UTC file rotation (prices-YYYY-MM-DD.ndjson); previous days are gzipped automatically (~10× compression) once the next day's scrape begins.
- Sequential paginated fetch with OAuth2 client-credentials, in-memory token cache + 401/403 retry, configurable scrape interval.
- Distroless Docker image (~9 MB) and a ready-to-go docker-compose.yml.
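The token cache and 401/403 retry mentioned above can be pictured roughly as follows. This is a minimal sketch, not the code in pkg/fuelapi: the names tokenCache and withAuthRetry, the fetch callback, and the choice to retry exactly once are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"sync"
)

// tokenCache keeps the most recent access token in memory.
// fetch is a hypothetical callback that requests a fresh token.
type tokenCache struct {
	mu    sync.Mutex
	token string
	fetch func() (string, error)
}

// get returns the cached token, fetching one only when the cache is empty.
func (c *tokenCache) get() (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.token != "" {
		return c.token, nil
	}
	tok, err := c.fetch()
	if err != nil {
		return "", err
	}
	c.token = tok
	return tok, nil
}

// invalidate drops the cached token if it is still the one that was rejected.
func (c *tokenCache) invalidate(rejected string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.token == rejected {
		c.token = ""
	}
}

// withAuthRetry runs call with a token. On 401 or 403 it discards the token
// and tries once more with a fresh one; any other status is returned as-is.
func withAuthRetry(c *tokenCache, call func(token string) (status int, err error)) (int, error) {
	for attempt := 0; attempt < 2; attempt++ {
		tok, err := c.get()
		if err != nil {
			return 0, err
		}
		status, err := call(tok)
		if err != nil {
			return status, err
		}
		if status != 401 && status != 403 {
			return status, nil
		}
		c.invalidate(tok)
	}
	return 0, errors.New("request still unauthorized after refreshing the token")
}
```

Keeping the HTTP call behind a callback means the retry logic can be exercised without a network connection.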
Prerequisites
- For deployment: Docker + Docker Compose v2.
- For local development: Go 1.24+.
- A client_id / client_secret pair from the Fuel Finder developer portal.
Configuration
Copy the example and fill in your credentials:
cp config.example.yaml config.yaml
$EDITOR config.yaml
config.yaml is gitignored. Field reference:
api:
base_url: https://www.fuel-finder.service.gov.uk # production server
client_id: YOUR_CLIENT_ID
client_secret: YOUR_CLIENT_SECRET
scrape:
interval: 30m # Go duration, e.g. 10m, 30m, 1h
request_timeout: 60s # per-HTTP-request timeout
storage:
dir: ./data # NDJSON output directory
logging:
file: ./data/fuel-history.log # optional; INFO+ tee'd here. Empty disables.
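The interval and request_timeout values use Go's duration syntax. As a rough illustration of how such fields can be parsed and defaulted, here is a small sketch built on time.ParseDuration. The helper name parseDurationOr and the rule that rejects zero or negative values are assumptions, not taken from pkg/config.

```go
package main

import (
	"fmt"
	"time"
)

// parseDurationOr parses a Go duration string such as "30m" or "60s".
// An empty string yields the fallback; zero or negative values are rejected.
// (Hypothetical helper; the real loader may validate differently.)
func parseDurationOr(raw string, fallback time.Duration) (time.Duration, error) {
	if raw == "" {
		return fallback, nil
	}
	d, err := time.ParseDuration(raw)
	if err != nil {
		return 0, fmt.Errorf("invalid duration %q: %w", raw, err)
	}
	if d <= 0 {
		return 0, fmt.Errorf("duration %q must be positive", raw)
	}
	return d, nil
}
```

Values like 10m, 1h or 1h30m parse cleanly, while free-form text such as "10 minutes" is rejected by time.ParseDuration.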
Deploying with Docker Compose
cp config.example.yaml config.yaml # then edit credentials
mkdir -p data
docker compose up -d --build
If id -u on your server is not 1001, edit the user: line in
docker-compose.yml accordingly so files in ./data are owned by your host
user. Also ensure the data/ directory itself is writable by that uid:
sudo chown 1001:1001 data (or whichever uid you set).
Operations
docker compose logs -f # stream container logs
docker compose ps # service status
docker compose run --rm fuel-history --once # one-shot scrape
docker compose down # stop and remove
Container stdout logs are capped at 5 × 10 MB by Docker's json-file driver
(see logging: block in docker-compose.yml). The application's own
fuel-history.log file is unbounded — it's low-noise (a few lines per
scrape) so a year of operation is a few MB.
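For readers curious how stdout and the log file can run at different levels, the following is a minimal sketch of a slog multi-handler with one level per sink. It is not the code in pkg/logging: the names multiHandler, newLogger and demo are illustrative, and the text handler format is an assumption.

```go
package main

import (
	"context"
	"errors"
	"io"
	"log/slog"
	"strings"
)

// multiHandler fans each record out to every child handler that is enabled
// for the record's level.
type multiHandler struct {
	handlers []slog.Handler
}

func (m multiHandler) Enabled(ctx context.Context, level slog.Level) bool {
	for _, h := range m.handlers {
		if h.Enabled(ctx, level) {
			return true
		}
	}
	return false
}

func (m multiHandler) Handle(ctx context.Context, r slog.Record) error {
	var errs []error
	for _, h := range m.handlers {
		if h.Enabled(ctx, r.Level) {
			errs = append(errs, h.Handle(ctx, r.Clone()))
		}
	}
	return errors.Join(errs...)
}

func (m multiHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	next := make([]slog.Handler, len(m.handlers))
	for i, h := range m.handlers {
		next[i] = h.WithAttrs(attrs)
	}
	return multiHandler{handlers: next}
}

func (m multiHandler) WithGroup(name string) slog.Handler {
	next := make([]slog.Handler, len(m.handlers))
	for i, h := range m.handlers {
		next[i] = h.WithGroup(name)
	}
	return multiHandler{handlers: next}
}

// newLogger sends records at stdoutLevel and above to stdout, and INFO and
// above to file.
func newLogger(stdout, file io.Writer, stdoutLevel slog.Level) *slog.Logger {
	return slog.New(multiHandler{handlers: []slog.Handler{
		slog.NewTextHandler(stdout, &slog.HandlerOptions{Level: stdoutLevel}),
		slog.NewTextHandler(file, &slog.HandlerOptions{Level: slog.LevelInfo}),
	}})
}

// demo logs one DEBUG and one INFO record into in-memory sinks and returns
// how many lines each sink received.
func demo() (stdoutLines, fileLines int) {
	var stdout, file strings.Builder
	logger := newLogger(&stdout, &file, slog.LevelDebug)
	logger.Debug("fetching batch", "batch", 3)
	logger.Info("scrape complete", "rows", 1200)
	return strings.Count(stdout.String(), "\n"), strings.Count(file.String(), "\n")
}
```

In demo, the DEBUG record reaches only the stdout sink, while the INFO record reaches both.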
Running locally (without Docker)
go build -o fuel-history ./cmd/fuel-history
./fuel-history --config config.yaml --once # one scrape, then exit
./fuel-history --config config.yaml --debug # continuous, verbose stdout
./fuel-history --help
CLI flags:
- --config PATH: path to YAML config (default config.yaml).
- --once: perform a single scrape and exit (useful for cron / smoke tests).
- --debug: DEBUG-level logging on stdout. The log file always uses INFO+.
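A flag set like this maps naturally onto Go's standard flag package, which accepts both -name and --name spellings. The sketch below is illustrative only; the options struct and parseFlags function are hypothetical names, and cmd/fuel-history/main.go may be organised differently.

```go
package main

import (
	"flag"
	"io"
)

// options holds the parsed command-line flags (hypothetical struct).
type options struct {
	configPath string
	once       bool
	debug      bool
}

// parseFlags parses args (without the program name) into options.
func parseFlags(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("fuel-history", flag.ContinueOnError)
	// Usage output is silenced in this sketch; a real CLI would normally
	// leave the default destination (stderr) in place.
	fs.SetOutput(io.Discard)
	fs.StringVar(&o.configPath, "config", "config.yaml", "path to YAML config")
	fs.BoolVar(&o.once, "once", false, "perform a single scrape and exit")
	fs.BoolVar(&o.debug, "debug", false, "DEBUG-level logging on stdout")
	err := fs.Parse(args)
	return o, err
}
```

In a main function this would typically be called as parseFlags(os.Args[1:]).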
Output format
NDJSON, one JSON object per line:
{
"scrape_time": "2026-05-04T17:20:24.230Z",
"node_id": "0028acef…",
"trading_name": "Alex Fuel Station",
"public_phone_number": "+448003234040",
"fuel_type": "E10",
"price": 132.9,
"price_last_updated": "2026-02-17T16:03:04.938Z",
"price_change_effective_timestamp": "2026-02-17T16:00:00.000Z"
}
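To make the "one row per (station × fuel type)" shape concrete, here is a small sketch of the flattening step. The Row struct follows the JSON keys shown above; the Station and FuelPrice types are hypothetical stand-ins for the upstream response, whose real shape is not documented here, and scrapeTime is assumed to be pre-formatted as an RFC 3339 UTC string.

```go
package main

import "encoding/json"

// Row mirrors the NDJSON record shown above.
type Row struct {
	ScrapeTime                    string  `json:"scrape_time"`
	NodeID                        string  `json:"node_id"`
	TradingName                   string  `json:"trading_name"`
	PublicPhoneNumber             string  `json:"public_phone_number"`
	FuelType                      string  `json:"fuel_type"`
	Price                         float64 `json:"price"`
	PriceLastUpdated              string  `json:"price_last_updated"`
	PriceChangeEffectiveTimestamp string  `json:"price_change_effective_timestamp"`
}

// FuelPrice and Station are hypothetical shapes for one upstream entry.
type FuelPrice struct {
	FuelType    string
	Price       float64
	LastUpdated string
	Effective   string
}

type Station struct {
	NodeID      string
	TradingName string
	Phone       string
	Prices      []FuelPrice
}

// flatten emits one Row per fuel type sold at the station.
func flatten(scrapeTime string, s Station) []Row {
	rows := make([]Row, 0, len(s.Prices))
	for _, p := range s.Prices {
		rows = append(rows, Row{
			ScrapeTime:                    scrapeTime,
			NodeID:                        s.NodeID,
			TradingName:                   s.TradingName,
			PublicPhoneNumber:             s.Phone,
			FuelType:                      p.FuelType,
			Price:                         p.Price,
			PriceLastUpdated:              p.LastUpdated,
			PriceChangeEffectiveTimestamp: p.Effective,
		})
	}
	return rows
}

// encodeNDJSON serialises rows as one JSON object per line.
func encodeNDJSON(rows []Row) ([]byte, error) {
	var out []byte
	for _, r := range rows {
		line, err := json.Marshal(r)
		if err != nil {
			return nil, err
		}
		out = append(out, line...)
		out = append(out, '\n')
	}
	return out, nil
}
```

Repeating the station fields on every row costs some disk space, but each line can then be queried on its own without a join.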
Files in data/:
- prices-YYYY-MM-DD.ndjson: current day, append-only.
- prices-YYYY-MM-DD.ndjson.gz: previous days, gzip-compressed in place.
- fuel-history.log: application log (if logging.file is set).
Analysing the data
# Pretty-print one record
head -1 data/prices-2026-05-04.ndjson | jq .
# Count distinct stations and fuel types
jq -r .node_id data/prices-*.ndjson | sort -u | wc -l
jq -r .fuel_type data/prices-*.ndjson | sort -u
# Cheapest E10 rows across the uncompressed files (top 5); covers every scrape in those files, not only the latest
jq -c 'select(.fuel_type=="E10")' data/prices-*.ndjson \
| jq -s 'sort_by(.price)[:5]'
# Read compressed and uncompressed together
zcat -f data/prices-*.ndjson*
# DuckDB SQL across all days at once (handles .gz automatically)
duckdb -c "SELECT fuel_type, COUNT(*), AVG(price)
FROM read_json_auto('data/prices-*.ndjson*', format='newline_delimited')
GROUP BY fuel_type"
Project layout
cmd/fuel-history/main.go flags, signal handling, slog wiring
pkg/config/ YAML loader + defaults + validation
pkg/fuelapi/ token cache + paginated fetch + 401/403 retry
pkg/store/ daily NDJSON sink + gzip-on-rollover
pkg/logging/ slog multi-handler (per-sink levels)
pkg/scraper/ ticker loop + flatten + compress orchestration
config.example.yaml
docker-compose.yml
Dockerfile
Tests
go test ./...
go vet ./...
Notes / limitations
- API quirks worth knowing about:
  - Token endpoint takes a JSON body ({client_id, client_secret}), not the standard OAuth2 form-encoded body, despite what the spec implies.
  - The PFS endpoint returns a bare JSON array (the documented {data: [...]} wrapper is absent).
  - End of pagination is signalled by HTTP 404 with a "Requested batch X is not available" body, not an empty array.
  - Periodic 504 Gateway Timeout responses with a "Maintenance" HTML page are common; the scraper logs the error and waits for the next tick.
- No retention policy yet: old .gz files accumulate forever. Add a find data -name 'prices-*.ndjson.gz' -mtime +90 -delete cron job if you want a sliding window.
- Rate limit: 429 responses say "try again in 5 minutes". A scrape that hits one is aborted; the next ticker fire retries the whole thing. No sophisticated back-off yet.
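The pagination and error-handling quirks above translate into a loop along these lines. Treat it as a sketch under assumptions: the batch query parameter name, starting at batch 1, and the Bearer authorization header are guesses for illustration, and getBatch, fetchAll and httpGetter are hypothetical names rather than the pkg/fuelapi API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// getBatch fetches one batch and reports the HTTP status and raw body.
type getBatch func(batch int) (status int, body []byte, err error)

// fetchAll walks batches 1, 2, 3, ... until the server answers 404, which
// marks the end of pagination. Each 200 body is decoded as a bare JSON array.
// Any other status (for example 429 or 504) aborts the whole fetch.
func fetchAll(get getBatch) ([]json.RawMessage, error) {
	var all []json.RawMessage
	for batch := 1; ; batch++ {
		status, body, err := get(batch)
		if err != nil {
			return nil, err
		}
		switch status {
		case http.StatusNotFound:
			return all, nil
		case http.StatusOK:
			var page []json.RawMessage
			if err := json.Unmarshal(body, &page); err != nil {
				return nil, fmt.Errorf("batch %d: decode: %w", batch, err)
			}
			all = append(all, page...)
		default:
			return nil, fmt.Errorf("batch %d: unexpected status %d", batch, status)
		}
	}
}

// httpGetter adapts an http.Client to getBatch. The "batch" query parameter
// and the Bearer header are assumptions, not confirmed API details.
func httpGetter(client *http.Client, endpoint, token string) getBatch {
	return func(batch int) (int, []byte, error) {
		url := fmt.Sprintf("%s?batch=%d", endpoint, batch)
		req, err := http.NewRequest(http.MethodGet, url, nil)
		if err != nil {
			return 0, nil, err
		}
		req.Header.Set("Authorization", "Bearer "+token)
		resp, err := client.Do(req)
		if err != nil {
			return 0, nil, err
		}
		defer resp.Body.Close()
		body, err := io.ReadAll(resp.Body)
		return resp.StatusCode, body, err
	}
}
```

Note that a 404 on the very first batch yields an empty result without an error in this sketch; a caller may want to treat that case as suspicious.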