Artisan commands
All commands use the configurable prefix CRAWLER_COMMAND_PREFIX (default datahelm). In the Docker stack, prefix each with docker compose run --rm.
| Command | Description |
|---|---|
| datahelm:scrap:generate | Auto-detect a site and generate a scrape blueprint |
| datahelm:scrap:run | Run a blueprint and export items (JSON / JSONL / CSV / Markdown) |
| datahelm:scrap:shell | Interactive CSS/XPath selector shell against a live URL |
| datahelm:scrap:validate | Validate a blueprint JSON file |
| datahelm:robot:{name} | Run a site-specific robot (scaffolded with --robot) |
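
As a rough illustration of how the prefix shapes the command names, the shell sketch below assumes `CRAWLER_COMMAND_PREFIX` is exported as an environment variable in the current shell (it may instead be read only from the application's own configuration) and falls back to the documented default. The helper name `command_name` is hypothetical.

```shell
# Hypothetical helper: print the full Artisan command name for a suffix.
# Assumption: CRAWLER_COMMAND_PREFIX is exported in the current shell;
# when it is unset or empty, the documented default "datahelm" is used.
command_name() {
  printf '%s:%s\n' "${CRAWLER_COMMAND_PREFIX:-datahelm}" "$1"
}

# List the generic commands from the table above under the active prefix.
for suffix in scrap:generate scrap:run scrap:shell scrap:validate; do
  echo "php artisan $(command_name "$suffix")"
done
```
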
datahelm:scrap:generate
php artisan datahelm:scrap:generate <url> [options]
Fetches the URL and auto-detects the item list, pagination and field selectors, emitting a blueprint. See Generate a blueprint for the full option table. Highlights:
- `--get-detail=true`: also detect detail-page fields.
- `--save`: save the blueprint under the host name (for `datahelm:scrap:run <host>`).
- `--json`: print raw blueprint JSON to stdout instead of saving.
- `--robot` / `--robot-name=` / `--force`: scaffold a robot command.
- `--transport=`: pin the transport (also baked into the blueprint).
- `--preset=`: detection heuristics profile (`generic`, `ecommerce`, `auctions`, `properties`). See Presets.
- `--output-format=`: output format baked into the blueprint (`json`, `jsonl`, `csv`, `markdown`).
- `--api-endpoint=` / `--api-method=` / `--api-items-path=`: force API mode.
- `--search-filters=`: crawl multiple categories.
- `--get-primary-image` / `--get-all-images` / `--get-gallery-images` / `--hash-names`: image URL options.
- `--header="K: V"` / `--cookie="a=1; b=2"`: replay captured session values.
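
One possible invocation combining several of these options is sketched below. The URL is a placeholder, and whether this particular combination of flags suits a given site is an assumption. The script only assembles and prints the command so it can be reviewed before it is run.

```shell
# Hypothetical target URL; replace with a real listing page.
url="https://example.com/listings"

# Build the argument list using only options named in this section.
set -- php artisan datahelm:scrap:generate "$url" \
  --preset=ecommerce --get-detail=true --output-format=jsonl --save

# Dry run: print the assembled command. Execute it with "$@" once it looks right.
generate_cmd="$*"
echo "$generate_cmd"
```

Because `--save` stores the blueprint under the host name, a later run could refer to it by host rather than by file path.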
datahelm:scrap:run
php artisan datahelm:scrap:run <blueprint-path-or-host> [--limit=N] [--output=PATH]
Loads a blueprint (a file path, or a host saved with --save), crawls, and writes the items.
- `--limit=N`: stop after N items.
- `--output=PATH`: write to a path; `--output=-` prints to stdout. Default: `storage/app/scrapes/<name>.json`.
See Run a scrape.
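
For a quick smoke test, a run by host name with a small limit might look like the following sketch. The host `example.com` is hypothetical and assumes a blueprint was previously saved for it with `--save`; the script prints the command rather than executing it.

```shell
# Hypothetical host name, assumed to have been saved earlier with --save.
host="example.com"

# Stop after 25 items and print them to stdout instead of the default file.
set -- php artisan datahelm:scrap:run "$host" --limit=25 --output=-

# Dry run: print the assembled command. Execute it with "$@" once it looks right.
run_cmd="$*"
echo "$run_cmd"
```
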
datahelm:scrap:shell
php artisan datahelm:scrap:shell <url> [--timeout=N] [--user-agent="..."]
An interactive REPL to test CSS / XPath selectors against a live page. See Selector shell for the command list.
datahelm:scrap:validate
php artisan datahelm:scrap:validate <blueprint.json>
Validates a blueprint JSON file — checks it parses and is structurally sound. Run it after hand-editing a blueprint.
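
A hand-edited blueprint could be checked before a short trial run, roughly as in the sketch below. It assumes the validate command exits with a non-zero status when the blueprint is invalid, which this page does not state. The `ARTISAN` variable and the `validate_then_run` function are hypothetical names introduced only for this example.

```shell
# Hypothetical override hook: set ARTISAN=echo to dry-run the sketch.
ARTISAN="${ARTISAN:-php artisan}"

validate_then_run() {
  blueprint="$1"
  # $ARTISAN is left unquoted on purpose so "php artisan" splits into two words.
  # Assumption: validate returns a non-zero exit status for an invalid blueprint.
  $ARTISAN datahelm:scrap:validate "$blueprint" || return 1
  # Only reached when validation succeeded: fetch a handful of items to stdout.
  $ARTISAN datahelm:scrap:run "$blueprint" --limit=5 --output=-
}

# Example call (commented out so the sketch has no side effects):
# validate_then_run blueprint.json
```
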
datahelm:robot:{name}
php artisan datahelm:robot:exampleauctions [--limit=N] [--output=PATH]
Runs a site-specific robot scaffolded with --robot. The blueprint JSON is embedded in the command file, and the per-item callback handles image downloading and persistence. Supports the same --limit / --output flags as datahelm:scrap:run. See Robots.

