Artisan commands

All commands use the configurable prefix CRAWLER_COMMAND_PREFIX (default datahelm). In the Docker stack, prefix each command with docker compose run --rm followed by the service name.
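As a rough sketch of how the prefix and the Docker wrapper combine, the helper below only prints the resulting command line. The function name `datahelm_cmd`, the `USE_DOCKER` and `COMPOSE_SERVICE` variables, and the service name `app` are assumptions made for illustration, not part of the package. It also assumes CRAWLER_COMMAND_PREFIX is mirrored into the shell environment, which may not hold if it is set only in the application's config.

```bash
# Hypothetical helper: prints the command line for a scrap subcommand.
# Assumptions: the compose service is called "app" unless COMPOSE_SERVICE
# says otherwise, and the prefix is visible as CRAWLER_COMMAND_PREFIX.
datahelm_cmd() {
  local prefix sub
  prefix="${CRAWLER_COMMAND_PREFIX:-datahelm}"
  sub="$1"
  shift
  # Base command: php artisan <prefix>:scrap:<subcommand> <args...>
  set -- php artisan "${prefix}:scrap:${sub}" "$@"
  # Optionally wrap it for the Docker stack.
  if [ -n "${USE_DOCKER:-}" ]; then
    set -- docker compose run --rm "${COMPOSE_SERVICE:-app}" "$@"
  fi
  printf '%s\n' "$*"
}
```

For instance, `USE_DOCKER=1 datahelm_cmd validate blueprint.json` prints the Docker form of the validate command, which can be copied into a terminal once the service name matches your compose file.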

| Command | Description |
| --- | --- |
| datahelm:scrap:generate | Auto-detect a site and generate a scrape blueprint |
| datahelm:scrap:run | Run a blueprint and export items (JSON / JSONL / CSV / Markdown) |
| datahelm:scrap:shell | Interactive CSS/XPath selector shell against a live URL |
| datahelm:scrap:validate | Validate a blueprint JSON file |
| datahelm:robot:{name} | Run a site-specific robot (scaffolded with --robot) |

datahelm:scrap:generate

bash
php artisan datahelm:scrap:generate <url> [options]

Fetches the URL and auto-detects the item list, pagination and field selectors, emitting a blueprint. See Generate a blueprint for the full option table. Highlights:

  • --get-detail=true — also detect detail-page fields.
  • --save — save the blueprint under the host name (for datahelm:scrap:run <host>).
  • --json — print raw blueprint JSON to stdout instead of saving.
  • --robot / --robot-name= / --force — scaffold a robot command.
  • --transport= — pin the transport (also baked into the blueprint).
  • --preset= — detection heuristics profile (generic, ecommerce, auctions, properties). See Presets.
  • --output-format= — output format baked into the blueprint (json, jsonl, csv, markdown).
  • --api-endpoint= / --api-method= / --api-items-path= — force API mode.
  • --search-filters= — crawl multiple categories.
  • --get-primary-image / --get-all-images / --get-gallery-images / --hash-names — image URL options.
  • --header="K: V" / --cookie="a=1; b=2" — replay captured session values.
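The flags above can be combined in a two-step workflow: preview the detected blueprint first, then save it. The sketch below uses only flags from the list, with the default `datahelm` prefix. The function names and the `ARTISAN` override are hypothetical, and it is an assumption that these particular flags can be used together.

```bash
# Sketch only. ARTISAN can be overridden (for example
# ARTISAN="echo php artisan") to print commands instead of running them.

# Print the detected blueprint as JSON on stdout without saving it.
preview_blueprint() {
  ${ARTISAN:-php artisan} datahelm:scrap:generate "$1" --preset=ecommerce --json
}

# Save the blueprint under the host name, with detail-page fields
# detected and JSONL baked in as the output format.
save_blueprint() {
  ${ARTISAN:-php artisan} datahelm:scrap:generate "$1" \
    --preset=ecommerce --get-detail=true --output-format=jsonl --save
}
```

Calling `preview_blueprint <url>` before `save_blueprint <url>` gives a chance to inspect the detected selectors before anything is written.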

datahelm:scrap:run

bash
php artisan datahelm:scrap:run <blueprint-path-or-host> [--limit=N] [--output=PATH]

Loads a blueprint (a file path, or a host saved with --save), crawls, and writes the items.

  • --limit=N — stop after N items.
  • --output=PATH — write to a path; --output=- prints to stdout. Default: storage/app/scrapes/<name>.json.

See Run a scrape.
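A small wrapper can make the optional flags easier to script. This is a minimal sketch assuming the default `datahelm` prefix; `run_scrape` and the `ARTISAN` override are hypothetical names, not part of the package.

```bash
# run_scrape <blueprint-path-or-host> [limit] [output]
# Sketch only. Set ARTISAN="echo php artisan" to print the command
# instead of executing it.
run_scrape() {
  local target limit output
  target="$1"
  limit="${2:-}"
  output="${3:-}"
  set -- datahelm:scrap:run "$target"
  # Append each optional flag only when a value was supplied.
  if [ -n "$limit" ]; then set -- "$@" "--limit=$limit"; fi
  if [ -n "$output" ]; then set -- "$@" "--output=$output"; fi
  ${ARTISAN:-php artisan} "$@"
}
```

For a quick smoke test, `run_scrape example.com 5 -` would crawl five items from a saved host blueprint and print them to stdout; omitting the last two arguments falls back to the defaults described above.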

datahelm:scrap:shell

bash
php artisan datahelm:scrap:shell <url> [--timeout=N] [--user-agent="..."]

An interactive REPL to test CSS / XPath selectors against a live page. See Selector shell for the command list.

datahelm:scrap:validate

bash
php artisan datahelm:scrap:validate <blueprint.json>

Validates a blueprint JSON file — checks it parses and is structurally sound. Run it after hand-editing a blueprint.
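One way to use this after a hand edit is to gate a short trial run on the validation result. The sketch below assumes that the validate command exits non-zero when the blueprint is invalid, which this page does not state. `validate_then_run` and the `ARTISAN` override are hypothetical names.

```bash
# Sketch only. Assumption: datahelm:scrap:validate returns a non-zero
# exit status for an invalid blueprint.
# Set ARTISAN="echo php artisan" to print commands instead of running them.
validate_then_run() {
  local blueprint
  blueprint="$1"
  if ${ARTISAN:-php artisan} datahelm:scrap:validate "$blueprint"; then
    # Valid: crawl a handful of items and print them to stdout.
    ${ARTISAN:-php artisan} datahelm:scrap:run "$blueprint" --limit=5 --output=-
  else
    echo "blueprint failed validation: $blueprint" >&2
    return 1
  fi
}
```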

datahelm:robot:{name}

bash
php artisan datahelm:robot:exampleauctions [--limit=N] [--output=PATH]

Runs a site-specific robot scaffolded with --robot. The blueprint JSON is embedded in the command file, and the per-item callback handles image downloading and persistence. Supports the same --limit / --output flags as datahelm:scrap:run. See Robots.

Released under the MIT License.