Getting started

There are two ways to use DataHelm Crawler: install the package into an existing Laravel app, or run the full Docker stack from the reference environment, which bundles every supporting service.

Install the package

bash
composer require datahelm/crawler

Publish the config (optional; only needed if you want to tweak detectors, transports, etc.):

bash
php artisan vendor:publish --tag=crawler-config

The package auto-registers DataHelm\Crawler\CrawlerServiceProvider. Requirements:

  • PHP ^8.3
  • Laravel ^11.0 || ^12.0 || ^13.0
  • guzzlehttp/guzzle ^7.0, symfony/dom-crawler ^7.0 || ^8.0

The default transport is plain HTTP (guzzle) and needs no extra infrastructure.

Quick start

bash
# auto-detect a listing and scaffold a site robot
php artisan datahelm:scrap:generate "https://example.com/listing" --get-detail=true --robot

# run the scaffolded robot, capped at 10 items
php artisan datahelm:robot:example --limit=10

A complete, public example you can run immediately:

bash
php artisan datahelm:scrap:generate https://books.toscrape.com/ --get-detail=true --save
php artisan datahelm:scrap:run books.toscrape.com --limit=20

Docker invocation

In the reference Docker stack, Artisan runs through an on-demand artisan service, so replace php artisan in every command above with docker compose run --rm artisan:

bash
docker compose run --rm artisan datahelm:scrap:generate https://books.toscrape.com/ --get-detail=true --save

This documentation uses the bare php artisan … form; when running in Docker, replace php artisan with docker compose run --rm artisan.
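
If you switch between a local install and the Docker stack, a small shell wrapper can pick the invocation for you. The sketch below is not part of the package: the function name `artisan` and the `DATAHELM_DOCKER` and `DRY_RUN` variables are assumptions made for illustration.

```shell
# Hypothetical wrapper (not shipped with the package).
# DATAHELM_DOCKER=1 -> run through the Docker stack's artisan service
# DRY_RUN=1         -> print the resolved command instead of running it
artisan() {
  if [ "${DATAHELM_DOCKER:-0}" = "1" ]; then
    set -- docker compose run --rm artisan "$@"
  else
    set -- php artisan "$@"
  fi

  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"   # show what would be executed
  else
    "$@"                 # execute the resolved command
  fi
}
```

With the function sourced into your shell, `DATAHELM_DOCKER=1 artisan datahelm:scrap:run books.toscrape.com --limit=20` would run the crawl inside the stack, and the same line without `DATAHELM_DOCKER` would use the local PHP binary.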

Run the full Docker stack

The reference environment bundles everything (nginx, PHP, PostgreSQL, Redis, Supervisor, browserless, FlareSolverr). Use the public datahelm/environment repository:

bash
git clone https://github.com/datahelm/environment.git
cd environment
cp .env.example .env
# containers own files as your user; bash already sets UID (read-only), so only export it
export UID GID="$(id -g)"
docker compose up -d

Common stack commands:

bash
docker compose up -d              # start the stack
docker compose build              # rebuild images after Dockerfile changes
docker compose ps                 # status
docker compose down               # stop and remove the containers

# on-demand tools (profile: tools)
docker compose run --rm artisan migrate
docker compose run --rm artisan tinker
docker compose run --rm composer install
docker compose run --rm npm install

See the Docker stack reference for the full service/port table.

Optional: anti-bot services only

If you only need headless Chrome and Cloudflare solving (not the whole stack), start just those two services from the package:

bash
docker compose -f vendor/datahelm/crawler/docker/compose.services.yml up -d

Stop them when you are done; each one runs a full Chromium instance and keeps using RAM and CPU while it is up:

bash
docker compose -f vendor/datahelm/crawler/docker/compose.services.yml stop

You only need these for the browser, flaresolverr, or auto transports; see HTTP transports & bot protection.
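
Because these services are easy to leave running, one option is to wrap the start and stop commands around a single crawl. The following is a minimal sketch, not something the package provides: the function name `with_antibot_services` and the `SERVICES_FILE` variable are hypothetical, and it assumes you run it from the Laravel app root so the vendor path resolves.

```shell
# Hypothetical helper (not shipped with the package): start the anti-bot
# services, run one command, then stop the services again.
SERVICES_FILE="vendor/datahelm/crawler/docker/compose.services.yml"

with_antibot_services() {
  # Bring up the anti-bot services; give up if that fails.
  docker compose -f "$SERVICES_FILE" up -d || return 1

  # Run the wrapped command and remember its exit code.
  rc=0
  "$@" || rc=$?

  # Stop the services whether or not the command succeeded.
  docker compose -f "$SERVICES_FILE" stop
  return "$rc"
}
```

For example, `with_antibot_services php artisan datahelm:scrap:run books.toscrape.com --limit=20` would stop both services as soon as the run finishes. The sketch does not handle interruption; if you abort with Ctrl-C, stop the services manually.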


Released under the MIT License.