Scaffold a robot
A robot is a self-contained Artisan command with the blueprint JSON embedded directly in the file — so it needs no external storage. It is also where per-item logic lives: image downloading, image processing, and persistence (Eloquent, queue, webhook).
Pass --robot to the generator to turn a detected blueprint into one:
php artisan datahelm:scrap:generate \
    https://www.exampleauctions.com/real-estate/apartments --get-detail=true --robot --robot-name=ExampleAuctions
# -> creates app/Console/Commands/RobotsCommand/RobotExampleAuctions.php
php artisan datahelm:robot:exampleauctions --limit=20
The robot name defaults to the host (exampleauctions → RobotExampleauctions); pass --robot-name= for exact casing, and --force to overwrite an existing file. Edit the embedded BLUEPRINT JSON in the generated command to refine selectors.
Anatomy of a robot
The generated handle() method loads the embedded blueprint, builds a CallbackSink that runs once per scraped item, and streams items into it:
class RobotExampleMarket extends Command
{
    use ScrapesToConsole;

    /** Any Laravel filesystem disk: 'storage' (local), 'public', 's3', 'gcs', … */
    protected string $imageDisk = 'storage';

    /** Subfolder inside the disk where images for this site will be stored. */
    protected string $imageFolder = 'scrapes/images/www.example-market.com';

    public function handle(CrawlEngine $engine, ImageStore $images): int
    {
        $blueprint = ScrapeBlueprint::fromJson(self::BLUEPRINT);
        $hashNames = $blueprint->hashNames; // from blueprint JSON

        $sink = new CallbackSink(function (ScrapedItem $item) use ($images, $hashNames): void {
            // "primary_image" is the URL the engine resolved (falls back to "image").
            $imageUrl = $item->get('primary_image') ?? $item->get('image');
            if (is_array($imageUrl)) {
                $imageUrl = $imageUrl[0] ?? null;
            }

            // Download to $this->imageDisk / $this->imageFolder.
            $imagePath = is_string($imageUrl) && $imageUrl !== ''
                ? $images->store($imageUrl, $this->imageDisk, $this->imageFolder, $hashNames)
                : null;

            // Optional: resize/watermark/convert via processImage().
            // Uncomment to also store the full gallery from "gallery_images":
            // foreach ((array) $item->get('gallery_images') as $url) { ... }

            // ... build your record and persist it (Eloquent, queue, webhook, …)
        }, $this->imageFolder);

        $this->crawlToSink($engine, $blueprint, $sink, (int) $this->option('limit'));

        return self::SUCCESS;
    }
}
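The URL-normalization step at the top of the closure can be pulled out into a small pure function, which makes it testable without running a crawl. The sketch below is plain PHP (8.0+ for the `mixed` type). `pickImageUrl` is a hypothetical helper name, not part of the package, and it takes the two raw field values rather than a `ScrapedItem`.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the package): mirrors the closure's
 * fallback logic. Prefers $primary, falls back to $fallback only when
 * $primary is null, unwraps a list to its first element, and returns
 * null for anything that is not a non-empty string.
 */
function pickImageUrl(mixed $primary, mixed $fallback = null): ?string
{
    $url = $primary ?? $fallback;

    if (is_array($url)) {
        $url = $url[0] ?? null;
    }

    return is_string($url) && $url !== '' ? $url : null;
}
```

Inside the closure this would be called as `pickImageUrl($item->get('primary_image'), $item->get('image'))`. As in the generated code, an empty-string `primary_image` does not trigger the fallback, because `??` only reacts to `null`.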
The two lines you usually change
| Property | Purpose |
|---|---|
| `$imageDisk` | Any Laravel filesystem disk: 'storage', 'public', 's3', 'gcs', … |
| `$imageFolder` | Subfolder inside that disk where this site's images go |
Cloud disks just need their Flysystem adapter installed and configured in config/filesystems.php. See RobotExampleMarket in the reference project for a complete worked example.
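As an illustration of the cloud-disk setup, an S3 disk entry under `disks` in config/filesystems.php typically looks like the stock Laravel one below. This is standard Laravel configuration rather than anything specific to this package. The environment variable names are Laravel's defaults, and the adapter is installed with `composer require league/flysystem-aws-s3-v3 "^3.0"`.

```php
// config/filesystems.php, inside the 'disks' array
's3' => [
    'driver' => 's3',
    'key' => env('AWS_ACCESS_KEY_ID'),
    'secret' => env('AWS_SECRET_ACCESS_KEY'),
    'region' => env('AWS_DEFAULT_REGION'),
    'bucket' => env('AWS_BUCKET'),
    'url' => env('AWS_URL'),
    'endpoint' => env('AWS_ENDPOINT'),
    'use_path_style_endpoint' => env('AWS_USE_PATH_STYLE_ENDPOINT', false),
    'throw' => false,
],
```

With that disk configured, setting `$imageDisk = 's3'` in the robot should be the only change needed on the robot side.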
Per-item persistence
Inside the CallbackSink closure you have the full ScrapedItem. This is where you:
- download images (`$images->store(...)`), see Images;
- run `processImage()` to resize / watermark / convert;
- build your record and persist it however you like: Eloquent model, dispatched job, webhook POST, etc.
Because it is plain PHP inside a Laravel command, anything your app can do, a robot can do per item.
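As one possible shape for the "build your record" step, the sketch below maps scraped values onto a flat array that could be passed to an Eloquent model, a queued job, or a webhook payload. Everything in it is an assumption for illustration: the `buildRecord` helper, the `url`, `title`, and `price` field names, the dot-as-decimal price format, and keying records by a hash of the source URL.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: turns raw scraped fields plus the stored image path
 * into a flat record. The field names ('url', 'title', 'price') are assumed,
 * not defined by the package; use whatever your blueprint extracts.
 */
function buildRecord(array $fields, ?string $imagePath): array
{
    $url = (string) ($fields['url'] ?? '');

    // Keep digits and the decimal point only, e.g. "$1,250.00" -> "1250.00".
    // Assumes a dot decimal separator; adapt for other locales.
    $rawPrice = preg_replace('/[^0-9.]/', '', (string) ($fields['price'] ?? '')) ?? '';

    return [
        // Stable key so re-running the robot can update rows instead of duplicating them.
        'source_hash' => sha1($url),
        'source_url'  => $url,
        'title'       => trim((string) ($fields['title'] ?? '')),
        'price'       => $rawPrice !== '' ? (float) $rawPrice : null,
        'image_path'  => $imagePath,
    ];
}
```

Inside the closure, the `$fields` array would be filled from `$item->get(...)` calls, and the result could then be saved with something like `Listing::updateOrCreate(['source_hash' => $record['source_hash']], $record)`, where `Listing` is a hypothetical Eloquent model in your app.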
Run options
Every generated robot supports the same run-time flags as datahelm:scrap:run:
php artisan datahelm:robot:exampleauctions --limit=20 # cap items
php artisan datahelm:robot:exampleauctions --output=storage/app/out.json # custom path
php artisan datahelm:robot:exampleauctions --output=- > out.json # stdout

