Images

There are two distinct stages: getting image URLs into each item's JSON (controlled by blueprint flags during detection/extraction), and downloading + processing those images (done in your robot's PHP). They are deliberately separate.

Getting image URLs (`get-*` flags)

These flags decide which image URLs are written into each item's JSON. They do not download anything — the engine just gets the URLs and puts them in the item; the actual saving is your robot's job. Three independent switches:

| Flag | Blueprint key | What lands in the item |
| --- | --- | --- |
| `--get-primary-image=true` | `get_primary_image` | A single `primary_image` URL: the most relevant photo per item |
| `--get-all-images=true` | `get_all_images` | Every image URL in `all_images`, plus `primary_image` and `gallery_images` when detail is scraped |
| `--get-gallery-images=true` | `get_gallery_images` | The detail-page gallery as the `gallery_images` array (implies `--get-detail`) |
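As a rough illustration of how a robot might consume these fields, the sketch below gathers every image URL on a decoded item into one de-duplicated list. The helper name `collectImageUrls` is hypothetical; only the keys `primary_image`, `all_images` and `gallery_images` come from the table.

```php
<?php

/**
 * Collect every image URL found on an item, without duplicates.
 * Hypothetical helper: only the three field names come from the docs.
 *
 * @param  array<string, mixed> $item Decoded item JSON.
 * @return list<string>
 */
function collectImageUrls(array $item): array
{
    $urls = [];

    // primary_image is a single URL; the other two are arrays of URLs.
    foreach (['primary_image', 'all_images', 'gallery_images'] as $key) {
        foreach ((array) ($item[$key] ?? []) as $url) {
            if (is_string($url) && $url !== '') {
                $urls[] = $url;
            }
        }
    }

    // Keep the first occurrence of each URL and reindex from 0.
    return array_values(array_unique($urls));
}
```

Because the primary image usually also appears in the gallery, de-duplicating before download avoids fetching the same file twice.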

Primary image

Whenever `get_primary_image` or `get_all_images` is on, every item gets a `primary_image` field holding the single most relevant URL. It is chosen in this order:

1. The `image` field (the thumbnail from the list card), which is preferred.
2. The first real photo in the `gallery_images` array (the detail-page gallery), used as a fallback when the list shows no image.

Icon and badge URLs are down-scored, so a real photo wins over a small badge even when the badge appears first. `primary_image` always holds the source URL; the stored path (after you download it) is whatever you choose to record in your callback.
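The selection order can be sketched as below. This is a simplified illustration, not the engine's code: the engine down-scores icons, whereas this sketch simply skips them, and the keyword list in `looksLikeIcon` and both function names are assumptions.

```php
<?php

/**
 * Guess whether a URL points at an icon or badge rather than a photo.
 * The keyword list is an assumption made for this sketch.
 */
function looksLikeIcon(string $url): bool
{
    return preg_match('/icon|badge|logo|sprite/i', $url) === 1;
}

/**
 * Pick a primary image: the list-card image first, then the first
 * gallery entry that does not look like an icon.
 *
 * @param list<string> $galleryImages
 */
function pickPrimaryImage(?string $image, array $galleryImages): ?string
{
    // 1. The list-card thumbnail is preferred whenever it exists.
    if ($image !== null && $image !== '') {
        return $image;
    }

    // 2. Otherwise take the first gallery entry that looks like a real photo.
    foreach ($galleryImages as $url) {
        if (!looksLikeIcon($url)) {
            return $url;
        }
    }

    // 3. Only icons (or nothing) left: return the first entry, or null.
    return $galleryImages[0] ?? null;
}
```

With a gallery of `badge-new.png` followed by `photo.jpg` and no list image, this returns `photo.jpg`, matching the behaviour described above.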

Where images come from

By default one image is taken from the list row (a single `image` field with `multiple: false`). When a record has several photos, set `scrape_detail: true` and use a detail field with `multiple: true` (named `images`); it returns an array of every matching image URL from the detail page. Set `attribute` to whichever attribute holds the URL (`src`, `data-src`, …).
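To make the roles of `multiple` and `attribute` concrete, here is a minimal sketch of what a multi-value image field amounts to, written with PHP's built-in DOM extension. It is not the engine's implementation: the function name `extractImageUrls` is hypothetical, and it uses XPath where the blueprint may well use a different selector syntax.

```php
<?php

/**
 * Return the chosen attribute of every node matched by the XPath query.
 * Nodes that lack the attribute are skipped.
 *
 * @return list<string>
 */
function extractImageUrls(string $html, string $xpath, string $attribute): array
{
    // Real-world markup is rarely valid; collect parse errors silently.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($dom))->query($xpath);
    if ($nodes === false) {
        return []; // invalid XPath expression
    }

    $urls = [];
    foreach ($nodes as $node) {
        if ($node instanceof DOMElement && $node->getAttribute($attribute) !== '') {
            $urls[] = $node->getAttribute($attribute);
        }
    }

    return $urls;
}
```

Lazy-loading galleries often keep the real URL in `data-src` and a placeholder in `src`, which is why the attribute is configurable.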

Saving images

The blueprint only gets image URLs into the JSON — it never writes files. Downloading happens inside the robot command, in a per-item CallbackSink callback. The generated robot already wires this up. Two protected properties at the top of the class set the target disk and folder — the only two lines you normally need to change:

php

class RobotExampleMarket extends Command
{
    use ScrapesToConsole;

    /** Any Laravel filesystem disk: 'storage' (local), 'public', 's3', 'gcs', … */
    protected string $imageDisk = 'storage';

    /** Subfolder inside the disk where images for this site will be stored. */
    protected string $imageFolder = 'scrapes/images/www.example-market.com';

    public function handle(CrawlEngine $engine, ImageStore $images): int
    {
        $blueprint = ScrapeBlueprint::fromJson(self::BLUEPRINT);
        $hashNames = $blueprint->hashNames;   // from blueprint JSON

        $sink = new CallbackSink(function (ScrapedItem $item) use ($images, $hashNames): void {
            $imageUrl = $item->get('primary_image') ?? $item->get('image');
            if (is_array($imageUrl)) {
                $imageUrl = $imageUrl[0] ?? null;
            }

            $imagePath = is_string($imageUrl) && $imageUrl !== ''
                ? $images->store($imageUrl, $this->imageDisk, $this->imageFolder, $hashNames)
                : null;

            // Uncomment to also store the full gallery from "gallery_images":
            // foreach ((array) $item->get('gallery_images') as $url) { ... }

            // ... build your record and persist it (Eloquent, queue, webhook, …)
        }, $this->imageFolder);

        $this->crawlToSink($engine, $blueprint, $sink, (int) $this->option('limit'));

        return self::SUCCESS;
    }
}

Cloud disks just need their Flysystem adapter installed and configured in config/filesystems.php.
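For the commented-out gallery loop in the callback, one possible shape is sketched below. `storeGallery` is a hypothetical helper, and the `$store` callable stands in for `$images->store(...)`; that it returns a path string or null is an assumption based on how `$imagePath` is used in the robot.

```php
<?php

/**
 * Store each gallery URL and return the paths that were saved.
 *
 * @param mixed    $galleryImages Value of $item->get('gallery_images'); may be null.
 * @param callable $store         Stand-in for $images->store(...): takes a URL, returns a path or null.
 * @return list<string>
 */
function storeGallery(mixed $galleryImages, callable $store): array
{
    $paths = [];

    foreach ((array) $galleryImages as $url) {
        if (!is_string($url) || $url === '') {
            continue; // skip malformed entries
        }

        $path = $store($url);
        if (is_string($path)) {
            $paths[] = $path; // keep only downloads that produced a path
        }
    }

    return $paths;
}
```

Inside the callback this could be called as `storeGallery($item->get('gallery_images'), fn (string $url) => $images->store($url, $this->imageDisk, $this->imageFolder, $hashNames))`.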

`--hash-names`

When set, stored images are renamed to a unique content hash on download (hash_names: true in the blueprint). Prevents collisions and gives content-addressable filenames.
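The idea behind content-hashed names can be sketched as follows. The choice of SHA-256, the function name `hashedFilename`, and keeping the original extension are all assumptions for illustration; the package may derive its names differently.

```php
<?php

/**
 * Build a content-addressed filename: identical bytes give identical names.
 * SHA-256 is an assumption; the package may use a different hash.
 */
function hashedFilename(string $bytes, string $originalUrl): string
{
    // Take the extension from the URL path, ignoring any query string.
    $path      = (string) parse_url($originalUrl, PHP_URL_PATH);
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    $name = hash('sha256', $bytes);

    return $extension === '' ? $name : $name . '.' . $extension;
}
```

Two sites serving the same photo under different URLs end up with one filename, while two different photos both called `1.jpg` no longer overwrite each other.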

Image folder override

Override the default storage path in the blueprint:

json

"image_folder": "scrapes/images/exampleauctions/2026"

Default when null: scrapes/images/{host}/.
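The fallback rule can be expressed as a small resolver. This is only a sketch: `resolveImageFolder`, the `$startUrl` parameter and the `unknown` placeholder are hypothetical, and the result omits the trailing slash to match the `$imageFolder` property shown earlier.

```php
<?php

/**
 * Resolve the image folder: the blueprint override if present,
 * otherwise scrapes/images/{host}.
 *
 * @param array<string, mixed> $blueprint Decoded blueprint JSON.
 * @param string               $startUrl  URL whose host names the default folder (assumed).
 */
function resolveImageFolder(array $blueprint, string $startUrl): string
{
    $override = $blueprint['image_folder'] ?? null;
    if (is_string($override) && $override !== '') {
        return rtrim($override, '/');
    }

    $host = parse_url($startUrl, PHP_URL_HOST);

    // 'unknown' is a placeholder for URLs without a host (sketch only).
    return 'scrapes/images/' . (is_string($host) ? $host : 'unknown');
}
```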

Image processing (resize / watermark / convert)

The crawler stores images as-is; processing is application logic, so it lives in your robot's PHP, not in the blueprint JSON. Every generated robot ships with a processImage() hook that runs after each image is downloaded — empty by default. Fill it in with Intervention Image (GD/Imagick, both in the Docker image):

bash

composer require intervention/image

php

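use Intervention\Image\ImageManager;
use Intervention\Image\Drivers\Gd\Driver;
use Illuminate\Support\Facades\Storage;

protected function processImage(?string $path): void
{
    if ($path === null) {
        return;
    }

    $disk = Storage::disk($this->imageDisk);

    // Read the bytes through the disk so this also works on remote disks
    // (s3, gcs, …), where path() does not point at a local file.
    $contents = $disk->get($path);
    if ($contents === null) {
        return;
    }

    $manager = new ImageManager(new Driver());
    $image   = $manager->read($contents);

    $image->scaleDown(width: 800);                                                // resize, never upscale
    // $image->place(storage_path('app/watermark.png'), 'bottom-right', 10, 10); // watermark

    // Re-encode in the format implied by the stored file's extension and overwrite it.
    $extension = pathinfo($path, PATHINFO_EXTENSION);
    $disk->put($path, (string) $image->encodeByExtension($extension));
}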

The full Intervention API (crop, cover, blur, text, format conversion, …) is available here — far more than a fixed JSON schema could express. The hook is called automatically from the per-item callback in handle().
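To clarify what a call like `scaleDown(width: 800)` does to an image's dimensions, here is a standalone sketch of the arithmetic: shrink to the maximum width, keep the aspect ratio, never enlarge. The function `scaledDownSize` is hypothetical and independent of Intervention, whose exact rounding may differ.

```php
<?php

/**
 * Compute the size after scaling down to a maximum width,
 * keeping the aspect ratio and never enlarging.
 *
 * @return array{0: int, 1: int} [width, height]
 */
function scaledDownSize(int $width, int $height, int $maxWidth): array
{
    if ($width <= $maxWidth) {
        return [$width, $height]; // already small enough: leave untouched
    }

    $ratio = $maxWidth / $width;

    // Round the height to the nearest pixel, but never below 1.
    return [$maxWidth, max(1, (int) round($height * $ratio))];
}
```

A 1600×1200 photo becomes 800×600, while a 640×480 thumbnail is left at its original size.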

In API mode

Everything above works the same: the image / images fields just hold URLs pulled from the JSON by dot-path instead of from HTML. See API mode.
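A dot-path lookup of this kind can be sketched in a few lines. `getByDotPath` is a hypothetical helper, and the sample record is invented for illustration; the package's own dot-path dialect (wildcards, for instance) may support more than this. In a Laravel application the built-in `data_get()` helper offers similar behaviour.

```php
<?php

/**
 * Read a nested value from decoded JSON using a dot-path
 * such as "media.photos.0.url". Returns null when any segment is missing.
 *
 * @param array<mixed> $data
 */
function getByDotPath(array $data, string $path): mixed
{
    $current = $data;

    foreach (explode('.', $path) as $segment) {
        if (!is_array($current) || !array_key_exists($segment, $current)) {
            return null;
        }
        $current = $current[$segment];
    }

    return $current;
}
```

For a record shaped like `{"media": {"photos": [{"url": "…"}]}}`, the path `media.photos.0.url` yields the first photo's URL, which then flows through the same download callback as an HTML-sourced URL.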

Continue to the Reference section.
