Crawling multiple categories in one robot
Point the positional URL at the site base and pass a JSON array of category pages with --search-filters. Each entry's URL may be relative or absolute; relative URLs are resolved against the base. Each entry's tag (e.g. category) is stamped onto every item from that page. One robot crawls them all into a single output, with deduplication, the item limit, and result_filters shared across all of them.
Field detection runs on the first filter (which must be a real listing).
php artisan datahelm:scrap:generate \
"https://www.example-fashion.com" \
--get-detail=true --get-primary-image=true --hash-names=true \
--robot-name=example-fashion \
--search-filters='[
{"url": "mens/knitwear-sweaters/", "category": "knitwear-sweaters-men"},
{"url": "womens/dresses/", "category": "dress-women"}
]'
WARNING
The --search-filters value must be a single-quoted JSON string passed as one logical argument. Use any tag key you like (category, type, …): every key besides the path key (url or url_sufix) and the limit control key is copied onto each item.
Each scraped item then carries the tag:
{ "title": "Grey Sweater", "price": "$ 149.90", "category": "knitwear-sweaters-men" }
{ "title": "Floral Dress", "price": "$ 199.90", "category": "dress-women" }
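The stamping rule can be pictured with a short Python sketch. It only illustrates the behaviour described on this page and is not the tool's implementation: `stamp_tags` and `CONTROL_KEYS` are hypothetical names, and letting a tag overwrite an item field of the same name is an assumption.

```python
# Hypothetical illustration of tag stamping; not the tool's actual code.
# Keys assumed to control the crawl rather than tag items.
CONTROL_KEYS = {"url", "url_sufix", "limit"}


def stamp_tags(search_filter: dict, items: list[dict]) -> list[dict]:
    """Copy every non-control key of the filter onto each scraped item."""
    tags = {k: v for k, v in search_filter.items() if k not in CONTROL_KEYS}
    # Assumption: on a name clash the tag value wins over the scraped field.
    return [{**item, **tags} for item in items]


search_filter = {"url": "mens/knitwear-sweaters/", "category": "knitwear-sweaters-men"}
scraped = [{"title": "Grey Sweater", "price": "$ 149.90"}]

tagged = stamp_tags(search_filter, scraped)
print(tagged)
```

Adding a second tag key to the filter (for example a `type` key) would simply add a second field to every item from that page.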
In the blueprint
The base stays in url and each filter keeps its relative suffix under url_sufix (resolved against url at crawl time):
"url": "https://www.example-fashion.com",
"search_filters": [
{ "url_sufix": "mens/knitwear-sweaters/", "category": "knitwear-sweaters-men" },
{ "url_sufix": "womens/dresses/", "category": "dress-women" }
]
On input you may use url_sufix or url for the path, and a bare string entry ("mens/knitwear-sweaters/") crawls that page without tagging. An absolute suffix (https://…) is used as-is.
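The accepted input forms can be summarised in a small Python sketch. It is a hedged illustration rather than the tool's own logic: `resolve_filter` is a hypothetical helper, preferring `url_sufix` over `url` when both are present is an assumption, and `https://outlet.example.org/sale/` is a made-up absolute URL.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.example-fashion.com"


def resolve_filter(base: str, entry) -> tuple[str, dict]:
    """Return (absolute_url, tags) for one search-filter entry."""
    if isinstance(entry, str):
        # Bare string: crawl the page without tagging.
        path, tags = entry, {}
    else:
        # Either key may carry the path (assumed precedence: url_sufix first).
        path = entry.get("url_sufix") or entry.get("url", "")
        tags = {k: v for k, v in entry.items()
                if k not in ("url", "url_sufix", "limit")}
    # urljoin resolves relative paths against the base and returns
    # absolute URLs unchanged.
    return urljoin(base.rstrip("/") + "/", path), tags


entries = [
    "mens/knitwear-sweaters/",                                 # bare string
    {"url": "womens/dresses/", "category": "dress-women"},     # url + tag
    {"url_sufix": "https://outlet.example.org/sale/"},         # absolute
]
for entry in entries:
    print(resolve_filter(BASE_URL, entry))
```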
API mode
API mode uses its own api.endpoint, so search_filters applies to HTML crawls only.
Per-filter limit
Add a limit to a filter to cap how many items that category contributes (0 or omitted means unlimited). The global --limit cannot provide this per-category quota: it is a single total shared across all filters, so the first category would otherwise consume it entirely:
"search_filters": [
{ "url_sufix": "shop/wd/mens", "category": "mens", "limit": 40 },
{ "url_sufix": "shop/wd/womens", "category": "womens", "limit": 40 },
{ "url_sufix": "shop/wd/womens-bottoms-wide-leg-jeans-jeans", "category": "jeans", "limit": 40 }
]
This yields up to 40 items from each category (at most 120 in total). limit is a control key, not an item tag, so it is not copied onto items. A global --limit still applies on top as an overall cap.
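How the two limits interact can be worked through with a small Python model. This is a sketch under stated assumptions, not the crawler's code: `allocate` is a hypothetical function, the per-category item counts are invented for the example, and filters are assumed to be crawled in order.

```python
def allocate(filters, available, global_limit=0):
    """Return how many items each filter contributes.

    filters      -- list of dicts with an optional "limit" (0 / missing = unlimited)
    available    -- items found per filter, in the same order (made-up numbers)
    global_limit -- overall cap shared across all filters (0 = unlimited)
    """
    remaining = global_limit if global_limit > 0 else float("inf")
    taken = []
    for flt, found in zip(filters, available):
        per_filter = flt.get("limit", 0) or float("inf")
        # A filter contributes no more than it found, its own quota,
        # and whatever is left of the global cap.
        count = int(min(found, per_filter, remaining))
        taken.append(count)
        remaining -= count
    return taken


with_quota = [{"category": c, "limit": 40} for c in ("mens", "womens", "jeans")]
no_quota = [{"category": c} for c in ("mens", "womens", "jeans")]
found = [500, 300, 120]

print(allocate(with_quota, found))                    # [40, 40, 40]
print(allocate(no_quota, found, global_limit=100))    # [100, 0, 0]
print(allocate(with_quota, found, global_limit=100))  # [40, 40, 20]
```

The second call shows the problem the per-filter limit solves: with only a global cap, the first category uses it up before the others are reached.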
search_filters vs. result_filters
These are different tools:
| | Chooses | |
|---|---|---|
| search_filters | which URLs to crawl | tags items, per-category limits |
| result_filters | which items to keep | see Result filters |
result_filters then apply to the items from all of the search filters.

