Markdown / LLM-ready output
Turn any page into clean Markdown instead of a wall of HTML — the feature Firecrawl and Crawl4AI are known for, now in the Laravel world. The result drops straight into an LLM context window or a RAG index with no HTML noise.
There are two ways to use it, and they pair naturally: extract an article body as a Markdown field, then export the whole crawl as a Markdown document.
1 · The markdown field type
Render one element's content (an article body, a product description — any long-form block) as Markdown by setting the field's type to "markdown". The css selector locates the element; its content is converted:
```json
{
    "name": "description",
    "css": ".product-description",
    "type": "markdown"
}
```
What the converter preserves and strips:
| Preserved | Stripped |
|---|---|
| Headings (`#`–`######`), paragraphs, `<br>` | `<script>`, `<style>`, `<noscript>`, `<template>` |
| Nested & ordered lists | Forms, buttons, inputs |
| Links and images | Media embeds (`<iframe>`, `<video>`, `<svg>`, …) |
| Bold / italic / inline code | Site chrome: `<nav>`, `<header>`, `<footer>`, `<aside>` |
| Fenced code blocks with language (`` ```php ``) | |
| Tables (with `\|` escaping) and blockquotes | |
Relative URLs are resolved
Links and images inside the converted content are resolved against the page they were scraped from — [Foo](/wiki/Foo) becomes [Foo](https://example.com/wiki/Foo) — so the Markdown stays valid outside its origin site.
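To make the resolution rule concrete, here is a minimal sketch of resolving an `href` against the page URL. It is an illustration only, not the package's implementation: the function name `resolveAgainstPage` is hypothetical, and the sketch deliberately skips `.` / `..` segment normalisation.

```php
<?php

declare(strict_types=1);

/**
 * Illustrative only: a simplified resolver, NOT the package's implementation.
 * The name resolveAgainstPage() is hypothetical. "." and ".." path segments
 * are not normalised.
 */
function resolveAgainstPage(string $href, string $pageUrl): string
{
    // Empty, or already absolute (https:, mailto:, data:, ...): leave untouched.
    if ($href === '' || preg_match('~^[a-z][a-z0-9+.-]*:~i', $href) === 1) {
        return $href;
    }

    $base = parse_url($pageUrl);
    if ($base === false || !isset($base['scheme'], $base['host'])) {
        return $href; // Cannot resolve without an absolute page URL.
    }

    $origin = $base['scheme'] . '://' . $base['host']
        . (isset($base['port']) ? ':' . $base['port'] : '');
    $path = $base['path'] ?? '/';

    if (str_starts_with($href, '//')) {
        return $base['scheme'] . ':' . $href; // Protocol-relative.
    }
    if (str_starts_with($href, '/')) {
        return $origin . $href; // Root-relative.
    }
    if ($href[0] === '?') {
        return $origin . $path . $href; // Query-only: keep the page path.
    }
    if ($href[0] === '#') {
        $query = isset($base['query']) ? '?' . $base['query'] : '';
        return $origin . $path . $query . $href; // Fragment-only: keep path and query.
    }

    // Path-relative: resolve against the directory of the page path.
    $directory = substr($path, 0, (int) strrpos($path, '/') + 1);

    return $origin . $directory . $href;
}

echo resolveAgainstPage('/wiki/Foo', 'https://example.com/wiki/Bar'), PHP_EOL;
// prints "https://example.com/wiki/Foo"
```

The same idea covers image `src` attributes; already-absolute URLs pass through unchanged.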
Use an empty css ("") in detail_fields[] to convert the whole detail page context, or point it at the main content container for article-only output.
Note: regex is not applied to markdown fields — it would corrupt multi-line output.
2 · The markdown output format
Export a whole crawl as a single Markdown document, one section per item. Set the blueprint's output_config.format:
```json
"output_config": {
    "format": "markdown"
}
```

Then run the crawl and write the result to a file:

```shell
php artisan datahelm:scrap:run example --output=storage/app/scrapes/example.md
```
Each item becomes a section:
- Heading: the most title-like field (`title`, `name`, `heading`, `headline`, `label`), falling back to `Item N`.
- Body: the first long-form field (`markdown`, `content`, `body`, `description`, `text`), rendered as-is.
- Metadata: every remaining scalar field as a bullet list (`- **price:** 150000`).
- Items are separated by `---` horizontal rules.
Example output for a quotes site:
```markdown
## "The world as we have created it is a process of our thinking."
"The world as we have created it…" by Albert Einstein [(about)](https://quotes.toscrape.com/author/Albert-Einstein)
- **author:** Albert Einstein
- **tags:** change, deep-thoughts, thinking, world
---
## "It is our choices, Harry, that show what we truly are…"
…
```
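The section rules above (heading, body, metadata, separator) can be sketched in a few lines of PHP. This is a rough illustration under stated assumptions, not the package's exporter: the function names `renderItemSection` and `renderDocument` are hypothetical, the `##` heading level is taken from the example output, and non-scalar fields are simply skipped.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, not part of the package: renders one scraped item
 * as a Markdown section following the documented field-priority rules.
 */
function renderItemSection(array $item, int $position): string
{
    $headingKeys = ['title', 'name', 'heading', 'headline', 'label'];
    $bodyKeys = ['markdown', 'content', 'body', 'description', 'text'];

    // Returns the first key whose value is a non-empty string, or null.
    $pick = static function (array $keys) use ($item): ?string {
        foreach ($keys as $key) {
            if (isset($item[$key]) && is_string($item[$key]) && trim($item[$key]) !== '') {
                return $key;
            }
        }
        return null;
    };

    $headingKey = $pick($headingKeys);
    $bodyKey = $pick($bodyKeys);

    // Heading falls back to "Item N" when no title-like field exists.
    $heading = $headingKey !== null ? trim($item[$headingKey]) : "Item {$position}";
    $lines = ["## {$heading}", ''];

    if ($bodyKey !== null) {
        $lines[] = trim($item[$bodyKey]); // Body is emitted as-is.
        $lines[] = '';
    }

    // Every remaining scalar field becomes a metadata bullet.
    foreach ($item as $key => $value) {
        if ($key === $headingKey || $key === $bodyKey || !is_scalar($value)) {
            continue;
        }
        $lines[] = "- **{$key}:** {$value}";
    }

    return rtrim(implode("\n", $lines)) . "\n";
}

/** Hypothetical helper: joins item sections with `---` horizontal rules. */
function renderDocument(array $items): string
{
    $sections = [];
    foreach (array_values($items) as $index => $item) {
        $sections[] = renderItemSection($item, $index + 1);
    }

    return implode("\n---\n\n", $sections);
}

echo renderDocument([
    ['name' => 'Lamp', 'description' => 'A **bright** lamp.', 'price' => 150000],
    ['price' => 5],
]);
```

The second item has no title-like or long-form field, so it renders as `## Item 2` followed only by its metadata bullet.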
Using the converter standalone
The engine behind both features, DataHelm\Crawler\Markdown\HtmlToMarkdown, is dependency-free (only ext-dom, bundled with PHP) and works on its own:
```php
use DataHelm\Crawler\Markdown\HtmlToMarkdown;

// Convert an HTML fragment
$markdown = (new HtmlToMarkdown())->convert($html);

// Resolve relative links/images against the page URL
$markdown = (new HtmlToMarkdown())->convert($html, 'https://example.com/article');

// Keep nav/header/footer/aside instead of stripping them
$markdown = (new HtmlToMarkdown(stripChrome: false))->convert($html);

// Convert a live DOM node (e.g. matched by symfony/dom-crawler)
$markdown = (new HtmlToMarkdown())->convertElement($node, $pageUrl);
```

