Publishing the package

The installable Laravel library lives in packages/datahelm/crawler/ and is published as datahelm/crawler on Packagist. This repository is the development sandbox (Laravel app, Docker, site-specific robots); the sections below call it the monorepo. The public package repo (GitHub → Packagist) is a separate directory that receives only the files required for composer require datahelm/crawler.

Repository layout

DataHelm is split across three repositories:

| Repository | Visibility | Contents |
| --- | --- | --- |
| datahelm/crawler | Public (Packagist) | Laravel package: composer require datahelm/crawler |
| datahelm/environment | Public | Full Docker stack: nginx, PHP, PostgreSQL, Redis, browserless, FlareSolverr, Supervisor, … |
| DataHelmCrawler | Private / demo | Development sandbox: Laravel app, site-specific robots, examples |

datahelm/crawler              Packagist / GitHub
├── README.md                 transport table, env vars, quick start
├── docker/
│   └── compose.services.yml  optional: browserless + FlareSolverr only
├── composer.json
├── config/
└── src/

datahelm/environment          separate Git repo
└── docker-compose.yml        full stack to run DataHelm locally

DataHelmCrawler (this repo)   private demo / dev sandbox
└── docker-compose.yml        same idea as datahelm/environment

What gets synced

scripts/sync-package.include is the rsync manifest used by the sync script. Only the paths it lists are copied:

| Path | Purpose |
| --- | --- |
| composer.json | Package metadata and PSR-4 autoload |
| README.md | Install guide, transport table, env vars |
| docker/compose.services.yml | Optional browserless + FlareSolverr only |
| config/ | Published config (crawler.php) |
| src/ | DataHelm\Crawler\ library code |

Files that stay only in the publishable repo (not overwritten by sync): .git, LICENSE, phpunit.xml, .github/, etc.
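
As an illustration only, a manifest with this effect could look like the sketch below. The pattern list and the helper names write_manifest and preview_sync are assumptions, not the contents of scripts/sync-package.include or sync-package.sh; the rsync options used (--archive, --dry-run, --itemize-changes, --include-from, --exclude) are standard.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the real scripts/sync-package.include may differ.

# Write an include manifest covering the five synced paths.
# A trailing /*** matches a directory and everything below it.
write_manifest() {
  cat > "$1" <<'EOF'
/composer.json
/README.md
/docker/
/docker/compose.services.yml
/config/***
/src/***
EOF
}

# Preview what would be copied without touching the destination.
# Everything not named in the manifest is excluded by --exclude='*'.
preview_sync() {
  local src="$1" dst="$2" manifest="$3"
  rsync --archive --dry-run --itemize-changes \
    --include-from="$manifest" --exclude='*' \
    "$src/" "$dst/"
}
```

Because the sketch passes neither --delete nor --delete-excluded, files that exist only in the destination (.git, LICENSE, and so on) are left alone. To try it, run write_manifest /tmp/sync.include followed by preview_sync with the source directory, the destination, and /tmp/sync.include as arguments.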

Running the sync

From the root of the monorepo:

bash
cd /path/to/DataHelmCrawler

# Default destination: /home/murilo/Docker/DataHelm.dev
./scripts/sync-package.sh

Other options:

bash
# Custom destination
./scripts/sync-package.sh /path/to/DataHelm.dev

# Include unit tests (optional)
./scripts/sync-package.sh --with-tests

# Environment variable for destination
PACKAGE_DST=/home/murilo/Docker/DataHelm.dev ./scripts/sync-package.sh

# Help
./scripts/sync-package.sh --help
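
The three ways of choosing a destination imply some precedence. The following is a minimal sketch, assuming a positional argument wins over PACKAGE_DST, which in turn wins over the built-in default. The real script may order them differently, and resolve_dst is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: not taken from scripts/sync-package.sh.
DEFAULT_DST="/home/murilo/Docker/DataHelm.dev"

# Print the destination directory for a given argument list.
# Assumed precedence: first non-flag argument, then $PACKAGE_DST, then default.
resolve_dst() {
  local arg
  for arg in "$@"; do
    case "$arg" in
      --*) ;;                                # skip flags such as --with-tests
      *) printf '%s\n' "$arg"; return 0 ;;   # first positional argument wins
    esac
  done
  printf '%s\n' "${PACKAGE_DST:-$DEFAULT_DST}"
}
```

With this ordering, PACKAGE_DST suits a shell profile or CI setting, while an explicit path on the command line remains a one-off override.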

If you get Permission denied:

bash
chmod +x scripts/sync-package.sh
./scripts/sync-package.sh

First-time setup (publishable repo)

bash
mkdir -p /home/murilo/Docker/DataHelm.dev
cd /home/murilo/Docker/DataHelm.dev
git init
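# Add once: LICENSE, .gitignore, phpunit.xml, .github/workflows/…
# (README.md and docker/compose.services.yml are synced automatically)
# Add the GitHub remote once so that git push has a target:
# git remote add origin <repository-url>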

Then run ./scripts/sync-package.sh from the monorepo whenever you want to prepare an update for release.

After sync — commit and release

bash
cd /home/murilo/Docker/DataHelm.dev
git status
git add -A
git commit -m "Sync from monorepo"
git tag v1.0.0
git push && git push --tags

Packagist picks up new versions from Git tags. The sync script only copies files; it does not commit or push for you.
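
Because every release needs a new tag, the v1.0.0 in the commands above only works for the first release. The helper below is a hypothetical convenience (next_patch_tag is not part of the repository) that bumps the patch number of a vMAJOR.MINOR.PATCH tag.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the DataHelm repositories.
# Prints the next patch-level tag for a vMAJOR.MINOR.PATCH input.
next_patch_tag() {
  local tag="$1" rest major minor patch
  # Accept only vMAJOR.MINOR.PATCH with no leading zeros.
  if ! printf '%s\n' "$tag" |
    grep -Eq '^v(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)$'; then
    echo "not a vMAJOR.MINOR.PATCH tag: $tag" >&2
    return 1
  fi
  rest="${tag#v}"        # strip the leading "v"
  major="${rest%%.*}"    # text before the first dot
  rest="${rest#*.}"
  minor="${rest%%.*}"
  patch="${rest#*.}"
  printf 'v%s.%s.%s\n' "$major" "$minor" "$((patch + 1))"
}
```

In the publishable repo it could be combined with git describe, for example git tag "$(next_patch_tag "$(git describe --tags --abbrev=0)")". Minor and major bumps are still a manual decision.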

What package users need

composer require datahelm/crawler works without any Docker extras. The default transport is guzzle (plain HTTP). Optional infrastructure is only required for JS-heavy sites or bot protection:

| Need | Solution |
| --- | --- |
| Plain HTML / public APIs | Nothing extra: CRAWLER_TRANSPORT=guzzle |
| JS / SPA rendering | browser transport → browserless |
| Cloudflare challenges | flaresolverr transport → FlareSolverr |
| Hardest WAFs (Akamai, PerimeterX) | scraping_api + paid API key |
| Hands-off escalation | CRAWLER_TRANSPORT=auto |
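
Switching transports is then a one-line change in the application's .env file. The following is a minimal sketch, assuming browser, flaresolverr and scraping_api are valid CRAWLER_TRANSPORT values in the same way guzzle and auto are (the table names them as transports but does not show the variable). The helper name set_crawler_transport is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sets CRAWLER_TRANSPORT in a .env file.
# Assumption: all five transport names are accepted by CRAWLER_TRANSPORT.
set_crawler_transport() {
  local env_file="$1" transport="$2" tmp
  case "$transport" in
    guzzle|browser|flaresolverr|scraping_api|auto) ;;
    *) echo "unknown transport: $transport" >&2; return 1 ;;
  esac
  touch "$env_file"
  tmp="$(mktemp)"
  # Keep every line except an existing CRAWLER_TRANSPORT entry.
  # grep exits non-zero when nothing is left, which is not an error here.
  grep -v '^CRAWLER_TRANSPORT=' "$env_file" > "$tmp" || true
  printf 'CRAWLER_TRANSPORT=%s\n' "$transport" >> "$tmp"
  # Write back in place so the file keeps its permissions.
  cat "$tmp" > "$env_file"
  rm -f "$tmp"
}
```

For example, set_crawler_transport .env auto replaces any existing value with auto and leaves the other variables untouched.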

Released under the MIT License.