Heritrix is an open-source, extensible, web-scale, archival-quality web crawler.
Built from the Heritrix Maven release binaries using these build scripts. Please report issues or submit contributions to the Heritrix GitHub repository.
mkdir jobs
docker run --init --rm -d -p 8443:8443 -e "USERNAME=admin" -e "PASSWORD=admin" -v ./jobs:/opt/heritrix/jobs iipc/heritrix
See Running Heritrix under Docker in the Heritrix operating guide for more details. Please make sure to read the security considerations.
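Once the container is up, a quick way to confirm that it is reachable is to query the engine endpoint on the published port. The following is a minimal sketch, not an official script: the function names and the HERITRIX_HOST, HERITRIX_PORT, HERITRIX_USER and HERITRIX_PASSWORD variables are hypothetical names chosen for this example, and the defaults simply mirror the port mapping and credentials in the docker run command above. It assumes Heritrix is serving HTTPS with a self-signed certificate and HTTP Digest authentication, which is its usual default.

```shell
# Build the URL of the Heritrix engine endpoint.
# HERITRIX_HOST and HERITRIX_PORT are hypothetical overrides for this example.
heritrix_engine_url() {
    printf 'https://%s:%s/engine\n' "${HERITRIX_HOST:-localhost}" "${HERITRIX_PORT:-8443}"
}

# Print the HTTP status code returned by the engine endpoint.
heritrix_check() {
    # -k         accept the self-signed certificate (assumed default setup)
    # --anyauth  let curl negotiate the authentication scheme (Digest assumed)
    # -s -o /dev/null -w  suppress the body and print only the status code
    curl -k -s -o /dev/null -w '%{http_code}\n' \
        -u "${HERITRIX_USER:-admin}:${HERITRIX_PASSWORD:-admin}" \
        --anyauth --location \
        "$(heritrix_engine_url)"
}
```

Calling heritrix_check after starting the container should print a status code; a 200 suggests the web interface is up and the credentials were accepted, while a 401 points to a username or password mismatch. The -k flag disables certificate verification, so it is only appropriate for a local test against the self-signed certificate.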
For the standard distribution:
For the contrib distribution which includes some extra contributed modules:
Content type: Image
Digest: sha256:853503395…
Size: 239.3 MB
Last updated: about 1 month ago
docker pull iipc/heritrix