Heritrix




Introduction

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.

Documentation

• An Introduction To Heritrix

Links

• http://crawler.archive.org/
• http://archive-access.sourceforge.net/
• http://en.wikipedia.org/wiki/Heritrix/
• http://download.www.opendocs.net/heritrix/