Open Source Crawl

Common Crawl is an open project that makes web crawl data freely available to the community, in a form suited to processing with MapReduce, the programming model introduced by Google. Their blog explains how to access the data and includes “Hello World” Java example applications that demonstrate how to use the technology. Check out their blog for details.
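
To give a feel for the MapReduce style of processing mentioned above, here is a minimal sketch of a word count in plain Java. It is not Common Crawl's own example code and it does not use Hadoop or the real crawl data: the class name `WordCountSketch`, the method `countWords`, and the two sample "pages" are all made up for illustration. The map step splits each page into words and the reduce step sums the count per word.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class WordCountSketch {
    // Hypothetical helper: counts words across a list of page texts.
    public static Map<String, Long> countWords(List<String> documents) {
        return documents.stream()
                // "Map" step: break each document into lowercase words.
                .flatMap(doc -> Arrays.stream(doc.toLowerCase().split("\\W+")))
                // Skip empty tokens produced by leading separators.
                .filter(word -> !word.isEmpty())
                // "Reduce" step: group identical words and count them,
                // using a TreeMap so the keys come out sorted.
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Stand-in for crawled page text; real crawl records are far larger.
        List<String> pages = Arrays.asList("hello world", "hello crawl");
        System.out.println(countWords(pages)); // prints {crawl=1, hello=2, world=1}
    }
}
```

A real job over the crawl data would distribute the map and reduce steps across many machines; this sketch only shows the shape of the computation on one machine.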