Python Web Crawler

A Python reimplementation of the PHP Web Crawler, with cleaner code and faster, more efficient crawling.

  • Language: Python
  • Released: Mar 28, 2011
  • Last Update: Jun 25, 2011

For customized crawling and scraping services, check out Crawley Cloud.

Python Web Crawler is a program that searches for links on the web and saves them in a MySQL database.


  • Multi-process crawling to improve speed
  • MySQL database to store the links
  • Easy to extend
  • Clean and readable Pythonic code
  • URL validation via regular expressions
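As an illustration of the URL validation feature, here is a minimal sketch of a regular-expression check. The pattern and the name is_valid_url are assumptions made for this example, not the crawler's actual code, and the pattern is deliberately simple: it only accepts absolute http:// or https:// URLs.

```python
import re

# Hypothetical pattern, not taken from the crawler's source.
URL_PATTERN = re.compile(
    r"^https?://"                         # scheme: http or https
    r"(?:[A-Za-z0-9-]+\.)*[A-Za-z0-9-]+"  # host name, e.g. www.example.com
    r"(?::\d+)?"                          # optional port
    r"(?:/\S*)?$"                         # optional path and query, no spaces
)


def is_valid_url(url):
    """Return True if url looks like a complete http(s) URL."""
    return URL_PATTERN.match(url) is not None


if __name__ == "__main__":
    for candidate in ("http://example.com/page", "example.com", "ftp://example.com"):
        print(candidate, is_valid_url(candidate))
```

A stricter validator would also need to handle IPv6 hosts, user info and internationalized domain names, which this sketch ignores.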

Here's more information about it:

Here's the original PHP web crawler this is based on.


Getting Started

Tested on Ubuntu 10.10.


apt-get install python-mysqldb


To configure the crawler, edit the config.ini file. For example:

[connection]
host = localhost
user = root
pass = root
db = testDB

[params]
start_urls =,,
max_depth = 1
log = 1

The connection section holds the usual connection settings for a MySQL database.
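To show how such a section might be read, here is a small sketch using Python 3's standard configparser module. It assumes the file has a section named [connection] with the four keys from the example. The function name load_connection and the inline sample text are made up for illustration, and the crawler itself may read its configuration differently.

```python
import configparser

# Inline sample standing in for config.ini (assumed layout).
SAMPLE_CONFIG = """
[connection]
host = localhost
user = root
pass = root
db = testDB
"""


def load_connection(text):
    """Parse INI text and return the connection settings as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["connection"]
    return {
        "host": section["host"],
        "user": section["user"],
        "passwd": section["pass"],  # renamed to the keyword MySQLdb.connect expects
        "db": section["db"],
    }


settings = load_connection(SAMPLE_CONFIG)

if __name__ == "__main__":
    print(settings)
```

The keys of the returned dict match keyword arguments accepted by MySQLdb.connect, so the dict could be passed along as MySQLdb.connect(**settings) once the database is reachable.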

The params section contains:

  • START_URLS: A comma-separated list of URLs at which to start the crawl. Each entry must be a complete URL, including http:// or https://, whichever applies.

  • MAX_DEPTH: The depth to crawl. 0 crawls only the start URLs. 1 crawls the start URLs and every URL found on those pages. 2 also crawls every URL found on the pages reached at depth 1, and so on. Warning: a depth of 3 or greater can take hours, days, months, or even years!

  • LOG: Indicates whether the application prints the crawled URLs to the console.
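The parameters above can be illustrated with a short sketch of a depth-limited, breadth-first crawl. This is not the crawler's own implementation: parse_start_urls, crawl, get_links and the toy FAKE_WEB link graph are all hypothetical names introduced here, and the sketch runs in a single process against in-memory data rather than fetching real pages.

```python
from collections import deque


def parse_start_urls(raw):
    """Split a comma-separated START_URLS value, dropping empty entries."""
    return [url.strip() for url in raw.split(",") if url.strip()]


def crawl(start_urls, max_depth, get_links, log=False):
    """Breadth-first crawl limited to max_depth.

    get_links(url) must return the list of URLs found on that page.
    Returns the URLs in the order they were crawled.
    """
    seen = set()
    queue = deque()
    for url in start_urls:
        if url not in seen:
            seen.add(url)
            queue.append((url, 0))

    crawled = []
    while queue:
        url, depth = queue.popleft()
        crawled.append(url)
        if log:
            print(url)
        if depth >= max_depth:
            continue  # links on this page lie beyond the depth limit
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return crawled


# Toy link graph standing in for real pages (made-up data).
FAKE_WEB = {
    "http://a.example": ["http://b.example", "http://c.example"],
    "http://b.example": ["http://d.example"],
    "http://c.example": [],
    "http://d.example": [],
}


def fake_get_links(url):
    return FAKE_WEB.get(url, [])


if __name__ == "__main__":
    crawl(parse_start_urls("http://a.example"), 1, fake_get_links, log=True)
```

Because each extra level follows every link found on the previous one, the number of pages tends to grow multiplicatively with depth, which is why large MAX_DEPTH values become so slow.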


~$ python