- Universal Information Crawler (UICrawler) is a bridge between data gathering, data processing, and data visualization.
Versions 0.5 and 0.6 are still Python-based crawlers. The uicrawler 0.5 crawler has been replaced by the Harvestman Crawler, which has many more features than the 0.5 uicrawler.
- New development is happening in the trunk branch, and UICrawler is becoming a new project that lets you easily manage and transfer data from one source to another.
- Check out an SVN copy:
svn co https://uicrawler.svn.sourceforge.net/svnroot/uicrawler/trunk/
- Alternatively, get the code using rsync:
rsync -av uicrawler.svn.sourceforge.net::svn/uicrawler/trunk/* .
- Issue this command to install the program:
python setup.py install
- To check that the template was installed, run this command:
paster create --list-templates
- You should see uicrawler listed among them:
uicrawler: Template for creating a basic datamining, crawling, converting package.
- Start your new project:
paster create -t 'uicrawler'
- After you answer all the questions you should see a file structure like this:
- I called my project somedata.
somedata
|-- __init__.py
|-- somedata
|   |-- Convert
|   |   `-- mydata.txt
|   |-- HDF5
|   |-- RoughData
|   |   |-- download.sh
|   |   `-- download_list.txt
|   |-- __init__.py
|   `-- model
|       `-- model.template
|-- somedata.egg-info
|   |-- PKG-INFO
|   |-- SOURCES.txt
|   |-- dependency_links.txt
|   |-- entry_points.txt
|   |-- not-zip-safe
|   |-- paster_plugins.txt
|   `-- top_level.txt
|-- setup.cfg
`-- setup.py
- In the Convert folder you will find a sample file where you can write your conversion script.
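A conversion script might look something like the sketch below. This is only an illustration: the input format (whitespace-separated columns in a file like mydata.txt) and the function name are assumptions, not something the template prescribes.

```python
import csv

def convert(in_path, out_path):
    """Hypothetical convert script: turn whitespace-separated rows
    in in_path into a CSV file at out_path."""
    with open(in_path) as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for line in src:
            fields = line.split()
            if fields:  # skip blank lines
                writer.writerow(fields)
```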
- In model you can write your table definitions using SQLAlchemy.
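A minimal sketch of such a definition, assuming the declarative style; the table and column names here are invented for illustration and are not part of the generated model.template:

```python
from sqlalchemy import Column, Integer, String, Float, create_engine
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Measurement(Base):
    """Hypothetical table for one row of crawled data."""
    __tablename__ = "measurement"
    id = Column(Integer, primary_key=True)
    source = Column(String(128))  # where the row was downloaded from
    value = Column(Float)

# Creating the schema in an in-memory SQLite database:
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
```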
- In RoughData you download the actual data.
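The download step (which download.sh and download_list.txt automate) could also be sketched in Python. The function name and the one-URL-per-line list format are assumptions for illustration:

```python
import os
import urllib.request

def download_all(list_file, dest_dir):
    """Read one URL per line from list_file and save each
    fetched file into dest_dir (a hypothetical helper)."""
    os.makedirs(dest_dir, exist_ok=True)
    with open(list_file) as fh:
        for line in fh:
            url = line.strip()
            if not url or url.startswith("#"):
                continue  # skip blanks and comments
            filename = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
            urllib.request.urlretrieve(url, filename)
```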