From 0 to IEPY
==============

In this tutorial we will guide you through the steps to create your first
Information Extraction application with IEPY.
Be sure you have a working :doc:`installation <installation>`.

IEPY internally uses `Django <https://www.djangoproject.com/>`_ to define the database models,
and to provide a web interface. You'll see some components of Django around the project, such as the
configuration file (with the database definition) and the ``manage.py`` utility. If you're familiar
with Django, you will move faster in some of the steps.


0 - Creating an instance of IEPY
--------------------------------

To work with IEPY, you first have to create an *instance*.
This is where the configuration, the database and some binary files are stored.
To create a new instance, run:

.. code-block:: bash

    iepy --create <project_name>

Here, *<project_name>* is a name of your choice.
The command asks you for a few settings, such as the database name, username and password.
When it finishes, you'll have an instance in a folder with the name that you chose.

Read more about the instantiation process :doc:`here <instantiation>`.
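
The commands in the following steps use paths relative to the instance folder (``bin/csv_to_iepy.py``,
``bin/preprocess.py`` and ``bin/manage.py``), so they are meant to be run from inside that folder.
As a minimal sketch, a hypothetical shell helper such as ``is_iepy_instance`` (not part of IEPY) can
check this before you run them:

.. code-block:: bash

    # Hypothetical helper, not shipped with IEPY: succeeds (exit status 0) only
    # if the given directory (default: the current one) contains the bin/
    # scripts used in this tutorial.
    is_iepy_instance() {
        local dir="${1:-.}"
        local script
        for script in manage.py csv_to_iepy.py preprocess.py; do
            if [ ! -f "$dir/bin/$script" ]; then
                echo "missing $dir/bin/$script" >&2
                return 1
            fi
        done
    }

For example, after ``iepy --create my_project`` (``my_project`` is a placeholder name),
``cd my_project && is_iepy_instance`` should succeed.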


1 - Loading the database
------------------------

Data is loaded into the database by importing it from a *CSV* file. To do this, use the **csv_to_iepy**
script provided in your application folder:


.. code-block:: bash

    python bin/csv_to_iepy.py data.csv

This loads **data.csv** into the database, where the subsequent steps access it.

Learn more about the required CSV file format `here <instantiation.html#csv-importer>`_.
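
If you only want to try the pipeline, a tiny hand-written file is enough. The sketch below assumes a
two-column layout (a document identifier, then the document text) with a header row; the column names
and sample sentences are made up, so check the format description linked above before relying on it.
Fields that contain commas are wrapped in double quotes, as in standard CSV.

.. code-block:: bash

    # Write a small, made-up sample corpus. The header names are illustrative;
    # the two-column layout (id, text) is an assumption to verify.
    cat > data.csv <<'EOF'
    document_id,document_text
    doc-1,"Alice Smith joined Acme Corp in 2010."
    doc-2,"Acme Corp, based in Springfield, later hired Bob Jones."
    EOF

    # Sanity check with Python's csv module: every record must have 2 fields.
    python3 - data.csv <<'PY'
    import csv
    import sys

    with open(sys.argv[1], newline="") as f:
        for record_no, row in enumerate(csv.reader(f), start=1):
            if len(row) != 2:
                sys.exit(f"record {record_no}: expected 2 fields, got {len(row)}")
    print("ok")
    PY

The resulting file can then be loaded with ``python bin/csv_to_iepy.py data.csv`` from the instance folder.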


.. note::

    You can also provide a *gzipped* CSV file.
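
A compressed copy can be produced with the standard ``gzip`` tool. The sketch below keeps the original
file and verifies the archive; the helper name ``compress_csv`` is hypothetical, and passing the ``.gz``
path to ``csv_to_iepy.py`` in place of ``data.csv`` is an assumption based on the note above.

.. code-block:: bash

    # Hypothetical helper: compress a CSV file, keeping the original.
    # `gzip -c` writes to stdout, which also works with older gzip releases
    # that lack the -k (keep) option. `gzip -t` then tests the archive.
    compress_csv() {
        local src="$1"
        gzip -c "$src" > "$src.gz" && gzip -t "$src.gz"
    }

    # Usage, from the instance folder:
    #   compress_csv data.csv
    #   python bin/csv_to_iepy.py data.csv.gz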


2 - Pre-processing the data
---------------------------

Once you have your database with the documents you want to analyze, you have to
run them through the pre-processing pipeline to generate all the information needed by IEPY's core.

The pre-processing pipeline runs a series of steps, such as
text tokenization, sentence splitting, lemmatization, part-of-speech tagging
and named entity recognition.

:doc:`Read more about the pre-processing pipeline here. <preprocess>`

Your IEPY application comes with code to run all the pre-processing steps.
You can run it by doing:

.. code-block:: bash

    python bin/preprocess.py

This *will* take a while, especially if you have a lot of data.
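
Because pre-processing can run for a long time, you may want it to survive closing the terminal and to
keep its output in a file. The following is a generic shell sketch using ``nohup``, not an IEPY feature;
``run_logged``, ``RUN_LOGGED_PID`` and ``preprocess.log`` are hypothetical names.

.. code-block:: bash

    # Run a command in the background, immune to the terminal's hangup signal,
    # sending both stdout and stderr to a log file. The PID is stored in
    # RUN_LOGGED_PID so the caller can `wait` for it or check on it later.
    run_logged() {
        local log="$1"
        shift
        nohup "$@" > "$log" 2>&1 &
        RUN_LOGGED_PID=$!
        echo "started PID $RUN_LOGGED_PID, logging to $log"
    }

For example, from the instance folder: ``run_logged preprocess.log python bin/preprocess.py``, then follow
its progress with ``tail -f preprocess.log``.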


3 - Open the web interface
--------------------------

IEPY comes with a web user interface to help you control it.
There you can manage your database objects and label the information
that the active learning core needs.

To access the web UI, you must run the web server. Don't worry: everything you need
is already in your instance folder, and it's as simple as running:

.. code-block:: bash

    python bin/manage.py runserver

Leave that process running, and open up a browser at `http://127.0.0.1:8000 <http://127.0.0.1:8000>`_ to view
the user interface home page.
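
If you script this step (for example, starting the server in the background and opening the browser
afterwards), it helps to wait until the server accepts connections. Below is a minimal bash sketch
using bash's ``/dev/tcp`` pseudo-device; ``wait_for_port`` is a hypothetical helper, not part of IEPY
or Django. Django's ``runserver`` command also accepts an optional port or ``address:port`` argument,
for example ``python bin/manage.py runserver 8080``, if port 8000 is already in use.

.. code-block:: bash

    # Poll host:port once per second until a TCP connection succeeds.
    # Returns 0 when the port is open, 1 after $3 attempts (default 30).
    # Requires bash: /dev/tcp is handled by bash itself, not a real file.
    wait_for_port() {
        local host="$1" port="$2" attempts="${3:-30}" i
        for ((i = 0; i < attempts; i++)); do
            if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
                return 0
            fi
            sleep 1
        done
        return 1
    }

For example: ``python bin/manage.py runserver &`` followed by ``wait_for_port 127.0.0.1 8000``.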

Now it's time for you to *create a relation definition*. Use the web interface to create the relation that you
are going to be using.

IEPY
----

Now, you're ready to run either the :doc:`active learning core <active_learning_tutorial>`
or the :doc:`rule based core <rules_tutorial>`.


Constructing a reference corpus
-------------------------------

To test information extraction performance, IEPY provides a tool for labeling the entire corpus "by hand"
and then checking the performance by experimenting with that labeled data.

If you would like to create a labeled corpus to test performance or for other purposes, take a look at
the :doc:`corpus labeling tool <corpus_labeling>`.