A template specifies a pattern of words in a document. BioRAT can apply such a template to a document to find a set of interesting facts. As with comparable information extraction systems, one potential bottleneck is the process of designing new templates. BioRAT therefore includes a simple tool, with a graphical interface, that allows simple templates to be created and tested while requiring the minimum of technical or linguistic knowledge. Even more than the rest of BioRAT, this tool is still under development. Be warned!
Below are further details about creating templates and editing gazetteers. From the BioRAT start-up window, select "Create new template". The following window will appear:
When a text file is displayed, find a phrase of interest and click on any word. The selected word will be shown in the top right-hand block of the window, along with associated information: the part of speech (e.g. noun, verb), the stem (i.e. the root form of the word), and any gazetteer matches (in the form "<major type, minor type>"). Clicking on any of these will append the selection to the current pattern, which is displayed in the bottom section of the window.
For example, the phrase "the axons were isolated from the frontal cortex of 10 mice" might interest you if you want to know about cell biology in mammals generally. The pattern you would want to create would (informally) be:
<cell type> <word>* <mammal>
That is, you want to find phrases that name a type of cell, followed by some other words, followed by the name of a mammal. To create a suitable template, load the text containing the phrase, then click on "axons" and select the gazetteer entry "cells". Then click on the "any" drop-down box and select "WORD" to match any word at all, and click "Optional" to add a "?" to the "
Once you are familiar with building patterns and templates, you may want to use some more convenient alternatives. Right-clicking on a word in the text area shows the word's features (gazetteer entries, parts of speech etc.), which can be selected directly. Alternatively, a whole phrase can be selected by clicking and dragging, as shown in the figure below. This causes a new pattern to appear in a separate pop-up dialogue box. Each word in the selected phrase is shown, with drop-down boxes containing the features of each word. Select the required features, including wildcards. When you are happy, click "Create pattern" to create a new pattern based on the selection, which will overwrite any pattern currently loaded. Otherwise, click "Cancel".
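To make the informal pattern <cell type> <word>* <mammal> concrete, here is a minimal Python sketch of what matching it against a tokenised phrase could look like. It is not BioRAT's matching code. The toy gazetteer contents, the function names and the `max_gap` limit are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy gazetteers: major type -> set of lower-cased entries.
GAZETTEERS: Dict[str, Set[str]] = {
    "cells": {"axons", "neurons"},
    "mammal": {"mice", "rats"},
}


def has_type(token: str, major_type: str) -> bool:
    """True if the token appears in the (toy) gazetteer of that major type."""
    return token.lower() in GAZETTEERS.get(major_type, set())


def find_matches(tokens: List[str], max_gap: int = 10) -> List[Tuple[str, str]]:
    """Return (cell, mammal) pairs matching <cells> <word>* <mammal>.

    The wildcard section may cover between 0 and max_gap arbitrary words.
    """
    matches = []
    for i, tok in enumerate(tokens):
        if not has_type(tok, "cells"):
            continue
        # Scan forward past at most max_gap wildcard words, looking for a mammal.
        for j in range(i + 1, min(len(tokens), i + 2 + max_gap)):
            if has_type(tokens[j], "mammal"):
                matches.append((tok, tokens[j]))
                break
    return matches


phrase = "the axons were isolated from the frontal cortex of 10 mice"
print(find_matches(phrase.split()))
```

Bounding the wildcard with `max_gap` is a design choice of this sketch. It stops a cell name at the start of a long sentence from pairing with an unrelated mammal name far away.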
Each gazetteer is a text file stored (by default) in the biorat/data/gazetteer directory, with a corresponding entry in the lists.def file. See the GATE documentation for further details. Depending on your sources, it may be easier to create gazetteers outside of BioRAT, e.g. by saving a database table as a suitable text file and editing lists.def directly. But sometimes, especially when editing a template, it may be more convenient to view and edit gazetteers through BioRAT.
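For gazetteers built outside BioRAT, the following Python sketch shows one way to turn a column of database values into a gazetteer text file and to read the entries of a list file. It assumes the usual GATE conventions: one entry per line in the gazetteer file, and colon-separated `filename:majorType:minorType` lines in lists.def. Check the GATE documentation before relying on this. The file names and values below are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Optional


class GazetteerEntry(NamedTuple):
    filename: str
    major_type: str
    minor_type: Optional[str]  # the minor category is optional


def parse_lists_def(text: str) -> List[GazetteerEntry]:
    """Parse list-file text, assuming 'filename:major[:minor]' on each line."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(":")
        major = parts[1] if len(parts) > 1 else ""
        minor = parts[2] if len(parts) > 2 and parts[2] else None
        entries.append(GazetteerEntry(parts[0], major, minor))
    return entries


def rows_to_gazetteer(values: Iterable[str]) -> str:
    """Build gazetteer file text: one entry per line, no blanks or duplicates."""
    seen = set()
    lines = []
    for value in values:
        entry = value.strip()
        if entry and entry not in seen:
            seen.add(entry)
            lines.append(entry)
    return "\n".join(lines) + "\n"


# Hypothetical list-file content and database column.
sample_def = "cells.lst:cells\nmammal.lst:organism:mammal\n"
for entry in parse_lists_def(sample_def):
    print(entry)
print(rows_to_gazetteer(["mice", "rats", " mice ", ""]), end="")
```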
To add a word to a gazetteer: Select the word in the document, and click "Edit gazetteers..." A list of available gazetteers will be shown. Pick one of these by clicking on it, or create a new one (see below). Once one is selected, click "Edit gazetteer", and its current contents will be displayed. To add a new word, click "Add entry..." The word you selected in the document (several clicks ago!) will be shown: just click "OK" to add it, or type in a new word and click "OK" to add that instead. When you have added all the words and phrases you want, click "Done". Words can be deleted from gazetteers through the same display.
To create a new gazetteer: Click "Create gazetteer..." and type in the filename and the two-level description: major category and minor (sub) category. The minor category is optional. If the file already exists, its contents will be displayed; otherwise an empty list will be displayed. In either case, you can add new entries as described above.
To change which gazetteers will be used: Select a gazetteer from the list, and use the "Add gazetteer to list" or "Remove gazetteer from list" button as required. The "list" in question is the lists.def file, which lists all the gazetteers that will be applied when you use the template match utilities. A different list file can be chosen using the "Select list file" button. From the command line, the gazetteer list file can be specified with the "-g" option; when using the GUI, the biorat.ini file must be edited and the GAZETTEER_LOC entry modified. Currently, only one gazetteer list can be used at a time.
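If you edit the list file by hand instead of using the buttons, adding or removing a gazetteer amounts to adding or removing one line. The Python sketch below illustrates this on an in-memory list of lines. It assumes colon-separated `filename:majorType:minorType` entries as in GATE list files. The helper names and file names are hypothetical, and this is not BioRAT's own code.

```python
from typing import List


def listed_file(line: str) -> str:
    """Return the gazetteer filename at the start of a list-file line."""
    return line.split(":")[0].strip()


def add_gazetteer(lines: List[str], entry: str) -> List[str]:
    """Return a copy of the lines with entry appended, unless its file is already listed."""
    if any(listed_file(line) == listed_file(entry) for line in lines):
        return list(lines)
    return list(lines) + [entry]


def remove_gazetteer(lines: List[str], filename: str) -> List[str]:
    """Return a copy of the lines without any entry for the given filename."""
    return [line for line in lines if listed_file(line) != filename]


# Hypothetical starting list with a single gazetteer.
current = ["cells.lst:cells"]
current = add_gazetteer(current, "mammal.lst:organism:mammal")
print(current)
print(remove_gazetteer(current, "cells.lst"))
```

Removing a line only stops the gazetteer from being applied. The gazetteer text file itself is left in place, so it can be added back to the list later.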
After editing gazetteers: When you have finished editing the gazetteers, click "Done". If you have loaded a document into the template design tool, BioRAT will re-apply the gazetteers to the document. This may take a few seconds. Please be patient!
Editing Gazetteers
Selecting a gazetteer
Viewing / editing contents of one gazetteer