Fusion Rule Technology

The Nature of Fusion Rules

Fusion rules are expressed in a scripting language, FusionRuleML, but they can also be displayed in a way that makes their logical form easier to understand. Their basic components are conditions and actions, which in turn are composed of schema variables, constants, and logical variables. A rule has zero or more conditions and one or more actions. The conditions are evaluated for truth or falsity using a knowledgebase containing background knowledge for the subject in question. If all of the conditions of a rule are found to be true (or if the rule has no conditions), then its actions are executed to produce part of a merged report.
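The firing behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the fusion engine itself: the names Rule and fire, and the representation of the knowledgebase as a dictionary and the merged report as a list, are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types: a condition is evaluated against the knowledgebase,
# and an action contributes something to the merged report.
Condition = Callable[[Dict], bool]
Action = Callable[[List[str]], None]


@dataclass
class Rule:
    actions: List[Action]                                       # one or more
    conditions: List[Condition] = field(default_factory=list)   # zero or more


def fire(rule: Rule, knowledgebase: Dict, merged_report: List[str]) -> bool:
    """Execute the rule's actions if every condition holds; say whether it fired."""
    # all() over an empty list is True, so a rule without conditions always fires.
    if all(condition(knowledgebase) for condition in rule.conditions):
        for action in rule.actions:
            action(merged_report)
        return True
    return False


kb = {"known_cities": {"London"}}
report: List[str] = []

unconditional = Rule(actions=[lambda out: out.append("skeleton")])
guarded = Rule(
    conditions=[lambda k: "Paris" in k["known_cities"]],
    actions=[lambda out: out.append("paris-entry")],
)

fired_first = fire(unconditional, kb, report)   # no conditions, so it fires
fired_second = fire(guarded, kb, report)        # its condition is false
```

Only the unconditional rule contributes to the report here, because the guarded rule's single condition fails.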

FusionRuleML

For reasons of persistence, fusion rules are marked up in XML. XML fusion rule files can then be loaded into a fusion engine, where they are transformed into Java objects before being grounded with textentries extracted from the input news reports. The DTD for FusionRuleML is as follows:

<!ELEMENT rulefile (rule+)>
<!ELEMENT rule (condition*, action+)>
<!ELEMENT condition (conditionname, (constant | variable | logicalvariable)+)>
<!ELEMENT action (actionname, (variable | constant | logicalvariable)*, constant+)>
<!ELEMENT constant (#PCDATA)>
<!ELEMENT variable ((source | set), branch)>
<!ELEMENT source (#PCDATA)>
<!ELEMENT set (#PCDATA)>
<!ELEMENT branch (#PCDATA)>
<!ELEMENT conditionname (#PCDATA)>
<!ELEMENT actionname (#PCDATA)>
<!ELEMENT logicalvariable (#PCDATA)>
<!ATTLIST conditionname sign (positive | negative) #REQUIRED>
<!ATTLIST branch extract (textentry | subtree | nestedlist) #REQUIRED>
<!ATTLIST rule code CDATA #REQUIRED
               status (foundational | optional) #REQUIRED>

A simple rule, marked up using these tags, looks like this (note that the type attribute on logicalvariable is not declared in the DTD above):

<rule code="2" status="optional">
  <condition>
    <conditionname sign="positive">equivalentterms</conditionname>
    <variable>
      <set>1</set>
      <branch extract="textentry">weatherreport/today</branch>
    </variable>
  </condition>
  <condition>
    <conditionname sign="positive">preferredterm</conditionname>
    <variable>
      <set>1</set>
      <branch extract="textentry">weatherreport/today</branch>
    </variable>
    <logicalvariable type="singleton">X</logicalvariable>
  </condition>
  <action>
    <actionname>AddText</actionname>
    <logicalvariable type="singleton">X</logicalvariable>
    <constant>weatherreport/today</constant>
  </action>
</rule>

Complete rulesets marked up in this way can be found in the weather reports and bioinformatics case studies.

It's not easy to discern the content or purpose of this rule. For this reason, we have also developed an alternative notation which better displays the logical form of a fusion rule.

A Logical Notation for Fusion Rules

We look first at the logical components of a fusion rule and how they are represented. Schema variables indicate which textentry (or textentries) is to be extracted to ground or instantiate a condition, so that the condition can in turn be evaluated by the knowledgebase. A schema variable has two components, joined by a double forward slash, "//":

(1) A prefix indicating which input source is to be used to ground or instantiate the variable. A number selects an individual input source. Alternatively, if (as is more likely) we wish to collect textentries from all of the input information sources, we use the prefix set.

(2) A sequence of tag or element names separated by forward slashes, "/", indicating the branch in the report from which the textentry is to be taken.

So, for example, suppose we wished to merge some weather reports that had a root node <weatherreport> and a child element <today> that tagged the textentry for today's weather. If we wished to use the textentry for today's weather from information source number 2, we would use the variable:

2//weatherreport/today

Alternatively, if we wished to extract the textentries for today's weather from all of the input information sources, then we would use the variable:

set//weatherreport/today
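As a rough illustration of how such variables might be grounded, the following Python sketch splits a schema variable into its prefix and branch and collects the matching textentries. The function names parse_schema_variable and ground are hypothetical, as is the convention that source number 1 is the first report in the list; the actual fusion engine may work differently.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def parse_schema_variable(variable: str) -> Tuple[str, List[str]]:
    """Split 'prefix//branch' into the source prefix and the list of tag names."""
    prefix, branch = variable.split("//", 1)
    return prefix, branch.split("/")


def ground(variable: str, sources: List[str]) -> List[str]:
    """Return the textentries that ground the variable.

    `sources` holds the input reports as XML strings. Assumption: source
    number 1 corresponds to sources[0].
    """
    prefix, tags = parse_schema_variable(variable)
    chosen = sources if prefix == "set" else [sources[int(prefix) - 1]]
    entries = []
    for xml_text in chosen:
        root = ET.fromstring(xml_text)
        if root.tag != tags[0]:
            continue  # the branch does not start at this report's root
        path = "/".join(tags[1:])
        node = root.find(path) if path else root
        if node is not None and node.text:
            entries.append(node.text.strip())
    return entries


reports = [
    "<weatherreport><today>cloudy</today></weatherreport>",
    "<weatherreport><today>sunny</today></weatherreport>",
]

from_source_2 = ground("2//weatherreport/today", reports)
from_all = ground("set//weatherreport/today", reports)
```

With the two toy reports above, the numbered variable yields a single textentry and the set variable yields one textentry per source.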

Constants are not grounded or instantiated by textentries; rather, they are used to refer to the branches or to the individual tags or elements of information sources. Most commonly, they are used to indicate to which part of the merged output a particular piece of merged information is to be attached, and so typically (though not exclusively) they occur in actions rather than conditions. A constant, then, simply specifies either a branch or an individual tag in a report. Thus the constant

weatherreport/temp/max

specifies the branch in a weather report that will have as a leaf node the textentry for the maximum temperature.

The knowledgebase not only evaluates whether a condition is true; it also provides information to be included in the merged output. For example, we might wish to know which of all of the textentries for today's weather occurs most commonly, and then use that textentry in the merged output. To do this, the fusion rule must contain a logical variable which is provided with a binding by the knowledgebase. This binding will then be used either as input to a further condition, or as input to an action so that it can be included in the merged output. Logical variables are indicated by uppercase letters "W", "X", "Y", "Z", etc.
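The "most common textentry" example can be sketched as a condition that, when it succeeds, also records a binding for a logical variable. The name most_common_term and the use of a plain dictionary for bindings are assumptions made purely for illustration.

```python
from collections import Counter
from typing import Dict, List


def most_common_term(textentries: List[str], bindings: Dict[str, str], var: str) -> bool:
    """Hypothetical condition: succeeds if there is at least one textentry,
    and binds the logical variable `var` to the most frequent one."""
    if not textentries:
        return False
    term, _count = Counter(textentries).most_common(1)[0]
    bindings[var] = term
    return True


bindings: Dict[str, str] = {}
succeeded = most_common_term(["sunny", "cloudy", "sunny"], bindings, "X")
# bindings["X"] can now be passed on to a further condition or to an action.
```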

Finally, we can combine variables, constants, and logical variables into conditions and actions. The name of a condition or action occurs immediately to the left of the round brackets "()", and the variables, constants, and logical variables within the scope of a condition or an action are placed within those brackets. Multiple conditions or actions within a rule are conjoined by "AND", while the conditions (if there are any) are joined to the actions by "IMPLIES". In the following rule, all of the textentries for today's weather are checked to see if they belong to the same equivalence class (as determined by a knowledge engineer), e.g. "cloud", "cloudy", "mostly cloudy", etc. If they do all belong to that class, the knowledgebase selects the preferred term and uses it to bind the logical variable "X". This in turn is used as the textentry to be added to the weatherreport/today branch in the merged output.

equivalentterms(set//weatherreport/today) AND preferredterm(set//weatherreport/today, X) IMPLIES AddText(X, weatherreport/today)

This is the same rule that was displayed above using FusionRuleML, but it's much easier to read using this notation.
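The correspondence between the two notations can be made concrete with a small converter. The Python sketch below is not part of the fusion engine; it assumes that a <set> element stands for the prefix set, that a <source> element holds a source number, and it ignores the sign and extract attributes.

```python
import xml.etree.ElementTree as ET

RULE_XML = """
<rule code="2" status="optional">
  <condition>
    <conditionname sign="positive">equivalentterms</conditionname>
    <variable><set>1</set><branch extract="textentry">weatherreport/today</branch></variable>
  </condition>
  <condition>
    <conditionname sign="positive">preferredterm</conditionname>
    <variable><set>1</set><branch extract="textentry">weatherreport/today</branch></variable>
    <logicalvariable type="singleton">X</logicalvariable>
  </condition>
  <action>
    <actionname>AddText</actionname>
    <logicalvariable type="singleton">X</logicalvariable>
    <constant>weatherreport/today</constant>
  </action>
</rule>
"""


def render_argument(elem: ET.Element) -> str:
    """Render a variable, constant, or logical variable."""
    if elem.tag == "variable":
        source = elem.find("source")
        # Assumption: <set> means "all sources"; <source> holds a source number.
        prefix = source.text.strip() if source is not None else "set"
        branch = elem.find("branch").text.strip()
        return f"{prefix}//{branch}"
    return elem.text.strip()


def render_atom(elem: ET.Element, name_tag: str) -> str:
    """Render one condition or action as name(arg, arg, ...)."""
    name = elem.find(name_tag).text.strip()
    args = [render_argument(child) for child in elem if child.tag != name_tag]
    return f"{name}({', '.join(args)})"


def render_rule(rule_xml: str) -> str:
    """Render a FusionRuleML rule in the logical notation."""
    rule = ET.fromstring(rule_xml)
    conditions = [render_atom(c, "conditionname") for c in rule.findall("condition")]
    actions = [render_atom(a, "actionname") for a in rule.findall("action")]
    body = " AND ".join(actions)
    if conditions:
        return " AND ".join(conditions) + " IMPLIES " + body
    return body


logical_form = render_rule(RULE_XML)
```

Under these assumptions, logical_form is the same string as the rule written out in the logical notation above.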

A number of actions have been developed for constructing the output merged information. In the following examples the term "variable" should be understood to denote either a schema variable or a logical variable.

AddText(Variable, Constant) -- the textentry that instantiates the variable is attached to the element or tag at the end of the branch specified by the constant.

AddNode(Constant₁, Constant₂) -- the tag or element specified by Constant₁ is added to the merged output information as a child of the tag or element at the end of the branch specified by Constant₂.

AddAtomicTree(Variable, Constant₁, Constant₂) -- the textentry that instantiates the variable is added to the tag or element specified by Constant₁, and this whole atomic tree is then added as a child of the tag or element at the end of the branch specified by Constant₂.

AddAtomicTrees(Variable, Constant₁, Constant₂) -- in this case the textentry that instantiates the variable must be in the form of a list. For each member of that list an atomic tree is then created and added to the output merged information in the same way as for the action AddAtomicTree.

Initialize(Constant) -- Constant specifies the skeleton (i.e. the set of tags or elements, together with their structure) of (part of) the output merged information. For example, if Constant was weatherreport(source,date,city,today,temp(max,min)), then the action Initialize(weatherreport(source,date,city,today,temp(max,min))) would produce a skeleton for the merged output that looked like this:

<weatherreport>
  <source></source>
  <date></date>
  <city></city>
  <today></today>
  <temp>
    <max></max>
    <min></min>
  </temp>
</weatherreport>
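To make the actions above more tangible, here is a minimal Python sketch of how they might operate on an element tree. All of the function names (initialize, find_branch, add_text, and so on) are hypothetical stand-ins for the actions, and the skeleton parser handles only the simple name(child,child(...)) form shown in the example.

```python
import xml.etree.ElementTree as ET
from typing import List


def initialize(skeleton: str) -> ET.Element:
    """Build an empty element tree from a skeleton such as 'a(b,c(d,e))'."""
    root = None
    last = None     # most recently created element
    stack = []      # elements whose child list is currently open
    name = ""

    def flush():
        nonlocal root, last, name
        if name:
            if stack:
                last = ET.SubElement(stack[-1], name)
            else:
                root = last = ET.Element(name)
            name = ""

    for ch in skeleton:
        if ch == "(":
            flush()
            stack.append(last)      # following names are children of `last`
        elif ch == ",":
            flush()
        elif ch == ")":
            flush()
            last = stack.pop()      # close the current child list
        elif not ch.isspace():
            name += ch
    flush()
    return root


def find_branch(root: ET.Element, branch: str) -> ET.Element:
    """Return the element at the end of a branch like 'weatherreport/temp/max'."""
    tags = branch.split("/")
    if tags[0] != root.tag:
        raise ValueError(f"branch {branch!r} does not start at <{root.tag}>")
    node = root
    for tag in tags[1:]:
        node = node.find(tag)
        if node is None:
            raise ValueError(f"branch {branch!r} not found")
    return node


def add_text(root: ET.Element, text: str, branch: str) -> None:
    find_branch(root, branch).text = text


def add_node(root: ET.Element, tag: str, branch: str) -> ET.Element:
    return ET.SubElement(find_branch(root, branch), tag)


def add_atomic_tree(root: ET.Element, text: str, tag: str, branch: str) -> None:
    add_node(root, tag, branch).text = text


def add_atomic_trees(root: ET.Element, texts: List[str], tag: str, branch: str) -> None:
    for text in texts:
        add_atomic_tree(root, text, tag, branch)


report = initialize("weatherreport(source,date,city,today,temp(max,min))")
skeleton_xml = ET.tostring(report, encoding="unicode", short_empty_elements=False)

add_text(report, "cloudy", "weatherreport/today")
add_node(report, "outlook", "weatherreport")
add_atomic_trees(report, ["rain", "wind"], "item", "weatherreport/outlook")
```

The outlook and item tags are invented for the usage example; skeleton_xml holds the same skeleton as the one shown above, without the indentation.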

We can now explain a more complex fusion rule taken from the bioinformatics case study.

semanticgeneralization(set//protein/function, X)
AND arerequiredspecificity(X)
AND getannotations(X, Y)
AND findancestors(X, W)
AND getkeywords(W, Z)
IMPLIES AddText(Conjunction(Y), biofusionAnalysis/commonfunction)
AND AddNode(keywords, biofusionAnalysis)
AND AddAtomicTrees(Z, keyword, biofusionAnalysis/keywords)

For reasons of computational efficiency this rule combines two separate tasks. (1) The first condition extracts all of the functional annotations (or rather their associated Gene Ontology ID numbers) of the input protein domains and finds their semantic generalization--that is, the functional annotation that is general enough to cover all of the input annotations, but no more general than is necessary. This least upper bound is returned by the knowledgebase as the binding for the logical variable X. The second condition checks that this least upper bound is not so general as to be uninformative. The third condition gets the actual textual functional annotation corresponding to the GO ID number of the least upper bound (or bounds). This textentry is used to bind the logical variable Y. The first action then adds this textentry, or the conjunction of textentries if there is more than one least upper bound, to the <commonfunction> tag in the output biofusionAnalysis report.

(2) The fourth condition takes the least upper bound or bounds, and finds the ancestors of these in the GO functional hierarchy. These ancestors are used to bind the logical variable W. The final condition extracts the key words and phrases from the textual functional annotations corresponding to these ancestors, duplicate words or phrases being removed. These keywords are used to bind the logical variable Z. The second action creates a new tag <keywords> in the merged output, attaching it to the root tag <biofusionAnalysis>. The third action takes the list of keywords or phrases that binds the logical variable Z, and for each member of that list a new tag or element <keyword> is created, that list member is attached as a textentry to that new tag, and this new atomic tree is attached as a child element at the end of the biofusionAnalysis/keywords branch in the output merged information.
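The idea of a semantic generalization as a least upper bound can be illustrated on a toy hierarchy. The Python sketch below is not the knowledgebase's implementation: the hierarchy, its term names, and the function names are all invented for the example, and real GO terms can have several parents, which is why the result is a set of bounds rather than a single term.

```python
from typing import Dict, List, Set

# Toy is-a hierarchy (term -> parents). The terms are illustrative only
# and do not reproduce the real Gene Ontology.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "catalytic": ["root"],
    "binding": ["root"],
    "transferase": ["catalytic"],
    "hydrolase": ["catalytic"],
    "acyltransferase": ["transferase"],
    "methyltransferase": ["transferase"],
}


def ancestors_or_self(term: str) -> Set[str]:
    """The term together with everything above it in the hierarchy."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def semantic_generalization(terms: List[str]) -> Set[str]:
    """Least upper bound(s): terms that cover every input term and have no
    more specific covering term beneath them."""
    common = set.intersection(*(ancestors_or_self(t) for t in terms))
    return {
        c for c in common
        if not any(d != c and c in ancestors_or_self(d) for d in common)
    }


siblings = semantic_generalization(["acyltransferase", "methyltransferase"])
cousins = semantic_generalization(["acyltransferase", "hydrolase"])
distant = semantic_generalization(["acyltransferase", "binding"])
```

In the last case the only common ancestor is the root of the toy hierarchy, which is the kind of result the second condition of the rule is meant to reject as too general to be informative.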

Combined with an initial rule that sets up the skeleton of the output merged information, an example of a resulting report looks like this:

<biofusionAnalysis>
  <selectedproteins>1c4tA0, 1c4tC0, 1dpc00, 1eaf00 and 3cla00</selectedproteins>
  <commonfunction>acyltransferase activity (GO:0008415)</commonfunction>
  <keywords>
    <keyword>transferring acyl groups</keyword>
    <keyword>transferring groups other than amino-acyl groups</keyword>
    <keyword>Gene_Ontology</keyword>
    <keyword>acyltransferase activity</keyword>
    <keyword>catalytic activity</keyword>
    <keyword>molecular_function</keyword>
    <keyword>transferase activity</keyword>
  </keywords>
</biofusionAnalysis>
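The keywords part of such a report could be assembled along the following lines. This Python sketch uses an invented toy hierarchy and invented keyword lists, and it assumes that findancestors returns proper ancestors only; it is meant to show the flow from ancestors to de-duplicated keywords to <keyword> elements, not to reproduce the knowledgebase.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Toy data, illustrative only: term -> parents, and term -> keyword phrases.
PARENTS: Dict[str, List[str]] = {
    "transferase": ["catalytic"],
    "catalytic": ["root"],
    "root": [],
}
KEYWORDS: Dict[str, List[str]] = {
    "catalytic": ["catalytic activity"],
    "root": ["molecular function", "catalytic activity"],
}


def find_ancestors(terms: List[str]) -> List[str]:
    """Proper ancestors of the given terms, in first-seen order (an assumption)."""
    found: List[str] = []
    queue = list(terms)
    while queue:
        term = queue.pop(0)
        for parent in PARENTS[term]:
            if parent not in found:
                found.append(parent)
                queue.append(parent)
    return found


def get_keywords(terms: List[str]) -> List[str]:
    """Collect the keyword phrases of the terms, dropping duplicates."""
    seen = set()
    keywords: List[str] = []
    for term in terms:
        for phrase in KEYWORDS.get(term, []):
            if phrase not in seen:
                seen.add(phrase)
                keywords.append(phrase)
    return keywords


def build_report(keywords: List[str]) -> ET.Element:
    root = ET.Element("biofusionAnalysis")
    # Corresponds to AddNode(keywords, biofusionAnalysis)
    holder = ET.SubElement(root, "keywords")
    # Corresponds to AddAtomicTrees(Z, keyword, biofusionAnalysis/keywords)
    for phrase in keywords:
        ET.SubElement(holder, "keyword").text = phrase
    return root


w = find_ancestors(["transferase"])
z = get_keywords(w)
report_xml = ET.tostring(build_report(z), encoding="unicode")
```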

Contact a.hunter@cs.ucl.ac.uk or +44 20 7679 7295.
