Fusion rules are expressed in a form of scripting language, FusionRuleML, but they can also be displayed in
a way that makes their logical form easier to understand. Their basic components are conditions and actions, which
in turn are composed of schema variables, constants, and logical variables. A rule has zero or more conditions and
one or more actions. The conditions are evaluated for truth/falsity using a knowledgebase containing background knowledge
for the subject in question. If all of the conditions of a rule are found to be true (or if the rule has no conditions)
then its actions are executed to produce part of a merged report.
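The evaluation cycle just described can be sketched in a few lines of Python. This is an illustration only, not the fusion engine itself: the names `Rule` and `fire`, the dictionary standing in for the knowledgebase, and the list standing in for the merged report are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types: a condition tests the knowledgebase, an action
# appends something to the merged report (here just a list of strings).
Condition = Callable[[Dict[str, object]], bool]
Action = Callable[[List[str]], None]


@dataclass
class Rule:
    actions: List[Action]                                       # one or more
    conditions: List[Condition] = field(default_factory=list)   # zero or more

    def fire(self, knowledgebase: Dict[str, object], report: List[str]) -> bool:
        """Run the actions if every condition holds; return whether the rule fired."""
        # all() over an empty list is True, so a rule with no conditions always fires.
        if all(cond(knowledgebase) for cond in self.conditions):
            for action in self.actions:
                action(report)
            return True
        return False


kb = {"season": "winter"}
report: List[str] = []

unconditional = Rule(actions=[lambda r: r.append("<weatherreport>")])
guarded = Rule(
    actions=[lambda r: r.append("frost warning")],
    conditions=[lambda k: k.get("season") == "summer"],
)

unconditional.fire(kb, report)   # fires: there are no conditions to check
guarded.fire(kb, report)         # does not fire: its condition is false
print(report)
```

Only the unconditional rule contributes to the report here, which mirrors the "no conditions" case in the text.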
For reasons of persistence, fusion rules are marked up in XML. XML fusion rule files can then be loaded into a fusion engine where they are transformed into Java objects before being grounded with textentries extracted from the input news reports. The DTD for FusionRuleML is as follows:
A simple rule, marked up using these tags, looks like this:
Complete rulesets marked up in this way can be found in the weather reports and bioinformatics case studies.
It's not easy to discern the content or purpose of this rule. For this reason, we have also developed an alternative notation which better displays the logical form of a fusion rule.
We look, first, at the logical components of a fusion rule, and how they are represented. Schema variables indicate which textentry (or textentries) is to be extracted to ground or instantiate a condition so that it in turn can be evaluated by the knowledgebase. Schema variables have two components: (1) a number prefix indicating which individual input source is to be used to ground or instantiate the variable; alternatively, if (as is more likely) we wish to collect textentries from all of the input information sources, we indicate this by using the prefix "set"; and (2) a sequence of tag or element names separated by a forward slash, "/", indicating the branch in the report from which the textentry is to be taken. So, for example, suppose we wished to merge some weather reports that had a root node <weatherreport> and a child element <today> that tagged the textentry for today's weather. If we wished to use the textentry for today's weather from information source number 2, we would use the variable:
Alternatively, if we wished to extract the textentries for today's weather from all of the input information sources, then we would use the variable:
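To make the two components of a schema variable concrete, the following Python sketch resolves a (prefix, path) pair against a list of parsed reports. It does not reproduce the written syntax of schema variables; the function name `resolve`, the use of the string "set" as the all-sources prefix, and the assumption that sources are numbered from 1 are ours, made for the purpose of illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Union


def resolve(prefix: Union[int, str], path: str, reports: List[ET.Element]) -> List[str]:
    """Return the textentries selected by a (prefix, path) pair.

    prefix -- a source number (assumed 1-based), or "set" for all sources
    path   -- tag names separated by "/", starting at the root tag
    """
    root_tag, _, rest = path.partition("/")
    sources = reports if prefix == "set" else [reports[prefix - 1]]
    entries = []
    for root in sources:
        if root.tag != root_tag:
            continue  # this report does not contain the requested branch
        node = root.find(rest) if rest else root
        if node is not None and node.text:
            entries.append(node.text.strip())
    return entries


reports = [
    ET.fromstring("<weatherreport><today>sunny</today></weatherreport>"),
    ET.fromstring("<weatherreport><today>cloudy</today></weatherreport>"),
]
print(resolve(2, "weatherreport/today", reports))      # source number 2 only
print(resolve("set", "weatherreport/today", reports))  # all input sources
```

The first call yields the single textentry from source 2, the second the textentries from every source, in input order.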
Constants are not grounded or instantiated by textentries; rather, they are used to refer to the branches or to the individual tags or elements of information sources. Most commonly, they are used to indicate to which part of the merged output a particular piece of merged information is to be attached, and so typically (though not exclusively) they occur in actions rather than conditions. A constant, then, simply specifies either a branch or an individual tag in a report. Thus the constant
specifies the branch in a weather report that will have as a leaf node the textentry for the maximum temperature.
The knowledgebase not only evaluates whether a condition is true but also provides information to be included in the merged output. For example, we might wish to know which of all of the textentries for today's weather is the one that occurs most commonly, and then use that textentry in the merged output. To do this the fusion rule must contain a logical variable which is provided with a binding by the knowledgebase. This binding will then be used either as input to a further condition, or as input to an action so that it can be included in the merged output. Logical variables are indicated by uppercase letters "W", "X", "Y", "Z", etc.
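The most-common-textentry example can be sketched as follows. The function `most_common_entry` is a hypothetical stand-in for the knowledgebase query, and the `bindings` dictionary is simply one way to picture how a logical variable receives its value; neither is taken from the fusion engine. The sketch assumes a non-empty list of textentries.

```python
from collections import Counter
from typing import Dict, List


def most_common_entry(entries: List[str]) -> str:
    """Hypothetical knowledgebase query: the textentry that occurs most often."""
    # Assumes entries is non-empty; ties go to the entry seen first.
    return Counter(entries).most_common(1)[0][0]


bindings: Dict[str, str] = {}
todays_weather = ["rain", "showers", "rain"]

# The knowledgebase supplies the binding for the logical variable "X",
# which a later condition or action can then use.
bindings["X"] = most_common_entry(todays_weather)
print(bindings)
```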
Finally, we can combine schema variables, constants, and logical variables into conditions and actions. The name of a condition or action occurs immediately to the left of the round brackets "()", and the schema variables, constants, and logical variables within the scope of a condition or an action are placed within those brackets. Multiple conditions or actions within a rule are conjoined by "AND", while the conditions (if there are any) are joined to the actions that follow them by "IMPLIES". In the following rule, all of the textentries for today's weather are checked to see if they belong to the same equivalence class (as determined by a knowledge engineer), e.g. "cloud", "cloudy", "mostly cloudy", etc. If they do all belong to that class, the knowledgebase selects the preferred term, and uses it to bind the logical variable "X". This in turn is used as the textentry to be added to the weatherreport/today branch in the merged output.
This is the same rule that was displayed above using FusionRuleML, but it's much easier to read using this notation.
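For readers who prefer to see the behaviour of this rule spelled out procedurally, here is a minimal Python sketch. The equivalence classes, the function names `preferred_term` and `add_text`, and the choice of preferred terms are all invented for the example; the real background knowledge is supplied by a knowledge engineer.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Set

# Hypothetical background knowledge: each preferred term maps to the set
# of terms treated as equivalent to it.
EQUIVALENCE_CLASSES: Dict[str, Set[str]] = {
    "cloudy": {"cloud", "cloudy", "mostly cloudy"},
    "rain": {"rain", "rainy", "showers"},
}


def preferred_term(entries: List[str]) -> Optional[str]:
    """Return the preferred term if all entries share one class, else None."""
    for preferred, members in EQUIVALENCE_CLASSES.items():
        if entries and all(e in members for e in entries):
            return preferred
    return None


def add_text(merged: ET.Element, branch: str, text: str) -> None:
    """Attach text as the textentry at the end of branch, creating tags as needed."""
    tags = branch.split("/")
    if tags[0] != merged.tag:
        raise ValueError("branch must start at the root tag")
    node = merged
    for tag in tags[1:]:
        child = node.find(tag)
        node = child if child is not None else ET.SubElement(node, tag)
    node.text = text


merged = ET.Element("weatherreport")
x = preferred_term(["cloud", "mostly cloudy", "cloudy"])  # binds "X"
if x is not None:                                         # condition IMPLIES action
    add_text(merged, "weatherreport/today", x)
print(ET.tostring(merged, encoding="unicode"))
```

If the textentries do not all fall into one class, `preferred_term` returns `None` and nothing is added to the merged output, which corresponds to the rule not firing.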
A number of actions have been developed for constructing the output merged information. In the following examples the term "variable" should be understood to denote either a schema variable or a logical variable.
We can now explain a more complex fusion rule taken from the bioinformatics case study.
For reasons of computational efficiency this rule combines two separate tasks. (1) The first condition extracts all of the functional annotations (or rather their associated Gene Ontology ID numbers) of the input protein domains and finds their semantic generalization--that is, the functional annotation that is general enough to cover all of the input annotations, but no more general than is necessary. This least upper bound is returned by the knowledgebase as the binding for the logical variable X. The second condition checks that this least upper bound is not so general as to be uninformative. The third condition gets the actual textual functional annotation corresponding to the GO ID number of the least upper bound (or bounds). This textentry is used to bind the logical variable Y. The first action then adds this textentry, or the conjunction of textentries if there is more than one least upper bound, to the <commonfunction> tag in the output biofusionAnalysis report.
(2) The fourth condition takes the least upper bound or bounds, and finds the ancestors of these in the GO functional hierarchy. These ancestors are used to bind the logical variable W. The final condition extracts the key words and phrases from the textual functional annotations corresponding to these ancestors, duplicate words or phrases being removed. These keywords are used to bind the logical variable Z. The second action creates a new tag <keywords> in the merged output, attaching it to the root tag <biofusionAnalysis>. The third action takes the list of keywords or phrases bound to the logical variable Z and, for each member of that list, creates a new tag or element <keyword>, attaches the list member to it as a textentry, and attaches the resulting atomic tree as a child element to the <biofusionAnalysis/keywords> branch in the output merged information.
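The two tasks can be sketched in Python as follows. The hierarchy below is a toy stand-in with invented term names rather than real Gene Ontology IDs, the keyword list is made up, and treating only the root term as "too general" is our assumption, since the actual threshold is not stated here. The sketch also assumes at least one input term and an acyclic hierarchy.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set

# Toy hierarchy, child -> parents. The term names are invented.
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "dna_binding": {"binding"},
    "rna_binding": {"binding"},
    "kinase": {"catalysis"},
}


def ancestors(term: str) -> Set[str]:
    """All terms reachable by following parent links, excluding term itself."""
    seen: Set[str] = set()
    stack = list(PARENTS[term])
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def least_upper_bounds(terms: List[str]) -> Set[str]:
    """Common ancestors (or the terms themselves) with no more specific common ancestor."""
    common = set.intersection(*({t} | ancestors(t) for t in terms))
    # Drop any common term that is a proper ancestor of another common term.
    return {c for c in common if not any(c in ancestors(d) for d in common)}


def is_informative(term: str) -> bool:
    """Assumption: only the root counts as too general to be useful."""
    return term != "root"


lubs = least_upper_bounds(["dna_binding", "rna_binding"])   # binds X
informative = all(is_informative(t) for t in lubs)          # second condition
w = set().union(*(ancestors(t) for t in lubs))              # binds W

# Second and third actions: one <keyword> element per list member.
z = ["binding", "nucleic acid"]                             # invented binding for Z
analysis = ET.Element("biofusionAnalysis")
keywords = ET.SubElement(analysis, "keywords")
for kw in z:
    ET.SubElement(keywords, "keyword").text = kw
print(lubs, informative, w)
print(ET.tostring(analysis, encoding="unicode"))
```

Computing the least upper bounds once and reusing them for both the common function and the ancestor lookup reflects the efficiency argument made in the text for combining the two tasks in a single rule.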
Combined with an initial rule that sets up the skeleton of the output merged information, an example of a resulting report looks like this: