Fusion Rule Technology

Weather Reports Case Study

The Weather Reports Case Study is a trial of fusion rule technology for merging weather reports obtained from websites including the BBC and The Weather Channel. In this trial, a knowledgebase and several sets of logical fusion rules for merging weather reports were developed that could handle a wide range of input weather reports obtained from the selected sites.

The weather reports obtained from the websites are already in the form of structured text though not in the form of XML. They were marked up in XML by hand, though it would be straightforward to automate the construction of each XML file from the HTML file. We use the following DTD for the weather reports:

<!ELEMENT weatherreport (source, date, city, today, temp, windspeed*, relativehumidity*, daylighthours*, pressure*, visibility*, visibilitydistance*, airpollutionindex*, sunindex*, sunindexrating*, dewpoint*)>
<!ELEMENT temp (max, min)>
<!ELEMENT daylighthours (sunrise, sunset)>
<!ELEMENT pressure (absolutevalue, directionofchange)>
<!ELEMENT source (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT today (#PCDATA)>
<!ELEMENT max (#PCDATA)>
<!ELEMENT min (#PCDATA)>
<!ELEMENT windspeed (#PCDATA)>
<!ELEMENT relativehumidity (#PCDATA)>
<!ELEMENT sunrise (#PCDATA)>
<!ELEMENT sunset (#PCDATA)>
<!ELEMENT absolutevalue (#PCDATA)>
<!ELEMENT directionofchange (#PCDATA)>
<!ELEMENT visibility (#PCDATA)>
<!ELEMENT visibilitydistance (#PCDATA)>
<!ELEMENT airpollutionindex (#PCDATA)>
<!ELEMENT sunindex (#PCDATA)>
<!ELEMENT sunindexrating (#PCDATA)>
<!ELEMENT dewpoint (#PCDATA)>
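As an illustration of what a report marked up against this DTD might look like, the following Python sketch parses one hypothetical report and flattens it into path/textentry pairs. The sample values, the `SAMPLE` constant, and the `flatten` helper are assumptions made for this example and are not taken from the case study files. Note that `xml.etree.ElementTree` does not validate against a DTD, so the sketch only checks that the document is well formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical report: element names follow the DTD, values are invented.
SAMPLE = """<weatherreport>
  <source>BBCi</source>
  <date>4/12/02</date>
  <city>London</city>
  <today>rain</today>
  <temp><max>8C</max><min>3C</min></temp>
  <windspeed>15mph</windspeed>
  <pressure>
    <absolutevalue>1021.0 mb</absolutevalue>
    <directionofchange>falling</directionofchange>
  </pressure>
</weatherreport>"""


def flatten(element, prefix=""):
    """Map each leaf element to its text, keyed by a slash-separated path."""
    entries = {}
    for child in element:
        path = prefix + child.tag
        if len(child):  # element with sub-elements, e.g. temp or pressure
            entries.update(flatten(child, path + "/"))
        else:
            entries[path] = (child.text or "").strip()
    return entries


report = flatten(ET.fromstring(SAMPLE))
print(report["today"], report["temp/max"], report["pressure/directionofchange"])
```

A repeated optional element (for example two `windspeed` entries) would overwrite the earlier value in this simplified dictionary; a fuller version would collect such values in a list.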

In addition to the DTD, the textentries for the structured news reports were delineated as follows:

Cities

-- Major U.K. cities.

Accepted Sources

-- Main U.K. TV and radio broadcasters, and national newspapers.

Today

-- Today's weather conditions in the form of a word or phrase from a selection of 66 possibilities (e.g. cloudy, scattered showers, mostly wet, prolonged rain, etc.).

Date

-- Acceptable date formats for day (e.g. 1, 1st, etc.), month (e.g. 1, Jan, January, etc.), and year (e.g. 1998, 98, etc.); these can be separated by " , ", " . ", " - ", " / ", " \ " and " ".

Visibility Quantitative

-- E.g. 1 Mile, 1 km, etc., with km converted to miles.

Visibility Qualitative

-- From the selection of good, fair, very good, excellent, unlimited, moderate, and poor.

Pressure Absolute Value

-- E.g. 1021.0 mb, etc.

Pressure Direction of Change

-- From the selection of rising, falling, and steady.

Max and Min Temperatures

-- E.g. -1C, 23.9F, etc., with F converted to C.

Windspeed

-- E.g. 15mph, 23kph, etc., with KPH converted to MPH.

Humidity

-- E.g. 80%, etc.

Time

-- E.g. 10.13, 22.13, etc., with 24hr converted to 12hr.

Sunindex

-- From the selection of low, high, and medium.
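The unit conversions listed above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the accepted spellings of units, the rounding to one decimal place, and the absence of an am/pm marker in 12-hour times are all choices made for this example, not details taken from the Prolog knowledgebase.

```python
import re


def _parse(text, units):
    """Split a textentry such as '23.9F' into (number, lower-case unit)."""
    pattern = r"\s*(-?\d+(?:\.\d+)?)\s*(" + units + r")\s*"
    match = re.fullmatch(pattern, text, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised textentry: {text!r}")
    return float(match.group(1)), match.group(2).lower()


def to_celsius(text):
    """Return a temperature in Celsius, converting from Fahrenheit if needed."""
    value, unit = _parse(text, "c|f")
    if unit == "f":
        value = (value - 32) * 5 / 9
    return round(value, 1)


def to_mph(text):
    """Return a windspeed in mph, converting from kph if needed."""
    value, unit = _parse(text, "mph|kph")
    if unit == "kph":
        value *= 0.621371  # miles per kilometre
    return round(value, 1)


def to_miles(text):
    """Return a visibility distance in miles, converting from km if needed."""
    value, unit = _parse(text, "km|miles|mile")
    if unit == "km":
        value *= 0.621371
    return round(value, 1)


def to_12_hour(text):
    """Convert 'HH.MM' on the 24-hour clock to 'H.MM' on the 12-hour clock."""
    hours, minutes = text.strip().split(".")
    hour = int(hours) % 12 or 12  # 0 and 12 both display as 12
    return f"{hour}.{minutes}"
```

For instance, `to_12_hour("22.13")` gives `"10.13"`, so a 24-hour and a 12-hour report of the same sunset time become directly comparable before merging.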

Some example sets of weather reports and the Prolog knowledgebase (this uses several built-in predicates from LPA WIN-PROLOG 4300) can be found at the following links:

Weather reports

Prolog knowledgebase

Conflicts between information sources can be handled in a variety of ways. The first set of rules deals with conflicts by using popular sharing, one of a number of simple voting procedures, which selects the most frequently occurring term or terms. The second set of rules deals with conflicts by using a weighted voting procedure, weighted first-past-the-post. Each source is given a weighting between 0 and 1, and this weighting is transferred to the textentry from each report. The textentry with the highest total is selected, but if two or more textentries tie, none is selected. (Rules embodying other approaches to voting are available here.) The third set of rules resolves conflicts by selecting textentries from reports with preferred sources. The fourth set of rules deals with conflicts by selecting (if possible) the most general textentry, where generality is defined using a hierarchical semantic network in the knowledgebase. (Because of the nature of the reports, only rule 3, dealing with conflicts between textentries for today's weather, is amenable to this approach--all the other rules in this set are the same as for the first, voting, approach.) The fifth set of rules aggregates conflicting textentries simply by taking the disjunction of the input textentries. Because no attention is paid to the meaning of the textentries, we refer to this approach as "syntactic generalization".
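The two generalization approaches can be illustrated with a short Python sketch. The `PARENT` table below is a hypothetical two-entry fragment standing in for the hierarchical semantic network; the only relationship taken from the case study is that rain is treated as more general than showers. The function names are likewise assumptions for this example, and the sketch assumes a tree-shaped hierarchy in which each term has at most one parent.

```python
# Hypothetical fragment of a semantic hierarchy: child term -> more general term.
PARENT = {"showers": "rain", "prolonged rain": "rain"}


def ancestors(term):
    """Return the term followed by its successively more general terms."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def semantic_generalization(terms):
    """Return the least upper bound of the terms, or None if there is none."""
    common = None
    for term in terms:
        chain = ancestors(term)
        if common is None:
            common = chain
        else:
            # Keep only terms that generalize everything seen so far,
            # preserving the most-specific-first order.
            common = [t for t in common if t in chain]
    return common[0] if common else None


def syntactic_generalization(terms):
    """Return the disjunction of the distinct terms, ignoring their meaning."""
    distinct = list(dict.fromkeys(terms))  # de-duplicate, keep first-seen order
    if len(distinct) <= 1:
        return distinct[0] if distinct else None
    return ", ".join(distinct[:-1]) + " or " + distinct[-1]


print(semantic_generalization(["rain", "rain", "showers", "showers"]))
print(syntactic_generalization(["cloudy", "cloudy", "mostly cloudy", "sunny"]))
```

A result of `None` from `semantic_generalization` corresponds to the "no least upper bound" outcome reported in the merged reports.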

The fusion rules for merging weather reports, in both logical form (these are easier to read) and in XML (for the mark-up language), can be found at the following links. (For an explanation of how to read a fusion rule click here.)

Rules that resolve conflicts by a simple voting procedure (popular sharing)

Fusion rules in logical form

Fusion rules in XML

Rules that resolve conflicts using a weighted voting procedure (weighted first-past-the-post)

Fusion rules in logical form

Fusion rules in XML

Rules that resolve conflicts by using preferences over sources

Fusion rules in logical form

Fusion rules in XML

Rules that resolve conflicts by using most general term/least upper bound (semantic generalization)

Fusion rules in logical form

Fusion rules in XML

Rules that resolve conflicts by syntactic generalization (disjunction)

Fusion rules in logical form

Fusion rules in XML

The Merged Reports

The merged reports can be found at the links below. For ease of comparison, the merged reports that resulted from using the rules embodying the five different approaches have been combined into a single file. In the case of the preference rules, the linear preference ordering was selected purely for the sake of illustration, and so is completely arbitrary. That ordering is (with most preferred first): The Weather Channel, BBCi, BBC London, CNN. The weights selected for the weighted voting rules are equally arbitrary. Those weights are: BBCi 0.9, BBC London 0.9, CNN 0.5, The Weather Channel 0.75.

The reports for March 26th nicely illustrate some differences between the approaches. Because three out of the four reports said that the weather was cloudy or mostly cloudy, merging the reports using the rules for simple voting (popular sharing) determined that the textentry for today is "cloud". Weighted voting produced the same result, given that the two BBC sources were given the most weight, and they agreed that the weather was cloudy. However, the lone report that disagreed was from The Weather Channel (the most preferred source), so merging the reports using the rules that resolve conflicts by preferences over sources resulted in the textentry for today being "sunny". Because "cloudy" and "sunny" cannot usefully be subsumed under any more general kind of weather, merging the reports using the fourth set of rules (semantic generalization) produced the textentry "no least upper bound". Finally, the syntactic generalization of the four textentries is "cloudy, mostly cloudy or sunny".

A second conflict between these reports concerns their textentries for daylighthours. While the two BBC sources are almost in agreement, CNN's times are an hour later (presumably the result of mistaken data entry, since local time on this date still corresponded to Greenwich Mean Time). Resolving this conflict by voting, then, results in using the interval of the two BBC reports (6.20 - 6.21, for sunset). However, in resolving the conflict by preferences over sources there is no textentry for daylighthours from the most preferred source (The Weather Channel), so the fusion rules select the textentry from the most preferred source that has one, namely BBCi (6.20). The least upper bound rules do not apply to this case, but the syntactic generalization produces the textentry "6.21, 6.20 or 7.24".

A more fine-grained approach to resolving conflicts via preferences over sources is to have different preference orderings depending on the subject; BBCi may be preferred to The Weather Channel for windspeed, say, whilst The Weather Channel may be preferred to BBCi for relativehumidity. Such preferences may very well be arbitrary in this particular case, but in merging news reports about war, for example, one might easily prefer one source for casualty figures about one side while preferring another source for casualty figures about the other side. Predicates for resolving conflicts in this way can be found in the knowledgebase, though they have not been embodied in rules for this case study.
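Resolution by preferences over sources, including the fallback to the next source when the most preferred one has no textentry and the subject-specific orderings just described, might be sketched in Python as below. The default ordering is the one stated for this case study; the `ORDER_BY_ELEMENT` table, the `preferred_entry` function, and the windspeed values in the usage example are hypothetical and serve only to illustrate the idea.

```python
# Ordering used in the case study, most preferred first.
DEFAULT_ORDER = ["The Weather Channel", "BBCi", "BBC London", "CNN"]

# Hypothetical subject-specific ordering: only the relative position of
# BBCi and The Weather Channel for windspeed is suggested by the text.
ORDER_BY_ELEMENT = {
    "windspeed": ["BBCi", "The Weather Channel", "BBC London", "CNN"],
}


def preferred_entry(element, entries):
    """Return (source, textentry) from the most preferred source that has one.

    entries maps a source name to its textentry for the given element;
    sources with no textentry are either absent or mapped to None.
    """
    order = ORDER_BY_ELEMENT.get(element, DEFAULT_ORDER)
    for source in order:
        value = entries.get(source)
        if value is not None:
            return source, value
    return None


# Sunset on March 26th: The Weather Channel supplied no textentry.
sunset = {"BBCi": "6.20", "BBC London": "6.21", "CNN": "7.24"}
print(preferred_entry("sunset", sunset))
```

With the default ordering the sunset conflict falls through to BBCi, matching the behaviour described for the March 26th reports, while a `windspeed` conflict would consult BBCi before The Weather Channel.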

Merged reports for December 3rd 2002

Merged reports for December 4th 2002

Merged reports for December 5th 2002

Merged reports for December 6th 2002

Merged reports for December 9th 2002

Merged reports for December 10th 2002

Merged reports for December 11th 2002

Merged reports for March 24th 2003

Merged reports for March 26th 2003

Merged reports for March 27th 2003

Table 1 summarizes the results of using the five different sets of rules to merge the textentries for the report element today. The reports for December 4th best illustrate the differences between the approaches: simple voting by popular sharing produces a tie, so the textentry "rain or showers" is selected. "rain" is the textentry selected by weighted voting because, even though as many reports predict showers, the BBC reports have been assigned the most weight. Because the fourth report (The Weather Channel) was selected as the preferred source, resolving the conflict in this way results in the textentry "showers" being chosen. And because (arguably) RAIN is a more general concept than SHOWERS, "rain" is the textentry selected by using the semantic generalization of the textentries. The fifth approach, syntactic generalization of the textentries, in this case produces the same result, "rain or showers", as the first approach. By contrast, the reports for December 6th do not conflict, hence none of these five approaches to resolving conflicts is required, and the preferred term for the equivalence class to which all the textentries belong is used instead. Finally, it is worth noting that the great variety of textentries for today's weather means that, in most cases, there is no more general term that subsumes all the textentries yet is at the same time specific enough to be informative, and so in most cases there is no semantic generalization, that is, no least upper bound.

Table 1: Results of five approaches to merging the textentry for `today`
Date	Report Textentries (BBCi, BBC LDN, CNN, Weather Channel)	Aggregation Function
Simple Voting (PopularSharing)	Weighted Voting^* (FirstPastThePost)	Preferred Source⁺	Semantic Generalization	Syntactic Generalization
3/12/02	showers, showers, cloudy, cloudy	showers or cloud	showers	cloudy	no least upper bound	showers or cloudy
4/12/02	rain, rain, showers, showers	rain or showers	rain	showers	rain	rain or showers
5/12/02	cloudy, cloudy, sunny, mostly cloudy	cloud	cloud	mostly cloudy	no least upper bound	cloudy, sunny or mostly cloudy
6/12/02	cloudy, cloudy, cloudy, cloudy	cloud	cloud	cloud	cloud	cloud
9/12/02	sunny, sunny, sunny, partly cloudy	sun	sun	partly cloudy	no least upper bound	sunny or partly cloudy
10/12/02	cloudy, cloudy, snow, rain/snow	cloud	cloud	rain/snow	no least upper bound	cloudy, snow or rain/snow
11/12/02	sunny, sunny, cloudy, mostly cloudy	sun or cloud	sun	mostly cloudy	no least upper bound	sunny, cloudy or mostly cloudy
24/3/03	sunny, sunny, p/cloudy, fair	sun	sun	fair	no least upper bound	sunny, p/cloudy or fair
26/3/03	cloudy, cloudy, mostly cloudy, sunny	cloud	cloud	sunny	no least upper bound	cloudy, mostly cloudy or sunny
27/3/03	sunny, sunny, cloudy, mostly sunny	sun	sun	mostly sunny	no least upper bound	sunny, cloudy or mostly sunny

^*Weights assigned are: BBCi 0.9, BBC London 0.9, CNN 0.5, The Weather Channel 0.75.
⁺Order of preferences is (most preferred first): The Weather Channel, BBCi, BBC London, CNN.

Many other voting functions could be defined. Table 2 summarizes the results of applying rules embodying eight voting functions for merging the textentry for today's weather. Informally, we may define these functions as follows:

Majority Voting

-- The most frequently occurring term is selected, provided that frequency is greater than 0.5.

Popular Sharing

-- The most frequently occurring term or terms are selected.

First Past the Post

-- The most frequently occurring term is selected. If two or more terms are tied for the greatest frequency, no term is selected.

Threshold Voting

-- The most frequently occurring term or terms are selected, provided that frequency is equal to or greater than a specified threshold.

Each of these voting functions can be applied in one of two ways:

Simple Voting

-- The textentry from each report is assigned one vote, and the sum of votes for each (kind of) textentry is used to calculate the frequency with which it occurs.

Weighted Voting

-- The source of each report is assigned a numerical value between 0 and 1. The textentry from each report is then assigned this numerical value and the sum of these values is then used to calculate the frequency with which the textentry occurs.

Combining the voting functions with the ways they can be applied yields eight aggregation functions for merging conflicting information. Sets of fusion rules embodying these aggregation functions may be found on the page concerned with aggregation by voting.
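The eight aggregation functions can be sketched in Python by separating the tallying step (simple or weighted) from the selection step (one of the four voting functions). This is an illustrative sketch, not the fusion rules themselves. It assumes that textentries have already been mapped to the preferred term of their equivalence class, and that the weighted "frequency" of a term is its summed weight divided by the total weight of all reports; the latter is an assumption, though it is consistent with the results reported in Table 2. The function names are invented for this example.

```python
import math
from collections import defaultdict

# Weights used in the case study.
WEIGHTS = {"BBCi": 0.9, "BBC London": 0.9, "CNN": 0.5, "The Weather Channel": 0.75}


def shares(entries, weights=None):
    """Return each term's share of the vote; weights=None means simple voting."""
    totals = defaultdict(float)
    for source, term in entries.items():
        totals[term] += 1.0 if weights is None else weights[source]
    grand_total = sum(totals.values())
    return {term: votes / grand_total for term, votes in totals.items()}


def leaders(share):
    """Return the term or terms with the greatest share, sorted by name."""
    top = max(share.values())
    return sorted(t for t, s in share.items() if math.isclose(s, top))


def popular_sharing(share):
    return leaders(share)


def first_past_the_post(share):
    top = leaders(share)
    return top if len(top) == 1 else []  # a tie selects nothing


def majority(share):
    return [t for t in leaders(share) if share[t] > 0.5]


def threshold(share, minimum=0.75):
    return [t for t in leaders(share) if share[t] >= minimum]


# Textentries for today on 4/12/02.
reports = {"BBCi": "rain", "BBC London": "rain",
           "CNN": "showers", "The Weather Channel": "showers"}
simple = shares(reports)
weighted = shares(reports, WEIGHTS)
print(popular_sharing(simple), first_past_the_post(weighted))
```

For the 4/12/02 reports, simple voting gives rain and showers a share of 0.5 each, so only popular sharing selects anything; weighted voting gives rain 1.8 of the total weight 3.05 (about 0.59), which is enough for first-past-the-post and majority but not for a 0.75 threshold.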

Table 2: Results of eight voting functions for merging the textentry for `today`
Date	Report Textentries (BBCi, BBC LDN, CNN, Weather Channel)	Voting Aggregation Function
PopularSharing	FirstPastThePost	Majority	Threshold⁺
Simple Voting	Weighted Voting^*	Simple Voting	Weighted Voting^*	Simple Voting	Weighted Voting^*	Simple Voting	Weighted Voting^*
3/12/02	showers, showers, cloudy, cloudy	showers or cloud	showers	no first past the post	showers	no text entry has a majority	showers	no textentry has 75% of the votes	no textentry has 75% of the votes
4/12/02	rain, rain, showers, showers	rain or showers	rain	no first past the post	rain	no text entry has a majority	rain	no text entry has 75% of the votes	no textentry has 75% of the votes
5/12/02	cloudy, cloudy, sunny, mostly cloudy	cloud	cloud	cloud	cloud	cloud	cloud	cloud	cloud
6/12/02	cloudy, cloudy, cloudy, cloudy	cloud	cloud	cloud	cloud	cloud	cloud	cloud	cloud
9/12/02	sunny, sunny, sunny, partly cloudy	sun	sun	sun	sun	sun	sun	sun	sun
10/12/02	cloudy, cloudy, snow, rain/snow	cloud	cloud	cloud	cloud	no text entry has a majority	cloud	no text entry has 75% of the votes	no textentry has 75% of the votes
11/12/02	sunny, sunny, cloudy, mostly cloudy	sun or cloud	sun	no first past the post	sun	no text entry has a majority	sun	no text entry has 75% of the votes	no textentry has 75% of the votes
24/3/03	sunny, sunny, p/cloudy, fair	sun	sun	sun	sun	sun	sun	sun	sun
26/3/03	cloudy, cloudy, mostly cloudy, sunny	cloud	cloud	cloud	cloud	cloud	cloud	cloud	cloud
27/3/03	sunny, sunny, cloudy, mostly sunny	sun	sun	sun	sun	sun	sun	sun	sun

^*Weights assigned are: BBCi 0.9, BBC London 0.9, CNN 0.5, The Weather Channel 0.75.
⁺Threshold is 0.75.

It should be added that the data in both tables are actual data obtained from the websites of the sources on the dates indicated. The merged textentries are the results the fusion rules actually yielded, not (just) what they ought to yield. Obviously, greater divergence in the results could have been achieved by increasing the number of reports merged and by changing the threshold required in the case of threshold voting. As a general comment, we may say that adopting some form of weighted voting procedure made it more likely that a single, determinate result was obtained, but, of course, different weighting assignments could have produced very different results.

Contact a.hunter@cs.ucl.ac.uk or +44 20 7679 7295.
