Fusion Rule Technology





Weather Reports Case Study

The Weather Reports Case Study is a trial of fusion rule technology for merging weather reports obtained from websites including the BBC and The Weather Channel. In this trial, a knowledgebase and several sets of logical fusion rules for merging weather reports were developed that could handle a wide range of input weather reports obtained from the selected sites.

The weather reports obtained from the websites are already in the form of structured text though not in the form of XML. They were marked up in XML by hand, though it would be straightforward to automate the construction of each XML file from the HTML file. We use the following DTD for the weather reports:

<!ELEMENT weatherreport (source, date, city, today, temp, windspeed*, relativehumidity*,
daylighthours*, pressure*, visibility*, visibilitydistance*, airpollutionindex*,
sunindex*, sunindexrating*, dewpoint*)>
<!ELEMENT temp (max, min)>
<!ELEMENT daylighthours (sunrise, sunset)>
<!ELEMENT pressure (absolutevalue, directionofchange)>
<!ELEMENT source (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT today (#PCDATA)>
<!ELEMENT max (#PCDATA)>
<!ELEMENT min (#PCDATA)>
<!ELEMENT windspeed (#PCDATA)>
<!ELEMENT relativehumidity (#PCDATA)>
<!ELEMENT sunrise (#PCDATA)>
<!ELEMENT sunset (#PCDATA)>
<!ELEMENT absolutevalue (#PCDATA)>
<!ELEMENT directionofchange (#PCDATA)>
<!ELEMENT visibility (#PCDATA)>
<!ELEMENT visibilitydistance (#PCDATA)>
<!ELEMENT airpollutionindex (#PCDATA)>
<!ELEMENT sunindex (#PCDATA)>
<!ELEMENT sunindexrating (#PCDATA)>
<!ELEMENT dewpoint (#PCDATA)>
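
To illustrate how a report conforming to this DTD might be read before merging, the following is a minimal sketch rather than the loader used in the trial. The sample report, the function name read_report and the flattened key style ("temp/max") are assumptions made for illustration, and most of the values are invented. The sketch checks only that the required elements are present; it does not perform full DTD validation (element order, for instance, is not checked).

```python
import xml.etree.ElementTree as ET

# Hypothetical report: element names follow the DTD, values are illustrative.
REPORT_XML = """
<weatherreport>
  <source>BBCi</source>
  <date>26/3/03</date>
  <city>London</city>
  <today>cloudy</today>
  <temp><max>12C</max><min>4C</min></temp>
  <windspeed>15mph</windspeed>
  <daylighthours><sunrise>5.50</sunrise><sunset>6.20</sunset></daylighthours>
</weatherreport>
"""

# Elements the DTD lists without a "*" (i.e. exactly one occurrence expected).
REQUIRED = ("source", "date", "city", "today", "temp")


def read_report(xml_text):
    """Return a flat dict of textentries keyed by element path, e.g. 'temp/max'."""
    root = ET.fromstring(xml_text)
    if root.tag != "weatherreport":
        raise ValueError("root element must be weatherreport")
    missing = [name for name in REQUIRED if root.find(name) is None]
    if missing:
        raise ValueError(f"missing required elements: {missing}")
    entries = {}
    for child in root:
        if len(child):  # composite element such as temp, daylighthours, pressure
            for leaf in child:
                entries[f"{child.tag}/{leaf.tag}"] = (leaf.text or "").strip()
        else:
            entries[child.tag] = (child.text or "").strip()
    return entries


report = read_report(REPORT_XML)
print(report["today"], report["daylighthours/sunset"])
```

Flattening the composite elements into path-style keys is only a convenience for the later merging steps; a nested structure would serve equally well.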

In addition to the DTD, the textentries for the structured news reports were delineated as follows:

  • Cities: major U.K. cities.
  • Accepted Sources: main U.K. TV and radio broadcasters, and national newspapers.
  • Today: today's weather conditions in the form of a word or phrase from a selection of 66 possibilities (e.g. cloudy, scattered showers, mostly wet, prolonged rain, etc.).
  • Date: acceptable formats for day (e.g. 1, 1st, etc.), month (e.g. 1, Jan, January, etc.) and year (e.g. 1998, 98, etc.); these can be separated by " , ", " . ", " - ", " / ", " \ " and " ".
  • Visibility (quantitative): e.g. 1 Mile, 1 km, etc., with km converted to miles.
  • Visibility (qualitative): from the selection of good, fair, very good, excellent, unlimited, moderate, and poor.
  • Pressure Absolute Value: e.g. 1021.0 mb, etc.
  • Pressure Direction of Change: from the selection of rising, falling, and steady.
  • Max and Min Temperatures: e.g. -1C, 23.9F, etc., with F converted to C.
  • Windspeed: e.g. 15mph, 23kph, etc., with KPH converted to MPH.
  • Humidity: e.g. 80%, etc.
  • Time: e.g. 10.13, 22.13, etc., with 24hr converted to 12hr.
  • Sunindex: from the selection of low, high, and medium.
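
The unit conversions mentioned above (F to C, KPH to MPH, km to miles, 24hr to 12hr) can be sketched as follows. This is an illustration under stated assumptions, not the conversion code of the Prolog knowledgebase: the function names, the rounding precision and the decision to emit 12hr times without an am/pm marker are all hypothetical choices.

```python
import re

KM_TO_MILES = 0.621371  # also used for kph -> mph


def _split(text, units):
    """Split e.g. '23.9F' into (23.9, 'f'); units is a regex alternation."""
    match = re.fullmatch(rf"\s*(-?\d+(?:\.\d+)?)\s*({units})\s*", text, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised textentry: {text!r}")
    return float(match.group(1)), match.group(2).lower()


def temperature_c(text):
    """Return the temperature in Celsius, converting from Fahrenheit if needed."""
    value, unit = _split(text, "c|f")
    return round(value if unit == "c" else (value - 32) * 5 / 9, 1)


def windspeed_mph(text):
    """Return the windspeed in mph, converting from kph if needed."""
    value, unit = _split(text, "mph|kph")
    return round(value if unit == "mph" else value * KM_TO_MILES, 1)


def visibility_miles(text):
    """Return the visibility distance in miles, converting from km if needed."""
    value, unit = _split(text, "miles|mile|km")
    return round(value if unit.startswith("mile") else value * KM_TO_MILES, 2)


def time_12hr(text):
    """Convert 'HH.MM' on a 24hr clock to 12hr form (assumption: no am/pm suffix)."""
    hours, minutes = text.strip().split(".")
    hour = int(hours) % 12
    return f"{12 if hour == 0 else hour}.{minutes}"


print(temperature_c("23.9F"), windspeed_mph("23kph"), time_12hr("22.13"))
```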

Some example sets of weather reports and the Prolog knowledgebase (this uses several built-in predicates from LPA WIN-PROLOG 4300) can be found at the following links:

  • Weather reports
  • Prolog knowledgebase

Conflicts between information sources can be handled in a variety of ways. The first set of rules deals with conflicts by using one of a number of simple voting procedures, popular sharing, which selects the most frequently occurring term or terms. The second set of rules deals with conflicts by using a weighted voting procedure, weighted first-past-the-post. Each source is given a weighting between 0 and 1, and this weighting is transferred to the textentry from each report. The textentry with the highest total is selected, but if two or more textentries tie, none is selected. (Rules embodying other approaches to voting are available here.) The third set of rules resolves conflicts by selecting textentries from reports with preferred sources. The fourth set of rules deals with conflicts by selecting (if possible) the most general textentry, where generality is defined using a hierarchical semantic network in the knowledgebase. (Because of the nature of the reports, only rule 3, dealing with conflicts between textentries for today's weather, is amenable to this approach; all the other rules in this set are the same as for the first, voting, approach.) The fifth set of rules aggregates conflicting textentries simply by taking the disjunction of the input textentries. Because no attention is paid to the meaning of the textentries, we refer to this approach as "syntactic generalization".

    The fusion rules for merging weather reports, in both logical form (these are easier to read) and in XML (for the mark-up language), can be found at the following links. (For an explanation of how to read a fusion rule click here.)

  • Rules that resolve conflicts by a simple voting procedure (popular sharing)
      • Fusion rules in logical form
      • Fusion rules in XML

  • Rules that resolve conflicts using a weighted voting procedure (weighted first-past-the-post)
      • Fusion rules in logical form
      • Fusion rules in XML

  • Rules that resolve conflicts by using preferences over sources
      • Fusion rules in logical form
      • Fusion rules in XML

  • Rules that resolve conflicts by using most general term/least upper bound (semantic generalization)
      • Fusion rules in logical form
      • Fusion rules in XML

  • Rules that resolve conflicts by syntactic generalization (disjunction)
      • Fusion rules in logical form
      • Fusion rules in XML


    The Merged Reports

    The merged reports can be found at the links below. For ease of comparison, the merged reports that resulted from using the rules embodying the five different approaches have been combined into a single file. In the case of the preference rules, the linear preference ordering was selected purely for the sake of illustration, and so is completely arbitrary. That ordering is (with most preferred first): The Weather Channel, BBCi, BBC London, CNN. The weights selected for the weighted voting rules are equally arbitrary. Those weights are: BBCi 0.9, BBC London 0.9, CNN 0.5, The Weather Channel 0.75.

    The reports for March 26th nicely illustrate some differences between the approaches. Because three out of the four reports said that the weather was cloudy or mostly cloudy, merging the reports using the rules for simple voting (popular sharing) determined that the textentry for today is "cloud". Weighted voting produced the same result, given that the two BBC sources were given the most weight, and they agreed that the weather was cloudy. However, the lone report that disagreed was from The Weather Channel (the most preferred source), so merging the reports using the rules that resolve conflicts by preferences over sources resulted in the textentry for today being "sunny". Because "cloudy" and "sunny" cannot usefully be subsumed under any more general kind of weather, merging the reports using the fourth set of rules (semantic generalization) produced the textentry "no least upper bound". Finally, the syntactic generalization of the four textentries is "cloudy, mostly cloudy, or sunny".

    A second conflict between these reports concerns their textentries for daylighthours. While the two BBC sources are almost in agreement, CNN's times are an hour later (presumably the result of mistaken data entry, since local time on this date still corresponded to Greenwich Mean Time). Resolving this conflict by voting, then, results in using the interval of the two BBC reports (6.20 - 6.21, for sunset). However, in resolving the conflict by preferences over sources there is no textentry for daylighthours from the most preferred source (The Weather Channel), so the fusion rules select the textentry from the most preferred source that has one, namely BBCi (6.20). The least upper bound rules do not apply to this case, but the logical or syntactic generalization produces the textentry "6.21, 6.20 or 7.24".
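
The two behaviours described for the sunset textentries can be sketched in a few lines. This is an illustration only, not the Prolog predicates of the knowledgebase: the function names and the use of None for a missing textentry are assumptions, and the order in which the disjuncts are listed may differ from the order the fusion rules produce.

```python
# Preference ordering from the case study, most preferred first.
PREFERENCE_ORDER = ["The Weather Channel", "BBCi", "BBC London", "CNN"]


def preferred_source(entries, order=PREFERENCE_ORDER):
    """Return the textentry of the most preferred source that supplies one."""
    for source in order:
        value = entries.get(source)
        if value is not None:
            return value
    return None


def syntactic_generalization(entries):
    """Return the disjunction of the distinct textentries, ignoring their meaning."""
    distinct = []
    for value in entries.values():
        if value is not None and value not in distinct:
            distinct.append(value)
    if not distinct:
        return None
    if len(distinct) == 1:
        return distinct[0]
    return ", ".join(distinct[:-1]) + " or " + distinct[-1]


# Sunset textentries for March 26th; The Weather Channel supplied none.
sunset = {"BBCi": "6.20", "BBC London": "6.21", "CNN": "7.24",
          "The Weather Channel": None}

print(preferred_source(sunset))
print(syntactic_generalization(sunset))
```

Falling through the ordering until a source with a textentry is found is what lets the preference approach still return a value when the most preferred source is silent.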

    A more fine-grained approach to resolving conflicts via preferences over sources is to have different preference orderings depending on the subject; BBCi may be preferred to The Weather Channel for windspeed, say, whilst The Weather Channel may be preferred to BBCi for relativehumidity. Such preferences may very well be arbitrary in this particular case, but in merging news reports about war, for example, one might easily prefer one source for casualty figures about one side while preferring another source for casualty figures about the other side. Predicates for resolving conflicts in this way can be found in the knowledgebase, though they have not been embodied in rules for this case study.
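
A subject-dependent preference ordering of this kind might be sketched as follows. The table of orderings, the fallback to a default ordering and the function name are hypothetical; the corresponding predicates in the knowledgebase may be organised quite differently.

```python
# Default ordering from the case study, most preferred first.
DEFAULT_ORDER = ["The Weather Channel", "BBCi", "BBC London", "CNN"]

# Hypothetical per-subject orderings, following the example in the text.
SUBJECT_ORDERS = {
    "windspeed": ["BBCi", "The Weather Channel", "BBC London", "CNN"],
    "relativehumidity": ["The Weather Channel", "BBCi", "BBC London", "CNN"],
}


def select_by_subject(subject, entries):
    """Return (source, textentry) from the most preferred source for this subject."""
    order = SUBJECT_ORDERS.get(subject, DEFAULT_ORDER)
    for source in order:
        value = entries.get(source)
        if value is not None:
            return source, value
    return None


# Illustrative values only.
windspeed = {"BBCi": "15mph", "The Weather Channel": "12mph"}
humidity = {"BBCi": "80%", "The Weather Channel": "76%"}

print(select_by_subject("windspeed", windspeed))
print(select_by_subject("relativehumidity", humidity))
```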

  • Merged reports for December 3rd 2002
  • Merged reports for December 4th 2002
  • Merged reports for December 5th 2002
  • Merged reports for December 6th 2002
  • Merged reports for December 9th 2002
  • Merged reports for December 10th 2002
  • Merged reports for December 11th 2002
  • Merged reports for March 24th 2003
  • Merged reports for March 26th 2003
  • Merged reports for March 27th 2003

    Table 1 summarizes the results of using the five different sets of rules to merge the textentries for the report element today. The reports for December 4th best illustrate the differences between the approaches. Simple voting by popular sharing produces a tie, so the textentry "rain or showers" is selected. Weighted voting selects "rain" because, even though as many reports predict showers, the BBC reports have been assigned the most weight. Because the fourth report (The Weather Channel) was selected as the preferred source, resolving the conflict by preferences over sources results in the textentry "showers" being chosen. And because (arguably) RAIN is a more general concept than SHOWERS, "rain" is the textentry selected by using the semantic generalization of the textentries. The fifth approach, syntactic generalization of the textentries, in this case produces the same result, "rain or showers", as the first approach. By contrast, the reports for December 6th do not conflict, hence none of these five approaches to resolving conflicts is required, and the preferred term for the equivalence class to which all the textentries belong is used instead. Finally, it is worth noting that the great variety of textentries for today's weather means that, in most cases, there is no more general term that subsumes all the textentries yet is at the same time specific enough to be informative, and so in most cases there is no semantic generalization, that is, no least upper bound.
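
Semantic generalization can be sketched as a least-upper-bound search over a hierarchy of terms. The hierarchy below is a hypothetical fragment, assumed to be a tree in which each term has at most one more general parent; the semantic network in the knowledgebase is larger and need not have this shape.

```python
# Hypothetical fragment of a generality hierarchy: term -> more general term.
PARENT = {
    "showers": "rain",
    "scattered showers": "showers",
    "prolonged rain": "rain",
}


def ancestors(term):
    """Return the chain from term up to its most general ancestor."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def least_upper_bound(terms):
    """Return the most specific term that subsumes every input term, if any."""
    if not terms:
        return None
    chains = [ancestors(term) for term in terms]
    # Walk upward from the first term; the first shared ancestor is the bound.
    for candidate in chains[0]:
        if all(candidate in chain for chain in chains[1:]):
            return candidate
    return "no least upper bound"


print(least_upper_bound(["rain", "rain", "showers", "showers"]))
print(least_upper_bound(["cloudy", "cloudy", "mostly cloudy", "sunny"]))
```

Leaving the hierarchy without a single all-embracing root is deliberate: a root such as "weather" would subsume everything but would be too general to be informative, which is why most conflicting days yield no least upper bound.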

    Table 1: Results of five approaches to merging the textentry for today
    (The last five columns are the aggregation functions.)

    Date | Report Textentries (BBCi, BBC LDN, CNN, Weather Channel) | Simple Voting (PopularSharing) | Weighted Voting* (FirstPastThePost) | Preferred Source+ | Semantic Generalization | Syntactic Generalization
    3/12/02 | showers, showers, cloudy, cloudy | showers or cloud | showers | cloudy | no least upper bound | showers or cloudy
    4/12/02 | rain, rain, showers, showers | rain or showers | rain | showers | rain | rain or showers
    5/12/02 | cloudy, cloudy, sunny, mostly cloudy | cloud | cloud | mostly cloudy | no least upper bound | cloudy, sunny or mostly cloudy
    6/12/02 | cloudy, cloudy, cloudy, cloudy | cloud | cloud | cloud | cloud | cloud
    9/12/02 | sunny, sunny, sunny, partly cloudy | sun | sun | partly cloudy | no least upper bound | sunny or partly cloudy
    10/12/02 | cloudy, cloudy, snow, rain/snow | cloud | cloud | rain/snow | no least upper bound | cloudy, snow or rain/snow
    11/12/02 | sunny, sunny, cloudy, mostly cloudy | sun or cloud | sun | mostly cloudy | no least upper bound | sunny, cloudy or mostly cloudy
    24/3/03 | sunny, sunny, p/cloudy, fair | sun | sun | fair | no least upper bound | sunny, p/cloudy or fair
    26/3/03 | cloudy, cloudy, mostly cloudy, sunny | cloud | cloud | sunny | no least upper bound | cloudy, mostly cloudy or sunny
    27/3/03 | sunny, sunny, cloudy, mostly sunny | sun | sun | mostly sunny | no least upper bound | sunny, cloudy or mostly sunny

    *Weights assigned are: BBCi 0.9, BBC London 0.9, CNN 0.5, The Weather Channel 0.75.
    +Order of preferences is (most preferred first): The Weather Channel, BBCi, BBC London, CNN.

    Many other voting functions could be defined. Table 2 summarizes the results of applying rules embodying eight voting functions for merging the textentry for today's weather. Informally, we may define these functions as follows:

  • Majority Voting: The most frequently occurring term is selected, provided that frequency is greater than 0.5.

  • Popular Sharing: The most frequently occurring term or terms are selected.

  • First Past the Post: The most frequently occurring term is selected. If two or more terms are tied for the greatest frequency, no term is selected.

  • Threshold Voting: The most frequently occurring term or terms are selected, provided that frequency is equal to or greater than a specified threshold.

    Each of these voting functions can be applied in one of two ways:

  • Simple Voting: The textentry from each report is assigned one vote, and the sum of votes for each (kind of) textentry is used to calculate the frequency with which it occurs.

  • Weighted Voting: The source of each report is assigned a numerical value between 0 and 1. The textentry from each report is then assigned this numerical value and the sum of these values is then used to calculate the frequency with which the textentry occurs.

    Combining the voting functions with the ways they can be applied yields eight aggregation functions for merging conflicting information. Sets of fusion rules embodying these aggregation functions may be found on the page concerned with aggregation by voting.
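
The eight combinations can be sketched compactly by separating the frequency calculation (simple or weighted) from the selection function. This is an illustration rather than the fusion rules themselves. Two points are assumptions: that a weighted frequency is the term's total weight divided by the sum of all weights, and the small equivalence mapping from textentries to preferred terms, which stands in for the much larger one in the knowledgebase. Each function returns a list of selected terms, with an empty list meaning that nothing is selected.

```python
from collections import defaultdict

# Weights from the case study.
WEIGHTS = {"BBCi": 0.9, "BBC London": 0.9, "CNN": 0.5, "The Weather Channel": 0.75}

# Hypothetical fragment of the equivalence classes: textentry -> preferred term.
PREFERRED_TERM = {"cloudy": "cloud", "mostly cloudy": "cloud", "sunny": "sun"}


def frequencies(entries, weights=None):
    """Map each preferred term to its share of the (simple or weighted) vote."""
    totals = defaultdict(float)
    for source, term in entries.items():
        vote = 1.0 if weights is None else weights[source]
        totals[PREFERRED_TERM.get(term, term)] += vote
    grand_total = sum(totals.values())
    return {term: total / grand_total for term, total in totals.items()}


def leaders(freqs):
    """Return the term or terms with the greatest frequency, in input order."""
    top = max(freqs.values())
    return [term for term, freq in freqs.items() if abs(freq - top) < 1e-9]


def popular_sharing(freqs):
    return leaders(freqs)


def first_past_the_post(freqs):
    top = leaders(freqs)
    return top if len(top) == 1 else []


def majority(freqs):
    top = leaders(freqs)
    return top if len(top) == 1 and freqs[top[0]] > 0.5 else []


def threshold(freqs, minimum=0.75):
    return [term for term in leaders(freqs) if freqs[term] >= minimum - 1e-9]


# Textentries for today on December 4th 2002.
dec4 = {"BBCi": "rain", "BBC London": "rain", "CNN": "showers",
        "The Weather Channel": "showers"}
simple, weighted = frequencies(dec4), frequencies(dec4, WEIGHTS)

for function in (popular_sharing, first_past_the_post, majority, threshold):
    print(function.__name__, function(simple), function(weighted))
```

Under these assumptions the December 4th reports give a tie under simple voting and "rain" with about 59% of the weighted vote, which is enough for a weighted majority but not for the 0.75 threshold.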

    Table 2: Results of eight voting functions for merging the textentry for today
    (The last eight columns are the voting aggregation functions.)

    Date | Report Textentries (BBCi, BBC LDN, CNN, Weather Channel) | PopularSharing, Simple Voting | PopularSharing, Weighted Voting* | FirstPastThePost, Simple Voting | FirstPastThePost, Weighted Voting* | Majority, Simple Voting | Majority, Weighted Voting* | Threshold+, Simple Voting | Threshold+, Weighted Voting*
    3/12/02 | showers, showers, cloudy, cloudy | showers or cloud | showers | no first past the post | showers | no text entry has a majority | showers | no textentry has 75% of the votes | no textentry has 75% of the votes
    4/12/02 | rain, rain, showers, showers | rain or showers | rain | no first past the post | rain | no text entry has a majority | rain | no text entry has 75% of the votes | no textentry has 75% of the votes
    5/12/02 | cloudy, cloudy, sunny, mostly cloudy | cloud | cloud | cloud | cloud | cloud | cloud | cloud | cloud
    6/12/02 | cloudy, cloudy, cloudy, cloudy | cloud | cloud | cloud | cloud | cloud | cloud | cloud | cloud
    9/12/02 | sunny, sunny, sunny, partly cloudy | sun | sun | sun | sun | sun | sun | sun | sun
    10/12/02 | cloudy, cloudy, snow, rain/snow | cloud | cloud | cloud | cloud | no text entry has a majority | cloud | no text entry has 75% of the votes | no textentry has 75% of the votes
    11/12/02 | sunny, sunny, cloudy, mostly cloudy | sun or cloud | sun | no first past the post | sun | no text entry has a majority | sun | no text entry has 75% of the votes | no textentry has 75% of the votes
    24/3/03 | sunny, sunny, p/cloudy, fair | sun | sun | sun | sun | sun | sun | sun | sun
    26/3/03 | cloudy, cloudy, mostly cloudy, sunny | cloud | cloud | cloud | cloud | cloud | cloud | cloud | cloud
    27/3/03 | sunny, sunny, cloudy, mostly sunny | sun | sun | sun | sun | sun | sun | sun | sun

    *Weights assigned are: BBCi 0.9, BBC London 0.9, CNN 0.5, The Weather Channel 0.75.
    +Threshold is 0.75.

    It should be added that the data in both tables is actual data obtained from the websites of the sources on the dates indicated. The merged textentries are the results the Fusion Rules did yield, not (just) what they ought to yield. Obviously, greater divergence in the results obtained could have been achieved by increasing the number of reports merged and changing the threshold required in the case of threshold voting. As a general comment, we may say that adopting some form of weighted voting procedure made it more likely that a single, determinate result was obtained, but, of course, different weighting assignments would have produced drastically different results.


    Contact a.hunter@cs.ucl.ac.uk or +44 20 7679 7295.
