Contingency Table
                        Forecast
                  yes             no                  Total
Observed   yes    hits            misses              observed yes
           no     false alarms    correct negatives   observed no
Total             forecast yes    forecast no         total

The contingency table is a useful way to see what sorts of errors are being made. A perfect forecast system would produce only hits and correct negatives, and no misses or false alarms.
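As a minimal sketch of how the four cells might be tallied from paired yes/no forecasts and observations, the following counts each combination. The function name, argument names, and sample data are illustrative assumptions, not part of the original text.

```python
from collections import Counter

def contingency_table(forecast, observed):
    """Count the four cells of a yes/no contingency table.

    `forecast` and `observed` are equal-length sequences of booleans
    (True means "yes"). Names are hypothetical, chosen for illustration.
    """
    if len(forecast) != len(observed):
        raise ValueError("forecast and observed must have the same length")
    # Each pair is (forecast, observed).
    counts = Counter(zip(forecast, observed))
    return {
        "hits": counts[(True, True)],               # forecast yes, observed yes
        "misses": counts[(False, True)],            # forecast no, observed yes
        "false_alarms": counts[(True, False)],      # forecast yes, observed no
        "correct_negatives": counts[(False, False)],  # forecast no, observed no
    }

# Made-up sample data for illustration only.
forecast = [True, True, False, False, True, False]
observed = [True, False, True, False, True, False]
table = contingency_table(forecast, observed)
print(table)
```

A perfect forecast system would leave the "misses" and "false_alarms" entries at zero.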

A large variety of categorical statistics are computed from the elements in the contingency table to describe particular aspects of forecast performance.

Some of the categorical statistics that can be computed from the yes/no contingency table are:

- - - - - - - - - - -

Bias score

Measures the ratio of the frequency of forecast events to the frequency of observed events.

Range: 0 to infinity.  Perfect score: 1.

Characteristics: Indicates whether the forecast system has a tendency to underforecast (BIAS<1) or overforecast (BIAS>1) events. Does not measure how well the forecast corresponds to the observations, only measures relative frequencies.
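In terms of the contingency table, the bias score is commonly written as BIAS = (hits + false alarms) / (hits + misses). A minimal sketch, with a hypothetical function name and illustrative counts that are not taken from the text:

```python
def bias_score(hits, misses, false_alarms):
    """Frequency bias: number of "yes" forecasts over number of observed events."""
    observed_yes = hits + misses
    if observed_yes == 0:
        return float("nan")  # undefined when the event was never observed
    return (hits + false_alarms) / observed_yes

# Illustrative counts only.
bias = bias_score(hits=82, misses=23, false_alarms=38)
print(round(bias, 2))  # 1.14, i.e. a slight tendency to overforecast
```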

- - - - - - - - - - -
Probability of detection

Measures the fraction of observed events that were correctly forecast.

Range: 0 to 1.  Perfect score: 1.

Characteristics: Sensitive to hits, good for rare events. Ignores false alarms. Can be artificially improved by issuing more "yes" forecasts to increase the number of hits.
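The probability of detection is commonly written as POD = hits / (hits + misses). The sketch below uses a hypothetical function name and illustrative counts; it also shows how forecasting "yes" every time pushes the score to 1, since no event can then be missed.

```python
def probability_of_detection(hits, misses):
    """Fraction of observed events that were forecast (also called hit rate)."""
    observed_yes = hits + misses
    if observed_yes == 0:
        return float("nan")  # undefined when the event was never observed
    return hits / observed_yes

# Illustrative counts only.
pod = probability_of_detection(hits=82, misses=23)
print(round(pod, 2))  # 0.78

# Always forecasting "yes" turns every observed event into a hit.
pod_always_yes = probability_of_detection(hits=105, misses=0)
print(pod_always_yes)  # 1.0
```

Because false alarms do not enter the formula, POD is usually read together with the false alarm ratio.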

- - - - - - - - - - -
False alarm ratio

Measures the fraction of "yes" forecasts in which the event did not occur.

Range: 0 to 1.  Perfect score: 0.

Characteristics: Sensitive to false alarms. Ignores misses. Can be artificially improved by issuing more "no" forecasts to reduce the number of false alarms.
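The false alarm ratio is commonly written as FAR = false alarms / (hits + false alarms). A minimal sketch, with a hypothetical function name and illustrative counts:

```python
def false_alarm_ratio(hits, false_alarms):
    """Fraction of "yes" forecasts for which the event did not occur."""
    forecast_yes = hits + false_alarms
    if forecast_yes == 0:
        return float("nan")  # undefined when "yes" was never forecast
    return false_alarms / forecast_yes

# Illustrative counts only.
far = false_alarm_ratio(hits=82, false_alarms=38)
print(round(far, 2))  # 0.32
```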

- - - - - - - - - - -
Threat score (critical success index)

Measures the fraction of observed and/or forecast events that were correctly forecast.

Range: 0 to 1, 0 indicates no skill. Perfect score: 1.

Characteristics: Sensitive to hits, penalizes both misses and false alarms. Does not distinguish source of forecast error. Depends on climatological frequency of events (worse scores for rarer events) since some hits can occur purely due to random chance.
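The threat score is commonly written as TS = hits / (hits + misses + false alarms); correct negatives are not considered. A minimal sketch, with a hypothetical function name and illustrative counts:

```python
def threat_score(hits, misses, false_alarms):
    """Critical success index: hits over all cases where the event was
    observed and/or forecast."""
    denominator = hits + misses + false_alarms
    if denominator == 0:
        return float("nan")  # undefined when the event was never forecast or observed
    return hits / denominator

# Illustrative counts only.
ts = threat_score(hits=82, misses=23, false_alarms=38)
print(round(ts, 2))  # 0.57

# Swapping misses and false alarms leaves the score unchanged,
# which is why it cannot distinguish the source of forecast error.
print(threat_score(82, 38, 23) == ts)  # True
```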

- - - - - - - - - - -
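Equitable threat score (Gilbert skill score)

ETS = (hits - hits_random) / (hits + misses + false alarms - hits_random)

where hits_random = (hits + misses) x (hits + false alarms) / total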

Measures the fraction of observed and/or forecast events that were correctly predicted, adjusted for hits associated with random chance. This score is used in the verification of rainfall in numerical weather prediction models because its "equitability" allows scores to be compared more fairly across different regimes.

Range: -1/3 to 1, 0 indicates no skill.   Perfect score: 1.

Characteristics: Sensitive to hits, penalizes both misses and false alarms, accounts for climatological event frequency. Does not distinguish source of forecast error.
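A minimal sketch of the equitable threat score, assuming the usual definition in which the number of hits expected by chance is subtracted from both the numerator and the denominator of the threat score. The function name and the counts are illustrative assumptions.

```python
def equitable_threat_score(hits, misses, false_alarms, correct_negatives):
    """Threat score adjusted for the hits expected from random chance."""
    total = hits + misses + false_alarms + correct_negatives
    if total == 0:
        return float("nan")
    # Hits expected if forecasts and observations were independent.
    hits_random = (hits + misses) * (hits + false_alarms) / total
    denominator = hits + misses + false_alarms - hits_random
    if denominator == 0:
        return float("nan")  # degenerate table, score undefined
    return (hits - hits_random) / denominator

# Illustrative counts only.
ets = equitable_threat_score(hits=82, misses=23, false_alarms=38,
                             correct_negatives=222)
print(round(ets, 2))  # 0.44
```

With these counts the unadjusted threat score would be about 0.57, so the chance correction lowers the score noticeably.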

- - - - - - - - - - -
Hanssen and Kuipers discriminant (true skill statistic)

Measures the ability of the forecast to separate the "yes" cases from the "no" cases. Can also be interpreted as Accuracy(events) + Accuracy(non-events) - 1.

Range: -1 to 1, 0 indicates no skill. Perfect score: 1.

Characteristics: Uses all elements in the contingency table. Does not depend on climatological event frequency. For rare events, HK is unduly weighted toward the first term, Accuracy(events), which is the same as POD. Can be expressed in a form similar to the ETS, except that the hits_random term is unbiased.
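The Hanssen and Kuipers discriminant is commonly written as HK = hits / (hits + misses) - false alarms / (false alarms + correct negatives), that is, the probability of detection minus the probability of false detection. A minimal sketch, with a hypothetical function name and illustrative counts:

```python
def hanssen_kuipers(hits, misses, false_alarms, correct_negatives):
    """True skill statistic: hit rate minus false alarm rate."""
    observed_yes = hits + misses
    observed_no = false_alarms + correct_negatives
    if observed_yes == 0 or observed_no == 0:
        return float("nan")  # needs both events and non-events
    pod = hits / observed_yes            # accuracy for events
    pofd = false_alarms / observed_no    # 1 - accuracy for non-events
    return pod - pofd

# Illustrative counts only.
hk = hanssen_kuipers(hits=82, misses=23, false_alarms=38,
                     correct_negatives=222)
print(round(hk, 2))  # 0.63
```

When events are rare, the non-event count is large, so the second term tends to be small and the score stays close to POD.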

- - - - - - - - - - -
Heidke skill score

Measures the fraction of correct forecasts after eliminating those forecasts which would be correct due purely to random chance. This is an (algebraically simplified) form of a generalized skill score, where the score in the numerator is the number of correct forecasts, and the reference forecast in this case is random chance.

Range: -1 to 1, 0 indicates no skill.  Perfect score: 1.

Characteristics: Measures improvement over random chance. Random chance is usually not the best forecast to compare to - it may be better to use climatology (long-term average value) or persistence (forecast = most recent observation, i.e., no change) or some other standard.
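A minimal sketch of the Heidke skill score, assuming the usual form HSS = (correct - expected correct) / (total - expected correct), where "expected correct" is the number of correct forecasts that random chance would produce given the same marginal totals. The function name and the counts are illustrative assumptions.

```python
def heidke_skill_score(hits, misses, false_alarms, correct_negatives):
    """Fraction of correct forecasts after removing those expected by chance."""
    total = hits + misses + false_alarms + correct_negatives
    if total == 0:
        return float("nan")
    correct = hits + correct_negatives
    # Correct "yes" and correct "no" forecasts expected from random chance.
    expected_correct = (
        (hits + misses) * (hits + false_alarms)
        + (correct_negatives + misses) * (correct_negatives + false_alarms)
    ) / total
    denominator = total - expected_correct
    if denominator == 0:
        return float("nan")  # degenerate table, score undefined
    return (correct - expected_correct) / denominator

# Illustrative counts only.
hss = heidke_skill_score(hits=82, misses=23, false_alarms=38,
                         correct_negatives=222)
print(round(hss, 2))  # 0.61
```

Replacing the chance-based reference with climatology or persistence would require a different "expected correct" term; that variant is not shown here.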

- - - - - - - - - - -