| | Forecast yes | Forecast no | Total |
|---|---|---|---|
| Observed yes | hits | misses | observed yes |
| Observed no | false alarms | correct negatives | observed no |
| Total | forecast yes | forecast no | total |

The contingency table is a useful way to see what sorts of errors are being made. A perfect forecast system would produce only hits and correct negatives, and no misses or false alarms.
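As a minimal sketch of how the four table elements can be tallied from paired yes/no forecasts and observations: the function name `contingency_counts` and the sample values below are illustrative assumptions, not part of any particular verification package.

```python
from collections import Counter


def contingency_counts(forecast, observed):
    """Tally the four contingency table elements from paired yes/no values."""
    if len(forecast) != len(observed):
        raise ValueError("forecast and observed must have the same length")
    # Map each (forecast, observed) pair to its cell in the table.
    labels = {
        (True, True): "hits",
        (False, True): "misses",
        (True, False): "false_alarms",
        (False, False): "correct_negatives",
    }
    counts = Counter(labels[(bool(f), bool(o))] for f, o in zip(forecast, observed))
    # A Counter returns 0 for cells that never occurred.
    return {name: counts[name] for name in labels.values()}


# Made-up sample of five forecast/observation pairs, for illustration only.
forecast = [True, True, False, False, True]
observed = [True, False, True, False, True]
table = contingency_counts(forecast, observed)
print(table)
```

The marginal totals follow by addition, for example observed yes = hits + misses.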
A large variety of categorical statistics can be computed from the elements of the contingency table, each describing a particular aspect of forecast performance. Some of the statistics that can be computed from the yes/no contingency table are:
Bias score - Measures the ratio of the frequency of forecast events to the frequency of observed events.
Range: 0 to infinity. Perfect score: 1.
Characteristics: Indicates whether the forecast system has a tendency to underforecast (BIAS<1) or overforecast (BIAS>1) events. Does not measure how well the forecast corresponds to the observations; it measures only relative frequencies.
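The ratio described above can be sketched in a few lines, taking the frequency of forecast events as hits + false alarms and the frequency of observed events as hits + misses. The function name and the counts are made up for illustration.

```python
import math


def bias_score(hits, misses, false_alarms):
    """Frequency bias: number of "yes" forecasts divided by number of observed events."""
    forecast_yes = hits + false_alarms
    observed_yes = hits + misses
    if observed_yes == 0:
        return math.nan  # undefined when the event was never observed
    return forecast_yes / observed_yes


# Hypothetical counts, for illustration only.
bias = bias_score(hits=82, misses=23, false_alarms=38)
print(f"BIAS = {bias:.2f}")
```

With these counts the score is above 1, so this hypothetical system forecasts the event more often than it is observed.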
- - - - - - - - - - -
Probability of detection (POD) - Measures the fraction of observed events that were correctly forecast.
Range: 0 to 1. Perfect score: 1.
Characteristics: Sensitive to hits, good for rare events. Ignores false alarms. Can be artificially improved by issuing more "yes" forecasts to increase the number of hits.
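A small sketch of this score, assuming it is computed as hits / (hits + misses), also shows how it can be inflated: a forecast that always says "yes" has no misses and scores perfectly, whatever the number of false alarms. The function name and counts are hypothetical.

```python
import math


def probability_of_detection(hits, misses):
    """Fraction of observed events that were forecast: hits / (hits + misses)."""
    observed_yes = hits + misses
    if observed_yes == 0:
        return math.nan  # undefined when the event was never observed
    return hits / observed_yes


# Hypothetical counts: 105 observed events, 82 of them forecast.
pod = probability_of_detection(hits=82, misses=23)

# Forecasting "yes" every time turns every observed event into a hit.
pod_always_yes = probability_of_detection(hits=105, misses=0)

print(f"POD = {pod:.2f}, always-yes POD = {pod_always_yes:.2f}")
```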
- - - - - - - - - - -
False alarm ratio (FAR) - Measures the fraction of "yes" forecasts in which the event did not occur.
Range: 0 to 1. Perfect score: 0.
Characteristics: Sensitive to false alarms. Ignores misses. Can be artificially improved by issuing more "no" forecasts to reduce the number of false alarms.
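The same kind of sketch applies here, assuming the ratio is false alarms / (hits + false alarms). Misses do not appear anywhere in the calculation, which is why the score ignores them. Names and counts are illustrative only.

```python
import math


def false_alarm_ratio(hits, false_alarms):
    """Fraction of "yes" forecasts for which the event did not occur."""
    forecast_yes = hits + false_alarms
    if forecast_yes == 0:
        return math.nan  # undefined when "yes" was never forecast
    return false_alarms / forecast_yes


# Hypothetical counts: 120 "yes" forecasts, 38 of them false alarms.
far = false_alarm_ratio(hits=82, false_alarms=38)
print(f"FAR = {far:.2f}")
```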
- - - - - - - - - - -
Threat score (critical success index) - Measures the fraction of observed and/or forecast events that were correctly forecast.
Range: 0 to 1, 0 indicates no skill. Perfect score: 1.
Characteristics: Sensitive to hits, penalizes both misses and false alarms. Does not distinguish the source of forecast error. Depends on the climatological frequency of events (worse scores for rarer events) since some hits can occur purely due to random chance.
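A minimal sketch, assuming the score is hits / (hits + misses + false alarms): correct negatives are left out entirely, so only cases where the event was observed and/or forecast count. The function name and counts are made up.

```python
import math


def threat_score(hits, misses, false_alarms):
    """Hits divided by all cases where the event was observed and/or forecast."""
    events = hits + misses + false_alarms
    if events == 0:
        return math.nan  # undefined when the event was neither observed nor forecast
    return hits / events


# Hypothetical counts, for illustration only.
ts = threat_score(hits=82, misses=23, false_alarms=38)

# Swapping misses and false alarms leaves the score unchanged:
# the score does not distinguish the source of the error.
ts_swapped = threat_score(hits=82, misses=38, false_alarms=23)

print(f"TS = {ts:.2f}")
```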
- - - - - - - - - - -
Equitable threat score (ETS) - Measures the fraction of observed and/or forecast events that were correctly predicted, adjusted for hits associated with random chance. This score is used in the verification of rainfall in numerical weather prediction models because its "equitability" allows scores to be compared more fairly across different regimes.
Range: -1/3 to 1, 0 indicates no skill. Perfect score: 1.
Characteristics: Sensitive to hits, penalizes both misses and false alarms, accounts for climatological event frequency. Does not distinguish the source of forecast error.
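The chance adjustment can be sketched as follows, assuming the usual definition in which the number of hits expected by chance is (observed yes × forecast yes) / total and is subtracted from both the numerator and the denominator of the threat score. The function name and counts are hypothetical.

```python
import math


def equitable_threat_score(hits, misses, false_alarms, correct_negatives):
    """Threat score with the hits expected from random chance removed."""
    total = hits + misses + false_alarms + correct_negatives
    if total == 0:
        return math.nan
    # Hits expected if forecasts and observations were independent.
    hits_random = (hits + misses) * (hits + false_alarms) / total
    denominator = hits + misses + false_alarms - hits_random
    if denominator == 0:
        return math.nan
    return (hits - hits_random) / denominator


# Hypothetical counts, for illustration only.
ets = equitable_threat_score(82, 23, 38, 222)

# A forecast whose hits equal the chance expectation has no skill.
ets_chance = equitable_threat_score(10, 10, 40, 40)

print(f"ETS = {ets:.2f}, chance-level ETS = {ets_chance:.2f}")
```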
- - - - - - - - - - -
Hanssen and Kuipers discriminant (true skill statistic, HK) - Measures the ability of the forecast to separate the "yes" cases from the "no" cases. Can also be interpreted as Accuracy(events) + Accuracy(non-events) - 1.
Range: -1 to 1, 0 indicates no skill. Perfect score: 1.
Characteristics: Uses all elements in the contingency table. Does not depend on climatological event frequency. For rare events HK is unduly weighted toward the first term (same as POD). Can be expressed in a form similar to the ETS, except that the hits_random term is unbiased.
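A sketch under the assumption that the score is computed as the probability of detection minus the probability of false detection, false alarms / (false alarms + correct negatives). The rare-event case at the end uses made-up counts to show the score collapsing toward POD.

```python
import math


def hanssen_kuipers(hits, misses, false_alarms, correct_negatives):
    """POD minus probability of false detection (POFD)."""
    observed_yes = hits + misses
    observed_no = false_alarms + correct_negatives
    if observed_yes == 0 or observed_no == 0:
        return math.nan  # needs both events and non-events
    pod = hits / observed_yes
    pofd = false_alarms / observed_no
    return pod - pofd


# Hypothetical counts, for illustration only.
hk = hanssen_kuipers(82, 23, 38, 222)

# Equivalent form: Accuracy(events) + Accuracy(non-events) - 1.
hk_alt = 82 / (82 + 23) + 222 / (38 + 222) - 1

# Rare event: non-events dominate, POFD is tiny, and HK is close to POD (0.5).
hk_rare = hanssen_kuipers(5, 5, 50, 9940)

print(f"HK = {hk:.2f}, rare-event HK = {hk_rare:.3f}")
```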
- - - - - - - - - - -
Heidke skill score (HSS) - Measures the fraction of correct forecasts after eliminating those forecasts which would be correct due purely to random chance. This is an (algebraically simplified) form of a generalized skill score, where the score in the numerator is the number of correct forecasts, and the reference forecast in this case is random chance.
Range: -1 to 1, 0 indicates no skill. Perfect score: 1.
Characteristics: Measures improvement over random chance. Random chance is usually not the best forecast to compare to; it may be better to use climatology (long-term average value), persistence (forecast = most recent observation, i.e., no change), or some other standard.
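The generalized skill score form, (score - reference) / (perfect - reference), can be sketched with the number of correct forecasts as the score and the number expected to be correct by chance as the reference. The chance expectation used below, built from the table margins, is the standard one. The function name and counts are illustrative.

```python
import math


def heidke_skill_score(hits, misses, false_alarms, correct_negatives):
    """Correct forecasts relative to the number expected from random chance."""
    total = hits + misses + false_alarms + correct_negatives
    if total == 0:
        return math.nan
    correct = hits + correct_negatives
    # Correct forecasts expected if forecasts and observations were independent.
    expected_correct = (
        (hits + misses) * (hits + false_alarms)
        + (correct_negatives + misses) * (correct_negatives + false_alarms)
    ) / total
    if total == expected_correct:
        return math.nan  # e.g. the event was never observed and never forecast
    # Generalized skill score: (score - reference) / (perfect - reference).
    return (correct - expected_correct) / (total - expected_correct)


# Hypothetical counts, for illustration only.
hss = heidke_skill_score(82, 23, 38, 222)
print(f"HSS = {hss:.2f}")
```

Replacing `expected_correct` with the number of correct forecasts from climatology or persistence would give a skill score against those references instead.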
- - - - - - - - - - -
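In terms of the contingency table elements, the scores are computed as follows:

Bias score -
$$\mathrm{BIAS} = \frac{\text{hits} + \text{false alarms}}{\text{hits} + \text{misses}}$$

Probability of detection -
$$\mathrm{POD} = \frac{\text{hits}}{\text{hits} + \text{misses}}$$

False alarm ratio -
$$\mathrm{FAR} = \frac{\text{false alarms}}{\text{hits} + \text{false alarms}}$$

Threat score (critical success index) -
$$\mathrm{TS} = \frac{\text{hits}}{\text{hits} + \text{misses} + \text{false alarms}}$$

Equitable threat score -
$$\mathrm{ETS} = \frac{\text{hits} - \text{hits}_{\text{random}}}{\text{hits} + \text{misses} + \text{false alarms} - \text{hits}_{\text{random}}}$$
where
$$\text{hits}_{\text{random}} = \frac{(\text{hits} + \text{misses})(\text{hits} + \text{false alarms})}{\text{total}}$$

Hanssen and Kuipers discriminant (true skill statistic) -
$$\mathrm{HK} = \frac{\text{hits}}{\text{hits} + \text{misses}} - \frac{\text{false alarms}}{\text{false alarms} + \text{correct negatives}}$$

Heidke skill score -
$$\mathrm{HSS} = \frac{(\text{hits} + \text{correct negatives}) - \text{correct}_{\text{random}}}{\text{total} - \text{correct}_{\text{random}}}$$
where
$$\text{correct}_{\text{random}} = \frac{(\text{hits} + \text{misses})(\text{hits} + \text{false alarms}) + (\text{correct negatives} + \text{misses})(\text{correct negatives} + \text{false alarms})}{\text{total}}$$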