What does "hedging" a forecast mean, and how do some scores encourage hedging?

Laurie Wilson, Recherche en Prévision Numérique, Dorval, Quebec, Canada
Beth Ebert, Bureau of Meteorology Research Centre, Melbourne, Australia

The Oxford English Dictionary defines "to hedge" as "to avoid a definite decision or commitment" or "to reduce one's risk of loss on a bet or speculation by compensating transactions on the other side". In more general terms, to "hedge one's bets" is "to protect oneself against loss or error by supporting more than one side in a contest, an argument, etc.". In the context of forecasting, hedging means avoiding a definite (specific or categorical) forecast and opting instead for a probabilistic one. The forecaster, instead of "putting all his eggs in one basket" and forecasting a single outcome with 100% probability, hedges his bets by assigning non-zero probability to more than one possible outcome.

In terms of forecast verification, probability forecasts that avoid the extremes of the probability range often receive more favorable scores, especially with quadratic scoring rules such as the root mean square error or the Brier score. This is because an extreme forecast is heavily penalized if it is incorrect. A forecaster can therefore minimize his risk of obtaining an unfavorable score on a particular forecast by hedging. At the same time, he lowers the maximum positive score he can obtain if he is correct.
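
As a concrete illustration (ours, not from the references), consider the Brier score for a single binary forecast, BS = (p - o)², where p is the forecast probability and o is 1 if the event occurred and 0 otherwise. The sketch below shows how moving p away from the extremes narrows the range of possible scores: the fully hedged forecast of 50% carries no risk at all, but also no chance of a good score.

```python
# Illustrative sketch: how hedging bounds the single-forecast Brier score.
# BS = (p - o)^2, where p is the forecast probability and o is the observed
# outcome (1 = event occurred, 0 = it did not); smaller is better.

def brier_score(p, o):
    """Brier score for a single binary forecast."""
    return (p - o) ** 2

for p in (1.0, 0.8, 0.5):
    best = brier_score(p, 1)   # the event occurs, as forecast
    worst = brier_score(p, 0)  # the event does not occur
    print(f"p = {p:.1f}: best BS = {best:.2f}, worst BS = {worst:.2f}")

# p = 1.0: best BS = 0.00, worst BS = 1.00  <- extreme forecast, maximum risk
# p = 0.8: best BS = 0.04, worst BS = 0.64
# p = 0.5: best BS = 0.25, worst BS = 0.25  <- fully hedged, no risk at all
```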

In terms of the forecast attributes defined by Murphy (1993), hedged forecasts lack sharpness: they are smooth, with probabilities assigned over a broader range of possible outcomes. In the context of spatial forecasts, hedging means a tendency to underforecast the intensity of systems, or to forecast the location of sharply defined features such as fronts with less precision.

"Hedging" has been given a somewhat different, but consistent interpretation by Murphy and Epstein (1967) and Murphy (1978). These authors considered only probabilistic forecasts, and defined hedging by the statement, "Hedging is said to occur whenever a forecaster’s judgment and forecast differ". For example, if the forecaster truly believes the probability of an event is 50%, but issues a forecast probability of 20%, motivated, for example, by a desire to improve his verification score, then he is said to have hedged his forecast.

Apparent inconsistencies between this definition and the above arise because of differences in restrictions placed on forecasts vis-à-vis hedging. A categorical forecast may be required of a forecaster (he must make a decision), and he is not given the opportunity to use probabilities other than the categorical values of 0 and 100%. In such cases, given the known uncertainty of future weather, a categorical forecast can almost always be viewed as a hedged forecast in the sense that it does not correspond with the forecaster's true belief about the likelihood of the chosen outcome. At the same time, hedging by avoiding forecasts of extreme probabilities, when this option is available, has been demonstrated many times to generally improve verification scores, as stated above. However, this arises because the forecaster is then allowed to forecast according to his true belief, not because he has been able to "play the score".

Some verification scores may actually encourage forecasters to "play the score". Three screening criteria can be used to judge whether performance measures encourage hedging (Murphy, 1996):

Equitability - A score is "equitable" if it gives the same score for two types of unskilled forecasts: random chance, and constant forecasts of the same category. In other words, forecasts of random chance, "always yes", "always no", "always category 3", etc. should produce the same (bad) score.
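
As a small numerical check (our own sketch, with made-up counts), the fragment below scores two constant strategies with accuracy (fraction correct) and with the equitable threat score, ETS, both of which appear in the table further down. The equitable score gives both unskilled strategies the same value of zero; accuracy does not.

```python
# Sketch of an equitability check: "always yes" and "always no" are both
# unskilled, so an equitable score should rate them identically.

def accuracy(hits, misses, fas, cns):
    """Fraction correct."""
    return (hits + cns) / (hits + misses + fas + cns)

def ets(hits, misses, fas, cns):
    """Equitable threat score (Gilbert skill score)."""
    n = hits + misses + fas + cns
    hits_random = (hits + misses) * (hits + fas) / n  # hits expected by chance
    return (hits - hits_random) / (hits + misses + fas - hits_random)

# 100 forecasts; the event occurred 30 times (base rate s = 0.3).
always_yes = (30, 0, 70, 0)  # hits, misses, false alarms, correct negatives
always_no = (0, 30, 0, 70)

print(accuracy(*always_yes), ets(*always_yes))  # 0.3 0.0
print(accuracy(*always_no), ets(*always_no))    # 0.7 0.0
```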

Consistency - In terms of performance measures, a score is "consistent" when it is the appropriate score for judging the quality of a deterministic or single-category forecast. If in reality the forecaster's judgement is represented by a probability distribution, but he/she must issue a single forecast according to some decision rule (for example, "forecast the mean of the distribution", or "forecast yes when the probability exceeds a certain threshold"), a consistent score gives an optimum value when the decision rule is followed (Mason, 1979, 2003). In the case of binary (yes/no) forecasts, the probability threshold that gives a consistent score depends on the score being used.

Propriety - A score is "proper" when it is optimized (has a maximum or minimum value, whichever is appropriate) if the forecast corresponds to the best judgement of the forecaster. Propriety is a special case of consistency, and applies only to probability forecasts. If the optimum is unique, the score is "strictly proper". (For further mathematical development of strictly proper scoring, see Murphy and Epstein (1967), Wilks (1995, p. 267), or Harold Brooks' lecture notes.)
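
Strict propriety is easy to verify numerically for the Brier score. In the sketch below (ours), a forecaster whose true belief is q = 50%, as in Murphy's example above, evaluates the expected Brier score E[BS] = q(p-1)² + (1-q)p² of issuing various probabilities p; the expectation is minimized only at p = q, so hedging can never improve the score he expects to receive.

```python
# Sketch: the Brier score is strictly proper.  The expected score of
# issuing probability p when the true probability is q,
#   E[BS] = q*(p - 1)**2 + (1 - q)*p**2,
# is minimized only at p = q.

def expected_brier(p, q):
    """Expected Brier score of issuing p when the true probability is q."""
    return q * (p - 1) ** 2 + (1 - q) * p ** 2

q = 0.5  # the forecaster's true belief
for p in (0.2, 0.5, 0.8):
    print(f"issued p = {p:.1f}: expected BS = {expected_brier(p, q):.2f}")

# issued p = 0.2: expected BS = 0.34  <- the hedged forecast of Murphy's example
# issued p = 0.5: expected BS = 0.25  <- optimum at the true belief
# issued p = 0.8: expected BS = 0.34
```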

Several binary verification scores are listed below, along with their equitability and optimal threshold probability (Mason, 1979, 2003). In the expressions for the optimal threshold probability, s is the base rate (also known as the sample climatology, or the marginal probability of the observed event, equal to (hits + misses) / N), N is the total number of forecasts made, and OR is the odds ratio. In some cases the optimal decision threshold depends on the value of the score itself, which makes it difficult to specify a decision threshold in advance.
 
Score | Equitable? | Optimal threshold probability
------|------------|------------------------------
Accuracy (fraction correct), ACC | No | 0.5
Probability of detection (hit rate), POD | No | 0.0
False alarm ratio, FAR | No | 1.0
Probability of false detection (false alarm rate), POFD | No | 1.0
Threat score (critical success index), TS | No | TS / (1+TS)
Equitable threat score (Gilbert skill score), ETS | Yes | [s + (1-s)ETS] / (1+ETS)
Hanssen and Kuipers discriminant (true skill statistic, Peirce's skill score), HK | Yes | (Ns+1) / (N+2)
Heidke skill score, HSS | Yes | s + (1-2s)*(HSS/2)
Odds ratio skill score (Yule's Q), ORSS | Yes | s / [s + (1-s)*OR*POFD²/POD²]
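
The threshold behaviour summarized in the table can be explored directly. The sketch below (illustrative data and function names, ours) converts a set of probability forecasts into yes/no forecasts at several thresholds and scores the resulting 2x2 contingency tables. POD is maximized by a threshold of 0, i.e. by always forecasting "yes", which is exactly the kind of score-playing these screening criteria are meant to expose; the threat score, by contrast, peaks at an intermediate threshold.

```python
# Sketch: optimal decision thresholds differ by score.  Probability
# forecasts are reduced to yes/no at a threshold, then scored from the
# resulting 2x2 contingency table.  Data and names are illustrative.

def contingency(probs, obs, threshold):
    """Count hits, misses, false alarms, and correct negatives."""
    hits = misses = fas = cns = 0
    for p, o in zip(probs, obs):
        yes = p >= threshold
        if yes and o:
            hits += 1
        elif not yes and o:
            misses += 1
        elif yes and not o:
            fas += 1
        else:
            cns += 1
    return hits, misses, fas, cns

probs = [0.1, 0.2, 0.4, 0.5, 0.7, 0.9, 0.3, 0.8, 0.6, 0.2]
obs   = [0,   0,   1,   0,   1,   1,   0,   1,   0,   0]

for t in (0.0, 0.5, 1.0):
    h, m, f, c = contingency(probs, obs, t)
    pod = h / (h + m)     # probability of detection
    ts = h / (h + m + f)  # threat score
    print(f"threshold {t:.1f}: POD = {pod:.2f}, TS = {ts:.2f}")

# threshold 0.0: POD = 1.00, TS = 0.40  <- "always yes" maximizes POD
# threshold 0.5: POD = 0.75, TS = 0.50
# threshold 1.0: POD = 0.00, TS = 0.00
```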

References:

Mason, I.B., 1979: On reducing probability forecasts to yes/no forecasts. Mon. Wea. Rev., 107, 207-211.
Mason, I.B., 2003: Binary events. In Forecast Verification: A Practitioner's Guide in Atmospheric Science (eds. I.T. Jolliffe and D.B. Stephenson). Wiley and Sons Ltd, 37-76.
Murphy, A.H., 1978: Hedging and the mode of expression of weather forecasts. Bull. Amer. Met. Soc., 59, 371-373.
Murphy, A.H., 1993: What is a good forecast? An essay on the nature of goodness in weather forecasting. Wea. Forecasting, 8, 281-293.
Murphy, A.H., 1996: The Finley affair: A signal event in the history of forecast verification. Wea. Forecasting, 11, 3-20.
Murphy, A.H. and E.S. Epstein, 1967: A note on probability forecasts and "hedging". J. Appl. Meteor., 6, 1002-1004.
Wilks, D.S., 1995: Statistical Methods in the Atmospheric Sciences: An Introduction. Academic Press, San Diego, 467 pp.