Forecasting tournament

Let the games begin

Do you think you are a good forecaster? A bad one? There is only one way to find out: a competition! We are going to try to forecast forage production in the annual grasslands of California based on weather covariates. The idea is pretty simple:

Each group downloads:
- The forage production response data for years 1936-2008.
- Monthly weather covariates that span the training data set and then continue for the 2009-2018 period.
Fit whatever kind of model you want, using any method you want, with the 1936-2008 training data. You CANNOT use covariates from any other source.
Once you have finalized your model, you will generate a forecast for years 2009-2018 (using the covariates for those years). Since we are hindcasting, the covariates are known, so we can ignore the uncertainty in the covariates that would be present if we were relying on climate projections.
Each group will also prepare a 5-minute presentation for the rest of the class describing their model and the choices they made to arrive at it.
Submit your forecast by emailing Peter your R script and a .csv file attachment containing your predictions (see formatting instructions below).
I will evaluate forecast accuracy by comparing your predictions to the observed 2009-2018 values using mean absolute error (MAE).
The winning team will bask in eternal glory and receive a cheap candy prize. An additional prize will go to the team that makes the most accurate forecast and also quantifies forecast uncertainty.
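For reference, mean absolute error is the average of the absolute differences between observed and predicted values. A minimal sketch in R follows; the function name `mae` and the numbers are made up for illustration and are not the real forage data or the instructor's scoring script.

```r
# Mean absolute error: the average absolute gap between forecast and observation.
mae <- function(observed, predicted) {
  stopifnot(length(observed) == length(predicted))
  mean(abs(observed - predicted))
}

# Made-up numbers for illustration only (not real forage data).
observed  <- c(2100, 1500, 3000, 2600)
predicted <- c(2000, 1800, 2900, 2600)

# Absolute errors are 100, 300, 100, 0, so the MAE is 125.
mae(observed, predicted)
```

Because MAE is computed on the original scale, a forecast left on a log or other transformed scale would score very poorly.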

Formatting

To make it easy for me to calculate the accuracy of your forecast, please follow these formatting guidelines. Create a data frame with either two or four columns, using these names: Year, Forecast, LowerCI, UpperCI. The Year column should contain the integers 2009 to 2018, in order. The Forecast column contains your point forecast for each year. The point forecasts should be in the original units! If you fit your model on a transformed scale, please back-transform.

The last two (CI) columns are optional. If you do report confidence intervals, please calculate the 95% intervals. Again, these should be on the same scale as the observations.
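As one possible way to assemble such a data frame, here is a sketch that fits a linear model on the log scale and back-transforms the forecasts and intervals. Everything about the data is an assumption: the values are simulated, and the names `train`, `future`, `forage`, and `ppt` are hypothetical stand-ins for whatever the real files contain. The intervals are 95% prediction intervals from `predict()`, which is one reasonable way to express forecast uncertainty.

```r
set.seed(1)  # reproducible simulated data

# Simulated stand-ins for the real files; column names are hypothetical.
train <- data.frame(year = 1936:2008, ppt = runif(73, 200, 900))
train$forage <- exp(6 + 0.002 * train$ppt + rnorm(73, sd = 0.3))
future <- data.frame(year = 2009:2018, ppt = runif(10, 200, 900))

# Fit on the log scale; this is just one of many possible model choices.
fit <- lm(log(forage) ~ ppt, data = train)

# Point forecasts and 95% prediction intervals, still on the log scale.
pred_log <- predict(fit, newdata = future,
                    interval = "prediction", level = 0.95)

# Back-transform so every column is in the original units.
forecast_df <- data.frame(
  Year     = future$year,
  Forecast = exp(pred_log[, "fit"]),
  LowerCI  = exp(pred_log[, "lwr"]),
  UpperCI  = exp(pred_log[, "upr"])
)
forecast_df
```

Note that exponentiating a log-scale prediction gives the median rather than the mean on the original scale; whether to apply a bias correction is a modeling choice left to each group.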

Write your data frame to a .csv file using the following line of code, substituting in the name of your data frame and the filename you want to use (your group name?):

write.csv(your_data_frame, "your_file.csv", row.names = FALSE)

Email the .csv file to Peter as an attachment.

Background

As we’ve discussed, mechanistic knowledge can improve forecasts. The forage data come from the San Joaquin Experimental Range, in the Sierra Nevada foothills of California. Here is the citation for the data, which USDA NRCS and University of California Extension have generously made public for our use:

Dennis Dudley, USDA NRCS Rangeland Specialist, Madera County; Neil McDougald, UCCE Livestock, Range, and Natural Resources Advisor Emeritus, Madera County

This is an annual grassland, so measuring aboveground annual production (forage production) is straightforward: biomass is clipped to ground level, dried, and weighed. This is usually done in June at the end of the spring growing season. For more information about the factors that determine productivity in these grasslands, see this report.

I downloaded the weather data from the PRISM Explorer. You can find some metadata here. “ppt” refers to precipitation, “t” to temperature, and “vpd” to vapor pressure deficit.
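Monthly covariates usually need to be summarized before they can be matched to one forage value per year. The sketch below shows one way to do that: it totals precipitation over an October-May window and assigns it to the year of the following June harvest. The long-format layout, the column names `year`, `month`, and `harvest_year`, the choice of window, and all the numbers are assumptions for illustration; the real file may be organized differently, and other windows or variables may work better.

```r
# Hypothetical long-format monthly table; the real file layout may differ.
weather <- expand.grid(month = 1:12, year = 2000:2002)
weather$ppt <- rep(c(90, 80, 70, 30, 10, 0, 0, 0, 5, 20, 60, 85), times = 3)

# Rain falling in October-December feeds the following year's harvest.
weather$harvest_year <- ifelse(weather$month >= 10,
                               weather$year + 1, weather$year)

# Keep an illustrative October-May window and total it per harvest year.
season <- subset(weather, month >= 10 | month <= 5)
season_ppt <- aggregate(ppt ~ harvest_year, data = season, FUN = sum)
names(season_ppt)[2] <- "ppt_oct_may"

# The first and last harvest years cover only part of the window,
# so they should be dropped before model fitting.
season_ppt
```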