GEOS-CF Atmospheric Composition Forecast

This data is harvested daily from the Goddard Earth Observing System composition forecast (GEOS-CF). We obtain hourly predictions for the next five days for three air-quality parameters: carbon monoxide (CO), nitrogen dioxide (NO2), and fine particulate matter (PM2.5).

This dataset is still under active development and should be considered to be in "beta".


This dataset was extracted from NASA's GEOS-CF as a collaboration between NASA's Goddard Space Flight Center and the Western Pennsylvania Regional Data Center, to provide easy access to 5-day air-quality forecasts for Allegheny County.

An excerpt from the motivation for the source dataset: "The five-day predictions are intended as research datasets, which have numerous applications: to study the science of composition forecasting, to help guide NASA’s field campaigns, and to examine global predictions of surface air quality in context of emissions and transport."


NASA has published papers that document how it models the atmosphere and calculates the base forecasts.


After extracting the GEOS-CF forecast values for locations inside Allegheny County, we calculate and add a column for the Air Quality Index (AQI) value that corresponds to each measurement. These AQI values make the air-quality parameters easier to interpret. This document from the EPA describes the Air Quality Index and how to calculate it.
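As a rough illustration of the AQI step described above, the EPA method maps a concentration to an index by linear interpolation between breakpoints. The sketch below is not the code used to build this dataset. The function name is hypothetical, and the breakpoint table is an assumption: it uses the 24-hour PM2.5 values from the pre-2024 EPA table, which should be checked against the current EPA document before any real use. The EPA procedure also truncates the concentration before the lookup, a step this sketch skips.

```python
# Assumed breakpoints for 24-hour PM2.5 in micrograms per cubic meter,
# as (C_lo, C_hi, I_lo, I_hi). These are pre-2024 EPA values; verify them
# against the current EPA document before relying on them.
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]


def aqi_from_concentration(conc, breakpoints=PM25_BREAKPOINTS):
    """Interpolate an AQI value from a pollutant concentration.

    Returns None when the concentration is above the highest breakpoint.
    """
    if conc < 0:
        raise ValueError("concentration must be non-negative")
    for c_lo, c_hi, i_lo, i_hi in breakpoints:
        # Use the first row whose upper bound covers the concentration.
        if conc <= c_hi:
            index = (i_hi - i_lo) / (c_hi - c_lo) * (conc - c_lo) + i_lo
            return round(index)
    return None


print(aqi_from_concentration(12.0))  # 50
```

The same function works for CO or NO2 if it is given a breakpoint table in the matching units.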

Each record is a prediction for a particular air quality parameter, for a particular day and hour (within five days after the prediction), and for a particular cell in a 25-km grid overlaying Allegheny County. For some locations, we've also added the name of a place (e.g., "Pittsburgh" or "Penn Hills") within that cell, to make look-ups easier.

Known Uses

Recommended Uses

This data could provide an early-warning system for certain kinds of unhealthy air-quality events, such as dangerously high PM2.5 levels from wildfire-induced smog.
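A minimal sketch of such an early-warning check follows. It assumes each forecast record has been loaded as a dictionary and that the AQI column is available under the key "aqi"; that key, the other field names in the usage example, and the function name are all hypothetical and should be replaced with the dataset's actual column names. The default threshold of 101 marks the start of the AQI range that the EPA labels "Unhealthy for Sensitive Groups".

```python
# AQI values above 100 fall in the "Unhealthy for Sensitive Groups"
# category or worse.
UNHEALTHY_FOR_SENSITIVE_GROUPS = 101


def flag_unhealthy(records, threshold=UNHEALTHY_FOR_SENSITIVE_GROUPS):
    """Return the forecast records whose AQI meets or exceeds the threshold.

    The "aqi" key is a hypothetical column name. Records with a missing
    AQI value are skipped.
    """
    return [
        r for r in records
        if r.get("aqi") is not None and r["aqi"] >= threshold
    ]


# Made-up records, for illustration only.
forecasts = [
    {"parameter": "PM2.5", "place": "Pittsburgh", "aqi": 42},
    {"parameter": "PM2.5", "place": "Penn Hills", "aqi": 155},
]
alerts = flag_unhealthy(forecasts)
```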

Known Limitations/Biases

The GEOS-CF forecast predicts air quality at ground level and does not account for significant variations in altitude. It cannot predict incidents, such as the emission of gases from industrial facilities, before they happen, although it is reasonably good at modelling the movement of existing pollutants through the atmosphere. Its spatial resolution (currently 25 km) is another limitation: each value is an average over a large area, so the forecast isn't sensitive to smaller-scale variations. More detailed descriptions of the limitations are provided in this paper.

Also, the model results sometimes cannot be computed on the expected schedule. (These delays are reported on the "geos-cf-users" mailing list.) In these instances, our automated processes fall back to the previous day's forecasts; the date_of_prediction field gives the date on which the forecast was made.
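One way to spot these fallback records is to compare date_of_prediction with the date on which the data was harvested. The sketch below assumes that date_of_prediction is stored as an ISO-formatted string beginning with YYYY-MM-DD, which has not been confirmed for this dataset. The function name is hypothetical.

```python
from datetime import date


def is_fallback(record, harvest_date):
    """Return True when the forecast was made before the harvest date.

    Assumes record["date_of_prediction"] is a string that starts with
    an ISO date (YYYY-MM-DD); any time component after it is ignored.
    """
    made_on = date.fromisoformat(record["date_of_prediction"][:10])
    return made_on < harvest_date


# Made-up record, for illustration only.
record = {"date_of_prediction": "2024-05-01"}
stale = is_fallback(record, date(2024, 5, 2))
```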

Further Documentation and Resources

All of the air quality parameters forecast in this dataset are measured by the Allegheny County Health Department, and those measurements (as well as many others) are published in the Allegheny County Air Quality dataset.

