Revealing hidden relationships in data

A method to visualize hidden statistical structure helps make sense of environmental data.

Huang Huang (left) and Ying Sun have developed a method for visualizing the spatio-temporal covariance properties of a dataset, which will help make sense of environmental data.

Climate and weather prediction relies on statistical models that capture both the variability at one location over time and the relationship with other geographical locations. Sometimes future conditions at one location can be predicted from the current conditions at another location, while in other cases there may be no such correlation. What a model assumes about how conditions at two sites covary across space and time can have profound implications for its accuracy, so the choice of space-time covariance structure is crucial.

Ying Sun and her student Huang Huang from KAUST have now developed a method for visualizing the spatio-temporal covariance properties of a dataset, greatly simplifying an important modeling step that previously demanded painstaking exploratory data analysis.

“We propose an easy and convenient way to visualize the properties of the covariance structure in the data, which will help practitioners choose appropriate statistical models for covariances,” says Sun. “In particular, this method is useful for data that are observed sparse in space and dense in time, which is often the case for weather station observations for example.”

Sun and Huang considered two key properties of covariance: symmetry and separability. Symmetry implies that the spatio-temporal process is reversible in time, while separability indicates that the correlation in time does not interact with that in space.
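
For readers who prefer symbols, the two properties can be written compactly. This is the standard textbook formulation for a stationary space-time covariance, not notation taken from the researchers' paper: the function C, the spatial lag h and the time lag u are all assumed here for illustration.

```latex
% Assumed notation: C(h, u) is the covariance between two sites
% separated by spatial lag h, observed u time steps apart.
\begin{align}
  % Full symmetry: reversing the time lag leaves the covariance unchanged.
  C(\mathbf{h}, u) &= C(\mathbf{h}, -u)
    \quad \text{for all } \mathbf{h},\, u \\
  % Separability: the covariance factors into a purely spatial part
  % and a purely temporal part.
  C(\mathbf{h}, u) &= \frac{C(\mathbf{h}, 0)\, C(\mathbf{0}, u)}{C(\mathbf{0}, 0)}
    \quad \text{for all } \mathbf{h},\, u
\end{align}
```

A separable covariance is always fully symmetric, but the reverse does not hold, which is why the two properties are assessed separately.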

“Assuming a fully symmetric or a separable covariance leads to a much simpler model and thus fast computations,” says Sun. “However, this model assumption may be violated in many real applications, leading to less accurate estimation and prediction.”

Huang and Sun used a functional data analysis approach to construct test functions from the covariances in time series data between location pairs. These test functions effectively summarize the properties of separability or symmetry and can be displayed as boxplots that show the degree of non-separability or asymmetry.
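
To give a feel for the idea, here is a minimal Python sketch. It is not the authors' implementation: the function names, the exact form of the two test functions and the synthetic data are all assumptions made for illustration, and plain per-lag quartiles stand in for the boxplot display. Each test function is built from empirical covariances between a pair of time series and should stay close to zero when the corresponding property holds.

```python
from itertools import combinations

import numpy as np


def cross_cov(x, y, lag):
    """Empirical covariance between x(t) and y(t + lag) for an integer lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = x - x.mean()
    y = y - y.mean()
    n = len(x)
    if lag >= 0:
        return float(np.dot(x[: n - lag], y[lag:]) / n)
    return float(np.dot(x[-lag:], y[: n + lag]) / n)


def symmetry_test_function(x, y, lags):
    """Hypothetical choice: f(u) = C_xy(u) - C_xy(-u), zero under full symmetry."""
    return np.array([cross_cov(x, y, u) - cross_cov(x, y, -u) for u in lags])


def separability_test_function(x, y, lags):
    """Hypothetical choice: g(u) = C_xy(u) C_xx(0) - C_xy(0) C_xx(u).

    Under a separable (and stationary) covariance the two products are equal,
    so g(u) should be close to zero at every lag.
    """
    c_xy0 = cross_cov(x, y, 0)
    c_xx0 = cross_cov(x, x, 0)
    return np.array(
        [cross_cov(x, y, u) * c_xx0 - c_xy0 * cross_cov(x, x, u) for u in lags]
    )


def pairwise_summary(series, lags, test_function):
    """Quartiles, per lag, of a test function over all pairs of locations.

    `series` has shape (n_locations, n_times). The returned array has shape
    (3, len(lags)): the 25th, 50th and 75th percentiles a boxplot would draw.
    """
    curves = np.array(
        [
            test_function(series[i], series[j], lags)
            for i, j in combinations(range(len(series)), 2)
        ]
    )
    return np.percentile(curves, [25, 50, 75], axis=0)


# Synthetic illustration only: four "stations" where each one repeats the
# previous station's signal one time step later, a crude advection-like pattern.
rng = np.random.default_rng(0)
base = rng.standard_normal(500)
stations = np.stack([np.roll(base, k) for k in range(4)])
print(pairwise_summary(stations, [1, 2, 3], symmetry_test_function))
```

In this sketch, quartiles that sit well away from zero at some lag would point to asymmetry or non-separability at that lag, while quartiles hugging zero would suggest the simpler model assumption is reasonable.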

“We applied this approach to meteorological observations and simulated weather data from some commonly used climate models,” says Huang. “In the reported examples for a study area in the North Atlantic Ocean, this method showed that wind speed and surface temperature have different covariance structures in different seasons.”

The visualization can be computed relatively quickly for a handful of monitoring stations, and the researchers note that computational efficiency for larger numbers of stations can be improved by dividing the problem into sub-regions. The method thus gives practitioners a valuable tool for choosing appropriate covariance models.