Homework 0
Visit the software web page and look at the available R packages for spatial analysis. Visit the CRAN Task View for the analysis of spatial data. Download geoR, fields and spBayes and become familiar with them.
Homework 1
(due 4/17/19)
- Prove the conditions for the existence of a Gaussian process
- Consider an isotropic correlation function. Consider a transformation that produces geometric anisotropy. Prove that the resulting correlation function is positive definite.
- Plot all the covariograms and variograms in the tables of the second set of slides. Take the variance to be 1, and choose the range parameter so that the correlation is 0.05 at a distance of one unit.
- Assume that the correlation functions in the previous point correspond to one-dimensional Gaussian processes. Simulate one realization at 100 points of the process corresponding to each of the plotted functions.
- Write out explicitly the Matérn correlation function for nu = 1/2, 3/2 and 5/2.
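The following is a minimal numerical sketch for the Matérn items above. It is written in Python/NumPy, although the course itself uses R. It assumes the geoR-style parameterization rho(d) = (d/phi)^nu K_nu(d/phi) / (2^(nu-1) Gamma(nu)). Under that parameterization the three half-integer cases reduce to the closed forms coded below; other parameterizations rescale d/phi by sqrt(2 nu). The sketch finds phi so that rho(1) = 0.05 by bisection, then simulates one 100-point realization by Cholesky factorization. The grid on [0, 10], the jitter and the seed are arbitrary choices. The other covariograms in the slides are not reproduced.

```python
import numpy as np


def matern_closed_form(d, phi, nu):
    """Matérn correlation in the geoR-style scale phi, for nu in {0.5, 1.5, 2.5}."""
    x = np.asarray(d, dtype=float) / phi
    if nu == 0.5:
        return np.exp(-x)                               # exponential
    if nu == 1.5:
        return (1.0 + x) * np.exp(-x)
    if nu == 2.5:
        return (1.0 + x + x**2 / 3.0) * np.exp(-x)
    raise ValueError("closed form only coded for nu in {0.5, 1.5, 2.5}")


def range_for_target(nu, target=0.05, dist=1.0, tol=1e-12):
    """Return phi such that rho(dist) == target, by bisection on u = dist / phi."""
    lo, hi = 1e-8, 100.0              # rho(dist) decreases as u grows
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if matern_closed_form(dist, dist / mid, nu) > target:
            lo = mid                  # still too correlated: shorten the range
        else:
            hi = mid
    return dist / (0.5 * (lo + hi))


def simulate_1d(nu, n=100, length=10.0, sigma2=1.0, seed=0):
    """One realization of a zero-mean 1-D GP on n equally spaced points."""
    s = np.linspace(0.0, length, n)
    phi = range_for_target(nu)
    cov = sigma2 * matern_closed_form(np.abs(s[:, None] - s[None, :]), phi, nu)
    chol = np.linalg.cholesky(cov + 1e-8 * np.eye(n))   # small jitter for stability
    rng = np.random.default_rng(seed)
    return s, chol @ rng.standard_normal(n)


for nu in (0.5, 1.5, 2.5):
    phi = range_for_target(nu)
    print(nu, round(phi, 4), float(matern_closed_form(1.0, phi, nu)))
```

For the exponential case (nu = 1/2) the bisection should agree with the exact value phi = 1/ln(20). The variogram for a unit variance is 1 - rho(d).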
Homework 2
(due 4/26/19)
- Prove the results about the smoothness of the members of the Matérn family.
- Use the spectral representation to show that the product of two valid correlation functions is a valid correlation function.
- The spectral density of a correlation function in the Matérn family has tails whose thickness depends on the smoothness parameter. Conjecture: the smoothness of the corresponding random field depends on the number of moments of the spectral density. What can you say about this conjecture?
- Use the Karhunen-Loève (K-L) representation to approximate the exponential correlation with range parameter equal to 1. Plot the approximation for several orders and compare it to the actual correlation.
- Repeat for the approximation given on Page 13 of the fifth set of slides.
- Generate 100 realizations of a univariate Gaussian process with exponential correlation and range parameter 1. Compare the empirically estimated eigenvalues and eigenfunctions to those given by the K-L representation and by the approximation on Page 12.
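One common way to compute a numerical K-L expansion is the Nyström (quadrature) method. You discretize the domain, eigen-decompose the weighted correlation matrix, and rescale the eigenvectors into L2-normalised eigenfunctions. The sketch below makes several assumptions:

- The domain is [0, 1], which the assignment does not fix.
- The quadrature is the midpoint rule with 200 nodes.
- The result is compared with the eigenpairs of the sample covariance of 100 simulated realizations.

It does not implement the analytic K-L solution or the approximation from the slides. Eigenfunctions are defined only up to sign, so align signs before comparing.

```python
import numpy as np


def exp_corr(d, phi=1.0):
    return np.exp(-np.abs(d) / phi)


def sorted_eigen(mat, h):
    """Eigenpairs of mat * h in decreasing order; eigenvectors rescaled to unit L2 norm."""
    lam, vec = np.linalg.eigh(mat * h)
    order = np.argsort(lam)[::-1]
    return lam[order], vec[:, order] / np.sqrt(h)


def kl_nystrom(n=200, a=0.0, b=1.0, phi=1.0):
    """Midpoint-rule Nyström approximation of the K-L eigenpairs on [a, b]."""
    h = (b - a) / n
    s = a + h * (np.arange(n) + 0.5)
    lam, efun = sorted_eigen(exp_corr(s[:, None] - s[None, :], phi), h)
    return s, lam, efun


def kl_truncated(lam, efun, order):
    """Order-M approximation: sum over k < M of lam_k f_k(s) f_k(t) on the grid."""
    return (efun[:, :order] * lam[:order]) @ efun[:, :order].T


def empirical_eigen(s, n_real=100, phi=1.0, seed=1):
    """Eigenpairs of the sample covariance of n_real simulated realizations."""
    K = exp_corr(s[:, None] - s[None, :], phi)
    chol = np.linalg.cholesky(K + 1e-10 * np.eye(len(s)))
    rng = np.random.default_rng(seed)
    Y = chol @ rng.standard_normal((len(s), n_real))   # one realization per column
    return sorted_eigen(np.cov(Y), s[1] - s[0])


s, lam, efun = kl_nystrom()
K = exp_corr(s[:, None] - s[None, :])
for order in (1, 2, 5, 10, 20):
    err = np.max(np.abs(kl_truncated(lam, efun, order) - K))
    print(order, round(lam[order - 1], 4), err)
lam_emp, efun_emp = empirical_eigen(s)
```

Plotting kl_truncated for increasing orders against exp_corr shows how fast the truncation converges. The empirical eigenvalues lam_emp can be plotted against lam on the same axes.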
Homework 3
(due 5/08/19)
Albedo Data
To do this problem you need to: (a) coordinate your efforts with the other students; (b) download and process the data using a package for NetCDF files; and (c) perform the analysis. The goal is to study a data set on albedo in the Americas. There are two gzip files, one for each of two satellites, both corresponding to July 1, 2000. The file labelled 075 corresponds to one of the two GOES satellites and the file labelled 135 to the other. The two satellites view the surface from different angles. Because of this they do not cover the same areas, although there is substantial overlap. The files are very large: they take up 156 MB. You need the R package ncdf4. Use nc_open to open the files, then use ncvar_get to extract the variable "BHRiso", which contains the albedo measurements. The variables "longitude" and "latitude" are also available.
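Because the files are large, it helps to cut each image down to your window before any modelling. The Python sketch below uses a hypothetical helper, extract_window, which is not part of any package. It assumes that longitude and latitude are available per pixel with the same shape as BHRiso. If they come as 1-D axes, build 2-D arrays with np.meshgrid first. Masked or missing values are converted to NaN and dropped.

```python
import numpy as np


def extract_window(albedo, lon, lat, lon_range, lat_range):
    """Return flat (lon, lat, albedo) for finite pixels inside a lon/lat box.

    Hypothetical helper: assumes lon and lat have the same shape as albedo.
    """
    albedo = np.ma.filled(np.ma.asarray(albedo, dtype=float), np.nan)   # masked -> NaN
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    inside = ((lon >= lon_range[0]) & (lon <= lon_range[1]) &
              (lat >= lat_range[0]) & (lat <= lat_range[1]))
    keep = inside & np.isfinite(albedo)
    return lon[keep], lat[keep], albedo[keep]


# Toy example on a synthetic 0.5-degree grid (not real GOES data).
lon2d, lat2d = np.meshgrid(np.arange(-70.0, -60.0, 0.5), np.arange(-10.0, 0.0, 0.5))
field = np.ma.masked_array(np.full(lon2d.shape, 0.2), mask=(lat2d > -1.0))
x, y, z = extract_window(field, lon2d, lat2d, (-66.0, -63.0), (-5.0, -2.0))
```

Working with the three flat vectors rather than the full image keeps memory use proportional to the size of your window.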
- Start by choosing a land-only area (no ocean) in the Americas that covers at least 3 by 3 degrees. Each of you needs to propose an area, and each area needs two people: one analysing the GOES 075 data and the other analysing the GOES 135 data. In this way each area will be analysed with data from both satellites. Please make sure that the class as a whole covers all of the Americas; the coordination is up to you. Data: GOES 075; GOES 135.
- You are dealing with a ton of data, so be clever when operating with it.
- Perform a graphical exploration of the data. Is there evidence of a first or second order trend function of location? Is there evidence that a transformation is needed in order to make the data closer to normality?
- Obtain the residuals after fitting the trend function resulting from the previous question, if any. Plot the variogram. Explore possible anisotropies using a directional variogram.
- Use least squares to fit the covariograms in the Matérn family with smoothness 0.5, 1, 1.5 and 2.5. Plot the results. Use the plots and the values of the least squares criterion to select the best fit.
- Plot the likelihood function for the sill and the range corresponding to each of the correlations in the previous point. If a nugget is needed, you can plug in an estimated value.
- Plot the marginal likelihood for the range parameter for each of the examples above.
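The sketch below covers two computational steps from the list above: the empirical variogram and the least squares fit. It assumes you already have residuals z at coordinates coords. Subsample first, because the number of pairs grows quadratically.

- The variogram uses the classical Matheron estimator.
- The fit is ordinary least squares over a grid of ranges. For each range, the nugget and the partial sill are solved linearly.
- Clipping negative coefficients is a crude shortcut, not a proper constrained fit.

Only nu = 0.5, 1.5 and 2.5 have the elementary forms below (geoR-style scale phi). The case nu = 1 needs the modified Bessel function, for example scipy.special.kv. Weighted least squares, or geoR's variofit, would be natural alternatives.

```python
import numpy as np


def matern_rho(d, phi, nu):
    """Matérn correlation (geoR-style scale phi) for nu in {0.5, 1.5, 2.5}."""
    x = np.asarray(d, dtype=float) / phi
    poly = {0.5: 1.0, 1.5: 1.0 + x, 2.5: 1.0 + x + x**2 / 3.0}[nu]
    return poly * np.exp(-x)


def empirical_semivariogram(coords, z, bins):
    """Matheron estimator: mean of (z_i - z_j)^2 / 2 over pairs in each distance bin."""
    i, j = np.triu_indices(len(z), k=1)
    d = np.linalg.norm(coords[i] - coords[j], axis=1)
    half_sq = 0.5 * (z[i] - z[j]) ** 2
    which = np.digitize(d, bins) - 1          # bin b holds bins[b] <= d < bins[b+1]
    centers, gamma, counts = [], [], []
    for b in range(len(bins) - 1):
        m = which == b
        if m.any():
            centers.append(d[m].mean())
            gamma.append(half_sq[m].mean())
            counts.append(int(m.sum()))
    return np.array(centers), np.array(gamma), np.array(counts)


def fit_matern_ls(h, gamma_hat, nu, phi_grid):
    """OLS fit of gamma(h) = nugget + psill * (1 - rho(h)); returns (sse, phi, nugget, psill)."""
    best = None
    for phi in phi_grid:
        A = np.column_stack([np.ones_like(h), 1.0 - matern_rho(h, phi, nu)])
        coef, *_ = np.linalg.lstsq(A, gamma_hat, rcond=None)
        coef = np.clip(coef, 0.0, None)      # crude non-negativity, not exact constrained LS
        sse = float(np.sum((gamma_hat - A @ coef) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(phi), float(coef[0]), float(coef[1]))
    return best


rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 3.0, size=(300, 2))
z = rng.standard_normal(300)                 # placeholder residuals, not albedo
h, g, n_pairs = empirical_semivariogram(coords, z, np.linspace(0.0, 1.5, 16))
for nu in (0.5, 1.5, 2.5):
    print(nu, fit_matern_ls(h, g, nu, np.linspace(0.05, 1.5, 30)))
```

The returned SSE values are directly comparable across nu, because they are computed on the same binned variogram.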
Homework 4
(due 5/15/19)
- To perform this part you need to retrieve the variable "Probability Threshold" from the NetCDF file. This is an index of the quality of the retrieval. The index has ten possible values, corresponding to the probabilities 0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20 and 0.10.
- Obtain new albedo values by multiplying the observations by their corresponding probabilities. You now have two variables: the recorded albedo and the weighted albedo.
- Perform a full Bayesian analysis that uses Gaussian random fields to obtain an estimated albedo surface for the whole area covered by your data. Do the analysis for both cases, the retrieved albedo and the weighted albedo, and compare.
- You can use any priors you think are sensible, but you will score extra points for using the reference prior for the range parameter.
- You need to convince the reader of the report that your model does a reasonable job at fitting the data, so perform a goodness of fit analysis.
- Write a report no longer than 8 pages, including abstract, conclusions, references, figures and tables.
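For the reference prior item, the sketch below evaluates the unnormalised log reference prior for the range of an exponential correlation. It uses the form derived by Berger, De Oliveira and Sansó (2001):

pi(phi) ∝ {tr(W^2) - (tr W)^2 / (n - p)}^(1/2)

Here W = (dR/dphi) R^-1 P and P = I - X (X' R^-1 X)^-1 X' R^-1. The sketch assumes no nugget, the correlation exp(-d/phi) and a mean matrix X (a column of ones below). Check it against the form in the course slides before using it, for example as the log-prior term in a Metropolis step for phi.

```python
import numpy as np


def log_reference_prior_range(phi, coords, X):
    """Unnormalised log reference prior for the range phi of exp(-d/phi), no nugget."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    n, p = X.shape
    R = np.exp(-dist / phi)
    dR = (dist / phi**2) * R                       # elementwise d/dphi of exp(-d/phi)
    Rinv = np.linalg.inv(R)
    Rinv_X = Rinv @ X
    # P = I - X (X' R^-1 X)^-1 X' R^-1
    P = np.eye(n) - X @ np.linalg.solve(X.T @ Rinv_X, Rinv_X.T)
    W = dR @ Rinv @ P
    tr_w, tr_w2 = np.trace(W), np.trace(W @ W)
    return 0.5 * np.log(tr_w2 - tr_w**2 / (n - p))


rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 1.0, size=(40, 2))
X = np.ones((40, 1))                              # constant mean
phis = np.linspace(0.05, 1.0, 20)
log_prior = np.array([log_reference_prior_range(f, coords, X) for f in phis])
```

The joint reference prior in that paper also carries a factor 1/sigma^2 for the sill. Only the range part is computed here.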
Homework 5
(due 5/29/19)
- To perform this part you need to divide the area that corresponds to North America (Mexico, US and Canada) into three non-overlapping blocks of about the same size. Choose one block. Please make sure all three blocks are considered by someone in the class.
- Obtain the albedo values corresponding to your block for both satellites. Obtain a random sample of 1,000 land locations.
- Fit a model using the Gaussian predictive process to the data from both satellites, and provide predicted values for the whole area. Repeat using the modified predictive process. Perform a model assessment in both cases and comment on the differences between the two methods.
- Using the model that provides the best fit to the data, obtain 100 samples from the predictive distribution for each satellite and use them to compute the mean difference. Perform a descriptive analysis of the mean difference, then fit a model that provides predictions over the whole area. Use your results to obtain a unified albedo field that combines the information from both satellites.
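The difference between the predictive process and the modified predictive process can be seen directly in the covariance matrices they imply. In the predictive process, w~(s) = c(s)' C*^-1 w*, where w* is the process at a set of knots. Its covariance C(s, S*) C*^-1 C(S*, s) therefore underestimates the variance away from the knots. The modified predictive process adds back the lost variance, diag{C - C(s, S*) C*^-1 C(S*, s)}, as an independent term. The sketch assumes an exponential covariance and a hypothetical 6 x 6 grid of knots. It only builds the two covariance matrices; it is not a full model fit.

```python
import numpy as np


def exp_cov(a, b, sigma2=1.0, phi=1.0):
    """Exponential covariance between two sets of 2-D locations."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / phi)


def predictive_process_covs(coords, knots, sigma2=1.0, phi=1.0, jitter=1e-10):
    """Covariances implied by the predictive process and the modified predictive process."""
    C_sk = exp_cov(coords, knots, sigma2, phi)                      # n x m
    C_kk = exp_cov(knots, knots, sigma2, phi) + jitter * np.eye(len(knots))
    C_pp = C_sk @ np.linalg.solve(C_kk, C_sk.T)                     # c(s)' C*^-1 c(s')
    lost = np.clip(sigma2 - np.diag(C_pp), 0.0, None)               # variance lost by projection
    C_mpp = C_pp + np.diag(lost)                                    # add it back as independent noise
    return C_pp, C_mpp


rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 3.0, size=(200, 2))      # stand-in for sampled land locations
g = np.linspace(0.25, 2.75, 6)
knots = np.array([(x, y) for x in g for y in g])   # hypothetical 6 x 6 knot grid
C_pp, C_mpp = predictive_process_covs(coords, knots, phi=0.5)
print(np.diag(C_pp).min(), np.diag(C_mpp).min())
```

The printed minima show how much variance the plain predictive process loses between knots. The modified version restores the full sill at every location.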
Homework 6
(due 6/12/19)
Consider the same data as before. Let y_1(s) be the albedo from one satellite and y_2(s) the albedo from the other. Consider the model y_1(s) = x(s)'b_1 + m(s) + e_1(s) and y_2(s) = x(s)'b_2 + m(s) + d(s) + e_2(s). Here x(s) is a vector of covariates, m(s) is the true common albedo, and d(s) is the discrepancy between the two satellites. Write the model in vector form as Y(s) = X(s)B + L M(s) + E(s) for an appropriate lower triangular matrix L, with M(s) = (m(s), d(s))'. Explore and discuss the need for different coefficients b_1 and b_2 for the two satellites.
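One choice of X(s), B and L that reproduces the two equations above is written out below for reference. X(s) is block-diagonal so that each satellite has its own coefficients.

```latex
\begin{pmatrix} y_1(s) \\ y_2(s) \end{pmatrix}
=
\underbrace{\begin{pmatrix} x(s)' & 0' \\ 0' & x(s)' \end{pmatrix}}_{X(s)}
\underbrace{\begin{pmatrix} b_1 \\ b_2 \end{pmatrix}}_{B}
+
\underbrace{\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}}_{L}
\underbrace{\begin{pmatrix} m(s) \\ d(s) \end{pmatrix}}_{M(s)}
+
\underbrace{\begin{pmatrix} e_1(s) \\ e_2(s) \end{pmatrix}}_{E(s)}
```

Setting b_1 = b_2 = b collapses X(s)B to x(s)'b in both rows. That gives the restricted model to compare against when discussing whether separate coefficients are needed.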
- Use process convolutions with spherical Bezier kernels and MRF priors to fit m(s) and d(s). Perform an assessment of the goodness of fit.
- Obtain predictive surfaces for m(s) and d(s) over the whole area and quantify their uncertainty.
- Obtain the results from some of the other students who have worked on the other two blocks in the North American region, and produce a plot of m(s) and d(s) for the whole area.
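The sketch below shows the two ingredients of the process convolution m(s) = sum_j k(||s - u_j||) z_j over a lattice of knots u_j. It makes two assumptions that you should check against the course slides:

- The spherical Bézier kernel has the form k(d) = (1 - d^2/r^2)^nu for d < r and 0 otherwise, with the radius r and shape nu chosen by you.
- The prior on the knot weights is a proper CAR (Gaussian MRF) prior with precision tau (D - rho W) on a rook-neighbour lattice. Taking rho < 1 keeps the prior proper; an intrinsic MRF (rho = 1) cannot be sampled this way.

d(s) would use a second, independent set of weights.

```python
import numpy as np


def bezier_kernel(d, radius, nu):
    """Spherical Bezier kernel (1 - d^2/r^2)^nu inside the radius, 0 outside (assumed form)."""
    u = 1.0 - (np.asarray(d, dtype=float) / radius) ** 2
    return np.where(u > 0.0, np.clip(u, 0.0, None) ** nu, 0.0)


def lattice_car_precision(nx, ny, tau=1.0, rho=0.95):
    """Proper CAR precision tau * (D - rho * W) for a rook-neighbour nx x ny lattice."""
    n = nx * ny
    W = np.zeros((n, n))
    for i in range(nx):
        for j in range(ny):
            k = i * ny + j
            if i + 1 < nx:
                W[k, k + ny] = W[k + ny, k] = 1.0
            if j + 1 < ny:
                W[k, k + 1] = W[k + 1, k] = 1.0
    return tau * (np.diag(W.sum(axis=1)) - rho * W)


def convolve_process(coords, knots, weights, radius, nu):
    """m(s) = sum_j k(||s - u_j||) z_j evaluated at every row of coords."""
    d = np.linalg.norm(coords[:, None, :] - knots[None, :, :], axis=-1)
    return bezier_kernel(d, radius, nu) @ weights


nx, ny = 8, 8
gx, gy = np.meshgrid(np.linspace(0.0, 3.0, nx), np.linspace(0.0, 3.0, ny), indexing="ij")
knots = np.column_stack([gx.ravel(), gy.ravel()])            # k = i * ny + j ordering
Q = lattice_car_precision(nx, ny)
chol = np.linalg.cholesky(Q)                                 # Q = chol @ chol.T
rng = np.random.default_rng(0)
z = np.linalg.solve(chol.T, rng.standard_normal(nx * ny))   # z ~ N(0, Q^-1)
coords = rng.uniform(0.0, 3.0, size=(500, 2))
m = convolve_process(coords, knots, z, radius=1.0, nu=2.0)
```

Because the kernel has compact support, the kernel matrix is sparse for small radii. That is what makes this approach attractive for large blocks.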