gdptools example without the OPeNDAP catalog - accessing and processing Merra-2 data on NOAA GESDISC#
In this example, we demonstrate a use-case that involves accessing behind the EarthData, in this case [Merra-2 data] (https://disc.gsfc.nasa.gov/datasets?keywords=merra-2&page=1). This rather complicated use-case requires user input to login to EarthData. In addition, there is no aggregated OPeNDAP or THREDDS endpoint so the user must supply a list of OPeNDAP URLs to each dataset. The primary requirements of gdptools is a gridded dataset that can be opened in xarray and a user defined set of polygon spatial features.
Login to Earthdata#
# https://disc.gsfc.nasa.gov/information/howto?keywords=prerequisite&title=How%20to%20Generate%20Earthdata%20Prerequisite%20Files
from netrc import netrc
from subprocess import Popen
from getpass import getpass
import platform
import os
import shutil
earth_creds = True
if earth_creds:
urs = 'urs.earthdata.nasa.gov' # Earthdata URL to call for authentication
prompts = ['Enter NASA Earthdata Login Username \n(or create an account at urs.earthdata.nasa.gov): ',
'Enter NASA Earthdata Login Password: ']
homeDir = os.path.expanduser("~") + os.sep
with open(homeDir + '.netrc', 'w') as file:
file.write('machine {} login {} password {}'.format(urs, getpass(prompt=prompts[0]), getpass(prompt=prompts[1])))
file.close()
with open(homeDir + '.urs_cookies', 'w') as file:
file.write('')
file.close()
with open(homeDir + '.dodsrc', 'w') as file:
file.write('HTTP.COOKIEJAR={}.urs_cookies\n'.format(homeDir))
file.write('HTTP.NETRC={}.netrc'.format(homeDir))
file.close()
print('Saved .netrc, .urs_cookies, and .dodsrc to:', homeDir)
# Set appropriate permissions for Linux/macOS
if platform.system() != "Windows":
Popen('chmod og-rw ~/.netrc', shell=True)
else:
# Copy dodsrc to working directory in Windows
shutil.copy2(homeDir + '.dodsrc', os.getcwd())
print('Copied .dodsrc to:', os.getcwd())
!chmod 0600 ~/.netrc
Per guidance here: https://disc.gsfc.nasa.gov/information/howto?keywords=prerequisite&title=How to Generate Earthdata Prerequisite Files
run chmod 0600 .netrc
before continuing.
import warnings
import xarray as xr
import pandas as pd
import numpy as np
import pyproj
from pynhd import NLDI
from pynhd import WaterData
import hvplot.xarray
import hvplot.pandas
from gdptools import WeightGen
from gdptools import AggGen
from gdptools import UserCatData
warnings.filterwarnings("ignore")
Load Merra-2 Monthly data#
There is no aggregated OPeNDAP endpoint so we load the data via teh xarray.open_mfdataset() function. The following code sets up a list of formated URLs.
y_time_series = pd.date_range(pd.to_datetime("1980"), periods=1, freq="1Y")
f_time_series = pd.date_range(pd.to_datetime("1980-01-01"), periods=12, freq="MS")
year = [t.strftime("%Y") for t in y_time_series]
month = [t.strftime("%m") for t in f_time_series]
urllist = []
for y in year:
for m in month:
urllist.append(f"https://goldsmr4.gesdisc.eosdis.nasa.gov/opendap/MERRA2_MONTHLY/M2TMNXRAD.5.12.4/{y}/MERRA2_100.tavgM_2d_rad_Nx.{y+m}.nc4")
urllist
data = xr.open_mfdataset(urllist)
data
Create GeoDataFrame of Delaware River Basin HUC12s#
In this cell we use clients to several https://labs.waterdata.ugsgs.gov tools:
Tool: USGS NLDI Client: pynhd.NLDI
Tool: USGS GeoServer Client: pynhd.WaterData
More examples of using the HyRiver pynhd package can be found here
# USGS gage 01482100 Delaware River at Del Mem Bridge at Wilmington De
gage_id = "01482100"
nldi = NLDI()
del_basins = nldi.get_basins(gage_id)
huc12_basins = WaterData("huc12").bygeom(del_basins.geometry[0])
from holoviews.element.tiles import EsriTerrain
drb = huc12_basins.hvplot(
geo=True, coastline='50m', alpha=0.2, c='r', frame_width=300,
xlabel="longitude", ylabel="latitude", clabel="monthly water evaporation (mm)",
title="Delaware River HUC12 basins", xlim=(-76.8, -73.8), aspect='equal'
)
EsriTerrain() * drb
Calculate weights#
To determine an area-weighted statistic for each HUC12 polygon, the area of all grid cells (a Shapely Polygon object representing each cell of the grid).
The weight of each cell contributing to a HUC12 polygon is the cell-intersected-area / HUC12 polygon-area.
Not all all data readable by xarray are made the same. Therefor, when using non-OPeNDAP catalog endpoints, the user must supply some metadata about the xarray data and GeoDataFrame data.
# metadata for calculating weights
data_crs = 4326
x_coord = "lon"
y_coord = "lat"
t_coord = "time"
sdate = "1980-01-01"
edate = "1980-12-01"
var = ["SWGDN"]
shp_crs = 4326
shp_poly_idx = "huc12"
wght_gen_crs = 6931
user_data = UserCatData(
ds=data,
proj_ds=data_crs,
x_coord=x_coord,
y_coord=y_coord,
t_coord=t_coord,
var=var,
f_feature=huc12_basins,
proj_feature=shp_crs,
id_feature=shp_poly_idx,
period=[sdate, edate]
)
wght_gen = WeightGen(
user_data=user_data,
method="serial",
output_file='example_weights.csv',
weight_gen_crs=6931,
)
wdf = wght_gen.calculate_weights()
Run the grid-to-polygon area weighted average interpolation using the calculated weights.#
Future releases will add additional area-weighted statistics. Currently only the mean is returned.
If more that one variable were available in the dataset then the run_weights function could be embedded in a loop that indexes each variable.
run_weights Output#
Run weighs returns:
ngdf is the GeoDataFrame passed to run_weights. huc12_basins maybe reordered when running weights so the re-ordered GeoDataFrame is returned.
vals are the area-weighted interpolated values ordered by the ngdf.index
agg_gen = AggGen(
user_data=user_data,
stat_method="masked_mean",
agg_engine="serial",
agg_writer="netcdf",
weights='example_weights.csv',
out_path='.',
file_prefix="1980_merra2_swgdn"
)
ngdf, ds_out = agg_gen.calculate_agg()
ngdf
ds_out
Inspect format of netcdf output.#
xr.open_dataset('1980_merra2_swgdn.nc')
Plot the gridded data and interpolated data#
aggdata = agg_gen.agg_data
ds_ss = aggdata.get("SWGDN").da
ds_ss
timestep = "1980-08-01T00:30:00.000000000"
tvar = "SWGDN"
min = np.nanmin(ds_ss.sel(time=timestep).values)
max = np.nanmax(ds_ss.sel(time=timestep).values)
print(tvar, min, max)
print(type(tvar))
ngdf["SWGDN"] = ds_out[tvar].sel(time=timestep)
gridmet_agg = ngdf.to_crs(4326).hvplot(
c=tvar, geo=True, coastline='50m', frame_width=300, alpha=1, cmap='viridis', clim=(min, max),
xlabel="longitude", ylabel="latitude", clabel="surface_incoming_shortwave_flux (W m-2)", xlim=(-76.8, -73.8),
title="HUC12 area-weighted average", aspect='equal'
)
gridmet_raw = ds_ss.sel(time=timestep).hvplot.quadmesh(
'lon', 'lat', tvar, cmap='viridis', alpha=1, grid=True, geo=True, coastline='50m', frame_width=300, clim=(min, max),
xlabel="longitude", ylabel="latitude", clabel="surface_incoming_shortwave_flux (W m-2)", xlim=(-76.8, -73.8),
title="Merra-2 Monthy average shortwave flux", aspect='equal'
)
(EsriTerrain() * gridmet_raw) + (EsriTerrain() * gridmet_agg)