gdptools example without the OPeNDAP catalog - accessing and processing Merra-2 data on NOAA GESDISC

gdptools example without the OPeNDAP catalog - accessing and processing Merra-2 data on NOAA GESDISC#

In this example, we demonstrate a use-case that involves accessing behind the EarthData, in this case [Merra-2 data] (https://disc.gsfc.nasa.gov/datasets?keywords=merra-2&page=1). This rather complicated use-case requires user input to login to EarthData. In addition, there is no aggregated OPeNDAP or THREDDS endpoint so the user must supply a list of OPeNDAP URLs to each dataset. The primary requirements of gdptools is a gridded dataset that can be opened in xarray and a user defined set of polygon spatial features.

Login to Earthdata#

# https://disc.gsfc.nasa.gov/information/howto?keywords=prerequisite&title=How%20to%20Generate%20Earthdata%20Prerequisite%20Files

from netrc import netrc
from subprocess import Popen
from getpass import getpass
import platform
import os
import shutil

earth_creds = True
if earth_creds:
    urs = 'urs.earthdata.nasa.gov'    # Earthdata URL to call for authentication
    prompts = ['Enter NASA Earthdata Login Username \n(or create an account at urs.earthdata.nasa.gov): ',
               'Enter NASA Earthdata Login Password: ']

    homeDir = os.path.expanduser("~") + os.sep

    with open(homeDir + '.netrc', 'w') as file:
        file.write('machine {} login {} password {}'.format(urs, getpass(prompt=prompts[0]), getpass(prompt=prompts[1])))
        file.close()
    with open(homeDir + '.urs_cookies', 'w') as file:
        file.write('')
        file.close()
    with open(homeDir + '.dodsrc', 'w') as file:
        file.write('HTTP.COOKIEJAR={}.urs_cookies\n'.format(homeDir))
        file.write('HTTP.NETRC={}.netrc'.format(homeDir))
        file.close()

    print('Saved .netrc, .urs_cookies, and .dodsrc to:', homeDir)

    # Set appropriate permissions for Linux/macOS
    if platform.system() != "Windows":
        Popen('chmod og-rw ~/.netrc', shell=True)
    else:
        # Copy dodsrc to working directory in Windows
        shutil.copy2(homeDir + '.dodsrc', os.getcwd())
        print('Copied .dodsrc to:', os.getcwd())

!chmod 0600 ~/.netrc

Per guidance here: https://disc.gsfc.nasa.gov/information/howto?keywords=prerequisite&title=How to Generate Earthdata Prerequisite Files run chmod 0600 .netrc before continuing.

import warnings

import xarray as xr
import pandas as pd
import numpy as np
import pyproj
from pynhd import NLDI
from pynhd import WaterData
import hvplot.xarray
import hvplot.pandas

from gdptools import WeightGen
from gdptools import AggGen
from gdptools import UserCatData
warnings.filterwarnings("ignore")

Load Merra-2 Monthly data#

There is no aggregated OPeNDAP endpoint so we load the data via teh xarray.open_mfdataset() function. The following code sets up a list of formated URLs.

y_time_series = pd.date_range(pd.to_datetime("1980"), periods=1, freq="1Y")
f_time_series = pd.date_range(pd.to_datetime("1980-01-01"), periods=12, freq="MS")
year = [t.strftime("%Y") for t in y_time_series]
month = [t.strftime("%m") for t in f_time_series]
urllist = []
for y in year:
    for m in month:
        urllist.append(f"https://goldsmr4.gesdisc.eosdis.nasa.gov/opendap/MERRA2_MONTHLY/M2TMNXRAD.5.12.4/{y}/MERRA2_100.tavgM_2d_rad_Nx.{y+m}.nc4")
urllist

data = xr.open_mfdataset(urllist)
data

Create GeoDataFrame of Delaware River Basin HUC12s#

In this cell we use clients to several https://labs.waterdata.ugsgs.gov tools:

Tool: USGS NLDI Client: pynhd.NLDI
Tool: USGS GeoServer Client: pynhd.WaterData

More examples of using the HyRiver pynhd package can be found here

# USGS gage 01482100 Delaware River at Del Mem Bridge at Wilmington De
gage_id = "01482100"
nldi = NLDI()
del_basins = nldi.get_basins(gage_id)
huc12_basins = WaterData("huc12").bygeom(del_basins.geometry[0])

from holoviews.element.tiles import EsriTerrain
drb = huc12_basins.hvplot(
    geo=True, coastline='50m', alpha=0.2, c='r', frame_width=300,
    xlabel="longitude", ylabel="latitude", clabel="monthly water evaporation (mm)",
    title="Delaware River HUC12 basins", xlim=(-76.8, -73.8), aspect='equal'
)
EsriTerrain() * drb

Calculate weights#

To determine an area-weighted statistic for each HUC12 polygon, the area of all grid cells (a Shapely Polygon object representing each cell of the grid).
The weight of each cell contributing to a HUC12 polygon is the cell-intersected-area / HUC12 polygon-area.
Not all all data readable by xarray are made the same. Therefor, when using non-OPeNDAP catalog endpoints, the user must supply some metadata about the xarray data and GeoDataFrame data.

# metadata for calculating weights

data_crs = 4326
x_coord = "lon"
y_coord = "lat"
t_coord = "time"
sdate = "1980-01-01"
edate = "1980-12-01"
var = ["SWGDN"]
shp_crs = 4326
shp_poly_idx = "huc12"
wght_gen_crs = 6931

user_data = UserCatData(
    ds=data,
    proj_ds=data_crs,
    x_coord=x_coord,
    y_coord=y_coord,
    t_coord=t_coord,
    var=var,
    f_feature=huc12_basins,
    proj_feature=shp_crs,
    id_feature=shp_poly_idx,
    period=[sdate, edate]
)

wght_gen = WeightGen(
    user_data=user_data,
    method="serial",
    output_file='example_weights.csv',
    weight_gen_crs=6931,
)

wdf = wght_gen.calculate_weights()

Run the grid-to-polygon area weighted average interpolation using the calculated weights.#

Future releases will add additional area-weighted statistics. Currently only the mean is returned.
If more that one variable were available in the dataset then the run_weights function could be embedded in a loop that indexes each variable.

run_weights Output#

Run weighs returns:

ngdf is the GeoDataFrame passed to run_weights. huc12_basins maybe reordered when running weights so the re-ordered GeoDataFrame is returned.
vals are the area-weighted interpolated values ordered by the ngdf.index

agg_gen = AggGen(
    user_data=user_data,
    stat_method="masked_mean",
    agg_engine="serial",
    agg_writer="netcdf",
    weights='example_weights.csv',
    out_path='.',
    file_prefix="1980_merra2_swgdn"
)
ngdf, ds_out = agg_gen.calculate_agg()

ngdf

ds_out

Inspect format of netcdf output.#

xr.open_dataset('1980_merra2_swgdn.nc')

Plot the gridded data and interpolated data#

aggdata = agg_gen.agg_data
ds_ss = aggdata.get("SWGDN").da

ds_ss

timestep = "1980-08-01T00:30:00.000000000"
tvar = "SWGDN"
min = np.nanmin(ds_ss.sel(time=timestep).values)
max = np.nanmax(ds_ss.sel(time=timestep).values)
print(tvar, min, max)
print(type(tvar))

ngdf["SWGDN"] = ds_out[tvar].sel(time=timestep)
gridmet_agg = ngdf.to_crs(4326).hvplot(
    c=tvar, geo=True, coastline='50m', frame_width=300, alpha=1, cmap='viridis', clim=(min, max),
    xlabel="longitude", ylabel="latitude", clabel="surface_incoming_shortwave_flux (W m-2)", xlim=(-76.8, -73.8),
    title="HUC12 area-weighted average", aspect='equal'
)
gridmet_raw = ds_ss.sel(time=timestep).hvplot.quadmesh(
    'lon', 'lat', tvar, cmap='viridis', alpha=1, grid=True, geo=True, coastline='50m', frame_width=300, clim=(min, max),
    xlabel="longitude", ylabel="latitude", clabel="surface_incoming_shortwave_flux (W m-2)", xlim=(-76.8, -73.8),
    title="Merra-2 Monthy average shortwave flux", aspect='equal'
)

(EsriTerrain() * gridmet_raw) + (EsriTerrain() * gridmet_agg)