Loading bed picks and layer data

For workflows that primarily need surface, bed, or internal layer picks, this notebook demonstrates how to work with OPR layer picking information.

This example shows how to load bed picks for a region and grid them onto a regular grid using Verde.

(If your primary use case is plotting layers on top of a radargram you’ve already loaded, the demo_notebook.ipynb may be more useful to you.)

import xopr
from xopr.bedmap import query_bedmap, query_bedmap_catalog, fetch_bedmap

import holoviews as hv
import xarray as xr
import hvplot
import hvplot.xarray
import hvplot.pandas
import geoviews.feature as gf
import cartopy.crs as ccrs
import rioxarray
from tqdm.notebook import tqdm
import numpy as np
import verde as vd
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box
from datetime import datetime
import time
import warnings
warnings.filterwarnings('ignore')

opr = xopr.OPRConnection(cache_dir='radar_cache')

We’ll setup some useful backgrounds for context on our maps:

epsg_3031 = ccrs.Stereographic(central_latitude=-90, true_scale_latitude=-71)
coastline = gf.coastline.options(scale='50m').opts(projection=epsg_3031)
velocity = rioxarray.open_rasterio(
    "https://its-live-data.s3.amazonaws.com/velocity_mosaic/v2/static/cog/ITS_LIVE_velocity_120m_RGI19A_0000_v02_v.tif",
    chunks='auto', overview_level=4, cache=False
).squeeze().drop_vars(['spatial_ref', 'band']).rename('velocity (m/year)')
velocity_map = velocity.hvplot.image(x='x', y='y', cmap='gray_r').opts(clim=(0,500))

For this example, we’ll focus on a specific region. Feel free to try swapping this region out for any other, of course.

The Vincennes Bay area has some deep troughs that run across flow, making it an interesting area to loop at bed topography. This region is covered more extensively by UTIG data, which has only recently become available through OPR. Keep in mind that not all of this data yet has bed picks, as we’ll see below.

region = xopr.geometry.get_antarctic_regions(name=["Vincennes_Bay", "Underwood"], merge_regions=True, simplify_tolerance=100)
region_projected = xopr.geometry.project_geojson(region, source_crs='EPSG:4326', target_crs="EPSG:3031")

region_hv = hv.Polygons([region_projected]).opts(
    color='green',
    line_color='black',
    fill_alpha=0.3)

(velocity_map * coastline * region_hv).opts(aspect='equal')

Querying for bed picks starts the same as any other radar query. We’ll begin by fetching the STAC items corresponding to the radar data we want. As a reminder, this step doesn’t load any actual radar data yet -- we’re just getting the paths along which the radar data was collected.

gdf = opr.query_frames(geometry=region).to_crs('EPSG:3031')
print(f"Found {len(gdf)} radar frames in the selected region.")
gdf.head()

Found 100 radar frames in the selected region.

radar_frames_hv = gdf.hvplot(by='collection', hover_cols=['id'])
(velocity_map * coastline * region_hv * radar_frames_hv).opts(aspect='equal', legend_position='top_left')

Getting bed picks¶

Now that we’ve got our frames selected, we can load layer information for them. Layer information includes surface and bed and might come from the OPS API or from layerdata files hosted on OPR servers and indexed in the STAC catalog.

See https://gitlab.com/openpolarradar/opr/-/wikis/Layer-File-Guide

xOPR tries to abstract away the difference between the layer database and the layer files by formatting both to look like the layer files.

layer_ds_list = []

for id, frame in tqdm(gdf.iterrows(), total=len(gdf), mininterval=1.0, desc="Loading layers"):
    layers = opr.get_layers(frame)
    bed_layer_name = None
    if 'standard:bottom' in layers: # Generally, the picked bed should be in group "standard" with layer name "bottom"
        bed_layer_name = 'standard:bottom'
    elif ':bottom' in layers:
        bed_layer_name = ':bottom' # But occasionally it seems to be missing the group
    else:
        continue  # No bed layer found
    # Layers are stored in terms of two-way travel time to avoid any questions about travel speed within ice
    # This is different from how BedMap layers are stored, but it does make more sense when the radar data is availble to use twtt
    layer_wgs84 = xopr.radar_util.layer_twtt_to_range(layers[bed_layer_name], layers["standard:surface"], vertical_coordinate='wgs84').rename({'lat': 'Latitude', 'lon': 'Longitude'})
    layer_wgs84 = xopr.geometry.project_dataset(layer_wgs84, target_crs='EPSG:3031')
    layer_wgs84 = layer_wgs84.dropna('slow_time', subset=['wgs84'])
    layer_wgs84['source'] = id
    layer_ds_list.append(layer_wgs84)

We can now combine all of the layers to get a pointwise list of bed picks:

bed_merged = xr.concat(layer_ds_list, dim='slow_time')

# Get extent of radar data for plots and bedmap comparison
xlim = (bed_merged.x.min().item(), bed_merged.x.max().item())
ylim = (bed_merged.y.min().item(), bed_merged.y.max().item())

bed_merged

bed_hv = bed_merged.hvplot.scatter(x='x', y='y', c='wgs84', cmap='turbo', s=2).opts(clabel='Bed Elevation WGS84 (m)')
(velocity_map.opts(colorbar=False) * coastline * region_hv * radar_frames_hv * bed_hv).opts(aspect='equal', legend_position='top_left', xlim=xlim, ylim=ylim)

As you can see, many of the radar lines are (as of the time of writing) missing bed picks.

If you’re looking at the Vincennes Bay region, you’ll see the very deep trough running roughly grid top to bottom in the radar bed picks.

Gridding¶

What you want to do with the bed picks is, of course, up to you. One use case might be to aggregate these picks onto a common grid. We’ll show an example of that below:

def grid_dataarray(d: xr.DataArray, spacing=1000, aggregation_fns={'median': "median", 'std': 'std', 'count': "count"}):
    """
    Grid a DataArray with x,y coordinates into a regular grid using block aggregation.
    
    Parameters
    ----------
    d : xr.DataArray
        Input DataArray with 'x' and 'y' coordinates
    spacing : float
        Grid spacing in the same units as x,y coordinates
    aggregation_fns : dict
        Dictionary mapping aggregation function names to functions (e.g., {'median': np.median, 'std': np.std})
    
    Returns
    -------
    xr.Dataset
        Dataset with variables named {d.name}_{fn_name} for each aggregation function
    """
    # Get data extent
    x_min = d['x'].min().values
    x_max = d['x'].max().values
    y_min = d['y'].min().values
    y_max = d['y'].max().values
    
    # Extract coordinate and data values
    x_data = d['x'].values
    y_data = d['y'].values
    data_values = d.values
    
    # Create grid coordinates
    grid_x, grid_y = vd.grid_coordinates(
        region=(x_min, x_max, y_min, y_max),
        spacing=spacing
    )
    
    # Dictionary to store gridded results for each aggregation function
    data_vars = {}
    
    for fn_name, fn in aggregation_fns.items():
        # Use Verde's BlockReduce with the specified aggregation function
        gridder = vd.BlockReduce(
            reduction=fn, 
            spacing=spacing, 
            region=(x_min, x_max, y_min, y_max),
            center_coordinates=True
        )
        block_coords, block_values = gridder.filter(
            coordinates=(x_data, y_data), 
            data=data_values
        )
        
        # Initialize grid with NaN
        grid_data = np.full(grid_x.shape, np.nan)
        
        # Vectorized approach: compute indices directly from coordinates
        x_indices = np.floor((block_coords[0] - x_min) / spacing).astype(int)
        y_indices = np.floor((block_coords[1] - y_min) / spacing).astype(int)
        
        for x_idx, y_idx, value in zip(x_indices.flatten(), y_indices.flatten(), block_values.flatten()):
            grid_data[y_idx, x_idx] = value
        
        # Store in dictionary with name pattern
        var_name = f"{d.name}_{fn_name}" if d.name else f"data_{fn_name}"
        data_vars[var_name] = (['y', 'x'], grid_data)
    
    # Create Dataset with all aggregated variables
    return xr.Dataset(
        data_vars=data_vars,
        coords={
            'y': grid_y[:, 0],
            'x': grid_x[0, :]
        }
    )

gridded = grid_dataarray(bed_merged['wgs84'], spacing=5000)

gridded_median_hv = hv.Image(gridded, kdims=['x', 'y'], vdims=['wgs84_median', 'wgs84_std', 'wgs84_count']).opts(
    cmap='turbo',
    aspect='equal',
    tools=['hover'],
    colorbar=True,
    clabel='WGS84 Elevation (m)'
)

gridded_std_hv = hv.Image(gridded, kdims=['x', 'y'], vdims=['wgs84_std', 'wgs84_median', 'wgs84_count']).opts(
    cmap='inferno',
    aspect='equal',
    tools=['hover'],
    colorbar=True,
    clabel='Std of WGS84 Elevation (m)'
)

(velocity_map * region_hv * coastline * gridded_median_hv).opts(width=500, aspect='equal', xlim=xlim, ylim=ylim) + \
    (velocity_map * region_hv * coastline * gridded_std_hv).opts(width=500, aspect='equal', xlim=xlim, ylim=ylim)

Bedmap Data Integration¶

The BedMap(1/2/3) datasets contain an enormous catalog of surface and bed picks. BedMap includes data from surveys that aren’t yet in the OPR catalog, while OPR has some radar data that hasn’t made it into BedMap. xopr provides unified access to bed picks from both sources.

The bedmap data is hosted on Source Cooperative at s3://us-west-2.opendata.source.coop/englacial/bedmap/. The query process works in two stages:

STAC Catalog Query: Find GeoParquet files that intersect with the query geometry/time
DuckDB Partial Reads: Fetch only relevant rows from those files using SQL pushdown

This approach minimizes data transfer - only the data you need is downloaded!

# Create polygon from radar extent (EPSG:3031) and convert to WGS84 for bedmap query
extent_box_3031 = box(xlim[0], ylim[0], xlim[1], ylim[1])
radar_extent = xopr.geometry.project_geojson(extent_box_3031, source_crs='EPSG:3031', target_crs='EPSG:4326')

# Query the catalog to see what bedmap files match our region
print("Querying STAC catalog for matching bedmap files...")

catalog_items = query_bedmap_catalog(
    geometry=radar_extent,
    collections=['bedmap3']
)

if not catalog_items.empty:
    print(f"\nFound {len(catalog_items)} matching files:")
    for _, row in catalog_items.head(10).iterrows():
        props = row['properties'] if 'properties' in row else {}
        name = props.get('name', row.get('id', 'unknown'))
        row_count = props.get('row_count', 0)
        print(f"  - {name}: {row_count:,} rows")
    if len(catalog_items) > 10:
        print(f"  ... and {len(catalog_items) - 10} more")
else:
    print("No matching files found.")

Querying STAC catalog for matching bedmap files...
Downloading bedmap catalogs to /home/runner/.cache/xopr/catalogs/bedmap...

Bedmap catalogs cached successfully


Found 7 matching files:
  - INGV_2003_Talos-Dome_AIR_BM3: 2,659,478 rows
  - NASA_2019_ICEBRIDGE_AIR_BM3: 1,395,747 rows
  - PRIC_2015_CHA1_AIR_BM3: 1,847,620 rows
  - PRIC_2016_CHA2_AIR_BM3: 1,003,471 rows
  - UTIG_2010_ICECAP_AIR_BM3: 9,372,658 rows
  - UTIG_2015_EAGLE_AIR_BM3: 1,056,478 rows
  - UTIG_2016_OLDICE_AIR_BM3: 425,945 rows

The cell above is a metadata query, and is optional-- you can just call query_bedmap by itself, which will automatically call query_bedmap_catalog to determine which files to access.

# Query bedmap data matching the radar data extent for comparison
print("Querying bedmap data from cloud GeoParquet files...")
print(f"Query region bounds: {radar_extent.bounds}")

print("Timing cloud-based query...")
t0 = time.time()

cloud_result = query_bedmap(
    geometry=radar_extent,
    collections=['bedmap3'],
    exclude_geometry=True
)

cloud_time = time.time() - t0
print(f"Cloud query: {cloud_time:.2f}s for {len(cloud_result):,} rows")

if not cloud_result.empty:
    print(f"\nRetrieved {len(cloud_result):,} points from bedmap")
    print("\nColumns available:")
    print(cloud_result.columns.tolist())
    print("\nFirst 5 rows:")
    display(cloud_result.head())
else:
    print("No bedmap data found in query region.")

Querying bedmap data from cloud GeoParquet files...
Query region bounds: (106.38786945002471, -67.91012091719583, 112.05973945248029, -65.99603208546871)
Timing cloud-based query...
Found 7 matching files in catalog
Querying 7 GeoParquet files...

Retrieved 444,588 rows
Cloud query: 44.72s for 444,588 rows

Retrieved 444,588 points from bedmap

Columns available:
['trajectory_id', 'trace_number', 'surface_altitude (m)', 'land_ice_thickness (m)', 'bedrock_altitude (m)', 'two_way_travel_time (m)', 'aircraft_altitude (m)', 'along_track_distance (m)', 'timestamp', 'source_file', 'geometry', 'lon', 'lat']

First 5 rows:

Cloud vs Local Cache Performance¶

For repeated queries or large datasets, you can use the local_cache option to download the GeoParquet files once and query them locally. This can provide significant speedups for subsequent queries-- though the exact speed up depends on your network connection and local disk speed.

# Download bedmap STAC catalogs to local cache for faster subsequent queries
# This only needs to be done once - catalog files are cached locally
print("Fetching bedmap catalogs to local cache...")
t0 = time.time()
fetch_bedmap('bedmap3') # Default is all versions
download_time = time.time() - t0
print(f"Local cache: {download_time:.2f}s to download")

Fetching bedmap catalogs to local cache...
bedmap3: Reading STAC catalog to get data file locations...

bedmap3: Found 84 data files to download

  Downloaded: AWI_2013_GEA-IV_AIR_BM3.parquet

  Downloaded: AWI_2014_Recovery-Glacier_AIR_BM3.parquet

  Downloaded: AWI_2015_GEA-DML_AIR_BM3.parquet

  Downloaded: AWI_2016_OIR_AIR_BM3.parquet

  Downloaded: AWI_2018_ANIRES_AIR_BM3.parquet

  Downloaded: AWI_2018_DML-Coast_AIR_BM3.parquet

  Downloaded: AWI_2018_JURAS_AIR_BM3.parquet

  Downloaded: AWI_2019_JURAS_AIR_BM3.parquet

  Downloaded: BAS_2007_Lake-Ellsworth_GRN_BM3.parquet

  Downloaded: BAS_2007_Rutford_GRN_BM3.parquet

  Downloaded: BAS_2008_Lake-Ellsworth_GRN_BM3.parquet

  Downloaded: BAS_2010_IMAFI_AIR_BM3.parquet

  Downloaded: BAS_2011_Adelaide_AIR_BM3.parquet

  Downloaded: BAS_2012_Castle_GRN_BM3.parquet

  Downloaded: BAS_2012_ICEGRAV_AIR_BM3.parquet

  Downloaded: BAS_2013_ISTAR_GRN_BM3.parquet

  Downloaded: BAS_2015_FISS_AIR_BM3.parquet

  Downloaded: BAS_2015_POLARGAP_AIR_BM3.parquet

  Downloaded: BAS_2016_FISS_AIR_BM3.parquet

  Downloaded: BAS_2017_English-Coast_AIR_BM3.parquet

  Downloaded: BAS_2018_Thwaites_AIR_BM3.parquet

  Downloaded: BAS_2019_Thwaites_AIR_BM3.parquet

  Downloaded: CECS_2006_Subglacial-Lake-CECs_GRN_BM3.parquet

  Downloaded: CRESIS_2009_AntarcticaTO_AIR_BM3.parquet

  Downloaded: CRESIS_2009_Thwaites_AIR_BM3.parquet

  Downloaded: CRESIS_2013_Siple-Coast_AIR_BM3.parquet

  Downloaded: INGV_1997_Talos-Dome_AIR_BM3.parquet

  Downloaded: INGV_1999_Talos-Dome_AIR_BM3.parquet

  Downloaded: INGV_2001_Talos-Dome_AIR_BM3.parquet

  Downloaded: INGV_2003_Talos-Dome_AIR_BM3.parquet

  Downloaded: KOPRI_2017_KRT1_AIR_BM3.parquet

  Downloaded: KOPRI_2018_KRT2_AIR_BM3.parquet

  Downloaded: LDEO_2015_ROSETTA_AIR_BM3.parquet

  Downloaded: NASA_2013_ICEBRIDGE_AIR_BM3.parquet

  Downloaded: NASA_2014_ICEBRIDGE_AIR_BM3.parquet

  Downloaded: NASA_2016_ICEBRIDGE_AIR_BM3.parquet

  Downloaded: NASA_2017_ICEBRIDGE_AIR_BM3.parquet

  Downloaded: NASA_2018_ICEBRIDGE_AIR_BM3.parquet

  Downloaded: NASA_2019_ICEBRIDGE_AIR_BM3.parquet

  Downloaded: NIPR_1992_JARE33_GRN_BM3.parquet

  Downloaded: NIPR_1996_JARE37_GRN_BM3.parquet
  Downloaded: NIPR_1999_JARE40_GRN_BM3.parquet

  Downloaded: NIPR_2007_JARE49_GRN_BM3.parquet

  Downloaded: NIPR_2007_JASE_GRN_BM3.parquet
  Downloaded: NIPR_2012_JARE54_GRN_BM3.parquet

  Downloaded: NIPR_2017_JARE59_GRN_BM3.parquet

  Downloaded: NIPR_2018_JARE60_GRN_BM3.parquet

  Downloaded: NPI_2012_ICERISES_GRN_BM3.parquet

  Downloaded: NPI_2015_POLARGAP_AIR_BM3.parquet

  Downloaded: NPI_2016_MADICE_GRN_BM3.parquet

  Downloaded: PRIC_2015_CHA1_AIR_BM3.parquet

  Downloaded: PRIC_2016_CHA2_AIR_BM3.parquet

  Downloaded: PRIC_2017_CHA3_AIR_BM3.parquet

  Downloaded: PRIC_2018_CHA4_AIR_BM3.parquet
  Downloaded: RNRF_1971_Lambert-Amery_SEI_BM3.parquet

  Downloaded: RNRF_1975_Filchner-Ronne_SEI_BM3.parquet
  Downloaded: RNRF_1975_Lazarev_SEI_BM3.parquet

  Downloaded: RNRF_2003_AMSap5_AIR_BM3.parquet

  Downloaded: RNRF_2004_AMSap5_AIR_BM3.parquet

  Downloaded: RNRF_2004_Mirny-Vostok_AIR_BM3.parquet

  Downloaded: RNRF_2005_AMSap5_AIR_BM3.parquet

  Downloaded: RNRF_2006_Komsom-Vostok_AIR_BM3.parquet

  Downloaded: RNRF_2006_RAEap5_AIR_BM3.parquet

  Downloaded: RNRF_2007_AMSap5_AIR_BM3.parquet

  Downloaded: RNRF_2008_AMSap5_AIR_BM3.parquet

  Downloaded: RNRF_2009_RAEap5_AIR_BM3.parquet

  Downloaded: RNRF_2010_RAE_AIR_BM3.parquet

  Downloaded: RNRF_2011_RAE_AIR_BM3.parquet

  Downloaded: RNRF_2013_RAE_AIR_BM3.parquet

  Downloaded: RNRF_2014_RAE_AIR_BM3.parquet

  Downloaded: RNRF_2015_RAE_AIR_BM3.parquet

  Downloaded: RNRF_2016_RAE_AIR_BM3.parquet

  Downloaded: RNRF_2017_RAE_AIR_BM3.parquet

  Downloaded: RNRF_2018_RAE_AIR_BM3.parquet

  Downloaded: RNRF_2019_RAE_AIR_BM3.parquet

  Downloaded: STANFORD_1971_SPRI-NSF-TUD_AIR_BM3.parquet
  Downloaded: ULB_2012_BEWISE_GRN_BM3.parquet

  Downloaded: ULB_2012_ICECON_GRN_BM3.parquet

  Downloaded: UTIG_2009_Darwin-Hatherton_AIR_BM3.parquet

  Downloaded: UTIG_2010_ICECAP_AIR_BM3.parquet

  Downloaded: UTIG_2013_GIMBLE_AIR_BM3.parquet

  Downloaded: UTIG_2015_EAGLE_AIR_BM3.parquet

  Downloaded: UTIG_2016_OLDICE_AIR_BM3.parquet

  Downloaded: UWASHINGTON_2018_South-Pole-Lake_GRN_BM3.parquet
bedmap3: 84 files cached in radar_cache/bedmap
Local cache: 74.16s to download

# Now time the same query using local cache
print("Timing local cache query...")
t0 = time.time()
local_result = query_bedmap(
    geometry=radar_extent,
    collections=['bedmap3'],
    exclude_geometry=True,
    local_cache=True  # Use locally cached files
)
local_time = time.time() - t0
print(f"Local query: {local_time:.2f}s for {len(local_result):,} rows")

# Compare performance
print(f"\nSpeedup: {cloud_time / local_time:.1f}x faster with local cache")

Timing local cache query...

Retrieved 444,588 rows from local cache
Local query: 2.31s for 444,588 rows

Speedup: 19.4x faster with local cache

# Visualize bedmap bed elevation alongside OPR bed picks
# Filter to rows with valid bed elevation for apples-to-apples comparison
bedmap_with_bed = local_result.dropna(subset=['bedrock_altitude (m)'])
print(f"Points with bed elevation: {len(bedmap_with_bed):,} of {len(local_result):,}")

# Create GeoDataFrame for bedmap data
bedmap_gdf = gpd.GeoDataFrame(
    bedmap_with_bed,
    geometry=gpd.points_from_xy(bedmap_with_bed['lon'], bedmap_with_bed['lat']),
    crs='EPSG:4326'
).to_crs('EPSG:3031')

# Extract x/y coordinates for plotting
bedmap_gdf['x'] = bedmap_gdf.geometry.x
bedmap_gdf['y'] = bedmap_gdf.geometry.y

# Plot bedmap bed elevation
bedmap_hv = bedmap_gdf.hvplot.points(
    x='x', y='y', c='bedrock_altitude (m)',
    cmap='turbo', s=5, alpha=0.7
).opts(clabel='Bed Elevation (m)')

# Combine with OPR bed picks
(velocity_map.opts(colorbar=False) * coastline * region_hv * 
 radar_frames_hv * bedmap_hv).opts(
    aspect='equal', legend_position='top_left', xlim=xlim, ylim=ylim,
    title='Bedmap bed elevation in radar extent'
)

Points with bed elevation: 185,320 of 444,588