Catalog

The catalog module builds a local mapping of parent morton cells to granule S3 URLs by querying NASA's CMR (Common Metadata Repository). This avoids per-worker CMR queries during parallel processing.

Building a Catalog

The catalog CLI accepts date ranges, product names, and spatial polygons:

# ICESat-2 convenience (cycle → date range):
python -m zagg.catalog --cycle 22 --parent-order 6

# Explicit date range:
python -m zagg.catalog --start-date 2024-01-06 --end-date 2024-04-07 --parent-order 6

# Custom region via GeoJSON polygon:
python -m zagg.catalog --start-date 2024-01-01 --end-date 2024-06-01 \
    --polygon my_region.geojson --parent-order 6

# Different product:
python -m zagg.catalog --start-date 2024-01-01 --end-date 2024-06-01 \
    --short-name ATL08 --polygon my_region.geojson --parent-order 6

When --polygon is provided, it is used for two things:

  1. Cell discovery: morton_coverage runs on the polygon to find parent cells
  2. CMR bounding box: automatically computed from the polygon's extent

When no polygon is given, Antarctic drainage basins are used as the default (suitable for ATL06 ice sheet work).

Temporal Helpers

zagg.catalog.cycle_to_dates

cycle_to_dates(cycle: int) -> tuple[datetime, datetime]

Convert an ICESat-2 repeat cycle number to a date range.

Parameters:

  • cycle (int) –

    ICESat-2 cycle number (1-based)

Returns:

  • tuple of (start_date, end_date)

    Start and end datetimes for the cycle
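
As a rough illustration of the cycle-to-date conversion, the sketch below assumes a nominal 91-day repeat cycle and a cycle 1 start of 2018-10-13. Both constants and the name nominal_cycle_dates are assumptions made for this example; they are not taken from zagg, and cycle_to_dates may treat cycle boundaries differently.

```python
from datetime import datetime, timedelta

# Assumptions for illustration only (not read from zagg):
# cycle 1 starts on 2018-10-13 and every cycle lasts 91 days.
CYCLE_1_START = datetime(2018, 10, 13)
CYCLE_LENGTH = timedelta(days=91)


def nominal_cycle_dates(cycle: int) -> tuple[datetime, datetime]:
    """Return (start, end) for a 1-based cycle number under the assumptions above."""
    if cycle < 1:
        raise ValueError("cycle numbers are 1-based")
    start = CYCLE_1_START + (cycle - 1) * CYCLE_LENGTH
    # The end is taken as the start of the next cycle.
    return start, start + CYCLE_LENGTH


start, end = nominal_cycle_dates(22)
```

Under these assumptions, cycle 22 starts on 2024-01-06.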

Spatial Helpers

zagg.catalog.load_polygon

load_polygon(geojson_path: str) -> list[tuple]

Load polygon(s) from a GeoJSON file.

Supports Feature, FeatureCollection, Polygon, and MultiPolygon geometries.

Parameters:

  • geojson_path (str) –

    Path to a GeoJSON file

Returns:

  • list of (lats, lons)

    One (lats, lons) array pair per polygon ring, suitable for morton_coverage multipart input.
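
The following sketch shows one way the supported GeoJSON types could be flattened into (lats, lons) pairs. The function name parts_from_geojson is hypothetical, it returns plain lists rather than arrays, and it emits every ring of each polygon (including interior rings); whether load_polygon does the same is an assumption.

```python
import json


def parts_from_geojson(obj: dict) -> list:
    """Flatten a GeoJSON object into one (lats, lons) pair per polygon ring."""
    kind = obj.get("type")
    if kind == "FeatureCollection":
        return [p for feat in obj["features"] for p in parts_from_geojson(feat)]
    if kind == "Feature":
        return parts_from_geojson(obj["geometry"])
    if kind == "Polygon":
        polygons = [obj["coordinates"]]
    elif kind == "MultiPolygon":
        polygons = obj["coordinates"]
    else:
        raise ValueError(f"unsupported GeoJSON type: {kind!r}")

    parts = []
    for rings in polygons:
        for ring in rings:
            # GeoJSON positions are ordered [lon, lat]; split them into
            # separate latitude and longitude sequences.
            lons = [pos[0] for pos in ring]
            lats = [pos[1] for pos in ring]
            parts.append((lats, lons))
    return parts


# Made-up triangle used only to exercise the function.
example = json.loads(
    '{"type": "Feature", "properties": {}, "geometry": {"type": "Polygon", '
    '"coordinates": [[[-70.0, -80.0], [-60.0, -80.0], [-60.0, -75.0], [-70.0, -80.0]]]}}'
)
parts = parts_from_geojson(example)
```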

zagg.catalog.polygon_to_bbox

polygon_to_bbox(parts: list[tuple]) -> tuple[float, float, float, float]

Compute a bounding box from polygon parts.

Parameters:

  • parts (list of (lats, lons)) –

    Polygon parts as returned by load_polygon

Returns:

  • tuple of (lon_min, lat_min, lon_max, lat_max)

    Bounding box in CMR format
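
A minimal sketch of the bounding-box computation, assuming parts is a list of (lats, lons) pairs as described above. The name bbox_from_parts is hypothetical, and the sketch does not handle polygons that cross the antimeridian.

```python
def bbox_from_parts(parts):
    """Return (lon_min, lat_min, lon_max, lat_max) over all polygon parts."""
    lats = [lat for part_lats, _ in parts for lat in part_lats]
    lons = [lon for _, part_lons in parts for lon in part_lons]
    return (min(lons), min(lats), max(lons), max(lats))


# Two made-up parts: (lats, lons) each.
parts = [
    ([-80.0, -75.0], [-70.0, -60.0]),
    ([-78.0, -72.0], [-65.0, -50.0]),
]
bbox = bbox_from_parts(parts)
```

The result follows the (west, south, east, north) ordering that CMR expects for its bounding_box search parameter.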

Cell Discovery

zagg.catalog.load_antarctic_basins

load_antarctic_basins(filepath=None)

Load Antarctic drainage basin polygons.

Parameters:

  • filepath (str, default: None ) –

    Path to basin polygon file. Defaults to the file shipped with mortie.

Returns:

  • list of (lats, lons)

    One (lats, lons) pair per basin, suitable for morton_coverage multipart input.

zagg.catalog.discover_cells

discover_cells(parent_order, polygon_parts=None)

Discover morton cells at parent_order covering a polygon.

Parameters:

  • parent_order (int) –

    Morton order for parent cells (e.g., 6)

  • polygon_parts (list of (lats, lons), default: None ) –

    Polygon parts for coverage. Defaults to Antarctic drainage basins.

Returns:

  • ndarray

    Sorted array of unique morton indices at parent_order

CMR Query

zagg.catalog.query_cmr

query_cmr(
    start_date: str,
    end_date: str,
    short_name: str = "ATL06",
    version: str = "007",
    provider: str = "NSIDC_CPRD",
    bbox: tuple = None,
    page_size: int = 2000,
) -> List[dict]

Query CMR for granules matching temporal and spatial filters.

Parameters:

  • start_date (str) –

    Start date (YYYY-MM-DD)

  • end_date (str) –

    End date (YYYY-MM-DD)

  • short_name (str, default: 'ATL06' ) –

    CMR short name (e.g., ATL06, ATL08)

  • version (str, default: '007' ) –

    Product version

  • provider (str, default: 'NSIDC_CPRD' ) –

    CMR provider

  • bbox (tuple of (lon_min, lat_min, lon_max, lat_max), default: None ) –

    Bounding box filter

  • page_size (int, default: 2000 ) –

    Results per page

Returns:

  • list

    List of granule metadata dicts
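
To make the mapping from these arguments to a CMR request concrete, the sketch below builds a parameter dict for the CMR granule search endpoint. The helper name cmr_granule_params is hypothetical, and treating end_date as inclusive through the end of the day is an assumption; query_cmr may format the temporal range differently. No network request is made here.

```python
# Public CMR granule search endpoint returning UMM-JSON.
CMR_GRANULES_URL = "https://cmr.earthdata.nasa.gov/search/granules.umm_json"


def cmr_granule_params(start_date, end_date, short_name="ATL06", version="007",
                       provider="NSIDC_CPRD", bbox=None, page_size=2000):
    """Build CMR granule search parameters from the query_cmr-style arguments."""
    params = {
        "short_name": short_name,
        "version": version,
        "provider": provider,
        # CMR temporal ranges are "start,end" in ISO 8601.
        # Assumption: the end date is inclusive through 23:59:59Z.
        "temporal": f"{start_date}T00:00:00Z,{end_date}T23:59:59Z",
        "page_size": page_size,
    }
    if bbox is not None:
        # CMR expects lon_min,lat_min,lon_max,lat_max.
        params["bounding_box"] = ",".join(str(v) for v in bbox)
    return params


params = cmr_granule_params("2024-01-01", "2024-06-01",
                            bbox=(-70.0, -80.0, -50.0, -72.0))
```

A dict like this could be passed as query parameters to an HTTP client. Result sets larger than one page need paging, for example via CMR's Search-After header.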

Catalog Builder

zagg.catalog.build_catalog

build_catalog(
    granules: List[dict], parent_order: int, polygon_parts: list = None
) -> tuple

Build a granule catalog using morton_coverage for cell discovery and shapely STRtree for granule-to-cell intersection.

Parameters:

  • granules (list) –

    List of granule metadata from CMR

  • parent_order (int) –

    Morton order for parent cells (e.g., 6)

  • polygon_parts (list of (lats, lons), default: None ) –

    Polygon parts for cell discovery. Defaults to Antarctic drainage basins.

Returns:

  • catalog ( dict ) –

    Mapping of parent_morton (int) -> list of S3 URLs

  • timings ( dict ) –

    Wall-clock seconds for each pipeline step
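
The shape of the two return values can be illustrated with a small sketch of the final assembly step only. It assumes the intersection step has already produced (parent cell, granule index) pairs; the function name assemble_catalog, the "assemble" timing key, and the S3 URLs are placeholders for this example and do not come from zagg.

```python
import time
from collections import defaultdict


def assemble_catalog(pairs, s3_urls):
    """Group granule URLs by parent cell and time the step."""
    timings = {}
    t0 = time.perf_counter()
    catalog = defaultdict(list)
    for cell, granule_idx in pairs:
        # Keys are plain ints so the mapping serializes cleanly.
        catalog[int(cell)].append(s3_urls[granule_idx])
    timings["assemble"] = time.perf_counter() - t0
    return dict(catalog), timings


# Placeholder inputs: two granules, three cell/granule intersections.
urls = ["s3://example-bucket/a.h5", "s3://example-bucket/b.h5"]
pairs = [(101, 0), (101, 1), (205, 1)]
catalog, timings = assemble_catalog(pairs, urls)
```

Each worker can then look up its own parent cell in the catalog instead of querying CMR.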

Granule Parsing

zagg.catalog.extract_granule_info

extract_granule_info(granule: dict) -> dict

Extract S3 URL and geometry points from a CMR granule.

Parameters:

  • granule (dict) –

    UMM-JSON granule from CMR

Returns:

  • dict

    Keys: granule_id, s3_url, points (list of (lat, lon) tuples)
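
A hedged sketch of this kind of parsing, assuming the standard UMM-G layout (RelatedUrls for links, SpatialExtent.HorizontalSpatialDomain.Geometry.GPolygons for footprints). The name parse_umm_granule, the choice of GranuleUR as the granule_id, and selecting the first URL that starts with s3:// are all assumptions; extract_granule_info may pick these fields differently.

```python
def parse_umm_granule(granule: dict) -> dict:
    """Pull an S3 URL and footprint points out of a UMM-JSON granule record."""
    umm = granule["umm"]

    # Assumption: the first related URL using the s3:// scheme is the data file.
    s3_url = next(
        (u["URL"] for u in umm.get("RelatedUrls", [])
         if u.get("URL", "").startswith("s3://")),
        None,
    )

    geometry = (umm.get("SpatialExtent", {})
                   .get("HorizontalSpatialDomain", {})
                   .get("Geometry", {}))
    points = [
        (pt["Latitude"], pt["Longitude"])
        for gpoly in geometry.get("GPolygons", [])
        for pt in gpoly["Boundary"]["Points"]
    ]
    return {"granule_id": umm.get("GranuleUR"), "s3_url": s3_url, "points": points}


# Made-up record with placeholder values, shaped like a UMM-G granule.
sample = {
    "umm": {
        "GranuleUR": "EXAMPLE_GRANULE.h5",
        "RelatedUrls": [
            {"URL": "https://example.invalid/EXAMPLE_GRANULE.h5", "Type": "GET DATA"},
            {"URL": "s3://example-bucket/EXAMPLE_GRANULE.h5",
             "Type": "GET DATA VIA DIRECT ACCESS"},
        ],
        "SpatialExtent": {"HorizontalSpatialDomain": {"Geometry": {"GPolygons": [
            {"Boundary": {"Points": [
                {"Longitude": -70.0, "Latitude": -80.0},
                {"Longitude": -60.0, "Latitude": -75.0},
            ]}}
        ]}}},
    }
}
info = parse_umm_granule(sample)
```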