Catalog¶
Catalog construction has two separable concerns:
- Fetch: query a STAC endpoint (CMR-STAC) for what / when / where → a Catalog (a stac-geoparquet table of granule metadata, reusable across many grids).
- Shard map: take a Catalog plus an output grid → a ShardMap, the work-distribution manifest mapping shard keys to granules.
The CLI chains them, building the output grid from the same pipeline config
the aggregator uses, so a shard map can never be built against a different
grid than the run (enforced at run time via grid.signature()).
Building a shard map (CLI)¶
# HEALPix grid from atl06.yaml, an ICESat-2 cycle, Antarctic polygon:
python -m zagg.catalog --config atl06.yaml --short-name ATL06 --cycle 22 \
    --polygon antarctica.geojson

# Rectilinear (UTM) grid from a config, explicit dates, over a bbox:
python -m zagg.catalog --config serc_atl03.yaml --short-name ATL03 \
    --start-date 2025-01-01 --end-date 2025-12-31 \
    --bbox=-76.62107,38.84504,-76.50583,38.93512

# Persist the fetched Catalog too (reusable for other grids):
python -m zagg.catalog --config atl06.yaml --short-name ATL06 --cycle 22 \
    --polygon antarctica.geojson --catalog-out cycle22.parquet
--polygon drives both the CMR query bbox and the coverage mask; --bbox
gives the query box directly (coverage falls back to that rectangle). The
geometry backend (--backend) defaults to auto: exact-S2 spherely if the
spherely fork (with SpatialIndex) is installed separately, else mortie
(HEALPix) / shapely (rectilinear).
Endpoint selection (S3 vs HTTPS) is not made here — each granule record
keeps both hrefs, and the aggregator picks one at run time via
data_source.driver.
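The deferred endpoint choice can be pictured with a small sketch. The helper name `pick_href` and the driver strings `"s3"` / `"https"` are assumptions made for illustration; the text above only states that each record keeps both hrefs and that `data_source.driver` decides at run time.

```python
from typing import Optional


def pick_href(granule: dict, driver: str) -> str:
    """Return the href matching `driver` from an endpoint-neutral record.

    Hypothetical helper: the driver names "s3" and "https" are assumed,
    not taken from zagg itself.
    """
    if driver not in ("s3", "https"):
        raise ValueError(f"unknown driver: {driver!r}")
    href: Optional[str] = granule.get(driver)
    if href is None:
        # Either href may be missing from a granule record.
        raise LookupError(f"granule {granule.get('id')!r} has no {driver} href")
    return href


# Made-up record; the id and URLs are placeholders.
granule = {
    "id": "example-granule",
    "s3": "s3://example-bucket/example-granule.h5",
    "https": "https://example.org/example-granule.h5",
}
print(pick_href(granule, "s3"))
print(pick_href(granule, "https"))
```

Because the record carries both hrefs, the same shard map can serve an in-region S3 run and an out-of-region HTTPS run without being rebuilt.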
Fetch¶
zagg.catalog.sources.Query dataclass ¶
A spatiotemporal metadata query: what, when, where.
Parameters:
- short_name (str) – Product short name (e.g. "ATL03").
- version (str) – Product version (e.g. "007").
- start_date (str) – Inclusive date bounds, YYYY-MM-DD.
- end_date (str) – Inclusive date bounds, YYYY-MM-DD.
- region (tuple or str) – Either a (lon_min, lat_min, lon_max, lat_max) bbox or a path to a GeoJSON file (its bounding box is used for the STAC query).
- provider (str, default: 'NSIDC_CPRD') – CMR provider / STAC sub-catalog.
zagg.catalog.sources.CMRSource ¶
Fetch granule metadata from NASA's CMR-STAC endpoint.
Parameters:
- provider (str, default: None) – Overrides the query provider for the STAC sub-catalog URL.
- timeout (int, default: 60) – Per-request timeout in seconds.
fetch ¶
Run query against CMR-STAC and return a Catalog.
Parameters:
- query (Query) – What/when/where to fetch.
- preserve_thumbnails (bool, default: False) – Keep thumbnail_*/browse assets (default drops them).
- limit (int, default: 2000) – Page size hint; CMR clamps it and paging follows rel=next.
Returns:
- Catalog
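The `rel=next` paging mentioned for `limit` can be sketched as follows. This is not CMRSource's code: `iter_items` and `next_href` are hypothetical names, and the HTTP call is replaced by an injected `get_page` callable so the loop runs without a network. The page shape (`features` plus a `links` array) follows the usual STAC item-collection layout.

```python
from typing import Callable, Iterator, Optional


def next_href(page: dict) -> Optional[str]:
    """Return the href of the page's rel=next link, or None on the last page."""
    for link in page.get("links", []):
        if link.get("rel") == "next":
            return link.get("href")
    return None


def iter_items(first_url: str, get_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every item, following rel=next links until none remain."""
    url: Optional[str] = first_url
    while url is not None:
        page = get_page(url)
        yield from page.get("features", [])
        url = next_href(page)


# Stand-in for HTTP: two fake pages keyed by URL (made-up URLs and ids).
FAKE_PAGES = {
    "page-1": {
        "features": [{"id": "a"}, {"id": "b"}],
        "links": [{"rel": "next", "href": "page-2"}],
    },
    "page-2": {"features": [{"id": "c"}], "links": []},
}

item_ids = [item["id"] for item in iter_items("page-1", FAKE_PAGES.__getitem__)]
print(item_ids)
```

Since the server may clamp the requested page size, the loop never assumes how many items a page holds; it stops only when no `next` link is present.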
zagg.catalog.sources.Catalog dataclass ¶
Fetched granule metadata: a stac-geoparquet table + provenance.
Reusable across many ShardMap builds. Endpoint-neutral -- each granule
carries both its S3 and HTTPS .h5 hrefs.
Parameters:
- table (Table) – stac-geoparquet table (one row per granule).
- metadata (dict, default: dict()) – Query provenance (product, version, bbox, dates, ...).
from_geoparquet classmethod ¶
from_geoparquet(path: str) -> 'Catalog'
Load a catalog from a stac-geoparquet file (CMR or user-supplied).
granule_records ¶
Decode the table into per-granule dicts for ShardMap building.
Returns:
- list of dict – Each: {"id", "s3", "https", "lats", "lons"} where lats/lons are the footprint exterior-ring coordinate arrays (WGS84) and s3/https are the data-asset hrefs (either may be None).
Shard map¶
zagg.catalog.shardmap.ShardMap dataclass ¶
Work-distribution manifest: shard key -> granules, tied to one grid.
Parameters:
- grid_signature (dict) – grid.spatial_signature() at build time -- the spatial layout only (#89). The runner checks it against the run grid's spatial signature so a map can't be paired with a mismatched spatial grid, while staying reusable across configs that differ only in aggregation fields. (Kept as grid_signature for back-compat; old maps carry the full signature and still validate via a spatial-subset projection.)
- shard_keys (list of int) – Sorted shard keys with at least one granule.
- granules (list of list of dict) – Parallel to shard_keys. Each granule is {"id", "s3", "https"} (option C -- self-contained, endpoint-neutral).
- metadata (dict, default: dict()) – Provenance copied from the Catalog plus backend/timing info.
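The parallel-list layout and the signature check can be illustrated with a stand-in class. `ShardMapSketch`, its two methods, and the signature keys (`"kind"`, `"agg"`) are hypothetical; only the four field names come from the parameter list above. The `matches` method shows one way a spatial-subset projection could let an older map that stores a fuller signature still validate.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class ShardMapSketch:
    """Hypothetical stand-in mirroring the documented ShardMap fields."""

    grid_signature: dict
    shard_keys: list  # sorted ints
    granules: list    # granules[i] belongs to shard_keys[i]
    metadata: dict = field(default_factory=dict)

    def granules_for(self, key: int) -> list:
        """Binary-search the sorted keys; unknown shards have no granules."""
        i = bisect_left(self.shard_keys, key)
        if i < len(self.shard_keys) and self.shard_keys[i] == key:
            return self.granules[i]
        return []

    def matches(self, run_signature: dict) -> bool:
        """Compare only the keys present in the run grid's spatial signature."""
        projected = {k: self.grid_signature.get(k) for k in run_signature}
        return projected == run_signature


def _granule(gid: str) -> dict:
    # Placeholder hrefs; real records carry the data-asset URLs.
    return {"id": gid, "s3": f"s3://example/{gid}.h5",
            "https": f"https://example.org/{gid}.h5"}


shard_map = ShardMapSketch(
    # An "old-style" full signature: spatial keys plus an aggregation key.
    grid_signature={"kind": "healpix", "parent_order": 6, "agg": "mean"},
    shard_keys=[3, 17, 42],
    granules=[[_granule("g1")], [_granule("g1"), _granule("g2")], [_granule("g3")]],
)

print([g["id"] for g in shard_map.granules_for(17)])
print(shard_map.granules_for(5))
print(shard_map.matches({"kind": "healpix", "parent_order": 6}))
print(shard_map.matches({"kind": "healpix", "parent_order": 7}))
```

A granule may appear under several shard keys (here `g1`), which is why each entry is self-contained rather than a reference into a shared table.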
build classmethod ¶
build(
    catalog,
    grid,
    *,
    region=None,
    backend: str = "auto",
    mortie_order: int | None = None,
    footprint: str = "swath",
) -> "ShardMap"
Build a ShardMap from a Catalog and an output grid.
Parameters:
- catalog (Catalog) – Fetched granule metadata (provides granule_records()).
- grid (OutputGrid) – Output grid (provides coverage, shard_footprint, spatial_signature).
- region (list of (lats, lons), default: None) – Coverage mask in WGS84. Defaults to the catalog bbox rectangle.
- backend (('auto', 'spherely', 'mortie'), default: "auto") – Geometry backend. "auto" -> spherely when importable, else mortie for HEALPix grids (non-HEALPix grids require spherely and raise an ImportError with an install pointer when it is absent).
- mortie_order (int, default: None) – MOC order for the mortie backend. None (default) pins it to the grid's inner-chunk order grid.chunk_order (the chunk_inner order, defaulting to parent_order when unset), clamped to mortie's order-18 coverage cap -- the dispatch chunk's own resolution, enough to keep moc_to_order from upsampling a footprint onto neighbor shards (#92) at near-minimal compute. Raises if the resolved order is coarser than parent_order.
- footprint (('swath', 'beams'), default: "swath") – Granule footprint used for intersection. "swath" (default) uses the raw CMR polygon. "beams" decomposes ICESat-2 ATL03/06 swaths into per-beam-pair corridors so granules stop being assigned to shards their beams never cross (issue #65); non-beam products fall back to the swath ring. Deprecated: the "beams" corridor mechanism is a stopgap (see beams.py); remove it once native per-beam CMR geometry, the memory-handling robustness in #66, or data virtualization (#97) lands.
Returns:
- ShardMap
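The `mortie_order` default described above amounts to a short resolution rule, sketched here under stated assumptions. `resolve_mortie_order` is a hypothetical function, not part of zagg; it takes `parent_order` and `chunk_inner` as plain integers instead of reading them from a grid, and it assumes an explicitly supplied order is used as given (the description only specifies clamping for the default).

```python
from typing import Optional

MORTIE_MAX_ORDER = 18  # coverage cap quoted in the parameter description


def resolve_mortie_order(
    mortie_order: Optional[int],
    parent_order: int,
    chunk_inner: Optional[int] = None,
) -> int:
    """Resolve the MOC order following the documented default rule."""
    if mortie_order is None:
        # chunk_order is the chunk_inner order, or parent_order when unset.
        chunk_order = parent_order if chunk_inner is None else chunk_inner
        resolved = min(chunk_order, MORTIE_MAX_ORDER)
    else:
        # Assumption: an explicit value is taken as given.
        resolved = mortie_order
    if resolved < parent_order:
        # A lower HEALPix order means coarser cells than the shards themselves.
        raise ValueError(
            f"mortie order {resolved} is coarser than parent_order {parent_order}"
        )
    return resolved


print(resolve_mortie_order(None, parent_order=6, chunk_inner=10))
print(resolve_mortie_order(None, parent_order=6))
print(resolve_mortie_order(None, parent_order=6, chunk_inner=20))
```

The three calls cover the default paths: an explicit inner-chunk order, the fallback to `parent_order`, and the clamp at the order-18 cap.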
Convenience¶
zagg.catalog.make_shardmap ¶
make_shardmap(
    query,
    grid,
    *,
    region=None,
    backend="auto",
    catalog_out=None,
    footprint="swath",
)
Fetch a Catalog and build a ShardMap in one call (concerns 1+2 chained).
Parameters:
- query (Query) – What/when/where to fetch.
- grid (OutputGrid) – Output grid (typically from_config(config)).
- region (list of (lats, lons), default: None) – Coverage mask. Defaults to the query bbox rectangle.
- backend (str, default: 'auto') – Geometry backend for the shard map.
- catalog_out (str, default: None) – If given, persist the fetched Catalog to this geoparquet path.
- footprint (('swath', 'beams'), default: "swath") – Granule footprint for intersection; "beams" tightens ICESat-2 ATL03/06 assignment to per-beam-pair corridors (issue #65). Deprecated: the "beams" corridor mechanism is a stopgap. Remove it once a better fix lands -- native per-beam CMR geometry, the memory-handling robustness in #66, or data virtualization tracked in #97.
Returns:
- ShardMap
Temporal / spatial helpers¶
zagg.catalog.cycle_to_dates ¶
Convert an ICESat-2 repeat cycle number to a (start, end) date range.
Parameters:
- cycle (int) – ICESat-2 cycle number (1-based).
Returns:
- tuple of (datetime, datetime)
zagg.catalog.load_polygon ¶
Load polygon(s) from a GeoJSON file as (lats, lons) parts.
Supports Feature, FeatureCollection, Polygon, and MultiPolygon geometries.
Parameters:
- geojson_path (str) – Path to a GeoJSON file.
Returns:
- list of (lats, lons) – One coordinate-array pair per polygon ring (WGS84).
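To make the (lats, lons) parts format concrete, here is a standard-library sketch of the same idea. It is not zagg's implementation: `parts_from_geojson` and `load_polygon_sketch` are hypothetical names, plain lists stand in for coordinate arrays, and the sketch assumes only each polygon's exterior ring is wanted (holes are ignored). GeoJSON stores positions as [lon, lat], so the two axes are split and swapped into the (lats, lons) order.

```python
import json


def _geometries(obj: dict):
    """Yield bare geometry dicts from a FeatureCollection, Feature, or geometry."""
    kind = obj.get("type")
    if kind == "FeatureCollection":
        for feature in obj.get("features", []):
            yield from _geometries(feature)
    elif kind == "Feature":
        if obj.get("geometry"):
            yield obj["geometry"]
    else:
        yield obj


def parts_from_geojson(obj: dict) -> list:
    """Return [(lats, lons), ...], one pair per polygon exterior ring."""
    parts = []
    for geom in _geometries(obj):
        if geom["type"] == "Polygon":
            polygons = [geom["coordinates"]]
        elif geom["type"] == "MultiPolygon":
            polygons = geom["coordinates"]
        else:
            raise ValueError(f"unsupported geometry type: {geom['type']!r}")
        for rings in polygons:
            exterior = rings[0]  # assumption: interior rings are skipped
            lons = [pt[0] for pt in exterior]
            lats = [pt[1] for pt in exterior]
            parts.append((lats, lons))
    return parts


def load_polygon_sketch(geojson_path: str) -> list:
    """Read a GeoJSON file and return its polygon parts."""
    with open(geojson_path, encoding="utf-8") as fh:
        return parts_from_geojson(json.load(fh))


# A made-up square wrapped in a Feature.
square = {
    "type": "Feature",
    "properties": {},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[0, 0], [2, 0], [2, 1], [0, 1], [0, 0]]],
    },
}
print(parts_from_geojson(square))
```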
zagg.catalog.polygon_to_bbox ¶
Compute a (lon_min, lat_min, lon_max, lat_max) bbox from polygon parts.
Parameters:
- parts (list of (lats, lons))
Returns:
- tuple of (lon_min, lat_min, lon_max, lat_max)
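The bbox reduction over polygon parts is simple enough to sketch directly. `polygon_to_bbox_sketch` is a hypothetical re-implementation for illustration, using plain lists instead of arrays; it assumes the region does not cross the antimeridian, a case this naive min/max would get wrong.

```python
def polygon_to_bbox_sketch(parts):
    """Return (lon_min, lat_min, lon_max, lat_max) covering every part.

    Each part is a (lats, lons) pair of equal-length coordinate sequences.
    """
    if not parts:
        raise ValueError("at least one polygon part is required")
    lat_min = min(min(lats) for lats, _ in parts)
    lat_max = max(max(lats) for lats, _ in parts)
    lon_min = min(min(lons) for _, lons in parts)
    lon_max = max(max(lons) for _, lons in parts)
    return (lon_min, lat_min, lon_max, lat_max)


# Two made-up rectangles given as (lats, lons) parts.
parts = [
    ([10.0, 10.0, 12.0, 12.0], [-5.0, -3.0, -3.0, -5.0]),
    ([11.0, 11.0, 15.0, 15.0], [-4.0, 1.0, 1.0, -4.0]),
]
print(polygon_to_bbox_sketch(parts))
```

Note the ordering flip: parts are (lats, lons), while the returned bbox is longitude-first, matching the `--bbox` and `region` tuple layout.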