AWS Lambda
AWS Lambda function for processing ICESat-2 ATL06 data by morton cell.
Overview
The Lambda function processes a single morton cell (order 6) by:
- Reading HDF5 files directly from S3 using h5coro (no downloads)
- Filtering data spatially using morton indexing
- Calculating summary statistics for child cells (order 12)
- Writing xdggs-enabled Zarr to S3
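The filter-then-aggregate steps above can be sketched in plain Python. This is not zagg's processing code. It assumes a decimal morton encoding in which a leading base-cell digit is followed by one digit (1–4) per order, which is what the example shard key -6134114 used later on this page suggests (one base digit plus six order digits). Under that assumption, a point's order-6 parent is its order-12 index with the last six digits dropped. The helpers parent_of and summarize_shard are hypothetical, and the statistics (count, mean, population standard deviation) are illustrative rather than zagg's actual output variables.

```python
from collections import defaultdict
from statistics import mean, pstdev


def parent_of(morton: int, child_order: int, parent_order: int) -> int:
    """Drop trailing order digits to get the ancestor index (assumed decimal encoding)."""
    drop = child_order - parent_order
    sign = -1 if morton < 0 else 1
    return sign * (abs(morton) // 10**drop)


def summarize_shard(points, shard_key: int, parent_order: int = 6, child_order: int = 12) -> dict:
    """Keep points inside the shard, then compute statistics per child cell.

    points: iterable of (child_morton, height) pairs indexed at child_order.
    """
    groups = defaultdict(list)
    for child, height in points:
        # Spatial filter: keep only points whose parent cell is this shard
        if parent_of(child, child_order, parent_order) == shard_key:
            groups[child].append(height)
    return {
        child: {"count": len(h), "mean": mean(h), "std": pstdev(h)}
        for child, h in groups.items()
    }


points = [
    (-6134114111111, 10.0),
    (-6134114111111, 12.0),
    (-6134114222222, 5.0),
    (-6134113111111, 99.0),  # different parent cell: filtered out
]
stats = summarize_shard(points, shard_key=-6134114)
print(stats)
```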
Architecture
┌─────────────────────────────────────────────────────────────┐
│ Lambda Function (process-shard) │
│ ────────────────────────────────────────────────────────── │
│ Runtime: Python 3.12 │
│ Memory: 2048 MB (2 GB) │
│ Timeout: 720s (12 minutes) │
│ ────────────────────────────────────────────────────────── │
│ Code (~5 MB): │
│ - deployment/aws/lambda_handler.py (AWS wrapper) │
│ - src/zagg/ package (processing, auth, catalog) │
│ ────────────────────────────────────────────────────────── │
│ Layer (~70 MB compressed, ~240 MB uncompressed): │
│ - numpy, pandas, h5coro, mortie, pyproj, odc-geo │
│ - fastparquet, cramjam, shapely, astropy, earthaccess │
│ - pydantic-zarr, zarr, obstore, pyarrow │
└─────────────────────────────────────────────────────────────┘
Files
| File | Purpose |
|---|---|
| deployment/aws/lambda_handler.py | AWS Lambda wrapper function |
| src/zagg/processing.py | Cloud-agnostic core processing logic |
| src/zagg/auth.py | NASA Earthdata authentication helper |
| src/zagg/catalog/ | CMR/STAC shard-map (granule catalog) builder (python -m zagg.catalog) |
| deployment/aws/invoke_lambda.py | Orchestration script |
| deployment/aws/build_layer.sh | Lambda layer build script (x86_64/arm64) |
Event Payload
{
"shard_key": 123456,
"parent_order": 6,
"child_order": 12,
"granule_urls": [
"s3://nsidc-cumulus-prod-protected/ATLAS/ATL06/007/2023/12/18/...",
"s3://nsidc-cumulus-prod-protected/ATLAS/ATL06/007/2023/12/19/..."
],
"store_path": "s3://your-output-bucket/atl06/production.zarr",
"s3_credentials": {
"accessKeyId": "ASIA...",
"secretAccessKey": "...",
"sessionToken": "..."
},
"output_credentials": {
"accessKeyId": "ASIA...",
"secretAccessKey": "...",
"sessionToken": "...",
"endpointUrl": "https://...",
"region": "us-west-2"
}
}
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| shard_key | int | Yes | Grid-agnostic shard identifier (for HEALPix, the parent-cell morton index) |
| parent_order | int or null | Yes | Order of the parent cell (typically 6); HEALPix-only, null for other grids |
| child_order | int | HEALPix only | Order of the child cells used for statistics (typically 12); omitted for non-HEALPix grids |
| granule_urls | list | Yes | Pre-computed list of S3 URLs from the catalog |
| store_path | str | Yes | Output Zarr store path (e.g. s3://bucket/prefix.zarr) |
| s3_credentials | dict | Yes | NSIDC S3 credentials for reading source data |
| output_credentials | dict | No | Explicit credentials for writing the output store. Omit to use the execution role (in-account writes); supply to write to an external or S3-compatible target. Keys: accessKeyId, secretAccessKey, and optionally sessionToken, endpointUrl, region. |
Grid-neutral event fields
The unit of work is a shard: for HEALPix, one parent (order-6) cell. The
orchestrator and the catalog use that vocabulary (python -m zagg.catalog
emits a shard map with shard_keys and a grid_signature). The Lambda
event schema uses the grid-neutral field name shard_key (the shard
identifier for any grid; for HEALPix it is the parent-cell morton index).
parent_order and child_order are HEALPix-specific: parent_order is
forwarded for every grid (null for non-HEALPix), while child_order is
required, and sent, only for HEALPix runs. See deployment/aws/lambda_handler.py.
This rename landed via #24.
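A pre-flight check of an event against the parameter table and the grid-neutral rules might look like the following. validate_event is a hypothetical helper, not part of zagg or the handler; it checks only key presence and basic types, and it treats a non-null parent_order as marking a HEALPix run.

```python
REQUIRED = {"shard_key": int, "granule_urls": list, "store_path": str, "s3_credentials": dict}
OUTPUT_REQUIRED = {"accessKeyId", "secretAccessKey"}
OUTPUT_OPTIONAL = {"sessionToken", "endpointUrl", "region"}


def validate_event(event: dict) -> list[str]:
    """Return a list of problems; an empty list means the event looks well-formed."""
    problems = []
    for key, expected_type in REQUIRED.items():
        if not isinstance(event.get(key), expected_type):
            problems.append(f"{key}: expected {expected_type.__name__}")
    if "parent_order" not in event:
        problems.append("parent_order: missing (send null for non-HEALPix grids)")
    elif event["parent_order"] is not None and not isinstance(event.get("child_order"), int):
        # A non-null parent_order marks a HEALPix run, which also needs child_order
        problems.append("child_order: required for HEALPix runs")
    out = event.get("output_credentials")
    if out is not None:  # optional: omitting it means "write with the execution role"
        if not isinstance(out, dict):
            problems.append("output_credentials: expected dict")
        else:
            missing = OUTPUT_REQUIRED - set(out)
            unknown = set(out) - OUTPUT_REQUIRED - OUTPUT_OPTIONAL
            if missing:
                problems.append(f"output_credentials: missing {sorted(missing)}")
            if unknown:
                problems.append(f"output_credentials: unexpected {sorted(unknown)}")
    return problems


healpix_event = {
    "shard_key": -6134114,
    "parent_order": 6,
    "child_order": 12,
    "granule_urls": ["s3://example-bucket/granule.h5"],
    "store_path": "s3://your-output-bucket/atl06/production.zarr",
    "s3_credentials": {"accessKeyId": "ASIA...", "secretAccessKey": "...", "sessionToken": "..."},
}
print(validate_event(healpix_event))  # []
print(validate_event({**healpix_event, "child_order": None}))  # ['child_order: required for HEALPix runs']
```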
S3 Credentials
Credentials are obtained by the orchestrator once before invoking Lambda functions:
from zagg.auth import get_nsidc_s3_credentials
# Get credentials (valid for ~1 hour)
s3_creds = get_nsidc_s3_credentials()
# Pass to each Lambda invocation
event = {
"shard_key": -6134114,
"parent_order": 6,
"child_order": 12,
"granule_urls": [...],
"store_path": "s3://output-bucket/atl06/production.zarr",
"s3_credentials": s3_creds,
}
Fetching credentials once avoids the rate limiting that 1,872 simultaneous NASA Earthdata logins (one per invocation) would trigger, and removes the need for an AWS Secrets Manager dependency.
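The fetch-once pattern can be sketched as follows: a single credentials dict is shared by every event, and each shard gets one synchronous invoke through boto3's Lambda client (invoke with InvocationType="RequestResponse"). fan_out, invoke_one and FakeLambdaClient are hypothetical names, the shard map contents are made up, and this is not deployment/aws/invoke_lambda.py. The fake client stands in for boto3.client("lambda") so the sketch runs offline.

```python
import io
import json
from concurrent.futures import ThreadPoolExecutor


def invoke_one(client, event: dict, function_name: str = "process-shard") -> dict:
    """One synchronous invoke; boto3 returns the JSON result as a stream in resp["Payload"]."""
    resp = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        Payload=json.dumps(event).encode(),
    )
    return json.loads(resp["Payload"].read())


def fan_out(client, shard_map: dict, store_path: str, s3_creds: dict, max_workers: int = 8) -> list:
    """Build one event per shard, all reusing the same credentials, and invoke them concurrently."""
    events = [
        {
            "shard_key": shard_key,
            "parent_order": 6,
            "child_order": 12,
            "granule_urls": urls,
            "store_path": store_path,
            "s3_credentials": s3_creds,  # fetched once, shared by every invocation
        }
        for shard_key, urls in shard_map.items()
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with shard_map
        return list(pool.map(lambda event: invoke_one(client, event), events))


class FakeLambdaClient:
    """Offline stand-in for boto3.client("lambda") that echoes each shard key back."""

    def __init__(self):
        self.session_tokens = set()

    def invoke(self, FunctionName, InvocationType, Payload):
        event = json.loads(Payload)
        self.session_tokens.add(event["s3_credentials"]["sessionToken"])
        body = json.dumps({"shard_key": event["shard_key"]}).encode()
        return {"Payload": io.BytesIO(body)}


# Real use (not run here):
#   import boto3
#   from zagg.auth import get_nsidc_s3_credentials
#   results = fan_out(boto3.client("lambda"), shard_map, store_path, get_nsidc_s3_credentials())
client = FakeLambdaClient()
creds = {"accessKeyId": "ASIA...", "secretAccessKey": "...", "sessionToken": "token-1"}
shard_map = {-6134114: ["s3://example-bucket/a.h5"], -6134113: ["s3://example-bucket/b.h5"]}
results = fan_out(client, shard_map, "s3://output-bucket/atl06/production.zarr", creds)
print(results)  # [{'shard_key': -6134114}, {'shard_key': -6134113}]
```

Every invocation in the fake run sees the same session token, which is the point: one Earthdata login serves the whole fan-out for as long as the credentials remain valid.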
Output Credentials (external write targets)
By default the function writes the output store using its execution role,
against the in-account bucket; omit output_credentials entirely to keep this
behavior. To write to an external or S3-compatible target (another AWS account,
or a host such as source.coop) without changing the execution role, supply
output_credentials in the event. This mirrors how s3_credentials injects
read credentials:
from zagg import load_config, agg
config = load_config("atl06.yaml")  # assumed: load_config takes the config file path
results = agg(
config, catalog="catalog.json", backend="lambda",
store="s3://us-west-2.opendata.source.coop/org/dataset.zarr",
output_credentials={ # runtime-only; never store in config/YAML
"accessKeyId": "ASIA...",
"secretAccessKey": "...",
"sessionToken": "...", # optional
# "endpointUrl": "https://...", # optional: R2/MinIO etc.
# "region": "us-west-2", # optional
},
)
From the CLI, point --output-creds at a JSON file holding that dict (this keeps
secrets out of your shell history):
python -m zagg --config atl06.yaml --catalog catalog.json --backend lambda \
--store s3://us-west-2.opendata.source.coop/org/dataset.zarr \
--output-creds /path/to/output-creds.json
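One way to produce the file that --output-creds expects is a short script that writes the output_credentials keys to JSON and makes the file readable only by its owner. write_output_creds is a hypothetical helper and the 0600 mode is a choice made here, not something zagg requires; in practice the values would come from whatever issues your STS credentials, not from literals.

```python
import json
import os
import tempfile


def write_output_creds(path, access_key_id, secret_access_key,
                       session_token=None, endpoint_url=None, region=None) -> dict:
    """Write an output_credentials-shaped JSON file that only the owner can read."""
    creds = {"accessKeyId": access_key_id, "secretAccessKey": secret_access_key}
    optional = {"sessionToken": session_token, "endpointUrl": endpoint_url, "region": region}
    creds.update({key: value for key, value in optional.items() if value is not None})
    # New files are created owner-only (0o600)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        json.dump(creds, fh)
    os.chmod(path, 0o600)  # also tightens a file that already existed with a looser mode
    return creds


path = os.path.join(tempfile.mkdtemp(), "output-creds.json")
creds = write_output_creds(path, "ASIA...", "...", session_token="...")
print(sorted(creds))  # ['accessKeyId', 'secretAccessKey', 'sessionToken']
```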
The non-secret endpoint_url and region settings may also be set in the
output: section of the config (and overridden at runtime); credentials are
runtime-only. source.coop uses the standard AWS S3 endpoint with injected STS
credentials, so endpointUrl is only needed for non-AWS S3-compatible stores.
Dotted bucket names (e.g. us-west-2.opendata.source.coop) and custom endpoints
use path-style addressing automatically.
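Path-style addressing matters for dotted buckets because virtual-hosted addressing turns the bucket name into part of the hostname: us-west-2.opendata.source.coop would become a multi-label subdomain that the *.s3.<region>.amazonaws.com wildcard TLS certificate does not cover, since a wildcard matches a single label. The helper below only illustrates the selection rule stated above; object_base_url is hypothetical, is not how zagg configures its store internally, and ignores details such as legacy global endpoints.

```python
from typing import Optional
from urllib.parse import urlparse


def object_base_url(store: str, region: str = "us-west-2", endpoint_url: Optional[str] = None) -> str:
    """Return the HTTPS URL for an s3:// store, choosing path-style when required."""
    parsed = urlparse(store)
    bucket, prefix = parsed.netloc, parsed.path.lstrip("/")
    if endpoint_url or "." in bucket:
        # Path-style: the bucket goes in the URL path, so TLS only has to match the endpoint host
        endpoint = (endpoint_url or f"https://s3.{region}.amazonaws.com").rstrip("/")
        return f"{endpoint}/{bucket}/{prefix}"
    # Virtual-hosted style: the bucket becomes a single subdomain label of the AWS endpoint
    return f"https://{bucket}.s3.{region}.amazonaws.com/{prefix}"


print(object_base_url("s3://us-west-2.opendata.source.coop/org/dataset.zarr"))
# https://s3.us-west-2.amazonaws.com/us-west-2.opendata.source.coop/org/dataset.zarr
```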
Deployment
Recommended: CloudFormation standup
The recommended way to stand up the backend in a fresh AWS account is the
committed CloudFormation template, driven by stand_up.sh, which creates the
execution role, dependency layer, and function in one stack:
OUTPUT_BUCKET=my-results-bucket bash deployment/aws/stand_up.sh
See Standing Up the Backend for the full walkthrough: what the
script does, the parameter/environment-variable reference, cross-region staging,
and teardown. By default (CreateExecutionRole=true) the stack also creates the
IAM execution role for you. The one exception is an account whose deploy
identity cannot create IAM roles (e.g. an AWS SSO "power user" permission set);
for that IAM-constrained path, which is legacy and unverified, see
Execution Role.
Legacy / manual deploy
Not the recommended path
The steps below hand-assemble the function zip and create or update the Lambda
with raw aws lambda calls. They are kept to show what the template builds and
for one-off tweaks, but the CloudFormation standup above is the preferred,
reproducible way to deploy. The maintainer script deployment/aws/deploy.sh is
an in-place code updater: it pulls the latest CI artifacts and runs
aws lambda update-function-code, as a convenience over the manual
update-function-code step. It only updates an already-deployed function; it
does not create the role, function, or bucket.
Step 1: Create the function package
cd /path/to/zagg
# Create function.zip with handler and zagg package
zip -j deployment/aws/function.zip deployment/aws/lambda_handler.py && \
cd src && zip -ur ../deployment/aws/function.zip zagg/ -i "*.py" && cd ..
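The handler string lambda_handler.lambda_handler only resolves if lambda_handler.py sits at the archive root with the zagg/ package beside it. The sketch below reproduces that layout with Python's zipfile module, for environments without the zip CLI; build_function_zip is a hypothetical helper. Unlike zip -u, it writes a fresh archive each time, so modules deleted from src/zagg cannot linger in an old function.zip.

```python
import tempfile
import zipfile
from pathlib import Path


def build_function_zip(repo: Path, out: Path) -> list[str]:
    """Write a fresh function.zip with the same layout as the zip commands above."""
    src = repo / "src"
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Like `zip -j`: store the handler at the archive root, without its directories
        zf.write(repo / "deployment" / "aws" / "lambda_handler.py", arcname="lambda_handler.py")
        # Like `cd src && zip -r ... zagg/ -i "*.py"`: only .py files, paths relative to src/
        for py in sorted((src / "zagg").rglob("*.py")):
            zf.write(py, arcname=py.relative_to(src).as_posix())
        return zf.namelist()


# Demo on a throwaway tree shaped like the repository
repo = Path(tempfile.mkdtemp())
for rel in ["deployment/aws/lambda_handler.py", "src/zagg/__init__.py",
            "src/zagg/processing.py", "src/zagg/catalog/__init__.py", "src/zagg/README.md"]:
    path = repo / rel
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("# placeholder\n")

names = build_function_zip(repo, repo / "function.zip")
print(names)
```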
Step 2: Build and deploy the Lambda layer
See ARM64 Layer for building and deploying the Lambda layer.
Step 3: Create the Lambda function
aws lambda create-function \
--function-name process-shard \
--runtime python3.12 \
--architectures arm64 \
--role arn:aws:iam::ACCOUNT_ID:role/lambda-execution-role \
--handler lambda_handler.lambda_handler \
--zip-file fileb://deployment/aws/function.zip \
--timeout 720 \
--memory-size 2048 \
--layers arn:aws:lambda:REGION:ACCOUNT_ID:layer:zagg-layer-arm64:VERSION
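To confirm the created function matches these flags, its settings can be compared against the response of get_function_configuration, a boto3 Lambda client method whose result includes Runtime, Handler, Architectures, MemorySize and Timeout. The helper names are hypothetical and EXPECTED simply restates this step's flags; a CloudFormation-created function may use a different name than process-shard.

```python
EXPECTED = {
    "Runtime": "python3.12",
    "Handler": "lambda_handler.lambda_handler",
    "Architectures": ["arm64"],
    "MemorySize": 2048,
    "Timeout": 720,
}


def config_mismatches(cfg: dict, expected: dict = EXPECTED) -> dict:
    """Map each differing setting to (expected, actual); an empty dict means all match."""
    return {key: (want, cfg.get(key)) for key, want in expected.items() if cfg.get(key) != want}


def check_deployed(function_name: str = "process-shard") -> dict:
    """Fetch the live configuration and compare it (needs AWS credentials)."""
    import boto3  # lazy import: config_mismatches works without boto3 or AWS access

    cfg = boto3.client("lambda").get_function_configuration(FunctionName=function_name)
    return config_mismatches(cfg)


# Offline demo with a hand-written configuration; call check_deployed() against a real account
print(config_mismatches({**EXPECTED, "MemorySize": 1024}))  # {'MemorySize': (2048, 1024)}
```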
Updating function code¶
# Re-create the zip
zip -j deployment/aws/function.zip deployment/aws/lambda_handler.py && \
cd src && zip -ur ../deployment/aws/function.zip zagg/ -i "*.py" && cd ..
# Update the Lambda function
aws lambda update-function-code \
--function-name process-shard \
--zip-file fileb://deployment/aws/function.zip
Testing
# Raise the open-file limit before fanning out: each concurrent worker holds
# one socket to the Lambda endpoint, and the default soft limit (often 256)
# would otherwise cap concurrency. See "Concurrency, workers, and file
# descriptors" below.
ulimit -n 8192
# Build a shard map
uv run python -m zagg.catalog --config atl06.yaml --short-name ATL06 --cycle 22 \
--polygon antarctica.geojson
# Test locally first (no Lambda required)
uv run python -m zagg --config atl06.yaml --catalog catalog.json \
--store ./test.zarr --max-cells 1
# Dry run with the Lambda orchestrator
uv run python deployment/aws/invoke_lambda.py \
--config atl06.yaml --catalog catalog.json --dry-run
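If the orchestrator is launched from a Python script rather than an interactive shell, the ulimit -n 8192 step at the top of these commands can be approximated in-process with the standard library's resource module (POSIX only). raise_nofile_limit is a hypothetical helper, not part of zagg; it only ever raises the soft limit, and never beyond the hard limit, which an unprivileged process cannot exceed.

```python
import resource


def raise_nofile_limit(target: int = 8192) -> int:
    """Raise the open-file soft limit toward `target` (never lowering it); return the result."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft  # already high enough
    # An unprivileged process may raise its soft limit only up to the hard limit
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    except (ValueError, OSError):
        pass  # some platforms reject large values (e.g. macOS per-process caps); keep the old limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


print(raise_nofile_limit())
```

The new limit applies to this process and any children it starts, which is why it has to run before the worker pool opens its sockets.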
Concurrency, workers, and file descriptors
The Lambda backend fans out one synchronous invoke per cell across a thread
pool, and each in-flight worker holds an open socket to the Lambda endpoint.
Two limits bound how many can run at once, and the orchestrator checks both
before dispatch so cells are never silently dropped:
- Open file descriptors (ulimit -n). If concurrent workers exceed the process's open-file soft limit (256 on stock macOS and many Linux shells), invokes fail with OSError: [Errno 24] Too many open files, a client-side failure that AWS never sees. The runner derives a safe ceiling from the soft limit and surfaces errno 24 with actionable guidance instead of a raw connection error. Raise the limit before a large run: ulimit -n 8192.
- Account Lambda concurrency. The runner reads the account ConcurrentExecutions ceiling and current usage (from CloudWatch) and clamps workers to the available headroom (5% padding, floored at 100 free slots), so a run cannot saturate the account pool and throttle itself or other Lambda activity. This degrades gracefully if the dispatch role lacks lambda:GetAccountSettings or cloudwatch:GetMetricStatistics: the runner then bounds workers by the FD limit alone.
Keep --max-workers ≤ min(ulimit -n − headroom, account concurrency). The
orchestrator enforces this automatically; setting ulimit -n higher simply
raises the FD ceiling it can use.
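The rule can be written as a small function. This is a sketch, not the orchestrator's code: the padding follows one reading of "5% padding, floored at 100 free slots" (reserve the larger of 5% of the account limit or 100 slots), fd_headroom=64 is an arbitrary placeholder, and always allowing at least one worker is an assumption; the real constants may differ.

```python
import math
from typing import Optional


def clamp_workers(requested: int, fd_soft_limit: int, account_limit: Optional[int] = None,
                  in_use: int = 0, fd_headroom: int = 64,
                  pad_fraction: float = 0.05, min_pad: int = 100) -> int:
    """Return a worker count within both the FD ceiling and the free account concurrency."""
    ceilings = [requested, fd_soft_limit - fd_headroom]
    if account_limit is not None:  # None models missing GetAccountSettings/CloudWatch access
        # Reserve max(5% of the account limit, 100) slots for other Lambda activity
        pad = max(math.ceil(account_limit * pad_fraction), min_pad)
        ceilings.append(account_limit - in_use - pad)
    return max(1, min(ceilings))  # assumption: always allow at least one worker


print(clamp_workers(2000, 256))  # 192: bounded by the default soft limit
print(clamp_workers(2000, 8192, account_limit=1000, in_use=100))  # 800: bounded by the account
```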
Performance
| Metric | Value |
|---|---|
| Average execution time | 2–3 minutes per cell |
| Maximum execution time | 10 minutes |
| Lambda timeout | 12 minutes (720 s) |
| Configured memory | 2048 MB |
| Typical memory usage | 1–1.5 GB |
| Cold start | 3–5 seconds |
Cost Estimate
Per invocation (180s average, 2 GB memory): ~$0.006
Full run (~1,300 cells at order 6): ~$2 including S3 and CloudWatch costs.
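The per-invocation figure is GB-seconds times the per-GB-second rate, plus a per-request fee. The prices below are assumptions (AWS's published first-tier US list prices; check current pricing for your region). At those rates, the ~$0.006 figure matches x86_64 pricing, while the arm64 build from Step 3 comes out about 20% cheaper. invocation_cost is a hypothetical helper.

```python
X86_PRICE_PER_GB_S = 0.0000166667     # assumed: x86_64 list price per GB-second, first tier
ARM_PRICE_PER_GB_S = 0.0000133334     # assumed: arm64 list price per GB-second, first tier
PRICE_PER_REQUEST = 0.20 / 1_000_000  # assumed: $0.20 per million requests


def invocation_cost(duration_s: float, memory_mb: int, price_per_gb_s: float) -> float:
    """Compute charge for one invocation: GB-seconds times the rate, plus the request fee."""
    gb_seconds = duration_s * memory_mb / 1024
    return gb_seconds * price_per_gb_s + PRICE_PER_REQUEST


x86 = invocation_cost(180, 2048, X86_PRICE_PER_GB_S)  # 360 GB-s
arm = invocation_cost(180, 2048, ARM_PRICE_PER_GB_S)
print(f"x86_64 ${x86:.4f}, arm64 ${arm:.4f} per 180 s invocation")  # x86_64 $0.0060, arm64 $0.0048 ...
```

Multiplied out at the 180 s figure, ~1,300 cells would come to roughly $6 (arm64) to $8 (x86_64) of compute, so the ~$2 full-run estimate presumably reflects many cells finishing far faster (for example cells with no granules); treat both numbers as rough.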
Troubleshooting
Missing s3_credentials
Ensure your orchestrator script calls get_nsidc_s3_credentials and passes the credentials to each Lambda invocation.
No granules found
This is expected for cells outside the data coverage area. The function returns normally (it does not raise), reporting error: "No granules found".
S3 write permission denied
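When output_credentials is omitted, check that the Lambda execution role has s3:PutObject permission for the output bucket; when output_credentials is supplied, those credentials need that permission instead.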
Too many open files
[Errno 24] Too many open files means concurrent workers exceeded the
open-file soft limit, so the affected cells would be dropped. Raise the limit
(ulimit -n 8192) or lower --max-workers. The orchestrator now clamps workers
to the FD and account-concurrency limits automatically; see "Concurrency,
workers, and file descriptors" above.
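If you drive invokes from your own code, HTTP client libraries may wrap the underlying OSError in their own exception types, so recognizing errno 24 means walking the exception chain. is_fd_exhaustion is a hypothetical helper; zagg's own detection logic is not shown on this page.

```python
import errno


def is_fd_exhaustion(exc: BaseException) -> bool:
    """True if exc, or any exception it was raised from, is errno 24 (EMFILE)."""
    seen = set()
    while exc is not None and id(exc) not in seen:  # guard against cyclic chains
        seen.add(id(exc))
        if isinstance(exc, OSError) and exc.errno == errno.EMFILE:
            return True
        exc = exc.__cause__ or exc.__context__
    return False


# Simulate a client library re-raising the socket error under its own type
try:
    try:
        raise OSError(errno.EMFILE, "Too many open files")
    except OSError as inner:
        raise RuntimeError("invoke failed") from inner
except RuntimeError as outer:
    wrapped = outer

print(is_fd_exhaustion(wrapped))  # True
```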