AWS Lambda¶

AWS Lambda function for processing ICESat-2 ATL06 data by morton cell.

Overview¶

The Lambda function processes a single morton cell (order 6) by:

Reading HDF5 files directly from S3 using h5coro (no downloads)
Spatial filtering using morton indexing
Calculating summary statistics for child cells (order 12)
Writing xdggs-enabled Zarr to S3

Architecture¶

┌─────────────────────────────────────────────────────────────┐
│  Lambda Function (process-shard)                      │
│  ──────────────────────────────────────────────────────────  │
│  Runtime: Python 3.12                                       │
│  Memory: 2048 MB (2 GB)                                     │
│  Timeout: 720s (12 minutes)                                 │
│  ──────────────────────────────────────────────────────────  │
│  Code (~5 MB):                                              │
│    - deployment/aws/lambda_handler.py (AWS wrapper)         │
│    - src/zagg/ package (processing, auth, catalog)          │
│  ──────────────────────────────────────────────────────────  │
│  Layer (~70 MB compressed, ~240 MB uncompressed):           │
│    - numpy, pandas, h5coro, mortie, pyproj, odc-geo         │
│    - fastparquet, cramjam, shapely, astropy, earthaccess    │
│    - pydantic-zarr, zarr, obstore, pyarrow                  │
└─────────────────────────────────────────────────────────────┘

Files¶

File	Purpose
`deployment/aws/lambda_handler.py`	AWS Lambda wrapper function
`src/zagg/processing.py`	Cloud-agnostic core processing logic
`src/zagg/auth.py`	NASA Earthdata authentication helper
`src/zagg/catalog/`	CMR/STAC shard-map (granule catalog) builder (`python -m zagg.catalog`)
`deployment/aws/invoke_lambda.py`	Orchestration script
`deployment/aws/build_layer.sh`	Lambda layer build script (`x86_64`/`arm64`)

Event Payload¶

{
  "shard_key": 123456,
  "parent_order": 6,
  "child_order": 12,
  "granule_urls": [
    "s3://nsidc-cumulus-prod-protected/ATLAS/ATL06/007/2023/12/18/...",
    "s3://nsidc-cumulus-prod-protected/ATLAS/ATL06/007/2023/12/19/..."
  ],
  "store_path": "s3://your-output-bucket/atl06/production.zarr",
  "s3_credentials": {
    "accessKeyId": "ASIA...",
    "secretAccessKey": "...",
    "sessionToken": "..."
  },
  "output_credentials": {
    "accessKeyId": "ASIA...",
    "secretAccessKey": "...",
    "sessionToken": "...",
    "endpointUrl": "https://...",
    "region": "us-west-2"
  }
}

Parameters¶

Parameter	Type	Required	Description
`shard_key`	int	Yes	Grid-agnostic shard identifier (HEALPix: the parent-cell morton index)
`parent_order`	int	Yes	Order of parent cell (typically 6); HEALPix-only (`null` for other grids)
`child_order`	int	HEALPix only	Order of child cells for statistics (typically 12); omitted for non-HEALPix grids
`granule_urls`	list	Yes	Pre-computed list of S3 URLs from catalog
`store_path`	str	Yes	Output Zarr store path (e.g. `s3://bucket/prefix.zarr`)
`s3_credentials`	dict	Yes	NSIDC S3 credentials for reading source data
`output_credentials`	dict	No	Explicit credentials for writing the output store. Omit to use the execution role (in-account writes). Supply to write an external / S3-compatible target. Keys: `accessKeyId`, `secretAccessKey`, optional `sessionToken`/`endpointUrl`/`region`.

Grid-neutral event fields

The unit of work is a shard — for HEALPix, one parent (order-6) cell. The orchestrator and the catalog use that vocabulary (python -m zagg.catalog emits a shard map with shard_keys + a grid_signature). The Lambda event schema uses the grid-neutral field name shard_key (the shard identifier for any grid; for HEALPix it is the parent-cell morton index). parent_order/child_order are HEALPix-specific: parent_order is forwarded for every grid (null for non-HEALPix), while child_order is only required/sent for HEALPix runs. See deployment/aws/lambda_handler.py. This rename landed via #24.

S3 Credentials¶

Credentials are obtained by the orchestrator once before invoking Lambda functions:

from zagg.auth import get_nsidc_s3_credentials

# Get credentials (valid for ~1 hour)
s3_creds = get_nsidc_s3_credentials()

# Pass to each Lambda invocation
event = {
    "shard_key": -6134114,
    "parent_order": 6,
    "child_order": 12,
    "granule_urls": [...],
    "store_path": "s3://output-bucket/atl06/production.zarr",
    "s3_credentials": s3_creds,
}

This approach avoids rate limiting from 1,872 simultaneous NASA logins and eliminates an AWS Secrets Manager dependency.

Output Credentials (external write targets)¶

By default the function writes the output store with its execution role against the in-account bucket; omit output_credentials entirely to keep this behavior. To write an external or S3-compatible target (another account, or e.g. source.coop) without changing the execution role, supply output_credentials in the event — symmetric to how s3_credentials injects read credentials:

from zagg import load_config, agg

results = agg(
    config, catalog="catalog.json", backend="lambda",
    store="s3://us-west-2.opendata.source.coop/org/dataset.zarr",
    output_credentials={  # runtime-only; never store in config/YAML
        "accessKeyId": "ASIA...",
        "secretAccessKey": "...",
        "sessionToken": "...",        # optional
        # "endpointUrl": "https://...",  # optional: R2/MinIO etc.
        # "region": "us-west-2",         # optional
    },
)

From the CLI, point --output-creds at a JSON file holding that dict (keeps secrets out of shell history):

python -m zagg --config atl06.yaml --catalog catalog.json --backend lambda \
  --store s3://us-west-2.opendata.source.coop/org/dataset.zarr \
  --output-creds /path/to/output-creds.json

The non-secret endpoint_url / region may also be set in the config's output: section (overridable at runtime); credentials are runtime-only. source.coop uses the standard AWS S3 endpoint with injected STS credentials — endpointUrl is only needed for non-AWS S3-compatible stores. Dotted bucket names (e.g. us-west-2.opendata.source.coop) and custom endpoints use path-style addressing automatically.

Deployment¶

Recommended: CloudFormation standup¶

The recommended way to stand up the backend in a fresh AWS account is the committed CloudFormation template, driven by stand_up.sh, which creates the execution role, dependency layer, and function in one stack:

OUTPUT_BUCKET=my-results-bucket bash deployment/aws/stand_up.sh

See Standing Up the Backend for the full walkthrough: what the script does, the parameter/environment-variable reference, cross-region staging, and teardown. By default (CreateExecutionRole=true) the stack creates the IAM execution role for you; the only exception is an account whose deploy identity cannot create IAM roles (e.g. an AWS SSO "power user" set) — see Execution Role for that IAM-constrained, legacy/unverified path.

Legacy / manual deploy¶

Not the recommended path

The steps below hand-assemble the function zip and create/update the Lambda with raw aws lambda calls. They are kept for understanding what the template builds and for one-off tweaks, but the CloudFormation standup above is the preferred, reproducible way to deploy. The maintainer in-place code updater deployment/aws/deploy.sh (pulls the latest CI artifacts and runs aws lambda update-function-code) is a convenience over the manual update-function-code step; it updates an already-deployed function and does not create the role/function/bucket.

Step 1: Create the function package¶

cd /path/to/zagg

# Create function.zip with handler and zagg package
zip -j deployment/aws/function.zip deployment/aws/lambda_handler.py && \
  cd src && zip -ur ../deployment/aws/function.zip zagg/ -i "*.py" && cd ..

Step 2: Build and deploy the Lambda layer¶

See ARM64 Layer for building and deploying the Lambda layer.

Step 3: Create the Lambda function¶

aws lambda create-function \
  --function-name process-shard \
  --runtime python3.12 \
  --architectures arm64 \
  --role arn:aws:iam::ACCOUNT_ID:role/lambda-execution-role \
  --handler lambda_handler.lambda_handler \
  --zip-file fileb://deployment/aws/function.zip \
  --timeout 720 \
  --memory-size 2048 \
  --layers arn:aws:lambda:REGION:ACCOUNT_ID:layer:zagg-layer-arm64:VERSION

Updating function code¶

# Re-create the zip
zip -j deployment/aws/function.zip deployment/aws/lambda_handler.py && \
  cd src && zip -ur ../deployment/aws/function.zip zagg/ -i "*.py" && cd ..

# Update the Lambda function
aws lambda update-function-code \
  --function-name process-shard \
  --zip-file fileb://deployment/aws/function.zip

Testing¶

# Raise the open-file limit before fanning out: each concurrent worker holds
# one socket to the Lambda endpoint, and the default soft limit (often 256)
# would otherwise cap concurrency. See "Concurrency, workers, and file
# descriptors" below.
ulimit -n 8192

# Build a shard map
uv run python -m zagg.catalog --config atl06.yaml --short-name ATL06 --cycle 22 \
    --polygon antarctica.geojson

# Test locally first (no Lambda required)
uv run python -m zagg --config atl06.yaml --catalog catalog.json \
  --store ./test.zarr --max-cells 1

# Dry run with the Lambda orchestrator
uv run python deployment/aws/invoke_lambda.py \
  --config atl06.yaml --catalog catalog.json --dry-run

Concurrency, workers, and file descriptors¶

The Lambda backend fans out one synchronous invoke per cell across a thread pool, and each in-flight worker holds an open socket to the Lambda endpoint. Two limits bound how many can run at once, and the orchestrator checks both before dispatch so cells are never silently dropped:

Open file descriptors (ulimit -n). If concurrent workers exceed the process's open-file soft limit (256 on stock macOS / many Linux shells), invokes fail with OSError: [Errno 24] Too many open files — a client-side failure AWS never sees. The runner derives a safe ceiling from the soft limit and surfaces errno-24 with actionable guidance instead of a raw connection error. Raise the limit before a large run: ulimit -n 8192.
Account Lambda concurrency. The runner reads the account ConcurrentExecutions ceiling and current usage (CloudWatch) and clamps workers to the available headroom (5% padding, floored at 100 free slots), so a run can't saturate the account pool and throttle itself or other Lambda activity. This degrades gracefully if the dispatch role lacks lambda:GetAccountSettings / cloudwatch:GetMetricStatistics — it then bounds workers by the FD limit alone.

Keep --max-workers ≤ min(ulimit -n − headroom, account concurrency). The orchestrator enforces this automatically; setting ulimit -n higher simply raises the FD ceiling it can use.

Performance¶

Metric	Value
Average execution time	2--3 minutes per cell
Maximum execution time	10 minutes
Lambda timeout	12 minutes (720s)
Configured memory	2048 MB
Typical memory usage	1--1.5 GB
Cold start	3--5 seconds

Cost Estimate¶

Per invocation (180s average, 2 GB memory): ~$0.006

Full run (~1,300 cells at order 6): ~$2 including S3 and CloudWatch costs.

Troubleshooting¶

Missing s3_credentials

Ensure your orchestrator script calls get_nsidc_s3_credentials and passes the credentials to each Lambda invocation.

No granules found

This is normal for cells outside the data coverage area. The function returns gracefully with error: "No granules found".

S3 write permission denied

Check that the Lambda execution role has s3:PutObject permission for the output bucket.

Too many open files

[Errno 24] Too many open files means concurrent workers exceeded the open-file soft limit and cells would be dropped. Raise it (ulimit -n 8192) or lower --max-workers. See "Concurrency, workers, and file descriptors" above — the orchestrator now clamps workers to the FD and account-concurrency limits automatically.