Configuration

Portolan stores configuration in .portolan/config.yaml within your catalog directory.

Quick Start

# .portolan/config.yaml
remote: s3://my-bucket/catalog
profile: production    # AWS profile (alias for aws_profile)
region: us-west-2      # AWS region for S3

Setting Configuration

# Set remote storage URL
portolan config set remote s3://my-bucket/catalog

# Set AWS profile (either name works)
portolan config set profile production
# portolan config set aws_profile production  # Also valid

# Set AWS region
portolan config set region us-west-2

# View current settings
portolan config list

Configuration Precedence

Settings are resolved in this order (highest to lowest):

  1. CLI argument (--remote s3://...)
  2. Environment variable (PORTOLAN_REMOTE=s3://...)
  3. Collection config (in collections: section)
  4. Catalog config (top-level in config.yaml)
  5. Built-in default
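
The resolution order above can be sketched as a first-match lookup across the five sources. This is an illustrative sketch only, not Portolan's implementation: `resolve_setting` and its parameters are hypothetical names, and each source is modeled as a plain dictionary.

```python
from typing import Any, Mapping, Optional


def resolve_setting(
    key: str,
    cli_args: Mapping[str, Any],
    env: Mapping[str, str],
    collection_config: Mapping[str, Any],
    catalog_config: Mapping[str, Any],
    defaults: Mapping[str, Any],
) -> Optional[Any]:
    """Return the first value found, checking sources from highest to lowest priority."""
    env_key = f"PORTOLAN_{key.upper()}"
    layers = (
        cli_args.get(key),           # 1. CLI argument
        env.get(env_key),            # 2. Environment variable
        collection_config.get(key),  # 3. Collection config
        catalog_config.get(key),     # 4. Catalog config
        defaults.get(key),           # 5. Built-in default
    )
    for value in layers:
        if value is not None:
            return value
    return None


# Hypothetical inputs for illustration.
catalog = {"remote": "s3://my-bucket/catalog", "region": "us-west-2"}
collection = {"remote": "s3://public-bucket/data"}
env = {"PORTOLAN_REGION": "eu-central-1"}

remote = resolve_setting("remote", {}, env, collection, catalog, {})
region = resolve_setting("region", {}, env, collection, catalog, {})
print(remote)  # collection config beats catalog config
print(region)  # environment variable beats catalog config
```

In this sketch a source only participates if it actually defines the key, so a missing collection entry falls through to the catalog value.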

Conversion Configuration

Control how Portolan handles different file formats during check and convert operations.

Use Cases

| Scenario | Configuration |
| --- | --- |
| Force-convert FlatGeobuf to GeoParquet | extensions.convert: [fgb] |
| Keep Shapefiles as-is | extensions.preserve: [shp] |
| Preserve everything in archive/ | paths.preserve: ["archive/**"] |

Full Example

# .portolan/config.yaml
remote: s3://my-bucket/catalog

conversion:
  extensions:
    # Force-convert these cloud-native formats to GeoParquet
    convert:
      - fgb      # FlatGeobuf

    # Keep these formats as-is (don't convert)
    preserve:
      - shp      # Shapefiles
      - gpkg     # GeoPackage

  paths:
    # Glob patterns for files to preserve regardless of format
    preserve:
      - "archive/**"           # Everything in archive/
      - "regulatory/*.shp"     # Regulatory shapefiles
      - "legacy/**"            # Legacy data directory

Extension Overrides

extensions.convert

Force-convert cloud-native formats to GeoParquet. Use when:

  • You want consistent columnar format for analytics
  • Your tooling prefers GeoParquet over FlatGeobuf

conversion:
  extensions:
    convert:
      - fgb       # FlatGeobuf -> GeoParquet

extensions.preserve

Keep convertible formats as-is. Use when:

  • Regulatory requirements mandate original format
  • Downstream tools require specific formats
  • You're preserving archival data

conversion:
  extensions:
    preserve:
      - shp       # Keep Shapefiles
      - gpkg      # Keep GeoPackage
      - geojson   # Keep GeoJSON

Path Patterns

Use glob patterns to override behavior for specific directories or files.

conversion:
  paths:
    preserve:
      - "archive/**"           # All files in archive/ and subdirectories
      - "regulatory/*.shp"     # Only .shp files in regulatory/
      - "**/*.backup.geojson"  # Any .backup.geojson file

Pattern syntax:

  • * matches any characters except /
  • ** matches any characters including /
  • ? matches any single character

Precedence: Path patterns override extension rules. A FlatGeobuf file in archive/ will be preserved even if extensions.convert: [fgb] is set.
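
To make the pattern syntax and the path-over-extension rule concrete, here is a minimal sketch of how such a decision could be computed. It is not Portolan's matcher: `glob_to_regex` and `decide` are hypothetical helpers, `?` is assumed not to match `/` (as in most glob dialects), and `**/` is read literally, so it requires at least one directory separator.

```python
import re
from pathlib import PurePosixPath
from typing import Iterable, Set


def glob_to_regex(pattern: str):
    """Translate the three wildcards described above into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")      # ** matches any characters, including /
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")   # * matches any characters except /
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")    # ? matches one character (assumed: not /)
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def decide(
    path: str,
    convert_exts: Set[str],
    preserve_exts: Set[str],
    preserve_paths: Iterable[str],
) -> str:
    """Return 'preserve', 'convert', or 'default' for a catalog-relative path."""
    # Path patterns are checked first, so they override extension rules.
    if any(glob_to_regex(p).fullmatch(path) for p in preserve_paths):
        return "preserve"
    ext = PurePosixPath(path).suffix.lstrip(".").lower()
    if ext in convert_exts:
        return "convert"
    if ext in preserve_exts:
        return "preserve"
    return "default"


patterns = ["archive/**", "regulatory/*.shp"]
print(decide("archive/2020/roads.fgb", {"fgb"}, {"shp"}, patterns))
print(decide("data/roads.fgb", {"fgb"}, {"shp"}, patterns))
```

The first call falls under `archive/**` and is preserved even though `fgb` is listed for conversion. The second matches no path pattern, so the extension rule applies.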

COG Settings

Configure Cloud-Optimized GeoTIFF (COG) conversion parameters. Unless overridden, Portolan uses the settings defined in ADR-0019: DEFLATE compression, predictor=2, 512×512 tiles, and nearest resampling.

conversion:
  cog:
    compression: JPEG      # DEFLATE (default), JPEG, LZW, ZSTD, WEBP
    quality: 95            # Quality 1-100 (applies to JPEG and WEBP)
    tile_size: 512         # Internal tile size in pixels
    predictor: 2           # 1=none, 2=horizontal (default), 3=floating point
    resampling: nearest    # Overview resampling: nearest, bilinear, cubic, etc.

Validation

Invalid settings produce warnings but don't block conversion. Quality is clamped to 1-100, and unknown compression/resampling values are passed through to let rio-cogeo handle errors.
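
The warn-and-continue behavior described above might look roughly like the following. This is a sketch under assumptions, not Portolan's code: `normalize_cog_settings` is a hypothetical helper, the list of known compression names is taken from the comment in the example config, and `quality` is assumed to be numeric.

```python
import warnings

# Compression names listed in the config example above.
KNOWN_COMPRESSION = {"DEFLATE", "JPEG", "LZW", "ZSTD", "WEBP"}


def normalize_cog_settings(settings: dict) -> dict:
    """Clamp quality to 1-100 and warn on unknown compression without failing."""
    result = dict(settings)

    quality = result.get("quality")
    if quality is not None:
        clamped = max(1, min(100, quality))
        if clamped != quality:
            warnings.warn(f"quality {quality} is out of range; using {clamped}")
        result["quality"] = clamped

    compression = result.get("compression")
    if compression is not None and str(compression).upper() not in KNOWN_COMPRESSION:
        # Unknown values are left untouched so the downstream tool can reject them.
        warnings.warn(f"unknown compression {compression!r}; passing through unchanged")

    return result


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fixed = normalize_cog_settings({"compression": "JPEG", "quality": 150})

print(fixed["quality"])
print([str(w.message) for w in caught])
```

The out-of-range quality is corrected and reported, and the conversion settings are still returned rather than raising an error.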

Use Cases

| Scenario | Configuration |
| --- | --- |
| RGB imagery (smaller files) | compression: JPEG, quality: 95 |
| Elevation data (lossless) | compression: DEFLATE, predictor: 3 |
| Analytics (fast reads) | compression: LZW, tile_size: 256 |

Available Compression Methods

| Method | Best For | Notes |
| --- | --- | --- |
| DEFLATE | General use (default) | Lossless, universal compatibility |
| LZW | Fast compression/decompression | Lossless, slightly larger files |
| ZSTD | High compression ratio | Lossless, requires GDAL 2.3+ |
| JPEG | RGB imagery | Lossy, smallest files for photos |
| WEBP | Web display | Lossy, modern browsers only |

Collection-Level Configuration

Override settings for specific collections using the collections: section:

# .portolan/config.yaml
remote: s3://default-bucket/catalog

collections:
  public-data:
    remote: s3://public-bucket/data

  analytics:
    conversion:
      extensions:
        convert: [fgb]  # Force GeoParquet for analytics queries

  archive:
    conversion:
      extensions:
        preserve: [shp, gpkg, geojson]  # Preserve all original formats

This approach works well for most catalogs. For large catalogs with many collections, see Hierarchical Configuration below.

Hierarchical Configuration (Optional)

For large catalogs or when different maintainers manage different collections, you can optionally create .portolan/ folders at collection or subcatalog levels:

catalog/
  .portolan/
    config.yaml           # Catalog defaults
  demographics/
    .portolan/
      config.yaml         # Collection-specific overrides (optional)
    collection.json
  historical/             # Subcatalog
    .portolan/
      config.yaml         # Subcatalog defaults (optional)
    census-1990/
      collection.json

This is entirely optional. Benefits include:

  • Scalability: Avoids one giant config file with 100+ collection entries
  • Ownership: Collection maintainers edit their own folder without touching root
  • Git-friendly: Changes to one collection don't create merge conflicts in root

Inheritance Rules

Settings are inherited from parent levels. Child values override parent values:

# catalog/.portolan/config.yaml
aws_profile: default
remote: s3://catalog/

# catalog/demographics/.portolan/config.yaml
remote: s3://demographics/  # Overrides parent
# aws_profile inherited from catalog
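
As a rough illustration of this inheritance, the configs along the path from catalog root to collection can be merged in order, with later (child) values winning. `merge_configs` is a hypothetical helper, and merging nested sections key by key is an assumption: the rules above only state that child values override parent values.

```python
from typing import Any, Dict, List


def merge_configs(levels: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge config dicts ordered from root to leaf; child values override parents."""
    merged: Dict[str, Any] = {}
    for level in levels:
        for key, value in level.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Assumption: nested sections are merged rather than replaced wholesale.
                merged[key] = merge_configs([merged[key], value])
            else:
                merged[key] = value
    return merged


# Parsed contents of the two files shown above.
catalog_cfg = {"aws_profile": "default", "remote": "s3://catalog/"}
demographics_cfg = {"remote": "s3://demographics/"}

effective = merge_configs([catalog_cfg, demographics_cfg])
print(effective)
```

Here `remote` comes from the demographics folder, while `aws_profile` is inherited from the catalog.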

Precedence

When both approaches are used, folder-level config takes precedence over the collections: section:

CLI > Env var > Collection folder config > Subcatalog folder config >
  Root collections: section > Catalog config > Default

When to Use Each Approach

| Approach | Best For |
| --- | --- |
| collections: section | Small catalogs, simple overrides |
| Hierarchical folders | Large catalogs, multiple maintainers, verbose metadata |

Most users should start with collections: and only add per-collection .portolan/ folders when needed.

Environment Variables

All settings can be set via environment variables with the PORTOLAN_ prefix:

| Setting | Environment Variable | Notes |
| --- | --- | --- |
| remote | PORTOLAN_REMOTE | |
| aws_profile | PORTOLAN_AWS_PROFILE | |
| profile | PORTOLAN_PROFILE | Alias for aws_profile |
| region | PORTOLAN_REGION | AWS region for S3 |

Environment variables override config file settings but are overridden by CLI arguments.

Setting Aliases

Some settings have aliases for convenience:

| Canonical Name | Alias |
| --- | --- |
| aws_profile | profile |

Both names work interchangeably in config files and environment variables.
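
One way to picture alias handling for environment variables is a lookup that normalizes the setting name and then tries each accepted variable. This is a hypothetical sketch: `read_env_setting` is not a Portolan function, and the choice to check the canonical variable before the alias when both are set is an assumption the text above does not specify.

```python
import os
from typing import Mapping, Optional

ENV_PREFIX = "PORTOLAN_"
ALIASES = {"profile": "aws_profile"}  # alias -> canonical name


def canonical_name(name: str) -> str:
    """Map an alias to its canonical setting name; other names pass through."""
    return ALIASES.get(name, name)


def read_env_setting(name: str, env: Mapping[str, str]) -> Optional[str]:
    """Look up a setting under its canonical variable, then under any alias."""
    canonical = canonical_name(name)
    candidates = [canonical] + [a for a, c in ALIASES.items() if c == canonical]
    for candidate in candidates:
        value = env.get(ENV_PREFIX + candidate.upper())
        if value is not None:
            return value
    return None


# Hypothetical environments for illustration; pass os.environ in real use.
print(read_env_setting("profile", {"PORTOLAN_AWS_PROFILE": "production"}))
print(read_env_setting("aws_profile", {"PORTOLAN_PROFILE": "staging"}))
print(read_env_setting("region", os.environ))
```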

Metadata Enrichment

In addition to config.yaml, Portolan supports .portolan/metadata.yaml for human-enrichable metadata that supplements STAC.

Purpose

STAC provides machine-extractable metadata (title, description, extent, columns). metadata.yaml adds human-only fields that can't be derived automatically:

| Field | Purpose |
| --- | --- |
| contact | Accountability (name, email) |
| license | SPDX identifier (e.g., CC-BY-4.0, MIT) |
| citation | Academic citation text |
| doi | Zenodo/DataCite DOI |
| known_issues | Data quality caveats |
| source_url | Link to original data source |
| processing_notes | Documentation of transformations applied |
| keywords | Tags for search/discovery (rendered as badges) |
| attribution | Credit to data provider or organization |

Quick Start

# Generate template
portolan metadata init

# Validate required fields
portolan metadata validate

# Generate README from STAC + metadata
portolan readme

Example

# .portolan/metadata.yaml
contact:
  name: Data Team
  email: data@example.org

license: CC-BY-4.0

# Optional enrichment fields
license_url: https://creativecommons.org/licenses/by/4.0/
citation: "Census Bureau (2024). Demographics Dataset. DOI: 10.5281/zenodo.1234567"
doi: 10.5281/zenodo.1234567
known_issues: "Coverage gaps in rural areas for 2020 data."

# Provenance and discovery
source_url: https://data.census.gov/demographics
processing_notes: |
  - Reprojected from NAD83 to EPSG:4326
  - Simplified geometries for web display
  - Joined with income data from ACS 2020
keywords:
  - census
  - demographics
  - population
attribution: "U.S. Census Bureau"

Required Fields

Only two fields are required in metadata.yaml:

  • contact.name and contact.email - Who maintains this data
  • license - SPDX identifier (validated against common licenses)

Title and description come from STAC metadata (set during portolan init).
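
A presence check for the required fields above could be as small as the following. It is a sketch only: `missing_required_fields` is a hypothetical name, the input is the already-parsed metadata.yaml content, and the SPDX validation that `portolan metadata validate` performs is not reproduced here.

```python
from typing import Any, Dict, List


def missing_required_fields(metadata: Dict[str, Any]) -> List[str]:
    """Return the dotted names of required fields that are absent or empty."""
    missing = []

    contact = metadata.get("contact")
    if not isinstance(contact, dict):
        contact = {}
    for field in ("name", "email"):
        if not contact.get(field):
            missing.append(f"contact.{field}")

    if not metadata.get("license"):
        missing.append("license")

    return missing


complete = {
    "contact": {"name": "Data Team", "email": "data@example.org"},
    "license": "CC-BY-4.0",
}
incomplete = {"contact": {"name": "Data Team"}}

print(missing_required_fields(complete))
print(missing_required_fields(incomplete))
```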

Hierarchical Inheritance

Like config.yaml, metadata.yaml supports hierarchical resolution:

catalog/
  .portolan/
    metadata.yaml         # Default contact and license
  demographics/
    .portolan/
      metadata.yaml       # Override or add collection-specific fields

Child values override parent values. Use this to set catalog-wide defaults (license, contact) while adding collection-specific fields (known_issues, citation).

README Generation

The portolan readme command generates README.md by combining:

From STAC (automatic):

  • Title, description
  • Spatial/temporal coverage
  • Schema columns (from table:columns)
  • Bands (from eo:bands, raster:bands)
  • Files with checksums
  • Code examples based on format

From metadata.yaml (human):

  • License, contact
  • Citation, DOI
  • Known issues
  • Source URL, processing notes
  • Keywords (as badges), attribution

# Generate README.md
portolan readme

# Preview without writing
portolan readme --stdout

# Check if README is up-to-date (for CI)
portolan readme --check

# Generate for catalog and all collections
portolan readme --recursive

Catalog-level README: When run at the catalog root, the command generates an index README with:

  • Aggregated spatial extent (envelope of all collections)
  • Aggregated temporal extent (earliest to latest)
  • List of collections with links

Data Defaults

When source files lack certain metadata (nodata values, temporal info), you can specify defaults in metadata.yaml:

# .portolan/metadata.yaml
defaults:
  temporal:
    year: 2025              # Items default to 2025-01-01
    # Or explicit bounds:
    # start: "2025-04-15"
    # end: "2025-05-30"

  raster:
    nodata: 0               # Uniform nodata for all bands
    # Or per-band:
    # nodata: [0, 0, 255]

Behavior:

| Scenario | Result |
| --- | --- |
| Source file has value | File value used (defaults don't override) |
| Source file lacks value | Default applied |
| CLI flag provided | CLI flag overrides default |
| No default, no source value | Field left null |

Validation:

  • temporal.year must be an integer between 1800 and 2100
  • temporal.start/temporal.end must be valid ISO dates (YYYY-MM-DD)
  • Specifying both year and start is an error (use one or the other)
  • raster.nodata must be a finite number (no NaN or Infinity)
  • Per-band nodata lists must match the raster's band count exactly
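
The validation rules above can be expressed as a small checker. This is an illustrative sketch, not Portolan's validator: `validate_defaults` and its `band_count` parameter are hypothetical, the input is the parsed `defaults:` section, and the error messages are invented for the example.

```python
import math
import re
from datetime import date
from typing import Any, Dict, List, Optional


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_defaults(defaults: Dict[str, Any], band_count: Optional[int] = None) -> List[str]:
    """Return a list of rule violations; an empty list means the defaults are valid."""
    errors: List[str] = []
    temporal = defaults.get("temporal") or {}
    raster = defaults.get("raster") or {}

    year = temporal.get("year")
    if year is not None:
        if not isinstance(year, int) or isinstance(year, bool) or not 1800 <= year <= 2100:
            errors.append("temporal.year must be an integer between 1800 and 2100")
        if temporal.get("start") is not None:
            errors.append("specify either temporal.year or temporal.start, not both")

    for key in ("start", "end"):
        value = temporal.get(key)
        if value is None:
            continue
        text = str(value)
        try:
            if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
                raise ValueError(text)
            date.fromisoformat(text)  # rejects impossible dates such as month 13
        except ValueError:
            errors.append(f"temporal.{key} must be a valid ISO date (YYYY-MM-DD)")

    nodata = raster.get("nodata")
    if nodata is not None:
        values = nodata if isinstance(nodata, list) else [nodata]
        if not all(_is_number(v) and math.isfinite(v) for v in values):
            errors.append("raster.nodata must be a finite number")
        if isinstance(nodata, list) and band_count is not None and len(nodata) != band_count:
            errors.append("per-band nodata list must match the raster's band count")

    return errors


print(validate_defaults({"temporal": {"year": 2025}, "raster": {"nodata": 0}}))
print(validate_defaults({"temporal": {"year": 2025, "start": "2025-04-15"}}))
print(validate_defaults({"raster": {"nodata": [0, 0, 255]}}, band_count=4))
```

The band-count rule can only be checked once a raster is open, which is why the sketch takes `band_count` as an optional argument and skips that rule when it is not supplied.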

See the Metadata Defaults Guide for detailed usage.