Creating Nested Catalogs

Portolan supports hierarchical catalog structures in which intermediate directories automatically become subcatalogs and leaf directories become collections. This is useful for organizing large datasets by theme, region, or time period.

Quick Start

# Organize your data into themed directories
mkdir -p my-catalog/{climate,environment,housing}
cp climate-data/*.parquet my-catalog/climate/
cp env-data/*.parquet my-catalog/environment/

# Initialize and add everything
cd my-catalog
portolan init --auto --title "My Regional Data"
portolan add . --workers 4

# Add metadata and generate documentation
portolan metadata init
# Edit .portolan/metadata.yaml with your info
portolan readme --recursive

How Directory Structure Maps to STAC

Portolan infers the catalog hierarchy from your directory layout:

my-catalog/                    # Root catalog (catalog.json)
├── climate/                   # Subcatalog (climate/catalog.json)
│   ├── temperature/          # Collection (climate/temperature/collection.json)
│   │   └── temperature.parquet
│   └── precipitation/        # Collection
│       └── precipitation.parquet
└── demographics/              # Subcatalog
    └── census-2020/          # Collection
        └── census.parquet

When you run portolan add . from the catalog root, Portolan:

Creates catalog.json at the root with links to subcatalogs
Creates catalog.json in each intermediate directory (subcatalogs)
Creates collection.json + item metadata in leaf directories (collections)
Generates versions.json for tracking at each level
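
The mapping rule above can be previewed before running any Portolan command. The sketch below is not Portolan's implementation; it only applies the rule as this page states it, assuming that a leaf is a directory with no visible subdirectories and that hidden directories such as .portolan are ignored. The function name classify_tree is hypothetical.

```shell
# Illustration only: label directories using the rule described on this page.
# Assumptions (not taken from Portolan's source): a leaf is a directory with
# no visible subdirectories, and hidden directories like .portolan are skipped.
classify_tree() (
  cd "$1" || exit 1
  find . -type d ! -path '*/.*' | LC_ALL=C sort | while IFS= read -r dir; do
    if [ "$dir" = "." ]; then
      printf 'root-catalog  %s\n' "$dir"
    elif [ -n "$(find "$dir" -mindepth 1 -maxdepth 1 -type d ! -name '.*')" ]; then
      printf 'subcatalog    %s\n' "$dir"
    else
      printf 'collection    %s\n' "$dir"
    fi
  done
)

# Usage: classify_tree my-catalog
```

Run against the example layout, this prints one line per directory: the root as root-catalog, climate and demographics as subcatalog, and the three leaf directories as collection.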

Bulk Adding Files

Process many files efficiently with parallel workers:

portolan add . --workers 4 --verbose

The --verbose flag shows progress for each file. Without it, only changed or added files are listed.

Metadata and READMEs

Setting Up Metadata

portolan metadata init

This creates .portolan/metadata.yaml with required fields (contact, license) and optional fields (citation, keywords, source URL, known issues).

Example:

contact:
  name: "Data Team"
  email: "data@example.org"

license: "CC-BY-4.0"
license_url: "https://creativecommons.org/licenses/by/4.0/"

keywords:
  - climate
  - regional data
  - open data

source_url: "https://data.example.org/"
processing_notes: "Converted from Shapefile to GeoParquet with Hilbert sorting."
known_issues: "Temporal extent not specified for most datasets."
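
Before generating READMEs, it can help to confirm that the two required fields are present. The following is a rough sketch that only looks for top-level contact: and license: keys with grep; it does not parse YAML or check the values, and the function name check_required_metadata is made up for illustration.

```shell
# Illustration only: report which required top-level keys are absent.
# Assumption: required keys start at column 0, as in the example above.
check_required_metadata() (
  missing=0
  for key in contact license; do
    # "^license:" does not match "license_url:", so each key is checked exactly.
    if ! grep -q "^${key}:" "$1"; then
      printf 'missing required field: %s\n' "$key"
      missing=1
    fi
  done
  exit "$missing"
)

# Usage: check_required_metadata .portolan/metadata.yaml
```

The function exits with status 0 when both keys are found and 1 otherwise, so it can gate a later step in a script.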

Generating READMEs

portolan readme --recursive

This generates a README.md file at every level: root catalog, subcatalogs, and collections. Metadata from the root cascades down, so editing a single metadata.yaml gives consistent attribution across all READMEs.
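
One way to picture the cascade is as a walk from a directory up to the catalog root, collecting every metadata.yaml on the way. The sketch below lists those files, most specific first. It assumes that lower-level files live at .portolan/metadata.yaml inside their own directory, which this page does not confirm, and it says nothing about how Portolan actually merges fields. The function name metadata_chain is hypothetical.

```shell
# Illustration only: list the metadata.yaml files between a directory and the
# catalog root, nearest first.
# Assumption: each level keeps its file at <dir>/.portolan/metadata.yaml.
metadata_chain() (
  root="$(cd "$1" && pwd -P)" || exit 1
  dir="$(cd "$2" && pwd -P)" || exit 1
  while :; do
    if [ -f "$dir/.portolan/metadata.yaml" ]; then
      printf '%s\n' "$dir/.portolan/metadata.yaml"
    fi
    # Stop at the catalog root (or at / if the directory is outside the root).
    if [ "$dir" = "$root" ] || [ "$dir" = "/" ]; then
      break
    fi
    dir="$(dirname "$dir")"
  done
)

# Usage: metadata_chain my-catalog my-catalog/climate/temperature
```

A directory with no file of its own yields only the root file, which matches the single-file setup recommended here.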

To preview without writing:

portolan readme --stdout

Validation

Check the catalog structure and data formats:

portolan check --verbose

This validates:

STAC metadata completeness
Cloud-native format compliance (GeoParquet, COG)
Provisional datetime warnings (items without explicit dates)
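
As a complement to portolan check, the file layout described in the mapping section can be spot-checked by hand. This sketch lists directories that contain neither catalog.json nor collection.json. It assumes that every visible directory under the root is either a catalog or a collection, which may not hold if Portolan stores items in their own subdirectories; missing_stac_files is a hypothetical helper name.

```shell
# Illustration only: print directories with neither catalog.json nor
# collection.json. Hidden directories such as .portolan are skipped.
# Assumption: every visible directory should be a catalog or a collection.
missing_stac_files() (
  cd "$1" || exit 1
  find . -type d ! -path '*/.*' | LC_ALL=C sort | while IFS= read -r dir; do
    if [ ! -f "$dir/catalog.json" ] && [ ! -f "$dir/collection.json" ]; then
      printf '%s\n' "$dir"
    fi
  done
)

# Usage: missing_stac_files my-catalog
```

Empty output means every directory has one of the two files; any path printed is a candidate for re-running portolan add . or for a closer look.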

Example: The Hague Open Data

A real-world example with 6 thematic subcatalogs and 23 collections:

den-haag/
├── catalog.json
├── climate/           # 3 collections: heat maps, climate scores
├── environment/       # 7 collections: air quality, noise, soil
├── housing/           # 1 collection: energy labels
├── infrastructure/    # 3 collections: waste, zones, storage
├── nature/            # 7 collections: species, habitats, trees
└── water/             # 2 collections: gauges, water bodies

Created with:

portolan init --auto --title "The Hague Open Data" \
  --description "Municipal open data from Den Haag, Netherlands"
portolan add . --workers 4
portolan metadata init
# Edit .portolan/metadata.yaml
portolan readme --recursive
portolan check

Tips

Start flat, restructure later. You can reorganize directories and then re-run portolan add . to regenerate the STAC hierarchy from the current structure.

One metadata.yaml for consistency. Root-level metadata cascades to all READMEs. Only create collection-level metadata.yaml files when you need overrides.

Use --workers for large catalogs. Parallel processing significantly speeds up metadata extraction for catalogs with many files.