Extracting Data from ArcGIS Services
Portolan can extract data directly from ArcGIS REST services:
- FeatureServer/MapServer: Vector data → GeoParquet files
- ImageServer: Raster imagery → Cloud-Optimized GeoTIFF (COG) tiles
Quick Start
# Extract all layers from a FeatureServer
portolan extract arcgis https://services.arcgis.com/.../FeatureServer ./output
# Extract tiles from an ImageServer (uses bbox to limit area)
portolan extract arcgis https://example.com/.../ImageServer ./output --bbox "minx,miny,maxx,maxy"
# Preview what would be extracted (dry run)
portolan extract arcgis URL --dry-run
Service Types
Portolan auto-detects the service type from the URL:
| URL Pattern | Service Type | Output Format |
|---|---|---|
| .../FeatureServer | Vector features | GeoParquet |
| .../MapServer | Vector features | GeoParquet |
| .../ImageServer | Raster imagery | COG tiles |
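The detection rule in the table can be illustrated with a few lines of Python. This is only a sketch of the idea, not Portolan's actual implementation: the function name `detect_service_type` and the `SERVICE_OUTPUTS` mapping are hypothetical, and it assumes the service type is always the last path segment of the URL.

```python
from urllib.parse import urlparse

# Hypothetical mapping mirroring the table above; not Portolan's internal code.
SERVICE_OUTPUTS = {
    "FeatureServer": "GeoParquet",
    "MapServer": "GeoParquet",
    "ImageServer": "COG tiles",
}


def detect_service_type(url: str) -> str:
    """Return the service type named by the last path segment of the URL."""
    path = urlparse(url).path.rstrip("/")
    last_segment = path.rsplit("/", 1)[-1]
    if last_segment not in SERVICE_OUTPUTS:
        raise ValueError(f"Unrecognized ArcGIS service URL: {url}")
    return last_segment


url = "https://sampleserver6.arcgisonline.com/arcgis/rest/services/Toronto/ImageServer"
service_type = detect_service_type(url)
print(service_type, "->", SERVICE_OUTPUTS[service_type])
```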
FeatureServer / MapServer Extraction
Basic Usage
Point Portolan at any ArcGIS FeatureServer or MapServer URL:
portolan extract arcgis \
https://services.arcgis.com/fLeGjb7u4uXqeF9q/ArcGIS/rest/services/Census_2020/FeatureServer \
./census_2020
This will:
- Discover all layers in the service
- Extract each layer to GeoParquet format
- Apply Hilbert spatial sorting for efficient queries
- Generate an extraction report in .portolan/extraction-report.json
Filtering Layers
Use glob patterns to extract specific layers:
# Include only layers matching patterns
portolan extract arcgis URL --layers "Census*,Transport*"
# Exclude layers matching patterns
portolan extract arcgis URL --exclude-layers "*_Archive,*_Backup"
# Combine include and exclude
portolan extract arcgis URL --layers "Census*" --exclude-layers "*_2010"
Pattern syntax uses fnmatch:
- * matches any characters
- ? matches a single character
- Examples: sdn_*, *_2024, cod_ab_*
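To see how include and exclude patterns interact, the following sketch applies Python's standard `fnmatch` module to a list of layer names. The helper `select_layers` and the sample layer names are hypothetical, and it assumes case-sensitive matching against the layer name; Portolan's own matching rules may differ in those details.

```python
from fnmatch import fnmatchcase


def select_layers(names, include=None, exclude=None):
    """Keep names matching any include pattern and no exclude pattern."""
    # Patterns arrive as comma-separated strings, as on the command line.
    include_patterns = include.split(",") if include else ["*"]
    exclude_patterns = exclude.split(",") if exclude else []
    selected = []
    for name in names:
        included = any(fnmatchcase(name, p) for p in include_patterns)
        excluded = any(fnmatchcase(name, p) for p in exclude_patterns)
        if included and not excluded:
            selected.append(name)
    return selected


# Made-up layer names for illustration.
layers = ["Census_2010", "Census_2020", "Transport_Roads", "Parcels_Archive"]
print(select_layers(layers, include="Census*", exclude="*_2010"))
# prints ['Census_2020']
```

Trying patterns this way against the layer names shown by `--dry-run` can help confirm a filter before starting a long extraction.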
Output Structure
Each layer becomes a collection with the parquet file as a collection-level asset:
output/
├── .portolan/
│   └── extraction-report.json   # Extraction metadata
├── census_block_groups/
│   ├── collection.json
│   └── census_block_groups.parquet
└── census_tracts/
    ├── collection.json
    └── census_tracts.parquet
ImageServer Extraction
Basic Usage
Extract raster imagery from an ArcGIS ImageServer:
portolan extract arcgis \
https://sampleserver6.arcgisonline.com/arcgis/rest/services/Toronto/ImageServer \
./toronto-imagery
This will:
- Query service metadata (extent, CRS, pixel size, bands)
- Compute a tile grid covering the service extent
- Download tiles via the exportImage API
- Convert each tile to Cloud-Optimized GeoTIFF (COG)
- Create STAC items for each tile with spatial metadata
- Generate an extraction report
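The tile grid step can be pictured with a short sketch. It assumes square tiles measured in pixels, a grid that starts at the lower-left corner of the extent, and edge tiles clipped to the extent. These are illustrative assumptions, and `tile_grid` is a hypothetical helper, not Portolan's actual gridding code.

```python
import math


def tile_grid(extent, pixel_size, tile_size=4096):
    """Yield (row, col, bbox) for tiles covering extent = (minx, miny, maxx, maxy)."""
    minx, miny, maxx, maxy = extent
    tile_span = pixel_size * tile_size  # ground distance covered by one tile edge
    cols = math.ceil((maxx - minx) / tile_span)
    rows = math.ceil((maxy - miny) / tile_span)
    for row in range(rows):
        for col in range(cols):
            x0 = minx + col * tile_span
            y0 = miny + row * tile_span
            # Clip edge tiles so they do not extend past the service extent.
            bbox = (x0, y0, min(x0 + tile_span, maxx), min(y0 + tile_span, maxy))
            yield row, col, bbox


# A made-up 10 km x 5 km extent at 1 m per pixel needs 3 columns x 2 rows.
tiles = list(tile_grid((0.0, 0.0, 10000.0, 5000.0), pixel_size=1.0))
print(len(tiles))  # prints 6
```

The same arithmetic gives a rough estimate of how many tiles a given extent and `--tile-size` will produce.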
Limiting Extraction Area
For large ImageServers, use --bbox to extract a subset:
# Extract only tiles within bounding box (in service CRS coordinates)
portolan extract arcgis URL --bbox "-8841000,5405000,-8840000,5406000"
Important: The bbox coordinates must be in the service's native CRS (check the service metadata for spatialReference.wkid).
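If the service reports Web Mercator (wkid 3857 or 102100) and you only know your area of interest in longitude/latitude, the conversion can be done by hand. The sketch below assumes that CRS and uses the standard spherical Mercator formulas. The helper names and sample coordinates are hypothetical. For any other CRS, a projection library such as pyproj is the safer route.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis used by Web Mercator


def lonlat_to_web_mercator(lon, lat):
    """Convert degrees of longitude/latitude to Web Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def bbox_arg(west, south, east, north):
    """Format a lon/lat box as a 'minx,miny,maxx,maxy' string in Web Mercator."""
    minx, miny = lonlat_to_web_mercator(west, south)
    maxx, maxy = lonlat_to_web_mercator(east, north)
    return f"{minx:.0f},{miny:.0f},{maxx:.0f},{maxy:.0f}"


# Made-up area of interest near Toronto, in degrees.
print(bbox_arg(-79.42, 43.63, -79.41, 43.64))
```

The printed string can be passed to `--bbox` as-is, provided the service really uses Web Mercator.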
ImageServer Options
# Tile size in pixels (default: 4096)
portolan extract arcgis URL --tile-size 2048
# Maximum concurrent downloads (default: 4)
portolan extract arcgis URL --max-concurrent 8
# COG compression (default: deflate)
portolan extract arcgis URL --compression jpeg # Good for RGB imagery
Output Structure
Raster data uses item-level assets — each tile becomes a STAC item:
output/
├── .portolan/
│   ├── config.yaml
│   ├── extraction-report.json
│   └── imageserver-resume.json   # For resuming interrupted extractions
├── catalog.json
└── tiles/                        # Collection (one per ImageServer)
    ├── collection.json
    ├── versions.json
    ├── tile_0_0/
    │   ├── tile_0_0.json         # STAC item with tile bbox
    │   └── tile_0_0.tif          # COG asset
    ├── tile_0_1/
    │   ├── tile_0_1.json
    │   └── tile_0_1.tif
    └── ...
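Given this layout, a quick inventory of the extracted tiles needs only the standard library. The sketch below assumes each tile directory holds a STAC item JSON file and a same-named `.tif`, as in the tree above, and reads the standard STAC `bbox` field. `list_tile_items` is a hypothetical helper, not part of Portolan.

```python
import json
from pathlib import Path


def list_tile_items(collection_dir):
    """Return (item_id, bbox, has_cog) for each tile directory in a collection."""
    results = []
    # Only JSON files one level down are items; collection.json is skipped.
    for item_path in sorted(Path(collection_dir).glob("*/*.json")):
        item = json.loads(item_path.read_text())
        cog_path = item_path.with_suffix(".tif")
        results.append((item_path.stem, item.get("bbox"), cog_path.exists()))
    return results


if __name__ == "__main__":
    for item_id, bbox, has_cog in list_tile_items("output/tiles"):
        print(item_id, bbox, "ok" if has_cog else "missing COG")
```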
Adding Metadata After Extraction
Extraction creates STAC metadata but not metadata.yaml. Per ADR-0038, contact and license info must be added manually:
# Create metadata.yaml in the collection's .portolan directory
mkdir -p tiles/.portolan
cat > tiles/.portolan/metadata.yaml << 'EOF'
contact:
  name: Your Name
  email: your.email@example.com
license: CC-BY-4.0
source_url: https://example.com/.../ImageServer
EOF
# Generate README from STAC + metadata.yaml
portolan readme tiles
Common Options
These options work for both FeatureServer and ImageServer:
Controlling Extraction
# Request timeout in seconds (default: 60 for vectors, 120 for rasters)
portolan extract arcgis URL --timeout 120
# Retry failed requests (default: 3 attempts)
portolan extract arcgis URL --retries 5
Resume Failed Extractions
If an extraction fails partway through, resume from where you left off:
# Initial extraction (fails partway)
portolan extract arcgis URL ./output
# Resume - skips already-extracted layers/tiles
portolan extract arcgis URL ./output --resume
Dry Run Mode
Preview what would be extracted without downloading any data:
portolan extract arcgis URL --dry-run
JSON Output
For automation and scripts, use JSON output:
portolan extract arcgis URL --json
Non-Interactive Mode
Skip confirmation prompts (useful in scripts):
portolan extract arcgis URL --auto
Extraction Report
The extraction report (.portolan/extraction-report.json) contains:
- Source URL and extraction timestamp
- Metadata extracted from the ArcGIS service
- Per-layer/tile results: status, count, file size, duration, any errors
- Summary: totals for succeeded, failed, skipped
Example (FeatureServer):
{
  "extraction_date": "2024-03-15T10:30:00Z",
  "source_url": "https://services.arcgis.com/.../FeatureServer",
  "summary": {
    "total_layers": 5,
    "succeeded": 4,
    "failed": 1,
    "total_features": 125000,
    "total_size_bytes": 45000000
  }
}
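A script can use the summary to decide whether a run needs attention. The sketch below relies only on the `summary` fields shown in the example above. `summarize_report` is a hypothetical helper, and a real report may contain additional fields.

```python
import json


def summarize_report(report):
    """Derive a few health indicators from the report's summary block."""
    summary = report["summary"]
    return {
        "all_succeeded": summary["failed"] == 0,
        "success_rate": summary["succeeded"] / summary["total_layers"],
        "avg_feature_bytes": summary["total_size_bytes"] / summary["total_features"],
    }


# The example report from above, inlined so the sketch runs on its own.
report = json.loads("""
{
  "extraction_date": "2024-03-15T10:30:00Z",
  "source_url": "https://services.arcgis.com/.../FeatureServer",
  "summary": {
    "total_layers": 5,
    "succeeded": 4,
    "failed": 1,
    "total_features": 125000,
    "total_size_bytes": 45000000
  }
}
""")
result = summarize_report(report)
print(result)
# prints {'all_succeeded': False, 'success_rate': 0.8, 'avg_feature_bytes': 360.0}
```

In practice the report would be loaded from `.portolan/extraction-report.json` in the output directory, and a false `all_succeeded` is a reasonable trigger for rerunning with `--resume`.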
Tips
Finding ArcGIS Services
ArcGIS services are typically found at URLs like:
- https://services.arcgis.com/{org_id}/ArcGIS/rest/services/{service_name}/FeatureServer
- https://gis.example.com/arcgis/rest/services/{folder}/{service_name}/ImageServer
You can browse available services at the root:
https://services.arcgis.com/{org_id}/ArcGIS/rest/services
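Appending `?f=json` to the services root typically returns a machine-readable listing with `folders` and `services` arrays. The sketch below turns such a listing into candidate URLs. It assumes each service entry has `name` and `type` keys and that names inside folders already carry the folder prefix. The sample listing is invented for illustration and `service_urls` is a hypothetical helper.

```python
# Service types that the extraction command handles, per the table earlier.
SUPPORTED_TYPES = {"FeatureServer", "MapServer", "ImageServer"}


def service_urls(root_url, listing):
    """Build service URLs from a services-directory listing (parsed JSON)."""
    root = root_url.rstrip("/")
    return [
        f"{root}/{service['name']}/{service['type']}"
        for service in listing.get("services", [])
        if service["type"] in SUPPORTED_TYPES
    ]


# Invented sample of what a `?f=json` listing might contain.
listing = {
    "folders": ["Imagery", "Utilities"],
    "services": [
        {"name": "Imagery/Toronto", "type": "ImageServer"},
        {"name": "Utilities/Geometry", "type": "GeometryServer"},
        {"name": "Census_2020", "type": "FeatureServer"},
    ],
}
for url in service_urls("https://gis.example.com/arcgis/rest/services/", listing):
    print(url)
```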
Large Services
For services with many layers or large datasets:
- Use --dry-run first to see what will be extracted
- Filter with --layers (vectors) or --bbox (rasters)
- Use --resume if extraction is interrupted
- Increase parallelism with --workers (vectors) or --max-concurrent (rasters)
Error Handling
If a layer/tile fails to extract:
- The extraction continues with remaining items
- Failed items are recorded in the report with error details
- Use --resume to retry only failed items
Requirements
- geoparquet-io — Vector extraction (automatically installed)
- rio-cogeo — COG conversion (automatically installed)
- Network access to the ArcGIS service