extract_layer silently returns empty DataFrame when the layer URL responds with an Esri error JSON #3

Description

@ctenold

Environment:

ezesri 0.3.3
Python 3.12
geopandas / pandas current

Summary

When extract_layer(url) is called with a layer URL that responds successfully (HTTP 200) but with an Esri-style error JSON ({"error": {"code": ..., "message": ...}}), the function returns an empty pd.DataFrame with no warning. The caller has no way to distinguish "layer is a non-geometry table with 0 rows" from "the metadata endpoint returned an error" without re-fetching the URL themselves.
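
Until the library reports these errors itself, re-fetching the URL could look roughly like the sketch below. The names fetch_layer_json and esri_error_of are hypothetical helpers, not part of ezesri, and the sketch assumes the layer URL answers ?f=json with a JSON body. It uses the standard library's urlopen, which verifies TLS certificates by default, so endpoints with certificate problems may need different handling.

```python
import json
from urllib.request import urlopen


def fetch_layer_json(url, timeout=30):
    """Fetch the layer's metadata JSON (hypothetical helper, not ezesri API)."""
    with urlopen(f"{url}?f=json", timeout=timeout) as resp:
        return json.load(resp)


def esri_error_of(url, fetch=fetch_layer_json):
    """Return the Esri error dict for this layer URL, or None if there is none.

    `fetch` is injectable so the check can be exercised without network access.
    """
    payload = fetch(url)
    if isinstance(payload, dict) and isinstance(payload.get("error"), dict):
        return payload["error"]
    return None
```

A caller could run esri_error_of(url) before extract_layer(url) and skip or log the layer when it returns a dict, at the cost of one extra request per layer.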

Reproduction

These three Wisconsin county GIS endpoints reproduce it at the time of writing (the "Service not started" error is intermittent). Each returns HTTP 200 with an error body when its underlying service is down or its layer index is wrong:

import warnings

import ezesri
from urllib3.exceptions import InsecureRequestWarning

# Silence urllib3's warning about unverified HTTPS requests.
warnings.filterwarnings("ignore", category=InsecureRequestWarning)

# These are the kinds of responses the endpoints return for ?f=json:
#   {"error": {"code": 404, "message": "Layer not found", ...}}
#   {"error": {"code": 500, "message": "Service ... not started", ...}}
urls = [
    "https://arcgisweb.sccwi.gov/arcgis/rest/services/StCroixLFparcel/Parcels_wPA_LF/MapServer/3",       # 404 Layer not found (changed their layer id)
    "https://public1.co.waupaca.wi.us/arcgis/rest/services/Public/LandRecords/MapServer/12",            # 500 Service not started
    "https://gis.woodcountywi.gov/gis/rest/services/LandRecordsViewer/FoundationalElements/MapServer/20", # 500 Service not started
]

for url in urls:
    result = ezesri.extract_layer(url)
    print(type(result).__name__, len(result))
    # → DataFrame 0

Expected

One of:

Raise a clear exception, e.g. ezesri.LayerMetadataError(error_code, error_message).
Or return an empty GeoDataFrame consistently (currently you get a plain DataFrame, which makes downstream isinstance(result, GeoDataFrame) checks misleading).

Actual

Returns an empty pandas.DataFrame. Downstream code that expects a GeoDataFrame for a feature layer cannot distinguish this from a layer that legitimately has no geometry, leading to silent data loss in pipelines.

Root cause

In ezesri/extract.py, extract_layer does:

metadata = get_metadata(url)
if not metadata:
    return gpd.GeoDataFrame()

has_geometry = metadata.get("geometryType") is not None

When the server responds with {"error": {...}}, metadata is a non-empty dict (truthy), so the if not metadata guard doesn't fire. geometryType is absent, so has_geometry becomes False, and the function continues down the table (non-geometry) code path, eventually returning an empty pd.DataFrame.
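
The two checks can be replayed on a plain dict without calling ezesri at all. This is only an illustration: the payload below is made up to match the error shape quoted in this issue, and the variable names are mine.

```python
# Hypothetical payload, shaped like the error bodies quoted in this issue.
error_metadata = {"error": {"code": 500, "message": "Service not started"}}

# The existing guard only catches empty or None metadata.
guard_fires = not error_metadata

# An error body has no "geometryType" key, so the layer looks like a table.
has_geometry = error_metadata.get("geometryType") is not None

print(guard_fires, has_geometry)  # → False False
```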

Suggested fix

Detect the error key right after the empty-metadata check:

metadata = get_metadata(url)
if not metadata:
    return gpd.GeoDataFrame()

if isinstance(metadata, dict) and "error" in metadata:
    err = metadata["error"]
    raise ValueError(
        "Esri layer metadata request failed: "
        f"code={err.get('code')} message={err.get('message')}"
    )

A dedicated exception class (e.g. EsriLayerError) would be even better so callers can catch it specifically and decide whether to fall back to another source.
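
A minimal sketch of what such an exception could look like. EsriLayerError and raise_for_esri_error do not exist in ezesri today; the names, the attributes, and the optional url argument are my assumptions about a reasonable shape.

```python
class EsriLayerError(Exception):
    """Hypothetical exception for an Esri error JSON returned with HTTP 200."""

    def __init__(self, code, message, url=None):
        self.code = code
        self.message = message
        self.url = url
        super().__init__(
            f"Esri layer metadata request failed: code={code} message={message}"
        )


def raise_for_esri_error(metadata, url=None):
    """Raise EsriLayerError if metadata is an Esri error body, else return it."""
    if isinstance(metadata, dict) and isinstance(metadata.get("error"), dict):
        err = metadata["error"]
        raise EsriLayerError(err.get("code"), err.get("message"), url)
    return metadata


# Example: a caller catches the error and decides on a fallback.
try:
    raise_for_esri_error({"error": {"code": 500, "message": "Service not started"}})
except EsriLayerError as exc:
    print(exc.code, exc.message)
```

Keeping code and message as attributes would let a pipeline treat a 404 (layer id changed) differently from a 500 (service down, possibly worth a retry).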

Why this matters

I have a pipeline downloading from multiple county ArcGIS endpoints. The silent empty-DataFrame return caused corrupted files to be saved and propagated through downstream normalization steps, masking the real problem (county-side service outages and URL changes) for some time before we caught it. Raising would have made the upstream failure immediately diagnosable.

P.S. I appreciate this project, thanks for open sourcing it!
