refactor(api): migrate to httpx/Hishel, simplify modules, and extend client config #7
Merged
- Add request_retries, retry_backoff_factor, max_retry_delay, retry_status_codes
- Refactor __init__ with explicit env var handling
- Add quota_cache_file and use_global_cache support
- Add tests for explicit override and retry env vars
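The interplay between explicit overrides, environment variables, and the retry settings can be sketched as follows. This is a minimal illustration, not the project's code: the helper names, the environment variable name `BDL_REQUEST_RETRIES`, and the exponential formula are all assumptions made for the example.

```python
import os
from typing import Mapping, Optional


def resolve_int_setting(
    explicit: Optional[int],
    env_name: str,
    default: int,
    environ: Optional[Mapping[str, str]] = None,
) -> int:
    """Pick a value: explicit argument first, then env var, then default."""
    env = os.environ if environ is None else environ
    if explicit is not None:
        return explicit
    raw = env.get(env_name)
    if raw is None or raw.strip() == "":
        return default
    return int(raw)


def retry_delay(attempt: int, backoff_factor: float, max_delay: float) -> float:
    """Delay in seconds before retry number `attempt` (1-based).

    Assumed formula: exponential growth, capped at max_delay.
    """
    return min(backoff_factor * (2 ** (attempt - 1)), max_delay)


# Hypothetical env var name, used only for this illustration.
fake_env = {"BDL_REQUEST_RETRIES": "7"}
retries_explicit = resolve_int_setting(5, "BDL_REQUEST_RETRIES", 3, fake_env)
retries_from_env = resolve_int_setting(None, "BDL_REQUEST_RETRIES", 3, fake_env)
retries_default = resolve_int_setting(None, "BDL_REQUEST_RETRIES", 3, {})
```

Passing the environment as a mapping keeps the precedence rule easy to test without mutating `os.environ`.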
- BDLHTTPError for HTTP failures with status_code, response_body, url
- BDLResponseError for invalid/unexpected payloads
- Used by BaseAPIClient for structured error handling
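A rough sketch of what such an exception pair might look like is shown below. The attribute names status_code, response_body, and url come from the commit message; the shared base class `BDLError`, the constructor signature, and the message format are assumptions for illustration only.

```python
from typing import Optional


class BDLError(Exception):
    """Hypothetical common base class (not confirmed by the commit)."""


class BDLHTTPError(BDLError):
    """HTTP failure carrying enough context for structured handling."""

    def __init__(self, status_code: int, url: str, response_body: Optional[str] = None):
        super().__init__(f"HTTP {status_code} for {url}")
        self.status_code = status_code
        self.url = url
        self.response_body = response_body


class BDLResponseError(BDLError):
    """The request succeeded but the payload was invalid or unexpected."""


# Callers can branch on the status code instead of parsing a message string.
try:
    raise BDLHTTPError(404, "https://example.invalid/data", "not found")
except BDLHTTPError as exc:
    caught_status = exc.status_code
    caught_message = str(exc)
```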
- Use per-instance rate limiters instead of global shared limiters
- Extract _build_proxy_url and _build_default_headers helpers
- Add httpx.AsyncClient for async requests with proxy support
- Add close() and aclose() for resource cleanup
- Support quota_cache_file and use_global_cache via PersistentQuotaCache
- Extract _list_params, _search_params static helpers where applicable
- Trim docstrings for brevity
- Reduce duplication across aggregates, attributes, levels, measures, subjects, units, variables, years, version
- Consolidate parameter building and request logic
- Reduce code duplication in sync/async paths
- Align with other API module patterns
- Add enrichment.py with with_enrichment decorator and EnrichmentSpec
- Support levels, measures, subjects, units, attributes, aggregates lookups
- Add _enrichment_cache to BaseAccess for deduplication
- Refactor BaseAccess._get_calling_function_name (use sys._getframe)
- Apply enrichment to aggregates, units, variables, data access methods
- Add variable_ids param support and get_data_by_variable_with_metadata
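The general idea of a decorator that joins lookup names onto results and caches each lookup per instance can be sketched as below. This is a simplified stand-in, not the project's implementation: it works on plain lists of dicts rather than DataFrames, and `SpecSketch`, `with_enrichment_sketch`, `AccessSketch`, and the `enrich` keyword are hypothetical names. Only `_enrichment_cache` is taken from the commit message.

```python
from dataclasses import dataclass
from functools import wraps
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class SpecSketch:
    id_field: str                          # key holding the foreign id
    name_field: str                        # key to add with the resolved name
    fetch: Callable[[Any], Dict[int, str]]  # returns {id: name} for the lookup


def with_enrichment_sketch(**specs: SpecSketch):
    """Wrap an access method so callers can request extra lookup fields."""

    def decorator(func):
        @wraps(func)
        def wrapper(self, *args, enrich=(), **kwargs):
            rows = func(self, *args, **kwargs)
            for key in enrich:
                spec = specs[key]
                # Fetch each lookup table at most once per instance.
                if key not in self._enrichment_cache:
                    self._enrichment_cache[key] = spec.fetch(self)
                lookup = self._enrichment_cache[key]
                for row in rows:
                    row[spec.name_field] = lookup.get(row.get(spec.id_field))
            return rows

        return wrapper

    return decorator


class AccessSketch:
    def __init__(self):
        self._enrichment_cache = {}
        self.lookup_calls = 0

    def _fetch_units(self):
        self.lookup_calls += 1  # count remote lookups for the demo
        return {1: "person", 2: "PLN"}

    @with_enrichment_sketch(units=SpecSketch("unit_id", "unit_name", _fetch_units))
    def list_variables(self):
        return [{"id": 10, "unit_id": 1}, {"id": 11, "unit_id": 2}]


access = AccessSketch()
first = access.list_variables(enrich=("units",))
second = access.list_variables(enrich=("units",))
plain = access.list_variables()
```

The second enriched call reuses the cached lookup, which is the deduplication the cache is meant to provide.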
- Replace SimpleNamespace with typed APINamespace dataclass
- Add close() and aclose() for resource cleanup
- Add __enter__/__exit__ and __aenter__/__aexit__ for sync/async context managers
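The context-manager protocol described above follows a standard Python pattern. The sketch below uses a hypothetical `ClientSketch` class with a `closed` flag in place of real HTTP resources; it illustrates the protocol only and does not reproduce the actual client.

```python
import asyncio


class ClientSketch:
    """Hypothetical stand-in for a client owning sync and async resources."""

    def __init__(self):
        self.closed = False

    def close(self):
        # A real client would close its underlying HTTP client here.
        self.closed = True

    async def aclose(self):
        # Async counterpart, for resources that must be awaited on shutdown.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()  # returns None, so exceptions propagate to the caller

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await self.aclose()


def demo_sync() -> bool:
    with ClientSketch() as client:
        assert not client.closed
    return client.closed


async def demo_async() -> bool:
    async with ClientSketch() as client:
        assert not client.closed
    return client.closed
```

Cleanup runs even if the body of the `with` block raises, which is the main reason to prefer this over calling close() by hand.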
- Move matplotlib, numpy, seaborn to optional [viz] extra
- Add platformdirs for quota cache paths
- Remove dataclasses (stdlib in 3.11+)
- Add installation, quick start, configuration, API layers sections
- Document optional [viz] extra and env vars
- Update examples.ipynb for enrichment and context manager usage
- Add test_enrichment.py for with_enrichment decorator
- Add test_enrichment_integration.py for access layer
- Add test_enrichment_e2e.py for live API
- Add enrich_levels tests to aggregates, data, units, variables integration
- Adapt test_api_client for per-instance limiters, close methods
- Adapt test_api_data for simplified data API
- Add test_client tests for context manager and aclose
- Support custom_file to use explicit path instead of default cache dir
- Create parent dirs when custom_file is used
- Used by PersistentQuotaCache for flexible cache location
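The path-resolution behavior described here might look roughly like the sketch below. The function name `resolve_cache_path_sketch` and its signature are hypothetical, and the default directory is passed in as a plain argument rather than taken from platformdirs. Creating the default directory as well is an assumption, since the commit only mentions creating parents for custom_file.

```python
import tempfile
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def resolve_cache_path_sketch(
    default_dir: PathLike,
    filename: str,
    custom_file: Optional[PathLike] = None,
) -> Path:
    """Return the cache file path, preferring an explicit custom_file."""
    if custom_file is not None:
        path = Path(custom_file).expanduser()
    else:
        path = Path(default_dir) / filename
    # Make sure the containing directory exists before the cache is written.
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


# Demo inside a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    custom = resolve_cache_path_sketch(
        tmp, "quota.json", custom_file=Path(tmp) / "a" / "b" / "q.json"
    )
    default = resolve_cache_path_sketch(Path(tmp) / "cache", "quota.json")
    custom_parent_created = custom.parent.is_dir()
    default_parent_created = default.parent.is_dir()
    custom_name, default_name = custom.name, default.name
```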
- cache_backend: 'memory' | 'file' | None (replaces use_cache boolean)
- raise_on_rate_limit: raise RateLimitError vs wait when quota exhausted
- http_429_max_retries, http_429_max_delay: separate 429 retry policy
- Add QuotaMap type alias for quota configuration
…tches
- Use pybdl.utils.cache.resolve_cache_file_path for cache location
- Add remove_last_if_matches for atomic slot refund (used by rate limiter release)
…refund
- acquire() now returns float | None (monotonic time) for later release
- Add release(recorded_at) to refund a slot (HTTP errors, retries)
- Add raise_on_limit param passed from config
- Empty quotas: acquire returns None immediately
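The acquire/release contract can be illustrated with a small sliding-window limiter. This is a sketch under stated assumptions, not the project's limiter: the class name `LimiterSketch`, the quota format (window length in seconds mapped to a maximum call count of at least 1), and the injectable `clock` and `sleep` parameters are all invented for the example. The `RateLimitError` name comes from the commit messages.

```python
import threading
import time
from collections import deque
from typing import Callable, Deque, Dict, Optional


class RateLimitError(Exception):
    """Raised instead of waiting when raise_on_limit is set."""


class LimiterSketch:
    def __init__(
        self,
        quotas: Dict[float, int],
        raise_on_limit: bool = False,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ):
        self._quotas = dict(quotas)  # {window_seconds: max_calls}, max_calls >= 1
        self._raise = raise_on_limit
        self._clock = clock
        self._sleep = sleep
        self._calls: Deque[float] = deque()
        self._lock = threading.Lock()

    def _wait_needed(self, now: float) -> float:
        wait = 0.0
        for period, limit in self._quotas.items():
            recent = [t for t in self._calls if now - t < period]
            if len(recent) >= limit:
                # Wait until enough old calls leave this window.
                wait = max(wait, recent[len(recent) - limit] + period - now)
        return wait

    def acquire(self) -> Optional[float]:
        if not self._quotas:
            return None  # nothing to enforce
        longest = max(self._quotas)
        while True:
            with self._lock:
                now = self._clock()
                while self._calls and now - self._calls[0] >= longest:
                    self._calls.popleft()  # drop expired entries
                wait = self._wait_needed(now)
                if wait <= 0:
                    self._calls.append(now)
                    return now
            if self._raise:
                raise RateLimitError(f"quota exhausted, retry in {wait:.2f}s")
            self._sleep(wait)

    def release(self, recorded_at: Optional[float]) -> None:
        """Refund the slot recorded by acquire(), e.g. after an HTTP error."""
        if recorded_at is None:
            return
        with self._lock:
            try:
                self._calls.remove(recorded_at)
            except ValueError:
                pass  # slot already expired or refunded
```

Returning the recorded timestamp from acquire() gives the caller a handle to refund exactly the slot it consumed, so a failed request does not count against the quota.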
…ith respx
- hishel[async] for HTTP caching (memory/file sqlite backends)
- respx for httpx-compatible request mocking in tests
- Remove requests, requests-cache, responses, types-requests
- Add real_rate_limiting pytest marker
…mit release
- Replace requests/requests-cache with hishel SyncCacheClient/AsyncCacheClient
- Support cache_backend: memory (sqlite :memory:), file (sqlite db), or disabled
- Pass raise_on_limit to rate limiters from config
- Call rate limiter release() on HTTP errors for quota refund
- Parse Retry-After header (seconds or HTTP-date) for 429/5xx retries
- Use httpx exclusively (remove Response | httpx.Response unions)
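Parsing a Retry-After value in either of its two allowed forms (delay in seconds or HTTP-date) can be done with the standard library alone. The sketch below is one possible approach; the function name `parse_retry_after` and the choice to return None for unparseable values are assumptions, not the project's behavior.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Return the delay in seconds requested by a Retry-After header."""
    if value is None:
        return None
    text = value.strip()
    if text.isdigit():
        return float(text)  # delay-seconds form, e.g. "120"
    try:
        when = parsedate_to_datetime(text)  # HTTP-date form
    except (TypeError, ValueError):
        return None  # unparseable: let the caller fall back to its own backoff
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    current = now if now is not None else datetime.now(timezone.utc)
    # A date in the past means the client may retry immediately.
    return max(0.0, (when - current).total_seconds())
```

A caller would typically cap the result with its own maximum delay so that a very large server-supplied value cannot stall the client.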
- Add _DataJsonPayload, _DataWithMetadata, _DataCollectionResult
- Use cast for get_*_with_metadata return type narrowing
- Simplify variable_ids resolution (inline ternary)
- Simplify _normalize_variable_ids ternary in data.py
- Add return type annotations to with_enrichment decorator
- Reorder imports in enrichment.py
- Add CHANGELOG.md (Keep a Changelog format)
- Add docs/changelog.md for Sphinx changelog page
- Expand access_layer, config, main_client, rate_limiting
- Update index and appendix
- Add changelog to conf.py
- Update examples.ipynb for new features
- README: document cache_backend, rate limit env vars
- Replace responses with respx in conftest (paginated_mock uses MockRouter)
- Add real_rate_limiting marker skip in api conftest
- Add test_api_client_cache.py for hishel cache backend
- Add test_rate_limiter_lifecycle.py for acquire/release
- Update API tests for hishel/httpx
- Update config tests for cache_backend, raise_on_rate_limit, HTTP 429
- Update client, enrichment, data access tests
- Add samples_raw_subjects.json entries
…rkflow
- Simplified the release workflow by removing the push and pull_request triggers, retaining only the workflow_dispatch event.
- Adjusted the example formatting in the BaseAccess class to use a block-style example for clarity.
- Bump project version in pyproject.toml from 0.0.1 to 0.1.0.
- Include .worktrees directory in .gitignore to prevent tracking of worktree files.
Test Results: 538 tests, 519 ✅, 7s ⏱️. For more details on these failures, see this check. Results for commit 78259d7.
Test Results: 538 tests, 527 ✅, 7s ⏱️. Results for commit 654b0df. ♻️ This comment has been updated with latest results.
- Streamlined the type conversion process for DataFrame columns in the BaseAccess class.
- Enhanced checks for object and string dtypes before attempting conversion to numeric or boolean types.
- Improved handling of non-null values to ensure accurate type casting.
- Refactored the numeric conversion checks for DataFrame columns to improve readability and efficiency.
- Consolidated conditional statements for dtype checks and non-null value handling.
- Ensured accurate type casting for integer-like values in the DataFrame.
Summary
This branch modernizes the pyBDL API layer and public client. HTTP traffic moves from `requests` and `requests-cache` to httpx with optional Hishel caching; configuration gains structured retry, cache-backend, and rate-limit options; and API modules are flattened behind shared helpers with clearer return typing. The top-level client gains context-manager usage and typed namespaces; the access layer adds a DataFrame enrichment decorator and tighter typing on data paths. Documentation (README, Sphinx, examples notebook), the root changelog, and tests are updated throughout, including replacing `responses` with respx and adding cache lifecycle and enrichment coverage. Release automation and versioning were adjusted (workflow triggers, `0.1.0`, `.worktrees` in `.gitignore`).
Purpose & Context
The goal is a maintainable HTTP stack that aligns sync and async usage, gives callers explicit control over caching and backoff (including HTTP 429), and keeps rate limiting accurate when work is cancelled or retried. That last point is why acquire now returns a timestamp and a release path exists for quota refund. API modules had duplicated parameter wiring; consolidating it reduces drift and makes endpoint modules easier to review. The release workflow change removes the `push` and `pull_request` triggers so releases follow the intended manual or tag-driven process.
Changes Made
- `BDLConfig` supports cache backend selection, retry settings, and rate-limit-related options; `BaseAPIClient` uses per-instance limiters and raises `BDLHTTPError` / `BDLResponseError` where appropriate.
- `pybdl/api/` (notably `data`, `subjects`, `units`, and shared patterns across aggregates, attributes, levels, measures, variables, years, version) with type aliases and casts for data API returns.
- `resolve_cache_file_path` and `remove_last_if_matches`; shared `resolve_cache_file_path` in `pybdl/utils/cache.py`.
- `enrichment` module and decorator; updates to `base`, `data`, aggregates, units, variables.
- `platformdirs`, optional `viz` extra; lockfile refresh; dev dependency swap to respx.
- `CHANGELOG.md`; `.gitignore` adds `.worktrees`.
Testing
This repository typically runs `make all` (Ruff format/check, Bandit, Mypy, pytest with coverage) and `make docs` for a strict Sphinx HTML build. This description was generated from git history only; the full suite was not re-run in that session. Reviewers should rely on CI and, locally, `uv run pytest` or `make test` plus `make docs` if validating before merge.
Review Focus Areas
`requests`-specific types or prior config field names.
Dependencies & Side Effects
`viz`.
Deployment Notes
`.github/workflows/release.yml` behavior after removing the `push` and `pull_request` triggers matches the team's release policy.