feat: add async extract jobs support to SDK #93

Open
erdanzhang wants to merge 2 commits into main from zzhang/async_extract

@erdanzhang

Summary

This PR adds async extract jobs support to the ade-python SDK, following the same pattern as the existing parse_jobs implementation.

Changes

1. New Extract Jobs Resource (src/landingai_ade/resources/extract_jobs.py)

  • Implemented ExtractJobsResource and AsyncExtractJobsResource classes
  • Added create(), get(), and list() methods for both sync and async versions
  • Proper multipart form data handling for markdown files
  • Support for raw response and streaming response wrappers
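As a rough illustration of what multipart handling for the markdown input could involve, the sketch below normalizes an inline string, raw bytes, or a file path into the (filename, content, content type) tuple that httpx-style clients accept for file fields. The helper name to_markdown_file_part and the default filename are hypothetical and are not taken from extract_jobs.py in this PR.

```python
from pathlib import Path
from typing import Tuple, Union

# (filename, content bytes, content type): the tuple form httpx accepts
# for a multipart file field.
FilePart = Tuple[str, bytes, str]


def to_markdown_file_part(
    markdown: Union[str, bytes, Path],
    filename: str = "document.md",  # assumed default, not from the PR
) -> FilePart:
    """Hypothetical helper: turn markdown input into a multipart file part."""
    if isinstance(markdown, Path):
        # A path is read from disk and keeps its own file name.
        return (markdown.name, markdown.read_bytes(), "text/markdown")
    if isinstance(markdown, str):
        # Inline markdown text is encoded as UTF-8.
        return (filename, markdown.encode("utf-8"), "text/markdown")
    # Raw bytes are passed through unchanged.
    return (filename, bytes(markdown), "text/markdown")
```

Treating every input shape as a file part keeps the request body uniform, so the server sees one markdown field regardless of how the caller supplied the content.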

2. Client Integration (src/landingai_ade/_client.py)

  • Added extract_jobs property to both LandingAIADE and AsyncLandingAIADE classes
  • Integrated extract_jobs into all wrapper classes (WithRawResponse, WithStreamingResponse)
  • Added necessary imports for type checking

3. Type Definitions

  • extract_job_create_params.py - Parameters for creating extract jobs
  • extract_job_create_response.py - Response from job creation
  • extract_job_get_response.py - Response from getting job status
  • extract_job_list_params.py - Parameters for listing jobs
  • extract_job_list_response.py - Response from listing jobs

4. Resource Exports (src/landingai_ade/resources/__init__.py)

  • Exported all ExtractJobs classes for public API access

Features Supported

  • Async job creation for extract operations
  • Markdown input via file upload or string content
  • URL-based markdown input (markdown_url)
  • Zero Data Retention (ZDR) with output_save_url
  • Job status polling and result retrieval
  • Job listing with filtering options
  • Full async/await support in the async client
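The job listing feature above returns paginated results. A minimal sketch of walking every page is shown below, assuming the list response exposes a boolean "more results" flag and that pages are requested by number. The Page stand-in, its has_more and jobs field names, and the fetch_page callable are all hypothetical; the real field and parameter names are defined in extract_job_list_response.py and extract_job_list_params.py.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class Page:
    """Stand-in for one list response; field names are assumptions."""

    jobs: List[str] = field(default_factory=list)
    has_more: bool = False


def iter_jobs(fetch_page: Callable[[int], Page]) -> Iterator[str]:
    """Yield every job across pages until the server reports no more results."""
    page_number = 0
    while True:
        page = fetch_page(page_number)
        yield from page.jobs
        if not page.has_more:
            break
        page_number += 1
```

With the SDK, fetch_page would presumably wrap client.extract_jobs.list(...) with whatever paging parameters the endpoint accepts.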

Usage Example

# Sync usage
# schema and markdown_content are defined by the caller.
import json

from landingai_ade import LandingAIADE

client = LandingAIADE(api_key="...")
response = client.extract_jobs.create(
    schema=json.dumps(schema),
    markdown=markdown_content,
    model="extract-latest",
)
status = client.extract_jobs.get(response.job_id)

# Async usage
import asyncio

from landingai_ade import AsyncLandingAIADE


async def main() -> None:
    async_client = AsyncLandingAIADE(api_key="...")
    response = await async_client.extract_jobs.create(
        schema=json.dumps(schema),
        markdown=markdown_content,
        model="extract-latest",
    )
    status = await async_client.extract_jobs.get(response.job_id)


asyncio.run(main())
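The example above fetches the job status once. In practice a caller would likely poll until the job finishes. The sketch below is a generic polling helper, not part of the SDK: poll_until, its parameters, and the idea of a terminal status are assumptions, since the fields of the get() response are not shown in this PR description.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(
    fetch: Callable[[], T],
    is_done: Callable[[T], bool],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() repeatedly until is_done(result) is true or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)


# Hypothetical usage with the SDK (the status field name is an assumption):
# job = poll_until(
#     lambda: client.extract_jobs.get(response.job_id),
#     lambda job: job.status in {"completed", "failed"},
# )
```

Passing fetch and is_done as callables keeps the helper independent of the response model, so the same loop would work for parse jobs as well.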

Testing

A test script test_async_extract.py is included to verify both sync and async functionality.

Related PRs


This PR completes the SDK support for async extract operations, providing feature parity with the async parse functionality.

- Add ExtractJobsResource and AsyncExtractJobsResource classes
- Implement create(), get(), and list() methods for extract jobs
- Add proper multipart form data handling for markdown files
- Export all extract jobs resources in __init__.py
- Integrate extract_jobs into main client classes
- Add support for raw response and streaming response wrappers
- Include test script to verify functionality

Following the same pattern as parse_jobs implementation.
Copilot AI review requested due to automatic review settings May 23, 2026 08:32
Test script was for local testing only and should not be included in the PR

Copilot AI left a comment

Contributor

Pull request overview

Adds a new extract_jobs resource (sync + async) to the landingai_ade SDK, intended to mirror the existing parse_jobs job-style API for asynchronous extract operations.

Changes:

  • Introduces ExtractJobsResource / AsyncExtractJobsResource with create(), get(), and list() plus raw/streaming wrappers.
  • Adds new extract-job request/response type models.
  • Wires extract_jobs into the main client and resource exports; adds a standalone local test script.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| test_async_extract.py | Standalone script exercising sync/async extract jobs against a local server. |
| src/landingai_ade/types/extract_job_create_params.py | TypedDict params for creating extract jobs (schema + markdown inputs + options). |
| src/landingai_ade/types/extract_job_create_response.py | Model for create-job response (job_id, message). |
| src/landingai_ade/types/extract_job_get_response.py | Model for job status/result payload returned by get(). |
| src/landingai_ade/types/extract_job_list_params.py | Query params for listing extract jobs. |
| src/landingai_ade/types/extract_job_list_response.py | Model for list response (pagination + collection field). |
| src/landingai_ade/resources/extract_jobs.py | New sync/async resource implementation + raw/streaming wrapper classes. |
| src/landingai_ade/resources/__init__.py | Exports extract-jobs resources and wrapper variants. |
| src/landingai_ade/_client.py | Adds extract_jobs cached properties across sync/async clients and wrapper clients. |

"""Whether there are more results available."""

list: List[ExtractJobGetResponse]
"""List of extract jobs."""
No newline at end of file

id: str
"""The unique identifier for the extract job."""

extra_body: Add additional JSON properties to the request

timeout: Override the client-level default timeout for this request, in seconds
"""
extra_body: Add additional JSON properties to the request

timeout: Override the client-level default timeout for this request, in seconds
"""
Comment on lines +7 to +15
__all__ = ["ExtractJobCreateResponse"]


class ExtractJobCreateResponse(BaseModel):
    job_id: str
    """The unique identifier for the extract job."""

    message: Optional[str] = None
    """Optional message about the job creation."""
No newline at end of file
Comment on lines +50 to +66
def create(
    self,
    *,
    schema: str,
    markdown: Optional[FileTypes | str] | Omit = omit,
    markdown_url: Optional[str] | Omit = omit,
    model: Optional[str] | Omit = omit,
    output_save_url: Optional[str] | Omit = omit,
    strict: bool | Omit = omit,
    # Use the following arguments if you need to pass additional parameters to the API that aren't available via kwargs.
    # The extra values given here take precedence over values defined on the client or passed to this method.
    extra_headers: Headers | None = None,
    extra_query: Query | None = None,
    extra_body: Body | None = None,
    timeout: float | httpx.Timeout | None | NotGiven = not_given,
) -> ExtractJobCreateResponse:
    """