
Proposal: Add pipeline mode support (PostgreSQL 14+) #181

@jjn1056

Pipeline mode has been on the TODO list for a while (Support pipeline mode: https://www.postgresql.org/docs/14/libpq-pipeline-mode.html). We have a working implementation and would like to get feedback on the API design before pushing further.

Motivation

Pipeline mode (libpq, PG14+) batches multiple queries into fewer network round trips. Instead of the normal send-wait-receive cycle per query, the client queues N queries and collects N results after a sync point. This can dramatically reduce latency for workloads that issue many independent queries — batch inserts, bulk prepared statement execution, etc.

Every major PostgreSQL driver has added or is adding pipeline support: psycopg3 (conn.pipeline()), Ruby pg (enter_pipeline_mode), pgx (SendBatch), tokio-postgres (implicit pipelining).

Proposed API

Low-level wrappers matching the Ruby pg gem's approach — thin 1:1 mappings of the libpq pipeline functions. Users manage the lifecycle explicitly.

Method                   Wraps                  Returns
pg_enter_pipeline_mode   PQenterPipelineMode    1 success, 0 failure
pg_exit_pipeline_mode    PQexitPipelineMode     1 success, 0 failure
pg_pipeline_status       PQpipelineStatus       0 = off, 1 = on, 2 = aborted
pg_pipeline_sync         PQpipelineSync         1 success, 0 failure
pg_getresult             PQgetResult            hashref or undef
pg_send_query_params     PQsendQueryParams      1 success, 0 failure
pg_send_prepare          PQsendPrepare          1 success, 0 failure
pg_send_query_prepared   PQsendQueryPrepared    1 success, 0 failure
pg_send_flush_request    PQsendFlushRequest     1 success, 0 failure
pg_flush                 PQflush                0 = flushed, 1 = pending, -1 = error

pg_getresult returns a hashref:

{
    status    => 2,              # ExecStatusType integer
    error     => undef,          # error message or undef
    ntuples   => 3,              # row count
    nfields   => 2,              # column count
    cmdtuples => '',             # affected rows (DML)
    rows      => [[...], ...],   # only present for TUPLES_OK
}
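The status integers are libpq's ExecStatusType enum values. Callers could map them to readable names with a small lookup table; a minimal sketch (the helper name and the idea of exposing such a map are illustrative, not part of the proposal):

```perl
use strict;
use warnings;

# Illustrative helper: map ExecStatusType integers to readable names,
# following libpq's enum order as of PG 14 (which added the two
# pipeline-related values at the end).
my @EXEC_STATUS = qw(
    PGRES_EMPTY_QUERY PGRES_COMMAND_OK PGRES_TUPLES_OK PGRES_COPY_OUT
    PGRES_COPY_IN PGRES_BAD_RESPONSE PGRES_NONFATAL_ERROR PGRES_FATAL_ERROR
    PGRES_COPY_BOTH PGRES_SINGLE_TUPLE PGRES_PIPELINE_SYNC PGRES_PIPELINE_ABORTED
);

sub exec_status_name {
    my ($status) = @_;
    return $EXEC_STATUS[$status] // "UNKNOWN($status)";
}
```

So a hashref with status => 2 carries SELECT rows (PGRES_TUPLES_OK), and status => 10 marks the PIPELINE_SYNC boundary.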

Usage example

$dbh->pg_enter_pipeline_mode();

$dbh->pg_send_query_params('INSERT INTO t VALUES (?)', [1]);
$dbh->pg_send_query_params('INSERT INTO t VALUES (?)', [2]);
$dbh->pg_send_query_params('SELECT * FROM t WHERE id = ?', [1]);
$dbh->pg_pipeline_sync();

# Results arrive: result -> undef -> result -> undef -> ... -> PIPELINE_SYNC
my $r1 = $dbh->pg_getresult();   # {status => 1}   INSERT (PGRES_COMMAND_OK)
$dbh->pg_getresult();            # undef (end of this query's results)
my $r2 = $dbh->pg_getresult();   # {status => 1}   INSERT
$dbh->pg_getresult();            # undef
my $r3 = $dbh->pg_getresult();   # {status => 2, rows => [...]}   SELECT (PGRES_TUPLES_OK)
$dbh->pg_getresult();            # undef
$dbh->pg_getresult();            # {status => 10}  PIPELINE_SYNC

$dbh->pg_exit_pipeline_mode();
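For callers that just want every result up to the sync point, a small drain loop could be layered on top of the proposed methods. A sketch under stated assumptions (drain_pipeline is hypothetical, and it takes a fetch coderef so the loop logic can be shown independent of a live handle):

```perl
use strict;
use warnings;

# Hypothetical convenience wrapper over the proposed API: fetch results
# until the PIPELINE_SYNC marker (status 10), collecting one hashref per
# queued query and skipping the undef end-of-result separators.
sub drain_pipeline {
    my ($getresult) = @_;    # coderef, e.g. sub { $dbh->pg_getresult() }
    my @results;
    while (1) {
        my $r = $getresult->();
        next unless defined $r;        # undef separates per-query results
        last if $r->{status} == 10;    # PGRES_PIPELINE_SYNC ends the batch
        push @results, $r;
    }
    return \@results;
}
```

Usage would look like: my $results = drain_pipeline(sub { $dbh->pg_getresult() });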

Design decisions

Low-level, not automatic. We considered a high-level approach where execute() queues internally and results are distributed back to statement handles (like psycopg3). That would require fundamental changes to DBI's execute flow. The low-level approach is safer, simpler, and can be layered on later.

pg_getresult returns a hashref, not a statement handle. Pipeline results don't map naturally to DBI's one-query-one-handle model. The hashref gives users everything they need (including row data for SELECTs) without coupling to DBI internals.

Both ? and $1/$2 placeholders are supported. The pipeline send methods call libpq's PQsendQueryParams directly, which requires $1/$2 syntax. However, since DBI users (and especially DBIx::Class/SQL::Abstract) generate SQL with ? placeholders, the implementation includes a C-level placeholder converter (pg_convert_placeholders in dbdimp.c) that transforms ? to $1, $2, etc. while properly skipping single-quoted strings, double-quoted identifiers, dollar-quoted strings, and SQL comments — following the same skip patterns as the existing pg_st_split_statement parser. When the SQL already contains $N placeholders, conversion is skipped entirely.
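The actual converter is C code in dbdimp.c; for illustration only, the core idea can be sketched in Perl. This simplified version skips single-quoted strings, double-quoted identifiers, and -- line comments, and is a no-op when $N placeholders are already present; dollar-quoting and /* */ block comments, which the real converter also handles, are omitted here for brevity:

```perl
use strict;
use warnings;

# Simplified sketch of ? -> $N placeholder conversion (the real
# pg_convert_placeholders is C code in dbdimp.c and handles more cases).
sub convert_placeholders {
    my ($sql) = @_;
    return $sql if $sql =~ /\$\d/;    # already uses $N placeholders: no-op
    my ($out, $n, $i) = ('', 0, 0);
    my $len = length $sql;
    while ($i < $len) {
        my $c = substr $sql, $i, 1;
        if ($c eq q{'} or $c eq q{"}) {             # quoted region: copy verbatim
            my $q = $c;
            $out .= $c; $i++;
            while ($i < $len) {
                my $d = substr $sql, $i, 1;
                $out .= $d; $i++;
                last if $d eq $q;
            }
        }
        elsif (substr($sql, $i, 2) eq '--') {       # line comment: copy to EOL
            my $nl = index $sql, "\n", $i;
            $nl = $len if $nl < 0;
            $out .= substr $sql, $i, $nl - $i;
            $i = $nl;
        }
        elsif ($c eq '?') {                         # placeholder: number it
            $n++;
            $out .= "\$$n";
            $i++;
        }
        else {
            $out .= $c; $i++;
        }
    }
    return $out;
}
```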

COPY is explicitly rejected in pipeline mode. This is a PostgreSQL protocol constraint, not a driver choice. We added guards that croak with clear messages on pg_putcopydata, pg_putcopyend, and pg_getcopydata.
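The guard pattern itself is simple. A sketch in Perl of the check the C-level guards perform (the function and the pipeline_status coderef are stand-ins for the driver's internal state check, not the actual implementation):

```perl
use strict;
use warnings;
use Carp;

# Sketch of the guard described above: refuse COPY operations while the
# connection reports pipeline mode as on (1) or aborted (2).
sub guard_no_copy_in_pipeline {
    my ($method, $pipeline_status) = @_;    # $pipeline_status: coderef returning 0/1/2
    croak "$method cannot be used while in pipeline mode"
        if $pipeline_status->() != 0;       # 0 = PQ_PIPELINE_OFF
    return 1;
}
```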

Compile-time gating. The implementation uses #if PGLIBVERSION >= 140000. Methods croak at runtime if compiled against older libpq. Tests skip entirely on PG < 14.

Open questions

  1. Method naming — Are the pg_send_* / pg_pipeline_* / pg_getresult names consistent with the project's conventions? Any preferences?

  2. Version gating approach — Methods croak at runtime on older libpq. Tests skip on PG < 14. The CI matrix already includes PG 15 and 17. Is this the right approach, or would you prefer the methods to not exist at all on older builds?

  3. pg_getresult scope — Currently this is a general-purpose PQgetResult wrapper that works outside pipeline mode too. Should it be pipeline-only, or is general availability useful?

  4. pg_flush overlap: pg_flush (wrapping PQflush) is also needed for async COPY FROM (PR: Add non-blocking async COPY FROM support (pg_putcopydata_async, pg_putcopyend_async, pg_flush) #176). The two PRs add it independently; if both land, we'd deduplicate.

  5. Placeholder conversion approach — The C-level ? to $N converter reuses the skip patterns from pg_st_split_statement but is a separate ~180 line function. An alternative would be to refactor the existing parser to be callable standalone, but that would be significantly more invasive. Is a dedicated converter function acceptable, or would you prefer a different approach?

  6. High-level API future — This PR is deliberately low-level. A future enhancement could add higher-level support (e.g., making execute() pipeline-aware, automatically distributing results to statement handles). Is that something you'd want to see as a follow-up, or is the low-level API sufficient?

Reference implementation

PR linked below has a working implementation with 109 tests covering the full pipeline lifecycle, error handling, placeholder conversion edge cases, and more. It passes the full author test suite. The code is meant as a concrete starting point for discussion — not a final API.
