Skip to content

Releases: TavariAgent/Py-TokenGate

[MINOR-UPDATE] - Formatting

04 Jun 21:36

Choose a tag to compare

Pre-release

This is a formatting and structure altering release.

[UPDATE] - ProcessPoolExecuter & Bug Safety

25 May 23:26

Choose a tag to compare

This release adds an opt in system to use the ProcessPoolExecuter on operations.

[FIXES] - Fixed Broken Import & Type Cleanup

24 May 15:58

Choose a tag to compare

This release fixes the errors I missed in the previous release - one broken import and three broken types.

[FIXES] - Race Conditions & Positional Alignment

19 May 02:48

Choose a tag to compare

This release fixes some race conditions that appeared in testing, hashes no longer miss with low probability.

[UPDATE] - The Optimization Pass

18 May 18:54

Choose a tag to compare

Pre-release

TokenGate — Optimization Pass Release Notes

Summary

This pass targets the token submission hot path — the sequence of operations
between a caller invoking a decorated function and the token landing in a
worker mailbox. No routing contracts, no execution semantics, and no public
API signatures were changed. Also added a runtime guard for tokens inserted
into the coordinator. All improvements are opt-in or transparent.


Benchmark Comparison

Metric Before After Delta
Total wall time (131k tokens) 83.506s 42.337s −49%
Active execution time ~83s 4.673s −94%
Avg latency / token 0.351ms 0.086ms −75%
Peak concurrency ratio 3.15× 49.22× +15.6×
Peak overlap ratio 46.09× 47.39× maintained
Peak single-wave throughput ~3,368 tok/s ~64,754 tok/s ~19×
Sustained throughput ~1,569 tok/s ~28,046 tok/s ~18×

The concurrency jump from 3× to 49× indicates the submission path was the
real ceiling. Workers were idle-waiting on routing overhead. Once that
overhead was removed the thread pool expressed its actual capacity.


Throughput Reporting — How to Read the Numbers

The benchmark reports four throughput figures. Each measures something
distinct and they should not be compared directly without understanding
what each one counts.

Important: All mean calculations are derived from accumulated token
totals and elapsed time totals across waves — they are not arithmetic
averages of independent per-wave unit rates. Reading them as per-wave
averages will make them appear inconsistent with wall time. They are not.

Metric Formula What it measures
Sustained total_tokens / Σ wave_elapsed Volume-weighted reality. Dominated by the largest waves which carry the most tokens. The honest number for full-workload throughput.
Peak max(tokens_i / elapsed_i) Best single-wave rate. Reflects the sweet-spot batch size where parallelism is fullest and scheduling overhead is smallest.
Token-weighted mean Σ(rate_i × tokens_i) / total_tokens Each token votes equally on the average rate. Sits between sustained and peak — useful for understanding where the system spends most of its token-budget.
Arithmetic mean Σ rate_i / N Each wave votes equally. Small fast waves inflate this significantly. Included for completeness but the least representative of real workload behaviour.

Why sustained and wall time appear inconsistent:

Wall time includes 15 × 0.05s = 0.75s of deliberate inter-wave sleep plus
event loop scheduling gaps and print overhead. Active time is the raw sum of
asyncio.gather spans only. The tok/s figure in each wave row and in the
summary is always based on active time. Wall time is reported separately so
the two are never conflated.

Example from the final benchmark run:

Waves 13–15:  114,688 tokens / 4.266s  =  ~26,900 tok/s  ← 87% of all volume
Waves  1–12:   16,380 tokens / 0.407s  =  ~40,200 tok/s  ← 13% of all volume
Combined:      131,068 tokens / 4.673s  =  ~28,046 tok/s  ← sustained (correct)

The sustained rate is pulled toward the large-wave rate because large waves
dominate the token count. This is the correct and expected result.


Changes

1. unhashable_checker.py — O(1) Type Dispatch

Problem: make_hashable walked a 23-layer isinstance chain on every
un-hashable value, including common exact types like dict, list, and
np.ndarray.

Change: Added _DISPATCH: dict populated once at module load by
_build_dispatch(). At call time, a single _DISPATCH.get(type(obj))
lookup short-circuits to the correct handler for registered exact types.
Subclass misses fall through to the existing isinstance chain — no
coverage regression.

Registered at load time: dict, list, set, bytearray, memoryview,
slice, array.array, deque, OrderedDict, defaultdict, Counter,
ChainMap, and optionally np.ndarray, pd.DataFrame, pd.Series,
pd.MultiIndex, pd.Index, pd.Categorical, torch.Tensor, cp.ndarray,
PIL.Image.

Fast path fix: The hash() fast path now catches (TypeError, RuntimeError) instead of TypeError only. Non-scalar torch.Tensor raises
RuntimeError from hash() and previously slipped through to the tensor
handler at layer 6. This closes that gap consistently with is_hashable.


2. unhashable_checker.pyHashPolicy Enum

Problem: All tokens paid the full make_hashable pipeline cost at
submission time regardless of whether their operation used sticky routing
or conductor domain anchoring.

Change: Added HashPolicy enum with four levels:

Value Behaviour
NONE No arg hashing. route_args is always (). Free routing only.
FAST Builtins and stdlib only (layers 1–3). Unknown types get identity routing (type_name, id).
STANDARD Full make_hashable pipeline. Default — unchanged behavior.
FULL Same as STANDARD. Reserved for explicit subclass-fallthrough intent.

Set per-operation via decorator tag:

@task_token_guard(
    operation_type="conductor_lead",
    tags={"weight": "medium",
          "hash_policy": HashPolicy.FAST,
          "digest_policy": DigestPolicy.FAST,
          "external_calls": ["conductor_child"]},
)
def my_function(...): ...

Default remains STANDARD. No existing code changes behavior without opt-in.

Also added fast_make_hashable() — the reduced pipeline used by
HashPolicy.FAST. Covers layers 1–3, falls back to (type_name, id) for
unrecognised types.


3. unhashable_checker.pyDigestPolicy Enum

Problem: Conductor seed generation was hardwired to SHA-256 full 64-char
hex on every lead token submission regardless of volume or lifetime
requirements.

Change: Added DigestPolicy enum with four levels:

Value Algorithm Output Collision space
FULL SHA-256 64 chars 256-bit. Default, unchanged.
SHORT SHA-256 truncated 16 chars 64-bit. Safe at any realistic volume.
FAST BLAKE2s (8-byte) 16 chars 64-bit. Lower compute cost than SHA-256.
MINIMAL SHA-256 truncated 8 chars 32-bit. Low volume only.

Collision semantics for MINIMAL: Collisions merge two logical domain
chains into a shared mailbox cluster. No data corruption occurs — the
least-loaded mechanism compensates. Under saturation, heavy tasks may fall
back from their primary core, shifting load distribution and inertly reducing
variance. The effect is benign at low-to-mid token volume with short-lived
leads: collisions reduce domain variance rather than causing failures, but
they can force heavy tasks off their primary core under sustained call chains,
inertly reducing performance at the affinity boundary.

Set per-operation via decorator tag:

@task_token_guard(
    operation_type="my_op",
    tags={"digest_policy": DigestPolicy.FAST}
)
def my_function(...): ...

4. hash_conductor.py — Policy-Aware generate_seed and charge

Change: generate_seed now accepts a DigestPolicy parameter and
branches to the appropriate algorithm. charge reads the digest_policy
tag from the token's metadata, resolves string values to the enum with a
fallback-to-FULL warning, and passes the resolved policy to generate_seed.

The tg_print dispatch line in charge now includes digest={policy.value}
for observability during mixed-policy runs.

snapshot() seed[:12] truncation is safe for all four policies — Python's
slice never raises on out-of-bounds, so an 8-char MINIMAL seed returns
itself.


5. sticky_token.pyfreeze() Empty-Args Guard

Problem: _make_key called freeze(args) on every mark() and
unmark(), including the majority path where args = (). freeze(()) is
always () — the recursive call was redundant on every conductor
on_completeunmark(seed, ()) call.

Change:

return op_name, freeze(args) if args else ()

Zero overhead on the empty-args path which is now the dominant path through
the conductor release cycle.


6. core_pinned_staggered_queue.py — Loop and Boolean Simplification

| Location | Change |
|----------------------------------------|------------------------------------------------------------------------------------------------------...

Read more

[MINOR-UPDATE] - Type Association Resolution & Cleanup

12 May 21:46

Choose a tag to compare

This update fixes some types I made that were ambiguous and some fields I forgot.

[MINOR-UPDATE] - Resolved Un-hashables & Fall Through Prevention

12 May 00:20

Choose a tag to compare

This is a must have update for clarity and security of hash-able and sticky operations. Anyone using these releases should most certainly update.

[MINOR-UPDATE] - Cleanup & The Crew (Dunders)

10 May 01:42

Choose a tag to compare

This is just a nice-to-have release, it cleans some misleading pathing up and adds in more type friendly resolution via the "double underscored methods".

[UPDATE] - The Production Safety Update

09 May 15:23

Choose a tag to compare

Pre-release

TokenGate — Release Notes

This update marks the first major "production ready features" given improved
data locality, safety from cache storms, and proper cleanup protocols.

Take note of the "Layer by layer" setup - it mentions all stages in the tokens layered contracts.

Extra information was added to BETA.md! Take a look there for bonus setup information.

Hash Conductor & Sticky Token Registry

This release introduces two systems that work together to anchor token execution
to stable core domains for the full lifetime of a call chain. The result is
deterministic routing, zero cross-domain data races, and measurably better
behaviour under saturated load conditions.


What Changed

StickyTokenRegistry

Tokens carrying the same (operation_type, args) key are now pinned to the
core that first receives them. Any later token arriving with the same key is
redirected to that core automatically. The pin releases when the token
completes, freeing the next submission to route normally.

A sticky_anchor tag can be added to any decorator to give the sticky key an
explicit name, independent of operation type.

@task_token_guard(
    operation_type="my_op",
    tags={"weight": "medium", "sticky_anchor": "op_token"},
)
def my_operation(n: int) -> int:
    ...

HashConductor

Lead tokens — those decorated with external_calls — generate a SHA-256 seed
from their token ID and call list. That seed is pinned to a core domain. Any
token spawned during the lead's execution inherits the seed and is routed to the
same core automatically. The domain releases when the lead and all of its
children have completed.

@task_token_guard(
    operation_type="lead_op",
    tags={"weight": "medium", "external_calls": ["child_op"]},
)
def lead_operation(n: int) -> list:
    return [child_op(n + i) for i in range(4)]

No changes are required at call sites. Domain anchoring is fully automatic once
external_calls is declared.

State Machine Cleanup

conductor.on_complete() is now wired into transition_state() directly. Every
terminal state — COMPLETED, FAILED, KILLED, TIMEOUT — decrements the
pending count. Killed tokens are reported as completed for observability clarity.
Domains cannot leak memory regardless of how a token ends.


How Routing Works — Layer by Layer

Understanding the full path a token takes from call to completion.

Layer 1 — Core Pinning Workers in Their Domains:
Workers are fixed to a single core domain at startup and never move. HEAVY
tokens belong to Core 1. MEDIUM tokens belong to Core 2 and above. LIGHT
tokens belong to Core 3 and above. Workers sit in their domain and reach for
the nearest valid token. The queue is what moves — it forms itself into the
correct shape around the workers, routing tokens into position so each worker
always reaches the right one.

Layer 2 — Token Creation & Metadata:
task_token_guard intercepts the decorated call before execution. A TaskToken
is created carrying the function, arguments, operation type, and routing tags.
Complexity scoring runs once and is cached on the wrapper. If an active conductor
seed is present in the current executor thread, it is stamped onto the token's
metadata here, before any event loop crossing occurs.

Layer 3 — Weight Classification & Position Assignment:
The token is classified as HEAVY, MEDIUM, or LIGHT from its tags or
operation type name. A staggered global position is calculated from the per-core
position counter, respecting the active worker pattern for that core. The
candidate core and local worker index are derived from this position.

Layer 4 — Sticky Token Enforcement:
For standard tokens, sticky_registry.mark() is called with the resolved
sticky_anchor or operation type as the key. If a marker already exists for
this key, the token is redirected to the pinned core. If not, the candidate core
is pinned and returned. This prevents concurrent tokens with matching keys from
splitting across core domains.

Layer 5 — Seed Generation:
For lead tokens carrying external_calls, a SHA-256 digest is computed from the
token ID concatenated with a frozen representation of the call list. The full
64-character hex digest is used. Collision probability at any realistic token
volume is negligible. The seed is stored on the token's metadata tags.

Layer 6 — Core Resolution:
_put_routing_block makes the final routing decision. Lead tokens with
external_calls go to conductor.charge(). Tokens carrying a
conductor_seed tag go to conductor.register_child(). All other tokens
follow the normal sticky registry path. The returned core_id is written to
the token's sticky_core tag and used for all subsequent mailbox placements.

Layer 7 — Charge Lead:
conductor.charge() generates the seed, pins it to the candidate core via the
sticky registry, and opens the pending count at 1 for the lead itself. The seed
and core mapping are stored in the conductor's internal registry for the lifetime
of the chain.

Layer 8 — Pre-Register Children:
When the lead function runs in the executor thread and spawns child tokens,
task_token_guard stamps the active seed onto each child at creation time and
calls conductor.pre_register() to increment the pending count immediately.
This increment happens before the child crosses to the event loop, ensuring the
domain stays alive even if the lead completes before the children reach put().

Layer 9 — Register Children:
When the child arrives at put(), register_child() reads the seed from the
token tag and returns the pinned core_id. Position assignment still runs —
but now it runs with a core_id that is already established. assign_position_for_token
uses that known core_id to derive local_i, selecting the correct local worker
within the domain. The staggered counter is not bypassed; it is given the right
context to work from. This keeps domain membership valid as worker counts change
live.

Layer 10 — Domain Grouping:
With the target core_id confirmed, the least-loaded active local worker within
that core is selected. The token is placed into that worker's mailbox. All tokens
under the same seed land in the same core's mailbox domain, keeping related work
physically co-located.

Layer 11 — Execution:
The worker loop dequeues the token and calls _execute_token_wrapped. Before the
function runs, conductor.activate() sets the conductor seed into a thread-local
on the executor thread. If the function spawns further child tokens, they
automatically inherit the seed through the same creation-time stamping mechanism
in task_token_guard.

Layer 12 — Finalization: Cleanup & Metadata:
When the token transitions to any terminal state (COMPLETED, FAILED,
KILLED, TIMEOUT), transition_state() calls conductor.on_complete().
The pending count decrements. When it reaches zero — meaning the lead and every
child have finished — the seed is removed from the conductor registry and the
sticky pin is released. Execution timing, core assignment, and complexity score
are written to the execution record.


Test Coverage Added

Cache Storm Test (demo/cache_storm.py)
Submits anchor tokens to pin keys to cores, then fires concurrent bursts of
tokens with identical (op_type, args) keys targeting different cores. Verifies
the sticky registry redirects all of them to the pinned core with zero misses.

Hash Conductor Test (demo/hash_conductor_test.py)
Submits lead tokens that spawn children during execution. Verifies all tokens
in each chain land on the same core, carry matching seeds, and that the conductor
snapshot is empty after resolution. Also checks seed uniqueness across
independent concurrent leads.


Benchmark

Endurance run across 15 doubling waves. 131,068 tokens total.

Wave   Tokens    OK      Fail    Time      Tok/s     Lat(ms)   Conc    Overlap
1      4         4       0       0.003s    1386.2    0.721     1.00×   1.44×
2      8         8       0       0.003s    2391.2    0.418     1.72×   2.48×
3      16        16      0       0.006s    2744.8    0.364     1.98×   4.82×
4      32        32      0       0.011s    2812.7    0.356     2.03×   11.32×
5      64        64      0       0.022s    2880.0    0.347     2.08×   22.01×
6      128       128     0       0.044s    2907.6    0.344     2.10×   29.78×
7      256       256     0       0.090s    2846.8    0.351     2.05×   37.98×
8      512       512     0       0.182s    2811.5    0.356     2.03×   41.81×
9      1024      1024    0       0.364s    2813.9    0.355     2.03×   44.18×
10     2048      2048    0       0.775s    2644.3    0.378     1.91×   44.86×
11     4096      4096    0       1.454s    2816.3    0.355     2.03×   38.34×
12     8192      8192    0       2.905s    2819.9    0.355     2.03×   32.64×
13     16384     16384   0       5.925s    2765.0    0.362     1.99×   27.92×
14     32768     32768   0       12.102s   2707.7    0.369     1.95×   24.96×
15     65536     65536   0       23.494s   2789.5    0.358     2.01×   24.21×

TOTAL  131,068   131,068  0      89.091s
Avg latency : 0.386 ms/token
Peak overlap: 44.86×

Less peak overlap due to more deterministic task routing and better task performance
under saturated conditions indicating the update was a success.

Zero failures. Latency holds within 0.04ms from wave 3 through wave 15.
Peak overlap of 44.86× achieved at wave 10 with flat latency — the system
reached saturation and held position rather than degrad...

Read more

[FIXES] - Actually Clean Prints & Better Alignment

01 May 16:19

Choose a tag to compare

This set of fixes includes some removed unused code, some conflicting deprecated flags and aligns TokenGates token registration access methods (no more protected member access needed).