Skip to content

Wip/modular refactor pnpm turborepo#18

Merged
robbiebyrd merged 2 commits into
mainfrom
wip/modular-refactor-pnpm-turborepo
Jun 9, 2026
Merged

Wip/modular refactor pnpm turborepo#18
robbiebyrd merged 2 commits into
mainfrom
wip/modular-refactor-pnpm-turborepo

Conversation

@robbiebyrd

@robbiebyrd robbiebyrd commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

Summary by Sourcery

Handle CDN-embedded origins and additional redirect patterns in URL rewriting while fixing proxy handler wrapping for undici compatibility.

New Features:

  • Support unwrapping Akamai-style CDN URLs that embed the origin hostname in their path when rewriting HTML and CSS asset URLs.
  • Add handling for AOL dynamic.aol.com CGI redirection URLs when resolving redirect targets.

Bug Fixes:

  • Preserve and forward prototype-defined DispatchHandler methods when wrapping undici proxy handlers to avoid invalid handler errors and proxy rotation failures.

Enhancements:

  • Record prewarm asset discovery using the unwrapped origin URL rather than the CDN host when processing archived URLs.

Tests:

  • Extend URL rewriter tests to cover Akamai CDN URL unwrapping across HTML attributes, inline styles, and CSS, including interaction with embedded archive timestamps.
  • Add regression tests ensuring wrapped proxy handlers expose all expected methods to undici and correctly forward calls.
  • Add tests for the new AOL dynamic.aol.com redirect-unwrapping behavior and non-unwrapping edge cases.

The `#wrapHandler` returned `{ ...handler, onResponseStart, onResponseError }`,
but undici's fetch builds its DispatchHandler as a class instance with methods
on the prototype. Object spread only copies own enumerable properties, so the
wrapped handler reached undici missing `onRequestStart`, `onResponseData`, and
`onResponseEnd`. Undici rejected every dispatch with
`UND_ERR_INVALID_ARG (invalid onRequestStart method)`, the rotator marked
the proxy failed, the re-probe (which builds its own handler via `request()`)
succeeded, and the proxy was restored — only for the next real request to
fail the same way. /status reported healthy while every fetch returned
"OUTBOUND_PROXY: no healthy proxy available".

Replaces the object spread with a Proxy that transparently forwards every
property to the inner handler and only intercepts the two we observe.

Regression test instantiates a class-based handler (mirroring undici's real
shape) and asserts prototype methods survive the wrap.
@robbiebyrd robbiebyrd marked this pull request as ready for review June 9, 2026 01:14
@sourcery-ai

sourcery-ai Bot commented Jun 9, 2026

Copy link
Copy Markdown

Reviewer's Guide

Adds CDN-embedded origin URL unwrapping for Akamai-style URLs in HTML/CSS rewriting, extends redirect unwrapping to AOL dynamic.aol.com patterns, and fixes RotatingProxyDispatcher handler wrapping to preserve undici DispatchHandler prototype methods, with corresponding regression tests.

Sequence diagram for RotatingProxyDispatcher Proxy-wrapped DispatchHandler

sequenceDiagram
    participant RotatingProxyDispatcher
    participant ProxyEntry
    participant InnerHandler as DispatcherDispatchHandler
    participant WrappedHandler as ProxyWrappedHandler
    participant Undici as UndiciDispatcher

    RotatingProxyDispatcher->>RotatingProxyDispatcher: #wrapHandler(entry, handler)
    RotatingProxyDispatcher->>WrappedHandler: new Proxy(handler, { get })
    RotatingProxyDispatcher-->>Undici: WrappedHandler as Dispatcher.DispatchHandler

    Undici->>WrappedHandler: onResponseStart(controller, statusCode, headers, statusMessage)
    WrappedHandler->>RotatingProxyDispatcher: [get("onResponseStart")]
    WrappedHandler->>RotatingProxyDispatcher: PROXY_ERROR_STATUS.has(statusCode)
    alt [status is proxy error]
        WrappedHandler->>RotatingProxyDispatcher: #markFailed(entry, "proxy returned " + statusCode)
    end
    WrappedHandler->>InnerHandler: onResponseStart(controller, statusCode, headers, statusMessage)

    Undici->>WrappedHandler: onResponseError(controller, error)
    WrappedHandler->>RotatingProxyDispatcher: [get("onResponseError")]
    WrappedHandler->>RotatingProxyDispatcher: #markFailed(entry, "transport: " + error.message)
    WrappedHandler->>InnerHandler: onResponseError(controller, error)

    Undici->>WrappedHandler: otherMethod(...)
    WrappedHandler->>InnerHandler: otherMethod(...) (via Reflect.get and bind)
Loading

Flow diagram for URL rewriting with extractCdnEmbeddedUrl and unwrapRedirectUrl

flowchart TD
    A[raw URL string] --> B[match RE_ARCHIVE_URL]
    B -->|archive match| C[extract timestamp ts and rawOriginal]
    C --> D["extractCdnEmbeddedUrl(rawOriginal)"]
    D -->|found| E[originalUrl = extracted]
    D -->|not found| F[originalUrl = rawOriginal]

    B -->|no archive match| G["unwrapRedirectUrl(trimmed)"]
    G --> H["extractCdnEmbeddedUrl(unwrapped)"]
    H -->|found| I["cdnUnwrapped = extracted"]
    H -->|not found| J["cdnUnwrapped = unwrapped"]

    E --> K["buildProxyUrl(originalUrl, fallbackTime, lockTime)"]
    F --> K
    I --> L["new URL(cdnUnwrapped, targetUrl)"]
    J --> L
    L --> M{protocol http: or https:?}
    M -->|no| N[return raw]
    M -->|yes| O["buildProxyUrl(resolved.toString(), fallbackTime, lockTime)"]
Loading

File-Level Changes

Change Details Files
Unwrap Akamai-style CDN URLs so rewritten proxy URLs and asset discovery target the true origin instead of the CDN host.
  • Import a new extractCdnEmbeddedUrl helper into the URL rewriter and apply it when handling archive-wrapped URLs before recording discovered assets.
  • Apply CDN unwrapping to all rewritten URLs after redirect unwrapping and before resolution against the target URL, so html attributes, srcset entries, inline styles, and CSS url() references consistently point to the origin.
  • Extend HTML and CSS URL rewriting tests to cover Akamai CDN patterns, wayback-wrapped URLs, srcset handling, inline styles, and ensure non-CDN absolute URLs are preserved through the unwrap path.
src/lib/url-rewriter.ts
tests/lib/url-rewriter.test.ts
Extend redirect unwrapping to handle AOL dynamic.aol.com CGI redirector URLs with embedded destinations while keeping non-matching cases intact.
  • Add a new dynamic.aol.com/cgi/redir pattern to the redirect-unwrapper regex list that captures a raw http/https URL directly after the query delimiter.
  • Add regression tests for AOL dynamic.aol.com redirect unwrapping, covering http/https destinations, preservation of paths and query strings, and non-unwrapping behaviors for unsupported schemes and paths.
src/lib/redirect-unwrapper.ts
tests/lib/redirect-unwrapper.test.ts
Fix RotatingProxyDispatcher handler wrapping so undici-style DispatchHandler prototype methods are preserved while still intercepting response status and errors to mark failing proxies.
  • Replace the previous object-spread-based handler wrapper with a JavaScript Proxy that intercepts onResponseStart and onResponseError to mark failed proxies, while delegating calls to the original handler methods when present.
  • Ensure the Proxy transparently forwards all other properties and methods, binding functions to the original handler target to preserve method semantics.
  • Add a regression test simulating an undici-like class-based DispatchHandler with prototype-defined methods to verify that wrapped handlers expose onRequestStart, onResponseData, and onResponseEnd and that these methods are invoked correctly.
src/lib/outbound-proxy.ts
tests/lib/outbound-proxy.test.ts

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@robbiebyrd robbiebyrd merged commit e977a2a into main Jun 9, 2026
1 check passed

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 3 issues

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="tests/lib/url-rewriter.test.ts" line_range="374-383" />
<code_context>
+describe("rewriteHtmlUrls — CDN URLs with embedded origin", () => {
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a regression test for Akamai URLs that do not contain an embedded origin hostname

You already cover Akamai URLs with an embedded origin hostname and non-CDN absolute URLs. To further harden `extractCdnEmbeddedUrl` and the rewrite logic, please add a case where the host looks like Akamai (e.g. `a388.g.akamai.net`) but the path contains no embedded origin hostname, to ensure we neither rewrite such URLs incorrectly nor break valid CDN-hosted assets that can’t be unwrapped.
</issue_to_address>

### Comment 2
<location path="tests/lib/outbound-proxy.test.ts" line_range="563-572" />
<code_context>
+		it("forwards prototype methods from the inner handler (regression: undici DispatchHandler is a class instance)", () => {
</code_context>
<issue_to_address>
**suggestion (testing):** Strengthen the regression test by asserting that handler methods are invoked with the original instance as `this`

Since the implementation now uses a `Proxy` and binds functions, consider adding an assertion that `this` inside the handler methods is the original `FetchLikeHandler` instance, not the proxy. For example, have `onRequestStart` capture `this` and assert `capturedThis === userHandler` to ensure instance-state access via `this` continues to work as expected.

Suggested implementation:

```typescript
			const { built, d } = buildRotator(2);

			let onRequestStartThis: unknown;
			let onResponseDataThis: unknown;
			let onResponseEndThis: unknown;

			const onRequestStart = jest.fn(function (this: unknown, ...args: unknown[]) {
				onRequestStartThis = this;
				// preserve jest's mock call recording
				return (onRequestStart as jest.Mock).mock.calls && (onRequestStart as jest.Mock)(...args);
			});
			const onResponseData = jest.fn(function (this: unknown, ...args: unknown[]) {
				onResponseDataThis = this;
				return (onResponseData as jest.Mock)(...args);
			});
			const onResponseEnd = jest.fn(function (this: unknown, ...args: unknown[]) {
				onResponseEndThis = this;
				return (onResponseEnd as jest.Mock)(...args);
			});

			class FetchLikeHandler {

```

To fully implement the suggestion and assert that handler methods are invoked with the original instance as `this`, you should also:

1. Implement `onRequestStart`, `onResponseData`, and `onResponseEnd` as instance methods on `FetchLikeHandler` that delegate to the above mocks, e.g.:
   ```ts
   class FetchLikeHandler {
     onRequestStart(...args: unknown[]) {
       return onRequestStart.apply(this, args);
     }
     onResponseData(...args: unknown[]) {
       return onResponseData.apply(this, args);
     }
     onResponseEnd(...args: unknown[]) {
       return onResponseEnd.apply(this, args);
     }
   }
   ```
2. In this test, ensure you keep a reference to the original handler instance (e.g. `const userHandler = new FetchLikeHandler();`) and that this instance is what `buildRotator`/the proxy wraps.
3. After invoking a request through `built`/`d` such that all three handler methods run, add assertions like:
   ```ts
   expect(onRequestStartThis).toBe(userHandler);
   expect(onResponseDataThis).toBe(userHandler);
   expect(onResponseEndThis).toBe(userHandler);
   ```
   This will verify that the `Proxy` and function binding in the production code preserve the original handler instance as `this` inside the methods.
</issue_to_address>

### Comment 3
<location path="src/lib/outbound-proxy.ts" line_range="374" />
<code_context>
-			onResponseStart(controller, statusCode, headers, statusMessage) {
-				if (PROXY_ERROR_STATUS.has(statusCode)) {
-					self.#markFailed(entry, `proxy returned ${statusCode}`);
+		// Proxy (not object spread) because undici's fetch builds its handler as
+		// a class instance with methods on the prototype. `{ ...handler }` would
+		// drop them, and undici would reject the wrapped handler with
</code_context>
<issue_to_address>
**issue (complexity):** Consider replacing the generic Proxy-based handler wrapper with an Object.create-based wrapper that only overrides the two relevant methods to keep behavior explicit and easier to follow.

You can avoid the `Proxy` (and its global `get` trap/binding logic) while still preserving prototype methods and `this` by using `Object.create` and explicitly overriding only the two methods you care about.

That keeps the fix (no `{ ...handler }` cloning) but makes the wrapper local and easier to reason about:

```ts
#wrapHandler(entry: ProxyEntry, handler: Dispatcher.DispatchHandler): Dispatcher.DispatchHandler {
	const self = this;

	// Preserve handler's prototype chain & instance methods
	const wrapped = Object.create(handler) as Dispatcher.DispatchHandler;

	const origOnResponseStart = handler.onResponseStart?.bind(handler);
	const origOnResponseError = handler.onResponseError?.bind(handler);

	wrapped.onResponseStart = function (
		controller: Dispatcher.DispatchController,
		statusCode: number,
		headers: IncomingHttpHeaders,
		statusMessage?: string,
	): void {
		if (PROXY_ERROR_STATUS.has(statusCode)) {
			self.#markFailed(entry, `proxy returned ${statusCode}`);
		}
		origOnResponseStart?.(controller, statusCode, headers, statusMessage);
	};

	wrapped.onResponseError = function (
		controller: Dispatcher.DispatchController,
		error: Error,
	): void {
		const reason = error instanceof Error ? error.message : String(error);
		self.#markFailed(entry, `transport: ${reason}`);
		origOnResponseError?.(controller, error);
	};

	return wrapped;
}
```

This:

- Preserves prototype methods (undici still sees the class instance shape).
- Keeps `this` for the original handler methods via `bind(handler)`.
- Avoids intercepting all properties, `Proxy`, `Reflect.get`, and generic function binding.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +374 to +383
describe("rewriteHtmlUrls — CDN URLs with embedded origin", () => {
it("unwraps an Akamai URL on <img src> and points the proxy URL at the origin", () => {
const html = `<img src="http://a388.g.akamai.net/f/388/21/1d/www.cnn.com/images/newmain/top.main.special.report.gif">`;
const { html: r } = rewriteHtmlUrls(html, TARGET, TIME);
expect(r).toContain(
`/web/${TIME}/http://www.cnn.com/images/newmain/top.main.special.report.gif`,
);
expect(r).not.toContain("a388.g.akamai.net");
});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Consider adding a regression test for Akamai URLs that do not contain an embedded origin hostname

You already cover Akamai URLs with an embedded origin hostname and non-CDN absolute URLs. To further harden extractCdnEmbeddedUrl and the rewrite logic, please add a case where the host looks like Akamai (e.g. a388.g.akamai.net) but the path contains no embedded origin hostname, to ensure we neither rewrite such URLs incorrectly nor break valid CDN-hosted assets that can’t be unwrapped.

Comment on lines +563 to +572
it("forwards prototype methods from the inner handler (regression: undici DispatchHandler is a class instance)", () => {
// Regression for the production bug where `{ ...handler }` dropped
// undici's prototype-defined methods (onRequestStart, onResponseData,
// onResponseEnd), causing undici to reject every dispatch with
// "UND_ERR_INVALID_ARG (invalid onRequestStart method)" and take
// the proxy permanently out of rotation.
const { built, d } = buildRotator(2);
const onRequestStart = jest.fn();
const onResponseData = jest.fn();
const onResponseEnd = jest.fn();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Strengthen the regression test by asserting that handler methods are invoked with the original instance as this

Since the implementation now uses a Proxy and binds functions, consider adding an assertion that this inside the handler methods is the original FetchLikeHandler instance, not the proxy. For example, have onRequestStart capture this and assert capturedThis === userHandler to ensure instance-state access via this continues to work as expected.

Suggested implementation:

			const { built, d } = buildRotator(2);

			let onRequestStartThis: unknown;
			let onResponseDataThis: unknown;
			let onResponseEndThis: unknown;

			const onRequestStart = jest.fn(function (this: unknown, ...args: unknown[]) {
				onRequestStartThis = this;
				// preserve jest's mock call recording
				return (onRequestStart as jest.Mock).mock.calls && (onRequestStart as jest.Mock)(...args);
			});
			const onResponseData = jest.fn(function (this: unknown, ...args: unknown[]) {
				onResponseDataThis = this;
				return (onResponseData as jest.Mock)(...args);
			});
			const onResponseEnd = jest.fn(function (this: unknown, ...args: unknown[]) {
				onResponseEndThis = this;
				return (onResponseEnd as jest.Mock)(...args);
			});

			class FetchLikeHandler {

To fully implement the suggestion and assert that handler methods are invoked with the original instance as this, you should also:

  1. Implement onRequestStart, onResponseData, and onResponseEnd as instance methods on FetchLikeHandler that delegate to the above mocks, e.g.:
    class FetchLikeHandler {
      onRequestStart(...args: unknown[]) {
        return onRequestStart.apply(this, args);
      }
      onResponseData(...args: unknown[]) {
        return onResponseData.apply(this, args);
      }
      onResponseEnd(...args: unknown[]) {
        return onResponseEnd.apply(this, args);
      }
    }
  2. In this test, ensure you keep a reference to the original handler instance (e.g. const userHandler = new FetchLikeHandler();) and that this instance is what buildRotator/the proxy wraps.
  3. After invoking a request through built/d such that all three handler methods run, add assertions like:
    expect(onRequestStartThis).toBe(userHandler);
    expect(onResponseDataThis).toBe(userHandler);
    expect(onResponseEndThis).toBe(userHandler);
    This will verify that the Proxy and function binding in the production code preserve the original handler instance as this inside the methods.

Comment thread src/lib/outbound-proxy.ts
onResponseStart(controller, statusCode, headers, statusMessage) {
if (PROXY_ERROR_STATUS.has(statusCode)) {
self.#markFailed(entry, `proxy returned ${statusCode}`);
// Proxy (not object spread) because undici's fetch builds its handler as

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (complexity): Consider replacing the generic Proxy-based handler wrapper with an Object.create-based wrapper that only overrides the two relevant methods to keep behavior explicit and easier to follow.

You can avoid the Proxy (and its global get trap/binding logic) while still preserving prototype methods and this by using Object.create and explicitly overriding only the two methods you care about.

That keeps the fix (no { ...handler } cloning) but makes the wrapper local and easier to reason about:

#wrapHandler(entry: ProxyEntry, handler: Dispatcher.DispatchHandler): Dispatcher.DispatchHandler {
	const self = this;

	// Preserve handler's prototype chain & instance methods
	const wrapped = Object.create(handler) as Dispatcher.DispatchHandler;

	const origOnResponseStart = handler.onResponseStart?.bind(handler);
	const origOnResponseError = handler.onResponseError?.bind(handler);

	wrapped.onResponseStart = function (
		controller: Dispatcher.DispatchController,
		statusCode: number,
		headers: IncomingHttpHeaders,
		statusMessage?: string,
	): void {
		if (PROXY_ERROR_STATUS.has(statusCode)) {
			self.#markFailed(entry, `proxy returned ${statusCode}`);
		}
		origOnResponseStart?.(controller, statusCode, headers, statusMessage);
	};

	wrapped.onResponseError = function (
		controller: Dispatcher.DispatchController,
		error: Error,
	): void {
		const reason = error instanceof Error ? error.message : String(error);
		self.#markFailed(entry, `transport: ${reason}`);
		origOnResponseError?.(controller, error);
	};

	return wrapped;
}

This:

  • Preserves prototype methods (undici still sees the class instance shape).
  • Keeps this for the original handler methods via bind(handler).
  • Avoids intercepting all properties, Proxy, Reflect.get, and generic function binding.

robbiebyrd added a commit that referenced this pull request Jun 10, 2026
src/services/time-machine.ts:
- Use crypto.timingSafeEqual for Bearer token auth (004-a668)
- Add Content-Security-Policy: SANDBOX_CSP on SSE path (005-78d7)
- Guard all ws.send() calls with readyState check; add per-connection
  ws.on("error") listener (012-1901)
- Expand signal handler to call stop().catch() with fatal log (014-e360)
- Map result.cache to "HIT"|"MISS" in WsResponse (022-1b8c)
- Fix hardcoded 200 in access log after handleCacheClear (033-ab07)
- Tighten getStatus return type to Promise<SystemStatus> (035-c1be)

src/services/cache.ts:
- Validate domainFilter query param; return 400 on invalid hostname (007-6eb8)
- Add atomicWrite helper (writeFile+rename via TMP_SUFFIX); wire in
  writeResolvedTimeSidecar and writeContentTypeSidecar (008-0492)
- Guard outer catch in handleCacheClear with res.headersSent (013-5379)
- Replace fs.readFile with fs.open+read(SNIFF_BYTES)+close in
  sniffContentType (017-8751)

src/clients/archive-job-client.ts:
- Replace serial queue.add() loop with queue.addBulk() (018-ff86)

src/queue/archive-worker.ts:
- Delete startTimes entry in stalled event to fix Map leak (019-59bf)

tests/lib/runtime-shim.test.ts:
- Add lockTime=true describe block with 15 tests (027-6577)

tests/: update mocks for addBulk, atomicWrite, and sniff partial-read

Stories: #4-a668 #5-78d7 #7-6eb8 #8-0492 #12-1901 #13-5379
         #14-e360 #17-8751 #18-ff86 #19-59bf #022-1b8c #027-6577
         #033-ab07 #035-c1be
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant