
kindling: switch from getlantern/fronted to getlantern/domainfront #426

Merged: myleshorton merged 11 commits into refactor from adam/switch-to-domainfront on Apr 28, 2026
Conversation

Contributor

@myleshorton commented Apr 19, 2026

Summary

Ports radiance's domain fronting transport from getlantern/fronted to getlantern/domainfront (clean-room rewrite: no global state, context-driven lifecycle, atomic config replace, lower Android GC pressure).

Paired with getlantern/kindling#31 which swaps kindling's WithDomainFronting to accept *domainfront.Client.

Changes

  • kindling/fronted/fronted.go
    • NewFronted now returns a *domainfront.Client instead of fronted.Fronted
    • Drops the panicListener parameter (domainfront's goroutines are owned by the Client and shut down via Close(); no equivalent hook)
    • Fail fast if the smart-dialer HTTP client can't be built, so we don't silently degrade to http.DefaultClient
    • Three-tier config bootstrap fallback:
      1. Live fetch of fronted.yaml.gz via smart dialer (30s timeout) — bypasses DNS/SNI blocking of raw.githubusercontent.com. Raw bytes are persisted to <cacheFile's dir>/fronted_config.yaml.gz on success.
      2. On-disk cache from a prior successful fetch.
      3. Embedded fronted.yaml.gz via //go:embed (committed alongside the source). domainfront doesn't embed its own; the old getlantern/fronted package did, and without this fallback a fresh install where raw.githubusercontent.com is blocked from the first boot can't initialize.
    • New bypassDialer struct wraps bypass.DialContext so it satisfies the domainfront.Dialer interface (takes an interface, not a function).
  • kindling/client.go and config/fetcher_test.go — updated for the new NewFronted signature.
  • go.mod / go.sum — drops direct getlantern/fronted dep; adds getlantern/domainfront; bumps getlantern/kindling to the matching branch.
  • .github/workflows/refresh-fronted-config.yml — daily cron (04:30 UTC) that mirrors upstream's daily fronted.yaml.gz refresh: pulls from getlantern/fronted/main, validates it's a real gzip, commits and pushes the new copy directly when it differs. Matches the maintenance pattern the old fronted package had via its external daily-update pipeline.

Test plan

  • go build ./kindling/... ./config/... and go vet ./kindling/... ./config/... pass
  • go test -short ./kindling/... ./config/... passes
  • Manual: build a Lantern nightly off this branch + the kindling PR and confirm fronted traffic still works end-to-end in-app
  • CI: verify the full suite runs clean once the kindling PR merges and we can re-pin to @main

Next

After merge we should re-pin getlantern/kindling from the branch tag to @main.

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings April 19, 2026 15:19
Contributor

Copilot AI left a comment


Pull request overview

Ports Radiance’s domain-fronting transport from getlantern/fronted to the new getlantern/domainfront client, updating call sites and module dependencies accordingly.

Changes:

  • Reworked NewFronted to construct and return a *domainfront.Client (no panic listener), including a synchronous initial config fetch and updated dialer adaptation.
  • Updated kindling initialization and config fetcher test call sites to match the new NewFronted signature.
  • Updated Go module dependencies to drop getlantern/fronted and add getlantern/domainfront (and bump getlantern/kindling).

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| kindling/fronted/fronted.go | Replaces fronted usage with domainfront client creation, adds initial config fetch + dialer adapter. |
| kindling/client.go | Updates kindling bootstrap to use the new NewFronted signature and close the returned client. |
| config/fetcher_test.go | Updates test to use the new NewFronted signature (test remains skipped). |
| go.mod | Swaps dependencies: removes getlantern/fronted, adds getlantern/domainfront, bumps getlantern/kindling. |
| go.sum | Updates checksums to reflect dependency changes. |


Comment thread kindling/fronted/fronted.go
Comment thread kindling/fronted/fronted.go Outdated
domainfront is the clean-room successor to fronted: no global state,
context-driven lifecycle, atomic config replace, and meaningfully lower
allocation pressure on Android. See getlantern/domainfront for details.

- kindling/fronted/fronted.go: NewFronted now returns a *domainfront.Client
  instead of a fronted.Fronted. Drops the panicListener parameter
  (domainfront has no equivalent — its goroutines are owned by the Client
  and shut down cleanly via the context/Close).
- Fetch the initial fronted.yaml.gz synchronously (30s timeout) via the
  smart dialer, preserving the pre-existing behavior of bypassing DNS-level
  blocking for raw.githubusercontent.com on the initial download. The same
  smart-dialer-backed HTTP client is reused for domainfront's internal 12h
  config refresh loop.
- Introduce a small bypassDialer struct wrapping bypass.DialContext so it
  satisfies domainfront.Dialer (which takes an interface, not a function).
- Update callers in kindling/client.go and config/fetcher_test.go for the
  new signature.

Requires the matching kindling bump (getlantern/kindling#31).
@myleshorton force-pushed the adam/switch-to-domainfront branch from 3953c40 to 2717a4d on April 19, 2026 15:45
@myleshorton changed the base branch from main to refactor on April 19, 2026 15:46
…mainfront-merged

# Conflicts:
#	go.mod
#	go.sum
domainfront itself doesn't embed a config — its doc says the caller
provides the "initial (typically embedded) configuration". The old
getlantern/fronted package we're replacing did embed it. Without an
embedded fallback, a fresh install where raw.githubusercontent.com is
blocked from the first boot can't initialize at all (fetch fails, no
on-disk cache from prior run, no embedded copy).

Add //go:embed fronted.yaml.gz and have loadCachedConfig fall through
to the embedded bytes when both the live fetch and on-disk cache miss.
Refresh the committed copy periodically from the getlantern/fronted
main branch, the same maintenance pattern the old package had.

Mirror the daily updates getlantern/fronted gets from its external
pipeline, so the embedded fallback config stays close to upstream.
Workflow runs at 04:30 UTC (a few hours after the upstream ~00:15 UTC
commit), downloads the latest fronted.yaml.gz, validates it parses as
gzip, and opens a PR if it differs from the committed copy.

Manual workflow_dispatch is also enabled for ad-hoc refreshes.

A daily PR with a known-trusted gzipped binary from another repo is
just merge-button noise — the source already validates upstream. Drop
peter-evans/create-pull-request and have the workflow commit/push
directly to the default branch when the file changes.
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 6 changed files in this pull request and generated 3 comments.



Comment thread kindling/fronted/fronted.go Outdated
Comment thread kindling/fronted/fronted.go Outdated
Comment thread .github/workflows/refresh-fronted-config.yml Outdated
Adam Fisk and others added 6 commits April 28, 2026 09:55
- Bump kindling pin to pull in the new TransportName type and constants.
- Replace string-keyed enabledTransports with map[kindling.TransportName]bool
  keyed by TransportDomainfront / TransportSmart / TransportAMP /
  TransportDNSTunnel — typo-proof and self-documenting.
- Promote EnabledTransports and SetKindling to exported names so
  cmd/kindling-tester (and any future external tools) can drive the
  package without relying on package-private state.
- cmd/kindling-tester: cast TRANSPORT env var to kindling.TransportName;
  README updated to reflect the renamed transports (proxyless → smart,
  fronted → domainfront).
- Subtest names in client_test.go now match the actual underlying
  kindling transport names instead of the old radiance labels.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mainfront

# Conflicts:
#	kindling/client.go
…currency

Address PR review feedback:
- io.LimitReader caps fetched fronted.yaml.gz at 10 MiB so a misconfigured
  or compromised endpoint can't OOM the process.
- writeFileAtomic persists the cache via temp+rename so a crash mid-write
  can't leave a truncated cache that fails to parse on next boot.
- refresh-fronted-config.yml gets a concurrency group and a 3-attempt
  rebase-and-retry on push, so cron + manual dispatch races and concurrent
  commits to the target branch don't leave the embedded config stale.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the inline writeFileAtomic helper added in 462a23e in favor of
common/atomicfile.WriteFile, which is the convention used by config,
settings, dnstt, vpn/boxoptions, vpn/split_tunnel, and deviceid.
Same temp+sync+rename guarantee, plus Windows handling that the
inline helper didn't have.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@myleshorton merged commit cca09c6 into refactor on Apr 28, 2026
1 check passed
@myleshorton deleted the adam/switch-to-domainfront branch on April 28, 2026 17:27
myleshorton added a commit that referenced this pull request Apr 28, 2026
)

* feat(ipc): add SSE status event streaming and replace string status with VPNStatus type

Introduce a Server-Sent Events endpoint (/status/events) that streams
VPN status changes to clients in real time, replacing the previous
poll-based approach. Refactor status representation from string constants
(StatusRunning, StatusClosed, etc.) to a typed VPNStatus enum (Connected,
Disconnected, Connecting, Disconnecting, Restarting, ErrorStatus) and
move status emission from the IPC server into the tunnel layer. The
tracer middleware is scoped to standard routes so it no longer buffers
long-lived SSE connections, and the HTTP transport is upgraded to
unencrypted HTTP/2 for multiplexed streaming support.

* fix tests

* address pr comments

* fix test

* refactor!: restructure codebase around LocalBackend and VPNClient patterns

Reorganize the project architecture to establish clear data ownership
and dependency flow, inspired by Tailscale's LocalBackend/localapi
pattern.

* don't update default logger

* remove concept of server groups from tunnel, just use auto/manual

* macos & win lantern/lanternd support

* update README

* add just cmds for building

* pass values from settings to StripeBillingPortalURL

* missed file

* poll datacap info

* use .exe extension on windows

* fix windows service start issue

* add run cmd to service file

* use auto for empty string in connect

* fix daemon service name linux

* cleanup daemon binary on uninstall

* fix tests

* add standalone build tag to cli on macos

* fix tracer name, start telemetry if enabled

* add version cmd, check version before installing

* Apply suggestions from code review

Co-authored-by: Wendel Hime <6754291+WendelHime@users.noreply.github.com>

* remove systemd unit file, resolve dst symlink

* small cleanup

* add default data/log directories for desktop platforms

* fix linux daemon path

* rename PreTest to OfflineTest

* fix status for auto selected server

* wrap inner log writer so published logs are fmted

* pass header in data-cap and update to user data saving

* Merge branch 'main' into 'refactor'

* use publicip in cli, fetch public ip on start

* clean up

* merge in datacap streaming

* servers types marshal/unmarshal funcs, move gostack to lantern

* persist oauth provider

* fix cli vpn connect handler to select server if connected

* clear oauth provider on logout, check error when setting it

* remove redundant peer verification on macos

* expose urltest results

* refactor server manager: remove server group concept

* remove modes from servers

* return tags for added servers

* run initial offline url test on start

* remove isLantern parameters, reject adding server with existing tag

* auto-restart daemon if it's not running

* add getters for servers as json

* set selected server in settings to nil when removed

* add loopback ipc client for mobile

* uninstall existing daemon before installing

* add selected server json function

* update lantern-box, change windows service name

* fix several bugs, small cleanup

* code review, accept version for makefile/justfile

* fix split tunnel dangling-pointer issues

* add code review from PR fix to main

* prevent scientific notation for numeric settings values

* bundle all .log files for issue report

* lazily init kindling and request initial IP/config

* Auth error code issues and token issue

* fix(servers): release write lock during saveServers to prevent reader starvation

Port of PR #416 to the refactor branch. Splits the write lock so
mutators (SetServers, AddServers, RemoveServers) only hold it for the
in-memory mutation, then release before saveServers. saveServers
acquires a brief RLock for marshalling and a separate saveMu for
serializing disk writes, so readers are never blocked by slow fsync.
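
The lock split can be pictured with a sketch like the one below. The `manager` type, its fields, and the `persist` callback are hypothetical. Taking `saveMu` before the snapshot is a choice made here so that concurrent saves cannot write an older snapshot last; the actual ordering in saveServers is not stated in the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

type manager struct {
	mu      sync.RWMutex // guards servers
	saveMu  sync.Mutex   // serializes disk writes
	servers map[string]string
	persist func([]byte) error // stands in for the slow write + fsync
}

// SetServer holds the write lock only for the in-memory mutation.
func (m *manager) SetServer(tag, addr string) error {
	m.mu.Lock()
	m.servers[tag] = addr
	m.mu.Unlock()
	return m.save()
}

// Get is a reader; it never waits on a disk write.
func (m *manager) Get(tag string) (string, bool) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	addr, ok := m.servers[tag]
	return addr, ok
}

// save snapshots under a brief read lock, then writes without holding mu.
func (m *manager) save() error {
	m.saveMu.Lock()
	defer m.saveMu.Unlock()

	m.mu.RLock()
	buf, err := json.Marshal(m.servers)
	m.mu.RUnlock()
	if err != nil {
		return err
	}
	return m.persist(buf)
}

func main() {
	m := &manager{servers: map[string]string{}}
	m.persist = func(b []byte) error {
		// A reader can still take the read lock while the write is in flight.
		addr, _ := m.Get("a")
		fmt.Printf("persisting %s while a reader sees %q\n", b, addr)
		return nil
	}
	if err := m.SetServer("a", "192.0.2.1:443"); err != nil {
		fmt.Println(err)
	}
}
```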

Includes per-phase timing instrumentation and reader-starvation
detection to help root-cause any future slow cases.

See getlantern/engineering#3176 and Freshdesk #172640.

Co-Authored-By: garmr <garmr@users.noreply.github.com>

* eagerly start kindling and public ip detection

* start autoselected listener

* Pass empty when value is nil

* config: match main's UserID format in ConfigRequest (#420)

* config: match main's UserID format in ConfigRequest

On main, the config request sends `fmt.Sprintf("%d", GetInt64(UserIDKey))`
which yields "0" when the user ID is unset. On refactor, this was changed
to `GetString(UserIDKey)`, which returns "" for an unset key.

For authenticated users the two are equivalent (GetString handles the
JSON-roundtripped float64 case by converting back to int64 decimal), but
the empty-string vs "0" divergence can alter server-side Pro detection
paths that parse the field via strconv.ParseInt and treat empty as
malformed rather than zero.

Restoring main's formulation so the refactor branch is bit-for-bit
compatible with main on this field.
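
The divergence is easy to reproduce in isolation. In the sketch below, `userIDField` and `parseUserID` are hypothetical helpers that mimic the client-side formatting and a server-side `strconv.ParseInt` parse respectively.

```go
package main

import (
	"fmt"
	"strconv"
)

// userIDField formats the ID the way main does: base-10, with 0 for "unset".
func userIDField(id int64) string {
	return fmt.Sprintf("%d", id)
}

// parseUserID mimics a server that parses the field with strconv.ParseInt.
func parseUserID(field string) (int64, error) {
	return strconv.ParseInt(field, 10, 64)
}

func main() {
	// An unset ID formatted with %d parses cleanly as zero.
	id, err := parseUserID(userIDField(0))
	fmt.Println(id, err)

	// An empty string, as GetString would return for an unset key, does not.
	_, err = parseUserID("")
	fmt.Println(err != nil)
}
```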

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* config: assert ConfigRequest.UserID serializes as base-10 decimal

Adds assertions to prevent the format-parity regression from returning:
- TestFetchConfig now asserts confReq.UserID == "1234567890"
- New TestUserIDFormatMatchesMain covers unset (-> "0"), small, and
  large (float64 JSON-roundtrip) values at the exact expression used
  in fetchConfig

Addresses Copilot review on PR #420.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ipc: add /env endpoint to patch env vars at runtime

Allow mutating the in-memory dotenv map over IPC for dev/testing
workflows without restarting the daemon. Adds thread-safe Set/GetAll
to the env package and wires up GET/PATCH /env in the IPC server.

* ipc: add /config/events SSE endpoint for config-change notifications (#422)

On the mobile/macOS refactor, config.NewConfigEvent is emitted inside
the packet-tunnel extension's radiance process. Subscribers in the
host app (e.g. lantern-core, which used to forward these to Flutter
as "config" events) never see them because events.Subscribe is an
in-memory fan-out that does not cross the process boundary.

Mirror the existing /server/auto-selected/events pattern: a dedicated
SSE endpoint that subscribes to NewConfigEvent and streams a frame
per event. The payload is intentionally empty ("{}") — callers only
need to know a change occurred and can fetch fresh state via the
other GET endpoints; streaming the (potentially large) full Config
would waste bandwidth.

Client: Client.ConfigEvents(ctx, handler func()) opens the stream;
the handler fires once per frame until ctx is cancelled.

Part of getlantern/engineering#3182. Companion lantern-core change
wires lc.client.ConfigEvents into the existing listenConfigEvents
path so Flutter's app_event_notifier "config" case fires again.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* add deviceid migration to copy file to new location

* ci: run on every PR regardless of target branch (#431)

PRs targeting any branch other than main (notably our long-lived
'refactor' dev branch) were silently skipping CI because the workflow
filtered pull_request events to 'branches: [main]' only. Drop the
branches filter on pull_request — we want CI on every PR.

Keep the push filter at [main, refactor] so direct pushes to other
branches don't produce ambient CI load.

* deps: bump lantern-box to v0.0.70 + broflake to main (caea079) (#430)

Picks up lantern-box PR #244 (which itself pulls in the broflake main
that has #241 Unbounded outbound + #350 covert-dtls/observability +
#354 pion v4 + quic-go v0.59 with ConnectionID-bump fork preserved).

Also mirrors lantern-box's qpack v0.5.1 replace, for the same reason:
sing-box-minimal/sagernet quic-go http3 needs qpack v0.5.1 API even
though quic-go v0.59.0 requires v0.6.0. Inline comment documents the
exit condition (sagernet/quic-go bump to v0.59.0-sing-box-mod.4 or
later).

Verified:
  GOWORK=off go build \
    -tags "with_gvisor,with_quic,with_dhcp,with_wireguard,with_utls,with_acme,with_clash_api" \
    ./vpn/... ./backend/...

(cmd/lantern's build break is pre-existing on refactor HEAD — unrelated.)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* vpn: direct-transport + streaming wrapper for Unbounded signaling (plus env propagation) (#429)

* feat(settings): runtime-reloadable toggles + set/get CLI

Previously frozen at startup and now re-read on demand:

- config.fetchConfig reads settings.ConfigFetchDisabledKey on each
  call; the cached fetchDisabled field is gone and fetchLoop always
  starts, returning nil silently when disabled so runtime re-enable
  works. pollInterval <= 0 is normalized to the default.
- log.NewLogger installs a slog.Leveler that reads settings.LogLevelKey
  per record; the startup-parsed level becomes the fallback. The NoOp
  short-circuit now gates on env.Testing rather than the Disable level
  so starting at disable does not foreclose later re-enable.
- common.GetVersion reads RADIANCE_VERSION dynamically with the
  build-time Version as fallback; account, telemetry, and vpn header
  fills route through it so /env overrides take effect without
  restart.

* feat(config): add IPC + CLI command to force a config update

 Adds ConfigHandler.Update() which triggers an immediate fetch, returning
 ErrConfigFetchDisabled when config fetching is disabled in settings. The
 method is exposed through the backend as UpdateConfig(), over IPC as
 POST /config/update (mapping the sentinel to HTTP 409), and on the CLI
 as `lantern update-config`.

* don't send config request if one is in flight

* save country and feature overrides env var to settings in ipc patch

* deps: bump sing-box-minimal to v1.12.22-lantern (#435)

Picks up getlantern/sing-box-minimal#41: seed NetworkManager's interface
list synchronously from the platform at startup, fixing the "no available
network interface" flood after Android VPN stop→start cycles (Freshdesk
#173507: 24k errors in 3 min on a Russian user's Lenovo K12 Pro).

* vpn: close MutableGroupManager on tunnel close (#432) (#437)

* vpn: close MutableGroupManager on tunnel close

The tunnel's MutableGroupManager owns a removalQueue goroutine that holds a
reference to the sing-box OutboundManager. Before this change, tearing down
the tunnel (e.g. on SetSmartRouting, routing-mode toggle, or any
VPNClient.Restart) closed libbox but never closed mgm — so its removalQueue
survived into the new tunnel lifecycle, still pointing at the old, already-
Close()'d OutboundManager. Each 5s tick, the queue drained a pending tag and
called outMgr.Remove, which panics inside sing-box-minimal because Close()
nils m.outbounds but leaves m.outboundByTag populated, so the tag is "found"
but the slice lookup returns -1. lantern-box #80 recovered the panic, so it
surfaced as "panic during outbound/endpoint removal error invalid inbound
index" spam rather than a crash.

Register mgm.Close() as a tunnel closer so the queue goroutine exits when
the tunnel does.

Freshdesk #173359, #173158.



* vpn: address PR review — prepend mgm closer, alias groups import

- Prepend the MutableGroupManager closer so it runs before libbox/sing-box
  managers are torn down, shrinking the window where the removalQueue could
  fire against a closing OutboundManager (Copilot review).
- Alias the lantern-box/adapter/groups import as lbgroups to avoid shadowing
  the existing local 'groups' variable in TestUpdateServers (Copilot review).



* vpn: trim comments



---------


(cherry picked from commit 07b948a)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(3263): simplify ruleset guards and strengthen test coverage

 Remove unreachable post-normalization length checks in buildOptions — if
 normalization returns a non-empty slice, ToOptions always yields entries
 — and move the "no valid rules" warning to the else branch that catches
 when input was present but normalization dropped everything.

 Expand TestBuildOptions_Rulesets with a "direct" smart-routing category
 in the test fixture, isolate each subtest with its own config to prevent
 mutation leaks, and drive the ad-block expectation off
 AdBlockRules.ToOptions to avoid RejectActionOptions default-field
 mismatches.

* fix(3265): gate auto-selected polling on VPN connect

 Subscribe to VPN status events and start AutoSelectedChangeListener only
 once the tunnel is Connected, instead of launching it eagerly from
 Start(). Also unexport startAutoSelectedListener since it's only called
 internally.

* emit auto selected server event after url tests

* bump go to v1.26.2

Go v1.26.2 includes a patch to CGo that addresses some of
the bulkBarrierPreWrite panics.

* country env override, urltest history ownership, dep bumps

 - backend: skip config-response country write when RADIANCE_COUNTRY is
   set so the env override is respected for issue reports.
 - vpn: own URLTestHistoryStorage on the tunnel, registering one if the
   context doesn't already carry it, instead of going through
   clashServer.HistoryStorage().
 - deps: bump keepcurrent and lantern-box.

* refactor(ipc): split event streams by platform with retry and local-only support

 Move SSE event stream methods (VPNStatusEvents, AutoSelectedEvents, ConfigEvents,
 DataCapStream) into platform-specific files. Add sseRetryLoop for automatic
 reconnection with backoff, and gate DataCapStream on VPN connected status.
 Mobile builds subscribe to in-process events and short-circuit SSE when the
 client is in local-only mode.

* Fix sign up issue

* backend: treat ConnectVPN while connected as a server swap (#439)

When the Flutter UI picks a new server or toggles Smart Routing while the
tunnel is already up, it calls ConnectVPN with the new tag expecting a
seamless outbound swap. The backend was unconditionally invoking
VPNClient.Connect, which returns ErrTunnelAlreadyConnected, surfacing to
the user as "start service failed: ipc: status 500: failed to connect
VPN: tunnel already connected".

Short-circuit to selectServer when Status() == Connected so the active
outbound switches in place without tearing down and rebuilding the
tunnel, matching the UI's expectation and the comment in
lib/features/vpn/server_selection.dart.

Fixes the IPC errors documented in getlantern/engineering#3291 for
Smart Routing while connected (free) and server selection while connected
(pro).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Revert "backend: treat ConnectVPN while connected as a server swap (#439)"

This reverts commit 42d491c.
SelectServer should be used to manually switch servers while connected.

* add documentation for PlatformInterface

* refactor(vpn): own VPN status on the client so restarts span tunnels

Restart set Restarting on the current tunnel, then close() dropped it
and start() built a new one, so the marker was orphaned on a
torn-down object and observers saw Disconnected -> Connecting ->
Connected instead of Restarting. The platform path has the same
shape because RestartService drives Disconnect + Connect.

Move status to VPNClient (atomic.Value plus a setStatus guard that
lets only Connected or ErrorStatus succeed Restarting). start/close
bracket the tunnel call with the appropriate transitions; tunnel no
longer carries VPNStatus at all, and selectMode gates on lbService.

Also fold Close into Disconnect, move PostServiceClose into close(),
rename ClearNetErrorState to AttemptFixNetState, and collapse the
duplicated tests into subtests.

* vpn: instrument tunnel.start phases + VPNClient.Restart (#443)

* vpn: instrument tunnel.start phases + VPNClient.Restart (#3299)

Port of #442 to the refactor branch. Adds child spans around the phases
inside tunnel.start so we can attribute the 10s+ tail observed on
/service/start (max 11.25s across 170 calls in 24h, matching Freshdesk
#173696).

This branch's in-process VPNClient.Restart also had no span — the whole
settings-toggle restart path was invisible in SigNoz. Wrapped it
end-to-end so restart latency shows up alongside connect latency, with
a path=direct|platform_ifce attribute to distinguish the two flows.

- VPNClient.Restart span wrapping close+start (new)
- tunnel.start span (options_size, platform, is_restart attributes)
- tunnel.init + tunnel.connect spans
- child spans: libbox.Setup, libbox.NewServiceWithContext,
  libbox.BoxService.Start, newMutableGroupManager

Note: this branch has no loadURLTestHistory call (urltest history is
pulled from context via service.FromContext, not from disk), so that
phase is absent compared to main. The set of span names otherwise
matches #442 so dashboards built against main work here too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* vpn: address review feedback on #443

Three fixes from Copilot:

1. tunnel.init: record errors on the init span via deferred closure
   on the named return — previously failures only showed up on child
   spans (libbox.Setup, libbox.NewServiceWithContext) but the phase
   span itself stayed green.

2. tunnel.connect: same issue — the panic-recovery path sets the
   named err, but the span wasn't marked errored. Added a deferred
   error-recording closure before the recover closure so the recover
   runs first (LIFO) and the span-recording sees the post-recover err.

3. tunnel.start is_restart attribute: VPNClient.Restart creates a
   fresh tunnel{} via c.start, so t.status is always the zero value
   (never Restarting) when t.start is called — is_restart was always
   false. Replaced the status sniff with an explicit isRestart
   parameter threaded through VPNClient.start → tunnel.start.
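
The defer ordering in item 2 relies on deferred calls running last-in, first-out. A small sketch, with a hypothetical `span` type standing in for the tracing span:

```go
package main

import (
	"errors"
	"fmt"
)

// span is a hypothetical stand-in for a tracing span.
type span struct{ recorded error }

func (s *span) RecordError(err error) { s.recorded = err }

// connect registers the error-recording defer first and the recover defer
// second. Deferred calls run in reverse order, so recover runs first and
// sets err, and the recording closure then sees the post-recover value.
func connect(s *span, work func()) (err error) {
	defer func() {
		if err != nil {
			s.RecordError(err)
		}
	}()
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic during connect: %v", r)
		}
	}()
	work()
	return nil
}

func main() {
	s := &span{}
	err := connect(s, func() { panic("boom") })
	fmt.Println(err)
	fmt.Println(errors.Is(s.recorded, err))
}
```

If the two defers were registered in the opposite order, the recording closure would run while `err` was still nil and the span would stay unmarked.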

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(vpn): treat the empty string as AutoSelect in SelectServer

* fix(backend): treat empty tag as AutoSelect in LocalBackend.SelectServer (#444)

fac9089 normalized the empty-string → AutoSelectTag convention in
VPNClient.SelectServer, but LocalBackend.SelectServer — which wraps
vpnClient.SelectServer and then performs its own settings update —
still tag-compared against vpn.AutoSelectTag directly. With tag == ""
the wrapper's vpnClient.SelectServer call succeeds (fac9089 handles it),
then the outer

    if tag == vpn.AutoSelectTag { ...auto path... return nil }

check is false (tag is still "") and execution falls through to
srvManager.GetServerByTag(""), which isn't found, returning
"no server found with tag " (with trailing space). The IPC layer
propagates the error as HTTP 500.

Reproduced on Lantern 9.0.30 (getlantern/lantern@6de3c9aa9 refactor)
when the user clicks Smart from a live tunnel:

    ffi.go:startVPN → c.ConnectVPN("")
    → LanternCore.ConnectVPN → vpn_tunnel.ConnectToServer
    → VPNStatus == Connected → client.SelectServer(ctx, "")
    → POST /server/selected {Tag: ""}
    → LocalBackend.SelectServer("") ← bug site

Surfaces as "start service failed: ipc: status 500: no server found
with tag " in the Dart UI.

Fix: normalize tag == "" to vpn.AutoSelectTag at the top of
LocalBackend.SelectServer, mirroring the same normalization in
LocalBackend.ConnectVPN. Finishes fac9089's intent by aligning the
outer wrapper with VPNClient.SelectServer's behavior.
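
In sketch form, the fix amounts to normalizing once at the top of the wrapper. The `AutoSelectTag` value, the `normalizeTag` helper, and the map-based lookup below are assumptions for illustration, not the LocalBackend code.

```go
package main

import "fmt"

// AutoSelectTag is an assumed value for the auto-select sentinel.
const AutoSelectTag = "auto"

// normalizeTag maps the empty string onto the auto-select sentinel.
func normalizeTag(tag string) string {
	if tag == "" {
		return AutoSelectTag
	}
	return tag
}

// selectServer normalizes first, so the auto path is taken for "" as well.
func selectServer(tag string, known map[string]bool) error {
	tag = normalizeTag(tag)
	if tag == AutoSelectTag {
		return nil // auto path: no lookup by tag needed
	}
	if !known[tag] {
		return fmt.Errorf("no server found with tag %s", tag)
	}
	return nil
}

func main() {
	known := map[string]bool{"tokyo-1": true}
	fmt.Println(selectServer("", known))
	fmt.Println(selectServer("tokyo-1", known))
	fmt.Println(selectServer("missing", known))
}
```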

Internal tester report: getlantern/lantern's Freshdesk #173773.

Co-authored-by: Adam Fisk <afisk@mini.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* centralize file permissions in common/fileperm

Introduce a build-tagged fileperm.File constant so application-owned files
use 0600 on Linux, Windows, and standalone macOS, and 0644 on mobile and
non-standalone macOS where other sandbox processes need read access.

* docs: prune redundant doc comments and document AGENTS.md style

Drop comments that merely restate identifier names, and tighten the few
that remain to document contracts rather than mechanism. AGENTS.md now
records the comment and Go doc-comment guidelines the cleanup applies.

* feat(vpn): persist URL test results across tunnel close/open

Seed the tunnel's URL test history storage from servers.Server on
init so prior latency results survive reconnects, and coalesce hook
notifications into a periodic flush so per-result writes don't
re-marshal the servers file for each parallel test completion.
UpdateURLTestResults now persists to disk.

* feat(vpn): wire InitialServer for startup server selection

InitialServer fixes the outbound selected at tunnel start, replacing the
post-Connect SelectServer round-trip; a stub clashServer plus a CacheFile
wrapper own the selection so libbox's on-disk last-selected value can't
override it. Also re-attaches the URL-test listener on Connected to fix
a race where unordered Restarting/Connected events could leave it bound
to a closed storage.

* backend: stop logging ErrTunnelAlreadyConnected as an error after config (#446)

When a fresh /v1/config-new response arrives while the VPN is up:

  1. setServers(list, true) runs first, which calls
     vpnClient.UpdateOutbounds(list) → tunnel.updateOutbounds → addOutbounds.
     addOutbounds loads the new outbounds into the running sing-box,
     installs the bandit URL overrides on the AutoSelect group via
     mutGrpMgr.SetURLOverrides, and (if any overrides were present)
     synchronously triggers an immediate URL test cycle via
     mutGrpMgr.CheckOutbounds — see vpn/tunnel.go:436-450.
  2. Then RunOfflineURLTests() runs and is gated by
     `if c.tunnel != nil` in vpn.go:462, returning
     ErrTunnelAlreadyConnected.

So the offline pre-warm is intentionally skipped while the tunnel is
up — the in-tunnel path already covered it. But we were logging the
expected sentinel as level=ERROR, which made it look like URL tests
weren't running after a config update. They are: just via the running
sing-box's URLTest selector instead of the offline pre-warm code path.

Skip the log when the error is ErrTunnelAlreadyConnected; keep it for
genuine failures (e.g. "offline tests already running"). Behavior is
unchanged — just stops a misleading ERROR line that's been showing up
on every config refresh while connected.
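
The filter described above might look roughly like the following. This is a minimal sketch: the two error variables are local stand-ins for the real sentinels, and shouldLogURLTestError is a hypothetical helper name, not a function from the repository.

```go
package main

import (
	"errors"
	"fmt"
	"log/slog"
)

// Local stand-ins for the sentinels mentioned in the commit message; the
// real definitions live in radiance.
var (
	ErrTunnelAlreadyConnected = errors.New("tunnel already connected")
	errOfflineTestsRunning    = errors.New("offline tests already running")
)

// shouldLogURLTestError reports whether err is a genuine failure. The
// "already connected" sentinel is expected while the tunnel is up, so it
// is filtered out. errors.Is also matches wrapped sentinels.
func shouldLogURLTestError(err error) bool {
	return err != nil && !errors.Is(err, ErrTunnelAlreadyConnected)
}

func main() {
	results := []error{
		nil,
		fmt.Errorf("pre-warm skipped: %w", ErrTunnelAlreadyConnected),
		errOfflineTestsRunning,
	}
	for _, err := range results {
		if shouldLogURLTestError(err) {
			slog.Error("offline URL tests failed", "error", err)
		}
	}
}
```

Using errors.Is rather than == keeps the filter working if a caller later wraps the sentinel with additional context.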

Co-authored-by: Adam Fisk <afisk@mini.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: require subagent verification of comment edits

Self-review by the Claude that wrote a comment is a known failure mode
and doesn't catch the violations the Code Comments checklist is meant
to prevent. Mandate spawning a code-reviewer subagent on the diff
instead.

* kindling: switch from getlantern/fronted to getlantern/domainfront (#426)

* kindling: switch from getlantern/fronted to getlantern/domainfront

domainfront is the clean-room successor to fronted: no global state,
context-driven lifecycle, atomic config replace, and meaningfully lower
allocation pressure on Android. See getlantern/domainfront for details.

- kindling/fronted/fronted.go: NewFronted now returns a *domainfront.Client
  instead of a fronted.Fronted. Drops the panicListener parameter
  (domainfront has no equivalent — its goroutines are owned by the Client
  and shut down cleanly via the context/Close).
- Fetch the initial fronted.yaml.gz synchronously (30s timeout) via the
  smart dialer, preserving the pre-existing behavior of bypassing DNS-level
  blocking for raw.githubusercontent.com on the initial download. The same
  smart-dialer-backed HTTP client is reused for domainfront's internal 12h
  config refresh loop.
- Introduce a small bypassDialer struct wrapping bypass.DialContext so it
  satisfies domainfront.Dialer (which takes an interface, not a function).
- Update callers in kindling/client.go and config/fetcher_test.go for the
  new signature.

Requires the matching kindling bump (getlantern/kindling#31).
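
To illustrate the adapter pattern behind bypassDialer, here is a minimal sketch. The Dialer interface below is an assumed shape for domainfront.Dialer (its exact method set is not shown in this PR), and stubDial is a hypothetical stand-in for bypass.DialContext.

```go
package main

import (
	"context"
	"fmt"
	"net"
)

// Dialer is an ASSUMED shape for domainfront.Dialer; check the real
// package for the actual method set.
type Dialer interface {
	DialContext(ctx context.Context, network, addr string) (net.Conn, error)
}

// dialFunc matches the signature of a plain dial function.
type dialFunc func(ctx context.Context, network, addr string) (net.Conn, error)

// bypassDialer adapts a dial function to the interface, since a bare
// function value has no methods and cannot satisfy it directly.
type bypassDialer struct {
	dial dialFunc
}

func (d bypassDialer) DialContext(ctx context.Context, network, addr string) (net.Conn, error) {
	return d.dial(ctx, network, addr)
}

// Compile-time check that the adapter satisfies the interface.
var _ Dialer = bypassDialer{}

// stubDial is a hypothetical stand-in for bypass.DialContext that never
// opens a connection.
func stubDial(ctx context.Context, network, addr string) (net.Conn, error) {
	return nil, fmt.Errorf("stub dialer: not connecting to %s/%s", network, addr)
}

func main() {
	var d Dialer = bypassDialer{dial: stubDial}
	_, err := d.DialContext(context.Background(), "tcp", "example.com:443")
	fmt.Println(err)
}
```

An alternative is to declare the method directly on the function type, as net/http does with HandlerFunc; the struct form leaves room for extra fields later.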

* fronted: embed fronted.yaml.gz as last-resort initial config

domainfront itself doesn't embed a config — its doc says the caller
provides the "initial (typically embedded) configuration". The old
getlantern/fronted package we're replacing did embed it. Without an
embedded fallback, a fresh install where raw.githubusercontent.com is
blocked from the first boot can't initialize at all (fetch fails, no
on-disk cache from prior run, no embedded copy).

Add //go:embed fronted.yaml.gz and have loadCachedConfig fall through
to the embedded bytes when both the live fetch and on-disk cache miss.
Refresh the committed copy periodically from the getlantern/fronted
main branch — same maintenance pattern the old package had.
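
The fallback order can be sketched as below. This is an illustration under assumptions, not the repository's loadCachedConfig: initialConfig and errFetchBlocked are hypothetical names, the fetch callback stands in for the smart-dialer download, and embeddedConfig is a placeholder for the bytes that a //go:embed directive would supply.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// embeddedConfig is a placeholder. In the real package these bytes would
// come from a //go:embed fronted.yaml.gz directive.
var embeddedConfig = []byte("embedded-config-placeholder")

// errFetchBlocked is a hypothetical error used to simulate a blocked host.
var errFetchBlocked = errors.New("fetch blocked")

// initialConfig tries the live fetch, then the on-disk cache, then the
// embedded copy, and reports which tier supplied the bytes. Persisting a
// successful fetch to the cache is left out of this sketch.
func initialConfig(fetch func() ([]byte, error), cachePath string) ([]byte, string) {
	if fetch != nil {
		if b, err := fetch(); err == nil && len(b) > 0 {
			return b, "live"
		}
	}
	if b, err := os.ReadFile(cachePath); err == nil && len(b) > 0 {
		return b, "cache"
	}
	return embeddedConfig, "embedded"
}

func main() {
	blocked := func() ([]byte, error) { return nil, errFetchBlocked }

	// Simulates a fresh install: the fetch is blocked and no cache exists.
	cachePath := filepath.Join(os.TempDir(), "no-such-dir", "fronted_config.yaml.gz")
	cfg, tier := initialConfig(blocked, cachePath)
	fmt.Printf("tier=%s bytes=%d\n", tier, len(cfg))
}
```

Because the embedded tier cannot fail, the function always returns usable bytes, which is the property a first boot behind a block depends on.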

* ci: daily refresh of embedded fronted.yaml.gz

Mirror the daily updates getlantern/fronted gets from its external
pipeline, so the embedded fallback config stays close to upstream.
The workflow runs at 04:30 UTC (a few hours after the upstream ~00:15 UTC
commit), downloads the latest fronted.yaml.gz, validates that it parses
as gzip, and opens a PR if it differs from the committed copy.

Manual workflow_dispatch is also enabled for ad-hoc refreshes.

* ci: direct-commit fronted.yaml.gz refresh instead of opening PR

A daily PR carrying a known-trusted gzipped binary from another repo is
just merge-button noise, because the file is already validated upstream. Drop
peter-evans/create-pull-request and have the workflow commit/push
directly to the default branch when the file changes.

* better test and code cleanup

* kindling: switch enabledTransports map to typed kindling.TransportName

- Bump kindling pin to pull in the new TransportName type and constants.
- Replace string-keyed enabledTransports with map[kindling.TransportName]bool
  keyed by TransportDomainfront / TransportSmart / TransportAMP /
  TransportDNSTunnel — typo-proof and self-documenting.
- Promote EnabledTransports and SetKindling to exported names so
  cmd/kindling-tester (and any future external tools) can drive the
  package without relying on package-private state.
- cmd/kindling-tester: cast TRANSPORT env var to kindling.TransportName;
  README updated to reflect the renamed transports (proxyless → smart,
  fronted → domainfront).
- Subtest names in client_test.go now match the actual underlying
  kindling transport names instead of the old radiance labels.
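
A small sketch of the typed-key pattern follows. The type and constant names mirror those listed above, but they are redeclared locally here, and the string values are assumptions (only "smart" and "domainfront" are suggested by the README rename). parseTransport is a hypothetical helper that adds validation on top of the plain cast.

```go
package main

import "fmt"

// TransportName is a local stand-in for kindling.TransportName.
type TransportName string

// The string values below are assumptions made for this sketch.
const (
	TransportDomainfront TransportName = "domainfront"
	TransportSmart       TransportName = "smart"
	TransportAMP         TransportName = "amp"
	TransportDNSTunnel   TransportName = "dnstunnel"
)

// enabledTransports is keyed by the typed name, so a plain string
// variable cannot be used as a key without an explicit conversion.
var enabledTransports = map[TransportName]bool{
	TransportDomainfront: true,
	TransportSmart:       true,
	TransportAMP:         false,
	TransportDNSTunnel:   false,
}

// parseTransport converts external input (for example an environment
// variable) and reports whether it names a known transport.
func parseTransport(s string) (TransportName, bool) {
	name := TransportName(s)
	_, known := enabledTransports[name]
	return name, known
}

func main() {
	for _, raw := range []string{"smart", "proxyless"} {
		name, known := parseTransport(raw)
		fmt.Printf("%q known=%v enabled=%v\n", name, known, enabledTransports[name])
	}
}
```

Note that an untyped string literal still converts implicitly to TransportName, so the protection against typos comes from using the named constants rather than from the type alone.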

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* go.mod: repin kindling to merged main after kindling#31

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fronted: bound config download size, atomic cache write, workflow concurrency

Address PR review feedback:
- io.LimitReader caps fetched fronted.yaml.gz at 10 MiB so a misconfigured
  or compromised endpoint can't OOM the process.
- writeFileAtomic persists the cache via temp+rename so a crash mid-write
  can't leave a truncated cache that fails to parse on next boot.
- refresh-fronted-config.yml gets a concurrency group and a 3-attempt
  rebase-and-retry on push, so cron + manual dispatch races and concurrent
  commits to the target branch don't leave the embedded config stale.
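
The two hardening steps can be sketched as follows. This is an illustrative version using only the standard library: readCapped, writeFileAtomic, errTooLarge, and demo are names chosen for the sketch, and it omits the Windows handling that a production helper may need.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

const maxConfigBytes = 10 << 20 // 10 MiB cap from the commit message

var errTooLarge = errors.New("config exceeds size cap")

// readCapped reads at most limit bytes. Reading limit+1 lets it tell an
// oversized body apart from one that is exactly at the limit.
func readCapped(r io.Reader, limit int64) ([]byte, error) {
	b, err := io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return nil, err
	}
	if int64(len(b)) > limit {
		return nil, errTooLarge
	}
	return b, nil
}

// writeFileAtomic writes to a temp file in the target directory, syncs
// it, and renames it into place, so readers never see a partial file.
func writeFileAtomic(path string, data []byte, perm os.FileMode) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".tmp-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // fails harmlessly after a successful rename
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	if err := os.Chmod(tmp.Name(), perm); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// demo runs both steps against a throwaway directory and returns what
// ends up on disk.
func demo(payload []byte, limit int64) (string, error) {
	dir, err := os.MkdirTemp("", "fronted-sketch-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	data, err := readCapped(bytes.NewReader(payload), limit)
	if err != nil {
		return "", err
	}
	path := filepath.Join(dir, "fronted_config.yaml.gz")
	if err := writeFileAtomic(path, data, 0o644); err != nil {
		return "", err
	}
	got, err := os.ReadFile(path)
	return string(got), err
}

func main() {
	got, err := demo([]byte("small payload"), maxConfigBytes)
	fmt.Println(got, err)
	_, err = demo([]byte("oversized"), 4)
	fmt.Println(err)
}
```

Creating the temp file in the same directory as the target matters: a rename across filesystems is not atomic and may fail outright.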

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fronted: reuse common/atomicfile.WriteFile for cache persistence

Drop the inline writeFileAtomic helper added in 462a23e in favor of
common/atomicfile.WriteFile, which is the convention used by config,
settings, dnstt, vpn/boxoptions, vpn/split_tunnel, and deviceid.
Same temp+sync+rename guarantee, plus Windows handling that the
inline helper didn't have.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Adam Fisk <afisk@mini.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* reporting: drop sentry, keep slog-only PanicListener

Aligns refactor with main's #418 (remove sentry references) ahead of the
refactor → main merge in PR #370. Init() and PanicListener() keep their
existing signatures so callers (kindling.WithPanicListener, common.Init)
don't change. PanicListener now logs via slog at log.LevelPanic with the
captured stack instead of CaptureMessage + Flush. Init() is a no-op
beyond logging the version.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: tighten comments in kindling and reporting packages

Drop comments that restated identifier names, narrated the next visible
line, or carried stale migration history (Sentry removal, getlantern/fronted
predecessor, defunct Warm() reference). Rewrite remaining doc comments to
lead with the contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Wendel Hime <6754291+WendelHime@users.noreply.github.com>
Co-authored-by: Jigar-f <jigar@getlantern.org>
Co-authored-by: myleshorton <myles@getlantern.org>
Co-authored-by: garmr <garmr@users.noreply.github.com>
Co-authored-by: Myles Horton <afisk@getlantern.org>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Adam Fisk <afisk@mini.local>