Client-Originated PIM Register beacon (implementation for RFC-22) #3959

Open: ben-malbeclabs wants to merge 13 commits into main from bc/rfc22-pim-register-impl

@ben-malbeclabs

Contributor

Client-Originated PIM Register beacon (implementation for RFC-22)

Implementation for RFC-22 (PR #3951). On a dual-role (publisher+subscriber) multicast tunnel, the subscriber-side PIM neighbor suppresses the device's pim border-router source injection, so the device never originates the MSDP SA for the published source and it goes dark to the rest of the anycast-RP mesh. This change has doublezerod originate a periodic PIM Register (a "beacon") for its published sources, so the device — receiving a Register — originates the SA regardless of border-router suppression.

Changes

  • pim: PIM Register message serialization + a RegisterSender beacon (periodic, no inbound Register-Stop path; checksum over the first 8 bytes per RFC 7761 §4.9.3; egress pinned to the tunnel via ControlMessage.IfIndex so no 10.0.0.0 route is installed).
  • api: MulticastRpAddress on ProvisionRequest (default 10.0.0.0).
  • services: start the beacon for publishers only.
  • manager/runtime: wire it through; the reconciler validates the built request so the RP defaults (see the bug note below).
  • controller: permit pim any host 10.0.0.0 on publisher tunnels' SEC-USER-PUB-MCAST-IN ACL so the client Register reaches the RP; pim ipv4 border-router retained as a backstop.
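As a rough illustration of the serialization described in the pim bullet, the sketch below builds an 8-byte PIM Register header and checksums only those 8 bytes. The names marshalRegister and internetChecksum are hypothetical and are not taken from this PR; the actual RegisterSender code may be structured differently.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	pimVersion      = 2 // PIMv2
	pimTypeRegister = 1 // Register message type
	registerHdrLen  = 8 // PIM header (4 bytes) + Register flags word (4 bytes)

	flagBorder uint32 = 1 << 31 // B bit
	flagNull   uint32 = 1 << 30 // N bit (Null-Register)
)

// internetChecksum computes the standard one's-complement Internet checksum.
func internetChecksum(b []byte) uint16 {
	var sum uint32
	for i := 0; i+1 < len(b); i += 2 {
		sum += uint32(binary.BigEndian.Uint16(b[i:]))
	}
	if len(b)%2 == 1 {
		sum += uint32(b[len(b)-1]) << 8
	}
	for sum>>16 != 0 {
		sum = (sum & 0xffff) + (sum >> 16)
	}
	return ^uint16(sum)
}

// marshalRegister (hypothetical helper) prepends a Register header to the
// encapsulated datagram. The checksum covers the first 8 bytes only and
// excludes the encapsulated data.
func marshalRegister(flags uint32, inner []byte) []byte {
	pkt := make([]byte, registerHdrLen+len(inner))
	pkt[0] = pimVersion<<4 | pimTypeRegister
	binary.BigEndian.PutUint32(pkt[4:8], flags)
	binary.BigEndian.PutUint16(pkt[2:4], internetChecksum(pkt[:registerHdrLen]))
	copy(pkt[registerHdrLen:], inner)
	return pkt
}

func main() {
	// Placeholder inner bytes; a real Register carries a full IP datagram.
	pkt := marshalRegister(0, []byte{0x45, 0x00})
	fmt.Printf("% x\n", pkt[:registerHdrLen]) // prints "21 00 de ff 00 00 00 00"
}
```

With both flags clear, the header checksum is always 0xdeff because the encapsulated payload does not contribute to it.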

How it works

The client encapsulates its existing heartbeat into a PIM Register and unicasts it to the RP over the GRE tunnel. The device, as the anycast RP, learns the source via the Register (C flag), sets the "may notify MSDP" (N) flag, and originates the SA — the exact step border-router can't do when a PIM neighbor is present.
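To make the encapsulation step concrete, here is a minimal sketch of the inner datagram that a Register would carry: an IPv4 header, a UDP header, and the 4-byte heartbeat payload. buildInnerDatagram is a hypothetical helper, not the PR's code; the TTL value and the zero (omitted) UDP checksum are assumptions. The addresses, port, and payload are the ones visible in the capture in this description.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"net"
)

// internetChecksum computes the standard one's-complement Internet checksum.
func internetChecksum(b []byte) uint16 {
	var sum uint32
	for i := 0; i+1 < len(b); i += 2 {
		sum += uint32(binary.BigEndian.Uint16(b[i:]))
	}
	if len(b)%2 == 1 {
		sum += uint32(b[len(b)-1]) << 8
	}
	for sum>>16 != 0 {
		sum = (sum & 0xffff) + (sum >> 16)
	}
	return ^uint16(sum)
}

// buildInnerDatagram (hypothetical) builds the IPv4+UDP datagram that gets
// encapsulated in the Register: source -> multicast group, same port both ways.
func buildInnerDatagram(src, group net.IP, port uint16, payload []byte) ([]byte, error) {
	s, g := src.To4(), group.To4()
	if s == nil || g == nil {
		return nil, errors.New("source and group must be IPv4")
	}
	const ipHdr, udpHdr = 20, 8
	udpLen := udpHdr + len(payload)
	pkt := make([]byte, ipHdr+udpLen)

	pkt[0] = 0x45 // IPv4, 20-byte header
	binary.BigEndian.PutUint16(pkt[2:4], uint16(len(pkt)))
	pkt[8] = 32 // TTL: an assumption, not taken from the PR
	pkt[9] = 17 // UDP
	copy(pkt[12:16], s)
	copy(pkt[16:20], g)
	binary.BigEndian.PutUint16(pkt[10:12], internetChecksum(pkt[:ipHdr]))

	udp := pkt[ipHdr:]
	binary.BigEndian.PutUint16(udp[0:2], port)
	binary.BigEndian.PutUint16(udp[2:4], port)
	binary.BigEndian.PutUint16(udp[4:6], uint16(udpLen))
	// udp[6:8] stays zero: UDP checksum omitted (permitted over IPv4; assumption).
	copy(udp[udpHdr:], payload)
	return pkt, nil
}

func main() {
	inner, err := buildInnerDatagram(
		net.IPv4(148, 51, 120, 0), net.IPv4(233, 84, 178, 0),
		5765, []byte{'D', 'Z', 0x00, 0x01})
	if err != nil {
		panic(err)
	}
	// prints "inner datagram: 32 bytes, UDP length 12"
	fmt.Printf("inner datagram: %d bytes, UDP length %d\n",
		len(inner), binary.BigEndian.Uint16(inner[24:26]))
}
```

The resulting 32 bytes would then be appended after the 8-byte Register header and unicast to the RP address over the tunnel.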

Verification

Unit tests

All client suites (pim, api, services, manager) and the controller golden suite pass (run in a Linux container; the client is Linux-only).

A bug the e2e caught (and this PR fixes)

End-to-end testing surfaced that the reconciler path (buildProvisionRequest) never called Validate(), so in production MulticastRpAddress was nil, the Register failed with "missing address", and nothing was ever sent (the unit tests called Validate() manually and so missed it). Fixed by validating the reconciler-built request, with a regression test (TestReconcile_ProvisionMulticast_DefaultsRpAddress) that drives the real reconcile path and asserts that the RP address passed to the register sender is 10.0.0.0.
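The shape of the fix can be sketched as follows. The struct here is a deliberately reduced, hypothetical stand-in for api.ProvisionRequest (the real type has more fields), and the function signatures are assumptions; only the behavior of defaulting a nil RP address inside Validate() and calling it from the builder is taken from this description.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// ProvisionRequest is a reduced, hypothetical stand-in for api.ProvisionRequest.
type ProvisionRequest struct {
	MulticastRpAddress net.IP
}

// Validate applies defaults and rejects malformed values.
func (r *ProvisionRequest) Validate() error {
	if r.MulticastRpAddress == nil {
		r.MulticastRpAddress = net.IPv4(10, 0, 0, 0) // default anycast RP
	}
	if r.MulticastRpAddress.To4() == nil {
		return errors.New("multicast RP address must be IPv4")
	}
	return nil
}

// buildProvisionRequest mirrors the fixed reconciler path: build into a
// local variable, then validate before returning so defaults are applied.
func buildProvisionRequest() (*ProvisionRequest, error) {
	req := &ProvisionRequest{}
	if err := req.Validate(); err != nil {
		return nil, err
	}
	return req, nil
}

func main() {
	req, err := buildProvisionRequest()
	if err != nil {
		panic(err)
	}
	fmt.Println(req.MulticastRpAddress) // prints "10.0.0.0"
}
```

Putting the default inside Validate() means any path that skips validation silently loses it, which is why the regression test drives the reconciler rather than validating by hand.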

End-to-end on real cEOS — the Register on the wire

Captured on the client underlay (eth0), eth0-gre-run2.pcap frame 242 shows the complete protocol stack from RFC-22: a GRE-encapsulated PIM Register to the RP, encapsulating the original heartbeat datagram.

Frame 242: 98 bytes on wire
Ethernet II, Src: a6:d0:b4:f7:af:24, Dst: 56:c3:e2:c2:b2:ee
    Type: IPv4 (0x0800)
Internet Protocol Version 4, Src: 9.210.180.100, Dst: 9.210.180.8   ← GRE delivery (underlay)
    Protocol: Generic Routing Encapsulation (47)
Generic Routing Encapsulation (IP)
    Protocol Type: IP (0x0800)
Internet Protocol Version 4, Src: 169.254.0.1, Dst: 10.0.0.0         ← the Register, unicast to the RP
    Protocol: PIM (103)
Protocol Independent Multicast
    Type: Register (1)
    PIM Options
        Flags: 0x00000000                                           ← Border=0, Null=0 (data-bearing)
Internet Protocol Version 4, Src: 148.51.120.0, Dst: 233.84.178.0    ← encapsulated original datagram
    Protocol: UDP (17)
User Datagram Protocol, Src Port: 5765, Dst Port: 5765
    Length: 12
Data (4 bytes)
    Data: 445a0001                                                  ← the heartbeat ("DZ" + version)

Protocol stack: eth:ethertype:ip:gre:ip:pim:ip:udp:data — exactly the RFC-22 packet diagram.
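As a sanity check on the capture, the per-layer sizes add up to the 98 bytes reported for frame 242. The sketch below assumes option-free 20-byte IPv4 headers and a 4-byte GRE header with no optional fields; the type and variable names are illustrative only.

```go
package main

import "fmt"

// layer is one header (or payload) in the captured frame.
type layer struct {
	name string
	size int
}

// frame242 lists the layers of the captured Register, outermost first.
var frame242 = []layer{
	{"Ethernet II", 14},
	{"IPv4 (underlay, GRE delivery)", 20},
	{"GRE (no optional fields, assumed)", 4},
	{"IPv4 (Register to RP)", 20},
	{"PIM Register header", 8},
	{"IPv4 (encapsulated datagram)", 20},
	{"UDP", 8},
	{"heartbeat payload", 4},
}

// frameLen sums the sizes of all layers.
func frameLen(layers []layer) int {
	total := 0
	for _, l := range layers {
		total += l.size
	}
	return total
}

func main() {
	fmt.Println(frameLen(frame242)) // prints 98
}
```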

End-to-end on real cEOS: the MSDP SA flood to a second RP

A two-device run (dz1 + dz2, both anycast RP 10.0.0.0) proves the SA propagates end to end. dz1 originates the source from the client Register, then floods it to dz2.

dz1 (the RP that receives the Register), show ip mroute:

233.84.178.0
  0.0.0.0, RP 10.0.0.0, flags: W
    Incoming interface: Register
  148.51.120.0, flags: SNCP
    Incoming interface: Tunnel500
    RPF route: [U] 148.51.120.0/32 via 169.254.0.1

C = learned from a DR via a register, N = may notify MSDP. Before this change, with border-router suppressed on a dual-role tunnel, the entry never carried N and nothing was originated.

dz2 (the remote anycast RP), show ip msdp sa-cache:

MSDP Source Active Cache
(148.51.120.0, 233.84.178.0), RP 10.0.0.0, heard from 9.210.180.8

dz2's show ip mroute carries flags MPE (M = learned via MSDP) and the MSDP session shows SA Count 1. That is the full chain: client Register, dz1 sets N and originates the SA, MSDP floods it, dz2 installs it.

cEOS constraints (checks that are real-hardware only)

  • ACL binding to the tunnel is unsupported on cEOS. The permit pim any host 10.0.0.0 entry renders correctly in the ACL, but ip access-group SEC-USER-PUB-MCAST-IN in on a tunnel returns "not supported on this hardware platform" in the emulator. That is exactly why the template guards the binding behind NoHardware. The per-entry hit-counter check for the Register permit can only run on real hardware.
  • MSDP peering used the directly-connected addresses (9.210.180.8 and 9.210.180.9), not the loopbacks. cEOS in Docker does not carry loopback-sourced MSDP or BGP TCP between the two containers; only directly-connected traffic forwards. Production peers over loopbacks on a real data plane. The re-point is a devnet-only shim to exercise the flood and does not change the control-plane behavior under test.
  • dz2's mroute incoming interface is Null, the documented cEOS multicast data-plane limitation (e2e/docs/CEOS_MULTICAST_LIMITATION.md). The SA flood proven here is control-plane and works regardless.
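For readers unfamiliar with the NoHardware guard mentioned in the first bullet, a simplified sketch using Go's text/template is shown below. The template text, the tunnelData type, and the ID field are hypothetical and far smaller than the controller's real template; only the idea of skipping the access-group binding when NoHardware is set comes from this description.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// tunnelData is a hypothetical, reduced view of the template input.
type tunnelData struct {
	ID         int
	NoHardware bool // true on emulated platforms such as cEOS
}

// tunnelTmpl binds the ACL to the tunnel only on real hardware.
const tunnelTmpl = `interface Tunnel{{.ID}}
{{- if not .NoHardware}}
   ip access-group SEC-USER-PUB-MCAST-IN in
{{- end}}
`

// renderTunnel executes the template and returns the rendered config.
func renderTunnel(d tunnelData) (string, error) {
	t, err := template.New("tunnel").Parse(tunnelTmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, d); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	for _, noHW := range []bool{false, true} {
		out, err := renderTunnel(tunnelData{ID: 500, NoHardware: noHW})
		if err != nil {
			panic(err)
		}
		fmt.Printf("NoHardware=%v:\n%s", noHW, out)
	}
}
```

The ACL entries themselves render in both cases; only the interface binding is conditional, which matches the emulator behavior described above.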

Backward compatibility

Belt-and-suspenders: pim ipv4 border-router stays as a backstop, and the ACL permit is additive, so every old/new client × old/new device combination is safe (see RFC-22 §Backward Compatibility). No smart-contract change. The new optional mcast_rp_address provision field defaults to 10.0.0.0.

🤖 Generated with Claude Code

ben-malbeclabs and others added 10 commits June 30, 2026 22:14
Implements RegisterSender: periodic PIM Register message sender that
beacons publisher groups to the RP so the RP can originate the source
into MSDP. Includes mock-based unit test exercising sendRegister directly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…cksum test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Export heartbeat.HeartbeatPayload (was unexported heartbeatPayload)
- Add RegisterWriter interface to services/base.go
- Wire register field into MulticastService: ctor param, Setup start,
  Teardown close, UpdateGroups update
- Add mockRegister + TestMulticastSetupStartsRegisterForPublisher;
  remove manager import from test file to avoid arity breakage in Task 5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…anager

Thread services.RegisterWriter through CreateService (new final param),
NetlinkManager struct + NewNetlinkManager (after heartbeat), and the
internal provisionLocked call site. Construct pim.NewRegisterSender()
in runtime/run.go and pass it through NewNetlinkManager. Update all
test call sites (reconciler_test.go, http_test.go) with mock
implementations.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add `permit pim any host 10.0.0.0` to SEC-USER-PUB-MCAST-IN before the
final deny, so unicast PIM Register packets sent by the client to the RP
(10.0.0.0) are not dropped at the inbound tunnel ACL. The belt-and-
suspenders `pim ipv4 border-router` and SEC-USER-SUB-MCAST-IN are
unchanged.
…pub groups

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…RpAddress defaults

The reconciler path called buildProvisionRequest which constructed an
api.ProvisionRequest without calling Validate(). This left
MulticastRpAddress as nil. When the multicast service then called
register.Start with that nil rp, sendRegister built an ipv4.Header
with a nil Dst and WriteTo failed with "missing address".

The HTTP provisioning path was unaffected because ServeProvision calls
Validate() explicitly (internal/manager/http.go:66).

Fix: build the request into a local var in buildProvisionRequest and
call Validate() before returning, mirroring what the HTTP path does.
Validate() defaults nil MulticastRpAddress to 10.0.0.0.

Regression test TestReconcile_ProvisionMulticast_DefaultsRpAddress
exercises a publisher user through the reconciler and asserts that
both ProvisionRequest.MulticastRpAddress and the rp arg passed to
mockRegisterSender.Start equal net.IPv4(10,0,0,0).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ben-malbeclabs ben-malbeclabs self-assigned this Jul 1, 2026
ben-malbeclabs and others added 3 commits June 30, 2026 22:25
The Task 4 rename heartbeatPayload -> HeartbeatPayload missed the test
file references, tripping go-lint typecheck on CI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fix mockRawConn method alignment flagged by go-lint (gofmt).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The controller now renders `permit pim any host 10.0.0.0` in the
device-global SEC-USER-PUB-MCAST-IN ACL (RFC-22), so every e2e agent
config fixture that includes that ACL needs the line. Updates the
multicast, ibrl, and ibrl_with_allocated_addr fixtures to match.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ben-malbeclabs ben-malbeclabs marked this pull request as ready for review July 1, 2026 05:06