perf(fs-view): skip bootstrap index init for non-bootstrap tables to cut S3 HEAD calls#19077
perf(fs-view): skip bootstrap index init for non-bootstrap tables to cut S3 HEAD calls#19077wangxianghu wants to merge 2 commits into
Conversation
|
hi @yihua, do you have time to review this ? |
hudi-agent
left a comment
There was a problem hiding this comment.
Thanks for the contribution! This PR substitutes a NoOpBootstrapIndex for non-bootstrap tables to avoid the two storage.exists() probes (billable S3 HEAD requests) that HFileBootstrapIndex's constructor issues on every file-system-view init. The change is behavior-preserving: useIndex() is final in the base class and already requires a configured bootstrap base path, so when the path is absent every index implementation returns false, and all createReader() call sites are guarded by useIndex() — meaning the NoOp's throwing methods are never reached. No issues flagged from this automated pass — a Hudi committer or PMC member can take it from here for a final review.
cc @yihua
…cut S3 HEAD calls
…cut S3 HEAD calls
Describe the issue this Pull Request addresses
For non-bootstrap tables, skip initializing the real bootstrap index and use a NoOp implementation instead. HFileBootstrapIndex's constructor issues two storage.exists() probes (plus a reflective class load) on every view init; on object stores like S3 each probe is a billable HEAD request, so for the common case where no bootstrap able exists this is a pure waste of latency and money. useIndex() already short-circuits to false when the bootstrap base path is absent, so swapping in NoOpBootstrapIndex is behavior-preserving while cutting those S3 API calls.
Summary and Changelog
Init
BootstrapIndexasNoOpBootstrapIndexwhen is not bootstrap table to avoid call s3 api to save moneyImpact
None
Risk Level
low
Documentation Update
none
Contributor's checklist