Skip to content

perf(fs-view): skip bootstrap index init for non-bootstrap tables to cut S3 HEAD calls#19077

Open
wangxianghu wants to merge 2 commits into
apache:masterfrom
wangxianghu:perf_fsv
Open

perf(fs-view): skip bootstrap index init for non-bootstrap tables to cut S3 HEAD calls#19077
wangxianghu wants to merge 2 commits into
apache:masterfrom
wangxianghu:perf_fsv

Conversation

@wangxianghu

@wangxianghu wangxianghu commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Describe the issue this Pull Request addresses

For non-bootstrap tables, skip initializing the real bootstrap index and use a NoOp implementation instead. HFileBootstrapIndex's constructor issues two storage.exists() probes (plus a reflective class load) on every view init; on object stores like S3 each probe is a billable HEAD request, so for the common case where no bootstrap able exists this is a pure waste of latency and money. useIndex() already short-circuits to false when the bootstrap base path is absent, so swapping in NoOpBootstrapIndex is behavior-preserving while cutting those S3 API calls.

Summary and Changelog

Init BootstrapIndex as NoOpBootstrapIndex when is not bootstrap table to avoid call s3 api to save money

Impact

None

Risk Level

low

Documentation Update

none

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@wangxianghu wangxianghu changed the title perf(bootstrapIndex):Use NoOpBootstrapIndex when is not bootstrap tabl… perf(query):Use NoOpBootstrapIndex when is not bootstrap table to avoid check file exists Jun 26, 2026
@wangxianghu wangxianghu changed the title perf(query):Use NoOpBootstrapIndex when is not bootstrap table to avoid check file exists fix(query):Use NoOpBootstrapIndex when is not bootstrap table to avoid check file exists Jun 26, 2026
@wangxianghu

Copy link
Copy Markdown
Contributor Author

hi @yihua, do you have time to review this ?

@wangxianghu wangxianghu changed the title fix(query):Use NoOpBootstrapIndex when is not bootstrap table to avoid check file exists fix(query): Use NoOpBootstrapIndex when is not bootstrap table to avoid check file exists Jun 26, 2026
@wangxianghu wangxianghu changed the title fix(query): Use NoOpBootstrapIndex when is not bootstrap table to avoid check file exists fix(query): Use NoOpBootstrapIndex when is not bootstrap table to avoid check file exists Jun 26, 2026
@wangxianghu wangxianghu changed the title fix(query): Use NoOpBootstrapIndex when is not bootstrap table to avoid check file exists perf(fs-view): skip bootstrap index init for non-bootstrap tables to cut S3 HEAD calls Jun 26, 2026

@hudi-agent hudi-agent left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ 🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for the contribution! This PR substitutes a NoOpBootstrapIndex for non-bootstrap tables to avoid the two storage.exists() probes (billable S3 HEAD requests) that HFileBootstrapIndex's constructor issues on every file-system-view init. The change is behavior-preserving: useIndex() is final in the base class and already requires a configured bootstrap base path, so when the path is absent every index implementation returns false, and all createReader() call sites are guarded by useIndex() — meaning the NoOp's throwing methods are never reached. No issues flagged from this automated pass — a Hudi committer or PMC member can take it from here for a final review.

cc @yihua

@github-actions github-actions Bot added the size:S PR with lines of changes in (10, 100] label Jun 27, 2026
@hudi-bot

Copy link
Copy Markdown
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@yihua

yihua commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

hi @yihua, do you have time to review this ?

Thanks for raising this PR. Let me take a look. Previously we had this temporary fix #12076 but there was concern around the impact on existing tables.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:S PR with lines of changes in (10, 100]

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants