MWPW-194951 Add auto-detect Lingo to CaaS bulk publisher #6142

Open
sheridansunier wants to merge 10 commits into adobecom:stage from sheridansunier:MWPW-194951

Conversation

@sheridansunier sheridansunier commented Jun 11, 2026

Contributor

Summary

  • Adds an Auto-detect Lingo checkbox to the bulk publisher UI (next to Dry Run). When checked, the tool consults lingo-site-mapping.json to determine Language First Localization automatically rather than relying on the manual Language First checkbox.
  • Refactors isLingoLangFirstPath in utils.js to return true / false / null (null = origin not yet in the prod mapping), allowing safe handling of data streams that are staged but not yet onboarded to lingo.
  • Removes the hard-coded bacom-only gate; any origin present in the prod mapping is now handled automatically.
  • news is special-cased to always defer to the manual checkbox, since it operates outside the lingo mapping.

Jira Ticket

MWPW-194951

Auto-detect behavior matrix

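| Auto-detect | Origin | In mapping?   | Result                        |
|-------------|--------|---------------|-------------------------------|
| off         | any    | not consulted | languageFirst checkbox value  |
| on          | news   | not consulted | languageFirst checkbox value  |
| on          | other  | yes           | mapping result (true / false) |
| on          | other  | no (null)     | false (non-LFL)               |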

Key points:

  • news never touches isLingoLangFirstPath — it always defers to the manual checkbox.
  • When auto-detect is on and an origin is absent from lingo-site-mapping.json (e.g. a new data stream being staged before prod onboarding), the tool defaults to non-LFL rather than silently misidentifying the locale.
  • The manual languageFirst checkbox remains fully functional as the primary control when auto-detect is off.
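
The matrix above can be read as a small decision function. The sketch below is only an illustration of that reading, not the PR's code: `resolveLanguageFirst` and its parameter names are hypothetical (the PR's actual entry point is `getBulkPublishLangAttr`), and `mappingResult` stands in for whatever `isLingoLangFirstPath` returned for the URL.

```javascript
// Hypothetical helper mirroring the four matrix rows; names are assumptions.
function resolveLanguageFirst({ autoDetect, origin, languageFirst, mappingResult }) {
  // Row 1: auto-detect off, so the manual checkbox is the only control.
  if (!autoDetect) return Boolean(languageFirst);
  // Row 2: news operates outside the lingo mapping and defers to the checkbox.
  if (origin === 'news') return Boolean(languageFirst);
  // Row 4: null means the origin is not in the prod mapping; default to non-LFL.
  if (mappingResult === null) return false;
  // Row 3: the origin is mapped, so the mapping result overrides the checkbox.
  return mappingResult;
}

// An unmapped origin stays non-LFL even when the checkbox is ticked.
console.log(resolveLanguageFirst({
  autoDetect: true, origin: 'new-stream', languageFirst: true, mappingResult: null,
})); // false
```

Checking `mappingResult === null` explicitly, rather than relying on falsiness, keeps the "unknown origin" row visibly separate from the "known non-LFL" row.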

Test plan

  • Unit tests cover all four matrix cases (test/tools/send-to-caas/bulk-publish-to-caas.test.js)
  • Unit tests cover isLingoLangFirstPath return shapes: null (unknown origin), true (LFL baseSite/regional), false (English regional), false (fetch error) (test/blocks/caas/utils.test.js)
  • Manual: open bulk publisher, check Auto-detect Lingo → publish a bacom /de/ page → verify LFL lang/country in CaaS
  • Manual: uncheck Auto-detect Lingo → publish same page with Language First unchecked → verify non-LFL output
  • Manual: publish a news URL with auto-detect on → verify manual Language First checkbox controls the result

🤖 Generated with Claude Code

@jedjedjedM

Contributor

Test mock regionalSites format doesn't match live data (utils.test.js, bulk-publish-to-caas.test.js)

Live data uses leading slashes and spaces:
"/ae_en, /africa, /ca, /be_en, ..."
Both test mocks use bare, space-free values:
regionalSites: 'at' // no slash
regionalSites: 'gb,au' // no slashes, no spaces
If isLocaleInRegionalSites doesn't normalize both formats, the function could silently return wrong results against real data. The mocks should mirror real data to make this guarantee visible and prevent future regressions.
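
One way to make the match format-agnostic is to normalize both the mapping string and the locale before comparing. This is a minimal sketch under assumptions: `parseRegionalSites` and `localeMatchesRegionalSites` are hypothetical names, and the real `isLocaleInRegionalSites` may take different arguments.

```javascript
// Hypothetical normalizer: trims whitespace and strips one leading slash per entry.
function parseRegionalSites(regionalSites) {
  return String(regionalSites || '')
    .split(',')
    .map((site) => site.trim().replace(/^\//, '').toLowerCase())
    .filter(Boolean);
}

// Hypothetical stand-in for isLocaleInRegionalSites; the signature is assumed.
function localeMatchesRegionalSites(regionalSites, locale) {
  const target = String(locale || '').trim().replace(/^\//, '').toLowerCase();
  if (!target) return false;
  // Exact entry match, so 'a' does not match '/africa' or '/ae_en'.
  return parseRegionalSites(regionalSites).includes(target);
}

// Live-style and mock-style values resolve the same way.
console.log(localeMatchesRegionalSites('/ae_en, /africa, /ca, /be_en', '/ca')); // true
console.log(localeMatchesRegionalSites('gb,au', 'au')); // true
```

With a helper like this, test mocks can carry the live format (leading slashes, spaces after commas) and still exercise the same code path as the bare values.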

@jedjedjedM

Contributor

Via: Claude

I think these three items are worth considering:

The news hardcode is redundant but intentional — document it
news is not in site-query-index-map, so isLingoLangFirstPath('news', ...) already returns null. However, the null fallback defaults to false (non-LFL), while the news special case preserves the manual checkbox value. That's a meaningful distinction and the right behavior — but it's not obvious from the code why news gets special treatment. A short comment explaining why would prevent future readers from removing it.

No fetch caching
isLingoLangFirstPath() fetches lingo-site-mapping.json on every call. In a bulk publish job across dozens or hundreds of rows, this fires one CDN request per row. The mapping JSON should be fetched once per session and memoized — a module-level cache variable with a single fetch would suffice.
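
A minimal sketch of the module-level cache described here, assuming the mapping is a single JSON document fetched over HTTP. `getLingoMapping`, its `url` parameter, and the injectable `fetchFn` are illustrative names, not part of the PR.

```javascript
// Module-level cache holding the in-flight or resolved promise (assumed design).
let mappingPromise = null;

// `fetchFn` defaults to the global fetch; it is injectable so tests can stub it.
// The cache is keyed on nothing: it assumes one mapping URL per session.
function getLingoMapping(url, fetchFn = fetch) {
  if (!mappingPromise) {
    mappingPromise = fetchFn(url)
      .then((resp) => {
        if (!resp.ok) throw new Error(`Mapping fetch failed: ${resp.status}`);
        return resp.json();
      })
      .catch((err) => {
        // Clear the cache so a later row can retry after a transient failure.
        mappingPromise = null;
        throw err;
      });
  }
  return mappingPromise;
}
```

Caching the promise rather than the parsed JSON means rows processed concurrently share one request instead of each starting their own before the first response arrives.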

Tri-state return (true / false / null) needs a JSDoc contract
null here means "origin not in mapping" — meaningfully different from false which means "known non-LFL." Callers that check if (!result) will correctly fall back, but the intent is invisible. Add a @returns JSDoc describing all three states.
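
As an illustration of what such a contract could look like, the sketch below documents the three states on a hypothetical pure helper. `classifyLocale` and the `lflLocalesByOrigin` shape are assumptions made for the example; the real `isLingoLangFirstPath` signature may differ. The `false` on an unavailable mapping follows the fetch-error case listed in the test plan.

```javascript
/**
 * Hypothetical pure core of the lookup, shown only to illustrate the contract.
 *
 * @param {Object<string, string[]>|null} lflLocalesByOrigin Assumed shape:
 *   CaaS origin mapped to its Language First locales; null if the fetch failed.
 * @param {string} origin CaaS origin of the page being published.
 * @param {string} locale Locale prefix of the URL, without slashes.
 * @returns {boolean|null}
 *   true  - origin is in the mapping and the locale is Language First (LFL).
 *   false - origin is in the mapping but the locale is not LFL, or the
 *           mapping could not be loaded.
 *   null  - origin is not in the mapping (not yet onboarded).
 */
function classifyLocale(lflLocalesByOrigin, origin, locale) {
  if (!lflLocalesByOrigin) return false;
  if (!Object.prototype.hasOwnProperty.call(lflLocalesByOrigin, origin)) return null;
  return lflLocalesByOrigin[origin].includes(locale);
}

// Callers can make the null branch explicit instead of relying on `!result`.
const lookup = classifyLocale({ bacom: ['de'] }, 'unknown-origin', 'de');
const isLangFirst = lookup === null ? false : lookup;
console.log(lookup, isLangFirst); // null false
```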

Some of these might be intentional or a non-issue, but worth evaluating:

Four origins in site-locales have no entry in site-query-index-map
edu, genuine, cc, and federal all have locale data in site-locales but no caasOrigin in site-query-index-map. The lookup by CaaS origin will never resolve them, so auto-detect will always return null for these — silently falling back to non-LFL. If these origins are active in the bulk publisher, auto-detect is dead code for them. Worth either adding their caasOrigin to the mapping JSON or documenting the gap.

Four site-query-index-map entries have an empty caasOrigin
upp, events-milo, da-genuine, and dexter all have "caasOrigin": "". A lookup by origin string will never match an empty string, so auto-detect effectively does nothing for them either.
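
A small audit can surface entries like these before they reach the bulk publisher. The sketch assumes each site-query-index-map row is an object with a `caasOrigin` string and some identifying key; `siteId` is a hypothetical field name, and the sample rows are illustrative rather than copied from the live JSON.

```javascript
// Returns the identifiers of rows that a lookup by CaaS origin can never match.
// `siteId` is an assumed field name; adjust to the real key in the mapping.
function findUnresolvableSites(siteQueryIndexMap) {
  return siteQueryIndexMap
    .filter((entry) => !String(entry.caasOrigin || '').trim())
    .map((entry) => entry.siteId);
}

// Illustrative rows only.
const sampleRows = [
  { siteId: 'da-bacom', caasOrigin: 'bacom' },
  { siteId: 'upp', caasOrigin: '' },
  { siteId: 'dexter', caasOrigin: '   ' },
];
console.log(findUnresolvableSites(sampleRows)); // [ 'upp', 'dexter' ]
```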

Most non-da-bacom origins only have /fr mapped
da-cc (hawks), da-dc (doccloud), milo, and others only have a single /fr → regional sites entry in site-locales. If these origins publish to /de, /es, /pt, etc., the function will return false (not LFL) — which may be correct today but could silently misclassify if those sites add LFL coverage later. This is a data completeness issue, but worth noting so the team knows auto-detect coverage is narrower than it appears.

sheridansunier and others added 10 commits June 22, 2026 11:58
Rename isLingoEnglishRegionalSite → isLingoLangFirstPath with corrected
logic (positive assertion: locale is in lingo mapping AND not an English
regional site). Restructure getCountryAndLang to route bulk-publisher
requests through getBulkPublishLangAttr first, add IETF fallback for
language-code URL prefixes, and apply per-URL LFL detection in processData
for bacom repos.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add explicit opt-in checkbox (next to Dry Run) that queries the prod
lingo-site-mapping to determine Language First per URL automatically.
Removes the hard-coded bacom-only gate; the mapping is now the sole
source of truth for all hosts. isLingoLangFirstPath returns null when
an origin is absent from the mapping (data stream not yet onboarded), which
triggers non-LFL fallback when auto-detect is on. news always defers
to the manual Language First checkbox regardless of auto-detect state.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…atrix

isLingoLangFirstPath: 5 tests covering null (origin absent), true (LFL
baseSite and regional), false (English regional), false (fetch failure).

getBulkPublishLangAttr: 5 tests covering all four matrix rows — auto-detect
off respects manual checkbox, news always uses manual checkbox, origin in
mapping overrides languageFirst=false, origin not in mapping forces non-LFL
even when languageFirst=true.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… remove duplicate mapping call

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ading slash, spaces)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… format

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…auto opacity style

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@sanrai sanrai requested a review from jedjedjedM June 24, 2026 17:05
github-actions Bot commented Jul 2, 2026

Contributor

This PR has not been updated recently and will be closed in 7 days if no action is taken. Please ensure all checks are passing; https://github.com/orgs/adobecom/discussions/997 provides instructions. If the PR is ready to be merged, please mark it with the "Ready for Stage" label.

@github-actions github-actions Bot added the Stale label Jul 2, 2026

3 participants