126 commits
9ad33d3
feat: bump windows image version for 2026-03B (#8074)
rchincha Mar 13, 2026
609fdad
feat(rcv1p): unify cert bootstrap flow and add Windows CA refresh task
rchincha Mar 16, 2026
b480d82
feat: enhance CA certificates refresh task with endpoint mode based o…
rchincha Mar 18, 2026
a0ee082
feat: add tests for certificate endpoint mode handling in AKS custom …
rchincha Mar 19, 2026
0b781af
feat: simplify certificate endpoint mode handling and refresh task re…
rchincha Mar 19, 2026
3ee2a7a
feat: implement conditional CA certificates refresh task registration…
rchincha Mar 19, 2026
b316838
feat: enhance CA certificates refresh task registration for legacy CS…
rchincha Mar 19, 2026
b738dee
feat: update tests for certificate endpoint mode handling and refresh…
rchincha Mar 20, 2026
4a64c63
feat: refactor test setup functions for improved readability and cons…
rchincha Mar 20, 2026
852b7e6
feat: update Get-CustomCloudCertEndpointModeFromLocation to clarify e…
rchincha Mar 20, 2026
16b6cf3
feat: enhance tests for Should-InstallCACertificatesRefreshTask and G…
rchincha Mar 20, 2026
71d8c41
feat: update cse_cmd.sh and cse_cmd.sh.gtpl to ensure consistent logg…
rchincha Mar 25, 2026
93440be
feat: update CA certificates functions for backward compatibility wit…
rchincha Mar 26, 2026
7e94123
feat: remove deprecated Ubuntu repository initialization logic from i…
rchincha Mar 27, 2026
25fe4b6
Split init-aks-custom-cloud.sh to fix Flatcar/ACL customData size limit
rchincha Apr 2, 2026
e78ca61
feat(e2e): add RCV1P cert mode end-to-end tests
rchincha Apr 13, 2026
daec885
Address PR review feedback: fix multi-subscription, validation, and e…
rchincha Apr 14, 2026
97a1576
Add Windows not-opted-in negative test for RCV1P cert mode
rchincha Apr 14, 2026
a28917b
e2e: add VM instance-level tag update for RCV1P wireserver opt-in
rchincha Apr 16, 2026
38bf707
e2e: use JSON injection for VM profile tags at VMSS creation time
rchincha Apr 16, 2026
b83cf88
e2e: use lightweight PATCH for VM instance tags instead of JSON injec…
rchincha Apr 16, 2026
48bae5a
Revert "e2e: use lightweight PATCH for VM instance tags instead of JS…
rchincha Apr 16, 2026
d0fa085
e2e: use Microsoft.Resources/tags API for VM instance tag patching
rchincha Apr 16, 2026
195e476
e2e: use BeginUpdate + deferred CSE for VM instance tagging
rchincha Apr 16, 2026
a495382
e2e: add feature flag check for RCV1P subscription
rchincha Apr 17, 2026
2c6b703
REVERT ME: poll wireserver IsOptedInForRootCerts with retry loop
rchincha Apr 17, 2026
0d1694a
e2e: always log PlatformSettingsOverride feature flag status
rchincha Apr 17, 2026
f815b79
fix(windows): parse wireserver IsOptedInForRootCerts JSON with Conver…
rchincha Apr 17, 2026
bb594e1
e2e: make RCV1P_SUBSCRIPTION_ID optional with feature flag auto-detec…
rchincha Apr 18, 2026
d40f6e7
e2e: always collect Windows CSE logs (not just on failure)
rchincha Apr 18, 2026
40e7553
fix: add wireserver HTTP error diagnostic logging for cert endpoints
rchincha Apr 19, 2026
9742f2c
e2e: use testDir() for Windows CSE output log path consistency
rchincha Apr 20, 2026
f7a6079
fix(e2e): filter CSE extension to fix empty Windows CSE log files
rchincha Apr 21, 2026
bb5fd8f
fix(e2e): re-fetch VM instance view for fresh CSE extension status
rchincha Apr 21, 2026
2131c91
e2e: trim whitespace from RCV1P_SUBSCRIPTION_ID to fix gating
rchincha Apr 21, 2026
f82563f
e2e: add gen2 Windows RCV1P tests, fix Windows2025 TrustedLaunch
rchincha Apr 22, 2026
f86de39
e2e: switch RCV1P tests to Azure CNI Overlay to fix IP exhaustion
rchincha Apr 22, 2026
bea6624
e2e: revert RCV1P from overlay back to kubenet
rchincha Apr 22, 2026
6ef8fa3
REVERT ME: use dedicated kubenet cluster for RCV1P tests
rchincha Apr 23, 2026
bd71472
REVERT ME: use Azure CNI cluster for Windows RCV1P tests
rchincha Apr 23, 2026
cfa3152
REVERT ME: add wireserver endpoint diagnostics to Windows RCV1P valid…
rchincha Apr 23, 2026
dfc79bb
fix: use correct wireserver JSON field name for rcv1p cert download
rchincha Apr 23, 2026
89afde4
REVERT ME: add azcopy error logging to Windows log collection
rchincha Apr 23, 2026
08954ba
REVERT ME: enable verbose test output for azcopy/wireserver diagnostics
rchincha Apr 23, 2026
d863e83
REVERT ME: canary check to prove whether SSH validators are broken
rchincha Apr 24, 2026
ad7cb38
Remove canary check - validators confirmed working
rchincha Apr 24, 2026
6150672
fix: make wireserver cert retrieval failures fatal on Linux
rchincha Apr 24, 2026
d5927c3
revert: remove diagnostic commits used during RCV1P development
rchincha Apr 25, 2026
0ef6dd0
fix: make wireserver unreachable fatal for RCV1P opt-in check
rchincha Apr 26, 2026
268c891
fix: use RCV1P Azure CNI cluster for Windows tests when explicit subs…
rchincha Apr 27, 2026
147cfe1
fix: replace legacy ca-refresh cron entry with location-aware version
rchincha Apr 27, 2026
8685a99
fix: align Windows wireserver retries to 10 to match Linux parity
rchincha Apr 27, 2026
53e488b
fix: enhance RCV1P opt-in tag handling in VMSS creation process
rchincha Apr 29, 2026
4015224
fix: use Azure CNI cluster for Windows RCV1P tests
rchincha May 6, 2026
85dd9c9
revert: drop 'REVERT ME' cluster switching commits (now superseded)
rchincha May 6, 2026
e1f5b85
revert: drop canary validator and wireserver polling debug commits
rchincha May 6, 2026
9913e0d
feat(e2e): auto-detect RCV1P feature flag on E2E subscription
rchincha May 7, 2026
8ed0fe1
fix(e2e): skip NotOptedIn tests on auto-detected enrolled subscriptions
rchincha May 7, 2026
476efb2
fix(e2e): use caller context in getCustomScriptExtensionStatus
rchincha May 7, 2026
daeab33
fix(e2e): remove TrustedLaunch from non-Gen2 Windows 2025 RCV1P test
rchincha May 7, 2026
970bfba
fix: return code 2 when wireserver is unreachable in is_opted_in_for_…
rchincha May 7, 2026
ce58de7
fix: throw when opted-in but no certs downloaded with -FailOnError
rchincha May 7, 2026
d149247
e2e: use branch-built CSE zip for Windows RCV1P tests
rchincha May 7, 2026
f9e6702
fix: parse wireserver IsOptedInForRootCerts JSON response with jq
rchincha May 8, 2026
a33ed3f
fix(e2e): update BootstrapConfigMutator signatures after rebase
rchincha May 8, 2026
91f842d
fix: fail process_cert_operations when no cert bodies are saved
rchincha May 8, 2026
e833c14
fix: pass repodepot_endpoint explicitly to add_key_ubuntu and add_ms_…
rchincha May 8, 2026
2f2e091
chore(e2e): remove REVERT ME wireserver diagnostic block from Windows…
rchincha May 8, 2026
5270e32
fix: guard against unresolved ADO pipeline variable expressions in RC…
rchincha May 18, 2026
f465e4e
fix: update for main branch API changes (getClusterVNet, remove Windo…
rchincha Jun 1, 2026
de3ffe8
fix: fail fast if LOCATION is empty when installing ca-refresh schedule
rchincha Jun 2, 2026
1e93ea2
e2e: filter transient waagent ProtocolError in ValidateWaagentLog
rchincha Jun 2, 2026
35e1d3d
e2e: simplify RCV1P to single-subscription-per-job model
rchincha Jun 2, 2026
7f82fad
init-aks-custom-cloud: add telemetry events for cert provisioning
rchincha Jun 4, 2026
e612a60
e2e: skip NotOptedIn tests when tags are auto-injected
rchincha Jun 11, 2026
dc31b21
e2e: use ab-e2e-tme-rcv1p variable group for RCV1P pipeline
rchincha Jun 15, 2026
2b8f638
fix: correct typo 'usuable' → 'usable' in chrony comment
rchincha Jun 17, 2026
b55d529
fix: remove duplicate Register-NodeResetScriptTask call in BasePrep
rchincha Jun 17, 2026
45af31e
style: add missing space before inline comment in rcv1p win test
rchincha Jun 17, 2026
d0e9d44
fix: fail hard if legacy CA cert trust store install fails
rchincha Jun 17, 2026
d503293
fix: remove confusing '(true on MSFT tenant)' from displayName
rchincha Jun 18, 2026
0dedd6e
docs: add comment explaining rcv1pTagsAutoInjected=false
rchincha Jun 18, 2026
7fd3907
fix: remove redundant cron schedule from e2e-rcv1p pipeline
rchincha Jun 18, 2026
b6e6361
fix: reword skip message to reference environment, not tenant
rchincha Jun 18, 2026
7d81e39
fix: parse AFEC feature flag response as JSON instead of string contains
rchincha Jun 18, 2026
0f78241
docs: clarify deferred extension pattern is E2E-specific
rchincha Jun 18, 2026
44c32b7
fix: remove vmssResp2 code smell, assign directly to vmssResp
rchincha Jun 18, 2026
2634bb5
fix: add logs_to_events telemetry for chrony restart
rchincha Jun 18, 2026
749c990
fix: guard initAKSCustomCloudRepos with IsAKSCustomCloud in nodecusto…
rchincha Jun 18, 2026
c02ec7c
fix(e2e): expose subscriptionId parameter on e2e-tme.yaml
rchincha Jun 25, 2026
188c615
fix(init-aks-custom-cloud): use fixed-string grep for crontab cleanup
rchincha Jun 25, 2026
abd9e61
refactor(e2e): remove dead per-scenario SubscriptionID override
rchincha Jun 25, 2026
a0f3726
test(parts): add ShellSpec coverage for init-aks-custom-cloud-repos.sh
rchincha Jun 25, 2026
ef75027
fix(e2e): close file immediately in CSE zip walk to avoid FD accumula…
rchincha Jun 25, 2026
0b7ff16
test(cse-windows): re-stub Set-ExitCode after dot-sourcing to prevent…
rchincha Jun 25, 2026
f642106
fix(e2e): override E2E_SUBSCRIPTION_ID pipeline variable with subscri…
rchincha Jun 25, 2026
71b7776
fix(rcv1p): fail hard when installing rcv1p CA certs to trust store f…
rchincha Jun 25, 2026
f056ae6
fix(e2e): use empty default for subscriptionId param to avoid cyclica…
rchincha Jun 25, 2026
0811f20
fix(rcv1p): address copilot review comments
rchincha Jun 25, 2026
e94da52
fix(e2e): override RCV1P-incompatible settings when running TME e2e a…
rchincha Jun 26, 2026
625badf
fix(e2e): detect RCV1P sub via E2E_SUBSCRIPTION_ID_RCV1P instead of h…
rchincha Jun 26, 2026
f20c87e
docs(e2e): expand reviewer comments on RCV1P override block
rchincha Jun 26, 2026
3f1c640
fix(e2e): make blob storage account name subscription-unique
rchincha Jun 26, 2026
669db6c
docs(e2e): correct RCV1P Windows test docstrings
rchincha Jun 26, 2026
2926323
diag(windows): pre-write provision.complete in CSE catch block
rchincha Jun 26, 2026
30ab591
Revert "diag(windows): pre-write provision.complete in CSE catch block"
rchincha Jun 27, 2026
7e59539
fix(windows): always refresh CSE scripts when explicit zip URL provided
rchincha Jun 27, 2026
18ddf06
fix(e2e): truncate BlobStorageAccount base to fit Azure 24-char limit
rchincha Jun 27, 2026
555e114
fix(shell): make check_url POSIX-compliant in init-aks-custom-cloud-r…
rchincha Jun 27, 2026
9f4af93
Revert "fix(windows): always refresh CSE scripts when explicit zip UR…
rchincha Jun 27, 2026
9f5e36b
diag(windows): pre-write provision.complete at top of script (REVERT ME)
rchincha Jun 27, 2026
70ab404
Revert "diag(windows): pre-write provision.complete at top of script …
rchincha Jun 27, 2026
9f963d5
diag(windows): wrap CSE script resolution + dot-source in try/catch t…
rchincha Jun 27, 2026
b450d67
BISECT (REVERT ME): reset staging/cse/windows/kubernetesfunc.ps1 to m…
rchincha Jun 27, 2026
99068ce
fix(e2e): self-grant Storage Blob Data Contributor on per-sub storage…
rchincha Jun 27, 2026
85a91bf
fix(e2e): bump RCV1P cluster names v1 -> v2 to refresh firewall rule
rchincha Jun 27, 2026
af51a92
Revert "BISECT (REVERT ME): reset staging/cse/windows/kubernetesfunc.…
rchincha Jun 27, 2026
8ee90d4
Revert "diag(windows): wrap CSE script resolution + dot-source in try…
rchincha Jun 27, 2026
ad7e2d1
fix(e2e): reconcile shared firewall app rules when blob storage FQDN …
rchincha Jun 27, 2026
b5b33ee
fix(e2e): ensure shared storage account exists before Windows CSE zip…
rchincha Jun 28, 2026
1b4e514
fix(custom-cloud-repos): emit telemetry event before exit on repo ini…
rchincha Jun 29, 2026
2cf6757
docs(init-aks-custom-cloud): align IsOptedInForRootCerts comment with…
rchincha Jun 29, 2026
9923f06
fix(rcv1p): propagate errors from install_certs_to_trust_store
rchincha Jun 29, 2026
7423215
fix(rcv1p,windows): reject CA filenames containing path separators
rchincha Jun 29, 2026
f345afa
test(windows): remove redundant Set-ExitCode stub before dot-source
rchincha Jun 29, 2026
9bd95a7
fix(rcv1p): skip cert.pem copy when source and dest are the same file
rchincha Jun 29, 2026
16 changes: 16 additions & 0 deletions .pipelines/e2e-rcv1p.yaml
@@ -0,0 +1,16 @@
+name: $(Date:yyyyMMdd)$(Rev:.r)
+variables:
+  TAGS_TO_RUN: "rcv1pcertmode=true"
+  SKIP_E2E_TESTS: false
+  E2E_GO_TEST_TIMEOUT: "75m"
+trigger: none
+pr: none
+jobs:
+  - template: ./templates/e2e-template.yaml
+    parameters:
+      name: RCV1P Cert Mode Tests
+      IgnoreScenariosWithMissingVhd: false

@r2k1 (Contributor), May 15, 2026:

Who is going to monitor this pipeline and address any issues?

Reply (Contributor):

we should probably include an explicit run of this pipeline within our daily build system we use for official releases, that way we're guaranteed to have visibility during official release flows

though at the end of the day it's going to be on us to deal with failures

Reply (Contributor, Author):

This should be enabled in the TME tenant, probably as an async nightly so that it doesn't interfere with "immediate" tests (PRs, etc.).

+      variableGroup: ab-e2e-tme-rcv1p
+      # The RCV1P testing subscription does not have platform auto-injection enabled,
+      # so the E2E framework explicitly injects opt-in tags on each VMSS.
+      rcv1pTagsAutoInjected: "false"
Comment thread
rchincha marked this conversation as resolved.
6 changes: 6 additions & 0 deletions .pipelines/e2e-tme.yaml
@@ -1,4 +1,9 @@
 name: $(Date:yyyyMMdd)$(Rev:.r)
+parameters:
+  - name: subscriptionId
+    type: string
+    displayName: Subscription ID to use for E2E tests (empty = use variable group default)
+    default: ""
 variables:
   SKIP_E2E_TESTS: false

@@ -8,4 +13,5 @@ jobs:
       name: Linux Tests
       IgnoreScenariosWithMissingVhd: false
       variableGroup: ab-e2e-tme
+      subscriptionId: ${{ parameters.subscriptionId }}

3 changes: 3 additions & 0 deletions .pipelines/scripts/e2e_run.sh
@@ -17,6 +17,9 @@ set -euo pipefail
 az account set -s "${E2E_SUBSCRIPTION_ID}"
 echo "Using subscription ${E2E_SUBSCRIPTION_ID} for e2e tests"

+# Map E2E_SUBSCRIPTION_ID to SUBSCRIPTION_ID which the Go test framework reads
+export SUBSCRIPTION_ID="${E2E_SUBSCRIPTION_ID}"
+
 # Setup go
 export GOPATH="$(go env GOPATH)"
 go version
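For illustration only, here is a minimal Go sketch of how a test framework might consume the `SUBSCRIPTION_ID` variable exported above. The helper name `subscriptionIDFromEnv`, the injected lookup function, and the whitespace trimming are assumptions made for this example; the framework's real configuration code is not part of this diff.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// subscriptionIDFromEnv is a hypothetical helper (not from the PR). It reads
// SUBSCRIPTION_ID through the supplied lookup function, trims surrounding
// whitespace, and fails when the value is missing or blank.
func subscriptionIDFromEnv(lookup func(string) (string, bool)) (string, error) {
	raw, ok := lookup("SUBSCRIPTION_ID")
	id := strings.TrimSpace(raw)
	if !ok || id == "" {
		return "", errors.New("SUBSCRIPTION_ID is not set")
	}
	return id, nil
}

func main() {
	// os.LookupEnv has the signature the helper expects.
	id, err := subscriptionIDFromEnv(os.LookupEnv)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("using subscription", id)
}
```

Passing the lookup function in, rather than calling `os.LookupEnv` directly, keeps the helper testable without mutating the process environment.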
16 changes: 13 additions & 3 deletions .pipelines/templates/e2e-template.yaml
@@ -12,14 +12,24 @@ parameters:
     default: ab-e2e
   - name: subscriptionId
     type: string
-    displayName: Subscription ID to use for E2E tests
-    default: $(E2E_SUBSCRIPTION_ID)
+    displayName: Subscription ID to use for E2E tests (empty = use variable group default)
+    default: ""
+  - name: rcv1pTagsAutoInjected
+    type: string
+    displayName: Whether the platform auto-injects RCV1P opt-in tags on all VMSSes
+    default: "true"

 jobs:
   - job: e2e
     condition: and(succeeded(), ne(variables.SKIP_E2E_TESTS, 'true'))
     variables:
       - group: ${{parameters.variableGroup}} # all variables prefixed with E2E_* come from this variable group
+      # When a caller (e.g. aks-rp orchestrator) explicitly passes subscriptionId,
+      # override E2E_SUBSCRIPTION_ID from the variable group so the run targets the
+      # requested subscription (e.g. RCV1P). When empty, keep the variable group default.
+      - ${{ if ne(parameters.subscriptionId, '') }}:
+        - name: E2E_SUBSCRIPTION_ID
+          value: ${{parameters.subscriptionId}}
     pool:
       name: $(E2E_POOL_NAME)
     timeoutInMinutes: 90
@@ -41,13 +51,13 @@ jobs:
           bash .pipelines/scripts/e2e_run.sh
         displayName: Run AgentBaker E2E
         env:
-          E2E_SUBSCRIPTION_ID: ${{parameters.subscriptionId}}
           SYS_SSH_PUBLIC_KEY: $(SYS_SSH_PUBLIC_KEY)
           SYS_SSH_PRIVATE_KEY_B64: $(SYS_SSH_PRIVATE_KEY_B64)
           BUILD_SRC_DIR: $(System.DefaultWorkingDirectory)
           DefaultWorkingDirectory: $(Build.SourcesDirectory)
           VHD_BUILD_ID: $(VHD_BUILD_ID)
           IGNORE_SCENARIOS_WITH_MISSING_VHD: ${{parameters.IgnoreScenariosWithMissingVhd}}
+          RCV1P_TAGS_AUTO_INJECTED: ${{parameters.rcv1pTagsAutoInjected}}

       - task: PublishTestResults@2
         displayName: Upload test results
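As a rough sketch of how the `RCV1P_TAGS_AUTO_INJECTED` value passed in `env` above could be interpreted on the Go side: the function name `parseTagsAutoInjected`, the fallback to `true` for an empty value (mirroring the template default), and the rejection of a literal `$(NAME)` string (an unresolved pipeline macro) are all assumptions for illustration, not code from this PR.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseTagsAutoInjected is a hypothetical parser for the string carried in
// RCV1P_TAGS_AUTO_INJECTED. Assumptions: an empty value falls back to true
// (the template parameter's default), and a value that still looks like an
// unresolved "$(NAME)" pipeline macro is rejected rather than guessed at.
func parseTagsAutoInjected(raw string) (bool, error) {
	v := strings.TrimSpace(raw)
	if v == "" {
		return true, nil
	}
	if strings.HasPrefix(v, "$(") && strings.HasSuffix(v, ")") {
		return false, fmt.Errorf("unresolved pipeline expression %q", v)
	}
	// strconv.ParseBool accepts 1, t, T, TRUE, true, True and the false equivalents.
	return strconv.ParseBool(v)
}

func main() {
	for _, raw := range []string{"false", "true", "", "$(RCV1P_TAGS_AUTO_INJECTED)"} {
		v, err := parseTagsAutoInjected(raw)
		fmt.Printf("%q -> value=%v err=%v\n", raw, v, err)
	}
}
```

Failing loudly on an unresolved macro avoids silently treating a misconfigured pipeline as either opted in or opted out.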
7 changes: 6 additions & 1 deletion aks-node-controller/parser/helper.go
@@ -64,6 +64,7 @@ func getFuncMap() template.FuncMap {
 	return template.FuncMap{
 		"getInitAKSCustomCloudFilepath": getInitAKSCustomCloudFilepath,
 		"getIsAksCustomCloud": getIsAksCustomCloud,
+		"getCloudLocation": getCloudLocation,
 	}
 }

@@ -538,11 +539,15 @@ func getIsAksCustomCloud(customCloudConfig *aksnodeconfigv1.CustomCloudConfig) b
 	return strings.EqualFold(customCloudConfig.GetCustomCloudEnvName(), helpers.AksCustomCloudName)
 }

+func getCloudLocation(v *aksnodeconfigv1.Configuration) string {
+	return strings.ToLower(strings.Join(strings.Fields(v.GetClusterConfig().GetLocation()), ""))
+}
+
 /* GetCloudTargetEnv determines and returns whether the region is a sovereign cloud which
 have their own data compliance regulations (China/Germany/USGov) or standard. */
 // Azure public cloud.
 func getCloudTargetEnv(v *aksnodeconfigv1.Configuration) string {
-	loc := strings.ToLower(strings.Join(strings.Fields(v.GetClusterConfig().GetLocation()), ""))
+	loc := getCloudLocation(v)
 	switch {
 	case strings.HasPrefix(loc, "china"):
 		return "AzureChinaCloud"
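To make the normalization in `getCloudLocation` concrete, the sketch below applies the same `strings.Fields` / `strings.Join` / `strings.ToLower` expression to a plain string. The function name `normalizeLocation` and the plain-string parameter are assumptions for this example; the real function takes `*aksnodeconfigv1.Configuration`.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeLocation applies the same expression as getCloudLocation in the
// diff: split on any run of whitespace, join the pieces with no separator,
// and lowercase the result. Taking a plain string is an assumption made to
// keep the example self-contained.
func normalizeLocation(location string) string {
	return strings.ToLower(strings.Join(strings.Fields(location), ""))
}

func main() {
	for _, in := range []string{"West US 2", "  ChinaEast2 ", "usgovvirginia"} {
		fmt.Printf("%q -> %q\n", in, normalizeLocation(in))
	}
}
```

Because the result is lowercase with no whitespace, prefix checks such as `strings.HasPrefix(loc, "china")` behave the same for "China East 2" and "chinaeast2".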
3 changes: 2 additions & 1 deletion aks-node-controller/parser/templates/cse_cmd.sh.gtpl
@@ -1,6 +1,7 @@
 echo $(date),$(hostname) > ${PROVISION_OUTPUT};
 {{if getIsAksCustomCloud .CustomCloudConfig}}
 REPO_DEPOT_ENDPOINT="{{.CustomCloudConfig.RepoDepotEndpoint}}"
-{{getInitAKSCustomCloudFilepath}} >> /var/log/azure/cluster-provision.log 2>&1;
 {{end}}
+LOCATION="{{getCloudLocation .}}"
Comment thread
rchincha marked this conversation as resolved.
+{{getInitAKSCustomCloudFilepath}} >> /var/log/azure/cluster-provision.log 2>&1;
Comment thread
rchincha marked this conversation as resolved.

@cameronmeissner (Contributor), May 15, 2026:

I think we should change the name of this template func: maybe getInitCertificateTrustStoreFilepath or something - keeping the notion of "custom cloud" tied to this script at this point doesn't really make sense to me since we're running it everywhere

Reply (Contributor, Author):

I was planning a follow-up PR that cleans up references to "custom" after this PR lands. Also see my comment below. But OK either way.

/usr/bin/nohup /bin/bash -c "/bin/bash /opt/azure/containers/provision_start.sh"
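The `LOCATION="{{getCloudLocation .}}"` line relies on a function registered in the template's `FuncMap`. The following is a small, self-contained sketch of that mechanism using Go's `text/template`. The `config` struct, `cloudLocation`, and `render` are hypothetical stand-ins for this example; the real template receives an `aksnodeconfigv1.Configuration` and is wired up in `helper.go`.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// config is a hypothetical stand-in for the node configuration object that
// the real template receives as its root value.
type config struct {
	Location string
}

// cloudLocation plays the role of getCloudLocation for this example.
func cloudLocation(c config) string {
	return strings.ToLower(strings.Join(strings.Fields(c.Location), ""))
}

// render parses a one-line template that calls the registered function on the
// root value and returns the rendered shell assignment.
func render(c config) (string, error) {
	// Funcs must be called before Parse so the parser knows the function name.
	tmpl, err := template.New("cse_cmd").
		Funcs(template.FuncMap{"getCloudLocation": cloudLocation}).
		Parse(`LOCATION="{{getCloudLocation .}}"`)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, c); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := render(config{Location: "West Europe"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

In this sketch the assignment is emitted unconditionally, matching the diff, where `LOCATION` sits after `{{end}}` and is therefore rendered regardless of the custom-cloud check.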
16 changes: 16 additions & 0 deletions e2e/cache.go
@@ -210,6 +210,22 @@ func clusterCiliumNetwork(ctx context.Context, request ClusterRequest) (*Cluster
 	return prepareCluster(ctx, model, false, false)
 }

+var ClusterRCV1PKubenet = cachedFunc(clusterRCV1PKubenet)
+
+// clusterRCV1PKubenet creates a kubenet cluster for RCV1P cert mode testing.
+func clusterRCV1PKubenet(ctx context.Context, request ClusterRequest) (*Cluster, error) {
+	return prepareCluster(ctx, getKubenetClusterModel("abe2e-rcv1p-kubenet-v2", request.Location, request.K8sSystemPoolSKU), false, false)
+}
+
+var ClusterRCV1PAzureNetwork = cachedFunc(clusterRCV1PAzureNetwork)
+
+// clusterRCV1PAzureNetwork creates an Azure CNI cluster for Windows RCV1P cert mode testing.
+// Windows tests require Azure CNI (not kubenet) because baseTemplateWindows() configures the NBC for
+// Azure CNI overlay mode.
+func clusterRCV1PAzureNetwork(ctx context.Context, request ClusterRequest) (*Cluster, error) {
+	return prepareCluster(ctx, getAzureNetworkClusterModel("abe2e-rcv1p-azure-v2", request.Location, request.K8sSystemPoolSKU), false, false)
+}
+
 // isNotFoundErr checks if an error represents a "not found" response from Azure API
 func isNotFoundErr(err error) bool {
 	var respErr *azcore.ResponseError