Skip to content

Gear for OpenShift CI#285

Open
Jakob-Naucke wants to merge 14 commits into
trusted-execution-clusters:mainfrom
Jakob-Naucke:openshift-ci
Open

Gear for OpenShift CI#285
Jakob-Naucke wants to merge 14 commits into
trusted-execution-clusters:mainfrom
Jakob-Naucke:openshift-ci

Conversation

@Jakob-Naucke

@Jakob-Naucke Jakob-Naucke commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Summary by Sourcery

Adapt the operator, tests, and tooling for running on OpenShift-based CI while modernizing Kubernetes waiting logic and simplifying dependencies.

New Features:

  • Expose operator services on OpenShift via LoadBalancer services with certificate-driven endpoints instead of Routes.
  • Update TrustedExecutionCluster instances dynamically with the externally reachable trustee address once services are exposed.

Bug Fixes:

  • Ensure Azure VMs in tests use cloud-init user-data instead of custom-data so configuration is reliably available.
  • Add RBAC permissions for all custom resource finalizers to avoid deletion failures on OpenShift clusters.
  • Fix default component image tag formatting by including the leading 'v' to match published images.

Enhancements:

  • Replace custom polling helpers in tests with kube-runtime await_condition and timeouts for more robust, race-free readiness checks.
  • Wait for CRD-backed resources to be garbage-collected rather than deleting them manually in tests, reducing flakiness.
  • Refactor operator component installation to fail fast with contextual errors and trigger reconcile requeues on partial installation.
  • Derive OpenShift service endpoints from LoadBalancer status instead of OpenShift Route and Ingress APIs, removing those dependencies.
  • Simplify Makefile tooling and Go module dependencies by dropping Kind and OpenShift-specific APIs and improving kopium version discovery.

Tests:

  • Revise integration tests to use condition-based waiting for deployments, custom resources, and ConfigMaps, improving stability under CI load.
  • Adjust test cleanup logic to verify absence of Machines, ApprovedImages, AttestationKeys, and related Secrets instead of force-deleting them.

@openshift-ci

openshift-ci Bot commented Jun 19, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Jakob-Naucke

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sourcery-ai

sourcery-ai Bot commented Jun 19, 2026

Copy link
Copy Markdown

Reviewer's Guide

Refactors test utilities and operator logic to better support OpenShift CI by switching from Route-based exposure to LoadBalancer services with certificate-driven endpoints, replaces custom polling logic with kube::runtime::wait::await_condition-based waits, tightens cleanup/resource lifecycle handling, and simplifies dependencies/RBAC for OpenShift and other platforms.

Sequence diagram for updated reconcile component installation flow

sequenceDiagram
    participant Reconciler as reconcile
    participant Cluster as TrustedExecutionCluster
    participant Installer as install_components
    participant Trustee as install_trustee_configuration
    participant RegServer as install_register_server
    participant AttestKeyReg as install_attestation_key_register

    Reconciler->>Installer: install_components(&Client, &TrustedExecutionCluster)
    activate Installer
    Installer->>Trustee: install_trustee_configuration(Client, &TrustedExecutionCluster)
    activate Trustee
    Trustee-->>Installer: Result<()> (propagated error on failure)
    deactivate Trustee
    Installer->>RegServer: install_register_server(Client, &TrustedExecutionCluster)
    activate RegServer
    RegServer-->>Installer: Result<()> (propagated error on failure)
    deactivate RegServer
    Installer->>AttestKeyReg: install_attestation_key_register(Client, &TrustedExecutionCluster)
    activate AttestKeyReg
    AttestKeyReg-->>Installer: Result<()> (propagated error on failure)
    deactivate AttestKeyReg

    alt all components installed successfully
        Installer-->>Reconciler: Ok(())
        Reconciler->>Reconciler: installed_condition(...)
        Reconciler-->>Reconciler: Action::await_change()
    else any installation error
        Installer-->>Reconciler: Err(e)
        Reconciler->>Reconciler: warn!("Installation of a component failed: {e:?}")
        Reconciler-->>Reconciler: Action::requeue(Duration::from_secs(60))
    end
    deactivate Installer
Loading

File-Level Changes

Change Details Files
Refactor K8s/OpenShift test utilities to use LoadBalancer services, certificate updates, and await_condition-based readiness checks instead of Routes and custom pollers.
  • Extend K8sPlatform::expose signature to include deployment and certificate name and update platform impls accordingly
  • Implement OpenShift::get_url helper and use LoadBalancer Service status instead of Routes/Ingress CRDs for external addresses
  • Patch OpenShift Services to type LoadBalancer, wait for ingress readiness via await_condition+timeout, reissue Certificates based on hostname/IP, and restart affected Deployments
  • Adjust get_cluster_url for OpenShift to derive host/IP from Service status, returning empty string when service is not yet created
  • Change CR manifest application to patch the existing TrustedExecutionCluster resource with publicTrusteeAddr via server-side merge after services are exposed
  • Replace custom deployment readiness Poller with await_condition on Deployment Available condition for all operator components
  • Use wait_for_resource_created for image-pcrs ConfigMap creation and simplify its usage in tests
test_utils/src/lib.rs
Improve test cleanup and resource lifecycle handling, waiting for finalizer-managed resources to disappear instead of deleting them explicitly.
  • Replace explicit Machine deletion helper with generic check_no_resources function that verifies absence of AttestationKey, ApprovedImage, and Machine instances before namespace cleanup
  • Update delete_trusted_execution_cluster and cleanup_namespace to use await_condition-based wait_for_resource_deleted with Duration timeouts
  • Adapt tests to the new wait_for_resource_deleted signature and semantics across multiple resource types
test_utils/src/lib.rs
tests/trusted_execution_cluster.rs
tests/attestation.rs
Convert ad-hoc Poller-based waits in tests and KubeVirt backend to await_condition+timeout, reducing custom logic and improving robustness.
  • Introduce small predicate helpers (e.g., ak_approved, populated, chk_removed) and use await_condition+timeout for AttestationKey approval, ConfigMap content updates, ApprovedImage status, and Secret ownership
  • Update KubeVirt backend wait_for_running to rely on await_condition against VirtualMachine.printable_status
  • Simplify image disallow/readoption and other lifecycle tests by replacing manual polling loops with await_condition predicates
tests/trusted_execution_cluster.rs
test_utils/src/virt/kubevirt.rs
Harden operator reconcile component installation and make image versioning consistent for OpenShift deployment.
  • Change COMPONENT_VERSION to include the leading 'v' to match Trustee image tagging convention
  • Factor component installation into install_components and have reconcile requeue on any installation error instead of only logging
  • Update install_trustee_configuration, install_register_server, and install_attestation_key_register to use anyhow::Context for errors and to fail-fast instead of swallowing errors
  • Extend unit test fake HTTP handler to return appropriate stubbed objects (ConfigMap, Service, Deployment) for the different component installations
operator/src/main.rs
Adjust Azure CI VM provisioning to use user-data and clean up Kind/OpenShift-specific build and dependency configuration.
  • Switch Azure VM creation from --custom-data to --user-data to ensure data is available at fetch stage
  • Remove Kind host image and Kind/OpenShift Route/Ingress generation bits from Makefile and Go tooling
  • Derive KOPIUM_VERSION via cargo metadata/jq instead of grepping Cargo.toml
  • Relax OPERATOR_IMAGE definition for override, and drop unused OpenShift/Kind-related Go dependencies (openshift/api, kind, related indirect deps) from go.mod/go.sum
test_utils/src/virt/azure.rs
Makefile
go.mod
go.sum
tools.go
Update RBAC and Rust/OpenShift library surface to match new finalizer usage and removal of Route/Ingress dependencies.
  • Expand kubebuilder RBAC annotations to allow update on finalizers for TrustedExecutionCluster, Machine, AttestationKey, and ApprovedImage resources
  • Remove Route/Ingress kopium modules and associated re-exports since OpenShift exposure no longer relies on these CRDs
  • Drop stale comment about Kind node version tracking in Cargo.toml
api/v1alpha1/crds.go
lib/src/kopium.rs
lib/src/lib.rs
Cargo.toml

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 3 issues, and left some high level feedback:

  • In OpenShift::get_url the chain of unwrap() calls on status, load_balancer, and ingress will panic if the Service exists but those fields are not yet populated; consider handling these as Options (and possibly reusing the same readiness condition used in expose) to fail gracefully instead of crashing tests.
  • The new OpenShift::expose implementation uses fixed 60s timeouts for both LoadBalancer ingress readiness and certificate reissuance, whereas other helpers use scaled_duration; you might want to align these timeouts with the scaling logic to reduce spurious failures on slower CI clusters.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `OpenShift::get_url` the chain of `unwrap()` calls on `status`, `load_balancer`, and `ingress` will panic if the Service exists but those fields are not yet populated; consider handling these as `Option`s (and possibly reusing the same readiness condition used in `expose`) to fail gracefully instead of crashing tests.
- The new `OpenShift::expose` implementation uses fixed 60s timeouts for both LoadBalancer ingress readiness and certificate reissuance, whereas other helpers use `scaled_duration`; you might want to align these timeouts with the scaling logic to reduce spurious failures on slower CI clusters.

## Individual Comments

### Comment 1
<location path="Makefile" line_range="19" />
<code_context>
 YQ_VERSION ?= $(shell go list -m -f '{{.Version}}' github.com/mikefarah/yq/v4)
 YQ ?= $(LOCALBIN)/yq-$(YQ_VERSION)
-KOPIUM_VERSION ?= $(shell grep kopium lib/Cargo.toml | sed -E 's/.*"(.*)"/\1/')
+KOPIUM_VERSION ?= $(shell cargo metadata --format-version 1 | jq -r '.resolve.nodes[] | select(.deps[]?.name == "kopium") | .deps[] | select(.name == "kopium") | .pkg | split("@")[1]')
 KOPIUM ?= $(LOCALBIN)/kopium-$(KOPIUM_VERSION)

</code_context>
<issue_to_address>
**suggestion:** The new KOPIUM_VERSION detection adds a hard dependency on `jq` and a relatively complex `cargo metadata` query.

This approach is more robust than grepping `Cargo.toml`, but it now relies on `jq` being available everywhere the Makefile runs and on the stability of the `cargo metadata` layout. If that’s not guaranteed (e.g., minimal CI images, some dev setups), please either document/enforce the `jq` requirement with a clear failure message, or simplify how the version is resolved (e.g., restrict the query scope or move the logic into a small helper script/binary) to reduce fragility.

Suggested implementation:

```
JQ ?= jq

ifndef KOPIUM_VERSION
ifeq (, $(shell command -v $(JQ) 2>/dev/null))
$(error jq is required to auto-detect KOPIUM_VERSION; install jq or set KOPIUM_VERSION explicitly, e.g. 'make KOPIUM_VERSION=...')
endif
endif

KOPIUM_VERSION ?= $(shell cargo metadata --format-version 1 | $(JQ) -r '.resolve.nodes[] | select(.deps[]?.name == "kopium") | .deps[] | select(.name == "kopium") | .pkg | split("@")[1]')
KOPIUM ?= $(LOCALBIN)/kopium-$(KOPIUM_VERSION)

```

If the Makefile already defines a `JQ` variable elsewhere (not visible in the provided snippet), you should remove one of the definitions or consolidate them to avoid redefinition. Otherwise, this change will:
- Fail early with a clear error if `jq` is missing and `KOPIUM_VERSION` is not provided.
- Still allow CI/devs to bypass the `jq` requirement by explicitly supplying `KOPIUM_VERSION` on the command line or in the environment.
</issue_to_address>

### Comment 2
<location path="tests/trusted_execution_cluster.rs" line_range="301-303" />
<code_context>

-    // Poll for the AttestationKey to be approved, have owner reference, and have a Secret created
+    // Timeout for the AttestationKey to be approved, have owner reference, and have a Secret created
+    let approved = await_condition(attestation_keys.clone(), &ak_name, ak_approved);
+    let ctx = format!("waiting for AttestationKey {ak_name} to be approved");
+    timeout(Duration::from_secs(30), approved).await.context(ctx)??;
+    let chk_machine_owner = |ak: Option<&AttestationKey>| {
+        let chk_owner = |owner: &OwnerReference| owner.kind == "Machine" && owner.name == machine_name;
</code_context>
<issue_to_address>
**suggestion (testing):** Use scaled_duration for timeouts to keep tests robust across slower environments

This and a few other tests in this file hard-code `Duration::from_secs(30)` instead of using `scaled_duration(..)`. That can make tests flaky on slower CI or busy clusters, particularly for controller-driven operations. Please switch to `timeout(scaled_duration(30), ...)` here for consistency and to respect the global test timeout scaling knob.

Suggested implementation:

```rust
    // Timeout for the AttestationKey to be approved, have owner reference, and have a Secret created
    let approved = await_condition(attestation_keys.clone(), &ak_name, ak_approved);
    let ctx = format!("waiting for AttestationKey {ak_name} to be approved");
    timeout(scaled_duration(30), approved).await.context(ctx)??;
    let chk_machine_owner = |ak: Option<&AttestationKey>| {

```

```rust
    let has_machine_owner = await_condition(attestation_keys.clone(), &ak_name, chk_machine_owner);
    let ctx = format!("waiting for AttestationKey {ak_name} to be owned by Machine {machine_name}");
    timeout(scaled_duration(30), has_machine_owner).await.context(ctx)??;
    let secrets_api: Api<Secret> = Api::namespaced(client.clone(), namespace);

```
</issue_to_address>

### Comment 3
<location path="tests/trusted_execution_cluster.rs" line_range="106-107" />
<code_context>

         // Wait until it disappears
-        wait_for_resource_deleted(&api, TEC_NAME, scaled_timeout(120), 5).await?;
+        wait_for_resource_deleted(&api, TEC_NAME, 120).await?;

         let deployments_api: Api<Deployment> = Api::namespaced(client.clone(), namespace);
</code_context>
<issue_to_address>
**suggestion:** Consider using scaled_timeout for resource deletion waits instead of fixed seconds

This changes the deletion wait from `scaled_timeout(120)` to a fixed `120`, while other calls here still use `scaled_timeout(..)`. To stay consistent and reduce flakiness on slower clusters, pass a scaled value (e.g. `scaled_timeout(120)`) into `wait_for_resource_deleted` instead of a hard-coded timeout.

```suggestion
        // Wait until it disappears
        wait_for_resource_deleted(&api, TEC_NAME, scaled_timeout(120)).await?;
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread Makefile
YQ_VERSION ?= $(shell go list -m -f '{{.Version}}' github.com/mikefarah/yq/v4)
YQ ?= $(LOCALBIN)/yq-$(YQ_VERSION)
KOPIUM_VERSION ?= $(shell grep kopium lib/Cargo.toml | sed -E 's/.*"(.*)"/\1/')
KOPIUM_VERSION ?= $(shell cargo metadata --format-version 1 | jq -r '.resolve.nodes[] | select(.deps[]?.name == "kopium") | .deps[] | select(.name == "kopium") | .pkg | split("@")[1]')

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: The new KOPIUM_VERSION detection adds a hard dependency on jq and a relatively complex cargo metadata query.

This approach is more robust than grepping Cargo.toml, but it now relies on jq being available everywhere the Makefile runs and on the stability of the cargo metadata layout. If that’s not guaranteed (e.g., minimal CI images, some dev setups), please either document/enforce the jq requirement with a clear failure message, or simplify how the version is resolved (e.g., restrict the query scope or move the logic into a small helper script/binary) to reduce fragility.

Suggested implementation:

JQ ?= jq

ifndef KOPIUM_VERSION
ifeq (, $(shell command -v $(JQ) 2>/dev/null))
$(error jq is required to auto-detect KOPIUM_VERSION; install jq or set KOPIUM_VERSION explicitly, e.g. 'make KOPIUM_VERSION=...')
endif
endif

KOPIUM_VERSION ?= $(shell cargo metadata --format-version 1 | $(JQ) -r '.resolve.nodes[] | select(.deps[]?.name == "kopium") | .deps[] | select(.name == "kopium") | .pkg | split("@")[1]')
KOPIUM ?= $(LOCALBIN)/kopium-$(KOPIUM_VERSION)

If the Makefile already defines a JQ variable elsewhere (not visible in the provided snippet), you should remove one of the definitions or consolidate them to avoid redefinition. Otherwise, this change will:

  • Fail early with a clear error if jq is missing and KOPIUM_VERSION is not provided.
  • Still allow CI/devs to bypass the jq requirement by explicitly supplying KOPIUM_VERSION on the command line or in the environment.

Comment thread tests/trusted_execution_cluster.rs Outdated
Comment thread tests/trusted_execution_cluster.rs Outdated
@Jakob-Naucke Jakob-Naucke force-pushed the openshift-ci branch 3 times, most recently from 6aebf10 to 44865dd Compare June 19, 2026 17:11
Comment thread operator/src/main.rs
const TRUSTEE_VERSION: &str = "v0.17.0";
/// Default version tag for operator-managed component images
const COMPONENT_VERSION: &str = "0.2.0";
const COMPONENT_VERSION: &str = "v0.2.0";

@alicefr alicefr Jun 22, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR, but the versioning is actually wrong since we already have the release 0.2.1. This should be latest and we should set dynmically when we do the release. Filed #286

Comment thread test_utils/src/lib.rs
Comment thread test_utils/src/lib.rs
Comment on lines 307 to 314
async fn expose(
&self,
_: &Client,
client: &Client,
namespace: &str,
service: &str,
deployment: &str,
cert_name: &str,
_: &str,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we simplify the signature of this function? parameter seems a lot to me

Comment thread test_utils/src/lib.rs Outdated
let info = format!("Waiting for deployment {depl} to be ready");
test_info!(&self.test_name, "{info}");
let done = await_condition(depls.clone(), depl, depl_ready);
timeout(scaled_duration(300), done).await??;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
timeout(scaled_duration(300), done).await??;
timeout(scaled_duration(300), done).await.context(ctx)??;

Comment thread operator/src/main.rs Outdated
// warn with `:?` to also get context
warn!(
"Installation of a component failed: {e:?}\n \
This can occur due to owner references to incompletely propagated CRDs. Requeueing..."

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if it is another type of error? You are returning an error also on creation for example

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intended for on creation. Do you mean that we should have a maximum failure and installation failed status?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, just if the only cause for the error can be the owner reference or maybe something like create the one of the object because it already exists

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for "if exists", we have create_or_info_if_exists, but there are more possible errors here. ideally, the second line of the warning would only be present in the specific case I saw this, but then you're getting into error parsing.

@alicefr alicefr Jun 22, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just keep it generic and just print the error

@Jakob-Naucke Jakob-Naucke force-pushed the openshift-ci branch 2 times, most recently from 2271a31 to ae237e7 Compare June 22, 2026 15:01
Kind CI, if we make one, will not run on Azure.

This partially reverts commit cc6bf85.

Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
This reverts commit 2e4f6d5.

CI requires cargo anyhow.

Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
should be v0.2.0, not 0.2.0

Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
following trusted-execution-clusters#266

Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
also works for deployments that don't exist yet

Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
does not need fields to come or parameters

Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
to avoid too many parameters

Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
Patch services to LoadBalancer ones to expose on OpenShift. Patch
certificates once address is known. Depending on platform, ingress may
be an IP address and thus entirely unpredictable. Therefore, patch
TrustedExecutionCluster with Trustee address after service creation.

This also reverts commit f2ad17b.

Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
following trusted-execution-clusters#246

Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
which can also be set for `make integration-tests`.

Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
Custom data is sometimes not yet available at Ignition fetch
stage, i.e. fetch skips our merge even though other parts of config
are available. Use user-data (IMDS) instead.

Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
which can occur because creation of resources with owner refereces to
resources of incompletely propagated CRDs can fail

Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
PR trusted-execution-clusters#278 correctly recognized that Machines sometimes stayed behind
when deleting the namespace immediately after TEC. This extends to
ApprovedImages and AttestationKeys, but because all these
resources are owned, wait for their removal before namespace removal
instead of deleting them.

This also reverts commit ed64522.

Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
important especially for OpenShift exposure, but can matter on all
platforms

Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
Comment thread test_utils/src/lib.rs
};
let ingress_ready = await_condition(services, service, has_ingress);
let ctx = format!("waiting for ingress on {service} to be ready");
let duration = scaled_duration(60);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How exactly are we estimating the multiplier scale env and scale timeout duration across all files? Is it a ballpark estimate or are we using some heuristic?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are not backed by measurements or anything; they're set based on experience and can be increased (if we see a failure being an issue of merely a timeout that's too short) or decreased (if a test frequently takes more time to fail than it would need to).

Similar story with the multipliers, just a tradeoff of how fast is your environment/how long are you willing to wait

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants