21 changes: 21 additions & 0 deletions Sample Compose Files/CHAOS-1506-repro/Dockerfile.shared
@@ -0,0 +1,21 @@
# CHAOS-1506 reproduction.
#
# Two services in docker-compose.yml build this multi-stage Dockerfile in
# parallel. The shared `builder` stage produces `/install` content; the
# `runtime` stage then runs `COPY --from=builder /install /usr/local` —
# the trace pattern from the Linear issue.
#
# The builder stage is deliberately LIGHT (mkdir + random payload) so the
# 2 CPU / 2 GB default builder VM completes it in seconds. The original
# failure was triggered by a heavier Python pip-install workload but the
# COPY race is independent of how /install gets populated. Keeping it
# light makes the fixture deterministic on a stock apple/container setup.

FROM alpine:3.20 AS builder
RUN mkdir -p /install && \
    head -c 1048576 /dev/urandom > /install/payload && \
    echo "chaos-1506 builder stage complete" > /install/marker

FROM alpine:3.20 AS runtime
COPY --from=builder /install /usr/local
CMD ["cat", "/usr/local/marker"]
100 changes: 100 additions & 0 deletions Sample Compose Files/CHAOS-1506-repro/README.md
@@ -0,0 +1,100 @@
# CHAOS-1506 reproduction sample

apple/container's buildkit-shim emits inconsistent platform identifiers
(`linux/arm64` ↔ `linux/arm64/v8`) across stages of a **single** build —
even when the client never passes `--arch` or `--platform`. Under load
this drift triggers the `COPY --from=builder /install /usr/local`
failure that originally surfaced in production (Linear CHAOS-1506).

## Primary repro — apple/container only (for upstream filing)

```sh
cd "Sample Compose Files/CHAOS-1506-repro"
./repro.sh 5
```

The script fires two `container build --no-cache` invocations in parallel
per iteration against the same Dockerfile. **No container-compose
involvement** — this is the form to attach to the apple/container issue
since the upstream maintainers can reproduce it in their own tool.

What you'll see (consistently, every iteration):

```
#6 [linux/arm64 builder 1/2] RUN mkdir -p /install ...
#7 [linux/arm64/v8 runtime 1/2] COPY --from=builder /install /usr/local
```

Same build. Two stages. Two different platform strings. apple/container
appears to introduce the `v8` variant internally, somewhere between argv
parsing (`BuildCommand.swift:305`, `Set<Platform>`) and stage execution.
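
To confirm the drift from a saved build log, it can help to list the distinct platform strings that appear in the stage headers. The following is a minimal sketch, assuming the log uses the stage-header format quoted above; `platforms_in_log` is a hypothetical helper and is not part of `repro.sh`.

```sh
#!/usr/bin/env bash
# Hypothetical helper: print each distinct platform string found in stage
# headers such as "#6 [linux/arm64 builder 1/2] RUN ...".
platforms_in_log() {
  grep -oE '\[linux/[a-z0-9]+(/v[0-9]+)? ' "$1" \
    | sed -E 's/^\[//; s/ $//' \
    | sort -u
}

# Demo on a synthetic log that mirrors the two lines quoted above.
log="$(mktemp)"
printf '%s\n' \
  '#6 [linux/arm64 builder 1/2] RUN mkdir -p /install ...' \
  '#7 [linux/arm64/v8 runtime 1/2] COPY --from=builder /install /usr/local' \
  > "${log}"
platforms_in_log "${log}"
rm -f "${log}"
```

More than one line of output for a single build's log indicates that the stages disagreed on the platform string.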

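Script exit codes:
- `0`: all iterations completed (drift may still be reported, but no COPY failure)
- `1`: at least one iteration failed (COPY error or non-zero build exit)
- `2`: precondition failed (`Dockerfile.shared` missing, or `container` CLI not on `PATH`)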
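
When the script runs from a wrapper or CI job, the exit status can be mapped to a short summary. This is a sketch only: `describe_repro_exit` is a hypothetical helper, and the meaning of status `2` is taken from the precondition checks at the top of `repro.sh`.

```sh
#!/usr/bin/env bash
# Hypothetical helper: translate a repro.sh exit status into a short summary.
describe_repro_exit() {
  case "$1" in
    0) echo "all iterations completed; inspect the logs for platform drift" ;;
    1) echo "at least one iteration failed; keep the logs for filing" ;;
    2) echo "precondition failed: Dockerfile or container CLI missing" ;;
    *) echo "unexpected exit status: $1" ;;
  esac
}

# Typical use (commented out so the sketch runs without the container CLI):
#   ./repro.sh 5; describe_repro_exit "$?"
describe_repro_exit 1
```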

## Compose-driven repro (how the bug originally surfaced)

```sh
cd "Sample Compose Files/CHAOS-1506-repro"
container-compose up
```

Same upstream behavior, driven through `container-compose`'s parallel
image-prep fan-out (`runBoundedThrowingFanOut`). Useful as context for
how the bug originally surfaced in user workloads. **Not** the primary
upstream artifact — apple's maintainers prefer reproducing in their own
tooling.

This fixture also exercises two compose-spec parity fixes that landed
alongside CHAOS-1506:

- **CHAOS-1510** — `image:` alongside `build:` is now accepted (per
compose-spec, `image:` becomes the build tag).
- **CHAOS-1511** — auto-derived project names are lowercased (the
parent directory `CHAOS-1506-repro/` produces project name
`chaos-1506-repro` instead of failing apple/container's lowercase
network-ID validator).
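
The CHAOS-1511 normalization amounts to two string operations on the last path component. The sketch below is a shell approximation for illustration; the actual change is the Swift `deriveProjectName` function, and `derive_project_name` is a hypothetical name. Note that `tr` lowercasing depends on the locale, so non-ASCII directory names may not match the Swift `lowercased()` result.

```sh
#!/usr/bin/env bash
# Hypothetical shell approximation of the auto-derived project name:
# take the directory's last component, replace '.' with '_', then lowercase.
derive_project_name() {
  basename "$1" | tr '.' '_' | tr '[:upper:]' '[:lower:]'
}

derive_project_name "/work/CHAOS-1506-repro"   # prints chaos-1506-repro
derive_project_name "/work/My.App"             # prints my_app
```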

## Why the COPY failure doesn't always fire

The platform-string drift is the **trigger**; the `COPY --from=builder`
race is the **symptom under load**. This lightweight fixture (mkdir + 1
MB random payload + alpine base) completes the builder stage too quickly
to widen the race window. The original Linear failure used a heavier
Python pip-install workload that produced large content-hash collisions
and saturated the builder VM.

To exercise the COPY race specifically, either:
- Use a heavier Dockerfile (pip-install, large `RUN` stages with multi-MB
output), or
- Bump the builder VM: `container builder stop && container builder start --cpus 4 --memory 8192`

The platform drift alone is sufficient for upstream filing; the race
the drift can trigger is documented in the original Linear trace.
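
As an illustration of the "heavier Dockerfile" option, the sketch below writes a variant of `Dockerfile.shared` whose builder stage produces a much larger `/install` tree. The file name `Dockerfile.heavy` and the 8 x 32 MiB payload are assumptions chosen for the example; this variant has not been shown to trigger the COPY race.

```sh
#!/usr/bin/env bash
# Sketch only: write a heavier variant of Dockerfile.shared into a directory.
# Dockerfile.heavy and the payload sizes are illustrative assumptions.
write_heavy_dockerfile() {
  local out_dir="$1"
  cat > "${out_dir}/Dockerfile.heavy" <<'EOF'
FROM alpine:3.20 AS builder
RUN mkdir -p /install && \
    for n in 1 2 3 4 5 6 7 8; do \
      head -c 33554432 /dev/urandom > "/install/payload-${n}"; \
    done && \
    echo "chaos-1506 builder stage complete" > /install/marker

FROM alpine:3.20 AS runtime
COPY --from=builder /install /usr/local
CMD ["cat", "/usr/local/marker"]
EOF
}

# Write into a scratch directory so the fixture directory stays untouched.
out="$(mktemp -d)"
write_heavy_dockerfile "${out}"
echo "wrote ${out}/Dockerfile.heavy"
```

The generated file can then be passed to `container build` with `-f`, in the same way `repro.sh` passes `Dockerfile.shared`.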

## Existing client workaround

Until upstream apple/container fixes the buildkit-shim, container-compose
users hit by the bug can serialize image preparation at the fan-out
layer:

```sh
container-compose up --parallel 1
# or
COMPOSE_PARALLEL_LIMIT=1 container-compose up
```

This makes the image-prep phase of `container-compose up` single-flight,
so two concurrent `BuildCommand` clients never dial the same buildkit
container. Pulls are serialized as a side effect, which is acceptable
until the upstream fix lands.
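
For builds driven directly through `container build` rather than through compose, the same idea can be sketched as a loop that builds one tag at a time. `build_serially` and the `BUILD_CMD` override are hypothetical names introduced for this example; only the `-t` and `-f` flags are taken from `repro.sh`.

```sh
#!/usr/bin/env bash
# Hypothetical helper: build each tag one at a time instead of in parallel.
# BUILD_CMD is an assumption for this sketch; it defaults to `container build`
# and can be overridden (for example with `echo`) for a dry run.
build_serially() {
  local dockerfile="$1" context="$2" tag
  shift 2
  for tag in "$@"; do
    # Intentionally unquoted so "container build" splits into two words.
    ${BUILD_CMD:-container build} -t "${tag}" -f "${dockerfile}" "${context}" || return 1
  done
}

# Dry run: print the arguments each build would receive, in order.
BUILD_CMD=echo build_serially Dockerfile.shared . chaos-1506-a:latest chaos-1506-b:latest
```

Because each build finishes before the next starts, only one client talks to the builder at any time, at the cost of longer wall-clock time.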

## Upstream

- apple/container issue: https://github.com/apple/container/issues/1542
- Linear: https://linear.app/fullchaos/issue/CHAOS-1506

Once apple/container#1542 closes, the `--parallel 1` workaround
documented in `docs/guides/migration-from-docker-compose.md` can be
removed.
32 changes: 32 additions & 0 deletions Sample Compose Files/CHAOS-1506-repro/docker-compose.yml
@@ -0,0 +1,32 @@
# CHAOS-1506 reproduction.
#
# Two services build the same multi-stage Dockerfile in parallel. The trace
# shows apple/container's buildkit-shim emitting inconsistent platform
# identifiers (`linux/arm64` ↔ `linux/arm64/v8`) across stages of one build
# — the upstream signal motivating the issue. The original `COPY --from=builder`
# race that this signal can trigger is timing-dependent and does not always
# fire at this fixture's scale; see README.md for details.
#
# This fixture deliberately uses `image:` alongside `build:` (CHAOS-1510:
# accepted per compose-spec, with `image:` as the build tag) and a
# `CHAOS-1506-repro/` parent directory (CHAOS-1511: auto-derived project
# names are lowercased to `chaos-1506-repro` for apple/container's
# network-ID validator).
#
# Repro: `container-compose up`
# Existing workaround: `container-compose up --parallel 1`
# or `COMPOSE_PARALLEL_LIMIT=1 container-compose up`
# Upstream: https://github.com/apple/container/issues/1542

services:
  api:
    build:
      context: .
      dockerfile: Dockerfile.shared
    image: chaos-1506-api:latest

  worker:
    build:
      context: .
      dockerfile: Dockerfile.shared
    image: chaos-1506-worker:latest
104 changes: 104 additions & 0 deletions Sample Compose Files/CHAOS-1506-repro/repro.sh
@@ -0,0 +1,104 @@
#!/usr/bin/env bash
# CHAOS-1506: minimal apple/container-only reproduction.
#
# Fires two `container build` invocations in parallel against the SAME
# Dockerfile (same context, same content hash). This is the smallest
# input that surfaces apple/container's buildkit-shim platform-string
# drift (`linux/arm64` ↔ `linux/arm64/v8` across stages of one build)
# and — under heavier load than this fixture creates — the downstream
# `COPY --from=builder` race the Linear issue captured.
#
# No container-compose involvement: the upstream maintainers care about
# reproducing in their own tool, so this is the form to attach to the
# apple/container issue. `Sample Compose Files/CHAOS-1506-repro/docker-compose.yml`
# is preserved for context (how the bug originally surfaced in the wild)
# but is NOT the primary repro.
#
# Usage:
# ./repro.sh # one iteration
# ./repro.sh 5 # five iterations (race is intermittent)
# COUNT=5 ./repro.sh # same via env
#
# Exit:
# 0: all iterations completed without a COPY/build error
# 1: at least one iteration's build failed (preserve trace for filing)
# 2: precondition failed (Dockerfile missing or `container` CLI not on PATH)

set -u -o pipefail

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
DOCKERFILE="${SCRIPT_DIR}/Dockerfile.shared"
CONTEXT="${SCRIPT_DIR}"
TAG_A="chaos-1506-a:latest"
TAG_B="chaos-1506-b:latest"
COUNT="${1:-${COUNT:-1}}"

if [[ ! -f "${DOCKERFILE}" ]]; then
  echo "missing Dockerfile: ${DOCKERFILE}" >&2
  exit 2
fi

if ! command -v container >/dev/null 2>&1; then
  echo "container CLI not on PATH" >&2
  exit 2
fi

overall_status=0

for i in $(seq 1 "${COUNT}"); do
  echo "=== RUN ${i}/${COUNT} ==="

  # Remove prior images so each iteration is a true fresh build (no
  # content-addressed cache short-circuit). Ignore errors: the images
  # do not exist on the first iteration.
  container image rm "${TAG_A}" "${TAG_B}" >/dev/null 2>&1 || true

  # Fire two parallel `container build` invocations against the same
  # Dockerfile + context. Capture each build's stdout+stderr separately
  # so the streams don't interleave inside one line.
  container build --no-cache -t "${TAG_A}" -f "${DOCKERFILE}" "${CONTEXT}" \
    > "/tmp/chaos-1506-a-${i}.log" 2>&1 &
  PID_A=$!

  container build --no-cache -t "${TAG_B}" -f "${DOCKERFILE}" "${CONTEXT}" \
    > "/tmp/chaos-1506-b-${i}.log" 2>&1 &
  PID_B=$!

  wait "${PID_A}"; EXIT_A=$?
  wait "${PID_B}"; EXIT_B=$?

  echo "build A exit: ${EXIT_A} log: /tmp/chaos-1506-a-${i}.log"
  echo "build B exit: ${EXIT_B} log: /tmp/chaos-1506-b-${i}.log"

  # Merge both logs into a single per-iteration view for the trace
  # archive. Tag each line with which build produced it.
  {
    echo "--- run ${i} build A (exit ${EXIT_A}) ---"
    sed 's/^/[A] /' "/tmp/chaos-1506-a-${i}.log"
    echo "--- run ${i} build B (exit ${EXIT_B}) ---"
    sed 's/^/[B] /' "/tmp/chaos-1506-b-${i}.log"
  } > "/tmp/chaos-1506-run-${i}.log"

  # Grep the two logs for the specific COPY-from-builder error pattern
  # the Linear issue reported, plus the broader platform-string drift
  # signal we've already confirmed.
  if grep -q "ERROR.*COPY --from=builder" "/tmp/chaos-1506-a-${i}.log" \
    "/tmp/chaos-1506-b-${i}.log" 2>/dev/null; then
    echo ">> COPY --from=builder error observed in run ${i}"
    overall_status=1
  fi

  if [[ ${EXIT_A} -ne 0 || ${EXIT_B} -ne 0 ]]; then
    echo ">> build failure in run ${i}"
    overall_status=1
  fi

  # Drift means one build's log contains BOTH spellings. A log that uses
  # `linux/arm64/v8` throughout is consistent, so check each log on its own.
  for log in "/tmp/chaos-1506-a-${i}.log" "/tmp/chaos-1506-b-${i}.log"; do
    if grep -q 'linux/arm64/v8' "${log}" 2>/dev/null \
      && grep -Eq 'linux/arm64([^/]|$)' "${log}" 2>/dev/null; then
      echo ">> platform-string drift (arm64 and arm64/v8) observed in run ${i}: ${log}"
    fi
  done
done

# Best-effort cleanup of test images so successive runs don't accumulate.
container image rm "${TAG_A}" "${TAG_B}" >/dev/null 2>&1 || true

exit "${overall_status}"
12 changes: 6 additions & 6 deletions Sources/Container-Compose/Codable Structs/DockerCompose.swift
@@ -628,13 +628,13 @@ extension DockerCompose {
             throw ComposeValidationError.noServicesDefined
         }
 
-        // 2. Per-service checks: image/build conflict, image-or-build presence, ports, resources.
+        // 2. Per-service checks: image-or-build presence, ports, resources.
+        // CHAOS-1510: image+build coexistence is permitted per compose-spec —
+        // `image:` is the tag for the built image. The runtime side already
+        // honors this at Compose+BuildService.swift:68
+        // (`service.image ?? "\(serviceName):latest"`), reversing the prior
+        // CHAOS-1417/1442 decision that surfaced the combo as an error.
         for (name, service) in concrete {
-            // image + build conflict: both present is ambiguous
-            if service.image != nil && service.build != nil {
-                throw ComposeValidationError.imageBuildConflict(serviceName: name)
-            }
-
             // Must have at least one of image or build
             if service.image == nil && service.build == nil {
                 throw ComposeValidationError.serviceNeedsImageOrBuild(serviceName: name)
8 changes: 0 additions & 8 deletions Sources/Container-Compose/Errors.swift
@@ -85,12 +85,6 @@ public enum ComposeValidationError: Error, Equatable {
     /// A resource-constraint field (e.g. `deploy.resources.limits.cpus`)
     /// falls outside the allowed range.
     case resourceConstraintOutOfRange(field: String, value: String, min: Int, max: Int?)
-
-    /// A service declares both `image` and `build`, which is ambiguous.
-    /// The Compose spec says `image` acts as the tag for the built image, but
-    /// having both is often a user mistake and is surfaced as an error so they
-    /// can make their intent explicit.
-    case imageBuildConflict(serviceName: String)
 }
 
 extension ComposeValidationError: LocalizedError {
@@ -111,8 +105,6 @@ extension ComposeValidationError: LocalizedError {
             } else {
                 return "Resource constraint '\(field)' value '\(value)' must be ≥ \(min)."
             }
-        case .imageBuildConflict(let name):
-            return "Service '\(name)' declares both 'image' and 'build'. Remove one or use 'image' only as the tag for the built image (set it alongside 'build.context')."
         }
     }
 }
28 changes: 21 additions & 7 deletions Sources/Container-Compose/Helper Functions.swift
@@ -560,17 +560,31 @@ public func effectiveContainerName(
     return "\(projectName)-\(serviceName)"
 }
 
-/// Derives a project name from the current working directory. It replaces any '.' characters with
-/// '_' to ensure compatibility with container naming conventions.
+/// Derives a project name from the current working directory.
+///
+/// Per compose-spec (Naming): "Project names must contain only lowercase
+/// letters, decimal digits, dashes, and underscores, and must begin with a
+/// lowercase letter or decimal digit." `.` is replaced with `_` (the container
+/// runtime forbids dots in resource names), and the result is lowercased so
+/// directories like `CHAOS-1506-repro/` produce the spec-compliant
+/// `chaos-1506-repro` rather than an uppercase form that apple/container's
+/// network/container ID validators reject.
+///
+/// Note: CLI overrides (`-p`/`--project-name`) bypass this normalization —
+/// explicit user input is honored as-is to preserve intent. See
+/// `resolveProjectName` for precedence.
+///
+/// CHAOS-1511 — adds `.lowercased()` to align with docker compose's auto-derive
+/// behavior. Conservative scope: dots → underscores + lowercase only; broader
+/// regex sanitization (forbidden chars, leading-digit rule) can be a follow-up.
 ///
 /// - Parameter cwd: The current working directory path.
-/// - Returns: A sanitized project name suitable for container naming.
+/// - Returns: A sanitized, lowercased project name suitable for container naming.
 public func deriveProjectName(cwd: String) -> String {
-    // We need to replace '.' with _ because it is not supported in the container name
+    // We need to replace '.' with '_' because it is not supported in the container name.
     let lastComponent = FilePath(cwd).lastComponent?.string ?? cwd
-    let projectName = lastComponent.replacingOccurrences(of: ".", with: "_")
-    return projectName
+    return lastComponent
+        .replacingOccurrences(of: ".", with: "_")
+        .lowercased()
 }
 
 /// Resolves the effective project name with `docker compose` precedence:
@@ -272,25 +272,24 @@ struct ComposeParsingEdgeCaseTests {
         #expect(app.build != nil)
     }
 
-    @Test("Service with both image and build is rejected by validate() (CHAOS-1417, CHAOS-1442)")
-    func imageAndBuildRejectedByValidation() throws {
-        // Container-Compose deliberately rejects the image+build combination
-        // even though compose-spec permits it (with `image` acting as the tag
-        // for the built image). Rationale per Errors.swift:114 — this combo
-        // often masks user mistakes, so we surface it as an error to make
-        // intent explicit. Pinning tests for this contract live in
-        // ComposeValidationTests.swift:209-255 (CHAOS-1417 / PR #101); this
-        // test exists as cross-coverage in the parsing edge-case suite.
+    @Test("Service with both image and build is accepted by validate() (CHAOS-1510)")
+    func imageAndBuildAcceptedByValidation() throws {
+        // CHAOS-1510: image+build coexistence is permitted per compose-spec —
+        // `image:` acts as the tag for the built image. Reverses the prior
+        // CHAOS-1417/1442 contract. Canonical assertions in
+        // ComposeValidationTests.swift "image + build coexistence"; this is
+        // cross-coverage from the parsing edge-case suite.
         let yaml = """
         services:
           app:
             image: myapp:latest
             build: ./app
         """
         let dc = try YAMLDecoder().decode(DockerCompose.self, from: yaml)
-        #expect(throws: ComposeValidationError.imageBuildConflict(serviceName: "app")) {
-            try dc.validate()
-        }
+        #expect(throws: Never.self) { try dc.validate() }
+        let app = dc.services["app"]!!
+        #expect(app.image == "myapp:latest")
+        #expect(app.build != nil)
     }
 
     // MARK: - Variable interpolation edge cases