CloudTranscode is bFAN's distributed media transcoding pipeline. It's a set of PHP-based activity workers that poll AWS Step Functions for transcoding jobs, then execute FFmpeg (for video) or ImageMagick (for images) to transcode media files and upload results to S3. The architecture allows horizontal scaling by running multiple workers in ECS containers.
- Language: PHP 7+ (legacy codebase, but clean)
- Container: Docker (ECS deployment)
- FFmpeg: 4.2 (video/image processing) — EOL, from 2019. See Security Findings.
- ImageMagick: convert commands for image transcoding
- AWS Services: Step Functions (SFN), S3, ECS, EC2, IAM
- Orchestration: AWS Step Functions state machines (see `state_machines/`)
- SDK: CloudProcessingEngine-SDK (bFAN fork) for activity polling and lifecycle management
- Dependencies: AWS SDK for PHP 3.x, JSON Schema validation
- Monitoring: None configured — no CloudWatch alarms, dashboards, or custom metrics. See Security Findings.
```bash
# Setup
make  # Installs composer dependencies

# Run activities locally (requires AWS credentials and SFN ARNs)
./src/activities/ValidateAssetActivity.php -A arn:aws:states:REGION:ACCOUNT:activity:ValidateAsset
./src/activities/TranscodeAssetActivity.php -A arn:aws:states:REGION:ACCOUNT:activity:TranscodeAsset

# Run in Docker (recommended)
docker build -t cloudtranscode:local .
docker run cloudtranscode:local ValidateAssetActivity -A <arn>
docker run cloudtranscode:local TranscodeAssetActivity -A <arn>

# Run tests
```
<!-- Ask: Does this repo have tests? If so, what command runs them? -->

- `src/activities/`: Activity workers (ValidateAssetActivity, TranscodeAssetActivity, BasicActivity base class)
- `src/activities/transcoders/`: Transcoder implementations (video, image, thumbnail)
- `src/scripts/`: Utility scripts
- `src/utils/`: Helper classes
- `state_machines/`: AWS Step Functions state machine JSON definitions
- `input_samples/`: Example JSON input payloads for testing workflows
- `presets/`: FFmpeg preset configurations (may be deprecated; check the CloudTranscode-FFMpeg-presets repo)
- `benchmark/`: FFmpeg performance benchmarks on AWS EC2 instances
- `Dockerfile`: Base image for ECS workers
- `bootstrap.sh`: Docker entrypoint script
- `Makefile`: Composer dependency installation
Internal:
- CloudProcessingEngine-SDK (bFAN fork) — activity polling, client interface callbacks, lifecycle management
External:
- AWS S3 — input/output media storage
- AWS Step Functions — task orchestration and distribution
- FFmpeg 4.2 — video/audio/image transcoding (bundled in Docker base image)
- ImageMagick — image manipulation (bundled in Docker base image)
Docker base images:
- `sportarc/ffmpeg:4.2`: FFmpeg binaries
- `sportarc/cloudtranscode-base:4.2`: PHP + FFmpeg + ImageMagick base
Input: JSON payloads posted to AWS Step Functions (see input_samples/ for examples). Structure:
- `input_asset`: source file (S3 bucket, key, type)
- `output_assets[]`: array of desired outputs (type, bucket, path, codec/size/preset, watermark, etc.)
Output: a JSON result returned from Step Functions to the client app, including the S3 locations of the transcoded files, metadata, and errors.
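Workers reject a malformed payload only after a worker has picked up the task. A cheap client-side pre-check can catch obvious mistakes before the payload is submitted. The sketch below is a hypothetical helper, not the repo's JSON Schema validation. Its key names (`bucket`, `key`, `type` on `input_asset`, and `type`, `bucket` on each output) are assumptions based on the structure described above. The files in `input_samples/` show the real shape.

```php
<?php
// Hypothetical pre-flight check for a job payload before it is sent to Step Functions.
// Key names are assumptions based on the README; input_samples/ holds the real shape.

/** @return string[] problems found; an empty array means the payload looks usable */
function checkJobPayload(array $job): array
{
    $errors = [];

    $input = $job['input_asset'] ?? null;
    if (!is_array($input)) {
        $errors[] = 'input_asset is missing or not an object';
    } else {
        foreach (['bucket', 'key', 'type'] as $field) {
            if (empty($input[$field])) {
                $errors[] = "input_asset.$field is required";
            }
        }
    }

    $outputs = $job['output_assets'] ?? null;
    if (!is_array($outputs) || count($outputs) === 0) {
        $errors[] = 'output_assets must be a non-empty array';
        return $errors;
    }
    foreach ($outputs as $i => $output) {
        if (!is_array($output)) {
            $errors[] = "output_assets[$i] is not an object";
            continue;
        }
        foreach (['type', 'bucket'] as $field) {
            if (empty($output[$field])) {
                $errors[] = "output_assets[$i].$field is required";
            }
        }
    }

    return $errors;
}

// Example payload; bucket names, key and type values are illustrative only.
$job = json_decode('{
  "input_asset":   {"type": "VIDEO", "bucket": "my-input-bucket", "key": "raw/match.mp4"},
  "output_assets": [{"type": "VIDEO", "bucket": "my-output-bucket", "path": "out/", "preset": "360p-4.3-generic"}]
}', true);

echo checkJobPayload($job) === [] ? "ok\n" : "invalid\n"; // prints "ok"
```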
Client Integration: Implement CpeClientInterface.php from CloudProcessingEngine-SDK to receive callbacks:
- `onStart`: workflow initiated
- `onHeartbeat`: worker is alive
- `onFail`: transcoding failed
- `onSuccess`: workflow completed
- `onTranscodeDone`: one output asset completed
Pass your custom client class to the activity workers with the `-C <client class path>` option. For Docker, extend the base image and copy the client classes into it.
- Step Functions orchestration: The workflow is defined in the `state_machines/` JSON files. SFN distributes tasks to activity workers and handles retries and failure routing. This is the control plane; the workers are the data plane.
- Activity polling: Workers use long polling (`GetActivityTask`) to fetch tasks from AWS SFN.
- Sequential output processing: One TranscodeAssetActivity worker processes all outputs in the `output_assets` array sequentially, not in parallel. This is a performance bottleneck: a 10-output job ties up one worker for the full duration. To parallelize, split the workflow into separate SFN executions or use Map states.
- Stateless workers: Workers are horizontally scalable Docker containers. State lives in S3 and SFN.
- Preset-based transcoding: FFmpeg commands can be templated using presets (e.g., `360p-4.3-generic`).
- Custom FFmpeg commands: The JSON input supports raw FFmpeg command strings for advanced use cases. WARNING: command injection risk; see Security Findings.
- Watermarking: Overlay images on video with custom position, opacity, size
- HTTP input: Workers can pull source files from HTTP/S URLs instead of S3
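The sequential-processing item above suggests splitting a multi-output job. One way to do that on the submitting side is to fan out one payload into single-output payloads, then start each as its own SFN execution. The helper below is a hypothetical sketch that assumes only the `input_asset` and `output_assets` keys described above. The trade-off is that every execution downloads the source file again, so this buys parallelism at the cost of repeated S3 or HTTP reads.

```php
<?php
// Hypothetical fan-out helper: one job with N output_assets becomes N single-output jobs,
// each of which can be started as a separate SFN execution and picked up by a different worker.
function splitByOutput(array $job): array
{
    $jobs = [];
    foreach (($job['output_assets'] ?? []) as $output) {
        $single = $job;                       // keeps input_asset and any other top-level fields
        $single['output_assets'] = [$output]; // exactly one output per job
        $jobs[] = $single;
    }
    return $jobs;
}

// Illustrative payload; "THUMB" is a made-up output type.
$job = [
    'input_asset'   => ['type' => 'VIDEO', 'bucket' => 'in', 'key' => 'raw/match.mp4'],
    'output_assets' => [
        ['type' => 'VIDEO', 'bucket' => 'out', 'path' => 'p/', 'preset' => '360p-4.3-generic'],
        ['type' => 'THUMB', 'bucket' => 'out', 'path' => 't/'],
    ],
];

echo count(splitByOutput($job)), " executions\n"; // prints "2 executions"
```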
Required AWS credentials (IAM role or env vars):
- `AWS_ACCESS_KEY_ID`
- `AWS_SECRET_ACCESS_KEY`
- `AWS_DEFAULT_REGION`
Required IAM permissions:
- Step Functions: `states:GetActivityTask`, `states:SendTaskSuccess`, `states:SendTaskFailure`, `states:SendTaskHeartbeat`
- S3: `s3:GetObject`, `s3:PutObject`, `s3:PutObjectAcl` on the input/output buckets
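The permissions above can be rendered into a policy document from parameters, so no account ID or bucket name is hardcoded. The sketch below is an assumption, not the repo's deployment config. It also applies a least-privilege split that the list above does not make: `GetObject` only on input buckets, and `Put*` only on output buckets. The `SendTask*` actions take a task token rather than an ARN, so the sketch grants them on `*`. `123456789012` is the usual AWS documentation placeholder account.

```php
<?php
// Sketch: build the worker IAM policy from parameters instead of hardcoded ARNs.
// The Get/Put split between input and output buckets is an assumption (least privilege).
function workerPolicy(array $activityArns, array $inputBuckets, array $outputBuckets): string
{
    // Object-level ARNs covering every object in each bucket.
    $objectArns = function (array $buckets): array {
        return array_values(array_map(function (string $bucket): string {
            return "arn:aws:s3:::$bucket/*";
        }, $buckets));
    };

    $policy = [
        'Version'   => '2012-10-17',
        'Statement' => [
            [   // Long polling, scoped to the activities this worker serves.
                'Effect'   => 'Allow',
                'Action'   => ['states:GetActivityTask'],
                'Resource' => array_values($activityArns),
            ],
            [   // SendTask* calls identify work by task token; granted on '*' in this sketch.
                'Effect'   => 'Allow',
                'Action'   => ['states:SendTaskSuccess', 'states:SendTaskFailure', 'states:SendTaskHeartbeat'],
                'Resource' => '*',
            ],
            [
                'Effect'   => 'Allow',
                'Action'   => ['s3:GetObject'],
                'Resource' => $objectArns($inputBuckets),
            ],
            [
                'Effect'   => 'Allow',
                'Action'   => ['s3:PutObject', 's3:PutObjectAcl'],
                'Resource' => $objectArns($outputBuckets),
            ],
        ],
    ];

    // Unescaped slashes keep the ARNs readable in the output.
    return json_encode($policy, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES);
}

echo workerPolicy(
    ['arn:aws:states:eu-west-1:123456789012:activity:TranscodeAsset'],
    ['my-input-bucket'],
    ['my-output-bucket']
), PHP_EOL;
```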
Runtime: PHP 7+, FFmpeg 4.2, ImageMagick (all bundled in Docker image)
Current setup:
- Docker image built from the `Dockerfile` and pushed to ECR (eu-west-1)
- ECS cluster runs workers as tasks
- Each worker polls a specific SFN activity ARN
- Note: AWS account ID `501431420968` is hardcoded in the Dockerfile/configs. Use an environment variable or SSM parameter instead.
Deployment steps:
- Build the Docker image: `docker build -t <ecr-repo>:tag .`
- Push to ECR: `docker push <ecr-repo>:tag`
- Update the ECS task definition with the new image tag
- Update the ECS service to run the new task definition revision
Manual testing:
- Use the `input_samples/` JSON files to start test workflows via the AWS SDK
- Monitor the Step Functions console for workflow execution
- Check S3 output buckets for transcoded files
- Review CloudWatch Logs for worker output
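Starting a test workflow from a sample file comes down to one `StartExecution` call. The sketch below builds its parameters and checks that the file holds a JSON object before anything is submitted. The actual SDK call is shown as comments because it needs Composer dependencies and credentials. `CT_STATE_MACHINE_ARN` is a hypothetical environment variable name.

```php
<?php
// Sketch: turn an input_samples/ file into StartExecution parameters.
function startExecutionParams(string $samplePath, string $stateMachineArn): array
{
    $raw = file_get_contents($samplePath);
    if ($raw === false) {
        throw new RuntimeException("cannot read $samplePath");
    }
    // Validate locally so a malformed sample fails here rather than inside the workflow.
    if (!is_object(json_decode($raw))) {
        throw new InvalidArgumentException("$samplePath does not contain a JSON object");
    }
    return [
        'stateMachineArn' => $stateMachineArn,
        'input'           => $raw, // SFN takes the execution input as a JSON string
    ];
}

// With the AWS SDK for PHP installed via Composer:
//   $sfn    = new Aws\Sfn\SfnClient(['version' => 'latest', 'region' => getenv('AWS_DEFAULT_REGION')]);
//   $result = $sfn->startExecution(startExecutionParams($argv[1], getenv('CT_STATE_MACHINE_ARN')));
//   echo $result['executionArn'], PHP_EOL;
```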
Security Findings (AI audit, 2026-02-17). These findings should be tracked as issues and resolved.
The transcoder code passes user-supplied JSON parameters (codec, size, preset names, custom command strings) into FFmpeg and ImageMagick shell commands without escaping or sanitization. A crafted output_assets payload could inject arbitrary shell commands.
Affected files: src/activities/transcoders/ — anywhere parameters are interpolated into shell commands.
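Fix: Use escapeshellarg() on every user-supplied parameter before interpolation. Better: build argument arrays and pass them to proc_open(), which runs the program without a shell (array commands require PHP 7.4+), instead of concatenating strings for exec()/shell_exec(). Validate inputs against an allowlist of known codecs, presets, and sizes. Raw custom command strings cannot be made safe by escaping alone: either drop that feature, or tokenize the string and validate every argument.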
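The fix above can be sketched as follows. User-supplied fields are checked against an allowlist or a strict pattern, and then FFmpeg receives an argument array, so values such as a file name containing `; rm -rf` are never parsed by a shell. The field names (`codec`, `size`), the allowlist contents, and the FFmpeg flags chosen are illustrative assumptions, not the repo's transcoder code. The array form of `proc_open()` needs PHP 7.4. On older runtimes, the fallback quotes every element with `escapeshellarg()`.

```php
<?php
// Sketch of the fix: allowlist-check user-supplied fields, then pass FFmpeg an argument array.
// Field names and allowlist contents are hypothetical.
const ALLOWED_CODECS = ['libx264', 'libx265', 'libvpx-vp9'];

function buildFfmpegArgv(string $input, string $output, array $asset): array
{
    $codec = $asset['codec'] ?? 'libx264';
    if (!in_array($codec, ALLOWED_CODECS, true)) {
        throw new InvalidArgumentException("codec not allowed: $codec");
    }
    $size = $asset['size'] ?? '640x360';
    // \z instead of $ so a trailing newline cannot slip through.
    if (!preg_match('/^\d{2,4}x\d{2,4}\z/', $size)) {
        throw new InvalidArgumentException("invalid size: $size");
    }
    // Each element becomes exactly one argv entry for ffmpeg.
    return ['ffmpeg', '-y', '-i', $input, '-c:v', $codec, '-s', $size, $output];
}

// PHP >= 7.4: proc_open() with an array executes the program directly, without /bin/sh.
function runArgv(array $argv): int
{
    // Empty descriptor spec: the child inherits the worker's stdout/stderr, so FFmpeg's
    // log output goes to the container log stream instead of filling an unread pipe.
    $proc = proc_open($argv, [], $pipes);
    if ($proc === false) {
        throw new RuntimeException('failed to start ' . $argv[0]);
    }
    return proc_close($proc); // exit status of the child process
}

// PHP < 7.4 fallback: quote every element before building a shell command string.
function argvToShellCommand(array $argv): string
{
    return implode(' ', array_map('escapeshellarg', $argv));
}
```

A transcoder would call `runArgv(buildFfmpegArgv($localIn, $localOut, $outputAsset))`, where the variable names are placeholders. Validation throws before any process starts, so a rejected payload never reaches FFmpeg.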
There is no throttling on Step Functions task submission. A misconfigured client or runaway automation can flood the pipeline with jobs, exhausting ECS capacity and S3 write throughput.
Fix: Add SFN execution concurrency limits, or use an SQS queue with a controlled consumer rate in front of the pipeline.
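A complementary technique, not one the finding names, is a token bucket in the submitting client. It allows short bursts and caps the sustained rate of `StartExecution` calls. The class below is a generic sketch with an injectable clock so its behavior can be tested. It limits only a single client process; a limit shared across clients still needs a central component such as the SQS queue mentioned above.

```php
<?php
// Token bucket: bursts of up to $capacity submissions, refilled at $ratePerSec tokens per second.
final class TokenBucket
{
    private $capacity;
    private $ratePerSec;
    private $tokens;
    private $last;
    private $clock;

    public function __construct(float $capacity, float $ratePerSec, ?callable $clock = null)
    {
        $this->capacity   = $capacity;
        $this->ratePerSec = $ratePerSec;
        $this->tokens     = $capacity; // start full
        $this->clock      = $clock ?? function (): float { return microtime(true); };
        $this->last       = ($this->clock)();
    }

    /** Consumes a token and returns true if a submission is allowed right now. */
    public function tryAcquire(): bool
    {
        $now = ($this->clock)();
        // Refill in proportion to elapsed time, never beyond capacity.
        $this->tokens = min($this->capacity, $this->tokens + ($now - $this->last) * $this->ratePerSec);
        $this->last = $now;
        if ($this->tokens >= 1.0) {
            $this->tokens -= 1.0;
            return true;
        }
        return false;
    }
}

// Usage around a StartExecution call (the call itself is elided):
//   $bucket = new TokenBucket(5, 0.5);   // bursts of 5, then one execution every 2 s
//   while (!$bucket->tryAcquire()) { usleep(100000); }
```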
AWS account ID 501431420968 appears in ECR URIs and potentially in SFN ARNs throughout the codebase. This leaks infrastructure details and makes multi-account deployment impossible.
Fix: Replace with environment variables, SSM parameters, or CDK/CloudFormation references.
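A minimal sketch of the environment-variable approach builds the activity ARN at startup, and the result can be passed to a worker's `-A` option. `AWS_DEFAULT_REGION` already appears in the configuration above. `AWS_ACCOUNT_ID` is a hypothetical variable name that an ECS task definition or SSM parameter would supply. Another option is to look up the account at runtime with STS `GetCallerIdentity` (`StsClient::getCallerIdentity()` in the AWS SDK for PHP).

```php
<?php
// Build an SFN activity ARN from the environment instead of hardcoding the account ID.
// AWS_ACCOUNT_ID is a hypothetical variable name.
function activityArn(string $activityName): string
{
    $region  = getenv('AWS_DEFAULT_REGION');
    $account = getenv('AWS_ACCOUNT_ID');
    if ($region === false || $region === '' || $account === false || !preg_match('/^\d{12}\z/', $account)) {
        throw new RuntimeException('AWS_DEFAULT_REGION and a 12-digit AWS_ACCOUNT_ID must be set');
    }
    return sprintf('arn:aws:states:%s:%s:activity:%s', $region, $account, $activityName);
}
```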
FFmpeg 4.2 is from August 2019 and no longer receives security patches. Known CVEs in older FFmpeg versions include heap overflows in demuxers and decoders that can be triggered by malformed input media.
Fix: Upgrade the sportarc/ffmpeg and sportarc/cloudtranscode-base Docker images to FFmpeg 6.x or 7.x. Test transcoding presets for compatibility.
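After the base images are rebuilt, a startup or CI check can confirm that the worker actually runs the upgraded binary. The sketch below parses the first line of `ffmpeg -version`. It assumes release-style version strings such as `ffmpeg version 4.2.2 Copyright ...` and rejects git snapshot builds (`N-...`) that do not carry a numeric version. The minimum version and the place to call the check are assumptions.

```php
<?php
// Parse the major version from `ffmpeg -version` output; null if it is not a release-style string.
function ffmpegMajorVersion(string $versionOutput): ?int
{
    if (preg_match('/^ffmpeg version n?(\d+)\./', $versionOutput, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Throws unless the ffmpeg on PATH is at least $minMajor.
function assertFfmpegAtLeast(int $minMajor): void
{
    $out = shell_exec('ffmpeg -version 2>/dev/null'); // fixed command string, no user input
    $major = is_string($out) ? ffmpegMajorVersion($out) : null;
    if ($major === null || $major < $minMajor) {
        throw new RuntimeException('ffmpeg >= ' . $minMajor . ' required, found ' . var_export($major, true));
    }
}
```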
Transcoding temp files (downloaded source, intermediate outputs) are stored on local ECS disk unencrypted. If the disk is an EBS volume, data at rest is exposed unless the volume itself has encryption enabled.
Fix: Ensure ECS instances use encrypted EBS volumes. For sensitive media, consider encrypting temp files at the application level or using instance store with dm-crypt.
No CloudWatch alarms, custom metrics, or dashboards are configured. Worker failures, SFN execution errors, and S3 throughput issues are invisible without manual console checks.
Fix: Add CloudWatch alarms for SFN execution failures, ECS task stopped events, and worker heartbeat gaps. Publish custom metrics for transcode duration, queue depth, and error rates.
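For the custom metrics, each worker could publish one data point per job. The sketch below only shapes a `PutMetricData` request. The namespace, metric names, and the `Activity` dimension are hypothetical. Recording errors as 0 or 1 per job lets a CloudWatch `Sum` give the error count and `Average` give the error rate. The worker role would also need `cloudwatch:PutMetricData`, which is not in the permission list above.

```php
<?php
// Sketch: shape a PutMetricData request for transcode duration and errors.
// Namespace, metric and dimension names are hypothetical.
function transcodeMetrics(string $activity, float $durationSec, bool $failed): array
{
    $dimensions = [['Name' => 'Activity', 'Value' => $activity]];
    return [
        'Namespace'  => 'CloudTranscode',
        'MetricData' => [
            ['MetricName' => 'TranscodeDuration', 'Dimensions' => $dimensions,
             'Value' => $durationSec, 'Unit' => 'Seconds'],
            // 0/1 per job: Sum = error count, Average = error rate.
            ['MetricName' => 'TranscodeErrors', 'Dimensions' => $dimensions,
             'Value' => $failed ? 1 : 0, 'Unit' => 'Count'],
        ],
    ];
}

// With the AWS SDK for PHP:
//   $cw = new Aws\CloudWatch\CloudWatchClient(['version' => 'latest', 'region' => getenv('AWS_DEFAULT_REGION')]);
//   $cw->putMetricData(transcodeMetrics('TranscodeAsset', $elapsedSeconds, false));
```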
- Sequential processing bottleneck: TranscodeAssetActivity processes all outputs sequentially. A single job with many outputs blocks the worker. Split into parallel SFN branches or use Map states.
- Docker base image dependency: This repo depends on two SportArchive Docker images (`sportarc/ffmpeg`, `sportarc/cloudtranscode-base`). If those images are updated, rebuild this image.
- FFmpeg version: Locked to 4.2 (2019, EOL). Upgrading FFmpeg requires updating the base image and retesting all presets.
- Client interface requirement: For production use, you MUST implement a custom client interface class and extend the Dockerfile to include it. Without it, workers run but don't notify client apps of progress/completion.
- AWS SFN long polling: Workers block on GetActivityTask calls (long polling). If AWS SFN is unavailable, workers will hang until timeout.
- Temp disk space: Transcoding uses local disk for temporary files. Ensure ECS instances or Docker volumes have sufficient space for large video files. Temp files are not encrypted at the application level.
- Presets location: The `presets/` directory in this repo may be deprecated. Check whether CloudTranscode-FFMpeg-presets is the canonical source.
- Hardcoded account ID: `501431420968` is baked into ECR URIs and possibly SFN ARNs. It must be parameterized for multi-account use.
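On the long-polling item: the GetActivityTask API reference says the service may hold a poll for up to 60 seconds and recommends a client-side socket timeout of at least 65 seconds. The sketch below shows SfnClient options that follow that advice. A short `connect_timeout` makes an unreachable endpoint fail fast instead of hanging. The values are assumptions. Whether the CloudProcessingEngine-SDK fork lets you pass these options to its client is not covered here.

```php
<?php
// Sketch: AWS SDK for PHP client options for an SFN activity worker.
function sfnClientConfig(string $region): array
{
    return [
        'version' => '2016-11-23', // Step Functions API version
        'region'  => $region,
        'http'    => [
            'connect_timeout' => 5,  // seconds to establish the connection; fail fast if SFN is unreachable
            'timeout'         => 70, // whole-request limit; must exceed the 60 s long poll
        ],
    ];
}

// With the AWS SDK for PHP:
//   $sfn  = new Aws\Sfn\SfnClient(sfnClientConfig(getenv('AWS_DEFAULT_REGION')));
//   $task = $sfn->getActivityTask(['activityArn' => $arn, 'workerName' => gethostname()]);
//   if (empty($task['taskToken'])) { /* no work within the poll window; poll again */ }
```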