This Azure Function provides a serverless solution to automatically synchronize files from an Azure Blob Storage container to an AWS S3 bucket. It is triggered by blob creation events via Azure Event Grid for efficient, near real-time replication.
The primary goal of this project is to create a reliable, event-driven pipeline for copying data from Azure to AWS S3. It's designed with security and cloud-native principles in mind.
- Event-Driven: Uses Azure Event Grid for low-latency triggers, ensuring that files are copied shortly after they are created.
- Secure by Design: Natively supports using a Managed Identity to connect to Azure Blob Storage, eliminating the need to store storage account keys in application settings.
- Flexible Pathing: Includes a configuration option to either keep the full Azure blob path or strip the source container name from the destination S3 object key.
- Cloud Native: Built on the Azure Functions serverless platform for scalability and cost-efficiency.
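The Flexible Pathing option can be illustrated with a minimal sketch. It assumes the blob path reaches the function as `<container>/<path inside container>`. The helper name `build_s3_key` and its parameters are hypothetical and are not taken from the project's code.

```python
def build_s3_key(blob_path: str, container_name: str, strip_container_prefix: bool) -> str:
    """Derive the destination S3 object key from an Azure blob path.

    Assumes blob_path looks like "<container>/<path inside container>".
    """
    # Drop a leading slash so "/upload/a.txt" and "upload/a.txt" behave alike.
    path = blob_path.lstrip("/")
    prefix = container_name.strip("/") + "/"
    # Only strip when the path really starts with "<container>/".
    if strip_container_prefix and path.startswith(prefix):
        return path[len(prefix):]
    return path


print(build_s3_key("upload/folder/file.txt", "upload", True))   # folder/file.txt
print(build_s3_key("upload/folder/file.txt", "upload", False))  # upload/folder/file.txt
```

Matching on `upload/` rather than `upload` keeps a sibling container such as `uploads` from being stripped by accident.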
- Azure Subscription
- AWS Account with an S3 bucket
- Azure CLI
- Azure Functions Core Tools v4
- Python 3.9+
- An Azure Storage Account with a container (e.g., `upload`).
- An AWS S3 Bucket.
Follow these steps to run and debug the function on your local machine.
- Clone the repository:

      git clone <your-repo-url>
      cd BlobToS3
- Set up a Python virtual environment:

      python -m venv .venv
      source .venv/bin/activate  # On Windows use: .venv\Scripts\activate
- Install dependencies:

      pip install -r requirements.txt
- Configure local settings: Create a `local.settings.json` file in the project root. This file is ignored by git and should contain your local secrets.

      {
        "IsEncrypted": false,
        "Values": {
          "AzureWebJobsStorage": "UseDevelopmentStorage=true",
          "FUNCTIONS_WORKER_RUNTIME": "python",
          "AzureBlobConnection": "DefaultEndpointsProtocol=https;AccountName=yourstorageaccount;AccountKey=YOUR_ACCOUNT_KEY;EndpointSuffix=core.windows.net",
          "CONTAINER_NAME": "upload",
          "AWS_ACCESS_KEY_ID": "YOUR_AWS_ACCESS_KEY_ID",
          "AWS_SECRET_ACCESS_KEY": "YOUR_AWS_SECRET_ACCESS_KEY",
          "AWS_S3_REGION": "us-east-1",
          "AWS_S3_BUCKET_NAME": "your-s3-bucket-name",
          "STRIP_CONTAINER_PREFIX": "true"
        }
      }

- Run the function locally:

      func start
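When the function runs locally, the entries under `Values` in `local.settings.json` are exposed to the Python worker as environment variables. The sketch below shows one way the function could read them, treating an unset `STRIP_CONTAINER_PREFIX` as `"false"`. The `SyncSettings` class and `load_settings` helper are hypothetical names, not the project's actual code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SyncSettings:
    container_name: str
    bucket_name: str
    region: str
    strip_container_prefix: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> SyncSettings:
    """Read the sync settings from a mapping (os.environ when none is given)."""
    env = os.environ if env is None else env
    required = ("CONTAINER_NAME", "AWS_S3_BUCKET_NAME", "AWS_S3_REGION")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("Missing required settings: " + ", ".join(missing))
    # Settings are always strings, so the flag is compared case-insensitively.
    flag = env.get("STRIP_CONTAINER_PREFIX", "false")
    return SyncSettings(
        container_name=env["CONTAINER_NAME"],
        bucket_name=env["AWS_S3_BUCKET_NAME"],
        region=env["AWS_S3_REGION"],
        strip_container_prefix=flag.strip().lower() == "true",
    )


example = {
    "CONTAINER_NAME": "upload",
    "AWS_S3_BUCKET_NAME": "your-s3-bucket-name",
    "AWS_S3_REGION": "us-east-1",
}
print(load_settings(example).strip_container_prefix)  # False
```

The AWS credentials are left out on purpose: boto3 reads `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` from the environment by itself. Whether the project relies on that behavior is an assumption.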
This setup uses a Managed Identity for a secure, passwordless connection to Azure Blob Storage, and Azure Key Vault to store the AWS secrets.
- Enable Managed Identity: In your deployed Azure Function App, navigate to Settings -> Identity and turn the System-assigned Managed Identity On.
- Grant Storage Permissions: Navigate to your source Azure Storage Account, go to Access control (IAM), and assign the following roles to the Managed Identity of your Function App:
  - Storage Blob Data Reader: Allows the function to read the content of the created blob.
  - Storage Queue Data Message Processor: Allows the function to manage messages in the poison queue, which is used by the Event Grid trigger for reliability.
- Store AWS Secrets in Key Vault: Create an Azure Key Vault and add your AWS credentials as secrets (e.g., `aws-access-key-id`, `aws-secret-access-key`). Grant your Function App's Managed Identity `Get` and `List` secret permissions in the Key Vault's Access Policies.
- Configure Function App Settings: In your Function App, go to Settings -> Configuration. Add the following Application Settings.
  For the Blob Trigger Connection (using Managed Identity): The function code refers to a connection named `AzureBlobConnection`. In Azure, you define its properties by using a double-underscore `__` separator.

  | Name | Value |
  |---|---|
  | `AzureBlobConnection__accountName` | `your-storage-account-name` |
  | `AzureBlobConnection__credential` | `managedidentity` |
  | `AzureBlobConnection__blobServiceUri` | `https://your-storage-account-name.blob.core.windows.net` |
  | `AzureBlobConnection__queueServiceUri` | `https://your-storage-account-name.queue.core.windows.net` |

  For the AWS and Function Logic: Use Key Vault references to securely load your AWS secrets.

  | Name | Value (Example) |
  |---|---|
  | `CONTAINER_NAME` | `upload` |
  | `AWS_S3_BUCKET_NAME` | `your-s3-bucket-name` |
  | `AWS_S3_REGION` | `us-east-1` |
  | `AWS_ACCESS_KEY_ID` | `@Microsoft.KeyVault(SecretUri=https://your-vault.vault.azure.net/secrets/aws-access-key-id)` |
  | `AWS_SECRET_ACCESS_KEY` | `@Microsoft.KeyVault(SecretUri=https://your-vault.vault.azure.net/secrets/aws-secret-access-key)` |
  | `STRIP_CONTAINER_PREFIX` | `true` (or `false`) |
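
To illustrate the double-underscore convention, the sketch below groups flat settings that share the `AzureBlobConnection__` prefix into one set of connection properties. This is only a conceptual model: the real resolution happens inside the Functions host, not in your code, and the helper `connection_properties` is a hypothetical name.

```python
from typing import Dict, Mapping


def connection_properties(settings: Mapping[str, str], connection_name: str) -> Dict[str, str]:
    """Collect every "<connection_name>__<property>" setting into one dict."""
    prefix = connection_name + "__"
    return {
        name[len(prefix):]: value
        for name, value in settings.items()
        if name.startswith(prefix)
    }


# Example values; the storage account name is a placeholder.
app_settings = {
    "AzureBlobConnection__credential": "managedidentity",
    "AzureBlobConnection__blobServiceUri": "https://your-storage-account-name.blob.core.windows.net",
    "AzureBlobConnection__queueServiceUri": "https://your-storage-account-name.queue.core.windows.net",
    "CONTAINER_NAME": "upload",
}

props = connection_properties(app_settings, "AzureBlobConnection")
print(sorted(props))  # ['blobServiceUri', 'credential', 'queueServiceUri']
```

Unrelated settings such as `CONTAINER_NAME` are ignored because they do not carry the connection prefix.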
| Environment Variable | Description | Default |
|---|---|---|
| `CONTAINER_NAME` | The name of the source Azure Blob Storage container that the function will monitor for new blobs. | N/A |
| `AzureBlobConnection` | (Local only) The full connection string to the source Azure Storage Account. In Azure, this is configured via `__` properties. | N/A |
| `AWS_ACCESS_KEY_ID` | The access key for your AWS IAM user. | N/A |
| `AWS_SECRET_ACCESS_KEY` | The secret key for your AWS IAM user. | N/A |
| `AWS_S3_REGION` | The AWS region of your S3 bucket (e.g., `us-east-1`). | N/A |
| `AWS_S3_BUCKET_NAME` | The name of the destination S3 bucket. | N/A |
| `STRIP_CONTAINER_PREFIX` | Set to `"true"` to remove the container name from the S3 object key. For a blob at `upload/folder/file.txt`, the S3 key becomes `folder/file.txt`. | `"false"` |