Ufoton/BlobToS3

Azure Blob to AWS S3 Synchronizer

This Azure Function provides a serverless solution to automatically synchronize files from an Azure Blob Storage container to an AWS S3 bucket. It is triggered by blob creation events via Azure Event Grid for efficient, near real-time replication.

Overview

The primary goal of this project is to create a reliable, event-driven pipeline for copying data from Azure to AWS S3. It's designed with security and cloud-native principles in mind.

Features

  • Event-Driven: Uses Azure Event Grid for low-latency triggers, ensuring that files are copied shortly after they are created.
  • Secure by Design: Natively supports using a Managed Identity to connect to Azure Blob Storage, eliminating the need to store storage account keys in application settings.
  • Flexible Pathing: Includes a configuration option to either keep the full Azure blob path or strip the source container name from the destination S3 object key.
  • Cloud Native: Built on the Azure Functions serverless platform for scalability and cost-efficiency.
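The copy step behind these features can be sketched as below. This is a minimal illustration, not the repository's actual code: `copy_blob_to_s3`, `S3Uploader`, and `RecordingClient` are hypothetical names, and the sketch assumes the S3 client is a boto3 client, whose `upload_fileobj(Fileobj, Bucket, Key)` method accepts a readable binary stream such as the one a blob trigger hands to the function.

```python
import io
from typing import BinaryIO, Dict, Protocol, Tuple


class S3Uploader(Protocol):
    """The single S3 client method this sketch relies on (matches boto3's upload_fileobj)."""

    def upload_fileobj(self, Fileobj: BinaryIO, Bucket: str, Key: str) -> None: ...


def copy_blob_to_s3(blob_stream: BinaryIO, s3_client: S3Uploader, bucket: str, key: str) -> str:
    """Hand the blob's byte stream to the S3 client and return the destination URI."""
    # The stream is passed through as-is; this function never reads it into memory itself.
    s3_client.upload_fileobj(blob_stream, bucket, key)
    return f"s3://{bucket}/{key}"


class RecordingClient:
    """Hypothetical stand-in for boto3.client("s3"); it only records what was uploaded."""

    def __init__(self) -> None:
        self.objects: Dict[Tuple[str, str], bytes] = {}

    def upload_fileobj(self, Fileobj: BinaryIO, Bucket: str, Key: str) -> None:
        self.objects[(Bucket, Key)] = Fileobj.read()


if __name__ == "__main__":
    client = RecordingClient()
    uri = copy_blob_to_s3(io.BytesIO(b"hello"), client, "your-s3-bucket-name", "folder/file.txt")
    print(uri)  # s3://your-s3-bucket-name/folder/file.txt
```

Passing the client in as a parameter keeps the copy logic testable without AWS credentials; in a deployed function the real boto3 client would take the place of `RecordingClient`.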

Setup and Deployment

Prerequisites

  • Azure subscription
  • AWS account with an S3 bucket
  • Azure CLI
  • Azure Functions Core Tools v4
  • Python 3.9+
  • An Azure Storage Account with a container (e.g., upload)

1. Local Development Setup

Follow these steps to run and debug the function on your local machine.

  1. Clone the repository:

    git clone <your-repo-url>
    cd BlobToS3
  2. Set up a Python virtual environment:

    python -m venv .venv
    source .venv/bin/activate  # On Windows use: .venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Configure local settings: Create a local.settings.json file in the project root. This file is ignored by git and should contain your local secrets.

    {
      "IsEncrypted": false,
      "Values": {
        "AzureWebJobsStorage": "UseDevelopmentStorage=true",
        "FUNCTIONS_WORKER_RUNTIME": "python",
        "AzureBlobConnection": "DefaultEndpointsProtocol=https;AccountName=yourstorageaccount;AccountKey=YOUR_ACCOUNT_KEY;EndpointSuffix=core.windows.net",
        "CONTAINER_NAME": "upload",
        "AWS_ACCESS_KEY_ID": "YOUR_AWS_ACCESS_KEY_ID",
        "AWS_SECRET_ACCESS_KEY": "YOUR_AWS_SECRET_ACCESS_KEY",
        "AWS_S3_REGION": "us-east-1",
        "AWS_S3_BUCKET_NAME": "your-s3-bucket-name",
        "STRIP_CONTAINER_PREFIX": "true"
      }
    }
  5. Run the function locally:

    func start
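When the function runs locally, the Functions host exposes the entries under Values in local.settings.json as environment variables. The sketch below shows one way the function code might read and validate them at startup. `SyncSettings` and `load_settings` are hypothetical names, not taken from the repository, and the sketch assumes that STRIP_CONTAINER_PREFIX is compared case-insensitively against "true".

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Settings from step 4 that have no default and must be present.
REQUIRED_SETTINGS = (
    "CONTAINER_NAME",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_S3_REGION",
    "AWS_S3_BUCKET_NAME",
)


@dataclass(frozen=True)
class SyncSettings:
    container_name: str
    aws_access_key_id: str
    aws_secret_access_key: str
    aws_s3_region: str
    aws_s3_bucket_name: str
    strip_container_prefix: bool


def load_settings(env: Mapping[str, str] = os.environ) -> SyncSettings:
    """Read the settings from a mapping and fail fast if any required one is missing."""
    missing = [name for name in REQUIRED_SETTINGS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    # Assumption: any value other than "true" (case-insensitive) leaves stripping off.
    strip = env.get("STRIP_CONTAINER_PREFIX", "false").strip().lower() == "true"
    return SyncSettings(
        container_name=env["CONTAINER_NAME"],
        aws_access_key_id=env["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=env["AWS_SECRET_ACCESS_KEY"],
        aws_s3_region=env["AWS_S3_REGION"],
        aws_s3_bucket_name=env["AWS_S3_BUCKET_NAME"],
        strip_container_prefix=strip,
    )
```

Calling `load_settings()` with no arguments reads from the process environment; passing a plain dict instead makes the check easy to exercise in a unit test.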

2. Azure Deployment (Recommended Secure Setup)

This setup uses Managed Identity for a secure, passwordless connection to Azure Blob Storage and Azure Key Vault for storing AWS secrets.

  1. Enable Managed Identity: In your deployed Azure Function App, navigate to Settings -> Identity and turn the System-assigned Managed Identity On.

  2. Grant Storage Permissions: Navigate to your source Azure Storage Account, go to Access control (IAM), and assign the following roles to the Managed Identity of your Function App.

    • Storage Blob Data Reader: Allows the function to read the content of the created blob.
    • Storage Queue Data Message Processor: Allows the function to process messages in the poison queue, which the blob trigger uses to track blobs that repeatedly fail processing.
  3. Store AWS Secrets in Key Vault: Create an Azure Key Vault and add your AWS credentials as secrets (e.g., aws-access-key-id, aws-secret-access-key). Then grant your Function App's Managed Identity the Get and List secret permissions through the Key Vault's access policies.

  4. Configure Function App Settings: In your Function App, go to Settings -> Configuration. Add the following Application Settings.

    Minimum Required Environment Variables

    For the Blob Trigger Connection (using Managed Identity): The function code refers to a connection named AzureBlobConnection. In Azure, you define its properties as separate application settings that share the AzureBlobConnection prefix, joined to the property name with a double underscore (__).

    Name                                  | Value
    --------------------------------------|------------------------------------------------------
    AzureBlobConnection__accountName      | your-storage-account-name
    AzureBlobConnection__credential       | managedidentity
    AzureBlobConnection__blobServiceUri   | https://your-storage-account-name.blob.core.windows.net
    AzureBlobConnection__queueServiceUri  | https://your-storage-account-name.queue.core.windows.net

    For the AWS and Function Logic: Use Key Vault references to securely load your AWS secrets.

    Name                    | Value (Example)
    ------------------------|------------------------------------------------------------------------------------------------
    CONTAINER_NAME          | upload
    AWS_S3_BUCKET_NAME      | your-s3-bucket-name
    AWS_S3_REGION           | us-east-1
    AWS_ACCESS_KEY_ID       | @Microsoft.KeyVault(SecretUri=https://your-vault.vault.azure.net/secrets/aws-access-key-id)
    AWS_SECRET_ACCESS_KEY   | @Microsoft.KeyVault(SecretUri=https://your-vault.vault.azure.net/secrets/aws-secret-access-key)
    STRIP_CONTAINER_PREFIX  | true (or false)
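The settings in step 4 follow a regular pattern, so they can be generated from a few inputs rather than typed by hand. The sketch below is one way to do that. `key_vault_reference` and `build_app_settings` are hypothetical helpers, and the secret names are assumed to be the examples from step 3 (aws-access-key-id, aws-secret-access-key).

```python
from typing import Dict


def key_vault_reference(vault_name: str, secret_name: str) -> str:
    """Format a Key Vault reference in the @Microsoft.KeyVault(SecretUri=...) form shown above."""
    secret_uri = f"https://{vault_name}.vault.azure.net/secrets/{secret_name}"
    return f"@Microsoft.KeyVault(SecretUri={secret_uri})"


def build_app_settings(storage_account: str, vault_name: str, bucket: str,
                       region: str, container: str,
                       strip_prefix: bool = False) -> Dict[str, str]:
    """Assemble the application settings listed in step 4 as a name-to-value mapping."""
    return {
        # Identity-based connection: each property is its own setting, joined with "__".
        "AzureBlobConnection__accountName": storage_account,
        "AzureBlobConnection__credential": "managedidentity",
        "AzureBlobConnection__blobServiceUri": f"https://{storage_account}.blob.core.windows.net",
        "AzureBlobConnection__queueServiceUri": f"https://{storage_account}.queue.core.windows.net",
        "CONTAINER_NAME": container,
        "AWS_S3_BUCKET_NAME": bucket,
        "AWS_S3_REGION": region,
        # Assumed secret names, taken from the examples in step 3.
        "AWS_ACCESS_KEY_ID": key_vault_reference(vault_name, "aws-access-key-id"),
        "AWS_SECRET_ACCESS_KEY": key_vault_reference(vault_name, "aws-secret-access-key"),
        "STRIP_CONTAINER_PREFIX": "true" if strip_prefix else "false",
    }


if __name__ == "__main__":
    settings = build_app_settings("your-storage-account-name", "your-vault",
                                  "your-s3-bucket-name", "us-east-1", "upload", True)
    for name, value in settings.items():
        print(f"{name}={value}")
```

The printed NAME=VALUE pairs can then be entered under Settings -> Configuration in the Function App.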

Configuration Details

Environment Variable    | Description                                                                                                                                  | Default
------------------------|----------------------------------------------------------------------------------------------------------------------------------------------|--------
CONTAINER_NAME          | The name of the source Azure Blob Storage container that the function will monitor for new blobs.                                            | N/A
AzureBlobConnection     | (Local only) The full connection string to the source Azure Storage Account. In Azure, this is configured via __ properties.                 | N/A
AWS_ACCESS_KEY_ID       | The access key for your AWS IAM user.                                                                                                        | N/A
AWS_SECRET_ACCESS_KEY   | The secret key for your AWS IAM user.                                                                                                        | N/A
AWS_S3_REGION           | The AWS region of your S3 bucket (e.g., us-east-1).                                                                                          | N/A
AWS_S3_BUCKET_NAME      | The name of the destination S3 bucket.                                                                                                       | N/A
STRIP_CONTAINER_PREFIX  | Set to "true" to remove the container name from the S3 object key. For a blob at upload/folder/file.txt, the S3 key becomes folder/file.txt. | "false"
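The effect of STRIP_CONTAINER_PREFIX can be illustrated with a small function. This is a sketch of the documented behavior, not the repository's code: `build_s3_key` is a hypothetical name, and it assumes the blob path arrives in the container/path form used in the example above.

```python
def build_s3_key(blob_name: str, container_name: str, strip_container_prefix: bool) -> str:
    """Derive the S3 object key from a blob path such as "upload/folder/file.txt"."""
    prefix = container_name + "/"
    # Strip only an exact leading "<container>/" segment, so that a path like
    # "uploads/file.txt" is left alone when the container is "upload".
    if strip_container_prefix and blob_name.startswith(prefix):
        return blob_name[len(prefix):]
    return blob_name


if __name__ == "__main__":
    print(build_s3_key("upload/folder/file.txt", "upload", True))   # folder/file.txt
    print(build_s3_key("upload/folder/file.txt", "upload", False))  # upload/folder/file.txt
```

With the default of "false", the container name stays in the key, which keeps objects from different source containers apart if they share one bucket.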
