Slow deployment with litefs on fly.io on 6.7gb database #446

@luisiacc

My app runs on 2 machines on fly.io, and each machine currently takes about 3 minutes to deploy. About 80% of that time is spent in LiteFS.

I'm looking for help making this faster; I don't think 6.7 GB is that big of a database. While I was talking to the fly.io support team, it also came up that CPU usage climbs to almost 100% during that time. Here are some pictures and logs of what happened.

LiteFS version: 0.5.14

Image

Here is a log from one of my deployments, where one step takes almost 2 minutes:

08:13:15 level=INFO msg="initializing consul: key=iuspro-20250530/notario url=https://:656117fa-394f-1726-8904-f31ddd6cce70@consul-iad-11.fly-shared.net/notario-yexkqwp8dm79m38d/ hostname=4d894672a06148 advertise-url=http://4d894672a06148.vm.notario.internal:20202"

08:13:15 2025/12/15 08:13:15 INFO SSH listening listen_address=[fdaa:2:fb49:a7b:1eb:5c53:e84d:2]:22

08:13:15 level=INFO msg="wal-sync: short wal file exists on \"cache.db\", skipping sync with ltx"

08:13:16 Health check 'servicecheck-01-http-8080' on port 8080 has failed. Your app is not responding properly. Services exposed on ports [80, 443] will have intermittent failures until the health check passes.

08:13:16 Health check 'servicecheck-00-tcp-8080' on port 8080 has failed. Your app is not responding properly. Services exposed on ports [80, 443] will have intermittent failures until the health check passes.

08:13:23 level=INFO msg="wal-sync: database \"sqlite.db\" has wal size of 2319592 bytes within range of ltx file (@1161872, 1157720 bytes)"


====================== 2 minutes here ===============================================

08:15:00 level=INFO msg="using existing cluster id: \"LFSCD2310317B4BCC32D\""

08:15:00 level=INFO msg="LiteFS mounted to: /litefs/data"

08:15:00 level=INFO msg="http server listening on: http://localhost:20202"
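
For anyone wanting to pin down exactly where the time goes in a log like the one above, here is a minimal Python sketch that finds the longest pause between consecutive timestamped lines. The function name `largest_gap` is made up for illustration, the message bodies are shortened copies of the lines above, and it assumes all lines fall on the same day (no midnight rollover).

```python
from datetime import datetime, timedelta

# Shortened copies of the deployment log lines above (timestamps unchanged).
LOG_LINES = [
    '08:13:15 level=INFO msg="initializing consul"',
    "08:13:15 INFO SSH listening",
    '08:13:15 level=INFO msg="wal-sync: short wal file exists on cache.db"',
    "08:13:16 Health check 'servicecheck-01-http-8080' on port 8080 has failed.",
    "08:13:16 Health check 'servicecheck-00-tcp-8080' on port 8080 has failed.",
    '08:13:23 level=INFO msg="wal-sync: database sqlite.db has wal size of 2319592 bytes"',
    "====================== 2 minutes here ======================",
    '08:15:00 level=INFO msg="using existing cluster id"',
    '08:15:00 level=INFO msg="LiteFS mounted to: /litefs/data"',
    '08:15:00 level=INFO msg="http server listening on: http://localhost:20202"',
]


def largest_gap(lines):
    """Return (gap, line_before, line_after) for the longest pause.

    Lines that do not start with an HH:MM:SS timestamp are skipped.
    """
    stamped = []
    for line in lines:
        try:
            ts = datetime.strptime(line[:8], "%H:%M:%S")
        except ValueError:
            continue  # separator or continuation line without a timestamp
        stamped.append((ts, line))

    best = (timedelta(0), None, None)
    for (t0, l0), (t1, l1) in zip(stamped, stamped[1:]):
        if t1 - t0 > best[0]:
            best = (t1 - t0, l0, l1)
    return best


gap, before, after = largest_gap(LOG_LINES)
print(f"{gap.total_seconds():.0f}s pause between:")
print(f"  {before}")
print(f"  {after}")
```

On these lines it reports a 97-second pause between the 08:13:23 wal-sync message and the first 08:15:00 message, which is the "2 minutes here" stretch marked in the log.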

While debugging one of the deployments, I ran litefs mount -tracing -fuse.debug to get more info. Here are the relevant logs (full logs of that session are at https://pastebin.com/RR2nw7Yr):

2025/12/14 04:55:37.093550 [ApplyLTX(sqlite.db)]: txid=000000000001ba80-000000000001ba80 chksum=aa0e807bec87e251-c05a18b70e4846a8 commit=1582701 pageSize=4096 timestamp=2025-12-14T03:44:17Z mode=(WAL_MODE→WAL_MODE) path=000000000001ba80-000000000001ba80.ltx
2025-12-13 22:55:37.093	
2025/12/14 04:55:37.093445 [UpdateSHMDone(sqlite.db)]
2025-12-13 22:55:37.093	
2025/12/14 04:55:37.093122 [UpdateSHM(sqlite.db)]
2025-12-13 22:55:37.090	
2025/12/14 04:55:37.090544 [TruncateDatabase(sqlite.db)]: pageN=1582701 prevPageN=1582701 pageSize=4096
2025-12-13 22:55:37.090	
2025/12/14 04:55:37.090107 [WriteDatabasePage(sqlite.db)]: pgno=1582492 chksum=90fa73c90670675e prev=90fa73c90670675e
2025-12-13 22:55:37.090	
2025/12/14 04:55:37.089989 [WriteDatabasePage(sqlite.db)]: pgno=1449939 chksum=ce7cdf1ee1e5c640 prev=ce7cdf1ee1e5c640
2025-12-13 22:55:37.089	
2025/12/14 04:55:37.089562 [AcquireWriteLock.DONE(sqlite.db)]:
2025-12-13 22:55:37.089	
2025/12/14 04:55:37.089229 [AcquireWriteLock(sqlite.db)]:
2025-12-13 22:55:34.151
======================================== 2 minutes here ==========================================
2025-12-13 22:53:28.153	
2025/12/14 04:53:28.153495 [Recover(sqlite.db)]:
2025-12-13 22:53:28.153	
2025/12/14 04:53:28.153420 [CheckpointDone(sqlite.db)] <nil>
2025-12-13 22:53:28.153	
2025/12/14 04:53:28.153378 [UpdateSHMDone(sqlite.db)]
2025-12-13 22:53:28.153	
2025/12/14 04:53:28.153112 [UpdateSHM(sqlite.db)]
2025-12-13 22:53:28.153	
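
The numbers in this trace can be checked with a short Python sketch: it measures the pause between the Recover event and the next AcquireWriteLock event, and multiplies pageN by pageSize from the TruncateDatabase line to get the database file size. The regular expression and the `parse` helper are my own guess at the trace line shape based on the lines above, not a documented LiteFS format, and treating Recover and AcquireWriteLock as the edges of the stall is just my reading of the trace.

```python
import re
from datetime import datetime

# Three lines copied from the trace above (newest first, as in the log viewer).
TRACE = """\
2025/12/14 04:55:37.090544 [TruncateDatabase(sqlite.db)]: pageN=1582701 prevPageN=1582701 pageSize=4096
2025/12/14 04:55:37.089229 [AcquireWriteLock(sqlite.db)]:
2025/12/14 04:53:28.153495 [Recover(sqlite.db)]:
"""

# Assumed line shape: "<date> <time> [<Event>(<db>)]: key=value ..."
LINE_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) \[([\w.]+)\((.+?)\)\]:?\s*(.*)$"
)


def parse(trace):
    """Parse trace lines into dicts with a timestamp, event name, db, and fields."""
    events = []
    for line in trace.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # e.g. the log viewer's own timestamp lines
        ts = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S.%f")
        fields = dict(kv.split("=", 1) for kv in m.group(4).split() if "=" in kv)
        events.append({"ts": ts, "name": m.group(2), "db": m.group(3), "fields": fields})
    return events


events = {e["name"]: e for e in parse(TRACE)}

# Time between the end of recovery and the next traced event.
stall = (events["AcquireWriteLock"]["ts"] - events["Recover"]["ts"]).total_seconds()

# Database file size implied by the page count at that transaction.
trunc = events["TruncateDatabase"]["fields"]
db_bytes = int(trunc["pageN"]) * int(trunc["pageSize"])

print(f"stall after Recover: {stall:.3f}s")
print(f"database size: {db_bytes} bytes ({db_bytes / 1e9:.2f} GB)")
```

This gives a stall of about 128.9 seconds and a database file of 6,482,743,296 bytes (about 6.48 GB) for the pages alone.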

My setup:

  • I have a Dockerfile with an entrypoint:
// entrypoint
#!/usr/bin/env sh

if [ "$FLY_PROCESS_GROUP" = "app" ]; then
    export LITEFS_EXEC_CMD="npm start"
    exec litefs mount
elif [ "$FLY_PROCESS_GROUP" = "worker" ]; then
    exec npm run worker
fi
// litefs.yml
fuse:
  # Required. This is the mount directory that applications will
  # use to access their SQLite databases.
  dir: '${LITEFS_DIR}'

data:
  # Path to internal data storage.
  dir: '/data/litefs'

proxy:
  # matches the internal_port in fly.toml
  addr: ':${INTERNAL_PORT}'
  target: 'localhost:${PORT}'
  db: '${DATABASE_FILENAME}'

# The lease section specifies how the cluster will be managed. We're using the
# "consul" lease type so that our application can dynamically change the primary.
#
# These environment variables will be available in your Fly.io application.
lease:
  type: 'consul'
  candidate: ${FLY_PROCESS_GROUP == "app"}
  promote: true
  advertise-url: 'http://${HOSTNAME}.vm.${FLY_APP_NAME}.internal:20202'

  consul:
    url: '${FLY_CONSUL_URL}'
    key: '<key-here>/${FLY_APP_NAME}'

exec:
  - cmd: 'npx --yes prisma migrate deploy'
    if-candidate: true

  # Set the journal mode for the database to WAL. This reduces concurrency deadlock issues
  - cmd: 'sqlite3 $DATABASE_PATH "PRAGMA journal_mode = WAL;"'
    if-candidate: true

  # Set the journal mode for the cache to WAL. This reduces concurrency deadlock issues
  - cmd: 'sqlite3 $CACHE_DATABASE_PATH "PRAGMA journal_mode = WAL;"'
    if-candidate: true

  - cmd: 'npx prisma generate'

  # Execute the command appropriate to the process group, based on the LITEFS_EXEC_CMD environment variable
  - cmd: '${LITEFS_EXEC_CMD}'
