How to Upgrade a Legacy Heroku Database

     

“Grayed out upgrade buttons on Heroku”

The number of concurrent users on our service has been growing by leaps and bounds. This is great news for our business but also means our primary database on Heroku keeps running out of cache. In our experience, Heroku’s Postgres databases see a significant performance drop when cache utilization exceeds 85%.

Ten days ago our growth caught up with us and our app began to slow. Engineering isolated the cause of the slowdown to the database, and our cache utilization. We then upgraded to the next plan (Fugu) following the fast database changeovers procedure. The process required less than 90 seconds of downtime. The team rejoiced as we believed we’d found a relatively easy way to keep scaling Postgres, without application changes, for the near future.

Five days ago I checked our cache utilization. We were already at 2.3GB/3.75GB-. The data size was growing at more than 100MB per day. This meant that in less than 8 days we would already have performance issues again. I logged in to Heroku to initiate another fast-database-changeover, but to my surprise all further upgrades were grayed out (see image below).

After contacting support, I learned that SendHub was one of the very first clients to start using Heroku’s managed database services back in January 2012. It turns out that all early adopter Postgres instances are running on 32-bit Postgres, and we were already on the maximum plan available (Fugu). There is a binary incompatibility between the two architectures which renders the WALs (Write-Ahead-Logs) not interoperable, which means that a 32-bit database cannot have 64-bit followers. The end result was that a fast-database-changeover procedure was impossible. Heroku advised doing a “dump and restore” operation but this would mean 45 minutes or more of downtime. Even in the middle of the night SendHub has significant activity, making that option unacceptable.

The SendHub stack is homogeneous and straightforward, consisting of Python (Django/Celery), Postgres and RabbitMQ. All the database interactions use single statement autocommit (no multi-statement transactions). Since multiple concurrent sessions of transactions do not exist on our system, I thought that maybe the following replay scheme could work:

  1. Take the SendHub site offline and make a snapshot of the database
  2. Deploy a branch that will send all database queries to a logging server
  3. Turn the site back on and begin importing the snapshot to the new database
  4. When the import is finished, take the site offline
  5. Replay the collected queries
  6. Promote the new database
  7. Bring the site back online

When I first described the above approach to one of my co-workers, his first response was, “Jay, you’re crazy”; however, given that the only alternative was a massive amount of downtime, I decided to write a remote logger and test out my hypothesis.

I wrote a simple query log capturing and replay system in PHP. Then I modified the SendHub Django application to forward all of its data modification SQL statements to the capturing system. I had everyone at the office help test on staging and the results were promisingthe replay system cleanly reassembled the data.

The end result was that Ryan (our engineering lead) and I teamed up to execute the changeover procedure on production with a total of only 6 minutes downtime instead of 45 minutes. There was one minor hiccup thoughwe learned that when “debug” mode is not turned on Django query logging is disabled, so that set us back one or two minutes. All in all, it was an incredible success. 

Below you can find our process and code for performing the upgrade.

At SendHub, we streamline our production site maintenance by preparing a document with detailed instructions including all the commands which will be run. Here is the planning document we used for this procedure:

Source code & guide to upgrading a legacy Heroku database

1. Create the new “Ika” database at https://postgres.heroku.com/databases, wait until the status gets to “available”

Prod: “HEROKU_POSTGRESQL_GOLD”

Host ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com

Database <db-name>

User <username>

Port <port>

Password <password>

2. Merge latest master into branch jay/db-logging

git fetch

git checkout jay/db-logging

git pull origin master

git push

3. Turn on maintenance mode and scale down to 0

heroku maintenance:on -asendhug

heroku ps:scale worker=0 web=0 -asendhug

4. Create database snapshot on heroku.com, then deploy branch jay/db-logging, turn on debug mode (otherwise statements will not be logged), and wait until the snapshot completes

heroku config:add debug=true -asendhug

./deploy.sh prod jay/db-logging

5. Scale workers back up and turn off maintenance mode (db statements willl now be captured on dbupgrade.sendhub.com)

heroku ps:scale worker=50 web=50 -asendhug

heroku maintenance:off -asendhug

6. Download the snapshot and use pg_restore to load it into the new database

wget $(heroku pgbackups:url -asendhug)

PGPASSWORD=<password> pg_restore –verbose –clean –no-acl –no-owner -h <hostname> -U <username> -d <database> -p <port> <LOCAL_FILE_PATH>

7. Turn on maintenance mode and scale workers down to 0, wait until the app finishes flushing it’s async queue

heroku maintenance:on -asendhug

heroku ps:scale worker=0 web=0 -asendhug

8. Run the replay (either by curl or from the web interface at http://dbupgrade.sendhub.com/replay.php) and wait for the commits to completely flush

curl -v -vv ‘http://dbupgrade.sendhub.com/replay.php?sourceHost=sendhub.com&since=0&host=<hostname>&database=<database>&port=<port>&user=<username>&password=<password>’

9. Do a sanity check on the database info on Heroku

heroku pg:info -asendhug

10. Wait for commits to finish, then promote the new database

Prod: heroku pg:promote HEROKU_POSTGRESQL_GOLD -asendhug

11. Scale workers back up and turn off maintenance mode

heroku ps:scale worker=50 web=50 -asendhug

heroku maintenance:off -asendhug

12. Disable debug mode and deploy master back to the environment

heroku config:remove debug -asendhug

./deploy.sh prod master

… and now, back to Time Crisis!

If you have any questions or comments, send us a text or email!

Jay & the SendHub team

About The Author