We didn’t plan to migrate to Heroku for another month as there were plenty of other things on our plate.
On Monday, January 9th, 2012, that all changed…
We had been down for almost an hour before we were alerted to the situation. It quickly became clear that our current host was not going to be able to support our growing needs. We’d long grown tired of maintaining our Fabric scripts and needed a host that would restart our servers instead of just killing them when we ran into problems. Moving to Heroku had become priority number one. Over the next 5 days, we pulled 18-hour days to make the move happen. It was no small effort, but in terms of long-term developer productivity, it’s the best thing we’ve done yet.
What follows is an overview of what we did and how we did it.
But first, a little about our stack.
Before we migrated, we were running a Python/Django stack on top of a MySQL DB in an all-in-one configuration. It’s about as simple as a stack can get. However, by the time we were done, we had separated our DB tier from our web tier and had set ourselves up with multiple web front-ends capable of easily switching to a web-worker pattern.
Strategy: Minimize Moving Parts
With any migration to a totally new set of infrastructure, there are going to be a lot of moving parts. Before we started making the switch, the very first thing we did was create a new branch in git from the HEAD of our master branch. We also agreed that there would be no new feature changes during the migration; only changes to the infrastructure would be allowed. This got progressively more difficult to enforce along the way, but it made the whole process a lot easier.
Environment Setup on Heroku
The Heroku Dev Center has a lot of good documentation on setting up an environment to fit your needs. We started with “Getting Started with Django on Heroku/Cedar”. However, when we tried to deploy from our “migrate” branch, the site would not deploy. After reading “Deploying with Git”, we realized that we could do this by calling git push [heroku_remote_repo] migrate:master. Heroku ignores everything except the master branch on its apps.
Another point of interest is that most teams will have more than one environment. At Sendhub.com, we have the production environment, one staging environment that is an exact replica of production, and a “mini” staging environment for each engineer on the team. Getting this all set up was easy once we read “Managing Multiple Environments for an App”.
Environment Configuration on Heroku: Understand the “Heroku way”
Heroku does a lot of things differently from your regular, everyday hosting provider…and that’s a good thing. However, it’s important to understand those differences and their implications before you jump in. In particular, it’s important to understand how Heroku does environment configuration. A lot of people are used to having a settings.py file that provides the default configuration for an environment, overridden by a local_settings.py file containing everything specific to that environment.
However, on Heroku, files not in the main “slug” (a compressed artifact representing your application) are not guaranteed to exist between server restarts and new deployments. Therefore, short of checking your important keys and configs into git (which is NOT a good idea), pushing up a local_settings.py is not an option. Instead, it’s recommended that you store your configuration in environment variables. See the article on Configuration and Config Vars. What we really geek out about is the following line of text in the article:
“Environment variables set with heroku config are persistent. They will remain in place across deploys and app restarts, so unless you need to change values you only need to set them once.”
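As a minimal sketch of what this can look like in a Django settings.py, the snippet below reads configuration from environment variables and falls back to defaults for local development. The variable names (DJANGO_DEBUG, SECRET_KEY) and the env_bool helper are hypothetical and chosen for illustration; they are not taken from our actual settings.

```python
import os


def env_bool(name, default=False, environ=os.environ):
    """Interpret an environment variable as a boolean flag (hypothetical helper)."""
    value = environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "yes", "on")


# Hypothetical variable names; on Heroku they would be set once with
# `heroku config:add NAME=value` and then persist across deploys.
DEBUG = env_bool("DJANGO_DEBUG", default=False)
SECRET_KEY = os.environ.get("SECRET_KEY", "insecure-local-development-key")
```

The fallback values only exist so that the same settings.py still loads on a developer machine where the variables are not set.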
Switching from MySQL to Postgres
While still on our old hosting provider, we were using MySQL for our database, and it seemed to work for us. Yet, at the same time, we found that Heroku has some pretty amazing support for PostgreSQL. After reading through their docs and supported features, we really liked the idea that once we were using Postgres, we would be able to migrate seamlessly to progressively larger databases. Not to mention the IO improvements we would get by running the whole DB in memory. Reading this page was enough to convince us that Heroku Postgres is a really good idea.
Setting up a Ronin dedicated DB on Heroku was simple…
$ heroku addons:add heroku-postgresql:ronin --app <APP>
$ heroku pg:wait  # waits for the DB to come up
$ heroku pg:info  # get the <COLOR> of the database
$ heroku pg:promote HEROKU_POSTGRESQL_<COLOR>
Setting up Postgres on your local system is just as easy on a Mac…
$ brew install postgresql
NOTE: Heroku automatically appends the db configuration to your settings.py file once you deploy your app. Running the promote command above lets you hook up the new db to your Heroku app once you are ready.
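To make the note above more concrete, here is a rough sketch of how a database URL of the kind Heroku exposes could be turned into a Django DATABASES entry. This is an illustration written for Python 3, not the code Heroku injects: the DATABASE_URL variable name, the URL layout, and the parse_database_url helper are assumptions, and the local fallback simply reuses the mydb_prod names from the local setup described later in this post.

```python
import os
from urllib.parse import urlparse


def parse_database_url(url):
    """Convert postgres://user:password@host:port/name into a Django DB dict."""
    parts = urlparse(url)
    return {
        # Engine name Django used for PostgreSQL at the time of writing.
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": parts.path.lstrip("/"),
        "USER": parts.username or "",
        "PASSWORD": parts.password or "",
        "HOST": parts.hostname or "",
        "PORT": str(parts.port or ""),
    }


# Assumption: the promoted database is exposed as DATABASE_URL.
# When it is not set, fall back to a local development database.
DATABASES = {
    "default": parse_database_url(
        os.environ.get(
            "DATABASE_URL",
            "postgres://mydb_prod:mydb_prod@localhost:5432/mydb_prod",
        )
    )
}
```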
…ok, but how do we migrate the data from MySQL?
This turned out to be the hardest part of the whole process. Here is a step-by-step account of how we did it for the real migration.
1. Put our site into maintenance mode, so that no one was allowed to write any more data.
2. Next, we had to get a local copy of our production database. We did this using Sequel Pro. Here are the steps for that:
a. Open Sequel Pro and connect to your prod db (the steps for this will vary based on who hosts your db). We used an SSH connection.
b. Once connected, use File → Export to export the db to a local sql file.
c. Create a new local database in Sequel Pro and import the file you just exported. Some errors occurred on import, which we ignored as they were not going to affect us once we moved to Postgres.
3. To transform the data, we used a tool called mysql2psql, which was recommended by this Heroku Dev Center doc. The first time you run the tool, it creates a yml file that tells it what database to connect to and where to place the converted file. Once you’ve configured the file, it’s as simple as calling
$ mysql2psql prodDump.yml
a. WARNING: It’s important to note that this tool only gets you 98% of the way there. Before running the real migration, we tested it to see how well it worked. One thing it will not do for you is properly set up the sequence objects that auto-increment the primary keys. The key thing that was missing for us was that ownership of these sequence objects was not assigned to the appropriate column. Therefore, when we tested writing new records to the DB, we started getting a bunch of errors about objects that could not be saved without primary keys.
b. SOLUTION: We found the solution by initializing our Django app against an empty PostgreSQL database. When we did this, we noticed we weren’t getting any errors. When we did an SQL export of the working db, we found that our migrated data was missing sql statements that resembled the following…
ALTER SEQUENCE auth_user_id_seq OWNED BY auth_user.id;
We then collected a list of all of the statements that resembled this and added these statements manually to our converted sql dump during the real migration.
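We collected these statements by hand, but a list like this could also be generated. The sketch below is an illustration rather than the script we used: it assumes every sequence follows the <table>_<column>_seq pattern seen in the statement above (PostgreSQL may shorten very long names, so check the output against a working database), and the sequence_ownership_sql helper and the second example table are hypothetical.

```python
def sequence_ownership_sql(columns):
    """Build ALTER SEQUENCE ... OWNED BY statements for (table, column) pairs."""
    statements = []
    for table, column in columns:
        # Assumed naming pattern: <table>_<column>_seq
        sequence = "{0}_{1}_seq".format(table, column)
        statements.append(
            "ALTER SEQUENCE {0} OWNED BY {1}.{2};".format(sequence, table, column)
        )
    return statements


# Only auth_user.id appears in the post; auth_group is an illustrative extra.
serial_columns = [("auth_user", "id"), ("auth_group", "id")]

for statement in sequence_ownership_sql(serial_columns):
    print(statement)
```

The printed statements can then be appended to the end of the converted sql dump, after the tables and sequences have been created.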
4. Loading the database into our new Heroku Ronin DB was merely a matter of loading the Postgres-formatted sql into a local db, creating a compressed .dump file from it, and then uploading that securely to an Amazon S3 bucket where Heroku could pull it from.
First, create the local database:
$ psql -U <yoursysusername>
=# CREATE USER mydb_prod WITH PASSWORD 'mydb_prod';
=# CREATE DATABASE mydb_prod OWNER mydb_prod;
=# GRANT ALL ON DATABASE mydb_prod TO mydb_prod;
=# \q
Then load the converted data into the db:
$ psql -U mydb_prod -d mydb_prod < ./prodDump-converted.sql
Next, create a dump of the data…
$ PGPASSWORD=mydb_prod pg_dump -Fc --no-acl --no-owner -h localhost -U mydb_prod mydb_prod > mydb_prod.dump
5. We learned from the support engineers at Heroku that (at least at the time of writing) the best way to import your data into Postgres is NOT to follow the instructions here, which recommend using the pg_restore utility, but rather to use the free Heroku add-on called pgbackups.
$ heroku addons:add pgbackups --app <appname>
Upload the dump somewhere that Heroku can find it, like Amazon S3 (we recommend using step five of this article). Next, reset your Heroku database:
$ heroku pg:reset DATABASE --app <appname>
Then upload your data:
$ heroku pgbackups:restore DATABASE 'http://example.com/path/to/mydb_prod.dump' --app <appname>
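Because the order of these commands matters (a reset followed by a failed restore leaves you with an empty database), it can help to script them. The following is a rough sketch, not the procedure we actually ran: the restore_commands and run_restore helpers are hypothetical, the app name is a placeholder, and the commands are copied from this post as they worked at the time of writing. The destructive steps may also ask for interactive confirmation.

```python
import subprocess


def restore_commands(app, dump_url):
    """Return the Heroku CLI invocations from the steps above, in order."""
    return [
        ["heroku", "addons:add", "pgbackups", "--app", app],
        ["heroku", "pg:reset", "DATABASE", "--app", app],
        ["heroku", "pgbackups:restore", "DATABASE", dump_url, "--app", app],
    ]


def run_restore(app, dump_url, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in restore_commands(app, dump_url):
        print("$ " + " ".join(command))
        if not dry_run:
            # check_call raises CalledProcessError when a step fails,
            # which stops the loop before the next command runs.
            subprocess.check_call(command)


# Placeholder app name and URL; with dry_run=True nothing is executed.
run_restore("my-app", "http://example.com/path/to/mydb_prod.dump")
```

Running it with the default dry_run=True first gives you a chance to read the exact commands before anything touches the database.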
File Storage switch to Amazon S3
One thing that we almost overlooked was the storage of user-uploaded files. On our old hosting service, we were able to host all of the user-uploaded files on the same filesystem. However, since Heroku’s filesystem is “ephemeral”, we would not be able to use the same approach. This meant we had to switch to Amazon S3 for all of our file storage. Here’s how we did that…
- Create an Amazon account and sign up for Amazon S3.
- Create a bucket for your production user files.
- pip install django-storages
- Add “boto==2.1.1” and “django-storages==1.1.4” to your requirements.txt file.
- To configure your Django site to use Amazon S3 for the static and default file storage, add the following lines to your settings.py file:
import os  # skip if settings.py already imports os

STATICFILES_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
# Credentials and bucket name come from the environment, not from git.
AWS_ACCESS_KEY_ID = os.getenv('AWS_ACCESS_KEY_ID')
AWS_SECRET_ACCESS_KEY = os.getenv('AWS_SECRET_ACCESS_KEY')
AWS_STORAGE_BUCKET_NAME = os.getenv('AWS_STORAGE_BUCKET_NAME')
- Set the appropriate keys in your Heroku app’s configuration using “heroku config:add key=value”. To test this locally, you will also have to set these values in your .bash_profile.
- Set up s3cmd. You will use this to upload the files to Amazon S3.
- While the site is in maintenance mode, log into the production server and create an archive of all of the user-uploaded files.
- Pull the archive down locally. We used Cyberduck for this.
- Extract the archive on your local machine and cd to where those files are.
- $ s3cmd put --recursive user_files_dir s3://my_prod_bucket_name/files/
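Before leaving maintenance mode, it can be worth checking that every file made it across. The sketch below was not part of our original process; it builds a manifest of local files and their sizes that could be compared by hand against a listing of the bucket (for example from s3cmd ls --recursive). The local_manifest helper is hypothetical.

```python
import os


def local_manifest(root):
    """Map each file path (relative to root, '/'-separated) to its size in bytes."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            relative = os.path.relpath(path, root).replace(os.sep, "/")
            manifest[relative] = os.path.getsize(path)
    return manifest


if __name__ == "__main__":
    # user_files_dir is the directory name used in the s3cmd command above.
    files = local_manifest("user_files_dir")
    print("{0} files, {1} bytes".format(len(files), sum(files.values())))
```

A mismatch in the file count or total size is a cheap early signal that the upload was interrupted.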
Why CloudFlare made this easy
Switching our domains over to Heroku might have caused some serious downtime, but thankfully we were already using CloudFlare to manage our DNS (and speed up the site). Before starting deployment, we added our domains to Heroku so our apps were ready. All we needed to do was edit the IP address of our A records and add the necessary CNAME records for our subdomains.
As all but one of our subdomains were used internally, we set those up to point at the relevant Heroku apps ahead of time. When it came time to switch over the production domain, we only had to edit 3 A records and the WWW subdomain. If you’ve ever used a domain registrar’s DNS tools, you’ll know they say it can take up to 2 days for changes to propagate; with CloudFlare it was almost instant, as our domain’s nameservers remained with CloudFlare and they updated the modified A records and CNAMEs straight away.
Moving to Heroku has been easily the largest productivity increase we’ve had. We went from pushing once every few days to pushing a few times a day. We’ve seen a massive reduction in our downtime, from going down every couple of days to just 30 minutes of total downtime in the last three weeks. It’s also been a big win for our users as our app’s response times are now consistently under 100ms and Heroku makes that easy to monitor via New Relic.
Overall the switch was not completely smooth and there were certainly some moments of stress, especially with the database migration, but the results speak for themselves. If you’re looking to scale your site, we highly recommend moving to Heroku.