Deploying FogBugz 8 and Kiln 2
Of the many positive comments about the new FogBugz and Kiln releases, that was my favorite. Not so much because we did anything better than Twitter, as I haven’t heard of any problems with the rollout of their new version, but mostly because it meant…we did it! As the new build and release manager for Fog Creek, it was satisfying to get to this point—to have shipped two new major releases simultaneously with some interesting changes to our deployment process to handle new releases mixed in along the way.
I’ve mentioned before that your deployment scripts may have many different deployment targets: developer machines, the QA team, internal users, beta testers, and finally a full release. I’d like to document some of our processes here at Fog Creek for deploying FogBugz and Kiln. There are some things we do well…really well. And there are some things we’re still working to improve. Though my focus will be on deploying (rather than building) our products, I will mention our build processes when they affect deployment, and discuss some of the improvements we could make there too.
It’s worth mentioning now that all of the credit for what works in our build and deployment process goes to the great Fog Creek developers who set up this process over the last few years. And one of the reasons our releases went so smoothly this time around is that the current FogBugz and Kiln teams did a great job squashing all of the bugs that our QA team did a great job finding. Besides storing away knowledge in hopes of making some improvements going forward, I didn’t do much more than run a few script files. If that sounds exciting to you, read on.
Dev Deployment
Before anyone else can deploy a new version of FogBugz or Kiln, the FogBugz and Kiln developers do. Work began in earnest for FogBugz 8 and Kiln 2 when our summer interns arrived in late May and early June. Each of them needed to get FogBugz built and deployed on their machine and the Kiln interns needed to get Kiln deployed.
The set of steps a new intern goes through to get their FogBugz and/or Kiln deployments up and running is, sadly, not short. And it’s all manual. It involves getting the source code in the correct directories, installing a large number of tools (all at the correct version for development, of course), setting up virtual directories, preparing the database, building everything, etc. This whole process typically took one or two days, and not all setups were the same—something that occasionally bit our interns. Typical problems included getting the wrong version of some build tool, setting up the databases using the wrong named db instance, naming directories incorrectly, etc. As you might expect, once everything is set up, you don’t want to change anything, because it can be painful.
On the other hand, once everything is set up, doing daily development is pretty darn easy. Building FogBugz or Kiln in place will update your locally hosted instance of that application. There really aren’t any ongoing deployment needs, except when new features necessitate them. If you want two different development environments to make it easier to switch between bug fixing and feature development (we use one repository for the currently shipping version—bugfixes, and another for the “next” version—new features), you have to go through the same steps to create a new deployment as you did for your initial set up. It is no fun repaving your machine, but that’s not something the interns needed to worry about too much, and only occasionally causes problems for us full-time developers.
It’s worth noting here that development deployments most closely resemble a “licensed” deployment (i.e., the type of installation a customer uses when they install FogBugz or Kiln on their own servers). Setting up a development machine to behave like our “hosted” environment (i.e., the type of installation used for our FogBugz On Demand service where Fog Creek hosts everything for you) is much more complicated, both to set up in the first place and to update with new code. And honestly, doing the same for Kiln is almost unheard of.
In summary, initial deployment to development machines can be somewhat painful, but ongoing deployment of changes for testing purposes is easy and relatively painless. We can work to make initial deployment more automated, as long as we preserve the ease of ongoing deployment. There are additional development deployment features we could also add, such as deploying to multiple virtual directories from a single copy of the source, scripted deployment to virtual machines, and scripted deployment to a test hosted (as opposed to licensed) environment.
Ourdot
After two to three weeks of intense development, code reviews, and hallway usability discussions, it was time for the interns to show off what they had done to the rest of the company. First, they did demos of their features after lunch one day. We got to see the first iterations of cool new features like Kiln search, the new FogBugz wiki editor, all the wiki navigation changes, repository aliases, “big files” support in Kiln, as well as the much-improved FogBugz permissions model. The more important aspect of their demo was deploying their code to the FogBugz and Kiln instance we use internally, so that everyone else in the company could start using it on a daily basis…and start reporting bugs on a daily basis. This install of FogBugz and Kiln is affectionately known as “ourdot”.
Deployments to ourdot are unique. Unlike deployments to customers, our QA team, or dev machines, a separate set of scripts handles the work of building and deploying both products to ourdot, though they have much in common with our other build/deploy scripts. The ourdot install is a kind of cross between our licensed and hosted environments, and it’s often used as a testing ground for build and deployment changes as well as new features.
We continued deployments to ourdot every two weeks during major feature development and then more frequently as the interns transitioned into bug fixing mode in preparation for the beta release. At the start of the summer, deploying both FogBugz and Kiln to ourdot was a very manual process. Scripts did much of the work of building and copying files, but modifying the configuration on the servers in order to handle a new build had to be done manually, and involved 5-10 manual steps for both products. One of my first projects after taking over build and release responsibilities was to get both of these scripts running completely autonomously. All that I have to do now is just kick off the build and deploy process, and then deal with failures if they occur, which is quite rare.
Generations
In preparation for our first beta release, we needed to make sure that everything was working correctly in our hosted environment. The hosted servers use an extension of blue-green deployment to manage the switchover of accounts from one version to another and also to support having different accounts running different versions at the same time. Each version deployed to our hosted servers becomes a “generation” and each account is running on a specific generation. Individual accounts can be moved from one generation to another (later) generation, and scripts make it easy to move some subset of accounts or to change the default generation. Changing the default generation upgrades accounts on the old default generation and ensures that new accounts are created on that generation.
This deployment process works for two reasons: 1) deploying a new generation to the servers does not affect anyone on existing generations and 2) moving accounts from one generation to another does not affect other accounts. So we don’t have to deploy new generations at horrible hours of the night, or warn all accounts about downtime when only some are actually being upgraded.
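To make the generation idea concrete, here is a minimal sketch of the bookkeeping described above. It is only a toy model, not our actual scripts: the class name `HostedCluster` and all of its methods are hypothetical, and real generations involve deployed code and databases rather than a dictionary.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HostedCluster:
    """Toy model: every account is pinned to exactly one deployed generation."""

    generations: set[int] = field(default_factory=set)
    account_generation: dict[str, int] = field(default_factory=dict)
    default_generation: int | None = None

    def deploy_generation(self, gen: int) -> None:
        # Deploying installs a generation side by side; no account is touched.
        self.generations.add(gen)

    def create_account(self, name: str) -> None:
        # New accounts start on the current default generation.
        if self.default_generation is None:
            raise ValueError("no default generation set")
        self.account_generation[name] = self.default_generation

    def move_accounts(self, names: list[str], gen: int) -> None:
        # Moves only the named accounts, and never backwards.
        if gen not in self.generations:
            raise ValueError(f"generation {gen} is not deployed")
        for name in names:
            if self.account_generation[name] > gen:
                raise ValueError(f"{name} is already past generation {gen}")
            self.account_generation[name] = gen

    def set_default_generation(self, gen: int) -> None:
        # Changing the default upgrades accounts still on the old default
        # and makes new signups land on the new generation.
        if gen not in self.generations:
            raise ValueError(f"generation {gen} is not deployed")
        old = self.default_generation
        if old is not None:
            stragglers = [a for a, g in self.account_generation.items() if g == old]
            self.move_accounts(stragglers, gen)
        self.default_generation = gen
```

The two properties that matter are visible in the code: `deploy_generation` never changes `account_generation`, and `move_accounts` only changes the entries for the accounts it was given.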
While this process works quite well from our users’ standpoint, there are important changes we can make to improve our efficiency. The scripts that build both FogBugz and Kiln for deployment to our hosted servers are not the same scripts that do builds and deployments for ourdot, licensed, or dev machine installs. However, they do share a lot of common functionality that could be consolidated (and will be, in time). Another improvement we’re looking at is combining our generation numbers with our version numbers, so we don’t have to track two different numbers when filing bugs and discussing changes to the software. Moving accounts to a new generation can be surprisingly quick, but deploying a new generation still takes too long, and is something we want to improve. Also, FogBugz generations and Kiln generations are unrelated, despite the tight integration between the two products.
And now, back to the story of deploying FogBugz and Kiln! We went through a few deployments of new generations to get all the bugs worked out of the hosted deployment scripts. These “bugs” were typically things that had changed due to new features being added, but that had not yet been incorporated into our deployment scripts. Once we had a generation that built and deployed cleanly, we moved some testing accounts over so that our QA team could hammer it, and then proceeded to move alpha accounts over. Once the alpha accounts had been using the new code for a while we were ready to give the new version to beta customers.
Beta
As Joel described in Top Twelve Tips for Running a Beta Test, one key to a great product launch is a successful beta program. And the first thing you need for a good beta program is willing guinea pigs…um…I mean, users. For this rollout, we used an old web app that had been coded up for collecting beta applicant information for previous versions of FogBugz and Kiln. While not pretty, it got the job done, and when it didn’t we could always go straight to the beta applicant database and enter things directly if needed.
Another important aspect of running a successful beta is getting the word out, so we gradually let our customers know with announcements on the FogBugz and Kiln support sites, email, announcements on the blog, and via Twitter. We spaced these out with the goal of having applicants come in at about the same rate that we were approving them, in hopes that no one would have to wait too long before being accepted into the beta. The beta application allowed anyone to sign up for the hosted beta, and all current licensed customers to sign up for the licensed beta.
Once we had a stable generation for our hosted beta users to try out, we moved just 3 customer accounts to that generation and started watching for bug reports, support questions, etc. This may seem like an awfully small number, but we targeted customers who were already active on the support sites in hopes of getting good feedback. As that feedback started to come in and we were able to iterate on the issues raised, we added more and more beta customers, slowly at first and then more quickly, roughly doubling the number each week until we had around 150 hosted beta accounts active, including some of our biggest customers.
Licensed Beta
Our licensed beta deployment process is probably the weakest link of our entire launch process. It did not break or cause major problems at any point, but it is almost completely manual. Basically, customers who signed up for the licensed beta are added to a list of potential beta users. We can then email any of these users with a unique download link that they can use to access the installer. The process of deploying a new installer involves manually copying each installer to the right location on our web servers, modifying the beta applicant database with the filename and size of each installer, manually verifying that the download links work for each installer, and then emailing all accepted beta customers a notification that the beta installer has been updated, and adding any new beta applicants to the beta.
Overall, the first step in improving this process would be just to automate it. Since we were only accepting existing licensed users, we could also improve it by integrating it with our existing shop website (the website that tracks orders, maintenance contracts, etc.), where these customers were already able to download copies of the installer they had purchased.
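As a rough idea of what automating the first few manual steps might look like, here is a sketch that copies an installer into place, checks the copy, and records its filename and size. Everything named here is an assumption made for illustration: the `installers` table, the directory layout, and the file name are hypothetical, SQLite stands in for the real beta applicant database, and the size comparison is only a stand-in for actually testing each download link. Emailing beta customers is left out.

```python
from __future__ import annotations

import shutil
import sqlite3
import tempfile
from pathlib import Path


def publish_installer(installer: Path, download_dir: Path, db: sqlite3.Connection) -> str:
    """Copy one installer into place and record its filename and size."""
    download_dir.mkdir(parents=True, exist_ok=True)
    target = download_dir / installer.name
    shutil.copy2(installer, target)

    size = target.stat().st_size
    # Stand-in for verifying the download link: confirm the copy is complete.
    if size != installer.stat().st_size:
        raise RuntimeError(f"size mismatch after copying {installer.name}")

    # Hypothetical table; the real database schema is not described here.
    db.execute(
        "CREATE TABLE IF NOT EXISTS installers (filename TEXT PRIMARY KEY, size INTEGER)"
    )
    db.execute(
        "INSERT OR REPLACE INTO installers (filename, size) VALUES (?, ?)",
        (installer.name, size),
    )
    db.commit()
    return installer.name


# Usage with throwaway files and an in-memory database.
with tempfile.TemporaryDirectory() as tmp:
    build_dir = Path(tmp) / "build"
    build_dir.mkdir()
    installer = build_dir / "example-installer.bin"  # hypothetical file name
    installer.write_bytes(b"not a real installer")

    db = sqlite3.connect(":memory:")
    publish_installer(installer, Path(tmp) / "downloads", db)
    recorded = db.execute("SELECT filename, size FROM installers").fetchall()
    db.close()
```

Running the same function for each installer in a build would replace the copy, record, and verify steps with a single command, which is most of the tedium.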
Despite being so manual, and offering such obvious areas for improvement, we really had no major problems releasing licensed beta installers to our customers. Yes, it was kind of a pain, but it didn’t happen too often, and could have been a lot worse. Just as with the hosted beta, we initially informed just a few licensed beta applicants of the new download. Then as we received bug reports and fixed issues, we updated the download with new builds and informed more and more beta applicants until we had roughly 40 accounts using the new release, putting our total beta accounts (i.e., both hosted and licensed) at almost 200. At this point, the developers and QA staff on both the FogBugz and Kiln teams were feeling ready to start shipping to customers who had not applied for the beta. The final release was about to begin!
Leaking
Just as with the beta release, it’s helpful to roll out the final release to our hosted customers in stages. We call this “leaking the release,” which sounds kind of gross, but I like to think of it like sugar water leaking out of a bird feeder rather than…well, you can come up with an appropriately negative metaphor yourself. The birds love it. As before, this gradual process helps us to catch issues before they have a chance of affecting everyone. Now that we’re dealing with 2-3 orders of magnitude more customers, it’s much more likely that we’ll run into performance problems with the new code. One serendipitous benefit of this type of release was that buzz about the new versions progressively increased over the course of the month leading up to the full rollout.
Initially we bumped 5% of our hosted customers up to the new release generation. In doing so, we found some performance issues in the background processes that do maintenance work on the FogBugz database. We took that back to the FogBugz developers, who quickly came up with a fix, and we verified it a week later when we bumped to 15% of customers using the release generation. The rest of the generation bumps went flawlessly up to 65% of customers on the new version just a few days before our final 100% bump was scheduled. Once two thirds of our customers had the new versions of FogBugz and Kiln we saw a noticeable increase in comments on Twitter and other outlets about the new version.
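One way to pick which accounts belong to each stage of a leak like this is to give every account a stable bucket from 0 to 99 and move the accounts whose bucket falls below the current percentage. This is a sketch of that general technique, not a description of how our scripts actually choose accounts. The function names and account IDs are made up, and only the stages named above are listed.

```python
import hashlib

# Stages named in the text; any intermediate bumps are not listed here.
STAGES = (5, 15, 65, 100)


def rollout_bucket(account_id: str) -> int:
    """Map an account to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def accounts_for_stage(account_ids: list[str], percent: int) -> list[str]:
    """Return the accounts that should be on the new generation at this stage."""
    return [a for a in account_ids if rollout_bucket(a) < percent]


# Usage with made-up account IDs.
accounts = [f"account-{i}" for i in range(1000)]
for percent in STAGES:
    selected = accounts_for_stage(accounts, percent)
    print(f"{percent:>3}% stage: {len(selected)} accounts")
```

Because the bucket depends only on the account ID, each stage is a superset of the one before it, so an account that has already been bumped never needs to move back. The share selected at each stage is only approximately the target percentage.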
Even though there was still deployment work to do, this felt like a turning point to me. More of our hosted customers were using FogBugz 8 and Kiln 2 than were using versions 7 and 1 (respectively). Deployments had gone surprisingly well, with very few BugzScout reports popping up. The performance issues had been squashed, and no major hiccups occurred. If anything, it almost felt too easy. We did the bump to 65% on a Wednesday night. Thursday’s project was a big one: deploy the new website, release our licensed installer, and update our shop website to handle purchases of and upgrades to the new licensed version.
The Big Switch
Deploying changes to our website is a fairly straightforward process, though it does (currently) take too long. A simple script is run that re-indexes the site, copies the files to our web servers, updates the server configuration, and then checks a few key files to make sure nothing broke. It takes over an hour to run, about two thirds of which is spent copying the files over to the web servers.
But before we could pull the trigger, we needed to make sure that our shop website could handle purchases of the new versions of FogBugz and Kiln. Because major versions of our software are released relatively infrequently, this process has not yet been automated. So I got to get down and dirty with the shop database in order to add the necessary information about the new FogBugz and Kiln licensed products (and the various platforms they operate on). As part of this process we got the actual licensed installers for both products copied over to the web servers.
These installers had gone through the normal beta process described above, and additionally had been hit pretty well by our QA team. With the changes to our shop website tested by doing some fake purchases, we were ready to actually deploy the website…which went off without a hitch. It was fun to announce to the company that the fancy new website (designed by Jason and the rest of our design team) was up and to hear the excitement from everyone in the office. Although we weren’t totally done, this was the day that it felt like we had really shipped the product. New licensed customers were getting the new version, the website was up, and life was good.
100%
…but we weren’t really done…at least not yet. We still had one more generation bump to do for our hosted customers in order to get the remaining 35% of accounts using the new versions. This bump went even more smoothly than our previous ones. It felt so good to look at a simple DB query that showed all of our active accounts on the latest generations and know that we had shipped. It also felt pretty cool to know that we had done so by using the bug tracking abilities of FogBugz and the great source control provided by Kiln.
This final bump of 100% of hosted customers to the new version also switched new signups over to the new version. One thing we really want to fix the next time around is to decouple that switch so that new customers can start getting the new version earlier on in the release process. As things stood, there were three days when new hosted customers were getting the old versions of FogBugz and Kiln despite the fact that we’d already updated our website and were actively promoting the new versions.
Final Notification
Once more, it was time to iteratively “leak” the new version of FogBugz and Kiln, this time by notifying our licensed customers that there was a new version available for them to download and install. True, some of them would notice it on their shop page after having read or heard about the new version online. But most will just wait for the notification in FogBugz, then go to our website to download the new installers and upgrade. Those who don’t have support contracts will need to pay for the new version or get their support contracts up to date.
So, we notified some percentage of our licensed customers every couple of days over the next two weeks until we had 100% of them notified by early October. This helped to spread out the bandwidth load from customers downloading the rather large installer files, and allowed us the same wiggle room to fix any bad bugs in the installers or any licensed-specific code issues before they got out to all of our customers.
Final Thoughts
With all that done, we’ve already shipped our first set of bug fixes to customers—within a week of finishing the process of notifying our customers about the new versions. This is the low-hanging fruit that was important to fix, but not so important as to delay shipping FogBugz 8 or Kiln 2 to our customers. It’s easy at the end of a cycle to get caught up in perfectionism and think that every bug absolutely needs to be fixed, but after a certain point it’s more important to get it to your customers and find out which bugs they are actually seeing (you might be surprised). And besides, your customers might find much worse bugs to focus on that you never would have found in-house. Or they might just want some small feature that didn’t manage to make its way into the initial launch release.
As I said earlier, everything good about this process is credited to the entire FogBugz and Kiln team that got it in place over the past few years and many releases. Ben Kamens owned all of this process before handing it off to me, and did a great job getting me up to speed as I tried to digest all of the new information.
And now that we’ve shipped, we’re already working to improve the way we work internally so that we can get bug fixes and new features to our customers more quickly. You know you work with a great team when an impromptu post-ship meeting where the only agenda item is “Congratulations to the team!” turns into a great technical discussion about next steps to improve the product because everyone is so geared up about making it even better.
