We need to accept that unforeseen regressions and late changes have consequences. Slipping the full Fedora release schedule when we don’t meet our release criteria is a good way to show that and maintain a baseline of quality for our releases. Working backwards from important milestones and starting earlier is how we ship on time.
At the April 1, 2010 Board IRC Q&A session, John McDonough asked if we track the reasons for schedule slips and respond accordingly. I track the differences between our planned and actual major milestone dates. Exactly why a milestone slips is not specifically tracked, and there are usually differing opinions about the exact reasons. Generally we slip because we cannot resolve blocker bugs in time to adequately retest them and stage the content for the coming Tuesday release day, as Adam articulated recently. The QA team is doing a great job tracking our experiences for Fedora 13 in a living retrospective page.
This got me thinking more about the things detailed below.
The Release Schedule Affects Everyone
Each Fedora release has its themes and challenges. Fedora 13 was the first release to have detailed release criteria and a set schedule methodology from the beginning. We also started with a contingency plan: if we missed a milestone, that milestone and all subsequent milestones would slip by one week. For the Fedora 13 Alpha, we absorbed the slip instead, hoping for the best and justifying the change on the grounds that the newly implemented No Frozen Rawhide meant we were “not taking away developer time.”
This phrase doesn’t make sense to me. I wasn’t at the meeting so perhaps I missed the full context. The Fedora release schedule is bigger than whether or not we are “taking time away from developers.” It is also important to factor in the time period between test releases to clear blocker lists, the amount of PR and public testing soak time a test release gets, and how much an already compressed schedule is being compressed more.
Stop Iterating on the Scheduling
As we construct each new release schedule we try to factor in the lessons from the release before. This means no two release schedules are ever the same. They are often very similar, but the task durations change and the methodology gets tweaked. We’ve reached the point where we need to stop tweaking and run with a fixed schedule methodology for more than one release.
Given how long it takes us to get used to new processes, holding our schedule methodology constant for a couple of releases and taking a break from experimenting might yield better insights into how to improve our releases and build the schedule accordingly.
Schedule Milestones are Not a Suggestion
If we really want our releases to be on time, we must give interim milestones and tasks just as much weight as the big ones. If we plan to compose a release candidate on Thursday so QA has six days to test before the Go/No-Go meeting, we should make a bigger deal of it when the release candidate isn’t ready until Monday or Tuesday. If history shows that we rarely, if ever, have a solid release candidate on the day it is scheduled, we should start creating it earlier than Thursday. We already have a “Test Compose” milestone scheduled a week earlier to address this, but it suffers from the same approach.
It’s all about working backwards, just like we do for everyday life events that must start or happen by a certain time. We work backwards, we start earlier, we build in buffers and contingency plans, and we say “no” even if other people don’t understand.
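Working backwards can be sketched in a few lines of Python. This is only an illustration, not how the Fedora schedule is actually generated. The release day, the gap between Go/No-Go and release, and the two-day buffer are assumptions made up for the example; only the six days of QA testing and the one-week lead of the Test Compose come from the description above. The function name `work_backwards` and the constant `RC_BUFFER_DAYS` are hypothetical.

```python
from datetime import date, timedelta


def work_backwards(release_day, gaps):
    """Derive milestone dates by stepping back from the release day.

    gaps lists (milestone, days_before_the_following_milestone) pairs,
    ordered from the milestone closest to release to the earliest one.
    """
    plan = {"Release": release_day}
    current = release_day
    for milestone, days_before in gaps:
        current -= timedelta(days=days_before)
        plan[milestone] = current
    return plan


# Hypothetical Tuesday release day; not an actual Fedora date.
release_day = date(2021, 6, 15)

# Assumed buffer: start composing the release candidate two days
# before it is due, because it has rarely been ready on the day.
RC_BUFFER_DAYS = 2

gaps = [
    ("Go/No-Go", 6),                       # assumed gap before release
    ("Release Candidate due", 6),          # six days of QA testing
    ("Release Candidate compose starts", RC_BUFFER_DAYS),
    ("Test Compose", 7 - RC_BUFFER_DAYS),  # one week before the RC is due
]

plan = work_backwards(release_day, gaps)

# Print the plan in chronological order.
for milestone, day in sorted(plan.items(), key=lambda item: item[1]):
    print(f"{day:%a %Y-%m-%d}  {milestone}")
```

The point of the sketch is that the buffer is a line item in the plan with its own date, so it cannot be quietly absorbed when an earlier task runs late.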
Consistently Slipping the Slip
Our schedules are constructed very tightly to maximize development time at the front of the schedule. This results in tasks scheduled for the shortest time possible in the rest of the schedule. It doesn’t make sense to think that when something goes wrong we can suddenly make up the time. Rarely, if ever, has Fedora “made up time” on the schedule. It is part of scheduling mythology that “time on a software schedule can be made up.”
Historically, failing to “slip the slip” catches up with us in the end. As a distribution we underestimate the marketing and PR value the Alpha and Beta releases bring. With a short length of three weeks for each, it doesn’t make sense to shorten them so that the final release date can be on time.
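To make “slipping the slip” concrete, here is a minimal Python sketch in which a missed milestone pushes itself and every later milestone back by the same amount, so the three-week test release windows are preserved rather than compressed. The dates are invented for illustration and are not an actual Fedora schedule, and the helper name `slip` is hypothetical.

```python
from datetime import date, timedelta


def slip(schedule, missed, weeks=1):
    """Push the missed milestone and every later one back by `weeks`.

    schedule is a list of (milestone, date) pairs in chronological order.
    Milestones before the missed one keep their dates.
    """
    names = [name for name, _ in schedule]
    first_affected = names.index(missed)  # raises ValueError if unknown
    delta = timedelta(weeks=weeks)
    return [
        (name, day + delta if i >= first_affected else day)
        for i, (name, day) in enumerate(schedule)
    ]


# Hypothetical dates, three weeks apart; not an actual Fedora schedule.
planned = [
    ("Alpha", date(2021, 5, 4)),
    ("Beta", date(2021, 5, 25)),
    ("Final", date(2021, 6, 15)),
]

slipped = slip(planned, "Alpha")
for (name, before), (_, after) in zip(planned, slipped):
    print(f"{name}: {before} -> {after}")
```

Absorbing the slip would instead move only the Alpha date, leaving two weeks rather than three between Alpha and Beta.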
Reverting to Last Known Good
For as long as I can remember, Fedora has been a time-based release. Most successful time-based releases have stricter practices around “last known good” content and rolling back to it when regressions are introduced. We don’t revert very often because it is usually deemed more disruptive to roll back to the “last known good” than to keep the new package and fix the regression.
This is a part of our process we should fix. It could go a long way towards reducing our need to slip. No Frozen Rawhide helps make sure less broken stuff gets in, but it does not address how to fix broken stuff when it does get in.