Tinderbox jobs

9/13/2023


In April 1997, Netscape Release Engineers wrote, and started running, the world’s first? second? continuous integration server. Now, just over 17 years later, in May 2014, the tinderbox server was finally turned off. This is a historic moment for Mozilla, and for the software industry in general, so I thought people might find it interesting to get some background, as well as outline the assumptions we changed when designing the replacement Continuous Integration and Release Engineering infrastructure now in use at Mozilla.

At Netscape, developers would check in a code change, and then go home at night, without knowing if their change broke anything. Instead, developers would have to wait until the next morning to find out if their change caused any problems. At 10am each morning, Netscape RelEng would gather all the checkins from the previous day, and manually start a build.

Even if a given individual change was “good”, it was frequently possible for a combination of “good” changes to cause problems. In fact, as this was the first time that all the checkins from the previous day were compiled together, or “integrated” together, surprise build breakages were common. This integration process was so fragile that all developers who did checkins in a day had to be in the office before 10am the next morning to immediately help debug any problems that arose with the build.

Only after the 10am build completed successfully were Netscape developers allowed to start checking in more code changes on top of what was now proven to be good code. If you were lucky, this 10am build worked first time, took “only” a couple of hours, and allowed new checkins to start lunchtime-ish. However, this 10am build was frequently broken, causing checkins to remain blocked until the gathered developers and release engineers figured out which change caused the problem and fixed it.

Fixing build bustages like this took time, and lots of people, to figure out which of all the checkins that day caused the problem. Worst case, some checkins were fine by themselves, but caused problems when combined with, or integrated with, other changes, so even the best-intentioned developer could still “break the build” in non-obvious ways. Sometimes, it could take all day to debug and fix the build problem; no new checkins happened on those days, halting all development for the entire day. More rare, but not unheard of, was that the build bustage halted development for multiple days in a row. Obviously, this was disruptive to the developers who had landed a change, to the other developers who were waiting to land a change, and to the Release Engineers in the middle of it all.

With so many people involved, this was expensive to the organization in terms of salary as well as opportunity cost. If you could do builds twice a day, you only had half as many changes to sort through and detangle, so you could more quickly identify and fix build problems. But doing builds more frequently would also be disruptive because everyone had to stop and help manually debug build problems twice as often. In these desperate times, Netscape RelEng built a system that grabbed the latest source code, generated a build, displayed the results in a simple linear time-sorted format on a webpage where everyone could see status, and then started again… grab the latest source code, build, post status… again. At first, this was triggered every hour, hence the phrase “hourly build”, but that was quickly changed to starting a new build immediately after finishing the previous build.

By integrating all the checkins and building continuously like this throughout the day, it meant that each individual build contained fewer changes to detangle if problems arose. By sharing the results on a company-wide-visible webserver, it meant that any developer (not just the few Release Engineers) could now help detangle build problems. What do you call a new system that continuously integrates code checkins? Hmmm… how about “a continuous integration server”?!

Good builds were colored “green”. The vertical columns of green reminded people of trees, giving rise to the phrase “the tree is green” when all builds looked good and it was safe for developers to land checkins. Bad builds were colored “red”, and gave rise to “the tree is burning” or “the tree is closed”.
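For readers who think in code, the loop at the heart of that system (grab the latest source, build, post status, start again) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the original tinderbox code: `BuildResult`, `run_build_loop`, `tree_is_green`, and the stand-in `fetch_latest` and `build` callables are all hypothetical names, and the sketch tracks a single column of builds and treats the tree as green when the newest build in that column passed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BuildResult:
    revision: str  # identifier of the source snapshot that was built (hypothetical)
    ok: bool       # True would be shown "green", False would be shown "red"


def run_build_loop(
    fetch_latest: Callable[[], str],
    build: Callable[[str], bool],
    cycles: int,
) -> List[BuildResult]:
    """Grab the latest source, build it, record the status, then start again."""
    history: List[BuildResult] = []
    for _ in range(cycles):
        revision = fetch_latest()   # grab the latest source code
        ok = build(revision)        # generate a build
        # Post status: newest result first, like a time-sorted status page.
        history.insert(0, BuildResult(revision, ok))
    return history


def tree_is_green(history: List[BuildResult]) -> bool:
    """Assumption for this sketch: the tree is green if the newest build passed."""
    return bool(history) and history[0].ok


# Usage with stand-in callables: three checkins, the middle one breaks the build.
revisions = iter(["r1", "r2", "r3"])
history = run_build_loop(lambda: next(revisions), lambda rev: rev != "r2", cycles=3)
```

Because each pass through the loop picks up only the checkins that landed since the previous pass, a red result points at a much smaller set of changes than a once-a-day build would.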
