The following is the first in a series of blogs aimed at providing a concise comparison of the most salient features of the leading Agile scaling frameworks. Hopefully, this information will help you choose the framework that is most suitable for your organization and its business needs.

For large software development projects involving anywhere from a hundred to several hundred software developers, analysts and testers, the techniques inherent in agile methodologies such as Scrum or XP prove inadequate for effectively managing the progress of such an enormous effort.

In this blog, we take a quick look at two leading frameworks for scaling the Agile approach to large software development projects: the Scaled Agile Framework (SAFe 4.5) and Large-Scale Scrum (LeSS). Each has strengths that may fit different organizational situations in large-scale software product development.

Let’s get started: the “Big Picture”

Below are the overview pictures for SAFe (www.scaledagileframework.com) and LeSS (less.works).

Right off the bat, you can see that while the SAFe framework appears more comprehensive, it also appears more process-heavy.  In fact, the inventors of the LeSS framework are proud that its acronym indicates less process, fewer artifacts, and fewer roles, remaining faithful mainly to the original Scrum roles of PO, SM and Team.

For example, SAFe offers the role of Product Manager, who is in charge of setting the priorities and overall scope of functionality to be delivered by a Program containing many Agile teams.  The Product Owner in SAFe performs the usual Scrum role for up to a couple of Agile Teams that typically work from a similar/shared backlog.

In contrast, LeSS offers the regular Scrum role of Product Owner (PO) for up to 8 Teams.  This is because in LeSS, the PO is not a liaison with the end-Customer: the Teams get to interact directly with the end-Customer to understand the details of the requirements, giving the PO the opportunity to focus on the overall priorities and scope for up to 8 Scrum Teams.

Hence, if an organization can afford to have the Agile Teams interact directly with the end-Customer, LeSS can be a good fit in this particular aspect. Otherwise, SAFe can accommodate both the Team-direct and the liaison-PO situations.

SAFe 4.5 Framework

Organizational Structure

The inventors of LeSS very much believe that culture follows structure.  To that end, they offer LeSS not just as a practice for scaling up the Scrum approach, but as a direct impetus for changing the organizational structure.  The picture below shows the organizational structure LeSS advocates for up to 8 Scrum Teams working together to develop a software product, intended to provide what an Agile culture needs from an organization in order to succeed.

In this picture, you can see that there are no functional departments (e.g. development vs. testing) and no PMO.  Instead, in addition to the Scrum Teams, there is the Head of the Product Group, whom LeSS views (as it views all other managers, similar to the “Toyota Way”) as a teacher of those reporting to him or her; the Product Owner Team, which provides a pool of POs for every Scrum effort (large or small scale); and the Undone Department.

The latter is a curious thing.  In LeSS, a permeating theme is that the Teams are supposed to do everything needed to put a high-quality software product in the hands of end-Customers: from analysis to development to testing to packaging, all while coordinating with other Teams.  All of that is represented in the Definition of Done of the Teams.  But it may take the Teams a few years to mature to that set of comprehensive capabilities.  Hence the Undone Department is a placeholder for resources that fill in for whatever the Teams are yet to be able to do (e.g. DevOps) until the Teams mature.

In contrast, SAFe does not advocate drastic organizational change as emphatically as LeSS.  It presents its approach for adoption even with the current organizational structure, and lets the organization take its time deciding when it may want to restructure to be more efficient with Agile.  That’s not to say that LeSS presents its approach as an “all or nothing deal” – it just emphasizes structural change in the organization more strongly than SAFe does.

Differences in Planning

SAFe stipulates that sprints be grouped into sets of 4-5 consecutive sprints, each set being called a Program Increment (PI).  And while the Teams (and the Product as a whole) are expected to demonstrate incremental achievements at the end of each sprint (i.e. completed Stories), it is at the end of a PI that complete “Features” of the software product are expected to be available.  SAFe, however, maintains the option of releasing on demand at any time during a PI, with whatever Features, or even Stories, happen to be complete at that point in time.
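As a toy illustration of the cadence just described (5 sprints per PI is a common default, not a SAFe mandate, and the helper function here is mine, not part of the framework):

```python
# Toy illustration of SAFe cadence: consecutive sprints grouped into
# Program Increments (PIs). Assumes 5 sprints per PI, a common default.
SPRINTS_PER_PI = 5

def pi_of(sprint_number):
    """Return the 1-based PI that a 1-based sprint number falls in."""
    return (sprint_number - 1) // SPRINTS_PER_PI + 1

# Sprints 1-5 form PI 1, sprints 6-10 form PI 2, and so on.
print([pi_of(s) for s in range(1, 11)])  # [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
```

Feature-level completion is then expected at each PI boundary, while story-level demos continue every sprint within it.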

Planning in SAFe happens in a 2-day session at the beginning of each PI, in addition to the usual sprint planning at the beginning of each sprint.  In the PI planning session, all the Teams working together in what SAFe calls an Agile Release Train (ART) attend to commit to delivering a set of Features for the PI, and each Team presents a plan showing which Stories (the children of Features in SAFe) it plans to complete in each sprint of the PI.  Finally, in addition to the usual sprint demos and retrospectives, SAFe holds an overall Inspect and Adapt workshop (analogous to the Sprint Demo and Retrospective) at the end of the PI, which includes a PI demo, quantitative measurement, and a Problem-Solving Workshop that dives into deeper root-cause analysis than a normal Sprint Retrospective.

In contrast, LeSS remains faithful to just the usual sprints of Scrum, with the following additions:

  • Sprint Planning happens in 2 stages. The 1st stage is attended by 2 representatives of each Team, who do not usually include the Team’s Scrum Master.  This stage decides which items from the common Product Backlog each Team will develop.  It also includes cross-team estimation to unify the estimation numbers.  This is in contrast to SAFe, which suggests normalizing cross-Team estimations by equating a story point to a story that would take ½ day to code and ½ day to test. The 2nd stage of sprint planning is the same as sprint planning in regular Scrum.
  • Each sprint review is held with all Teams as a “science fair”, where each Team has a station to demonstrate its accomplishments for the sprint. Attending stakeholders can visit the stations in which they are interested.
  • The Sprint Retrospective is held in two stages: the first being the same as regular Scrum; the second is for the overall progress of the software product being developed by the Teams.
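SAFe’s normalization rule mentioned in the first bullet above boils down to simple arithmetic: the baseline story of ½ day coding plus ½ day testing equals one point, so a story’s normalized size is its estimated days of work divided by that one-day baseline. A minimal sketch (the helper function is illustrative, not part of either framework’s official guidance):

```python
# Sketch of SAFe-style cross-Team story-point normalization.
# Baseline: 1 point = 0.5 day of coding + 0.5 day of testing = 1 day of work.
BASELINE_DAYS_PER_POINT = 0.5 + 0.5

def normalized_points(code_days, test_days):
    """Return a story's size in points relative to the 1-day baseline story."""
    return round((code_days + test_days) / BASELINE_DAYS_PER_POINT)

# A story estimated at 2 days of coding and 1 day of testing is ~3 points,
# regardless of which Team produced the estimate.
print(normalized_points(2, 1))  # 3
```

The point of anchoring everyone to the same baseline is that a “3” from one Team means roughly the same effort as a “3” from another, which is what LeSS achieves instead through its cross-team estimation session.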

Portfolio Management

As represented in the top level of the SAFe “Big Picture” shown earlier, SAFe offers a comprehensive approach to prioritizing “projects” (represented as Epics or a set of related Epics in SAFe) and budgeting for them in an Agile manner.  In its latest version, SAFe 4.5, there is an additional, optional level for Large Solutions (shown below the Portfolio level in the aforementioned diagram); it is usually relevant to projects with hundreds or thousands of participants comprising multiple Agile Release Trains.

In contrast, LeSS does not delve into Portfolio Management: it only offers techniques that can be compared to the Program and Team levels of SAFe.

2 Versions of LeSS

LeSS has two versions:  the one we saw earlier for 2 to 8 Teams, and LeSS Huge for more than 8 Teams, depicted below.

LeSS Huge is formed by having several regular LeSS frameworks working in parallel with each other.  The most notable addition in LeSS Huge is making each regular LeSS belong to a separate Requirements Area with its own Area Product Owner (APO) under the overall Product Owner.

If you were thinking “Well, isn’t an ART the same as a Requirements Area?” you’d be partially right. One similarity is that the relationship of the Product Backlog to an Area Product Backlog is analogous to the relationship of a Portfolio Backlog to a Program Backlog in SAFe, in the sense that items on the former are coarser grained than items on the latter.  However, one of the differences is that there is still only one APO for up to 8 Teams, whereas the SAFe PO covers far fewer Teams.

Other Differences between LeSS and SAFe

LeSS offers one piece of seemingly shocking advice (which is not offered by SAFe):  Don’t scale! (But if you have to scale, use LeSS.) It advocates that even very large software products can be built more successfully by a relatively small Team of co-located master programmers and testers.  The LeSS authors cite at least one example on their website (less.works) of a huge software project that followed a tortuous path to completion.  When the overall project director was asked what he would do differently if he were to do it again, he said that he’d pick the 10 best programmers and have them build it all.  I can cite a more recent example with the Affordable Care Act, where a traditional government contractor threw an enormous number of resources at the project, which failed miserably.  Later, about a dozen master developers and testers were put together in a house to work on fixing the ACA, which they did within a period of several months. (See http://www.theatlantic.com/technology/archive/2015/07/the-secret-startup-saved-healthcare-gov-the-worst-website-in-america/397784/)

  • Whereas SAFe is generally vendor-neutral with respect to tools, it does encourage automating early and often, as much as possible, utilizing the System Team to support that.
    LeSS, on the other hand, strongly recommends that you not use automated tools until after your organization becomes quite proficient with LeSS, opting instead for manual aids like very big white boards and wall charts; LeSS declares that if you automate a mess, you get an automated mess.  And even after the Teams become proficient with LeSS, it recommends that you use only open-source tools, which you can easily jettison if they don’t work out for you, without losing a high-dollar investment in them.
  • SAFe takes a more customary view of the role of Scrum Master. In SAFe, the SM is pretty much a permanent role within the Scrum Team and does a lot of intra-Team and inter-Team coordination.  In LeSS, the SM is first and foremost a teacher, and can fade away from the day-to-day Team dynamics once the Team becomes proficient in the Scrum and LeSS approaches.
  • In SAFe, Epics, Capabilities, Features and Stories are explicitly handled as integral parts of the SAFe backlogs. LeSS, on the other hand, talks only about coarser- vs. finer-grained Backlog Items, staying faithful to Scrum by relegating Epics, Features and Stories to instruments of XP, which is not part of Scrum proper.

Conclusion

The quick comparison between LeSS and SAFe in this blog is by no means comprehensive.  Yet it shows SAFe to be more wide-ranging in offering processes and roles for large-scale Agile efforts, from the highest portfolio levels down to the individual Agile Team, while making those roles and levels configurable according to the organization’s needs.  Furthermore, for a typical traditional large organization it is perhaps more palatable to begin adopting SAFe than LeSS, since the latter strongly advocates some major changes to the structure of the organization as early as possible in the adoption.


This information piece is a kind of monologue on how to organize, launch and avoid the death traps of an Agile Transformation in medium- to large-sized Enterprises. It covers the “What” to be aware of, and to an extent the “Whys”, not the “How”. It is delivered over several blog entries, so stay tuned for the entire series.

Our planned topics are:

  • The Agile Transformation Game – Organizing for Stability
  • Picking the Right One, the Right Way – What Scaled Method is Right for My Enterprise?
  • Are we There Yet? – When Does an Agile Transformation Hit Critical Mass?
  • “INCOMING!” Agile Adoption Perils; Land Mines, Hand Grenades and Shell Shock along the Road to Agile Adoption
  • “The Man Behind the Curtain” – Mentors and Their Role in an Agile Adoption

Several scaled agile practices are presented, and you should make the choice of the one that best meets your needs. The focus, however, is not so much on the practices themselves as on how to launch a successful Transformation regardless of the chosen practice or method.

The “How” is something that as a company we provide through our services as experts in the Agile Transformation space.

Common Reasons for Seeking out a Scaled Agile Method

There are some common reasons for seeking out a Scaled Agile Method; these may be referred to as “Pain Points”. You may find that one or more of the situations in this list apply (and perhaps others not listed):

  • Your Agile transformation is working ok, but you are finding that transforming teams is not enough
  • You are a business under change seeking new markets to exploit faster
  • You want better communications with business customers
  • As IT Management you are seeking better Governance
  • Your Management wants to make the project noise go away
  • You find you need a better method of complying with Audit Requirements
  • You need Metrics, Measurements, Commonality
  • You are merging companies, Blending IT Enterprises
  • You want to stop the bleeding of wasted capabilities in IT

Why Scaling the Enterprise is Necessary

Typically, common Scrum practices are for small, co-located teams working on small to medium sized efforts.

Scrum recommends:

  • 7 members plus or minus 3 for teams
  • Co-Located within sight distance
  • Embedded Product Owners (Voice of the Customer)
  • Simple Roles – ScrumMaster, Product Owner and Team
  • Self Directed Teams accomplishing small units of work
  • Simple Delivery ceremonies – Planning, Daily stand-ups, Product Reviews and Retrospectives

Our teams do not look like this model; ours look like this:

  • Multiple teams of 20 to 400 staff
  • Highly distributed geographically (Including Offshore Development)
  • Specialized and siloed roles
  • Highly focused on Command and Control
  • Mix of Contractors, Vendors, Specialists and internal support orgs all with their own agendas
  • Complicated and complex delivery cycles with siloed groups and arcane, sometimes black-box processes
  • Highly integrated systems of systems that comprise the Enterprise IT capability utilizing a mixed bag of diversified languages, data structures, technologies and operating platforms

Most commercial Enterprises require the ability to “scale up” the practices of an agile-based Scrum approach to a larger capability that reflects a more complex and complicated world. So we need to look for a Scaled Agile Method to support these real-world models. This is what this monologue is all about.

Make sure to read the next blog in the series, coming soon: The Agile Transformation Game – Organizing for Stability

Tom Weinberger has been practicing Enterprise Transformations for the last 18 years with both tools and process. He specializes in the conversion of large enterprises of 2000 or more staff. His experience extends to many industries including two global banks, 6 large insurance companies and many other industries. He is currently an Agile Transformation coach for Blue Agility, LLC.

Join Tom on LinkedIn at: https://www.linkedin.com/in/tomweinberger



Any scrum team knows that the target is to get stories to done within the sprint.  Teams establish a Definition of Done (DoD), which includes all of the items the team needs to get a story to done.

Often, teams that are new to scrum will follow many of the ceremonies but still operate within a traditional waterfall context, meaning they won’t release to production until everything is done.  Today I will share a painful story on why getting stories to “Done” is not enough, and why scrum teams need to actually release code to production more frequently than just at the end of the project.


The Background

I was working with a team as part of a large company to upgrade some of their systems. The company historically used a waterfall process, but was starting to transition to scrum.  Initially the team was told that since this was an upgrade it would be run as a waterfall project, but as we dove in we saw how it would fit very well into an agile methodology.

Below were some key characteristics for the project:

  • There were 2 key drivers for the upgrade: End of support for the version of the operating system and concerns over performance during heavy usage periods
  • There was a “hard” delivery date needed as a result of the 2 key drivers above, which was ~1 year out
  • The project included the following: OS upgrade, database upgrade, splitting a single server for the application and database into 2 servers, building BCP and HA into both the application and database servers, replacing third-party software with in-house software, and replacing the current job scheduler with another (among other items)
  • Additional scope items were added throughout the project and deemed “necessary”

The Plan

So even though we were told this project was not a good fit for Scrum, the team decided to utilize Scrum anyway.  Below was the basic approach:

  • There were 13 different applications on these servers.  The team used each application as an Epic.  Since the team did not have automated testing in place, it was not feasible to completely regression test every single system.  Instead, they evaluated the testing strategy and created stories around each major testing flow that they would perform.
  • We got together with the SMEs and used Planning Poker to estimate the size of each story (testing flow) against our baseline.  After estimating each story, we compared the total rolled-up points for each epic (application) to see if they were relatively comparable, which they generally were.
  • We then worked with the team to estimate their velocity, at which point we created a release plan.  Since we were given the end date, we knew that we could fit 13 three-week sprints.
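The arithmetic behind that release plan can be sketched as follows (the dates and velocity here are made-up illustrative numbers, not the actual project’s figures):

```python
from datetime import date, timedelta

# Hypothetical dates and velocity, for illustration only.
start = date(2015, 1, 5)         # project start
hard_date = date(2015, 10, 12)   # the fixed delivery date
sprint_length = timedelta(weeks=3)

# Number of full 3-week sprints that fit before the hard date.
num_sprints = (hard_date - start) // sprint_length
print(num_sprints)  # 13

# Capacity = velocity x sprints bounds how many story points can be committed.
velocity = 30  # hypothetical points per sprint
print(num_sprints * velocity)  # 390
```

With a fixed end date, the only real levers left are scope (which points fit inside that capacity) and, as argued below, how the releases are sliced.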

I recommended that instead of doing one large production release, we break the project up into multiple smaller production releases along the way.  We received the buy-in to work in sprints, but we were told that we could not break up the releases and would release everything to production at once.

And this was where I failed as a ScrumMaster. When I analyzed the effort, I came to the following conclusions:

  • We were going to be standing up a new production server alongside the existing production server.  Our goal was to move all of the applications over to the new server, with the goal of retiring the old server.
  • There was quite a bit of risk with this project.  The code base was developed on the assumption that the application and database would be on the same server.  We were changing a lot (including application and database versions).  Additionally, these systems were critical to the business and the clients.
  • We had an entirely new system that we could release to, run in parallel against and not affect our current production system.

So I proposed that we prioritize the backlog and pull a system that was low risk from an impact perspective but that contained as many of the characteristics of the other systems as possible (such as programming languages).  Once we finished all of the stories for this application, we would release it to the new production system and stop using the current production system.  This would accomplish several things:

  • It would allow us to test our release process.  Since there were a lot of new things with this system, it would be great to actually do a release and get quick feedback on anything that we had to adjust.
  • The risk of impacting our current production system was low, given that this was a completely new system.  If the release did not go well, we would gain valuable knowledge and could simply restore the previous version of the application.
  • If the application that we released had issues, we could learn from that and apply changes to the other systems.  For example, if the way we were trying to connect from the new application server to the new database was not allowed in production (but was allowed in our test environment), one of the few ways we could catch it would be to actually deploy to that production environment.
  • If the release went well, then we would begin reducing the load on our existing production system.  If you remember back, one of the drivers for this project was the fear that the current production system could not keep up during peak times.  Moving applications off one at a time would reduce risk by removing the “hard” date.

Even with laying out these items (and several others), the organization just was not ready to separate from “the way they had always done things”.  They decided to mitigate some of the risk by doing the single big bang release over a 3 day weekend, ensuring that no one on the team would sleep or get to enjoy the weekend.

The Problems

“In software, if something is painful, do it more often.”

The main fear that I could identify is that the company was not structured to do frequent releases.  Release cycles were long, the release process (production and between the various test environments) was manual and the business was expecting a long UAT (User Acceptance Testing) phase at the end instead of several smaller, incremental UATs throughout.


Problem 1: Maintenance, Merging & Rework

  • You can imagine that 13 different applications have a lot of code.  They also most likely require changes for reasons outside of our effort.  This meant that as our project got further along, we had to constantly merge code in from the other projects.  Not only did this take time away from us, we had the added complexity of changing code to work on the new servers and operating systems.  The code we were merging in was designed for the old servers/systems, so these were complicated merges.
  • Since we were operating in sprints, we focused on developing and testing in small cycles, with the goal of getting each story (testing flow) to done.  This code then sat in version control, and after performing a merge we needed to do additional testing, which again took time away.

Problem 2: Not Knowing What We Don’t Know

  • One of the powers of getting to “done” is that there should be fewer surprises.  When we estimate that we are 80% done with something, it is just an estimate.  Especially if we have not completed testing or other items, we don’t know if there are hidden issues that mean we are actually only 40% done.
  • Even if we complete all of our testing, until we have gone through all of our steps and are using a system in production, we won’t know if it is truly working.  There could be many reasons for this, including our test environments not being identical to our production environment.

Problem 3: Release Planning

  • Since the company did not have release automation in place, releases and everything related to them were done manually.  This meant that as time went on, the “release plan” continued to grow and had to be constantly maintained.  One can imagine the number of mistakes in this plan, many of which were not caught because of infrequent releases.

Problem 4: Back Out Planning

  • An important aspect of any release plan is the ability to back out the changes and revert to the pre-release state.  Although the team had planned for this, everyone “knew” that this release just had to work, so there was not much focus on making sure the system could be reverted.


The Reality

So after various bumps in the road, many of which we were able to react to and overcome as a result of utilizing scrum, the release weekend was finally upon the team.  They stocked up on Red Bull, coffee and Mountain Dew and planted themselves together in a “command center” (a large training room).  It had been a long road to get to this point.  The team expected to have some issues, but from the testing and several dry-run releases to a production-like environment, expected to be able to get the release out.  Below are a few of the key items that occurred:

  • As expected, the team ended up working all 3 days that weekend to try and get the release out.  The first night some of the team left because they were dependent on other team members to do work, but the second day was an all-nighter for the team.
  • At the end of the second day there were still several issues with the system.  The team decided to continue pushing forward with the release.
  • Team members had a few hours of sleep, then continued working the third day until ~2 am.
  • The first morning “live” on the new system, I ended up having to pull all of the team together.  We took over a large training room so that we could work together and efficiently address the growing list of defects.
  • The team put in on average 80 hours the first week of release (after the long 3 day release weekend).  This included pulling in several resources that were not on the project to assist.
  • The second and third week saw less time spent, but the team was still putting in ~60 hours each week.
  • It took several months and many maintenance efforts in order to get the system stable and to close out the majority of defects.
  • In total, there were 400+ production defects from this release, many of which were critical and had to be resolved very quickly by a sleep-deprived team.
  • Thanks to a great team, the impact to the business and the clients was minimal (financially and in missed SLAs), which was simply amazing given the number of highly critical defects.
  • Before the release, it looked like the project was going to come in under budget; it ended up going 30%+ over budget after all of the extra time spent after the release.


The Lessons

There were many lessons to learn from this effort.  Once the critical bugs were resolved, management began to try to figure out just what went wrong.  Of course, there were many failure points that contributed, and each person/group blamed other groups to protect themselves.  A few key items included the following:

  • Just about any project can operate using scrum.  The riskier the project, the more using an incremental and iterative approach makes sense.
  • Getting code to meet the Definition of Done (DoD) is one thing, but it is difficult to truly know if everything is really “Done” until the software is running in production.
  • Think outside the box.  Just because your organization has not done it before does not mean you shouldn’t try.  Releasing the new applications to production on the new server one at a time is a great example of this.
  • Releasing often reduces risk.  Not only does it make sure that you have a solid release process (and hopefully show the need for automation), but it also provides the feedback to truly let you know whether things are working.
  • It is essential to spend time developing a strategy for backing out code if there are issues.  The smaller the release, the easier the back out plan is.
  • Frequent releases (even if code is in a turned-off state) reduce the ongoing maintenance of having to merge code from other branches/streams.
  • A “hard” date often doesn’t have to be a hard date.  In this case, we could have reduced the number of applications running on the previous system to hedge our risk during peak times.

After the project was finished and we were holding a retrospective, one manager said it best… “Even though we had a lot of issues with this effort, I truly believe operating with Scrum saved us.  If we had treated this as a waterfall project, we would have never hit our date and I fear there would have been a much larger client and financial impact.”