We are software developers. Our daily duty is to write programs. We spend a lot of time and effort on doing the right things and doing them right. At some point, launch day comes, and depending on our level of confidence, we are either calm and ready for the first wave of unpredictable users, or scared to death.
How do we build the kind of confidence that lets us release new versions of the system without risking a heart attack? There is no simple answer, but there are reliable tools, techniques, and patterns every software developer should consider using. The book "Release It! 2nd edition: Design and Deploy Production-Ready Software" by Michael T. Nygard covers most of them.
In the first chapter (Living in Production) the author explains what it means for a system to be production-ready, and why everything that happens before the production deployment is just a prelude. He gives several short examples that illustrate how big a mistake people make when they believe that lab-sterile testing can mimic real, and often crazy, users. The second chapter is a case study: a thriller starring an unexpected exception as the villain and a huge airline company as the victim. This chapter made me love this book. Michael Nygard is a great storyteller, and this chapter is the best proof. He guides the reader step by step, explaining how he analyzed and solved the problem. Chapter three (Stabilize Your System) focuses on defining what the stability of a system means. It asks a few questions that can be summarized in one sentence: "Let’s think about what may go wrong." Chapter four (Stability Antipatterns) provides a list of possible failures, their root causes, and the anti-solutions people often apply to them. In the next chapter (Stability Patterns) the author focuses on the simplest and most reliable solutions, such as timeouts, circuit breakers, and backpressure, to name just a few.
Chapter six opens the second part of the book, called "Design for Production", with a second case study, which shows how a massive marketing campaign combined with a broken e-commerce system can cause serious damage. The seventh chapter (Foundations) introduces the layers of production-ready design. From bottom to top they are: Foundation, Instances, Interconnect, Control Plane, and Operations. In this chapter the author focuses on the first layer: he briefly explains networking basics, physical hosts, VMs, and containers. The next chapter (Processes on Machines) covers the layer called "Instances". It defines services, instances, executables, and so on, and it also discusses code (including the configuration-as-code approach) and logging. The ninth chapter introduces strategies for interconnecting instances. It starts with a simple (and error-prone) DNS-only approach and ends with an example of a ready-to-scale service discovery solution. Chapter ten (Control Plane) covers automation, monitoring, distributed log collection, provisioning, and the platforms that can help achieve the desired level of control. It does not offer a silver bullet; instead, it explains the options and tells us what we should focus on. Chapter eleven is a comprehensive tour of the OWASP Top 10 security vulnerabilities.
Chapter twelve opens the third part of the book, called "Deliver Your System", with yet another case study. Chapter thirteen (Design for Deployment) explains why smaller and more frequent deployments matter: they make significant changes possible through a series of small, predictable steps. The next chapter (Handling Versions) focuses on different ways of handling API versioning. It explains why you should make only backward-compatible changes and shows how to make consumer-producer integration less painful.
Chapter fifteen (Trampled by Your Own Customers) opens the last part of the book, called "Solve Systemic Problems". It is a case study: the story of a redesigned e-commerce system that was brought down by 250,000 active sessions within 30 minutes of its first launch. The next chapter (Adaptation) teaches us how to adapt to a changing environment and grow over time by planning the release cycle. The last chapter, seventeen (Chaos Engineering), gives a brief introduction to the idea of chaos engineering. It accepts that systems are fragile and that failures show up sooner rather than later, and the more control we have over them, the better. Building resilient systems requires breaking them regularly and in a controlled way.
"Release It!" is a great book that every software developer, architect, designer, or even QA engineer should read. It focuses on principles and guidance, so in places it may feel like it lacks detail. However, if it focused on specific tools instead of principles, it would quickly become outdated. You won’t regret the time spent with this book: it is 336 pages of useful and timeless knowledge. Highly recommended!