Culture and Methodologies Resources

DZone's Featured Culture and Methodologies Resources

Kubernetes in the Enterprise

In 2022, Kubernetes has become a central component for containerized applications. And it is nowhere near its peak. In fact, based on our research, 94 percent of survey respondents believe that Kubernetes will be a bigger part of their system design over the next two to three years. With the expectations of Kubernetes becoming more entrenched into systems, what do the adoption and deployment methods look like compared to previous years?DZone's Kubernetes in the Enterprise Trend Report provides insights into how developers are leveraging Kubernetes in their organizations. It focuses on the evolution of Kubernetes beyond container orchestration, advancements in Kubernetes observability, Kubernetes in AI and ML, and more. Our goal for this Trend Report is to help inspire developers to leverage Kubernetes in their own organizations.

Why I Prefer Trunk-Based Development

By Trisha Gee

These days, distributed version control systems like Git have "won the war" of version control. One of the arguments I used to hear when DVCSs were gaining traction was around how easy it is to branch and merge with a VCS like Git. However, I'm a big fan of Trunk-Based Development (TBD), and I want to tell you why. With trunk-based development, all developers work on a single branch (e.g., 'main'). You might have read or heard Martin Fowler or Dave Farley talking about it. It was when I was working with Dave (around about the time that Git was rapidly becoming the "go-to" version control system) that I really saw the benefits of trunk-based development for the team, particularly in an environment that was pioneering continuous delivery (Dave was writing the book with Jez Humble while I worked with him). In contrast, the branching model encourages developers to create separate branches for every feature, bug fix, or enhancement. Although branching may seem like a logical approach to isolate changes and reduce risk, several factors make me more comfortable with trunk-based development. 1. Speed and Efficiency In trunk-based development, the entire team works on a single branch. This model allows for quicker integrations and fewer merge conflicts. This is literally "continuous integration (CI)", as originally suggested by the practices of Extreme Programming. While these days we tend to mean "run your build and tests on a team server every time you commit" when we say CI, what CI really meant was actually integrate your code regularly. Code living on separate branches is, by definition, not integrated. And the longer these branches live, the more challenging it is to merge them back into the main codebase. It might seem fast to develop your fixes and improvements on a separate branch that isn't impacted by other developers' changes, but you still have to pay that cost at some point. Integrating small changes regularly into your code is usually less painful than a big merge at the end of a longer period of time. 2. Greater Code Stability Trunk-based development encourages frequent commits, which leads to smaller and more manageable changes. How and why? For the same reason that we don't want big merges from long-lived branches - the longer we leave it to commit our changes, the higher the chances our commit will clash with someone else's changes. By frequently pulling in the other developers' changes, and frequently pushing small changes of working code, we know the codebase is stable and working. Of course, this assumption of "stable and working" is easier to check if we have a CI server that's running the build and tests for each of these commits. We also have to stop making commits if the build breaks at any time and focus on fixing that build. Continuously pushing small, frequent commits when the build is already broken isn't going to do anyone any favors.In the branching model, large and infrequent merges can introduce bugs that are hard to identify and resolve due to the sheer size of the changes. Have you ever merged the trunk into your branch after someone else has merged their own big piece of work and found your code no longer works? It can take a LOT of time to track down why your tests are failing or the application isn't working the way you expect when you've made a whole bunch of changes and someone else has made a whole bunch of different, or overlapping, changes. And that's assuming you actually have reliable test coverage that can tell you there's a problem. 3. Enhanced Team Collaboration My favorite way of sharing knowledge between team members is pair programming. I know not everyone is a fan or is in a position to do it (especially now more people are working remotely, but if so, check out JetBrains' Code With Me). If you're not pairing, then at least you want to be working on the same code, right? If you're all working on your own branches, you are not collaborating. You are competing. To see who can get their code in fastest. To avoid being stomped on by someone else's code changes. If you're all working on the same branch, you tend to have a better awareness of the changes being made. This approach fosters greater team collaboration and knowledge sharing. In contrast, branching can create a siloed work environment where you're all working independently, leading to knowledge gaps within the team. 4. Improved Continuous Integration and Delivery (CI/CD) Practices Dave Farley's book, "Continuous Delivery," his blog posts, and videos argue something along the lines of "trunk-based development is inherently compatible with Continuous Integration and Continuous Delivery (CI/CD) practices." In a trunk-based model, continuous integration becomes more straightforward because your code is committed frequently to trunk, and that's the branch your CI environment is running the build and tests on. Any failures there are seen and addressed promptly, reducing the risk of nasty failures. It's usually easy to track down which changes caused the problem. If the issue can't be fixed immediately, you can back the specific changes that caused it. By now we should know the importance of a quick feedback loop - when you find problems faster you can locate the cause faster, and you can fix it faster. This improves your software's quality. Continuous delivery also thrives in a trunk-based development environment. Successful continuous delivery hinges on the ability to have a codebase that is always in a deployable state. The trunk-based development approach ensures this by promoting frequent commits, frequent integrations, and tests on all of these integrations. The small number of changes being introduced at any one time makes the software easier to deploy and test. In contrast, implementing effective CI/CD can be more complex and time-consuming with the branching model. While it's tempting to think, "Well, I run my build and all my tests on my branch," you're not actually integrating every time you commit. It's at merge (or rebase) time that you start to see any integration issues. All those tests you were running on your branch "in CI" were not testing any kind of integration at all. Merging and testing code from different branches can introduce delays and potential errors, which takes away some of the benefits of having a build pipeline in the first place. 5. Reduced Technical Debt Long-lived branches often lead to 'merge hell' where the differences between one branch (like 'main') and another (for example, your feature branch) are so great that merging becomes a nightmare. This can result in technical debt, as you may resort to quick fixes to resolve merge conflicts or accept suggestions from your IDE that resolve the merge but that you don't fully understand. With trunk-based development, frequent merges, and smaller changes make it easier to manage and reduce the build-up of technical debt. In Conclusion I personally think trunk-based development has clear advantages, and I have experienced them first-hand working in teams that have adopted this approach. However, it requires a mindset and a culture within the development team. You need to frequently merge other developers' changes into your own code. You need to commit small changes, frequently, which requires you to only change small sections of the code and make incremental changes, something which can be a difficult habit to get used to. Pair programming, comprehensive automated testing, and maybe code reviews are key practices to helping all the team to adopt the same approach and culture. Trunk-based development, done in a disciplined way, streamlines the development process, enhances team collaboration, improves code stability, supports efficient CI/CD practices, and may result in less technical debt. While it may be challenging to adapt to this approach if you've been working with a branch-based model, the long-term benefits are worthwhile. If you want to move to work this way, you may also want to read Dave's article addressing barriers to trunk-based development. More

Refcard #008

Design Patterns

By Jason McDonald

Introduction to Kanban Methodology

By Aditya Bhuyan

Scrum Master Success Indicators

By Stefan Wolpers CORE

The Magic of Compound Efficiencies in Engineering

As the milestone book Atomic Habits laid out, the key to life-changing habits is adopting one effectively and then layering another desirable habit on top of it. The same is true for efficiencies in software engineering. When your team adopts one efficiency, sees it bear fruit, then adds the next efficiency habit on top of it, the result is compounding efficiencies. In this conversation, LinearB’s CTO, Yishai Beeri, reveals the data on compound efficiencies as experienced by real dev teams out in the wild. “There are dev orgs that just visit their metrics dashboard every three weeks. That's nice, but that's not enough. That's not going to change behaviors that improve efficiency.” Episode Highlights (2:53) Sourcing the data (5:57) Visibility for devs & managers (12:15) Improving code reviews (19:30) What are compounding efficiencies? (21:48) Streamlining the PR process (25:52) Results from efficiencies gained (33:40) Giving devs back more focus time (40:45) How to get compounding efficiencies for your team Episode Excerpt Conor Bronsdon: There's a great analogy for how you are improving the efficiency for Dev teams and something we can all understand food. If you ask a nutritionist, what is most effective as a diet for losing weight, they might tell you that all diets are effective in helping you lose weight because they give you visibility into what you're eating and they make you think about it and how much you should be eating. By being conscious of what you're eating and knowing the right amount of food to consume for your activity level, you'll start to move towards your natural balance, and you'll see those opportunities for improvement. And I know you've said that this can apply to engineering as well. Can you tell me a bit more about that? Yishai Beeri: A lot of the change that we wanna create, and this could be about our personal lives or about the Dev process, a lot of it is down to human behavior. So changing human behavior is about changing habits. You mentioned the food analogy, like being aware of what I'm eating or not eating. That's a behavior change. Not like just the awareness, just being, not eating automatically, emotionally eating whatever, or tons of reasons why. Being aware helps me control and helps me improve my behavior. The same thing happens with development and Dev teams and Dev processes by being aware of the productivity, the inefficiencies, the bottlenecks, by talking about them, by measuring them, by surfacing them, by making them important, and part of what we are striving to improve. That alone starts to move the needle because if my focus is only on writing the code and I'm not aware of those inefficiencies, those wait times, those idle times, then I may be able to create a great code, but at a level of efficiency, that is not optimal. If I'm starting to think about and live, the pain and the, give it a name, give it a number like the contact switches, everything that is blocking the team from moving faster. Now I can begin to focus and change behaviors around it. Awareness is the first level, the first layer, of behavior change, and that begins to move the needle.

By Conor Bronsdon

The Unified Principles of Methodologies

In the realm of project management and software development, various methodologies have emerged, including Six Sigma (Define, Measure, Analyze, Improve, Control — DMAIC), Agile, FMEA, Scrum, Kanban, Extreme Programming (XP), and Feature-Driven Development (FDD). While proponents of each methodology often emphasize their uniqueness, a closer examination reveals that these methodologies share underlying principles and that the perceived disparities are often mere illusions. This article aims to explore the philosophical foundations that bind these methodologies, demonstrating that their variations serve primarily to perpetuate the belief in novelty. By uncovering their common principles, we can foster a deeper understanding and promote collaboration across these approaches. The Pursuit of Excellence: The Telos of Methodologies At the heart of all these methodologies lies the pursuit of excellence. Ancient philosophers such as Plato and Aristotle contemplated the notion of telos, the inherent purpose or goal of a thing. Similarly, these methodologies seek to improve processes, optimize outcomes, and achieve excellence in their respective domains. They share the recognition that constant refinement and the pursuit of perfection are crucial for success. Epistemology: The Foundation of Knowledge Epistemology, the branch of philosophy concerned with the nature of knowledge, plays a vital role in these methodologies. They acknowledge that knowledge forms the bedrock upon which improvement is built. From John Locke's empiricism to Immanuel Kant's transcendental idealism, philosophers have explored different avenues to understand how we acquire knowledge and apply it to the world. Similarly, methodologies like Six Sigma (DMAIC), Agile, FMEA, Scrum, Kanban, XP, and FDD emphasize the value of data-driven decision-making, continuous learning, and leveraging insights from experience. They recognize that knowledge is dynamic and ever-evolving, driving progress and innovation. Ontology: Embracing Change and Uncertainty Metaphysics and ontology delve into the nature of reality, change, and existence. Ancient philosopher Heraclitus contended that change is constant and central to the nature of the universe. These methodologies align with this perspective, recognizing the inherent complexity and dynamism of projects and software development. Agile methodologies, such as Scrum, XP, and FDD, embrace change and uncertainty, advocating for iterative development, adaptive planning, and responsiveness to evolving requirements. They understand the importance of being adaptable and flexible in a swiftly changing environment. Ethics: Empowering People and Generating Worth The moral principles and the pursuit of value are central to these approaches. Renowned thinkers like Aristotle and John Stuart Mill have delved into ethics, underscoring the significance of virtuous behavior and the well-being of individuals. In a similar vein, these approaches place great emphasis on empowering people and fostering value creation. Agile methodologies prioritize the formation of self-managed teams, collaboration, and valuing individuals above strict processes and tools. They create an atmosphere that fosters respect, transparency, and collective responsibility. In this way, these approaches align with ethical principles and foster the growth of individuals and organizations. The Illusion of Methodological Difference The perceived differences among these methodologies often stem from superficial variations in terminology, practices, or contextual nuances. Some proponents may present their methodology as a groundbreaking revelation, perpetuating the illusion of distinctiveness. However, a comprehensive examination reveals that these methodologies are built upon shared principles and ideas. For example, Lean Six Sigma (DMAIC) incorporates principles from Six Sigma (DMAIC) and Lean manufacturing, recognizing the symbiotic relationship between process efficiency and quality improvement. Similarly, Crystal, Kanban, and Dynamic Systems Development Method (DSDM) share core principles with Agile methodologies, focusing on iterative development, collaboration, and delivering value to stakeholders. Image 1: The Six Sigma Cycle Image 2: The Agile Cycle Image 3: The Continuous Improvement Cycle Waterfall, Spiral, and Everything in Between In the realm of software product development, agile methodologies have emerged as a significant trend. As software engineering gained attention from academia and industry experts, a compilation of practices known as waterfall methodology became the initial focus of discussion and evangelization. While the predominant discourse in software development process debates often centers around "waterfall vs. agile," it is important to recognize that these are not the only two methodologies that have existed. Although the concept of waterfall methodology did exist, it was not known by that term at its inception. The term "waterfall" emerged when it became evident that uncertainties and errors in early stages exponentially multiplied as subsequent phases progressed. Subsequently, the software development community introduced the concept of the "spiral" model, which allowed for slight iterations and adjustments in previous phases to accommodate changes in the development process. Another notable approach was the Rational Unified Process (RUP), which underwent several iterations aimed at incorporating agile principles. As an active participant in numerous panel discussions analyzing the merits and drawbacks of each approach, the consistent consensus reached was always focused on delivering maximum value to the customer and acknowledging the inevitability of change, requiring organizations to adapt accordingly. While agile methodologies have gained considerable attention in software development, it is crucial to acknowledge the historical progression from waterfall to spiral models, as well as the evolution of approaches like RUP. The fundamental objective remains consistent across methodologies: delivering exceptional value to customers and embracing the need for flexibility in the face of evolving requirements. Image 4: The Waterfall Model Image 5: The spiral model Image 6: The Rational Unified Process Model Consider the Source and Maintain Your Objective In the film "Devil's Advocate," during the climactic scene, Al Pacino, portraying the Devil, attempts to persuade Keanu Reeves' character that he, as the Devil, is the one to be followed, not God. Reeves responds with the phrase, "In the Bible, you lose," to which the Devil retorts, "Consider the source, son." There are undoubtedly circumstances in which strict adherence to a methodology is paramount. One such situation arises when a client imposes specific requirements. Others may include pursuing certification to enhance one's resume or fulfilling the prerequisite for participating in a bid process. The range of situations is extensive. Integration partners, service providers, certification courses, and consultant companies often emphasize the importance of methodology adherence. However, it is essential not to unquestioningly accept such assertions. Instead, consider the credibility and expertise of those offering advice — the question of whether compliance is genuinely necessary. Seek the guidance of trusted individuals with sound judgment and market experience. If determined to be necessary, then proceed accordingly. Nevertheless, when the primary objective is to establish an effective team or streamline processes to consistently deliver value to your product and customers, look no further than the guiding principles. Remain steadfast in adhering to these principles. In the realm of software development, this entails embracing the Agile Manifesto. Within the Agile Manifesto lie the essential principles required to establish an efficient process that consistently delivers tangible value. Conclusion In conclusion, despite claims of novelty and distinctiveness, methodologies such as Six Sigma (DMAIC), Agile, FMEA, Scrum, Kanban, XP, FDD, and their counterparts share common ground. It is unnecessary to be preoccupied with adhering to any specific methodology, regardless of how enticing or fashionable it may appear. What truly matters is aligning with and adhering to the fundamental principles. By doing so, you can ensure your path to success, as ultimately, these principles converge and lead to similar outcomes. References: Aristotle. (2009). Nicomachean Ethics. Penguin Classics. Title: "The Lean Six Sigma Pocket Toolbook: A Quick Reference Guide to Nearly 100 Tools for Improving Quality and Speed" Author: Michael L. George, John Maxey, David T. Rowlands, Mark Price Title: "The Agile Manifesto" Authors: Mike Beedle, Arie van Bennekum, Alistair Cockburn, Ward Cunningham, Martin Fowler, im Highsmith, Andrew Hunt, Ron Jeffries, Jon Kern, Brian Marick, Robert C. Martin, Ken Schwaber, Jeff Sutherland, Dave Thomas Title: "The Agile Samurai" Authors: Jonathan Rasmusson Title: "Agile Experience Design: A Digital Designer's Guide to Agile, Lean, and Continuous (Voices That Matter)" Authors: Lindsay Ratcliffe, Marc McNeill Title: "Agile Project Management with Scrum" Author: Ken Schwaber Title: "The Philosophy Book: Big Ideas Simply Explained" Author: DK Title: "The Ethics of Aristotle" Author: Aristotle (Translated by J.A.K. Thomson) Title: "Utilitarianism: On Liberty, Essay on Bentham" Author: John Stuart Mill

By Eduardo Mylonas da Silva

Part 3 of My OCP Journey: Practical Tips and Examples

This is the third and final post about my OCP-17 preparation. In part one, I explained how playing a human virtual machine and refreshing your mastery of arcane constructs is not pointless, even if the OCP doesn’t — and doesn’t claim to — make you a competent developer. In the second part, I showed you how intrinsic motivation keeps itself going without carrots or sticks, provided you can find ways to make your practice fun and memorable. It's time to share some of these examples and tips. Make It Quality Time But first some advice about logistics and time management. As with physical exercise, short and frequent trumps long and sporadic. It’s more effective and more likely to become a habit, like brushing your teeth. Choose the time of day when you are most energetic and productive. The early morning works best for me because I’m a morning person. And there is a satisfaction in getting the daily dose crossed off your to-do list, even when it doesn’t feel like a chore. Make a good balance between reading, practicing, and revising. Once you’ve worked through the entire textbook you will need to refresh much of the first few chapters. That’s okay. Keep revising them, doing a few questions from each chapter each day. You’ll get there slowly, but surely. Make It Practical and Productive Practice in the previous paragraph means writing novel code aimed to teach yourself a certain language construct. It’s about your productivity, so copying snippets from the book doesn’t count. If you’ve ever learned a foreign language the old-fashioned way, you will agree that cramming vocabulary and grammar rules does little for your oral skills. Only speaking can make you fluent, preferably among native speakers. It’s like swimming or playing the saxophone: you can’t learn it from a book. Never used the NIO2 API or primitive streams? Never done a comparison or binary search of arrays? Get your feet wet, preferably with autocomplete turned off. Better even, scribble in a plaintext editor and paste it into your IDE when you’re done. Understand the Why While Java shows its age, its evolution is managed carefully so new additions don’t feel as if they were haphazardly tacked on. Decisions take long to mature and were made for a reason. When the book doesn’t explain the reasoning behind a certain API peculiarity, try to explain it to yourself instead of parroting a rule. Here’s a case in point from the concurrency API. The submit() method of an executor has two overloaded versions for a Runnable or Callable argument. It returns a Future. The void execute() method only takes a Runnable, not a Callable. Why does that make good sense? Well, a Callable yields a value and can throw an Exception. Since execute() acts in a fire-and-forget fashion, the result of a Callable would be inaccessible, so it’s not supported. Conversely, submitting a Runnable with a void result is fine. Its Future returns null. The memory athletes from my previous post, who memorized random stacks of cards, have it much tougher than you and I. Learning Java is about memorizing a lot of facts, but they’re not random. Making a Visual Story The ancient Greeks taught us how to construct mental memory palaces to store random facts for easy retrieval. Joshua Foer added a moonwalking Albert Einstein to jog his memory. You should make your code samples equally fun and memorable. Here’s how to illustrate the fundamental differences between an ArrayList and a LinkedList. Imagine a movie theater with a fixed number of seats (the ArrayList) and a line of patrons (the LinkedList) at the ticket booth, who receive a numbered ticket. People arrive (offer(..) or add(..)) at the tail of the queue irregularly while every ten seconds the first person in the queue can enter the theater (poll(), element()) and is shown to their seat (seats.set(number, patron). Let’s add concurrency to the mix. Suppose there are two ticket booths, each with its own line, and a central ticket dispenser that increments a number. That’s right: getAndIncrement() in AtomicInteger to the rescue. I’d happily show you the code, but that wouldn’t teach you much. Or take access rights in class hierarchies. Subtypes may not impose stricter access rights or declare broader or new checked exceptions. Let’s put it less academically. Imagine a high rise with multiple company offices (classes) and several floors (packages). Private access is limited to employees of one company. Package access extends to offices on the same floor. Public access is everybody: other floors as well as external visitors. The proprietor provides a public bathroom that clearly shows when it’s occupied. You can dress it up with scented towels and music through a subclass, but you must obey this contract: public void visitRestRoom(Person p) throws OccupiedException { .. } Every outside visitor is welcome to use it. You are not allowed to restrict access to only employees on your floor (package access), much less your own employees (private access). Neither may you bother visitors with a PaymentExpectedException. It violates the contract. Code samples in the exam are meant to confuse you. Your own examples should do the exact opposite. You use real-life examples (a public office restroom, the queue outside a movie theater) and combine them in a way that is easy to visualize and fun to remember. Mnemonics Sometimes there’s nothing for it but to commit stuff to memory, like the types you can use as a switch variable (byte, int, char, short, String, enum, var). You can string them together in a mnemonic like this one: In one short intense soundbite, the stringy character enumerated the seven variables for switch. Or how about the methods that operate on the front of a queue (element, push, peek, pop, poll, and remove)? Elmer pushed to the front of the queue to get a peek at the pop star, but he was pulled out and removed. Yes, it’s far-fetched, silly, and outlandish. That’s what makes them memorable. To me at least. Or try your hand at light verse. The educational benefit may not be as strong for you as a reader, but the time I spent crafting it made sure I won’t quickly confuse Comparable and Comparator again. The Incomparable Sonnet You implement Comparable to sort(in java.lang: no reason to import).CompareTo runs through items with the aimto see if they are different or the same. If it returns a positive, it meantthat this was greater than the argument.For smaller ones a minus is supplied,a zero means the same, or “can’t decide”. Comparator looks similar, but bearin mind its logic is more self-contained.It has a single method called comparewhere difference of two args is ascertained.A range of default methods supplementyour lambdas. Chain them to your heart’s content! Some Closing Thoughts The aim of your practice is not to pass the exam as quickly as possible (or at all). It’s to become a more competent developer and have fun studying. I mentioned that there is some merit in playing human compiler, but that doesn’t mean that I fully agree with the OCP’s line of questioning in its current form and the emphasis on API details. Being able to write code from scratch with only your limited memory to save you is not a must-have skill for a developer in the coming decade. She will need to acquire new skills to counter the relentless progress of AI in the field. If I needed to assess you as a new joiner to our team, and you showed me a 90% OCP passing grade, I’d be seriously impressed and a little jealous, but I will still not be convinced that you’re a great developer until I see some of your work. You could still be terrible. And you can be a competent programmer and fail the exam. That’s where the OCP is so different from, say, a driving test. If you’re a bad driver you should not get a license, no exceptions. And if you fail the test, you’re not a great driver. Full disclosure: it took me four tries. If the original C language was a portable toolbox and Java 1.1 a toolshed, then Java 17 SE is a warehouse with many advanced power tools. The great thing is that you don’t have to wonder what all the buttons do. The instructions are clearly printed on the tools themselves through autocomplete and Javadoc. It makes sense to know what tools the warehouse stocks and when you should use them. But learning the instructions by heart? I can think of a better use of my time, energy, and memory.

By Jasper Sprengers CORE

Navigating Digital Assurance With a Scrum Master: Maximizing Quality in Agile Projects

The scrum master simplifies the scrum process, which includes assisting the team in expanding and improving their task management. They are responsible for providing emerging Scrum team members with real leadership and coordinating them to deliver high-quality products on time. So companies that are keen on providing digital assurance services can help their digital teams look into such processes. An Increase in Scrum Masters Working in Digital Assurance During the pandemic, every sector moved into a digital environment. During this phase, the scrum master played a vital role and delivered a high-quality product on time to the customer. Here we will see what challenges a scrum master faces in the world of digital assurance and how they bring out a quick fix to each day-to-day incident. Key Responsibilities of the Scrum Masters in Digital Assurance Facilitating communication: The Scrum Master acts as a mediator between the development team, product team, and digital assurance team, ensuring that all sides have a clear understanding of each other's needs and goals. This involves facilitating regular meetings between the three teams and ensuring that information is shared effectively. Encouraging collaboration: In an Agile environment, collaboration is key. The Scrum Master should encourage the development team and the digital assurance team to work together throughout the project, sharing knowledge and insights to ensure that the software meets quality standards. Managing the testing process: The Scrum Master is responsible for managing the testing process, including ensuring that tests are conducted in a timely and effective manner. They should also be aware of any issues or roadblocks that arise during testing and work with the development team to address them. Ensuring compliance: Digital assurance is essential for ensuring compliance with industry standards and regulations. The Scrum Master should work closely with the digital assurance team to ensure that all necessary compliance requirements are met. Continuously improving processes: Finally, the Scrum Master should work to continuously improve the digital assurance process, identifying areas for improvement and implementing new strategies and technologies to enhance quality and efficiency. Real-Life Challenges Faced by Scrum Masters in Digital Assurance Introducing Scrum to a Team When SCRUM is introduced to a new team, they face many challenges in adapting to the process. Individual members of a team will be in a different mindset; in this case, the scrum master will guide them and provide a proper roadmap. For instance, setting up meetings, understanding the procedure, assigning tasks to the team, and appreciating each other's work. Scrum Process can take an example from an old process called Shu Ha Ri, which is a Japanese martial art. Shu: This is the first phase of the agile process when a team is being formed or switching from waterfall to agile. Ha: They continue observing team ways of working. They would give clues and occasionally give advice, but they would mostly let the team self-evolve. Ri: The team is self-organized and self-evolving. If the team needs his or her advice, they will come to him, but SM is no longer observing the team. To better comprehend the steps, let's use the football team Gryffindor as a real-world example. Each member of a team will bring distinct skill sets and ways of thinking to the table. If a team is dispatched on the pitch without a coach, the team loses the game because there is an imbalance in the management of skills. Let's introduce a coach who serves as the scrum master in order to solve the aforementioned issue. a coach who is knowledgeable of each skill and directs players appropriately to achieve the team's objectives. Scrum Ceremony: Every member of the team will face questions and numerous obstacles when the aim is set. During this stage, some team members will work longer hours than others, which causes an uneven distribution of workloads. Sometimes teams are overworked, which leaves more tasks incomplete. A scrum process will have a list of meeting activities in the order in which they are to be taken up in order to avoid a gap between the team as a result of this. It must schedule daily and weekly general meetings (ex. Sprint Call). In my experience, there was a Digital Assurance issue. Since the crew interacted very little with one another, many of the problems were not appropriately tackled. Here, the scrum master encouraged everyone to ask questions and acquire confidence. People will eventually be able to conquer all of their obstacles and fears. They dealt with all the problems. Team Adapts to the Scrum (Mature Team) At this stage, the team offers suggestions for various implementation strategies. Transformation is more of a culture shift that involves well-balanced, unconventional thinking as well as delivery techniques. In order to achieve high-quality delivery, the scrum master gathers all of the ideas and works as a team to identify the finest ones. In Digital Assurance, a real-world Team adds innovative concepts or user-friendly elements to the product. This assists the client in putting their ideas into practice and bringing them into the digital realm. As a result, the application is as current and open as feasible. General Issues Faced by a Mature Team The team has general issues regarding the product, who shall do what work, etc. Sometimes there is a greater chance of bigger conflict in the team, which causes a lot of misunderstanding. In reference to the aforementioned problems, the expression "Too many cooks spoil the broth" is a perfect illustration. Because everyone is aware of the process at this point, they all attempt to assign tasks and overwhelm one another. To avoid controversy, the scrum master steps in and allocates the assignment to the team in accordance with their best judgment. Being productive in this situation is a challenge for the scrum master. Despite wanting to safeguard them, the team makes blunders. In conclusion, the Scrum Master is a key player in Agile development projects' digital assurance. He or she is a person who takes part in each stage of the process's execution. They are accountable for managing the testing process, guaranteeing compliance, and continuously improving processes, as well as promoting communication and collaboration between the development and digital assurance teams. Scrum masters can assure the success of Agile projects and the delivery of high-quality software by utilizing their experience.

By Zeppelin Bram

Tech Hiring: Trends, Predictions, and Strategies for Success

The tech industry has seen a significant change in the skills, qualifications, and titles listed in job postings over the past few years. What does that mean for companies — and for the candidates themselves? On this week’s episode of Dev Interrupted, we talk to Maryam Jahanshahi, co-founder and Head of R&D at Datapeople, who breaks down the biggest hiring trends in tech, from title inflation to salary transparency and the skyrocketing costs of recruitment. Maryam also discusses how the storytelling skills she picked up from data analysis have improved her abilities as a founder. Episode Highlights: (2:05) Introductions (6:13) Title inflation trend (11:00) Hiring trends: salary transparency (16:11) Bringing data to the recruiting process (21:55) How Datapeople is leveraging ML (27:30) AI job trends (31:30) The importance of storytelling (35:42) Maryam's advice for founders Episode Excerpt Conor Bronsdon: We are back on Dev Interrupted, I'm your host, Conor Bronsdon, and we're live from New York with another incredible guest. Welcome to the show, Miriam Jahanshahi. Maryam Jahanshahi: Thanks, Conor. I'm excited to be here. Conor Bronsdon: And I really love that you're here because we don't talk to data scientists that much, and you are not only the head of R&D and a data scientist at Datapeople, you are also a co-founder of that company. Maryam Jahanshahi: I am a co-founder. I also work very strongly with engineers, so I'm always up in their code and in their pull requests and all the fun side of things. I guess that's one of the things when you get to be a technical co-founder, you have to run the gamut of all the different things that you do. Conor Bronsdon: So yeah, it was fun talking to you as we were getting set up. And you mentioned you had this opposite journey where you really dove into the data side that were becoming like this strong data scientist, and then you realized you wanted to add these data engineering skills to the table. Maryam Jahanshahi: Yeah, it was a, it's an unusual experience. I think part of the reason why I had to do it was to figure out the systems that we needed to analyze data to get a data-driven product, and so my role now is such a weird mishmash. I was talking to my co-founder about it the other day. I'm neither like, nor do I run engineering, nor do I run like the data side of things. But it's almost a technical product manager. Conor Bronsdon: You're the fusion between the two of them. Maryam Jahanshahi: It's like a weird mix of many different things, and so we're realizing that requires a certain level of skills and different types of agility, and so it was easier for me to actually write my data pipelines then have them write the spec to give it to the engineers to do. It's, yeah, this isn't so bad. So we realized very early on, like with these systems you, I think increasingly as the tools as our data becomes bigger, we're gonna have new classes of product managers, including data-informed product management. I'm sure things like ChatGPT are bringing that to the fore. But it's not just that; it's anything that adds a level of analytics to your dashboards and things like that. You want someone who has a business interest but also is able to run the SQL query to figure out what the hell went wrong with that dashboard. And so it's an interesting transition that. I don't know whether I'm crazy for making it, but it's what the organization needs. Read the full episode transcript here.

By Conor Bronsdon

The SPACE Framework for Developer Productivity

Welcome to SPACE Developer productivity is a complex subject for which there is no magic bullet. However, economic pressure, increased market competition, and shorter delivery circles force many organizations to improve their efficiency and open up new models of operations. Measuring, maintaining, and eventually improving engineering productivity in an increasingly hybrid workplace are important discussions many organizations are having right now. As a result there are more and more companies investigating how to do more with the resources they have, how to remove bottlenecks in their processes and how to enable developers to be productive. Empirical evidence and understanding of productivity drivers are forming at the same time as some myths and misconceptions are getting debunked. One of the approaches that got a lot of attention is the SPACE framework. We give the background on it and explain some of its key concepts. Moreover, we give some additional examples of the application of SPACE in your organization. Introduction to the SPACE Framework “The SPACE of Developer Productivity” is a framework by authors from GitHub, the University of Victoria, and Microsoft that has gained attention because of its practical and multi-faceted approach. The authors debunk common productivity myths and misconceptions and then present drivers for developer productivity. They present those drivers structured as a holistic multi-dimensional model. Moreover, the authors show some example productivity metrics as well as counter-indicators. For convenience, we present a summary of the SPACE framework and give some assistance on how to utilize increasingly popular engineering intelligence to kick-start your SPACE tracking and reporting. Developer Productivity: Myths and Misconceptions Some misconceptions and myths are clarified by the authors of the SPACE Framework early on: One obvious myth is that (developer) productivity is not a one-dimensional metric. There is no single number that defines productivity and any approach that is too simplistic will provide little insights. Not only that but different organizations and different teams might be best served with a different set of KPIs to focus on. We will go into this a bit later. Software development is a team sport. As such, individual metrics are less relevant; in fact, can be counter-productive. What matters is the performance of teams or the organization as a whole. Also, we might add, not absolute numbers are essential but trends and the observability of countersignals highlighting problems to address. Outcomes are more important than output. While harder to quantify, the ability to ship customer-relevant features is obviously more important than just churning out code. As a result, pure activity metrics are not sufficient to make good productivity estimates. Having said all this, it is worth highlighting the value of measuring nonetheless: A good set of measurements can give insights into how the organization is performing, how it is trending, and which areas can improve. Moreover, key indicators are not only valuable to management, but if used correctly give a voice to engineering at the same time. Developers like to have evidence to show the value they bring to their teams and the organization. People generally like to show their worth and like to improve processes and themselves where they can. Having that evidence at your fingertips helps to improve self-worth and in turn, improve organizational productivity. The SPACE Framework Explained SPACE stands for Satisfaction, Performance, Activity, Communication, and Efficiency and reflects the multi-dimensional approach proposed by the creators. We summarize those in the following. The 5 SPACE dimensions are: Satisfaction and well-being: This dimension measures how satisfied development teams and individuals are with their tools, processes, and the work environment. For instance, are the right tools and resources in place to perform tasks efficiently? Are teams protected from overload or do individuals suffer from potential burnout? Is the management structure and environment supportive of growth and productivity? Performance: This relates to the actual outcomes created by teams and the absence of blockers to create those outcomes. For instance: What is the customer acceptance and satisfaction of features shipped? How did we improve on that over time? How did we improve our overall quality? Performance is closely related to organization and team performance and satisfaction to contribute to overall efficiency. Activity: As mentioned above, outcomes are preferable to output, but activity or output are often good proxy metrics that still enable some useful indicators – and especially counter indicators. This can be the release cadence, build system performance, or the number of incidences to manage. Simply speaking, do we get stuff done? And how did we improve over time? Communication and Collaboration: Effective collaboration and team cohesion has been shown as a significant contributing factor to developer productivity. For instance, brainstorming, collaboration, aligning on goals, and participating in outcomes improve productivity. Counter to these are scenarios where individuals work against each other, shift blame around, or feel abandoned by their management. Efficiency and Flow: It is one thing to produce valuable outcomes, but how efficient can we be in doing so? A key indicator is how much individuals and teams can be “in the flow” of performing their work; how much they can be free from blockers, interruptions or delays. The “flow” is something a lot of organizations are starting to pay attention to on a process or organizational level already. This shows up as a company’s Value Stream Metrics or DORA metrics. These are concepts that are often used to communicate with the executive team or even the board. Organizational Dimensions Lastly, the SPACE framework makes a distinction of where the productivity measures are taken and applied these are: Individuals: Helping individuals to feel more productive is important, but this is often best done by setting the right process, organizational, and team environments. As we have seen above, micro-managing individuals does not have the best impact and often does not produce the desired outcomes. Teams: Well-performing teams are at the core of well-performing organizations. Setting the right environment, context, and feedback loops for teams has been shown to improve productivity significantly. System: Improving processes, systems, and organizational metrics helps to improve the overall organization efficacy and deliver better outcomes to customers faster. These are high-level indicators that help to drive better performance across the board. This concludes the summary of key dimensions for the SPACE framework, but it is worthwhile to read the original article linked earlier. Next, we provide some examples augmented by our own experience that can easily be measured. SPACE Framework: Example Metrics Satisfaction For satisfaction and well-being, there are numerous ways to measure how things are going within the organizations. These can be explicit metrics such as: Results on NPS from formal or semi-formal surveys Quick emoji-like responses to survey emails, ticket support cases, or internal portal features: Retention rates of engineers and engineering managers These metrics require, however, dedicated effort and resources to implement, maintain and analyze/report on. While this is feasible in some organizations, this is often not the first step in moving to a SPACE process. Proxy metrics have been shown to be useful to equate with a certain level of satisfaction or rather the opposite, are indicators of frustration. Examples are: CI build failure rates and recovery times: Means; potential pain of waiting and uncertainty Code review cycles and review delays: The pain of context switching Bug numbers and issue fix times: The pain of customer dissatisfaction While proxy metrics are not suitable for more personalized sentiment analysis or sentiment around organization and management issues, they can highlight common triggers for frustration and dissatisfaction. Moreover, proxy metrics are often an easy starting point as they are a matter of mining existing data and do not require introducing new workflows or additional potentially distributing tasks. Performance The authors of the SPACE framework highlight a number of proxy metrics for performance around code reviews and related activities. This includes: Code review velocity/acceptance rates: How quickly/consistently are we delivering outcomes? Items shipped (epics/features/story points): How much do we get done? Reliability of infrastructure/product/build systems: Do we have infrastructure bottlenecks preventing us from performing? Again, while this does not necessarily give a complete picture all of the above metrics provide useful signals to gauge overall performance. Activity While performance measures the outcomes, activity is more focused on outputs. These are metrics that are typically easy to obtain, examples are: The number of code reviews completed Number of PRs done The number of issues/story points completed Time spent on development activities Deployment frequency Activity items are those that can often be accessed from data in your engineering tools and infrastructure. These metrics are especially useful when extracted continuously and automatically for reducing any friction and developer overhead, while at the same time being aggregated to teams or organization level for monitoring and trending. Communication/Collaboration Metrics in this category are a bit more open to interpretation and one needs to be careful when introducing any proxy metrics. While it is possible to detect negative signals, the converse is harder. A highly collaborative team is something that often cannot be determined by numbers alone and requires good personal management skills. Nonetheless, some proxy metrics that have shown to be beneficial are: Code review scores/number of reviewers/number of review cycles: Are reviews well distributed, include several active people per PR, and comment more than “LGTM”? Do numbers reflect a sense of collaboration and not blame shifting as it might be evident in long review cycles between the same people? PR cycle times: Are we efficient and work together well or are there any obvious blocking stages? Knowledge/review graphs: Is there a wider network of collaboration or do we have knowledge islands? Efficiency/Flow One of the key categories around developer productivity is the “flow” engineers are in, but also the flow enabled by supporting infrastructure and team processes. There are both positive and anti-signals that can be measured such as: PR velocity and trends Development cycle time Build times and reliability Blockers and delays in code reviews Aging of backlogs and ticket state changes Measuring flow, efficiencies, as well as blockers, is something that can be well approximated by hard data. Example Snapshot of SPACE Report in Logilica Summary Overall the SPACE framework introduces a multi-faceted approach to developer productivity. Looking both at some key dimensions of what productivity means, but also across individuals, teams, and the organization as a whole. Metrics to measure SPACE dimension can be direct or indirect through proxy data. The great thing is that many data points already exist in some shape or form in organizations and can be data mined. This can be done, for example, by in-house productivity engineering teams themselves or with the help of increasingly popular software engineering intelligence (SEI) platforms.

By Ralf Huuck

How Agile Works at Tesla [Video]

How Elon Musk Would Run Your Business With Joe Justice Joe Justice worked for Bill Gates, Jeff Bezos, and Elon Musk. In this hands-on Agile meetup, Joe shared DX, or Digital Transformation, the Agile operating system for TeslaSpeed — a term coined by the EU Commission to talk about how fast Tesla moves and how fast they need to move now. DX, or Digital Transformation Joe shared DX, or Digital Transformation, the agile operating system for TeslaSpeed — a term coined by the EU Commission to talk about how fast Tesla moves and how fast they need to move now. The 12-step DX process brings companies from where they are now toward their manifest destiny. Meet Joe Justice Joe Justice is a TED.com speaker and guest lecturer at MIT and Oxford University in England. He has been featured in Forbes five times to date, including as owner of a “Company to watch” by the Forbes Billionaire Club, cited in more than eight business paperbacks and hardcovers, and is the subject of a Discovery Channel documentary for his work creating the disciplines “Extreme Manufacturing,” “Agile Hardware,” and “The Justice Method.” Joe Justice founded WIKISPEED and operated Agile@Tesla from the global headquarters of Tesla‘s Fremont, California. Watch the recording of Joe Justice’s talk on How Elon Musk Would Run YOUR Business now: Q and A Session With Joe Justice During the Q and A session, Joe Justice answered the following questions from attendees: Who sets the rules for limiting autonomy? What patterns or principles did the Musk companies use to reduce the cost of change in hardware development? Regression testing and integration testing, I’d presume to be the first few mentioned, but what else? What is your view on Scrum or whatever framework at scale? Fundamentally, what considerations should you make once your organization can be expressed only in terms of the power of 3? What is your advice on bringing together teams who develop software components with other teams who develop hardware components of a common product (everything has to come together, but these teams have different working methods)? (Please note that the questions have been slightly edited to improve legibility.) Connect With Joe Justice Joe Justice’s Website

By Stefan Wolpers CORE

What to Pay Attention to as Automation Upends the Developer Experience

What a year to be a developer. As organizations rush to adopt more automated technologies driven by low code, generative AI, and other fast-moving innovations, developers accustomed to more traditional hardcoding practices will face increasing disruptions to set practices. But the transition will repay a willingness to change with significant dividends: developer automation promises superior efficiency, developer experience, and accelerated time-to-market with new application features and iterations. Capturing this automation opportunity will allow more developers to access powerful emerging technologies, eliminating requirements for specialized expertise and other cumbersome barriers of traditional manual coding. Backed by this new automation, any developer will be equipped to easily harness advanced AI, IoT, blockchain, big data, and other capabilities in their applications. Automation Efficiency and the Developer Experience Automation stands to transformatively improve developer experience, which will go a long way in winning wary developers over to the idea of accepting new practices. Leveraging low-code application development is fast becoming a standard route to automation for dev teams. With low-code development environments, developers leverage pre-packaged code modules, assembling them like building blocks within a drag-and-drop UI to build complete applications. With low code, developers can eliminate the tedious manual work common to traditional application development, replacing block-and-tackle tasks with a far more automated process. The full benefits of low code give developers a lot to like, including reduced frustration and errors, as well as greater speed, agility, and focus on the interesting feature development that developers get excited about. That superior automation-driven developer experience will give enterprises a leg up in recruiting and retaining developer talent as well, serving as a beacon to attract top prospects. Low-code automation can accelerate application development many times over compared to traditional methods (minimum 10x, in my experience, with the Interplay platform at Iterate.ai). That new efficiency frees developers to deliver application improvements more quickly, and to keep current with fast-shifting market needs. Thus, developer automation drives competitive differentiation on top of a superior developer experience. Low-code modules abstract capabilities derived from AI/ML, big data, IoT, voice, blockchain, and APIs, enabling developers with no specific hard-earned expertise to harness those technologies with plug-and-play simplicity. Currently, the most exciting development in automation technology is generative AI, which allows teams to expedite development processes, especially in tandem with low code. As an example, GitHub Copilot augments developer efforts with suggestions and assistance throughout the coding process and the ability to generate code automatically. Further generative AI tools like GPT-4, the conversation and search engine, can solve coding questions and deliver valuable developer training and support, even for developers leveraging low code. Developers' Roles in an Effective Automation Strategy Amid this positive talk about automation’s benefits, let’s be clear: the disruptive shift toward automation requires developers to accept significant changes. To unlock those productivity gains, developers must undergo their own transformations, building more valuable skillsets around automation to better use development and deployment tools, work with data and more. Enterprises will be required to evolve as well to keep pace competitively as automation sweeps development practices forward. In the scramble to complete digital transformations and harness new tools, hardware, data capabilities, and appropriate security, some organizations will thrive while others fall behind. That said, embracing automation and capitalizing on its competitive advantages and developer benefits is far, far better than the alternative of being left in the past. Approached strategically, developer automation ought to remove block-and-tackle application setup, error mitigation and other busywork from developers’ plates, instead enabling a focus on high-value work such as innovative feature development. An effective automation strategy should also anticipate rising data complexity, and meet it with investments in reliable data infrastructure and attention to data integrity. Robust data capabilities will serve as a strong foundation for AI/ML and related task automation peripheral to development, such as data entry and process handling. An automation strategy must also adapt to changing technologies and business priorities, to ensure continued access to valuable innovations and new processes. The right strategy will also enable developers to utilize automation and tooling across an expanding set of use cases and incorporate automation-driven features in their applications. Developer Automation and Data Accessibility Empowering developers to utilize automation within applications largely hinges on data access. Applications leveraging IoT, computer vision, and similar features must leverage vast quantities of data in real time. Achieving that data access means implementing infrastructure, organizational support and processes that enable efficient data collection, rapid growth, and scalability, and continuous optimization via feedback loops. Enterprises pursuing this data-driven application development have much to gain from low-code automation. Case in point: the world has only 60,000 trained data scientist engineers, and only 300,000 AI engineers. Organizations reliant on traditional hardcoding cannot achieve the competitive differentiating features they have their sights on, such as contextual responses to customer feedback, without first winning the competition to hire and retain these experts. In contrast, those that enlist low-code automation equip their existing development teams to fully harness advanced data, AI, and automation capabilities using abstracted code modules. The Automation-Backed Developer Slow and limit-bound traditional hardcoding is now overdue for replacement by automation that accelerates development to a rapid pace, transforms developer experience, and removes barriers to today’s most advanced technologies and data utilizations. Automation-backed developers will be expected to respond to market needs and deliver iterative application improvements and features at a rapid clip, and be fully empowered to do so.

By Shomron Jacob

Scaling Site Reliability Engineering (SRE) Teams the Right Way

Most SRE teams eventually reach a point in their existence where they appear unable to meet all the demands placed upon them. This is when these teams may need to scale. However, it’s important to understand that increasing team capacity is not the same as increasing the number of people on the team. Let’s unpack what scaling a team is all about, what are the indicators, what are steps you can take, and how you know if you’re done. Scaling Triggers Sometimes it is very easy to tell whether you need to scale your team or not. For example: When the team is assigned more services to manage, Traffic or users have significantly increased, or Service Level Objectives (SLOs) have become more demanding In the above situations, it is usually obvious that the team needs to scale. In other situations, the signs that you need to scale are more subtle and often ambiguous. Here are a few things that may be indicators that your team needs to scale: An increase in toil: A repetitive task that create no long-term value and need to be actively controlled. Automation, run books, and retrospectives all reduce toil. However, when a team is under pressure, it will have no slack to think about quality of life improvements like toil reduction. It will be constantly scrambling to maintain reliability and fulfil business objectives. A decrease in reliability or performance: Similar to toil, reliability and performance need to be actively managed. When teams are over stretched they often react to SLO breaches rather than proactively initiating performance or reliability projects. Improvement projects are delayed or canceled: Increase in toil, a decline in performance or reliability can be symptoms of a more general problem: neglecting long-term planning in favor of reacting to short-term issues. Another symptom of this is when any kind of improvement project is de-prioritized in favor of feature development. Decline in the team’s morale: People in teams that need to scale are usually overloaded, stressed, and close to burn out. This, in fact, is the number one reason to scale your team since losing people is among the most difficult problems to recover from. All of these indicators are not conclusive and can have other causes. You need to be sure that you are solving the correct problem. It can be very tempting to see manpower as a blanket solution for all problems, but it can worsen the problem and leave you with a trickier problem of scaling down. Adding people to your team should be the last thing you do after exhausting all other options. This is not only more prudent financially, but it also ensures that you are not ignoring problems that could become more difficult to address over time. When thinking about any technical initiative, it is useful to break it down using the People-Process-Tools model. This assumes that the most important factors that impact an initiative, in order of importance, are People, Processes, and Tools. Let’s look at them in chronological order. Process Before starting a scaling effort, you should know what metrics you are trying to improve and how you should be measuring them. It is an engineering axiom that you can’t optimize what you’re not measuring. The exact metrics to look at will vary from team to team and from situation to situation, but here are a few to start with: Actual performance against SLOs Project metrics: 80th percentile wait time 80th percentile cycle time Average daily queue size Mean time to acknowledge (MTTA) Once you are measuring your key metrics, institute a process to frequently evaluate your performance on those metrics. It might be as simple as taking a few minutes in every sprint retrospective for this purpose. Don’t underestimate the value of processes to help you scale. Many smaller teams often use simplistic, ad-hoc processes. Engineers often dismiss processes as undesirable overhead. This misses the raison d'etre of processes to reduce error and improve efficiency. Management Processes Toil Limits ensure that toil reduction tasks are prioritized. Postmortems identify measures to prevent the repetition of incidents. Agile methods like Kanban ensure management processes themselves are efficient. Reports like finger charts can help identify bottlenecks. Engineering Processes Alert Noise Reduction quietens noisy alerts and prioritizes them. This reduces the effort needed to manage incidents. Alert Routing ensures that only the appropriate people are notified about incidents. Automation reduces toil and errors. Pairing aids knowledge transfer and reduces errors. Infrastructure as Code improves repeatability and reduces errors. Tools The subject of SRE tools is pretty vast — too large for this article. So rather than going into a potentially lengthy discussion of specific tools, let's discuss how to think about tools in the context of scaling. Different kinds of tools have different kinds of scaling impacts. It is important to have hard data that indicates what kind of improvement is necessary. This data may be in your project management or trouble ticket system, but more often than not you will need to get feedback from your team. In general, there are a few kinds of results that you should expect from the tools that your team is using: Tools that Help You Handle More Load With the Same Team This could be anything from pssh to ansible that helps you handle large fleets of servers, VMs, or containers. Modern monitoring tools not only perform better at scale, but they are also often easier to configure too. Incident management tools like Squadcast prioritize and deduplicate incidents allowing engineers to focus on critical tasks. Tools that Reduce Rework by Reducing Errors Script libraries, runbooks, and runbook automation systems all facilitate task repeatability — allowing tasks to be executed reliably as frequently as needed. Using containers to implement immutable servers ensures that subtle errors caused by config drift are avoided. Tools that Eliminate Certain Kinds of Work Container orchestration systems like Kubernetes eliminate huge swathes of work — everything from setting up process supervisors like supervisors for managing load balancers. Distributed tracing systems like OpenTelementry reduce the need for complex log aggregation systems to track transactions through distributed systems. Tools that Help Delegate Work Tools like RunDeck allow secure, guard-railed, role-based access to scripts. This allows dependent teams like developers or customer support to work independently without adding to the SRE workload. Similarly, tools like Metabase, Kibana, and Grafana can be used to provide self-service access to production data, logs or metrics to product management, customer support, or Management. Providing senior management with the ability to answer their own questions is a particularly powerful way to reduce a lot of high-priority, low value-add effort. There Are No Silver Bullets Avoid the idea that tools are a panacea. Introducing new tools can be financially burdensome and disruptive. If introduced unwisely they can easily make your team worse off. This is why a clear cost-benefit analysis is necessary before investing in new tools. People Once you have exhausted all other options to increase your team’s capacity you then have to start adding people to your team. Capacity Planning Capacity planning is more an art than a science, requiring a combination of hard data and judgment calls. There is no sure-fire method to build the perfect capacity plan. But here are some tips: Use data about your existing load to make projections. This can be in ideal man hours or story points. Relate that to the services under management. You should be able to say something like, “Adding another microservice will add about 50 hours of project work per quarter” or “We currently have 80 story points of demand every sprint versus 60 points of capacity.” You have to be able to approximately quantify and reason about the current and projected loads. Factor in the relative productivity and cost of seniors vs. juniors. Juniors often take longer on tasks than seniors. Seniors often have other responsibilities like code reviews, mentoring, or interviews. As with load, you should be able to quantify and reason about capacity. High utilization, defined as the ratio of task hours to available working hours, is not a good measure of efficiency. Less slack time implies fewer creative hours for innovation and improvement. It’s also likely to lead to frustration and burnout. Try to plan for 30% slack. While it might be a good idea to plug all these numbers into a spreadsheet to make your projections, do not lose sight of the fact that these are only rough approximations of reality. Ensure that you are conservative in capacity projections and liberal in demand projections. Add buffers liberally. It’s always better to end up with slightly more capacity than you need than slightly less. Team Composition There are a couple of major factors to consider when planning the composition of your team: Experience: Balancing out the experience mix of your team requires a set of trade-offs. In general, we can bucket people into juniors, intermediates, and seniors. The definition of these buckets in terms of years of experience and capability will vary depending on your local labor market, tech stack, and business domain. Somebody with 10 years of experience managing Go microservices might be considered senior, but the similar experience on nuclear power station systems may be junior. Juniors are less expensive and less productive while seniors are the opposite. So why not staff completely with that happy medium — intermediates? This idea ignores the special value that both seniors and juniors add. Seniors’ experience allows them to quickly solve problems without reinventing the wheel and, more importantly, teach others while doing it. Juniors are future intermediates who don’t need to be un-trained on bad habits picked up elsewhere. The best compromise is to build your team around a core of intermediates, with a small number of juniors and seniors to round it off. A proportion of 20:60:20 of juniors, intermediates, and seniors might be a goal to strive for. Diversity: Even if you ignore the moral imperative to support groups that have historically been discriminated against, there are good operational reasons to seek diversity in your team. Multiple perspectives contribute to greater creativity and innovation. There’s also some anecdotal evidence that diverse teams are better behaved and more professional than the testosterone-fuelled boys clubs that non diverse teams can occasionally become. Culture Fit: “Cultural fit” has often been a tool of convenience to exclude those who don’t conform to a preconceived notion of what an engineer should be. In my book there is only one fundamental purpose of a cultural fit check and that is to exclude jerks. Nothing saps a team’s productivity like a negative individual who constantly creates petty conflicts or belittles team mates. It’s important to filter out jerks during the recruitment process itself and to get rid of them quickly if identified later. Don’t give high performing jerks a pass — their productivity rarely makes up for the drop in performance they create in the team. Candidate Sources Where can you hire from? One good way is to poach them from elsewhere in your company. They’re often a known quantity and usually much cheaper than external hires. Many traditional organizations have System Administration, Build or DevOps Teams that have people who would make good SREs. Software developers can bring engineering rigor to the team. Usually, though, internal hiring would just move the scaling problem to another team. The most effective candidate sourcing mechanisms vary from place to place but here are some important ones: Employee referrals Recruitment consultants Job boards Advertising Careers page on your website. In general, employee referrals are cheaper and have a better hit rate than all other mechanisms because they are pre-filtered by the employee. Ensure that you have rewards and incentives to encourage them. Increasing capacity via hiring is time consuming and fraught with uncertainty. Ideally, you should start months in advance of the projected growth. Unfortunately most of us don’t have that luxury, so it is critical that you have contingency plans in place to handle hiring delays. Conclusion Scaling SRE teams is a challenging exercise that requires extensive analysis and planning. Adding people is slow, expensive, and risky, so consider process or technology improvements to tide you over. When you start hiring it pays to use plan capacity requirements with data rather than gut instinct. Be thoughtful about the composition of your team, as it can be critical to long-term success.

By Biju Chacko

SRE vs. DevOps

This is a question that I hear on a fairly regular basis, not just internally but from external customers as well. So it’s one that I would like to help you walk through so that you can really figure out what makes sense in your organization, and I think the answer is probably going to surprise you a little bit. I think probably the most important thing to understand is this isn’t a versus question. You don’t have to have one or the other. As a matter of fact, I would argue, and I think that many people would agree, that SRE is actually an essential component of DevOps, and a good, properly implemented DevOps method leads to the necessity of SRE when it comes to deployment. So there are two sides to the same coin, so that will obviously lead to a little bit of confusion because DevOps is the development methodology; it’s all about integrating your development teams and your operations teams. It’s about knocking down those silos between them. It’s about ensuring that everybody is singing the same songbook, and that’s very important. SRE is in charge of automating all of the things and making sure that you never go down. Two sides of the same coin There are really two parts of the same group, so let’s look at the differences because they do have some differences. Probably the first and largest one is that when we think about our DevOps.The DevOps guys, particularly your developers, are doing the Core Development. They are answering the question, “What do we want to do?” they are working with product, they’re working with sales, they’re working with marketing to develop the design and deploy. What is it that we do? They’re working on the core. On the other hand, SRE is not working on the Core Development. What they are working on is the implementation of the core, they are working on the deployment, and they are constantly giving feedback back to that core development group to say, “Hey, something that you guys have designed isn’t working exactly the way that you think that it is” If you want to think about it this way DevOps is trying to develop. SRE is saying how we deploy and maintain and run to solve this problem. It’s theoretical versus practical. Ideally, they’re talking to each other every day because SRE should be logging defects; they should be logging tickets back with development. Still, probably most importantly, they need to understand that they have the same goals. These groups should never be aligned against one another. And so, they do have to have a common understanding. Let’s see about the most important part; we’re going to talk about failure because failure is not necessarily failure; it’s just a way of life. It doesn’t matter what you deploy. It doesn’t matter how well it goes; it will happen. There is a failure budget or an error budget where things will go wrong. SRE team, when it comes to failure, they’re going to anticipate it, they’re going to monitor it, they’re going to log it, they’re going to record everything, and ideally, they can identify a failure before it happens. They’re going to have predictive analytics that will say, “All right, this thing is going to go bad based on what we’ve seen before.” So, SRE is responsible for mitigating some of those failures through monitoring, logging, and doing the preemptive parts. So we’ll do the monitors, we’ll do the logs. SRE is also going to lead all of your post-actual failure incident management. They’re going to get you through the incident, to begin with, and then they’re going to hot wash it, and when it’s done, you have to get Dev online because these are the guys who are going to solve the core problem; some RCAs might be solved by SRE internally. Then SRE team will integrate the fix into their monitoring and their logging efforts to make sure that we don’t get into another RCA for the same kind of problem. There are different skill sets. Core development DevOps, these are the guys that really love writing software. SRE is a little bit more of an investigative mindset, right? You have to be willing to go and do that analysis, figure out what things have gone wrong, and automate everything. But there’s a lot that they have in common. Everyone should be writing automation; everyone should get rid of toil as much as possible because we just don’t have the time to do manual tasks. When we can put the computers in charge of it, computers are not great at thinking on their own, but if you need it to do the same thing repeatedly, you can’t beat computing for that. And so, automation is key; you have a slightly different mindset. DevOps is going to automate deployment; they’re going to automate tasks; they’re going to automate features. SRE will automate redundancy and manual tasks that they can turn into programmatic tasks to keep the stack up.

By Pradeep Gopalgowda