The final step in the SDLC, and arguably the most crucial, is the testing, deployment, and maintenance of development environments and applications. DZone's category for these SDLC stages serves as the pinnacle of application planning, design, and coding. The Zones in this category offer invaluable insights to help developers test, observe, deliver, deploy, and maintain their development and production environments.
In the SDLC, deployment is the final lever that must be pulled to make an application or system ready for use. Whether it's a bug fix or new release, the deployment phase is the culminating event to see how something works in production. This Zone covers resources on all developers’ deployment necessities, including configuration management, pull requests, version control, package managers, and more.
The cultural movement that is DevOps — which, in short, encourages close collaboration among developers, IT operations, and system admins — also encompasses a set of tools, techniques, and practices. As part of DevOps, the CI/CD process incorporates automation into the SDLC, allowing teams to integrate and deliver incremental changes iteratively and at a quicker pace. Together, these human- and technology-oriented elements enable smooth, fast, and quality software releases. This Zone is your go-to source on all things DevOps and CI/CD (end to end!).
A developer's work is never truly finished once a feature or change is deployed. There is always a need for constant maintenance to ensure that a product or application continues to run as it should and is configured to scale. This Zone focuses on all your maintenance must-haves — from ensuring that your infrastructure is set up to manage various loads and improving software and data quality to tackling incident management, quality assurance, and more.
Modern systems span numerous architectures and technologies and are becoming exponentially more modular, dynamic, and distributed in nature. These complexities also pose new challenges for developers and SRE teams that are charged with ensuring the availability, reliability, and successful performance of their systems and infrastructure. Here, you will find resources about the tools, skills, and practices to implement for a strategic, holistic approach to system-wide observability and application monitoring.
The Testing, Tools, and Frameworks Zone encapsulates one of the final stages of the SDLC as it ensures that your application and/or environment is ready for deployment. From walking you through the tools and frameworks tailored to your specific development needs to leveraging testing practices to evaluate and verify that your product or application does what it is required to do, this Zone covers everything you need to set yourself up for success.
Kubernetes in the Enterprise
In 2022, Kubernetes became a central component for containerized applications, and it is nowhere near its peak. In fact, based on our research, 94 percent of survey respondents believe that Kubernetes will be a bigger part of their system design over the next two to three years. With Kubernetes expected to become even more entrenched in systems, what do adoption and deployment methods look like compared to previous years? DZone's Kubernetes in the Enterprise Trend Report provides insights into how developers are leveraging Kubernetes in their organizations. It focuses on the evolution of Kubernetes beyond container orchestration, advancements in Kubernetes observability, Kubernetes in AI and ML, and more. Our goal for this Trend Report is to help inspire developers to leverage Kubernetes in their own organizations.
Getting Started With OpenTelemetry
Cloud-native technology has been changing the way payment services are architected. In 2020, I presented a series of insights from real implementations adopting open-source and cloud-native technology to modernize payment services. The series consisted of six articles and covered architectural diagrams from logical and schematic to detailed views of the various use cases uncovered. The architectures presented were based on open-source cloud-native technologies, such as containers, microservices, and a Kubernetes-based container platform. The major omission in that series was any discussion of cloud-native observability. This series fixes that omission with an open-source, standards-based, cloud-native observability platform that helps DevOps teams control the speed, scale, and complexity of a cloud-native world for their financial payments architecture. The introductory article (part one, linked in the "Series Overview" section at the conclusion of this post) covered the baseline architecture, defined the payments project, and laid out the plan for adding observability to the logical and physical architectures throughout this series. In this article, we'll explore the logical diagram that captures the elements of a successful payment solution. Background of Generic Architectures Before diving into the common elements, please understand that this is a collection of elements identified across multiple working implementations and generalized into a single architecture. The intent is to provide architectural guidance, not in-depth technical detail. The assumption is that you can figure out how to adapt it to your own architecture and are more than capable of slotting in the technologies and components you've committed to in the past where applicable. The goal here is to describe generic components and outline a few specific cases, enabling you to make the right choices when applying them to your own architecture. Feel free to comment or contact me directly with your feedback. In the article "Payments Architecture - Common Architecture Elements," I toured the generic architecture and outlined the common elements of payments architecture. In this article, I'll focus on the logical view of only the component layers where I'm adding cloud-native observability elements to the solution. Container Platform In this section, I want to focus on the new addition of cloud-native observability components in the container platform logical view. Modern financial organizations are not only modernizing payments offerings with cloud-native technologies and containers; they are also finding that cloud-native observability can be challenging for both their engineering and observability teams. They are not only building, testing, and deploying, but also contending with cloud-native complexity, data cardinality explosions, security, and more as they attempt to keep their solutions both running and cost-effective. Within this view of the container platform, you find an addition - a Chronosphere Collector element - that ensures your microservices, applications, APIs, integration points, caching, and other services are providing their metrics and tracing data to your observability platform.
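To give a feel for what such a standards-based collection layer can look like, here is a minimal sketch of a plain OpenTelemetry Collector configuration that scrapes Prometheus-format metrics, receives OTLP traces, and forwards both to an external backend. This is not a Chronosphere-specific configuration; the endpoint, scrape targets, job name, and API-token header are placeholders, and you should consult your vendor's documentation for the exact receiver and exporter settings.
YAML
receivers:
  otlp:                               # receives traces/metrics pushed by instrumented services
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
  prometheus:                         # scrapes Prometheus-format metrics from payment services
    config:
      scrape_configs:
        - job_name: payment-services  # placeholder job name
          scrape_interval: 30s
          static_configs:
            - targets: ["payments:8080", "calculations:8080"]  # placeholder targets

processors:
  batch: {}                           # batch telemetry to reduce outbound calls

exporters:
  otlphttp:
    endpoint: https://collector.example-observability.io       # placeholder external endpoint
    headers:
      api-token: ${API_TOKEN}         # placeholder credential, injected from the environment

service:
  pipelines:
    metrics:
      receivers: [otlp, prometheus]
      processors: [batch]
      exporters: [otlphttp]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]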
Using open-source standards and protocols from the CNCF projects Prometheus and OpenTelemetry, the Collector gathers telemetry and metrics data and routes them to an external Chronosphere observability platform. This makes for a very easy transition for organizations that might have started their cloud-native observability journey using open-source projects and standards. In upcoming articles in this series, I'll share the specific ways you can deploy and leverage the Chronosphere Collector element in a container environment. I'll also present several specific use cases and provide schematic diagrams that detail the physical architectures for those use cases. External Systems The elements found in the external systems capture the various regional or local needs for a payments solution. Many are not under the full hosted control of the financial organization, and this is where you find the SaaS solution for your observability needs. Many organizations start off their cloud-native observability journey with do-it-yourself (DIY) solutions that, over time and due to the payment solution's success, grow into a resource burden in management, infrastructure, and observability data complexity. The managed Chronosphere observability platform allows you to host your cloud-native observability needs, unplugging from your DIY infrastructure and redirecting your open-source, standards-based ingestion of metrics and telemetry data. Your engineering and observability teams keep using the same open-source tooling, query languages, and visualization that they are well versed in from their experience with CNCF projects such as Prometheus and OpenTelemetry. Using the Payments Project The architecture collection provides insights into all manner of use cases and industries researched between 2019 and 2022. Each architecture provides a collection of images for every diagram element, as well as the entire project as a whole, for you to make use of as you see fit. If we look at the financial payments project, you'll see a table of contents allowing you to jump directly to the topic or use case that interests you the most. You can also just scroll down through each section and explore at your leisure. Each section provides a short description covering what you see, why it's important to the specific payments topic listed, and a walkthrough of the diagram(s) presented. You can download any of the images to use in any way you like. At the bottom of the page, you will find a section titled Download Diagrams. If you click on the Open Diagrams link, it will open all the available logical, schematic, and detailed diagrams in the diagram tooling we used to create them. This makes them readily available for any modifications you might see fit to make for use in your own architectures, so feel free to use and modify them! Finally, there is a free online beginner's workshop focused on the diagram tooling; explore it to learn tips and tricks from the experts. Series Overview The articles in this o11y architecture series on adding cloud-native observability to your financial payments architecture are:
- Financial payments introduction
- Financial payments common observability elements (this article)
- Adding observability to immediate payments example
- Adding observability to financial calculations example
Next in this series: adding cloud-native observability to an immediate payments example.
In today's highly disruptive marketplace, organizations strive to deliver high-quality software quickly and securely. This drive has given rise to the Continuous Integration/Continuous Deployment (CI/CD) approach, which automates the process of building, testing, and deploying applications. CI/CD has become a critical component of modern software development practice, enabling teams to iterate rapidly, enhance collaboration, and reduce time to market. However, merely implementing a CI/CD pipeline is not enough. Organizations must optimize their workflows and processes to truly harness the power of CI/CD. This blog will explore various techniques and best practices for optimizing your CI/CD workflow, ensuring maximum efficiency and productivity. From automated testing to infrastructure as code, we will delve into the key strategies that can supercharge your CI/CD pipeline. By implementing these techniques, you can reduce errors, enhance scalability, and accelerate the delivery of your software projects. So, let's embark on this journey to uncover the best techniques for optimizing your CI/CD workflow and unlock the full potential of your development process. Cutting-Edge Techniques to Optimize Your CI/CD Workflow Continuous Integration/Continuous Deployment (CI/CD) is a software development practice that aims to automate the process of building, testing, and deploying applications. Driving efficiency in CI/CD involves optimizing and streamlining these processes to minimize errors, reduce time to market, and improve overall productivity. Here are some techniques to drive efficiency in CI/CD (a pipeline sketch at the end of this article illustrates several of them):
- Automated Testing: Implement a comprehensive suite of automated tests, including unit, integration, and end-to-end tests. These tests should be executed automatically during the CI/CD pipeline to identify and fix issues early in the development cycle.
- Parallel Processing: Break down the CI/CD pipeline into smaller, independent stages that can be executed in parallel. This approach reduces the overall pipeline execution time and allows multiple development teams to work concurrently without waiting for each other's changes.
- Infrastructure as Code (IaC): Use infrastructure automation tools such as Terraform or AWS CloudFormation to define and provision the required infrastructure resources for each environment. This ensures consistency and reproducibility across different stages of the pipeline.
- Continuous Monitoring: Implement monitoring and alerting mechanisms to track the performance and health of applications in real time. This helps identify and address issues promptly, reducing downtime and improving the overall quality of the deployed software.
- Incremental Deployment: Instead of deploying the entire application at once, consider deploying smaller, incremental changes. This approach reduces the risk of introducing bugs and makes it easier to pinpoint the cause of issues if they arise.
- Configuration Management: Utilize configuration management tools like Ansible or Puppet to manage and version control the application's configuration. This enables consistent and reproducible deployments across different environments.
- Deployment Pipelines: Set up multiple deployment pipelines tailored for different scenarios: development, testing, staging, and production. Each pipeline can have different levels of automation and strictness with respect to quality gates and approvals.
- Containerization and Orchestration: Utilize containerization technologies like Docker and container orchestration platforms like Kubernetes to create portable and scalable application deployments. Containers enable consistent environments and facilitate rapid deployment and scaling.
- Pull-Request Environments: Pull-request environments are temporary, isolated environments created for testing and reviewing code changes within the context of a pull request. They provide a way to deploy and run the modified code in an environment that closely resembles production.
- GitOps: GitOps automates the deployment process by continuously monitoring the Git repository for changes. When changes are detected, GitOps tools such as Flux or Argo CD automatically reconcile the desired state of the infrastructure with the current state. This eliminates the need for manual intervention and reduces the time and effort required for deployments.
- Multi-Service Environments: With multi-service environments, each service can have its own versioning and deployment process. Updates, bug fixes, or new features can be rolled out to specific services without affecting the entire system. Teams can release and deploy services independently, leading to faster iterations and the ability to deliver new functionality more frequently.
By implementing these techniques along with reliable and secure DevOps tools and services, you can significantly improve the efficiency, reliability, and speed of your CI/CD workflows, leading to more frequent and successful software releases. Concluding Remarks Optimizing your CI/CD (Continuous Integration/Continuous Deployment) workflow is essential for streamlining software development and delivery processes. By implementing the right techniques, you can significantly enhance the efficiency, quality, and speed of your software releases. Incorporating these techniques into your CI/CD workflow empowers your development team to deliver software more rapidly with higher quality and reliability. By automating CI/CD processes, enabling collaboration, and leveraging best practices, you can achieve faster time to market, improved customer satisfaction, and a competitive edge in today's highly competitive software development market. Continuous improvement and adaptation are key to sustaining an optimized CI/CD workflow, so stay open to embracing new tools, methodologies, and advancements in the field.
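As referenced above, here is a minimal, GitLab CI-style sketch that illustrates a few of these techniques together: parallel test jobs, environment-specific deployment jobs, and a manual approval acting as a simple quality gate. The stage names, job names, scripts, and branch name are illustrative assumptions rather than a drop-in pipeline for any particular project.
YAML
stages:
  - build
  - test
  - deploy

build-app:
  stage: build
  script: ./ci/build.sh              # placeholder build command

# The three test jobs share the same stage, so they run in parallel.
unit-tests:
  stage: test
  script: ./ci/run-unit-tests.sh
integration-tests:
  stage: test
  script: ./ci/run-integration-tests.sh
end-to-end-tests:
  stage: test
  script: ./ci/run-e2e-tests.sh

deploy-staging:
  stage: deploy
  script: ./ci/deploy.sh staging     # placeholder deploy script
  environment: staging

deploy-production:
  stage: deploy
  script: ./ci/deploy.sh production
  environment: production
  when: manual                       # manual approval acts as a quality gate
  only:
    - main                           # placeholder release branch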
In the rapidly evolving digital landscape, organizations face the challenge of delivering software applications and services swiftly and with impeccable quality. To meet this demand, companies embrace modern DevOps practices such as continuous integration, continuous delivery (CI/CD), and cloud-native architectures. IBM Cloud provides a robust set of services designed to bolster these practices, with the IBM Cloud Event Notifications service standing out as a powerful tool. In this blog, we will delve into the integration of Toolchain as a source with IBM Cloud Event Notifications. By harnessing this integration, you can supercharge your development and deployment processes. Discover how to streamline your workflows, enhance collaboration, and receive timely notifications to stay in sync with the progress of your pipelines. Join us on this journey and unlock the full potential of IBM Cloud Toolchain and Event Notifications for accelerated application delivery. What Are IBM Cloud Event Notifications? IBM Cloud Event Notifications is a fully managed event management service that provides a scalable and reliable way to publish, route, and consume events within the IBM Cloud ecosystem. It enables real-time, event-driven architectures, allowing applications and services to react and respond to events as they occur. With Event Notifications, you can build event-driven workflows, trigger actions, and integrate seamlessly with other IBM Cloud services. IBM Cloud Toolchain as a Source: IBM Cloud Toolchain is a set of tools, services, and practices provided by IBM Cloud; when connected to Event Notifications as a source, it allows you to get notified about the status of your pipelines. By integrating IBM Cloud Toolchain with IBM Cloud Event Notifications, you can receive notifications when your pipeline run starts, fails, gets canceled, or encounters errors. Additionally, you can trigger child pipelines based on specific events, enabling you to automate and streamline your pipeline workflows. This integration empowers you to stay informed about the progress of your pipelines and take proactive action when necessary. Integration Steps Step 1: Create an IBM Cloud Event Notifications Service Instance
1. Log in to your IBM Cloud account.
2. In the IBM Cloud catalog, search for Event Notifications and select it.
3. Select a Region from the list of supported regions and select a pricing plan.
4. Provide a Service name.
5. Select a resource group.
6. Click Create.
Step 2: Connect to Event Notifications in the CD Toolchain
1. In the Toolchain, click Add tool integration.
2. Search for Event Notifications and create the integration.
3. An IAM authorization is required between the toolchain and the Event Notifications service, so create one by selecting the toolchain as the source service and Event Notifications as the target service.
Step 3: Verify the CD Toolchain Source in IBM Cloud Event Notifications
1. Click the menu icon > Resource list.
2. Open Services and software.
3. Open the IBM Cloud Event Notifications instance you created.
4. Click Sources.
When you connect to Event Notifications in the Toolchain UI, a source is automatically added to your IBM Cloud Event Notifications Sources list. Step 4: Create an IBM Cloud Event Notifications Destination In this step, you will make sure that an email destination exists where notifications will be forwarded. Click Destinations. Notice in the Destinations list that, by default, there is an IBM Cloud Email service defined. You do not need to do anything else to configure an email destination.
Note: If you wanted to add a webhook as a destination, you would click Add and provide the appropriate information in the Add a destination panel. Step 5: Create an IBM Cloud Event Notifications Topic Next, you will define an IBM Cloud Event Notifications topic that will receive events when a Toolchain pipeline fails.
1. Click Topics.
2. Click Create. The Topic details panel opens.
3. In the Topic details, enter the Name of your topic — for example, PipelineFailed. For Source, select the IBM Cloud Event Notifications source, which is the Toolchain. Select an Event Type; for this tutorial, select Pipeline. Select an Event subtype; for this tutorial, select pipeline failures.
4. Click Add a condition. (If you do not click Add a condition before you click Create, the topic will be created with no conditions associated with it.)
5. Click Create. Your topic will be displayed in the Topics list.
Step 6: Create an IBM Cloud Event Notifications Email Subscription In this step, you will configure who will receive an email when a notification is processed.
1. Click Subscriptions.
2. Click Create. The Create a Subscription panel opens.
3. In the Create a Subscription panel, enter the Name of your subscription — for example, PipelineFailedEmail. For Topic, select the topic you created — for example, PipelineFailed. For Destination, select the IBM Cloud Email service. For Recipients, enter a valid email address.
4. Click Create. Your subscription will be added to the Subscriptions list.
You should start receiving email notifications at the configured email address whenever the criteria defined in both Toolchain and IBM Cloud Event Notifications match. Conclusion The integration of IBM Cloud Toolchain and Event Notifications offers organizations the opportunity to accelerate development and deployment processes in the rapidly evolving digital landscape. By adopting modern DevOps practices such as continuous integration, continuous delivery (CI/CD), and cloud-native architectures, companies can leverage the robust set of services provided by IBM Cloud. The IBM Cloud Event Notifications service, in particular, stands out as a powerful tool that enables streamlined workflows, enhanced collaboration, and timely notifications. By harnessing the full potential of IBM Cloud Toolchain and Event Notifications, organizations can achieve accelerated application delivery and maintain impeccable quality in their software applications and services.
When I write Apache APISIX-related blog posts, I want my colleagues to review them first. However, it's my blog, and since I mix personal and business posts, I want to keep reviewers out of the repository itself. I need a preview accessible only to a few, something like Vercel's preview. I'm using GitLab Pages, and there's no such out-of-the-box feature. I tried two methods: GitHub gists and PDFs. Both have issues. Gists don't display as nicely as the final page. I tried to improve the situation by using DocGist. It's an improvement, even if not the panacea. Moreover, gists don't display images, since I write my posts in Asciidoc. I have to put the images in comments, and it breaks the flow. I've tried to attach the images to the gist, but they don't appear in the flow of the post in any case. The pro over comments is that they are ordered; the con is that I need to change the Asciidoc. I used gists because I'm used to GitHub reviews. But since it's my blog, I neither need nor want the same kind of reviews as in a regular Merge Request. I need people to point out when something needs to be clarified or when I missed a logical jump, not when I made a typo (I use Grammarly for this). For this reason, a PDF export of a post is enough to review. However, PDFs have issues of their own: a web "page" is potentially endless, while a regular PDF page cuts the former into standard pages. Splits can happen across diagrams. Besides, PDFs make distribution much harder. In this post, I'll describe how I configured GitLab Pages to get the preview I want. Summary of GitLab Pages GitLab Pages are akin to GitHub Pages: With GitLab Pages, you can publish static websites directly from a repository in GitLab. To publish a website with Pages, you can use any static site generator, like Gatsby, Jekyll, Hugo, Middleman, Harp, Hexo, or Brunch. You can also publish any website written directly in plain HTML, CSS, and JavaScript. -- GitLab Pages That's how you see this blog post. I found no preview feature for GitLab Pages. I asked experts to no avail; GitLab doesn't offer previews. Laying Out the Work I didn't believe it initially, but you only need to create a dedicated artifact. Since the artifact consists of web files, the browser will render them. The idea is to create such an artifact, accessible under a URL that cannot be easily predicted. I can then share the URL with my colleagues and ask for their review. To start with, we can copy the existing build on master:
YAML
stages:
  - preview

preview:
  stage: preview
  image:
    name: registry.gitlab.com/nfrankel/nfrankel.gitlab.io:latest
  before_script: cd /builds/nfrankel/nfrankel.gitlab.io
  script: bundle exec jekyll b --future -t --config _config.yml -d public
  artifacts:
    paths:
      - public
  only:
    refs:
      - preview
  variables:
    JEKYLL_ENV: production
At this point, the site is available at https://$CI_PROJECT_NAMESPACE.gitlab.io/-/$CI_PROJECT_NAME/-/jobs/$CI_JOB_ID/artifacts/public/index.html. Many issues need fixing, though. Making It Work Let's fix the issues in order of importance. Fixing Access Permissions The project is private; hence, only I can access the artifact, which defeats the initial purpose of offering the preview to others. I want to give my teammates only limited access to my GitLab repository. I gave them Guest access, according to the principle of least privilege. However, it still didn't work. As per the documentation, you must also make your pipeline public. Go to Settings > CI/CD > General pipelines and check the Public pipelines checkbox.
Fixing Relative Links I use Jekyll to build HTML from Asciidoc. To generate links, Jekyll uses two configuration parameters:
- The domain, e.g., https://blog.frankel.ch, set with url
- The path, e.g., /, set with baseurl
Both are different on the preview. You must set those parameters in a YAML configuration file; there's no environment variable alternative. Let's change the build accordingly:
YAML
preview:
  stage: preview
  image:
    name: registry.gitlab.com/nfrankel/nfrankel.gitlab.io:latest
  before_script:
    - cd /builds/nfrankel/nfrankel.gitlab.io
    - "printf 'url: https://%s.gitlab.io\n' $CI_PROJECT_NAMESPACE >> _config_preview.yml"                        #1
    - "printf 'baseurl: /-/%s/-/jobs/%s/artifacts/public/\n' $CI_PROJECT_NAME $CI_JOB_ID >> _config_preview.yml" #2
    - cat _config_preview.yml                                                                                    #3
  script: bundle exec jekyll b --future -t --config _config.yml,_config_preview.yml -d public                    #4
1. Set url using the CI_PROJECT_NAMESPACE environment variable. I could have used a hard-coded value since it's static, but this makes the script more reusable.
2. Set baseurl using the CI_PROJECT_NAME and CI_JOB_ID environment variables. The latter is the random part of the requirement.
3. Display the configuration's content for debugging purposes.
4. Use it!
Improving Usability It's a bore trying to distribute the correct URL each time. Better to write it out in the console after building:
YAML
after_script: echo https://$CI_PROJECT_NAMESPACE.gitlab.io/-/$CI_PROJECT_NAME/-/jobs/$CI_JOB_ID/artifacts/public/index.html
There's still one missing bit. GitLab Pages offers an index page. For example, if you request https://blog.frankel.ch, it will serve the root index.html. With plain artifacts, that's not the case. Given that I only want to offer a single post for preview, it's not an issue, so I didn't research the configuration further. Usage At this point, I only need to push to my preview branch:
Shell
git push --force origin HEAD:preview
Icing on the cake: we don't need to have the branch locally; just push to the remote one. Conclusion In this post, I showed how to preview GitLab Pages and share the preview's URL with teammates in a couple of steps. The hardest part was realizing that web artifacts are rendered regularly by the browser. To Go Further:
- GitLab CI/CD permissions
- Set artifacts visibility independent of the project or group visibility
- Change which users can view your pipelines
When I was learning about writing serverless functions with AWS Lambda and Java, I came across the concept of structured logging. This made me curious about the concept of structured logs, so I decided to explore it further. What Is Structured Logging? Typically, any logs generated by an application are plain text that is formatted in some way. For example, here is a typical log format from a Java application:
[Sun Apr 02 09:29:16 GMT] book.api.WeatherEventLambda INFO: [locationName: London, UK temperature: 22 action: record timestamp: 1564428928]
While this log is formatted, it is not structured. We can see that it is formatted with the following components:
- Timestamp (when it occurred)
- Fully qualified class name (from where it occurred)
- Logging level (the type of event)
- Message (this is the part that is typically non-standardized and therefore benefits the most from having some structure, as we will see)
A structured log does not use a plain-text format but instead uses a more formal structure such as XML or, more commonly, JSON. If the log shown previously were structured, it would carry the same information as named JSON fields rather than one formatted string. Note that the message part of the log is what you would typically be interested in. However, there is a whole bunch of metadata surrounding the message, which may or may not be useful in the context of what you are trying to do. Depending on the logging framework that you are using, you can customize the metadata that is shown. The example shown above was generated from an AWS Lambda function (written in Java) via the Log4J2 logging framework. The configuration looks like this:
XML
<?xml version="1.0" encoding="UTF-8"?>
<Configuration packages="com.amazonaws.services.lambda.runtime.log4j2">
  <Appenders>
    <Lambda name="Lambda">
      <JsonLayout compact="true" eventEol="true" objectMessageAsJsonObject="true" properties="true"/>
    </Lambda>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="Lambda"/>
    </Root>
  </Loggers>
</Configuration>
The JsonLayout tag is what tells the logger to use a structured format, i.e., JSON in this case. Note that we are using it as an appender for INFO-level logs, which means logs at other levels, such as ERROR or DEBUG, will not be structured. This sort of flexibility, in my opinion, is beneficial, as you may not want to structure all of your logs but only the parts that you think need to be involved in monitoring or analytics. Here is a snippet from the AWS Lambda function that generates the log. It reads a weather event, populates a Map with the values to be logged, and passes that Map into the logger.
Java
final WeatherEvent weatherEvent = objectMapper.readValue(request.getBody(), WeatherEvent.class);
HashMap<Object, Object> message = new HashMap<>();
message.put("action", "record");
message.put("locationName", weatherEvent.locationName);
message.put("temperature", weatherEvent.temperature);
message.put("timestamp", weatherEvent.timestamp);
logger.info(new ObjectMessage(message));
There are different ways of achieving this. You could write your own class that implements an interface from Log4J2, populate the fields of an instance of this class, and pass that instance to the logger. So, What Is the Point of All This? Why would you want to structure your logs? To answer this question, consider you had a pile of logs (as in, actual logs of wood). If I were to say to you, "Inspect the log on top of the bottom left one," you would have to take a guess as to which one I am referring to. Now consider that these logs were structured into a cabin.
Now, if I were to say to you, "Inspect the logs that make up the front door," then you know exactly where to look. This is why structure is good. It makes it easier to find things. Querying Your Logs Structured Logs can be indexed efficiently by monitoring tools such as AWS CloudWatch, Kibana, and Splunk. What this means is that it becomes much easier to find the logs that you want. These tools offer sophisticated ways of querying your logs, making it easier to do troubleshooting or perform analytics. For example, this screenshot shows how, in AWS CloudWatch Insights, you would search for logs where a weather event from Oxford occurred. We are referring to the locationName property under the message component of the log. You can do much more sophisticated queries with filtering and sorting. For example, you could say, "Show me the top 10 weather events where the temperature was greater than 20 degrees" (a rare occurrence in the UK). Triggering Events From Logs Another benefit of being able to query your logs is that you can start measuring things. These measurements (called metrics in AWS CloudWatch parlance) can then be used to trigger events such as sending out a notification. In AWS, this would be achieved by creating a metric that represents what you want to measure and then setting up a CloudWatch Alarm based on a condition on that metric and using the alarm to trigger a notification to, for instance, SNS. For example, if you wanted to send out an email whenever the temperature went over 20 degrees in London, you can create a metric for the average temperature reading from London over a period of, say, 5 hours and then create an alarm that would activate when this metric goes above 20 degrees. This alarm can then be used to trigger a notification to an SNS topic. Subscribers to the SNS topic would then be notified so that they know not to wear warm clothes. Is There a Downside To Structured Logs? The decision as to whether to use Structured Logs should be driven by the overall monitoring and analytics strategy that you envision for the system. If you have, for example, a serverless application that is part of a wider system that ties into other services, it makes sense to centralize the logs from these various services so that you have a unified view of the system. In this scenario, having your logs structured will greatly aid monitoring and analytics. If, on the other hand, you have a very simple application that just serves data from a single data source and doesn't link to other services, you may not need to structure your logs. Let's not forget the old adage: Keep it Simple, Stupid. So, to answer the question "Is there a Downside to Structured Logs?" - only if you use it where you don't need to. You don't want to spend time on additional configuration and having to think about structure when having simple logs would work just fine. Conclusion Structured Logging not only aids you in analyzing your logs more efficiently but it also aids you in building better monitoring capabilities in your system. In addition to this, business analytics can be enhanced through relevant queries and setting up metrics and notifications that can signal trends in the system. In short, Structured Logging is not just about logging. It is a tool that drives architectural patterns that enhance both monitoring and analytics.
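For reference, here is a hedged CloudFormation-style sketch of the metric filter → metric → alarm → SNS flow described in "Triggering Events From Logs" above. To keep it short, it counts high temperature readings over five minutes rather than averaging them over five hours; the log group name, namespace, threshold, and topic are placeholders, and the filter pattern assumes the JSON layout produced by the Log4J2 configuration shown earlier.
YAML
Resources:
  HighTemperatureMetricFilter:
    Type: AWS::Logs::MetricFilter
    Properties:
      LogGroupName: /aws/lambda/weather-event-lambda      # placeholder log group
      FilterPattern: '{ $.message.temperature > 20 && $.message.locationName = "London, UK" }'
      MetricTransformations:
        - MetricName: HighLondonTemperature
          MetricNamespace: WeatherApp                      # placeholder namespace
          MetricValue: "1"                                 # each matching log event counts as 1

  HighTemperatureAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: london-temperature-above-20
      Namespace: WeatherApp
      MetricName: HighLondonTemperature
      Statistic: Sum
      Period: 300                                          # 5-minute evaluation window
      EvaluationPeriods: 1
      Threshold: 1
      ComparisonOperator: GreaterThanOrEqualToThreshold
      AlarmActions:
        - !Ref TemperatureNotificationTopic                # notify subscribers via SNS

  TemperatureNotificationTopic:
    Type: AWS::SNS::Topic                                  # placeholder topic; add email subscriptions as needed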
Cucumber is a well-known Behavior-Driven Development (BDD) framework that allows developers to implement end-to-end testing. The combination of Selenium with Cucumber provides a powerful framework that enables you to create functional tests in an easy way. It allows you to express acceptance criteria in language that business people can read and understand, along with the steps to take to verify that they are met. The Cucumber tests are then run through a browser-like interface that allows you to see what's happening in your test at each step. This Cucumber Selenium tutorial will walk you through the basics of writing test cases with Cucumber in Selenium WebDriver. If you are not familiar with Cucumber, this Cucumber BDD tutorial will also give you an introduction to its domain-specific language (DSL) and guide you through writing your first step definitions, setting up Cucumber with Selenium WebDriver, and automating web applications using Cucumber, Selenium, and the TestNG framework. Why Use Cucumber.js for Selenium Automation Testing? Cucumber.js is a tool that is often used in conjunction with Selenium for automating acceptance tests. It allows you to write tests in a natural language syntax called Gherkin, which makes it easy for non-technical team members to understand and write tests. Cucumber with Selenium is one of the easiest-to-use combinations. Here are a few benefits of using Selenium Cucumber for automation testing:
- Improved collaboration: Since Cucumber.js tests are written in a natural language syntax, they can be easily understood by all members of the development team, including non-technical stakeholders. This can improve collaboration between different team members and ensure that everyone is on the same page.
- Easier to maintain: Cucumber.js tests are organized into features and scenarios, which makes it easy to understand the structure of the tests and locate specific tests when you need to make changes.
- Better documentation: The natural language syntax of Cucumber.js tests makes them a good source of documentation for the functionality of your application. This can be especially useful for teams that follow a behavior-driven development (BDD) approach.
- Greater flexibility: Cucumber.js allows you to write tests for a wide variety of applications, including web, mobile, and desktop applications. It also supports multiple programming languages, so you can use it with the language that your team is most familiar with.
To sum it up, Cucumber with Selenium is a useful combination for automating acceptance tests, as it allows you to write tests in a natural language syntax that is easy to understand and maintain while providing greater flexibility and improved collaboration. Configure Cucumber Setup in Eclipse and IntelliJ [Tutorial] With the adoption of Agile methodology, a variety of stakeholders, such as Quality Assurance professionals, technical managers, and program managers, including those without technical backgrounds, have come together to improve the product. This is where the need to implement Behavior-Driven Development (BDD) arises. Cucumber is a popular BDD tool that is used for automated testing. In this section, we will explain how to set up Cucumber in Eclipse and IntelliJ for automated browser testing. How To Configure Cucumber in Eclipse In this BDD Cucumber tutorial, we will look at how to add the necessary JARs to a Cucumber Selenium Java project in order to set up Cucumber in Eclipse.
It is similar to setting up the TestNG framework and is useful for those who are just starting with Selenium automation testing.
1. Open Eclipse by double-clicking its icon, then select a workspace (you can change it at any time with the Browse button).
2. Click Launch and close the Welcome window; it is not needed for the Cucumber Selenium setup.
3. To create a new project, go to File > New > Java Project, enter a name for your project, and click the Finish button.
4. Right-click on the project and go to Build Path > Configure Build Path.
5. Click the button labeled "Add External JARs," locate the Cucumber JARs, and click Open. Add the required Selenium JARs for the Cucumber setup in Eclipse as well. They will appear under the Libraries tab.
6. To import the necessary JARs for the Cucumber setup in Eclipse, click 'Apply and Close.' The imported JARs will be displayed in the 'Referenced Libraries' tab of the project.
How To Install Cucumber in IntelliJ In this part of the IntelliJ Cucumber Selenium tutorial, we will demonstrate how to set up Cucumber with IntelliJ, a widely used Integrated Development Environment (IDE) for Selenium Cucumber Java development. Here are the steps for configuring Cucumber in IntelliJ.
1. To create a new project in IntelliJ IDEA, open the IntelliJ IDE and go to File > New > Project.
2. Select Java and click Next.
3. Name your project and click Finish.
4. You will be prompted to choose whether to open the project in the current window or a new one. You can select 'This Window.'
5. The new project will be displayed in the project explorer.
6. To import the necessary JARs for Selenium and Cucumber in IntelliJ IDEA, go to File > Project Structure > Modules, similar to how it was done in Eclipse.
7. To add dependencies to your project, click the '+' sign at the bottom of the window and select the necessary JARs or directories. To add Selenium to Cucumber, add the selenium-java and Selenium dependency JARs to the project.
8. Click Apply and then OK. Similar to the Cucumber Selenium setup in Eclipse, the imported JARs will be located in the "External Libraries" section once they have been added.
For a step-by-step guide on using Cucumber for automated testing, be sure to check out our Cucumber testing tutorial on Cucumber in Eclipse and IntelliJ. This article provides detailed instructions and tips for setting up Cucumber in these two popular Java IDEs. Cucumber.js Tutorial With Examples for Selenium JavaScript Using a BDD framework has its own advantages, ones that can help you take your Selenium test automation a long way. What's more, these BDD frameworks help all of your stakeholders easily interpret the logic behind your test automation script. Leveraging Cucumber.js for your Selenium JavaScript testing can help you specify acceptance criteria that would be easy for any non-programmer to understand. It could also help you quickly evaluate the logic implied in your Selenium test automation suite without going through huge chunks of code. Cucumber.js, a Behaviour-Driven Development framework, has made it easier to understand tests by using a given-when-then structure. For example: imagine you have $1000 in your bank account. You go to an ATM and ask for $200. If the machine is working properly, it will give you $200, and your bank account balance will be $800.
The machine will also give your card back to you. How To Use Annotations in Cucumber Framework in Selenium [Tutorial] The Cucumber Selenium framework is popular because it uses natural language specifications to define test cases. It allows developers to write test scenarios in plain language, which makes it easier for non-technical stakeholders to understand and review. In Cucumber, test scenarios are written in a feature file using annotations, which are keywords that mark the steps in the scenario. The feature file is then translated into code using step definition files, which contain methods that correspond to each step in the feature file. Cucumber Selenium annotations are used to mark the steps in the feature file and map them to the corresponding methods in the step definition file. There are three main types of annotations in Cucumber: Given, When, and Then. Given annotations are used to set up the initial state of the system under test. When annotations are used to describe the actions that are being tested, and Then annotations are used to verify that the system is in the expected state after the actions have been performed. Cucumber Selenium also provides a number of "hooks," which are methods that are executed before or after certain events in the test execution process. For example, a "before" hook might be used to set up the test environment, while an "after" hook might be used to clean up resources or take screenshots after the test has been completed. There are several types of hooks available in Cucumber, including "before" and "after" hooks, as well as "beforeStep" and "afterStep" hooks, which are executed before and after each step in the test scenario. In this tutorial, we will go over the different types of annotations and hooks that are available in Cucumber and how to use them to write effective test scenarios. We will also look at some best practices for organizing and maintaining your Cucumber Selenium tests. How To Perform Automation Testing With Cucumber and Nightwatch JS To maximize the advantages of test automation, it is important to choose the right test automation framework and strategy so that the quality of the project is not compromised while speeding up the testing cycle, detecting bugs early, and quickly handling repetitive tasks. Automation testing is a crucial aspect of software development, allowing developers to quickly and efficiently validate the functionality and performance of their applications. Cucumber is a testing framework that uses natural language specifications to define test cases, and Nightwatch.js is a tool for automating tests for web applications. Behavior Driven Development (BDD) is a technique that clarifies the behavior of a feature using simple, user-friendly language. This approach makes it easy for anyone to understand the requirements, even those without technical expertise. DSL is also used to create automated test scripts. Cucumber allows users to write scenarios in plain text using Gherkin syntax. This syntax is made up of various keywords, including Feature, Scenario, Given, When, Then, and And. The feature is used to describe the high-level functionality, while Scenario is a collection of steps for Cucumber to execute. Each step is composed of keywords such as Given, When, Then, and And, all of which serve a specific purpose. A Gherkin document is stored in a file with a .feature extension. 
Automation Testing With Selenium, Cucumber, and TestNG Automation testing with Selenium, Cucumber, and TestNG is a powerful combination for testing web applications. Selenium is an open-source tool for automating web browsers, and it can be used to automate a wide range of tasks in a web application. Cucumber is a tool for writing and executing acceptance tests, and it can be used to define the expected behavior of a web application in a simple, human-readable language. TestNG is a testing framework for Java that can be used to organize and run Selenium tests. One of the benefits of using Selenium, Cucumber, and TestNG together is that they can be integrated into a continuous integration and delivery (CI/CD) pipeline. This allows developers to automatically run tests as part of their development process, ensuring that any changes to the codebase do not break existing functionality. To use Selenium, Cucumber, and TestNG together, you will need to install the necessary software and dependencies. This typically involves installing Java, Selenium, Cucumber, and TestNG. You will also need to set up a project in your development environment and configure it to use Selenium, Cucumber, and TestNG. Once you have set up your project, you can begin writing tests using Selenium, Cucumber, and TestNG. This involves creating test cases using Cucumber's Given-When-Then syntax and implementing those test cases using Selenium and TestNG. You can then run your tests using TestNG and analyze the results to identify any issues with the application. To sum it up, automation testing with Selenium, Cucumber, and TestNG can be a powerful tool for testing web applications. By integrating these tools into your CI/CD pipeline, you can ensure that your application is tested thoroughly and continuously, helping you to deliver high-quality software to your users. How To Integrate Cucumber With Jenkins Cucumber is a tool for writing and executing acceptance tests, and it is often used in conjunction with continuous integration (CI) tools like Jenkins to automate the testing process. By integrating Cucumber with Jenkins, developers can automatically run acceptance tests as part of their CI pipeline, ensuring that any changes to the codebase do not break existing functionality. To use Cucumber Selenium with Jenkins, you will need to install the necessary software and dependencies. This typically involves installing Java, Cucumber, and Jenkins. You will also need to set up a project in your development environment and configure it to use Cucumber. Once you have set up your project and configured Cucumber, you can begin writing acceptance tests using Cucumber's Given-When-Then syntax. These tests should define the expected behavior of your application in a simple, human-readable language. To run your Cucumber Selenium tests with Jenkins, you will need to set up a Jenkins job and configure it to run your tests. This typically involves specifying the location of your Cucumber test files and any necessary runtime parameters. You can then run your tests by triggering the Jenkins job and analyzing the results to identify any issues with the application. In conclusion, integrating Cucumber with Jenkins can be a powerful way to automate the acceptance testing process. By running your tests automatically as part of your CI pipeline, you can ensure that your application is tested thoroughly and continuously, helping you to deliver high-quality software to your users. 
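Jenkins pipelines are typically defined in a Groovy Jenkinsfile, so as a language-neutral illustration of the same idea, here is a GitLab CI-style YAML sketch of a CI job that runs a Cucumber/TestNG suite via Maven and keeps the generated reports. The image tag, tag expression, and report path are assumptions for illustration; adapt them to your project's runner class and Cucumber plugin configuration.
YAML
stages:
  - test

acceptance-tests:
  stage: test
  image: maven:3.9-eclipse-temurin-17          # assumption: a Maven + JDK CI image
  script:
    # Runs the TestNG runner class, which in turn executes the Cucumber feature files
    - mvn test -Dcucumber.filter.tags="@smoke" # placeholder tag expression
  artifacts:
    when: always
    paths:
      - target/cucumber-reports/               # placeholder report path set via the Cucumber plugin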
Top Five Cucumber Best Practices for Selenium Automation Cucumber is a popular tool for writing and executing acceptance tests, and following best practices can help ensure that your tests are effective and maintainable. Here you can look at the five best practices for using Cucumber to create robust and maintainable acceptance tests. 1. Keep acceptance tests simple and focused: Cucumber tests should define the expected behavior of your application in a clear and concise manner. Avoid including unnecessary details or extraneous test steps. 2. Use the Given-When-Then syntax: Cucumber's Given-When-Then syntax is a helpful way to structure your acceptance tests. It helps to clearly define the pre-conditions, actions, and expected outcomes of each test. 3. Organize tests with tags: Cucumber allows you to use tags to label and group your tests. This can be useful for organizing tests by feature or by the level of testing (e.g. unit, integration, acceptance). 4. Avoid testing implementation details: Your acceptance tests should focus on the expected behavior of your application rather than the specific implementation details. This will make your tests more robust and easier to maintain. 5. Use data tables to test multiple scenarios: Cucumber's data tables feature allows you to test multiple scenarios with a single test. This can be a useful way to reduce the number of test cases you need to write. By following these best practices when using Cucumber, you can create effective and maintainable acceptance tests that help ensure the quality of your application.
AWS Lambda is a popular serverless platform that allows developers to run code without provisioning or managing servers. In this article, we will discuss how to implement a serverless DevOps pipeline using AWS Lambda and CodePipeline. What Is AWS Lambda? AWS Lambda is a computing service that runs code in response to events and automatically scales to meet the demand of the application. Lambda supports several programming languages, including Node.js, Python, Java, Go, and C#. CodePipeline is a continuous delivery service that automates the build, test, and deployment of applications. CodePipeline integrates seamlessly with other AWS services, such as CodeCommit, CodeBuild, CodeDeploy, and Lambda. Creation of the Lambda Function To implement a serverless DevOps pipeline, we first need to create a Lambda function that will act as a build step in CodePipeline. The Lambda function will be responsible for building the application code and creating a deployment package. The deployment package will be stored in an S3 bucket, which will be used as an input artifact for the deployment step. Implementing the CodePipeline The next step is to create a CodePipeline pipeline that will orchestrate the build, test, and deployment process. The pipeline will consist of three stages: Source, Build, and Deploy. The Source stage will pull the application code from a Git repository, such as CodeCommit. The Build stage will invoke the Lambda function to build the application code and create a deployment package. The Deploy stage will deploy the application to a target environment, such as an EC2 instance or a Lambda function. The Build Stage In the Build stage, the Lambda function will be triggered by a CodePipeline event. The event will contain information about the source code, such as the Git commit ID and the branch name. The Lambda function will use this information to fetch the source code from the Git repository and build the application. The Lambda function will then create a deployment package, which will be stored in an S3 bucket. The deployment package will contain the application code, as well as any dependencies, configuration files, and scripts required to deploy the application. The Deploy Stage In the Deploy stage, we will use AWS CodeDeploy to deploy the application to a target environment. CodeDeploy is a deployment service that automates the deployment of applications to Amazon EC2 instances, Lambda functions, or on-premises servers. CodeDeploy uses deployment groups to deploy applications to one or more instances in a target environment. The deployment group can be configured to perform rolling deployments, blue/green deployments, or custom deployment strategies. Using CodeDeploy We can use CodeDeploy to deploy the application to a Lambda function by creating a deployment group that targets the Lambda function. The deployment group can be configured to use the deployment package created in the Build stage as the input artifact. CodeDeploy will then create a new version of the Lambda function and update the alias to point to the new version. This will ensure that the new version is deployed gradually and that the old version is still available until the new version is fully deployed. Conclusion In conclusion, implementing a serverless DevOps pipeline with AWS Lambda and CodePipeline can help to streamline the software delivery process, reduce costs and improve scalability. 
By using Lambda as a build step in CodePipeline, we can automate the build process and create deployment packages that can be easily deployed to a target environment using CodeDeploy. With continuous delivery, we can ensure that new features and bug fixes are delivered to customers quickly and reliably.
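For readers who prefer to see the shape of such a pipeline, here is a hedged CloudFormation-style sketch of the Source/Build/Deploy structure described above, with the build step implemented as a Lambda invoke action. The role ARN, bucket, repository, function, and CodeDeploy names are placeholders, and a complete template would also need the referenced IAM role, the build Lambda, and a CodeDeploy application and deployment group configured for your target compute platform.
YAML
Resources:
  ServerlessDevOpsPipeline:
    Type: AWS::CodePipeline::Pipeline
    Properties:
      RoleArn: arn:aws:iam::123456789012:role/PipelineRole   # placeholder pipeline role
      ArtifactStore:
        Type: S3
        Location: my-artifact-bucket                         # placeholder artifact bucket
      Stages:
        - Name: Source
          Actions:
            - Name: CheckoutSource
              ActionTypeId: { Category: Source, Owner: AWS, Provider: CodeCommit, Version: "1" }
              Configuration:
                RepositoryName: my-app                       # placeholder repository
                BranchName: main
              OutputArtifacts:
                - Name: SourceOutput
        - Name: Build
          Actions:
            - Name: LambdaBuild
              ActionTypeId: { Category: Invoke, Owner: AWS, Provider: Lambda, Version: "1" }
              Configuration:
                FunctionName: build-and-package              # the build Lambda described above (placeholder name)
              InputArtifacts:
                - Name: SourceOutput
              OutputArtifacts:
                - Name: BuildOutput
        - Name: Deploy
          Actions:
            - Name: DeployWithCodeDeploy
              ActionTypeId: { Category: Deploy, Owner: AWS, Provider: CodeDeploy, Version: "1" }
              Configuration:
                ApplicationName: my-app                      # placeholder CodeDeploy application
                DeploymentGroupName: my-deployment-group     # placeholder deployment group
              InputArtifacts:
                - Name: BuildOutput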
Infrastructure as code (IaC) is the practice of managing and provisioning computing resources using configuration files or scripts rather than manual deployment and configuration processes. This enables developers and operations teams to collaborate more effectively, automate deployments, and improve consistency and reliability. However, IaC also introduces new security challenges and risks that need to be comprehensively addressed at every stage of the DevOps software development lifecycle (SDLC). In this blog post, we will break down every step of the DevOps lifecycle, from planning to post-deployment, and highlight the potential security risks associated with each stage. We will also provide best practices and recommendations for mitigating these risks and ensuring the security of your IaC infrastructure. By following these guidelines, you can confidently adopt IaC in your DevOps processes without compromising the security of your applications and data. Let's dive in and explore the security challenges and solutions of IaC in the DevOps SDLC! Plan The planning stage involves defining the requirements and design of the infrastructure, as well as identifying the potential threats and vulnerabilities that may affect it. At the planning stage, there are two main things you should be doing to secure your IaC: threat modeling and establishing privileges. Threat Modeling For threat modeling, it is common to use a standard framework or methodology, such as STRIDE or DREAD, to identify and prioritize the most critical risks in the design of your infrastructure. You can also use tools like Microsoft's Threat Modeling Tool or OWASP Threat Dragon to assist you in threat modeling. Consider the use of encryption, hashing, and key management techniques to protect sensitive data and credentials both in transit and at rest. You should also have a plan for handling untrusted input. Additionally, consider how network controls like a WAF can improve your application's security posture. Establishing Privileges Always follow the principle of least privilege, which means granting only the minimum permissions and access levels required for each resource and account. For user accounts, implement segregation of duties by separating the responsibilities of different team members. Minimizing the power of individual credentials reduces the damage that can be done if a cybercriminal hijacks an account or credential. Develop The development or coding stage involves writing and updating the code or scripts that define the infrastructure. Some of the security best practices for this stage are: security-focused IDE plugins, pre-commit hooks, static analysis, and secrets management. IDE Plugins In DevOps, the culture is all about "shifting left," which means it's better to catch bugs and security issues sooner rather than later. As a developer, the quickest feedback you can get is right in your IDE while you are writing your IaC. There are various IDE plugins that are capable of identifying vulnerabilities in your code as you write it. A few examples are:
- TFLint — Terraform linter with some security best-practice rules
- Checkov — misconfiguration scanner for multiple types of IaC
- Snyk — code, container, and IaC scanner that offers an IDE plugin
Pre-Commit Hooks Pre-commit hooks automate the execution of static code analysis tools before the code is committed to your version control system.
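If you use the pre-commit framework, wiring such checks in might look roughly like the following sketch. The hook ids and repository URLs follow the respective projects' published pre-commit hooks as I understand them, and the rev values are placeholders; pin them to real released tags for your setup.
YAML
# .pre-commit-config.yaml — run secret and IaC misconfiguration scans before every commit
repos:
  - repo: https://github.com/gitguardian/ggshield
    rev: v1.25.0          # placeholder: pin to an actual released tag
    hooks:
      - id: ggshield      # scans staged changes for hardcoded secrets
  - repo: https://github.com/bridgecrewio/checkov
    rev: 2.4.0            # placeholder: pin to an actual released tag
    hooks:
      - id: checkov       # scans IaC files for misconfigurations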
Secrets are the classic example: remediating an exposed secret gets messy once it is already in the git history of your repository. If you set up a secret scanner as a pre-commit hook, it will catch secrets before they get committed and save you some cleanup work later. For a walkthrough, see Creating a pre-commit hook to detect secrets with ggshield in less than 2 minutes. ggshield is a CLI application that runs in a local environment or in a CI environment to help detect more than 300 types of secrets, as well as other potential security vulnerabilities or policy breaks.

Static Analysis

Once the code has been committed to your version-controlled repository, you can scan it with static code analysis tools. There are various scanning tools, depending on what you are trying to scan. Some popular IaC static analysis tools are:

- ggshield — yes, the GitGuardian CLI can also be used to scan for infrastructure as code vulnerabilities by running the command: ggshield iac scan iac_repo
- Kube Bench — a Kubernetes configuration scanner based on the CIS Kubernetes Benchmark
- Coverity — a static analysis platform similar to Snyk

To learn more about using ggshield to scan for IaC misconfigurations, see Introducing Infrastructure as Code Security Scanning.

Secrets Management

Secrets management is a complex topic in and of itself, but it's all about making sure your secrets are accessible in a secure way. If you want to learn more about how to be good at secrets management, check out our Secrets Management Maturity Model.

Build and Test

In the building and testing phases, you have the opportunity to see what the infrastructure will look like and how it will behave. These are the key security practices you should be following in this phase of the DevSecOps pipeline:

- Separation of environments
- Dynamic testing
- Vulnerability scanning
- Container image scanning
- Artifact signing

Separation of Environments

Use a dedicated testing environment that mimics the production environment as closely as possible but with isolated resources and data. Sharing things like databases between environments can put production data at risk when a vulnerability is introduced to a test environment.

Dynamic Testing

Dynamic testing tools perform automated tests on the deployed infrastructure to check its configuration and behavior against the expected security policies and standards. A couple of popular IaC dynamic testing tools are InSpec and Terratest.

Container Image Scanning

When your applications use container images, it's important to take inventory of the software that is baked into each image and look for vulnerable, outdated versions. You can scan a newly built image in your CI pipeline with a tool like Aqua or Snyk (a short CI sketch of this check appears at the end of this section), but it's also a good idea to scan your entire container registry on a regular basis so that new vulnerabilities are noticed even when an image isn't receiving updates. And don't forget about leaked secrets in image layers! For more on that, see Secrets exposed in Docker images: Hunting for secrets in Docker Hub.

Artifact Signing

When you sign build artifacts such as binaries and container images, you ensure the integrity of your services from the time they are built to the time they are deployed. To learn more about why supply chain security is important and how you can implement it, check out our blog on Supply Chain Security.
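Here is the CI sketch promised above, assuming a hypothetical image name and a Trivy-style scanner CLI; Aqua, Snyk, or another scanner could be substituted, and the exact flags may differ by tool and version.

```python
"""CI step sketch: fail the build if the freshly built image has known issues."""
import subprocess
import sys

IMAGE = "registry.example.com/payments-api:latest"  # hypothetical image name

def scan_image(image: str) -> int:
    # Placeholder scanner invocation; Trivy is shown here, but another scanner
    # could be swapped in. Flags are illustrative and may vary by version.
    cmd = ["trivy", "image", "--exit-code", "1", "--severity", "HIGH,CRITICAL", image]
    return subprocess.run(cmd).returncode

if __name__ == "__main__":
    code = scan_image(IMAGE)
    if code != 0:
        print(f"Vulnerabilities found in {IMAGE}; failing the pipeline.", file=sys.stderr)
    sys.exit(code)
```

Because the script exits with the scanner's return code, most CI systems will mark the stage as failed whenever high-severity findings are reported.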
Deploy

Deploying IaC happens automatically, so there isn't much involvement from operations at this stage. However, there are still policies you'll need to follow in your deployment pipeline to ensure you are meeting best practices when it comes to securely deploying your assets:

- Immutability
- Inventory management

Immutability

Once your infrastructure is deployed, you don't want it to deviate from what is defined in your code. Post-deployment changes can introduce unintended bugs or vulnerabilities. Whenever a change is needed, you should first update your code and then follow the CI/CD process to redeploy the entire infrastructure. If possible, use policies or controls to prevent the modification of your infrastructure after it has been deployed.

Inventory Management

Inventory management is a foundational part of most cybersecurity frameworks. When you commission and decommission assets, your IaC tools should automatically update your overall asset inventory so you have an accurate picture of your attack surface. Applying tags to assets is another practice that can help you organize and maintain your inventory. Tags improve your ability to identify configuration drift and deprecated systems that have not been decommissioned properly.

Monitor

Post-deployment monitoring has historically been the bread and butter of security programs, but as deployment environments have changed and shifted to the cloud, there are some new approaches to securing IaC. Nonetheless, the two keys of security monitoring remain the same:

- Logging
- Threat detection

Logging

When provisioning and configuring IaC resources, you should have audit and security logging in place to keep a record of the creation of and access to your infrastructure. Forwarding logs to a SIEM or analysis engine can help you identify anomalies like resources being spun up outside of the normal deployment cycle or configuration changes made outside of provisioning (tying back to the importance of immutability); a short sketch of this kind of check appears at the end of this section.

Threat Detection

Building runtime threat detection into your IaC is the best way to ensure you are made aware when the infrastructure you have created is under attack. There are countless security tools to choose from, depending on the type of infrastructure you are deploying: tools like Falco can detect anomalies in Kubernetes pods, while EDR tools cover traditional virtual machine infrastructure. You can also forward additional logs to a SIEM, depending on what is needed to enable your detection strategy.
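As one hedged example of the logging point above, the sketch below uses AWS CloudTrail (via boto3) to flag EC2 launches made by anyone other than an assumed deployment-pipeline principal. The role name and lookback window are illustrative assumptions, not prescriptions.

```python
"""Sketch: flag EC2 instances launched by principals other than the CI/CD role.

The principal name and lookback window are assumptions for illustration only.
"""
from datetime import datetime, timedelta, timezone
import boto3

PIPELINE_PRINCIPAL = "cicd-deploy-role"  # hypothetical deployment role/user name
LOOKBACK_HOURS = 24

def find_out_of_band_launches():
    cloudtrail = boto3.client("cloudtrail")
    start = datetime.now(timezone.utc) - timedelta(hours=LOOKBACK_HOURS)
    paginator = cloudtrail.get_paginator("lookup_events")
    pages = paginator.paginate(
        LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": "RunInstances"}],
        StartTime=start,
    )
    suspicious = []
    for page in pages:
        for event in page["Events"]:
            username = event.get("Username", "")
            # Anything not launched by the pipeline principal is worth a look.
            if PIPELINE_PRINCIPAL not in username:
                suspicious.append((event["EventTime"], username))
    return suspicious

if __name__ == "__main__":
    for when, who in find_out_of_band_launches():
        print(f"{when.isoformat()} RunInstances by {who} (outside the deployment pipeline)")
```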
Summary: 15 Infrastructure as Code Best Practices

Plan

- Threat modeling: Use a framework to identify and prioritize risks in the infrastructure design. Consider encryption, hashing, key management techniques, and network controls.
- Establishing privileges: Follow the principle of least privilege and implement segregation of duties to minimize the power of individual credentials.

Develop

- Security-focused IDE plugins: Use IDE plugins such as TFLint, Checkov, and Snyk to catch bugs and security issues sooner rather than later.
- Pre-commit hooks: Automate the execution of static code analysis tools before code is committed to the version control system. Use ggshield to detect 350+ types of secrets.
- Static analysis: Scan code with static analysis tools like ggshield, Kube Bench, and Coverity.
- Secrets management: Securely manage secrets with appropriate tools. Use GitGuardian's Secrets Management Maturity Model if needed.

Build and Test

- Separation of environments: Use a dedicated testing environment that mimics the production environment as closely as possible but with isolated resources and data.
- Dynamic testing: Use automated tests, with tools such as InSpec and Terratest, to check infrastructure configuration and behavior against security policies and standards.
- Container image scanning: Take inventory of the software that is baked into each image and look for vulnerable, outdated versions. Scan newly built images in your CI pipeline with tools like Aqua and Snyk.
- Artifact signing: Sign build artifacts like binaries and container images to ensure their integrity.

Deploy

- Immutability: Do not allow post-deployment changes that deviate from what is defined in the code. Use policies or controls to prevent modification of the infrastructure after it has been deployed.
- Inventory management: When commissioning and decommissioning assets, automatically update the asset inventory, and apply tags to assets to organize and maintain the inventory.

Monitor

- Logging: Provision and configure IaC resources with audit and security logging to keep a record of creation of and access to the infrastructure. Forward logs to a SIEM or analysis engine to identify anomalies.
- Threat detection: Build runtime threat detection into IaC using tools like Falco or traditional EDR tools.

Conclusion

In this blog post, we have discussed some of the best practices and tools for securing IaC at each stage of the DevSecOps software development lifecycle. By following these steps and referencing the cheat sheet, you can improve the security, reliability, and consistency of your IaC throughout your DevOps pipeline. If you're interested in diving deeper into infrastructure security with Terraform, be sure to check out our previous blog post, Infrastructure as Code Security: Security Tools — GitGuardian. It offers a detailed exploration of Terraform security practices and techniques that you can use to enhance your IaC security further.
Most SRE teams eventually reach a point where they appear unable to meet all the demands placed upon them. This is when these teams may need to scale. However, it's important to understand that increasing team capacity is not the same as increasing the number of people on the team. Let's unpack what scaling a team is all about, what the indicators are, what steps you can take, and how you know when you're done.

Scaling Triggers

Sometimes it is very easy to tell whether you need to scale your team. For example:

- The team is assigned more services to manage,
- Traffic or users have significantly increased, or
- Service Level Objectives (SLOs) have become more demanding

In these situations, it is usually obvious that the team needs to scale. In other situations, the signs are more subtle and often ambiguous. Here are a few indicators that your team may need to scale:

- An increase in toil: Toil is repetitive work that creates no long-term value and needs to be actively controlled. Automation, runbooks, and retrospectives all reduce toil. However, when a team is under pressure, it has no slack to think about quality-of-life improvements like toil reduction; it is constantly scrambling to maintain reliability and fulfill business objectives.
- A decrease in reliability or performance: Similar to toil, reliability and performance need to be actively managed. When teams are overstretched, they often react to SLO breaches rather than proactively initiating performance or reliability projects.
- Improvement projects are delayed or canceled: An increase in toil and a decline in performance or reliability can be symptoms of a more general problem: neglecting long-term planning in favor of reacting to short-term issues. Another symptom is when any kind of improvement project is de-prioritized in favor of feature development.
- A decline in the team's morale: People in teams that need to scale are usually overloaded, stressed, and close to burnout. This, in fact, is the number one reason to scale your team, since losing people is among the most difficult problems to recover from.

None of these indicators is conclusive, and each can have other causes. You need to be sure that you are solving the correct problem. It can be very tempting to see manpower as a blanket solution for every problem, but adding people can worsen the situation and leave you with the trickier problem of scaling down. Adding people to your team should be the last thing you do, after exhausting all other options. This is not only more prudent financially, but it also ensures that you are not ignoring problems that could become more difficult to address over time. When thinking about any technical initiative, it is useful to break it down using the People-Process-Tools model. This assumes that the most important factors that impact an initiative, in order of importance, are people, processes, and tools. Let's look at them in the order in which you would address them.

Process

Before starting a scaling effort, you should know what metrics you are trying to improve and how you should be measuring them. It is an engineering axiom that you can't optimize what you're not measuring. The exact metrics will vary from team to team and from situation to situation, but here are a few to start with:

- Actual performance against SLOs
- Project metrics: 80th percentile wait time, 80th percentile cycle time, and average daily queue size
- Mean time to acknowledge (MTTA)
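If you want to see what these measurements look like in practice, here is a small, self-contained sketch that computes MTTA and an 80th percentile cycle time from made-up timestamps; real numbers would come from your incident and project tracking systems.

```python
"""Rough sketch: compute MTTA and 80th percentile cycle time.

The records below are made-up examples; real data would come from your
incident management and project tracking systems.
"""
from datetime import datetime
from statistics import mean, quantiles

incidents = [
    # (triggered_at, acknowledged_at)
    (datetime(2022, 9, 1, 10, 0), datetime(2022, 9, 1, 10, 4)),
    (datetime(2022, 9, 2, 22, 15), datetime(2022, 9, 2, 22, 27)),
    (datetime(2022, 9, 5, 3, 30), datetime(2022, 9, 5, 3, 33)),
]

cycle_times_hours = [5.5, 8.0, 12.5, 3.0, 20.0, 9.5]  # per completed task

# MTTA: average gap between an alert firing and someone acknowledging it.
mtta_minutes = mean(
    (ack - start).total_seconds() / 60 for start, ack in incidents
)

# quantiles(..., n=5) returns the 20th/40th/60th/80th percentiles; take the last one.
p80_cycle_time = quantiles(cycle_times_hours, n=5)[-1]

print(f"MTTA: {mtta_minutes:.1f} minutes")
print(f"80th percentile cycle time: {p80_cycle_time:.1f} hours")
```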
Once you are measuring your key metrics, institute a process to frequently evaluate your performance against them. It might be as simple as taking a few minutes in every sprint retrospective for this purpose. Don't underestimate the value of processes in helping you scale. Many smaller teams use simplistic, ad hoc processes, and engineers often dismiss processes as undesirable overhead. This misses the raison d'être of processes: to reduce error and improve efficiency.

Management Processes

- Toil limits ensure that toil reduction tasks are prioritized.
- Postmortems identify measures to prevent the repetition of incidents.
- Agile methods like Kanban ensure management processes themselves are efficient. Reports like finger charts can help identify bottlenecks.

Engineering Processes

- Alert noise reduction quiets noisy alerts and prioritizes them, reducing the effort needed to manage incidents.
- Alert routing ensures that only the appropriate people are notified about incidents.
- Automation reduces toil and errors.
- Pairing aids knowledge transfer and reduces errors.
- Infrastructure as code improves repeatability and reduces errors.

Tools

The subject of SRE tools is vast — too large for this article. So rather than going into a potentially lengthy discussion of specific tools, let's discuss how to think about tools in the context of scaling. Different kinds of tools have different kinds of scaling impacts. It is important to have hard data that indicates what kind of improvement is necessary. This data may be in your project management or trouble ticket system, but more often than not you will need to get feedback from your team. In general, there are a few kinds of results you should expect from the tools your team is using:

Tools That Help You Handle More Load With the Same Team

This could be anything from pssh to Ansible that helps you handle large fleets of servers, VMs, or containers. Modern monitoring tools not only perform better at scale, but they are also often easier to configure. Incident management tools like Squadcast prioritize and deduplicate incidents, allowing engineers to focus on critical tasks.

Tools That Reduce Rework by Reducing Errors

Script libraries, runbooks, and runbook automation systems all facilitate task repeatability — allowing tasks to be executed reliably as frequently as needed. Using containers to implement immutable servers avoids subtle errors caused by config drift.

Tools That Eliminate Certain Kinds of Work

Container orchestration systems like Kubernetes eliminate huge swathes of work — everything from setting up process supervisors to managing load balancers. Distributed tracing systems like OpenTelemetry reduce the need for complex log aggregation systems to track transactions through distributed systems.

Tools That Help Delegate Work

Tools like Rundeck allow secure, guard-railed, role-based access to scripts. This allows dependent teams like developers or customer support to work independently without adding to the SRE workload. Similarly, tools like Metabase, Kibana, and Grafana can be used to provide self-service access to production data, logs, or metrics to product management, customer support, or management.
Providing senior management with the ability to answer their own questions is a particularly powerful way to reduce a lot of high-priority, low value-add effort.

There Are No Silver Bullets

Avoid the idea that tools are a panacea. Introducing new tools can be financially burdensome and disruptive, and if introduced unwisely, they can easily make your team worse off. This is why a clear cost-benefit analysis is necessary before investing in new tools.

People

Once you have exhausted all other options to increase your team's capacity, you then have to start adding people to your team.

Capacity Planning

Capacity planning is more an art than a science, requiring a combination of hard data and judgment calls. There is no sure-fire method to build the perfect capacity plan, but here are some tips:

- Use data about your existing load to make projections. This can be in ideal man-hours or story points. Relate that to the services under management. You should be able to say something like, "Adding another microservice will add about 50 hours of project work per quarter" or "We currently have 80 story points of demand every sprint versus 60 points of capacity." You have to be able to approximately quantify and reason about the current and projected loads.
- Factor in the relative productivity and cost of seniors vs. juniors. Juniors often take longer on tasks than seniors, and seniors often have other responsibilities like code reviews, mentoring, or interviews. As with load, you should be able to quantify and reason about capacity.
- High utilization, defined as the ratio of task hours to available working hours, is not a good measure of efficiency. Less slack time means fewer creative hours for innovation and improvement, and it is also likely to lead to frustration and burnout. Try to plan for 30% slack.
- While it might be a good idea to plug all these numbers into a spreadsheet to make your projections, do not lose sight of the fact that they are only rough approximations of reality. Be conservative in capacity projections and liberal in demand projections. Add buffers liberally: it's always better to end up with slightly more capacity than you need than slightly less.
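To show how rough these projections are, here is a back-of-the-envelope sketch using the kinds of numbers mentioned above (80 points of demand, 60 points of capacity, 30% slack). Every figure in it is illustrative and should be replaced with your own data.

```python
"""Back-of-the-envelope capacity projection. All numbers are illustrative."""

# Current, measured demand and capacity (story points per sprint).
demand_points = 80
current_capacity_points = 60

# Projected extra demand, e.g., taking on another microservice.
projected_extra_points = 10

# Keep roughly 30% slack for improvement work, innovation, and the unexpected.
slack = 0.30
required_capacity = (demand_points + projected_extra_points) / (1 - slack)

# Rough relative productivity assumptions (points per engineer per sprint).
points_per_engineer = {"junior": 6, "intermediate": 10, "senior": 12}

gap = required_capacity - current_capacity_points
print(f"Required capacity with slack: {required_capacity:.0f} points/sprint")
print(f"Gap to fill: {gap:.0f} points/sprint")
print(f"~ {gap / points_per_engineer['intermediate']:.1f} additional intermediates")
```

Treat the output as a conversation starter with your management, not a hiring plan; the buffers and productivity ratios are exactly the judgment calls the text above warns about.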
Team Composition

There are a couple of major factors to consider when planning the composition of your team:

- Experience: Balancing the experience mix of your team requires a set of trade-offs. In general, we can bucket people into juniors, intermediates, and seniors. The definition of these buckets in terms of years of experience and capability will vary depending on your local labor market, tech stack, and business domain. Somebody with 10 years of experience managing Go microservices might be considered senior, but similar experience with nuclear power station systems may be considered junior. Juniors are less expensive and less productive, while seniors are the opposite. So why not staff completely with that happy medium — intermediates? This idea ignores the special value that both seniors and juniors add. Seniors' experience allows them to quickly solve problems without reinventing the wheel and, more importantly, to teach others while doing it. Juniors are future intermediates who don't need to unlearn bad habits picked up elsewhere. The best compromise is to build your team around a core of intermediates, with a small number of juniors and seniors to round it out. A proportion of 20:60:20 of juniors, intermediates, and seniors might be a goal to strive for.
- Diversity: Even if you ignore the moral imperative to support groups that have historically been discriminated against, there are good operational reasons to seek diversity in your team. Multiple perspectives contribute to greater creativity and innovation. There's also some anecdotal evidence that diverse teams are better behaved and more professional than the testosterone-fuelled boys' clubs that non-diverse teams can occasionally become.
- Culture fit: "Cultural fit" has often been a tool of convenience to exclude those who don't conform to a preconceived notion of what an engineer should be. In my book, there is only one fundamental purpose of a cultural fit check, and that is to exclude jerks. Nothing saps a team's productivity like a negative individual who constantly creates petty conflicts or belittles teammates. It's important to filter out jerks during the recruitment process itself and to get rid of them quickly if they are identified later. Don't give high-performing jerks a pass — their productivity rarely makes up for the drop in performance they cause in the team.

Candidate Sources

Where can you hire from? One good way is to poach people from elsewhere in your company. They're often a known quantity and usually much cheaper than external hires. Many traditional organizations have system administration, build, or DevOps teams with people who would make good SREs, and software developers can bring engineering rigor to the team. Usually, though, internal hiring just moves the scaling problem to another team. The most effective candidate sourcing mechanisms vary from place to place, but here are some important ones:

- Employee referrals
- Recruitment consultants
- Job boards
- Advertising
- The careers page on your website

In general, employee referrals are cheaper and have a better hit rate than all other mechanisms because candidates are pre-filtered by the referring employee. Ensure that you have rewards and incentives to encourage referrals. Increasing capacity via hiring is time-consuming and fraught with uncertainty. Ideally, you should start months in advance of the projected growth. Unfortunately, most of us don't have that luxury, so it is critical to have contingency plans in place to handle hiring delays.

Conclusion

Scaling SRE teams is a challenging exercise that requires extensive analysis and planning. Adding people is slow, expensive, and risky, so consider process or technology improvements to tide you over. When you do start hiring, it pays to plan capacity requirements with data rather than gut instinct. Be thoughtful about the composition of your team, as it can be critical to long-term success.
Back in September 2020, I was researching open-source architectures (looking at several customer solutions from my employer at the time) and developing a generic view of these solutions for certain use cases. One of those use cases is financial payments. In 2020, I kicked off a series covering this architecture with the article Payments Architecture - An Introduction. The series consisted of six articles and covered architectural diagrams from logical and schematic to detailed views of the various use cases we uncovered. The architectures presented were based on open-source cloud-native technologies, such as containers, microservices, and a Kubernetes-based container platform. The major omission in that series was any discussion of cloud-native observability. This series sets out to fix that omission with an open-source, standards-based, cloud-native observability platform that helps DevOps teams control the speed, scale, and complexity of a cloud-native world for their financial payments architecture.

The Baseline Architecture

Let's review the use case before we dive into the architectural details. For a bit of background, we review what the base open-source generic architecture focused on for the financial payments use case. Cloud technology is changing the way payment services are architected, and this series builds on the original baseline solution that was used to modernize payment services. Note that you can find this and other open-source architecture solutions in their repository; feel free to browse them at your leisure. The rest of this article focuses on introducing cloud-native observability to your payments architecture. These projects provide a way to create a cloud-native payment architecture that's proven to work in multiple customer cloud environments, with a focus in this article on the addition of cloud-native observability. Now let's look at the use case definition and lay the groundwork for adding cloud-native observability to your architecture.

Defining Payments

To start off our story, the following statement was developed to guide our architecture focus for this financial payments use case:

Financial institutions enable customers with fast, easy-to-use, and safe payment services available anytime, anywhere.

With this guiding principle, the baseline architecture was developed to help everyone be successful in providing their customers with a robust payment experience. We continue to expand on this baseline, adding a robust cloud-native observability platform that provides the control, visibility, speed, and scale that financial service providers are looking for. All diagrams and components used to expand the architecture with cloud-native observability conform to the original design guidelines and leverage the same diagram tooling. We'll start by revisiting the original logical diagram and sharing insights into the additional (newer) components related to the cloud-native observability architecture. You'll discover the technologies used to collect and store both metrics and tracing data through the use of a collector and the Chronosphere platform. This is followed by specific examples worked out in schematic diagrams (physical architecture) that explore a few specific financial payments use cases and provide guides for mapping cloud-native observability components to your own existing architectures.
You'll see both networked connections and data flow examples worked out to help you understand the generic views being provided. Next, let's quickly cover how you can make use of the content in this financial payments project, both by downloading images of the architectures and by opening the diagrams in the open-source tooling to adjust them to your own needs.

Using the Payments Project

The architecture collection provides insights into all manner of use cases and industries researched between 2019 and 2022. Each architecture provides a collection of images for each diagram element, as well as for the entire project as a whole, for you to make use of as you see fit. If you look at the financial payments project, you'll see a table of contents allowing you to jump directly to the topic or use case that interests you most. You can also just scroll down through each section and explore at your leisure. Each section provides a short description covering what you see, why it's important to the specific payment topic listed, and a walkthrough of the diagram(s) presented. You can download any of the images and use them in any way you like. At the bottom of the page, you will find a section titled Download Diagrams. If you click on the Open Diagrams link, it will open all the available logical, schematic, and detailed diagrams in the diagram tooling we used to create them. This makes them readily available for any modifications you might want to make for your own architectures, so feel free to use and modify! Finally, there is a free online beginners' guide workshop focused on using the diagram tooling; please explore it to learn tips and tricks from the experts.

Series Overview

This o11y architecture series on adding cloud-native observability to your financial payments architecture consists of the following articles:

- Financial payments introduction (this article)
- Financial payments logical observability elements
- Adding observability to immediate payments example
- Adding observability to financial calculations example

Catch up on any articles you missed by following one of the links above as the series progresses. Next in this series, explore the cloud-native observability elements needed for any financial payment processing architecture.