Containers
The proliferation of containers in recent years has increased the speed, portability, and scalability of software infrastructure and deployments across all kinds of application architectures and cloud-native environments. Now, with more and more organizations having migrated to the cloud, what's next? Efficiently managing and monitoring containerized environments remains a crucial task for teams. With organizations looking to better leverage their containers — and some still working to migrate out of their own monolithic environments — the path to containerization and architectural modernization remains a perpetual climb. In DZone's 2023 Containers Trend Report, we will explore the current state of containers, key trends and advancements in global containerization strategies, and constructive content for modernizing your software architecture. This will be examined through DZone-led research, expert community articles, and other helpful resources for designing and building containerized applications.
In an era where environmental awareness is paramount, the need for accurate and timely air quality data is crucial. One key pollutant that demands attention is PM2.5: fine particulate matter with a diameter of 2.5 micrometers or smaller.

Hardware (Enviro + Air Quality)

Raspberry Pi 4 Model B Rev 1.1 with 4GB RAM
PMS5003 Particulate Matter Sensor with cable
BME280
LTR-559
MICS6814
ADS1015
MEMS microphone
0.96" color LCD

To efficiently gather and process PM2.5 data from a MiNiFi Java agent equipped with a particle sensor, Apache NiFi offers an excellent solution. This article explores the process of building a NiFi flow to seamlessly ingest PM2.5 data, empowering you to monitor and analyze air quality with ease. We will also utilize Cloudera Edge Flow Manager (EFM) to develop, build, manage, deploy, monitor, enhance, back up, and maintain our flows for agent classes. In this case, I am prototyping with a single agent attached to one agent class. We will deploy one flow to one agent; the process is no different if 50,000 devices receive the same agent class.

Particle Sensor Source Code: FLaNK-ParticulateMatterSensor, FLaNK-Edge

On the MiNiFi agent, we have a shell script that calls our Python 3 app (a minimal sketch of such a script appears after Step 3 below):

Shell
cd /opt/demo/
python3 /opt/demo/enviroagent.py

Step 1: Setting Up the MiNiFi Agent and Particle Sensor

Before diving into NiFi, it is essential to have a functional MiNiFi agent connected to a particle sensor. Once the sensor is connected, basic reporting succeeds. Notice how high those particulate numbers are; this is not a good sign. The MiNiFi agent is a lightweight data ingestion tool that runs on resource-constrained devices. By integrating a particle sensor, the agent can measure and transmit real-time PM2.5 data, acting as a valuable data source for our NiFi flow.

After you add a new NAR (for example, the Kafka NAR if you want it in your MiNiFi Java agent), you have to refresh to see it: click Refresh Manifest to make any new processors available.

(Screenshots: Refresh Complete, MLX Thermal Camera, CEM Monitor Dashboard, Edge Events)

In EFM, open a flow for each device agent class and build the flow by adding processors.

Run Shell Script: In the first step, we use ExecuteProcess to run the shell script that calls my Python application for sensor readings. This script returns the results as JSON via standard out (STDOUT), which we can read in MiNiFi.

Set Agent Name: In the second step of the flow, we set the user agent so we know who is calling our NiFi HTTP REST listener. Once the flow looks right, we publish it so the agents in the class receive it. For DevOps tasks such as backups, we can use EFM's Swagger REST interface.

(Screenshots: Final MiNiFi Flow to Deploy, a selection of available processors, Publish MiNiFi Flow, results from the agent, Swagger REST API)

Step 2: Install and Configure Apache NiFi

Next, ensure that Apache NiFi is installed and ready to use. NiFi provides a graphical interface for constructing data flows, making it easier to design and manage the ingestion process. Once installed, configure NiFi to establish a connection with the MiNiFi agent. This connection enables the flow of PM2.5 data from the sensor to NiFi for processing.

Step 3: Designing the NiFi Flow

The heart of our data ingestion solution lies in designing an efficient NiFi flow. Start by creating a NiFi dataflow canvas and add relevant processors, controllers, and connections to construct the desired flow.
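As referenced above, here is a minimal, hypothetical sketch of what that agent-side script could look like. This is not the project's actual enviroagent.py: the sensor read is stubbed out (the real script uses the Enviro+ and PMS5003 libraries), and the field names simply mirror the sample record shown later in this article. The only contract that matters to the flow is that the script prints one JSON record to STDOUT for ExecuteProcess to capture.

Python
import json
import uuid
from datetime import datetime

def read_particle_sensor():
    # Placeholder for the real Enviro+/PMS5003 reads done in the actual project.
    return {"pm1": 9, "pm25": 14, "pm10": 21}

def build_record():
    now = datetime.now()
    record = {
        "uuid": f"air_uuid_{now.strftime('%Y%m%d%H%M%S')}",
        "systemtime": now.isoformat(),
        "host": "rp4",
        "id": f"{now.strftime('%Y%m%d%H%M%S')}_{uuid.uuid4()}",
    }
    record.update(read_particle_sensor())
    return record

if __name__ == "__main__":
    # ExecuteProcess in MiNiFi captures STDOUT as the flow file content.
    print(json.dumps(build_record()))

With the agent side in mind, we can turn to the processors on the NiFi side.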
In the case of PM2.5 data ingestion, key processors might include:

ListenHTTP: Configured to receive data from the MiNiFi agent, this processor acts as the entry point for incoming PM2.5 measurements.
ParseJSON: As the received data is likely in JSON format, this processor parses and extracts the relevant fields, such as timestamp and PM2.5 concentration.
RouteOnAttribute: With the extracted data, this processor allows for dynamic routing based on specific attributes. For example, you can route data to different branches based on PM2.5 concentration thresholds for further analysis.
PutDatabaseRecord: This processor enables storing the PM2.5 data in a database for long-term storage and retrieval.

These are just a few examples of processors you can use; the flexibility of NiFi allows for customization based on your specific requirements. For today's flow, we do some processing on the data via record processors.

(Screenshot: NiFi — Received from MiNiFi Java Agent)

Step 4: Enhancing the Flow With NiFi Features

To create a robust and efficient PM2.5 data ingestion flow, leverage NiFi's processors. For instance:

Data validation: Implement data validation techniques to ensure the quality and integrity of the ingested PM2.5 data. You can utilize processors like ValidateRecord to enforce data integrity rules.
Scalability and fault tolerance: NiFi offers the capability to scale horizontally, allowing you to distribute the workload across multiple NiFi instances. This enhances fault tolerance and ensures seamless data ingestion even in the face of failures.
Data transformation and enrichment: Consider incorporating processors like UpdateAttribute, UpdateRecord, or MergeContent to enrich the PM2.5 data with additional information or transform it into a desired format before storage or further processing.

Step 5: Finalize Schemas

After enrichment, enhancement, cleaning, and validation, we can finalize our schemas for the incoming enviroplus data as well as the final cleaned-up particle stream. The Cloudera Schema Registry hosts our data schemas and provides a REST endpoint that can be used from SQL Stream Builder, NiFi, Java, Scala, and anything else that can read REST endpoints.

Step 6: Data Streaming

Our data is streaming into Kafka to buffer and distribute it to all consumers everywhere. We can also geo-replicate it via Streams Replication to any other cluster, availability zone, or cloud with no code.

(Screenshots: SMM view of our topic "particles", where the data is Avro with a schema, and a sample record we can peek at in the stream.)

Step 7: Run Continuous SQL Analytics

We can now run continuous SQL analytics on this data stream. We use standard Flink SQL, supercharged with the Cloudera SQL Stream Builder UI. Let's examine our "table" so we can do some queries; the Schema Registry schema connects to the Kafka topic. A simple SELECT * gives us all of our topic data:

SQL
select systemtime, adjtempf, humidity, gasKO, pm25, pm1, pm10,
       pm1atmos, pm25atmos, pm10atmos, oxidising, nh3
from `sr1`.`default_database`.`particles`;

select systemtime,
       max(pm25) as maxpm25, max(pm1) as maxpm1, max(pm10) as maxpm10, max(nh3) as maxnh3,
       min(pm25) as minpm25, min(pm1) as minpm1, min(pm10) as minpm10, min(nh3) as minnh3,
       count(pm25) as RowCount
from `sr1`.`default_database`.`particles`
group by systemtime;

An example record in JSON:
{ "uuid" : "air_uuid_kbr_20230608132527", "amplitude100" : 1.0, "amplitude500" : 0.3, "amplitude1000" : 0.2, "lownoise" : 0.4, "midnoise" : 0.2, "highnoise" : 0.3, "amps" : 0.3, "ipaddress" : "192.168.1.188", "host" : "rp4", "host_name" : "rp4", "macaddress" : "a6:e0:87:04:1a:75", "systemtime" : "2023-06-08T09:25:28.895187", "endtime" : "1686230728.89", "runtime" : "625.63", "starttime" : "06/08/2023 09:15:02", "cpu" : 0.3, "cpu_temp" : "53.5", "diskusage" : "29329.2 MB", "memory" : 5.8, "id" : "20230608132527_337aa313-278b-43d5-a4fd-6d39a8840822", "temperature" : "36.7", "adjtemp" : "29.4", "adjtempf" : "64.9", "temperaturef" : "78.1", "pressure" : 1003.2, "humidity" : 15.6, "lux" : 89.8, "proximity" : 0, "oxidising" : 76.2, "reducing" : 140.8, "nh3" : 50.9, "gasKO" : "Oxidising: 76188.84 Ohms\nReducing: 140805.11 Ohms\nNH3: 50944.44 Ohms", "pm25" : 183, "pm1" : 92, "pm10" : 195, "pm1atmos" : 60, "pm25atmos" : 121, "pm10atmos" : 129, "pmper1l03" : 15588, "pmper1l05" : 4565, "pmper1l1" : 1324, "pmper1l25" : 128, "pmper1l5" : 16, "pmper1l10" : 9 } Step 8: Create Dashboard for Visualization We can now create a basic dashboard for visualization. We could also use Jupyter Notebook hosted by Cloudera Machine Learning or use the Cloudera Data Visualization tool. An easy way to power visualization is with a materialized view in SQL Stream Builder that provides a REST interface to JSON data which is super easy to read. We could also utilize Iceberg tables, Phoenix tables, or Hive tables; but for today, let’s try the cool REST interface supplied by SSB. REST REST Data Once we have a query we like in SQL, you click Materialized View, and you can build a view. You need to click to create a new API key for security. Also, I recommend you Enable MV, Recreate on Job Start, and Ignore Nulls as options shown below. Also, pick a logical primary key from your query and set a retention time. Then you can click “Add New Query” to decide what fields and any parameters you want to pass. SQL Stream Builder (SSB) JQuery + Datatables.net + HTML5 + CSS Simple Display hosted by (python3 -m http.server 8000) Step 9: Notifications Let’s send some notifications and alerts to Slack. We can do this easily in Apache NiFi. Conclusion By leveraging the power of Apache NiFi, you can build a seamless flow for ingesting PM2.5 data from a MiniFi agent equipped with a particle sensor. This enables you to actively monitor and analyze air quality, empowering you to make informed decisions and take necessary actions. Deeper analytics and machine learning are enabled by other connected parts of the Data Platform. 
Particle Schema {"type":"record", "name":"particles", "namespace":"org.apache.nifi", "fields": [{"name":"uuid","type":["string","null"]}, {"name":"amplitude100","type":["double","null"]}, {"name":"amplitude500","type":["double","null"]}, {"name":"amplitude1000","type":["double","null"]}, {"name":"lownoise","type":["double","null"]}, {"name":"midnoise","type":["double","null"]}, {"name":"highnoise","type":["double","null"]}, {"name":"amps","type":["double","null"]}, {"name":"ipaddress","type":["string","null"]}, {"name":"host","type":["string","null"]}, {"name":"host_name","type":["string","null"]}, {"name":"macaddress","type":["string","null"]}, {"name":"systemtime","type":["string","null"]}, {"name":"endtime","type":["string","null"]}, {"name":"runtime","type":["string","null"]}, {"name":"starttime","type":["string","null"]}, {"name":"cpu","type":["double","null"]}, {"name":"cpu_temp","type":["string","null"]}, {"name":"diskusage","type":["string","null"]}, {"name":"memory","type":["double","null"]}, {"name":"id","type":["string","null"]}, {"name":"temperature","type":["string","null"]}, {"name":"adjtemp","type":["string","null"]}, {"name":"adjtempf","type":["string","null"]}, {"name":"temperaturef","type":["string","null"]}, {"name":"pressure","type":["double","null"]}, {"name":"humidity","type":["double","null"]}, {"name":"lux","type":["double","null"]}, {"name":"proximity","type":["int","null"]}, {"name":"oxidising","type":["double","null"]}, {"name":"reducing","type":["double","null"]}, {"name":"nh3","type":["double","null"]}, {"name":"gasKO","type":["string","null"]}, {"name":"pm25","type":["int","null"]}, {"name":"pm1","type":["int","null"]}, {"name":"pm10","type":["int","null"]}, {"name":"pm1atmos","type":["int","null"]}, {"name":"pm25atmos","type":["int","null"]}, {"name":"pm10atmos","type":["int","null"]}, {"name":"pmper1l03","type":["int","null"]}, {"name":"pmper1l05","type":["int","null"]}, {"name":"pmper1l1","type":["int","null"]}, {"name":"pmper1l25","type":["int","null"]}, {"name":"pmper1l5","type":["int","null"]}, {"name":"pmper1l10","type":["int","null"]} ]} References Extremely High Levels of PM2.5: Steps to Reduce Your Exposure Getting started with Enviro - Provisioning An outdoor air quality station with Enviro+ and Luftdaten Getting Started with Enviro+ Enviro for Raspberry Pi – Enviro GitHub: pimoroni/enviroplus-python GitHub: FLiP-Py-Pi-EnviroPlus GitHub: minifi-enviroplus
As the IT landscape continues to evolve rapidly, network security becomes increasingly crucial for safeguarding sensitive data and maintaining network integrity. Implementing robust access control measures, such as network security groups (NSGs), is essential to ensure network security. NSGs function as virtual firewalls, permitting or denying inbound and outbound traffic to and from Azure resources based on predefined rules. However, managing and monitoring NSGs can become complex, particularly when dealing with numerous rules and resources. It is imperative to maintain a balance between allowing legitimate traffic and maintaining a high level of security by ensuring defined rules are neither overly permissive nor overly restrictive. This article presents an approach that utilizes Apache Spark and Python code to identify the optimal set of user rules by analyzing Network Watcher flow event logs. The proposed method aims to enhance the efficiency and effectiveness of managing NSGs while ensuring robust network security in Azure environments.

Introduction

Azure network security groups (NSGs) allow or deny network traffic to virtual machine instances within a virtual network by applying rules, or access control lists (ACLs). An NSG can be associated with a subnet or with an individual virtual machine instance within that subnet. All virtual machine instances in a subnet associated with an NSG are subject to its ACL rules. You can also restrict traffic to an individual virtual machine by associating an NSG directly with it.

Each network security group contains a set of default rules: three inbound rules and three outbound rules. The default rules cannot be deleted, but since they are assigned the lowest priority, you can override them with your own.

Inbound vs. Outbound

There are two types of NSG rules: inbound and outbound. Rule direction is evaluated from the VM's perspective. Inbound rules affect traffic initiated from external sources, such as the internet or another virtual machine, toward a virtual machine. Traffic sent from a VM is affected by outbound security rules. A session's return traffic is automatically allowed and is not validated against rules in the reverse direction. Our focus should be on allowing (or denying) the client-to-server direction of the session.

Figure 1 - Default Network Security Rules

User Security Rules

In a network security group, we can apply the right rules, with a higher priority than the default rules, to a network interface or a subnet to protect Azure resources. Each security rule includes the following fields:

Rule name and description
Priority number, which defines the position of the rule in the ruleset. Rules at the top are processed first; a smaller number therefore has a higher preference
Source and destination, with port numbers (for TCP and UDP)
IP protocol type, such as TCP, UDP, or ICMP. These three protocols cover almost all application requirements; the "Any" keyword permits all IP protocols

The user rules are applied on top of the default rules and restrict access based on IP address, port number, and protocol in the network security group. An NSG's inbound security rules are shown in Figure 2. We can also define outbound security rules on top of the default rules.

Figure 2 - Inbound User Security Rules

There are times when user rules can be overly permissive, even if the source or destination is restricted by IP address or IP network range.
Figure 2 shows that the rule at priority 151 has "Any" as the port, while all other rules have "Any" as the destination, which is open to all network ranges. If the network team is uncertain about which ports, protocols, sources, and destinations are actually used between virtual machine networks, then micro-segmentation and permissive rule checks must be implemented.

Micro-Segmentation

Segmentation in the public cloud refers to the practice of dividing and isolating different components of a network or infrastructure to enhance security and control. It involves implementing various measures to prevent unauthorized access and restrict communication between different resources within the cloud environment. Network security groups (NSGs) are a fundamental tool for achieving segmentation in Azure virtual networks. Here's how NSGs work in the context of segmentation in the public cloud:

Filtering inbound traffic: NSGs allow you to define inbound security rules that specify the allowed sources, protocols, ports, and destinations for incoming network traffic. By configuring these rules, you can restrict access to specific resources within your virtual network, such as virtual machines or applications. This helps protect sensitive data and prevents unauthorized access.
Filtering outbound traffic: Similarly, NSGs enable you to define outbound security rules that control the flow of traffic leaving your virtual network. This allows you to restrict outgoing connections from specific resources or limit them to specific destinations, ports, or protocols. By implementing outbound rules, you can prevent data exfiltration and control the communication channels utilized by your resources.
Traffic isolation: NSGs can be applied at the subnet level, allowing you to segment different parts of your virtual network. By associating NSGs with subnets, you can enforce specific security policies for each subnet, controlling the traffic between them. This enables you to create secure zones within your network, isolating different applications or tiers of your infrastructure.
Network-level monitoring and logging: NSGs provide the ability to monitor and log network traffic. Azure provides diagnostic logging capabilities for NSGs, allowing you to capture and analyze network flow logs. By examining these logs, you can gain insights into network activity, identify potential security threats, and troubleshoot connectivity issues.

By leveraging NSGs for network traffic filtering, Azure users can establish a strong foundation for segmentation in their public cloud environments. NSGs provide a flexible and scalable solution for enforcing security policies, controlling network traffic, and achieving granular isolation between resources. However, it's important to note that NSGs are just one component of a comprehensive security strategy, and additional security measures, such as network virtual appliances, intrusion detection systems, or secure gateways, may be necessary depending on specific requirements and compliance standards.

Benefits of Micro-Segmentation and Permissive Rule Checks

Micro-segmentation, achieved through proper configuration of NSG rules, provides several benefits in terms of network security:

Enhanced security: Micro-segmentation allows fine-grained control over network traffic, enabling organizations to restrict communication between resources based on specific rules. This helps prevent lateral movement within a network and limits the potential impact of a security breach.
Improved compliance: By implementing permissive rule checks, organizations can ensure that their NSGs comply with security best practices and regulatory requirements. This helps maintain a secure and compliant network infrastructure.
Minimized attack surface: Micro-segmentation reduces the attack surface by limiting communication pathways between resources. It prevents unauthorized access and restricts the movement of malicious actors within the network.
Simplified network management: Using Apache Spark for permissive rule checks enables organizations to automate the analysis and monitoring of NSG rules, reducing the manual effort required for security audits. The distributed computing capabilities of Apache Spark allow efficient processing of large datasets, making it suitable for organizations with complex network infrastructures.
Rapid detection and response: Micro-segmentation, coupled with permissive rule checks, enables organizations to quickly identify and respond to any unauthorized or suspicious network traffic. By analyzing the NSG logs and validating the rules, potential security incidents can be detected promptly.

Network Security Flow Events and Logging

Network security group flow logging is a feature of Azure Network Watcher that allows you to log information about IP traffic flowing through a network security group. Flow data is sent to Azure Storage, from where you can access it and export it to any visualization tool, security information and event management (SIEM) solution, or intrusion detection system (IDS) of your choice.

It's vital to monitor, manage, and know your network so that you can protect and optimize it. You need to know the current state of the network, who's connecting, and where users are connecting from. You also need to know which ports are open to the internet, what network behavior is expected, what network behavior is irregular, and when sudden rises in traffic happen. Flow logs are the source of truth for all network activity in your cloud environment. Whether you're in a startup that's trying to optimize resources or a large enterprise that's trying to detect intrusion, flow logs can help. You can use them for optimizing network flows, monitoring throughput, verifying compliance, detecting intrusions, and more.

Here, we are going to use network flow events to identify unknown or undesired traffic, identify the top talkers in the network, and remove overly permissive or restrictive traffic rules.

NSG Log Schema

We can start the PySpark code by defining the schema for NSG logs, which includes fields like category, macAddress, operationName, properties, resourceId, systemId, and time. This schema structure helps organize and process the log data efficiently.

Python
from pyspark.sql.types import StructType, ArrayType, StringType, LongType

insightsNeworkFlowEventsSchema = StructType() \
    .add("records", ArrayType(StructType()
        .add("category", StringType(), True)
        .add("macAddress", StringType(), True)
        .add("operationName", StringType(), True)
        .add("properties", StructType()
            .add("Version", LongType(), True)
            .add("flows", ArrayType(StructType()
                .add("flows", ArrayType(StructType()
                    .add("flowTuples", ArrayType(StringType(), True))
                    .add("mac", StringType(), True), True))
                .add("rule", StringType(), True)), True))
        .add("resourceId", StringType(), True)
        .add("systemId", StringType(), True)
        .add("time", StringType(), True), True))

For more details about Azure NSG flow events, please refer to the Azure Network Watcher documentation.
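Each entry in flowTuples is itself a plain comma-separated string. Under flow log version 2 (the version shown in the samples below), the fields are, per Azure's flow log documentation: timestamp (Unix epoch), source IP, destination IP, source port, destination port, protocol (T/U), direction (I/O), decision (A/D), flow state (B/C/E), and four traffic counters (packets and bytes in each direction). As a quick illustration, a minimal parser for one tuple could look like this; the helper and field names are my own, not part of the article's code:

Python
from collections import namedtuple

# Field layout of a version 2 NSG flow tuple
FlowTuple = namedtuple("FlowTuple", [
    "ts", "src_ip", "dst_ip", "src_port", "dst_port",
    "protocol", "direction", "decision", "state",
    "packets_out", "bytes_out", "packets_in", "bytes_in"])

def parse_flow_tuple(raw):
    # Example input: "1674754192,185.156.73.107,10.27.0.4,54227,46988,T,I,D,B,,,,"
    return FlowTuple(*raw.split(","))

ft = parse_flow_tuple("1674754192,185.156.73.107,10.27.0.4,54227,46988,T,I,D,B,,,,")
print(ft.src_ip, ft.dst_port, ft.decision)  # 185.156.73.107 46988 D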
To create the right permissive rules, I chose the following parameters from the network security group:

Resource ID or system ID, as a primary key
Rule name, which is under Properties → Flows → Rule
FlowTuples, which is under Properties → Flows → Flows → FlowTuples

NSG Flow stores the flow events as JSON files that look like the following.

JSON
{"records":[{"time":"2023-01-26T17:30:53.8518900Z","systemId":"57785417-608e-4bba-80d6-25c3a0ebf423","macAddress":"6045BDA85225","category":"NetworkSecurityGroupFlowEvent","resourceId":"/SUBSCRIPTIONS/DA35404A-2612-4419-BAEF-45FCDCE6045E/RESOURCEGROUPS/ROHNU-RESOURCES/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/CVS-NSGLOGS-NSG","operationName":"NetworkSecurityGroupFlowEvents","properties":{"Version":2,"flows":[{"rule":"DefaultRule_DenyAllInBound","flows":[{"mac":"6045BDA85225","flowTuples":["1674754192,185.156.73.107,10.27.0.4,54227,46988,T,I,D,B,,,,","1674754209,185.156.73.150,10.27.0.4,43146,62839,T,I,D,B,,,,","1674754210,185.156.73.91,10.27.0.4,58965,63896,T,I,D,B,,,,","1674754212,89.248.163.30,10.27.0.4,52429,41973,T,I,D,B,,,,","1674754223,87.246.7.70,10.27.0.4,43000,8443,T,I,D,B,,,,","1674754236,92.255.85.15,10.27.0.4,41014,8022,T,I,D,B,,,,"]}]}]}},{"time":"2023-01-26T17:31:53.8673108Z","systemId":"57785417-608e-4bba-80d6-25c3a0ebf423","macAddress":"6045BDA85225","category":"NetworkSecurityGroupFlowEvent","resourceId":"/SUBSCRIPTIONS/DA35404A-2612-4419-BAEF-45FCDCE6045E/RESOURCEGROUPS/ROHNU-RESOURCES/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/CVS-NSGLOGS-NSG","operationName":"NetworkSecurityGroupFlowEvents","properties":{"Version":2,"flows":[{"rule":"DefaultRule_AllowInternetOutBound","flows":[{"mac":"6045BDA85225","flowTuples":["1674754265,10.27.0.4,20.44.10.123,49909,443,T,O,A,B,,,,","1674754265,10.27.0.4,52.152.108.96,49910,443,T,O,A,B,,,,","1674754267,10.27.0.4,52.152.108.96,49911,443,T,O,A,B,,,,","1674754267,10.27.0.4,20.44.10.123,49912,443,T,O,A,B,,,,","1674754268,10.27.0.4,52.185.211.133,49913,443,T,O,A,B,,,,","1674754268,10.27.0.4,20.44.10.123,49914,443,T,O,A,B,,,,","1674754271,10.27.0.4,20.44.10.123,49909,443,T,O,A,E,1,66,1,66","1674754271,10.27.0.4,52.152.108.96,49910,443,T,O,A,E,24,12446,1,66","1674754273,10.27.0.4,20.44.10.123,49912,443,T,O,A,E,15,3542,12,5567","1674754274,10.27.0.4,52.185.211.133,49913,443,T,O,A,E,12,1326,10,4979","1674754277,10.27.0.4,20.44.10.123,49914,443,T,O,A,E,13,2922,14,5722","1674754278,10.27.0.4,23.0.198.228,49916,443,T,O,A,B,,,,","1674754279,10.27.0.4,104.102.142.78,49918,443,T,O,A,B,,,,","1674754279,10.27.0.4,104.102.142.78,49917,443,T,O,A,B,,,,","1674754280,10.27.0.4,13.107.4.50,49919,80,T,O,A,B,,,,","1674754280,10.27.0.4,13.107.4.50,49920,80,T,O,A,B,,,,","1674754280,10.27.0.4,13.107.4.50,49921,80,T,O,A,B,,,,","1674754280,10.27.0.4,13.107.4.50,49922,80,T,O,A,B,,,,","1674754281,10.27.0.4,52.152.108.96,49911,443,T,O,A,E,87,11226,1093,1613130","1674754284,10.27.0.4,104.208.16.88,49923,443,T,O,A,B,,,,","1674754284,10.27.0.4,20.72.205.209,49924,443,T,O,A,B,,,,","1674754289,10.27.0.4,13.107.4.50,49925,80,T,O,A,B,,,,","1674754290,10.27.0.4,104.208.16.88,49923,443,T,O,A,E,14,2877,13,5627","1674754291,10.27.0.4,20.72.205.209,49924,443,T,O,A,E,12,1452,10,4692","1674754300,10.27.0.4,20.50.80.209,49927,443,T,O,A,B,,,,","1674754306,10.27.0.4,20.50.80.209,49927,443,T,O,A,E,10,3220,9,5415"]}]},{"rule":"DefaultRule_DenyAllInBound","flows":[{"mac":"6045BDA85225","flowTuples":["1674754254,89.248.165.197,10.27.0.4,46050,41834,T,I,D,B,,,,","1674754255,45.143.200.102,10.27.0.4,44049,49361,T,I,D,B,,,,","1674754263,51.91.172.152,10.27.0.4,53162,5985,T,I,D,B,,,,","1674754297,122.116.9.72,10.27.0.4,58757,23,T,I,D,B,,,,"]}]}]}}]}

Enable Network Security Group Logging

Network Watcher in Azure stores NSG Flow events in an Azure Storage account using the diagnostic settings feature. When enabled, diagnostic settings allow Network Watcher to send NSG Flow events to a specified storage account for retention and analysis. The NSG Flow events are stored in the storage account in the form of log files. Each log file contains information about the network flows captured by Network Watcher, including source and destination IP addresses, ports, protocols, timestamps, and action (allow/deny). The storage account can be configured to store NSG Flow events in a specific container within the storage account. The log files are typically stored in the Azure Blob Storage service, which provides scalable and durable storage for unstructured data. By leveraging the diagnostic settings and an Azure Storage account, organizations can effectively collect and retain NSG Flow events for analysis, monitoring, and compliance purposes. This data can then be used for various security and network analysis scenarios to gain insights into network traffic patterns and identify potential security threats or anomalies.

Plain Text
Note: I strongly recommend creating an ADLS Gen2 storage account to store the NSG Flow events, with at least a 7-day retention policy.

Figure 3 - Enable NSG Flow logging using Network Watcher

The directory structure of NSG Flow events stored in an Azure Storage account typically follows a hierarchical organization. Here is an example of a possible directory structure:

Plain Text
abfs://insights-logs-networksecuritygroupflowevent@<storageaccount>.dfs.core.windows.net/resourceId=/SUBSCRIPTIONS/<Subscriptions>/RESOURCEGROUPS/<ResourceGroups>/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/<NetworkSecurityGroup>/y={year}/m={month}/d={day}/h={hours}/m=00/mac={macID}/PT1H.json

Figure 4 - Directory structure of the NSG flow events

The storage account serves as the top-level container for storing different types of data. Within the storage account, a container named "insights-logs-networksecuritygroupflowevent" holds the NSG Flow event logs specifically. The NSG Flow events are then organized by the subscription, resource group, and network security group from which they were captured. The logs are organized hierarchically, starting with the year, followed by the month, day, and hour of capture. Under each hour directory, the NSG Flow event log file is stored as PT1H.json under the MAC address. These log files contain the actual captured network flow data, usually in a structured format such as JSON. This directory structure allows for easy organization and retrieval of NSG Flow events based on the specific time period when the events occurred. It enables efficient querying and analysis of the logs based on the desired time range or granularity.

How To Read NSG Flow Files Using PySpark

The code provided below is written in Python and utilizes Apache Spark to read and process NSG (network security group) flow logs stored in Azure Blob Storage. The code leverages Apache Spark's distributed computing capabilities to handle large datasets efficiently and perform the required calculations in a parallelized manner. It utilizes Spark SQL functions and operations to manipulate and analyze the data effectively. Let's break down the code step by step.

Spark Configuration

SparkConf() creates a Spark configuration object.
.setAppName(app_name) sets the name of the Spark application.
.setAll([...]) sets additional configuration properties for Spark, such as enabling case sensitivity in SQL queries, setting the number of shuffle partitions, and specifying the SAS (shared access signature) token for accessing Azure Blob Storage.

Spark Session and Spark Context

SparkSession.builder creates a SparkSession, which is the entry point for working with structured data in Spark.
.config(conf=spark_conf) applies the previously defined Spark configuration.
.getOrCreate() retrieves an existing SparkSession or creates a new one if none exists.
spark.sparkContext gets the Spark context (sc) from the SparkSession.

Hadoop File System Configuration

sc._gateway.jvm provides access to the Java Virtual Machine (JVM) running Spark.
java.net.URI, org.apache.hadoop.fs.Path, org.apache.hadoop.fs.FileSystem, and org.apache.hadoop.conf.Configuration are Java classes used for working with the Hadoop file system API.
sc._jsc.hadoopConfiguration().set(...) sets the SAS token for accessing Azure Blob Storage in the Hadoop configuration.

Option 1: Access NSG Flow Logs at the Subscription Level

fs = FileSystem.get(...) creates a FileSystem object to interact with Azure Blob Storage.
fs.listStatus(Path("/resourceId=/SUBSCRIPTIONS/")) retrieves the status of files and directories in the specified path ("/resourceId=/SUBSCRIPTIONS/").
The code then iterates through the files and directories to construct the NSG flow log paths based on the subscription, resource group, and date. The paths are stored in a dictionary (nsg_dict) for further processing.
print(nsg_status) is outside the loop and represents the last value of nsg_status; it prints the NSG flow log path for the most recently processed subscription, resource group, and date.

Option 2: Access All NSG Flow Logs at Once

Option 2 provides an alternative for reading all NSG logs at once using a wildcard ("*") path. It constructs a path pattern with placeholders for subscription, resource group, date, and hour.
Python
from pyspark.sql import SparkSession
from pyspark.conf import SparkConf

# Create Spark configuration
spark_conf = SparkConf() \
    .setAppName(app_name) \
    .setAll([
        ('spark.sql.caseSensitive', 'true'),
        ('fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container, blob_account), blob_sas_token),
        ("spark.sql.shuffle.partitions", "300"),
        ("spark.default.parallelism", "300")
    ])

# Create Spark session
spark = SparkSession.builder.config(conf=spark_conf).getOrCreate()
sc = spark.sparkContext

# Set Hadoop configuration for Azure Blob Storage
sc._jsc.hadoopConfiguration().set('fs.azure.sas.%s.%s.dfs.core.windows.net' % (blob_container, blob_account), blob_sas_token)

# OPTION 1 - Read the NSG flow logs at the subscription level and create a dictionary of paths
URI = sc._gateway.jvm.java.net.URI
Path = sc._gateway.jvm.org.apache.hadoop.fs.Path
FileSystem = sc._gateway.jvm.org.apache.hadoop.fs.FileSystem
Configuration = sc._gateway.jvm.org.apache.hadoop.conf.Configuration

fs = FileSystem.get(URI.create("abfs://insights-logs-networksecuritygroupflowevent@storage_account.dfs.core.windows.net"), Configuration())
status = fs.listStatus(Path("/resourceId=/SUBSCRIPTIONS/"))

nsg_dict = dict()
for file_status in status:
    subscription_name = str(file_status.getPath().getName())
    resource_group_path = "/resourceId=/SUBSCRIPTIONS/" + subscription_name + "/RESOURCEGROUPS/*/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/"
    resource_group_status = fs.globStatus(Path(resource_group_path + "*/"))
    for resource_group_file in resource_group_status:
        nsg_path = str(resource_group_file.getPath())
        nsg_status = nsg_path + f"/y={year}/m={month}/d={day}"
        if fs.exists(Path(nsg_status)):
            if subscription_name in nsg_dict:
                nsg_dict[subscription_name].extend([nsg_status])
            else:
                nsg_dict[subscription_name] = [nsg_status]
print(nsg_status)

# OPTION 2 - Read all the NSG logs at once with a "*" wildcard path like below
value = "abfs://insights-logs-networksecuritygroupflowevent@<storage_account>.dfs.core.windows.net/resourceId=/SUBSCRIPTIONS/*/RESOURCEGROUPS/*/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/*/y={year}/m={month}/d={day}/h={hours}"
key = "All_Subscriptions"

To determine when to use Option 1 or Option 2 for reading NSG flow logs, you can consider the following factors:

Number of subscriptions: Option 1 is recommended when the number of subscriptions in Azure is quite high. In this case, Option 1 allows you to process the NSG logs in parallel using multithreading, utilizing the full processing power of your system. The snippet in the next section uses concurrent.futures.ThreadPoolExecutor to create a pool of threads (75 in this case) and processes each subscription in parallel. This approach helps distribute the workload across multiple threads and improves overall processing efficiency.
Total size of NSG flow events: Option 1 is also preferable when the total size of the NSG flow events per day is quite high. By utilizing multithreading, you can process multiple NSG logs concurrently, reducing the overall processing time. This is particularly beneficial when dealing with large amounts of data, as it allows for parallel processing and efficient utilization of system resources.
Python
import concurrent.futures

def parallel(x):
    try:
        value = nsg_dict[x]
        key = x
        print("Subscriptions:", key)
        processNSGRule(value, key)
    except Exception as e:
        print(e)

with concurrent.futures.ThreadPoolExecutor(max_workers=75) as executor:
    executor.map(parallel, nsg_dict)

Simplicity and resource constraints: Option 2 is recommended when the number of subscriptions is manageable and the total size of NSG flow events per day is relatively small. Option 2 reads the NSG logs with a single wildcard path in one read operation, making it a straightforward and simpler approach. This approach is suitable when resource constraints or processing time are not major concerns.

In summary, if you have a large number of subscriptions in Azure or the total size of NSG flow events per day is significant, Option 1, using parallel processing with multithreading, is recommended. This allows for efficient utilization of system resources and faster processing. On the other hand, if the number of subscriptions is manageable and the total size of NSG flow events is relatively small, Option 2 provides a simpler, more straightforward approach.

Load the NSG Flow File

Once the files and directories from the storage account are retrieved, the next step is to load them into Spark as a DataFrame. In the code snippet below, the recursiveFileLookup option is used, which means that Spark will traverse through the "hour", "minute", and "macAddress" directories within the file path even after reaching the "day" folder.

When loading JSON files into Spark, the inferSchema option is enabled by default. This allows Spark to analyze the JSON files and automatically infer the schema while loading them into a DataFrame. However, there are some downsides to using inferSchema. One downside is that Spark needs to read each file to analyze its schema before loading it. This process can have a significant impact on the performance of Spark, especially when dealing with a large number of files or large file sizes. Reading and analyzing the schema of each file individually can be time-consuming and resource-intensive.

To overcome this, it is strongly recommended to provide a predefined schema while loading the JSON files into Spark. By providing a schema, Spark can bypass the schema inference step and directly load the files based on the provided schema. This approach improves the performance of Spark by eliminating the need for schema analysis for each file.

Creating a schema for the JSON files can be done manually by defining the structure and data types of the JSON fields. This can be achieved using the StructType and StructField classes in Spark. Once the schema is defined, it can be passed to the spark.read.json() method as an argument, ensuring that Spark uses the predefined schema for loading the files. Please refer to the NSG Log Schema section above.

By providing a predefined schema, Spark can efficiently load the JSON files as DataFrames without the overhead of schema inference. This approach enhances the performance of Spark, especially when dealing with large volumes of data. Additionally, it provides better control over the schema and ensures consistency in the data structure, improving the reliability of subsequent data processing and analysis tasks.
Python spark.read.option("recursiveFileLookup","true").format("json").schema(insightsNeworkFlowEventsSchema).load(FILE_LOCATION) Parsing the NSG Flow Events JSON File The provided code defines a function called NSGruleDef that processes NSG (Network Security Group) flow logs using Spark DataFrames. Let's break down the code step by step: Loading NSG Flow Logs: spark.read.option("recursiveFileLookup", "true")sets the recursive file lookup option to enable traversal through nested directories. .format("json") specifies that the files being loaded are in JSON format. .schema(insightsNeworkFlowEventsSchema) specifies the predefined schema (insightsNeworkFlowEventsSchema) to be used while loading the JSON files. .load(filepath) loads the JSON files from the provided filepath into a DataFrame called loadNSGDF. Exploding Nested Structures: explodeNSGDF = loadNSGDF.select(explode("records").alias("record")) uses the explode function to flatten the nested records structure within loadNSGDF DataFrame. Each record is treated as a separate row in explodeNSGDF. parsedNSGDF = explodeNSGDF.select(col("record.resourceId").alias("resource_id"),col("record.properties").getField("flows").alias("flows")) extracts specific columns from explodeNSGDF, including resourceId and flows (which represent the flow data). Exploding Flow Tuples: explodeFlowsDF = parsedNSGDF.withColumn("flow", explode("flows")).select("resource_id", col("flow.rule").alias("rule_name"), col("flow.flows.flowTuples").alias("flow_tuples")) uses the explode function again to expand the flows column into multiple rows, creating a new column called flow. It also selects the resource_id, rule_name, and flow_tuples columns. Filtering NSG Allow Rules: filterNSGAllowDF = explodeFlowsDF.where(~col('rule_name').contains('Deny')) filters out the rows where the rule_name column does not contain the string 'Deny'. This step retains only the rows representing allowed (non-denied) rules. Plain Text Please note that in order to effectively manage permissive rules of the network Security group, in our case, we propose the rules to be set only for the Allowed Inbound and Outbound Rules within the network Security group. Thus, we ignore denied rules in this case. Exploding Flow Tuples: explodeFlowTuplesDF = filterNSGAllowDF.select("resource_id", "rule_name", explode(col("flow_tuples")).alias("flow_rules")) further expands theflow_tuples column into separate rows using the explode function. It creates a new DataFrame called explodeFlowTuplesDF with columns resource_id, rule_name, and flow_rules. 
Grouping flow tuples:
groupFlowTuplesDF = explodeFlowTuplesDF.groupBy("resource_id", "rule_name").agg(collect_set("flow_rules").alias("collect_flow_tuples")) groups the rows by resource_id and rule_name and collects the distinct flow tuples for each group.

Python
from pyspark.sql.functions import explode, col, collect_set

def processNSGRule(filepath, subscription):
    loadNSGDF = spark.read.option("recursiveFileLookup", "true").format("json").schema(insightsNeworkFlowEventsSchema).load(filepath)
    explodeNSGDF = loadNSGDF.select(explode("records").alias("record"))
    parsedNSGDF = explodeNSGDF.select(col("record.resourceId").alias("resource_id"), col("record.properties").getField("flows").alias("flows"))
    explodeFlowsDF = parsedNSGDF.withColumn("flow", explode("flows")).select("resource_id", col("flow.rule").alias("rule_name"), col("flow.flows.flowTuples").alias("flow_tuples"))
    filterNSGAllowDF = explodeFlowsDF.where(~col('rule_name').contains('Deny'))
    explodeFlowTuplesDF = filterNSGAllowDF.select("resource_id", "rule_name", explode(col("flow_tuples")).alias("flow_rules"))
    groupFlowTuplesDF = explodeFlowTuplesDF.groupBy("resource_id", "rule_name").agg(collect_set("flow_rules").alias("collect_flow_tuples"))
    collectFlowTuplesDF = groupFlowTuplesDF.select("resource_id", "rule_name", collect_arrays_udf("collect_flow_tuples").alias("flow_rules")).select("resource_id", "rule_name", "flow_rules")

Selecting columns and applying the UDF:
groupFlowTuplesDF.select("resource_id", "rule_name", collect_arrays_udf("collect_flow_tuples").alias("flow_rules")) selects the columns resource_id, rule_name, and collect_flow_tuples from the groupFlowTuplesDF DataFrame. collect_arrays_udf refers to a user-defined function (UDF) that takes the collect_flow_tuples column as input and merges multiple arrays into one array. The UDF aggregates the elements of the collect_flow_tuples column into a single array, and the resulting column is aliased as flow_rules.

Selecting final columns:
.select("resource_id", "rule_name", "flow_rules") selects the columns resource_id, rule_name, and flow_rules from the intermediate DataFrame. This step ensures that the final DataFrame, named collectFlowTuplesDF, contains the desired columns for further processing or analysis.

Python
from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StringType

def collect_arrays(list_array):
    # Merge a list of arrays into one flat array
    collect_array = []
    for i in range(len(list_array)):
        collect_array.extend(list_array[i])
    return collect_array

collect_arrays_udf = udf(collect_arrays, ArrayType(StringType()))

The code processes the NSG flow logs, extracts relevant information, and transforms the data into a structured format using Spark DataFrames. The resulting DataFrame, collectFlowTuplesDF, contains the resource ID, rule name, and the corresponding flow rules for each NSG.

NSG Data Transformation in Spark

Network security group data is loaded as a Spark DataFrame with multiple records. Each record in the DataFrame contains information about a network security group and its respective rules and flow tuples. To process this data effectively, the records are split into three columns, resourceID, ruleName, and flowTuples, following the parsing steps for NSG flow events described above. This allows for easier analysis and manipulation of the data. To combine the flowTuples belonging to the same resourceID and ruleName into a single row, the groupBy operation is used on the DataFrame, grouping the data based on these two columns. This process may appear straightforward, but in Spark it actually creates partitions based on resourceID and ruleName.
In essence, the groupBy operation performs a similar function to a repartition(resourceID, ruleName) operation in Spark. Using groupBy has its advantages and disadvantages. Let's start with the positive aspects. If the NSG user rules and default rules are evenly distributed and created with less permissiveness from the beginning, then there won't be any major issues. The groupBy operation will successfully combine the flowTuples for each resourceID and ruleName, resulting in the desired output.

However, there are negative aspects to consider when using groupBy. One of the main concerns is that it can create uneven partitions, leading to data skew. Data skew occurs when the distribution of data across partitions is not balanced, causing certain partitions to contain significantly more data than others. This can have a negative impact on the performance of Spark jobs, as some partitions may take longer to process, resulting in a bottleneck. In some cases, the data skew can be severe enough that a Spark executor exceeds the maximum buffer size allowed by the Kryo serializer, which is typically set to 2GB. When this happens, the job fails to execute successfully. To avoid such failures, it is crucial to carefully analyze the data and determine whether using groupBy is the best approach.

Data Skew

Spark data skew refers to an imbalance in the distribution of data across partitions when performing parallel processing in Apache Spark. This skew can occur when the data has a non-uniform distribution or when certain keys or values are disproportionately more common than others. This leads to some partitions processing much larger amounts of data than others, causing a performance bottleneck and hindering the efficiency of Spark computations.

To explain this concept visually, let's consider a simple example. Imagine we have a dataset of customer transactions, where each transaction is associated with a customer ID and a monetary value. We want to perform some aggregations on this dataset, such as calculating the total transaction amount for each customer. In a distributed environment, the data is divided into multiple partitions, and each partition is processed independently by a worker node. In an ideal scenario, the data would be evenly distributed across partitions, as shown below:

Plain Text
Partition 1: [Customer1, Customer2, Customer3, Customer4]
Partition 2: [Customer5, Customer6, Customer7, Customer8]

In this balanced scenario, each partition contains an equal number of customers, and the processing is distributed evenly across the worker nodes, resulting in optimal performance. However, in the case of data skew, the distribution is imbalanced, as illustrated below:

Plain Text
Partition 1: [Customer1, Customer2, Customer3, Customer4, Customer5, Customer6]
Partition 2: [Customer7, Customer8]

In this skewed scenario, Partition 1 has a significantly larger number of customers compared to Partition 2. As a result, the worker processing Partition 1 will have a heavier workload, while the worker processing Partition 2 will finish its task much faster. This imbalance leads to a performance bottleneck, as the overall processing time is determined by the slowest worker. To address data skew, Spark provides techniques such as data repartitioning, which redistributes the data across partitions in a more balanced manner.
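As a concrete illustration of one such repartitioning trick, the following minimal PySpark sketch spreads a skewed key over several partitions by salting it. The DataFrame and column names here are hypothetical and are not part of this article's NSG code:

Python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, concat_ws, rand, sum as spark_sum

spark = SparkSession.builder.getOrCreate()

# Hypothetical skewed dataset: most rows belong to a single customer
txns = spark.createDataFrame(
    [("Customer1", 10.0)] * 1000 + [("Customer2", 5.0)] * 10,
    ["customer_id", "amount"])

# Add a random salt (0-9) so the hot key is spread across up to 10 partitions
salted = txns.withColumn(
    "salted_key",
    concat_ws("_", col("customer_id"), (rand() * 10).cast("int").cast("string")))

# Aggregate on the salted key first, then roll the partial results up to the real key
partial = salted.groupBy("salted_key", "customer_id").agg(spark_sum("amount").alias("partial_amount"))
totals = partial.groupBy("customer_id").agg(spark_sum("partial_amount").alias("total_amount"))
totals.show()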
One such technique is salting, illustrated in the sketch above: a random value is added as a prefix to the key (in this case, the customer ID) to ensure a more uniform distribution of data across partitions. By achieving a balanced data distribution across partitions, Spark can leverage the full parallel processing capabilities of the cluster, improving overall performance and eliminating bottlenecks caused by data skew.

Now we understand what data skew is and how it impacts NSG data analysis. When we do not use groupBy, the output can be noticeably different: the result will contain less permissive rules, as they won't be consolidated. To consolidate the rules again, a different script would be required; alternatively, a weekly or monthly consolidation script can be implemented to overcome this problem and ensure that the rules are appropriately combined.

While the process of loading NSG data as a Spark DataFrame and using groupBy to combine flowTuples appears simple, it involves partitioning the data based on resourceID and ruleName. The use of groupBy can have positive implications when the NSG rules are evenly distributed and less permissive. However, it can also lead to uneven partitions and data skew, potentially causing Spark executor failures. It is crucial to analyze the data and determine whether using groupBy is the most suitable approach, considering the potential downsides and exploring alternative consolidation strategies if necessary.

Core UDFs and Functions To Define NSG Rules

The code includes several user-defined functions (UDFs) and helper functions to perform specific tasks. These functions include:

validateIPv4Address

This function checks whether an IP address is a valid IPv4 address. It is important to verify that an IP address is in IPv4 format because the NSG rule logic in this code works specifically with IPv4 addresses.

Python
from ipaddress import IPv4Address

def validateIPv4Address(ipv4string):
    try:
        ip_object = IPv4Address(ipv4string)
        return True
    except ValueError:
        return False

Explanation of the function:

The function validateIPv4Address takes an ipv4string as input.
To validate IPv4 addresses, it uses the ipaddress module, which ships with the Python 3 standard library, so no extra installation is needed.
The function returns False if constructing an IPv4Address from the string raises a ValueError.
The function returns True if the check passes and the provided string is a valid IPv4 address.

check_subnet

The following function, check_subnet, determines the destination IP network range based on how many distinct destination prefixes were observed. Here is how the function works:

Python
def check_subnet(dest_four, dest_three, dest_two, dest_one, dest_ip):
    """
    check_subnet checks each destination IP based on the number of distinct
    four/three/two/one-octet destination prefixes and returns the destination
    IP network range.
    :param dest_four: Number of distinct four-octet destination IPs
    :param dest_three: Number of distinct three-octet destination prefixes
    :param dest_two: Number of distinct two-octet destination prefixes
    :param dest_one: Number of distinct one-octet destination prefixes
    :param dest_ip: Destination IP address
    :return: Destination IP network range
    """
    if dest_four <= 10:
        destination = f"{dest_ip}/32"
    elif dest_three <= 10:
        dest_ip_part = dest_ip.split(".")
        dest_ip = ".".join(dest_ip_part[:3])
        destination = f"{dest_ip}.0/24"
    elif dest_two <= 10:
        dest_ip_part = dest_ip.split(".")
        dest_ip = ".".join(dest_ip_part[:2])
        destination = f"{dest_ip}.0.0/16"
    elif dest_one <= 10:
        dest_ip_part = dest_ip.split(".")
        dest_ip = ".".join(dest_ip_part[:1])
        destination = f"{dest_ip}.0.0.0/8"
    else:
        destination = "0.0.0.0/0"
    return destination

Explanation of the function:

The function checks the counts of distinct destination prefixes against a threshold to determine the appropriate IP network range.

Plain Text
Note: In this example, I used 10 as the threshold, but it can be set according to your needs.

If the number of distinct four-octet destination IPs (dest_four) is less than or equal to 10, the network range is the destination IP address itself with a subnet mask of /32 (indicating a single IP address).
If the number of distinct three-octet destination prefixes (dest_three) is less than or equal to 10, the function splits the destination IP address on the dot ('.') separator and joins the first three octets. The network range is this joined prefix with a subnet mask of /24 (indicating a range of 256 IP addresses).
If the number of distinct two-octet destination prefixes (dest_two) is less than or equal to 10, the function similarly joins the first two octets of the destination IP address and sets the network range to this prefix with a subnet mask of /16 (indicating a range of 65,536 IP addresses).
If the number of distinct one-octet destination prefixes (dest_one) is less than or equal to 10, the function joins the first octet of the destination IP address and sets the network range to this prefix with a subnet mask of /8 (indicating a range of 16,777,216 IP addresses).
If none of the above conditions are met, the destinations are too spread out to summarize, and the network range is set to the default value of 0.0.0.0/0 (indicating all IP addresses).

Finally, the function returns the determined destination IP network range. Using this function, you can determine the appropriate IP network range for a destination based on how many distinct destination prefixes appear in the flow data.

group_ip_network

This function groups source and destination IP addresses into their respective IP networks and combines them with source ports, destination ports, and protocols. The code shown further below defines group_ip_network, which processes a list of flow tuples related to a network security group (NSG), converts source and destination IP addresses into source and destination networks, and combines them with ports and protocols. Here is how the function works:

The function initializes several empty sets and a dictionary called ip_network_dict to store the IP network ranges.
It iterates over each flow tuple in the flow_list parameter. Within the loop, the function performs the following actions for each flow tuple:

Adds the tuple's source and destination IP addresses, together with the ports, protocol, and direction, as a full four-octet entry to the four_octets_set set.
Splits the source and destination IP addresses on the dot ('.') separator and adds the three-octet prefixes to the respective sets (source_three_octets_set and dest_three_octets_set).
Adds the combination of the three-octet source prefix, destination IP address, ports, protocol, and direction as a tuple to the three_octets_set set.
Performs similar actions for the two-octet and one-octet prefixes, adding them to their respective sets (source_two_octets_set, dest_two_octets_set, two_octets_set, source_one_octet_set, dest_one_octet_set, one_octet_set).

The function then checks the sizes of the sets containing the distinct prefixes of the source IP addresses (source_four_octets_set, source_three_octets_set, source_two_octets_set, source_one_octet_set):

If the size of source_four_octets_set is less than or equal to 10, the function iterates over the four_octets_set set and creates the destination IP network ranges using the check_subnet function. It combines the source network, destination network, destination port, protocol, and direction into a string (source) and stores it in the ip_network_dict dictionary, where this string is the key and the associated source ports are stored as a list. If the key already exists in the dictionary, the source ports are appended to the existing list.
Similar actions are performed when source_three_octets_set, source_two_octets_set, or source_one_octet_set is small enough; the appropriate subnet mask is used based on the prefix length.
If none of the above conditions are met, the source IP addresses are too spread out to summarize, and the function assigns a source IP network range of 0.0.0.0/0 (indicating all IP addresses).

The function then joins the elements of the ip_network_dict dictionary into a list called socket_range. Each element is a string consisting of the source IP network range, destination IP network range, destination port, protocol, direction, and the collected source ports, separated by semicolons.

Finally, the function returns the socket_range list, which contains the combined information of source IP network ranges, destination IP network ranges, ports, and protocols for each flow tuple.

In summary, the group_ip_network function processes flow tuples from a network security group and converts IP addresses into IP network ranges. It combines them with ports and protocols, storing the information in a dictionary and returning a list of strings representing the combined data.
Python 
def group_ip_network(flow_list):
    ip_network_dict = dict()
    four_octets_set = set()
    source_four_octets_set = set()
    dest_four_octets_set = set()
    source_three_octets_set = set()
    dest_three_octets_set = set()
    three_octets_set = set()
    source_two_octets_set = set()
    dest_two_octets_set = set()
    two_octets_set = set()
    source_one_octet_set = set()
    dest_one_octet_set = set()
    one_octet_set = set()
    for ip in flow_list:
        four_octets_set.add((ip[1], ip[2], ip[3], ip[4], ip[5], ip[6]))
        source_ip_parts = ip[1].split(".")
        destination_ip_parts = ip[2].split(".")
        # Four Octet List of Source and Destination IP
        source_four_octets_set.add(ip[1])
        dest_four_octets_set.add(ip[2])
        # Three Octet List of Source and Destination IP
        source_three_octets_set.add(".".join(source_ip_parts[:3]))
        dest_three_octets_set.add(".".join(destination_ip_parts[:3]))
        three_octets_set.add((".".join(source_ip_parts[:3]), ip[2], ip[3], ip[4], ip[5], ip[6]))
        # Two Octet List of Source and Destination IP
        source_two_octets_set.add(".".join(source_ip_parts[:2]))
        dest_two_octets_set.add(".".join(destination_ip_parts[:2]))
        two_octets_set.add((".".join(source_ip_parts[:2]), ip[2], ip[3], ip[4], ip[5], ip[6]))
        # One Octet List of Source and Destination IP
        source_one_octet_set.add(".".join(source_ip_parts[:1]))
        dest_one_octet_set.add(".".join(destination_ip_parts[:1]))
        one_octet_set.add((".".join(source_ip_parts[:1]), ip[2], ip[3], ip[4], ip[5], ip[6]))
    # If conditions check the length of four/three/two/one octets of source IP list
    # If the length is less than or equal to 10, it returns Source IP network ranges
    if len(source_four_octets_set) <= 10:
        for four_octet in four_octets_set:
            destination = check_subnet(len(dest_four_octets_set), len(dest_three_octets_set),
                                       len(dest_two_octets_set), len(dest_one_octet_set), four_octet[1])
            source = f"{four_octet[0]}/32;{destination};{four_octet[3]};{four_octet[4]};{four_octet[5]}"
            s_port = four_octet[2]
            if source in ip_network_dict:
                ip_network_dict[source].extend([s_port])
            else:
                ip_network_dict[source] = [s_port]
    elif len(source_three_octets_set) <= 10:
        for three_octet in three_octets_set:
            destination = check_subnet(len(dest_four_octets_set), len(dest_three_octets_set),
                                       len(dest_two_octets_set), len(dest_one_octet_set), three_octet[1])
            source = f"{three_octet[0]}.0/24;{destination};{three_octet[3]};{three_octet[4]};{three_octet[5]}"
            s_port = three_octet[2]
            if source in ip_network_dict:
                ip_network_dict[source].extend([s_port])
            else:
                ip_network_dict[source] = [s_port]
    elif len(source_two_octets_set) <= 10:
        for two_octet in two_octets_set:
            destination = check_subnet(len(dest_four_octets_set), len(dest_three_octets_set),
                                       len(dest_two_octets_set), len(dest_one_octet_set), two_octet[1])
            source = f"{two_octet[0]}.0.0/16;{destination};{two_octet[3]};{two_octet[4]};{two_octet[5]}"
            s_port = two_octet[2]
            if source in ip_network_dict:
                ip_network_dict[source].extend([s_port])
            else:
                ip_network_dict[source] = [s_port]
    elif len(source_one_octet_set) <= 10:
        for one_octet in one_octet_set:
            destination = check_subnet(len(dest_four_octets_set), len(dest_three_octets_set),
                                       len(dest_two_octets_set), len(dest_one_octet_set), one_octet[1])
            s_port = one_octet[2]
            source = f"{one_octet[0]}.0.0.0/8;{destination};{one_octet[3]};{one_octet[4]};{one_octet[5]}"
            if source in ip_network_dict:
                ip_network_dict[source].extend([s_port])
            else:
                ip_network_dict[source] = [s_port]
    else:
        for octet in four_octets_set:
            destination = check_subnet(len(dest_four_octets_set), len(dest_three_octets_set),
                                       len(dest_two_octets_set), len(dest_one_octet_set), octet[1])
            source = f"0.0.0.0/0;{destination};{octet[3]};{octet[4]};{octet[5]}"
            s_port = octet[2]
            if source in ip_network_dict:
                ip_network_dict[source].extend([s_port])
            else:
                ip_network_dict[source] = [s_port]
    # Join the elements of ip_network_dict dictionary into a list with ';'
    socket_range = []
    for key, value in ip_network_dict.items():
        socket_range.append(f"{key};{value}")
    return socket_range
dest_ip_network_range
This function classifies destination IP addresses into Class A/B/C and other IP ranges and groups them into IP network ranges. The provided code is a function named classify_destination_ips that takes a list of destination IPs as input and classifies them into different IP ranges. Here's a breakdown of how the code works: The function initializes several empty lists: ip_ranges, class_a, class_b, class_c, and other_ips. These lists will be used to store the classified IP addresses. The code then iterates over each destination IP in the destinations list. Inside the loop, it first checks if the destination IP is a valid IPv4 address by calling the validate_ipv4_address function. If it is a valid IP, the code proceeds with the classification. For each valid IP, it checks if it falls within a specific IP range. If the IP falls within the range of Class A (10.0.0.0 - 10.255.255.255), it adds the destination to the class_a list. If it falls within the range of Class B (172.16.0.0 - 172.31.255.255), it adds the destination to the class_b list. If it falls within the range of Class C (192.168.0.0 - 192.168.255.255), it adds the destination to the class_c list. If it doesn't fall into any of these ranges, it adds the destination to the other_ips list. After classifying all the IP addresses, the code checks if there are any IP addresses in the other_ips list. If there are, it calls the group_ip_network function to group them into IP ranges and appends the resulting ranges to the ip_ranges list. Similarly, the code checks if there are any IP addresses in the class_a, class_b, and class_c lists. If there are, it calls the group_ip_network function for each list and appends the resulting ranges to the ip_ranges list. Finally, the function returns the ip_ranges list, which contains all the classified IP ranges.
Python 
def classify_destination_ips(destinations):
    """
    Classify the destination IPs of flow tuples based on Class A/B/C and other IP
    :param destinations: List of destination IPs
    :return: List of IP ranges
    """
    ip_ranges = []
    class_a = []
    class_b = []
    class_c = []
    other_ips = []
    for dest in destinations:
        if validate_ipv4_address(dest[2]):
            if IPv4Address(dest[2]) >= IPv4Address('10.0.0.0') and IPv4Address(dest[2]) <= IPv4Address('10.255.255.255'):
                class_a.append(dest)
            elif IPv4Address(dest[2]) >= IPv4Address('172.16.0.0') and IPv4Address(dest[2]) <= IPv4Address('172.31.255.255'):
                class_b.append(dest)
            elif IPv4Address(dest[2]) >= IPv4Address('192.168.0.0') and IPv4Address(dest[2]) <= IPv4Address('192.168.255.255'):
                class_c.append(dest)
            else:
                other_ips.append(dest)
    if len(other_ips) > 0:
        other_ip_ranges = group_ip_network(other_ips)
        ip_ranges += other_ip_ranges
    if len(class_a) > 0:
        class_a_ranges = group_ip_network(class_a)
        ip_ranges += class_a_ranges
    if len(class_b) > 0:
        class_b_ranges = group_ip_network(class_b)
        ip_ranges += class_b_ranges
    if len(class_c) > 0:
        class_c_ranges = group_ip_network(class_c)
        ip_ranges += class_c_ranges
    return ip_ranges
source_ip_network_range
This function classifies source IP addresses into Class A/B/C and other IP ranges and groups them into IP network ranges. The function source_ip_network_range is renamed to classify_source_ip_ranges to provide a more descriptive name. Variable names are modified to follow the lowercase-with-underscores style for improved readability. The docstring remains the same to explain the purpose of the function and its parameters. The range function is replaced with a loop that directly iterates over the sources list using the variable source. The condition (validate_ipv4_address(flow_list[1]) == True and flow_list[7] == "A") is simplified to validate_ipv4_address(flow_list[1]) and flow_list[7] == "A". The variables classA, classB, classC, and otherip are changed to class_a, class_b, class_c, and other_ip, respectively, following the lowercase-with-underscores style. The check for IP ranges is now performed using the ip_address variable instead of calling IPv4Address multiple times for each range comparison. The addition of lists ipranges = ipranges + otheripranges is replaced with the augmented assignment operator += for conciseness. The function call dest_ip_network_range is changed to ip_network_range_udf, assuming it is defined elsewhere in the code.
Python 
def classify_source_ip_ranges(sources):
    """
    Classify the source IPs of flow tuples based on Class A/B/C and other IP ranges.
    :param sources: List of flow tuples.
    :return: List of IP ranges.
""" ip_ranges = [] class_a = [] class_b = [] class_c = [] other_ip = [] for source in sources: flow_list = source.split(",") if validate_ipv4_address(flow_list[1]) and flow_list[7] == "A": ip_address = IPv4Address(flow_list[1]) if ip_address >= IPv4Address('10.0.0.0') and ip_address <= IPv4Address('10.255.255.255'): class_a.append(flow_list) elif ip_address >= IPv4Address('172.16.0.0') and ip_address <= IPv4Address('172.31.255.255'): class_b.append(flow_list) elif ip_address >= IPv4Address('192.168.0.0') and ip_address <= IPv4Address('192.168.255.255'): class_c.append(flow_list) else: other_ip.append(flow_list) if len(other_ip) > 0: other_ip_ranges = dest_ip_network_range(other_ip) ip_ranges += other_ip_ranges if len(class_a) > 0: class_a_ranges = dest_ip_network_range(class_a) ip_ranges += class_a_ranges if len(class_b) > 0: class_b_ranges = dest_ip_network_range(class_b) ip_ranges += class_b_ranges if len(class_c) > 0: class_c_ranges = dest_ip_network_range(class_c) ip_ranges += class_c_ranges return ip_ranges ip_network_range_udf = udf(classify_source_ip_ranges) classify_source_ip_ranges that classifies source IP addresses of flow tuples into different categories based on their IP class (A, B, C) or other IP addresses. The function takes a parameter called sources, which is nothing but the list of flow tuples. Inside the function, there are four empty lists created: classA, classB, classC, and otherip. These lists will be used to store the flow tuples based on their IP classification. The code then iterates over each element in the sources list using a for loop. In each iteration, the current element (a flow tuple) is split by a comma (,) to extract its individual components. The flow tuple is then checked for two conditions: whether the second element, i.e., Source IP Address (index 1), is a valid IPv4 address and whether the eighth element, i.e., Policy of the rule (Allow or Deny) (index 7) is equal to 'A.' If both conditions are met, the flow tuple is classified based on its IP class (A, B, or C) by comparing the IP address with predefined IP ranges. If the IP address falls within the range of Class A IP addresses (10.0.0.0 - 10.255.255.255), the flow tuple is added to the class A list. Similarly, if it falls within the range of Class B IP addresses (172.16.0.0 - 172.31.255.255) or Class C IP addresses (192.168.0.0 - 192.168.255.255) it is added to the respective class B or class C list. If the IP address does not fall within any of these ranges, it is considered as "other" and added to the otherip list. After processing all flow tuples in the sources list, the code checks if there are any flow tuples in the other list. If so, it calls a function called dest_ip_network_range to perform a similar IP classification for the destination IP addresses of these flow tuples. The resulting IP ranges are then added to the ip_ranges list. The same process is repeated for the class A, class B, and class C lists if they contain any flow tuples. The IP ranges from the destination IP classification are added to theip_ranges list. Finally, the function returns theip_ranges list, which contains the IP ranges for the classified source IP addresses. The last line of code defines a user-defined function (UDF) called ip_network_range_udfusing the udf function. This UDF can be used with Spark DataFrame operations to apply the source_ip_network_range function on the data. Using the defined UDFs and functions, the code performs a permission check on NSG rules. 
The pipeline processes flow tuples from the NSG logs and groups them based on IP network ranges, source ports, destination ports, and protocols. It also classifies IP addresses into different categories, such as Class A/B/C and other IPs, and creates IP network ranges accordingly.
Final NSG Daily Rule
The code provided below consists of a series of transformations applied to a DataFrame named dataframe for batch processing. Let's break down the code step by step:
coreUDFNSGDF Transformation:
coreUDFNSGDF = dataframe.withColumn("flow_tuples_network", ip_network_range_udf("flow_rules")).select("system_id", "rule_name", "flow_tuples_network")
This step applies a user-defined function (UDF) called ip_network_range_udf to the "flow_rules" column of the input DataFrame. The UDF transforms the "flow_rules" values into a network range representation and creates a new column called "flow_tuples_network." The resulting DataFrame, coreUDFNSGDF, selects the columns "system_id," "rule_name," and "flow_tuples_network."
splitNSGRuleDF Transformation:
splitNSGRuleDF = coreUDFNSGDF.select(col("system_id"), col("rule_name"), split(col("flow_tuples_network"), "],").alias("flow_net_array"))
This step splits the "flow_tuples_network" column by the delimiter "],". The resulting DataFrame, splitNSGRuleDF, keeps the columns "system_id" and "rule_name" and creates a new column, "flow_net_array," that contains the split values.
explodeNSGRuleDF Transformation:
explodeNSGRuleDF = splitNSGRuleDF.select("system_id", "rule_name", explode("flow_net_array").alias("flow_explode"))
This step uses the explode function to expand the "flow_net_array" column, resulting in one row for each element in the array. The resulting DataFrame, explodeNSGRuleDF, keeps the columns "system_id" and "rule_name" and creates a new column, "flow_explode."
regexNSGRuleDF Transformation:
regexNSGRuleDF = explodeNSGRuleDF.select(col("system_id"), col("rule_name"), F.regexp_replace(F.col("flow_explode"), "[\[\]]", "").alias("flow_range"))
This step uses the regexp_replace function to remove the square brackets "[" and "]" from the "flow_explode" column values. The resulting DataFrame, regexNSGRuleDF, keeps the columns "system_id" and "rule_name" and creates a new column, "flow_range."
finalNSGRuleDF Transformation:
finalNSGRuleDF = regexNSGRuleDF.select(col("system_id"), col("rule_name"), split(col("flow_range"), ";").alias("flow_array")).select(col("system_id"), col("rule_name"), col("flow_array")[0].alias("sourcerange"), col("flow_array")[1].alias("destinationrange"), col("flow_array")[2].alias("destination_ports"), col("flow_array")[3].alias("protocols"), col("flow_array")[4].alias("policy"), col("flow_array")[5].alias("source_portlist"))
This step splits the "flow_range" column by the delimiter ";" and creates a new column, "flow_array," with the split values. The resulting DataFrame, finalNSGRuleDF, keeps the columns "system_id" and "rule_name" and extracts specific elements from the "flow_array" column using indexing. The extracted elements are given meaningful aliases such as "sourcerange," "destinationrange," "destination_ports," "protocols," "policy," and "source_portlist."
Writing the DataFrame:
finalNSGRuleDF.write.format('parquet').save(output_file_path)
This step writes the finalNSGRuleDF DataFrame in the Parquet file format and saves it to the specified output file path.
Overall, the provided code performs a series of transformations on the input DataFrame, splitting columns, extracting specific values, and finally saving the transformed DataFrame in the Parquet format.
Python 
def processBatchData(input_dataframe):
    coreUDFNSGDF = input_dataframe.withColumn("flow_tuples_network", ip_network_range_udf("flow_rules")).select("system_id", "rule_name", "flow_tuples_network")
    splitNSGRuleDF = coreUDFNSGDF.select(col("system_id"), col("rule_name"), split(col("flow_tuples_network"), "],").alias("flow_net_array"))
    explodeNSGRuleDF = splitNSGRuleDF.select("system_id", "rule_name", explode("flow_net_array").alias("flow_explode"))
    regexNSGRuleDF = explodeNSGRuleDF.select(col("system_id"), col("rule_name"), F.regexp_replace(F.col("flow_explode"), "[\[\]]", "").alias("flow_range"))
    finalNSGRuleDF = regexNSGRuleDF.select(col("system_id"), col("rule_name"), split(col("flow_range"), ";").alias("flow_array")).select(col("system_id"), col("rule_name"), col("flow_array")[0].alias("sourcerange"), col("flow_array")[1].alias("destinationrange"), col("flow_array")[2].alias("destination_ports"), col("flow_array")[3].alias("protocols"), col("flow_array")[4].alias("policy"), col("flow_array")[5].alias("source_portlist"))
    finalNSGRuleDF.write.format('parquet').save(output_file_path)
Consolidate NSG Rule
As discussed in the NSG data transformation in the Spark section, consolidating the proposed rule on a daily, weekly, or monthly basis is considered a good practice in network security. This process involves reviewing and analyzing the rules that have been suggested for implementation. By consolidating the proposed rules, you ensure that the network security group (NSG) is effectively managed and operates optimally. Throughout the document, I discussed two types of rules: less permissive rules and over-permissive rules. Less permissive rules refer to rules that have stricter access controls, providing limited access to network resources. On the other hand, over-permissive rules are rules that have looser access controls, potentially granting more access than necessary. These concepts were discussed at a high level in the document. The default rules in a network security group are the rules that are applied when no specific rules are defined for a particular network resource. While default rules may provide some basic level of security, they are not always considered good practice. Relying solely on default rules indicates a poorly managed network security group. It is important to define specific rules for each network resource to ensure proper security measures are in place. In contrast to default rules, user rules are defined based on their relative importance. User rules prioritize specific requirements and access needs for different network resources. However, user rules may not always be precise, as they are subjective and can vary depending on individual perspectives and requirements. To address these challenges, it is suggested to propose rules based on daily NSG logs. By analyzing the logs generated by the NSG on a daily basis, you can gain insights into the actual network traffic patterns, security incidents, and potential vulnerabilities. Consolidating these rules daily allows for a more accurate and up-to-date understanding of the network security requirements. Furthermore, when consolidating the proposed rules, it is advisable to define them as absolutely as possible. This means setting rules that are clear, unambiguous, and strictly enforceable.
Absolute rules provide a higher level of certainty and help minimize any potential misinterpretation or misconfiguration that could lead to security breaches. In the process of consolidating and defining rules, collaboration with the network security team is crucial. Working together with the team allows for a comprehensive understanding of the network infrastructure and helps in identifying the confidence level of the proposed rules. Utilizing Apache Spark, a powerful data processing framework, for this analysis can aid in extracting insights from the NSG logs and assist in determining the effectiveness and reliability of the proposed rules. Conclusion Micro-segmentation and permissive rule checks in network security groups play a vital role in maintaining a secure and compliant network infrastructure. By leveraging Apache Spark's distributed computing capabilities, organizations can efficiently analyze and validate NSG rules, ensuring that the defined rules are permissive enough to allow legitimate traffic while maintaining a high level of security. Automating the permission check process using Apache Spark not only enhances network security but also simplifies network management and enables rapid detection and response to potential security incidents. By implementing micro-segmentation and adhering to permissive rule best practices, organizations can strengthen their network security posture, minimize the attack surface, and protect their sensitive data from unauthorized access. In an ever-evolving threat landscape, organizations need to prioritize network security and adopt advanced techniques like micro-segmentation and permissive rule checks. By staying proactive and leveraging the power of technologies like Apache Spark, organizations can effectively mitigate security risks and maintain a robust network infrastructure.
This is an article from DZone's 2023 Containers Trend Report.For more: Read the Report There are few technologies that have seen as much attention and adoption over the last decade as Infrastructure as Code (IaC) and containers. In those years, both have improved and eventually converged to form a modern standard for application delivery. In this article, we'll take a look at this transformation, the ways it has uprooted traditional methods of working in software, and how DevOps top performers use the technologies to deliver value at a blistering pace. Two Powerful Technologies IaC and containers emerged initially to solve different sets of problems. Modern IaC was predated by configuration management, aiming to gain some control over the frequently unpredictable drift of infrastructure over time. Not only did automation increase the ease with which infrastructure could be provisioned, it improved stability as well because services could be quickly recreated instead of manually fixed. Yet, in a world where dev and ops are split, advances in infrastructure automation remained squarely in the toolkit of ops. Containers were also born of a need for reproducibility, though in the context of applications. As installed libraries, dependencies, and other aspects of the runtime environment changed over time or across machines, the behavior of applications hosted on them changed as well. Many early approaches leveraged the Unix chroot syscall, which eventually converged to the development of containers as a more complete abstraction. Unlike IaC, much of the pain that containers solved was shared by both dev and ops. Collaboration across different workstations was challenging for dev and deploying that application to yet another machine in production was difficult for ops. Breaking Down the Wall of Confusion A silver lining of previously separate groups having the same problem is they start to work more closely to find a solution. The convergence within the industry toward containers has tracked alongside the DevOps movement, breaking down many of the traditional silos found in companies. Now an icon of this transformation, the "wall of confusion" connotes the challenges faced in traditional organizations when work moves in one direction from dev to ops. In the most pathological scenario, code that "works" on a developer's machine is "tossed over the wall" to be deployed and maintained by the operations team. Figure 1 illustrates this scenario: Figure 1: The wall of confusion Work moves from dev to ops: dev is responsible for Green, and ops is responsible for Red Success in this situation is difficult to achieve for many reasons, including that: It is not clear who should compile the binary; the developer understands application requirements but operations knows the real target environment. Many details of the environment that the application was tested in are known, but only to the developer. Operations must recreate this environment without having a consistent way to document it precisely. Even with IaC on the ops side, untracked changes to the dev environment will lead to production failures. Without containers, the use of IaC can improve outcomes only marginally. Ops can manage and change all that is on their side of the wall more effectively, but the same gaps in communication and feedback are still present. The core limitation is not lifted, and this is demonstrated well by the scope of work needed to be done by IaC tools. 
Many steps to prepare dependencies for the application are redone by ops when provisioning the production environment, even though the developer already took similar steps to produce their development environment. IaC tools will build the environment quickly, but without effective communication channels, they will only build broken environments more quickly. Once containers are introduced, however, these limitations can be significantly diminished. Figure 2: Delivering from dev to ops with containers Containers enable a more direct line of flow between dev and ops After containerizing the application, many previously hidden details can be shared explicitly: Most of the application dependencies are delivered directly to the production environment. The binary can be compiled, linked, and tested before handoff. The configuration can be split into what is already known at build time and what must be determined when the application is deployed. The overall footprint of what ops must independently build and maintain is significantly reduced. The minimization of hidden dependencies on the developer's machine has an outsized impact on the shared understanding across the organization. Of the three remaining dependencies depicted in this scenario, source code is largely irrelevant once the downstream artifacts have been persisted in the container. The runtime configuration and hardware dependencies remain as the key dependencies outside the containerized application. The definition of these is a natural task for many IaC tools, specifying resources such as CPU and networking supplemented with relevant connection parameters and credentials. It is also more appropriately aligned with the concerns of operations. Doubling Down on Containers The trend of container adoption in the industry has had an impact on the way dev and ops are able to work, and has been central to many DevOps transformations. What's next for these technologies? In the same way these technical advances have changed the way we work, the changes to the way we work have paved the way for further improvements to the technology. The abstraction that containers provided ultimately reduced the set of concerns that IaC tools had to deal with. This enabled the development of better abstractions for IaC, largely built around virtualized compute, storage, and networking. Installing dependencies and managing OSs has shifted left into the container build process and often within the development cycle. In many ways, containers form a contract between the application and the infrastructure that will run it. This has been formally defined by the Open Container Initiative (OCI), and an ecosystem of modern cloud-native tools has emerged conforming to the specification. A familiar but often overlooked element of container-driven architectures is the container registry. By systematically submitting container images to a registry, the DevOps workstream gains the additional benefit of decoupling software delivery. Figure 3: Decoupling build and deploy Dev may build container images with Docker, and ops may deploy them with Kubernetes A container image, representing an immutable snapshot of a deployable application, is pushed to the registry. The application can then be deployed to any OCI-compliant container runtime. This may be directly to a virtual machine (VM), an open-source orchestration framework like Kubernetes, or one of the many managed container runtimes offered by cloud providers. 
A common element between these options is that container images are pulled from the registry when needed. Not only does this enable greater control over when changes are actually released, but it also provides a natural means of declaratively specifying the application being deployed. In fact, this is arguably the key to achieving some of the longest-held goals of IaC. The entire infrastructure is declared authoritatively with immutable references to applications. This is the closest the industry has been able to get to a single definition of the entire system. Modern Patterns Made Possible As defined by the OpenGitOps project, the core principles of GitOps are: Declarative Versioned and immutable Pulled automatically Continuously reconciled With these principles in mind, it is clear that GitOps is a natural extension of patterns enabled by IaC and containers. The system definition is made declarative and immutable through the use of IaC with references to immutable containers. By persisting the definition to a version control system such as Git, we can achieve the versioned quality. The remaining two principles — pulled automatically and continuously reconciled — are the extensions that GitOps offers and can generally be implemented by running agents in the target environment. One agent is responsible for synchronizing the latest state from source control and the other ensures that the actual state continues to match the desired state. In order for an organization to fully benefit from a microservices architecture, the services must be small and independently deployable. Deployment pipelines have often been built to push builds through various stages, sometimes making it difficult to selectively release only certain parts of the system. As we have seen, container registries offer essential decoupling: Build pipelines push images to the registry, and the declared infrastructure pulls those images precisely when we want it to. This enables isolation between teams and small, safe changes to production. Conclusion Starting initially as tools meant for distinct use cases, IaC and containers have converged to form a modern framework for cloud-native application delivery. With maturing open-source standards, they are democratizing access to previously elite levels of quality and reliability. First-class support from cloud providers is further reducing barriers to adoption. The greatest advances are in the ways we work and collaborate as teams, and this is ultimately how we are able to deliver the greatest value and at the highest levels of quality. This is an article from DZone's 2023 Containers Trend Report.For more: Read the Report
This is an article from DZone's 2023 Containers Trend Report. For more: Read the Report As containerized architecture gains momentum, businesses are realizing the growing significance of container security. While containers undeniably offer profound benefits, such as portability, flexibility, and scalability, they also introduce unprecedented security challenges. In this report, we will address the fundamental principles and strategies of container security and delve into two specific methods — secrets management and patching. Additionally, we will examine tools and techniques for securing keys, tokens, and passwords. Current Trends in Container Security Developers and DevOps teams have embraced the use of containers for application deployment. In a report, Gartner stated, "By 2025, over 85% of organizations worldwide will be running containerized applications in production, a significant increase from less than 35% in 2019." On the flip side, various statistics indicate that the popularity of containers has also made them a target for cybercriminals who have been successful in exploiting them. According to a survey released in a 2023 State of Kubernetes security report by Red Hat, 67% of respondents stated that security was their primary concern when adopting containerization. Additionally, 37% reported that they had suffered revenue or customer loss due to a container or Kubernetes security incident. These data points emphasize the significance of container security, making it a critical and pressing topic for discussion among organizations that are currently using or planning to adopt containerized applications. Strategies for Container Security Container security can be most effectively handled using a comprehensive multi-layered approach, with each layer involving different strategies and principles. Employing this approach minimizes the risk of exposure and safeguards the application against threats. Figure 1: Multi-layer approach to container security Application security focuses on securing the application that is executed within the container, which can be achieved by implementing input validation, secure coding practices, and encryption. In contrast, the container runtime environment should undergo regular vulnerability scans and patching. Finally, the host layer is considered the most critical security layer because it is responsible for running the containers. It can be secured by implementing baseline configurations to harden the host operating system, deploying firewalls, implementing network segmentation, and using intrusion detection and intrusion prevention systems. Each layer of the container infrastructure provides an opportunity to apply a set of overarching principles and strategies for security. Below, we've outlined some key strategies to help provide a better understanding of how these principles can be put into action.
SECURING CONTAINERIZED ENVIRONMENTS
Core Principles and Strategies | Description
Secure by design | Least privilege, separation of duty, defense in depth
Risk assessment | Vulnerability scanning and remediation, threat modeling, security policy
Access management | RBAC, MFA, centralized identity management
Runtime security | Network segmentation, container isolation, intrusion detection and prevention
Incident management and response | Log management, incident planning and response, continuous monitoring
Container Segmentation To secure communication within a container segment, containers can be deployed as microservices, ensuring that only authorized connections are allowed.
This is achieved using cloud-native container firewalls, container zones, service mesh technologies, etc., that control the traffic to the virtual network using granular policies. While network segmentation divides the physical network into sub-networks, container segmentation works on an overlay network to provide additional controls for resource-based identity. Image Scanning Before deploying containers, it is important to analyze the container base images, libraries, and packages for any vulnerabilities. This can be accomplished by utilizing image scanning tools, such as Anchore and Docker Scout. Runtime Protection To identify and respond to potential security incidents in real time, it is crucial to monitor activities within the container. Runtime security tools can assist in this task by identifying unauthorized access, malware, and anomalous behavior. Access Control To minimize the possibility of unauthorized access to the host machine, only authorized personnel should be granted access to the containerized environment. Strong authentication and authorization mechanisms, such as multifactor authentication, role-based access control (RBAC), and OAuth, could be deployed for this purpose. Secrets Management in Container Security Secrets management protects against both external and internal threats and simplifies credential management by centralizing it. It attempts to protect sensitive information (keys, tokens, etc.) that controls access to various services, container resources, and databases. Ultimately, it ensures that sensitive data is kept secure and meets regulatory compliance requirements. Due to the importance of secrets, they should always be encrypted and stored securely. Mishandling of this information can lead to data leakage, breach of intellectual property, and loss of customer trust. Common missteps include secrets being stored in plain text, hardcoding them, or committing them to a source control system or repository. Overview of Common Types of Secrets To ensure the security of secrets, it's crucial to have a clear understanding of the various types: Passwords are the most commonly used secret. They are used to authenticate users and provide access to web services, databases, and other resources in the container. Keys serve multiple purposes, such as encrypting and decrypting data and providing authentication for devices and services. Common key types include SSH keys, API keys, and encryption keys. Tokens are used to provide temporary access to resources or services. Authentication tokens, such as access tokens, OAuth, and refresh tokens, are used to authenticate third-party services or APIs. Database credentials could be usernames, passwords, and connection strings that are used to access the database and database-specific secrets. Overview of Popular Secrets Management Tools When evaluating a security solution, it's important to consider a range of factors, such as encryption, access control, integration capabilities, automation, monitoring, logging, and scalability. These are all desirable traits that can contribute to a robust and effective security posture. Conversely, pitfalls such as lack of transparency, limited functionality, poor integration, and cost should also be considered. In addition to the above-listed capabilities, a comprehensive evaluation of a security solution also takes into account the specific needs and requirements of a company's current infrastructure (AWS, Azure, Google Cloud, etc.) and compatibility with its existing tools to ensure the best possible outcome.
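As a minimal sketch of the "never hardcode" guidance above, the snippet below reads credentials from environment variables at startup, which is one common way an orchestrator injects secrets into a container; the variable names are hypothetical:
Python 
import os
import sys

# Hypothetical variable names; the orchestrator injects these from its secret store
# (for example, a Kubernetes Secret exposed to the pod as environment variables).
DB_USER = os.environ.get("APP_DB_USER")
DB_PASSWORD = os.environ.get("APP_DB_PASSWORD")
API_TOKEN = os.environ.get("APP_API_TOKEN")

# Fail fast instead of falling back to a hardcoded default.
missing = [name for name, value in [("APP_DB_USER", DB_USER),
                                    ("APP_DB_PASSWORD", DB_PASSWORD),
                                    ("APP_API_TOKEN", API_TOKEN)] if not value]
if missing:
    sys.exit(f"Missing required secrets: {', '.join(missing)}")
The same pattern applies whether the values ultimately come from Kubernetes Secrets, Docker Secrets, or a vault service; the application itself never keeps them in source control.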
Below is a list of some proven tools in the industry for your reference:
HashiCorp Vault – An open-source tool that provides features like centralized secrets management, secrets rotation, and dynamic secrets.
Kubernetes Secrets – A built-in secrets management tool within the Kubernetes environment that allows users to store sensitive information as Kubernetes objects. It is advised to use encryption, RBAC rules, and other security best practices for configuration when using Kubernetes Secrets.
AWS Secrets Manager – A cloud-based tool that is both scalable and highly available. It supports containers running on Amazon ECS and EKS, provides automatic secret rotation, and can integrate with AWS services, like Lambda.
Azure Key Vault – Usually used by containers running on Azure Kubernetes Service. It can support various key types and integrates with most Azure services.
Docker Secrets – A built-in secrets management tool that can store and manage secrets within Docker Swarm. Note that this tool is only available for Swarm services and not for standalone containers.
Short-Lived Secrets An emerging trend in the field of secrets management is the use of short-lived secrets that have a limited lifespan, are automatically rotated at regular intervals, and are generated on demand. This is a response to the risk associated with long-lived, unencrypted secrets, as these new secrets typically only last for a matter of minutes or hours and are automatically deleted once they expire. Patching in Container Security To reduce exposure risk from known threats, it is important to ensure that containers are using the latest software versions. Patching ensures that the software is regularly updated to address any open vulnerabilities. If patching is not applied, malicious actors can exploit vulnerabilities and cause malware infections and data breaches. Mature organizations use automated patching tools to keep their container environments up to date. Patching Tools To keep container images up to date with the latest security patches, there are many tools available in the market. Kubernetes and Swarm are the most widely used orchestration tools that provide a centralized platform and allow users to automate container deployment. Ansible and Puppet are other popular automation tools used for automated deployment of patches for Docker images. Best Practices for Implementing Patching Applying patches in a container environment can significantly enhance the security posture of an organization, provided it follows industry best practices:
Scan containers on a periodic basis to identify vulnerabilities and keep the base images up to date.
Use an automatic patching process with automated tools as much as possible to reduce manual intervention.
Use official images and test patches in a testing environment before deploying into production.
Track the patching activity, monitor logs, and act on alerts or issues generated.
Create automated build pipelines for testing and deploying containers that are patched.
Conclusion As more organizations adopt containerized environments, it's vital to understand the potential security risks and take a proactive approach to container security. This involves implementing core security principles and strategies, using available tools, and prioritizing the security of containerized data and applications.
Strategies like multi-layered security, secrets management, and automated patching can help prevent security breaches. Additionally, integrating patching processes with CI/CD pipelines can improve efficiency. It's important for organizations to stay up to date with the latest container security trends and solutions to effectively protect their containerized environments. This is an article from DZone's 2023 Containers Trend Report.For more: Read the Report
This is an article from DZone's 2023 Containers Trend Report.For more: Read the Report Containers are a major catalyst for rapid cloud-native adoption across all kinds of enterprises because they help organizations quickly lift and shift legacy applications or break monoliths into microservices to move to the cloud. They also unlock system architecture to adopt a multi-cloud ecosystem by providing an abstraction between the application and underlying platform. Benefits of containers are widely evident around the cloud-native world and its modernization journey. Enterprises on the cloud-native roadmap are adopting and running containers at scale. Containers are not only about building and running images — a lot more goes on behind the scenes for container management, including all the tools and processes covering the complete lifecycle of containers. When enterprises start adopting containers, they will only have a handful of containers to look after. In this case, "container management" looks like little more than having docker build and docker run. Ignoring a container management strategy can lead to developer and operator ineffectiveness, poor governance and compliance, and security challenges in the long term. Giving priority to strategizing and managing the container lifecycle can help boost productivity and the effectiveness of developers and teams. It also contributes toward solution agility and helps in reducing the blast radius and vulnerabilities. Enterprises need to holistically consider container management planning and lifecycle before accelerating container adoption. Aspects of Container Management Strategy Let's understand various key parts of container management and its components. Container and Image Supply Chain Container images are building blocks for running containers. An image supply chain consists of all the nuts and bolts to make it executable on environments by pull, build, and run. An image supply chain also includes: All the layers of images built on top of the base image, which includes libraries and utilities that complement the containerized application package CI/CD tools that test and scan your packaging as a container image Static and runtime scanning for vulnerability detecting and patching, signing, or hashing of images to validate their sanctity in your registries or pipeline Figure 1: Container management lifecycle - Container image supply chain Container Infrastructure Handling Once your container image supply chain has been established (see Figure 1), you next want to run and build your application on top of it. For this, you need something on which you can run or execute containers. This includes compute for running containers and software logistics to schedule and organize them. If you're working with just a few containers, you can still manually gauge and control where to run the containers, what else will be in the app sidecars, or support ecosystem components. Provisioning the right storage and networking for those containers can be manually or semi-automatically handled. At scale, however, it is almost unmanageable to handle a large workload without an intelligent orchestrator that orchestrates these infrastructures as well as other aspects of container execution. Container Runtime Security and Policy Enforcement It is equally important for your container management solution to perform security scans, competence checks, and policy enforcement. 
A management solution enforces policy and compliance in parallel with a runtime security scan for vulnerabilities inside a container pipeline, and it scans running containers on host nodes. Container Monitoring and Observability Images and containers are fully packaged with all the dependencies and prerequisites of apps running on top of an identified compute. Now we need to understand containers' behavior and what they are up to. A containerization strategy — which covers monitoring and observability of logging, traces, and metrics collection — should include container workloads, orchestration, and tooling that support container execution. Container execution inside a cluster of managed infrastructure includes supporting tools and utilities for running containers. Orchestrators will also have their own logging and monitoring since containers are ephemeral in nature. Planning Container Management Strategy So far, we have discussed all major components of container management. Enterprises should address the following aspects while designing a container management lifecycle. Figure 2: Container governance and policy compliance - Container management stages Handling Image Supply Chain Existing CI/CD tools can be leveraged to build container images after compiling code and base references. A few important things to handle while building your enterprise image supply chain are: The ability to scan container images in an enterprise repository Security and policy compliance Hashing or signing the image to avoid any tampering Scanning mirror images from a well-known and sanitized registry before bringing them into an enterprise repository Tagging and attributing images with details of the teams owning it for better support, portability, and upkeep Some mature enterprises handle redundancy and replication of an image repository and artifacts to ensure high availability across the DevOps cycle, followed by periodic backups and a recovery process. Elastic, highly available, and fault-tolerant systems are not just limited to an execution environment but are equally important for the end-to-end DevOps cycle. Infrastructure and Orchestration Handling Strategy Infrastructure and orchestration handling strategy is all about the allocation of compute, storage, networking, and backups of containers running at scale. Selecting the type and quantum of compute is very important for designing containers. Containers can only be truly portable if the underlying compute is elastic and supports X (horizontal) and Y (vertical) scalability. Storage requirements for containers can be a mix of OS usage as well as container persistence. It means that container operations require a well-planned storage supply with diverse options of file, block, and blob storage. Networking is an essential part of the connectivity and delivery of a solution alongside enterprise security. Using a mature orchestrator like Kubernetes, Docker swarm, etc., provides different flavors of inter- and intra-container cluster connectivity. Backups are an important part of operating containerized environments, which consist of mounted storage that holds data required to persist. A well-managed backup strategy contributes toward resiliency, cross-regional recovery, and autoscaling. For example, you can use image and container backups to recreate immutable read-only containers, given their ephemerality. Container Security Principles You are only as secure as your most vulnerable container. 
One of the main advantages of containers is that they reduce the blast radius and attack surface. Regular scanning and re-scanning of a repository is a good starting point, as you can see in Figure 2. Also, it is vital to consider implementing container runtime scanning — most likely traditional, agent-based host scanning to detect runtime anomalies. Container images are immutable; hence, vulnerability patching should replace an old image with a new, properly scanned and tested image. Patching hundreds or thousands of running containers can be cumbersome; instead, they should be replaced with new containers based on updated and patched base images. Container Observability Planning Looking inside a dense cluster of small, ephemeral containers is challenging, and they may grow out of control if not handled maturely. The 12-Factor App guides us through the critical aspect of externalizing your logs. Containers will come and go, but the draining of logs toward an external syslog gives you better insights via log aggregation and mining. Figure 3: Container strategy phases and execution pipeline Besides everything, developer experience is crucial in enterprise container management planning. It's important to analyze the productivity and effectiveness that the container lifecycle is bringing to developers and operators working on a DevOps pipeline chain. Enterprises also need to evaluate whether DIY or managed services (like EKS, AKS, or GKE) are a better fit for them. The answer may depend on the enterprise's maturity around different aspects of infrastructure, networking, and security handling, as you see in Figure 3. Organizations' roadmaps for infrastructure (private vs. hybrid vs. multi-cloud architecture) should be considered in the container management strategy. Best Practices for Building an Optimized Container Ecosystem Let's quickly review some best practices to help build better containers:
Package a single app per container
Do not treat containers as VMs
Handle container PIDs and zombie processes carefully
Optimize docker build cache
Remove unnecessary tools and utilities from images
Be cautious of using publicly sourced images vs. scanned enterprise build images
Build on the smallest possible images
Properly tag your images for better lifecycle handling
Conclusion Finally, I am containerizing and packaging a portable summary of an effective container management strategy (pun intended). The takeaway is to inspect how effectively your engineers and developers are managing a large containerized production environment. How agile will you be in responding to urgent vulnerabilities? How soon can you respond to dynamic scalability requirements of compute and storage? The 12-Factor App is an effective gauge for measuring container ecosystem maturity. When choosing your tool, consider options that support infrastructure requirements of today and tomorrow. Enterprises also need to determine whether to use DIY or managed services based on in-house maturity around container lifecycle stages. You can always strategize your plan around the re-use of tools and processes to manage containers as well as non-container components optimally. This is an article from DZone's 2023 Containers Trend Report. For more: Read the Report
This is an article from DZone's 2023 Containers Trend Report. For more: Read the Report Conventionally, software applications were developed using a monolithic architecture where all application components were tightly interwoven and deployed as a single unit. As software applications became more intricate and organizations started relying on distributed systems, the constraints of monolithic architecture started becoming more evident. Containerization was introduced as a solution to this and the growing demand for application scalability, agility, and resilience. The success of microservices further propelled the use of containers, which enabled organizations to break down monolithic applications into independently deployable services. This loosely coupled framework also enabled developers to deploy changes quickly, while achieving enhanced scalability and fault tolerance. In this article, we explore the limitations of monolithic architecture and demonstrate how containers and microservices can support modern application delivery. We also delve into the various strategies, stages, and key steps organizations can adopt in their journey towards containerization and learn how different strategies can support different use cases. Breaking a Monolith Into Microservices Compared to monolithic architecture, a microservices architecture comprises modular, loosely coupled components that communicate via well-defined APIs. This architecture promotes fault tolerance, as the failure of one service has limited impact on the overall system. Microservices also differ from monoliths by using polyglot persistence, which enables each service to select its ideal storage solution based on its requirements. However, before you transition from monolithic to microservices architecture, it's essential to understand the key differences between the two for making informed decisions about application design and choosing the right transformation strategy. The following table outlines these distinctions, offering insights into the unique benefits and characteristics of each approach:
KEY DIFFERENCES BETWEEN MONOLITHIC AND MICROSERVICES ARCHITECTURES
Aspect | Monolithic Architectures | Microservices Architectures
Structure | Single, large application | Multiple, small services
Deployment | Deploy the entire application at once | Deploy individual services independently
Scalability | Scale the entire application | Scale specific services as needed
Development | Changes require coordination across the team | Changes can be made independently
Technology stack | Typically uses a single, unified stack | Each service can use a different stack
Fault tolerance | Failure in one component affects the entire app | Failure in one service has limited impact
Strategies for Migrating to Containers Strategies for migrating to containers vary depending on an organization's specific requirements and goals. These strategies help manage the transition from monolithic architectures to containerized microservices, allowing organizations to achieve increased agility, scalability, and resilience. Let's review some common approaches. Phased Approach This approach involves incrementally breaking down monoliths into microservices through containerization, beginning with the components that stand to benefit most. It reduces risks while giving teams time to learn and adapt processes over time. When to use: The phased approach is considered best when you wish to minimize risk and avoid disrupting ongoing operations.
It is also suitable for organizations with limited resources or complex monolithic applications who would prefer a gradual transformation from monolithic to microservices. Hybrid Architecture The hybrid architecture approach combines monolithic and microservices components, where some remain within monolithic architecture while others migrate toward microservices architectures progressively. This approach balances the benefits of both architectures and is suitable for organizations that cannot fully commit to a complete migration. When to use: Adopt this approach when it isn't feasible or necessary to completely transition an application over to microservices. This strategy works especially well for organizations that wish to retain parts of a monolithic application while reaping some advantages of microservices for specific components. Big Bang Approach Redesign and implement an entire application using microservices from scratch. Although this strategy might require dedicated resources and may introduce greater risk, this allows organizations to fully exploit the advantages of microservices and containers. When to use: Choose this approach when your monolithic application has reached a point where a complete overhaul is necessary to meet current and future demands, or when your organization can afford a resource-intensive yet riskier transition to microservices and containers while reaping all their advantages. Re-Platforming This approach involves moving the monolithic application to a container platform without breaking it into microservices. Replatforming offers some benefits of containerization, such as improved deployment and scalability, without the complexities of a full migration to microservices. When to use: It's recommended to use re-platforming when the goal is to gain some of the advantages of containerization without breaking down the monolith into microservices. It is also ideal for organizations that are new to containerization and want to test the waters before committing to a full migration to microservices. Practical Steps to Embracing a Containerization Journey Embarking on a containerization journey signifies the transformation of monolithic architectures into streamlined, efficient, and scalable microservices. The following section explores various stages involved in migrating monoliths to containers, and it provides a comprehensive roadmap to successfully navigate the complexities of containerization. Stages of Migrating Monoliths to Containers The migration process from monoliths to containers typically goes through three stages. Each stage focuses on addressing specific challenges and gradually transforming the architecture to optimize efficiency, flexibility, and maintainability: Identify the monolith in an organization's application architecture. Look for large, tightly coupled systems that have a single codebase, shared resources, and limited modularity. Analyze the dependencies, data flow, and communication patterns to understand the complexities of your monolith. Define service boundaries. Perform domain-driven design (DDD) analysis to identify logical service boundaries within the monolith. Establish bounded contexts that encapsulate specific business functions or processes, enabling microservices to be designed around these contexts and reducing inter-service dependencies. Re-architect the application into smaller, more manageable pieces. 
Re-architect the monolithic application into microservices using patterns like API gateway, service registry, and circuit breaker. Implement an API-driven approach, with each microservice exposing a RESTful API or using message-based communication such as AMQP or Kafka. Figure 1: Migrating monoliths to containers Key Steps of a Containerization Journey Embracing containerization often involves a series of well-defined steps that may be tailored for individual use cases. Successful containerization adoption may vary based on each organization's use case; however, the following four steps provide general guidance as organizations navigate their container journey from identifying component applications and setting up robust management to administering security practices for containers. Identify Application Components Analyze your application's architecture using dependency graphs or architecture diagrams to identify individual components like databases, web servers, and background workers. Determine the components that can be containerized and identify related dependencies that should be resolved during containerization. Purpose: Provides clarity on the application's architecture Helps with efficient resource allocation Enables component isolation for greater modularity Helps with dependency management Ensures seamless communication between containerized components Containerize Application Components Create Dockerfiles for each component to enable the encapsulation of application components into isolated containers, which facilitates easier deployment, portability, and version control. Use multi-stage builds to optimize image sizes and employ best practices like using non-root users and minimizing the attack surface. Ensure proper versioning of images by tagging them and storing them in a container registry like Docker Hub, RedHat Quay, or GitHub Container registry. Purpose: Encapsulates components in containers to create isolated environments Enables easy transfer of components across different environments Facilitates better version control of components Container Orchestration Choose a container orchestration platform such as Kubernetes or Docker Swarm to manage the deployment, scaling, and management of containerized applications. Implement appropriate resource allocation by defining resource limits and requests in your deployment manifests or compose files. Create self-healing deployments using liveness and readiness probes to monitor container health and restart unhealthy containers. Purpose: Ensures optimal allocation of resources for each container Maintains high availability and fault tolerance of applications Facilitates rolling updates and rollbacks with minimal downtime Container Management and Security Use tools like Prometheus and Grafana for monitoring and logging, with custom alerts for critical events. Implement a CI/CD pipeline for container updates, automating the build, test, and deployment process. Employ security best practices such as scanning images for vulnerabilities, enforcing network segmentation, and applying the principle of least privilege. Additionally, it is also recommended to use container security solutions that help with the continuous monitoring of threats and enforce security policies. 
For instance, the following configuration file can be considered as a pseudocode for managing and securing containers: # Example of container security best practices configuration container_security: - use_non_root_user: true - minimize_attack_surface: true - implement_network_segmentation: true - enforce_least_privilege: true As the next step, the following pseudocode, for instance, illustrates how to load the security best practices configuration and apply it to running containers using any container management tool: # Pseudocode to manage and secure containers import container_management_tool # Load security best practices configuration security_config = load_config("container_security") # Apply security best practices to running containers container_management_tool.apply_security_config(security_config) # Monitor container performance and resource usage container_management_tool.monitor_containers() # Implement logging and alerting container_management_tool.setup_logging_and_alerts() Purpose: Monitors container performance and resource usage Implements logging and alerting for better visibility into containerized applications Ensures container security by scanning images for vulnerabilities and applying best practices Enforces security policies and monitors threats using container security solutions Real-World Use Cases Some real-world examples of organizations that have successfully implemented containerization in their application architecture include Netflix, Spotify, and Uber. Netflix: Architecture Transformation for Agility and Scalability Netflix is one such company that has successfully leveraged containers and microservices architecture to address the massive scale of its streaming service. By investing in container-based workloads, Netflix was able to break down their monolithic app into separate, more focused services that could be deployed and managed independently, giving greater agility and scalability while accommodating large traffic spikes during peak times more easily. This provided greater flexibility as they handled increased traffic with greater ease. Spotify: Containerization for Better Developer Productivity Spotify also embraced containers and microservices to increase developer productivity. Before the transformation journey, they relied on a monolithic architecture that required extensive coordination among cross-functional teams to run and maintain. By breaking up this monolith with containerization and microservices, they created a modular architecture where developers were able to work independently on specific components or features of their application without interference from one team to the next. Containers also provided a consistent environment in which developers were able to build, test, and deploy iterative changes on application easily. Uber: Containers for Deployments and Traffic Spike Management At first, Uber was using a monolithic framework that was difficult to scale and required close team collaboration in order to function smoothly. They transitioned over to using Domain-Oriented Microservice Architecture (DOMA) for scaling their services based on demand. This platform utilized containers, which allowed more rapid deployment and improved the handling of traffic spikes. Uber took advantage of microservices and containers to scale its services independently, enabling it to adapt their offerings according to traffic or usage patterns. 
In addition, using microservices provided increased fault isolation and resilience that ensured its core services remained available even if one service failed.

Conclusion

Embarking on a containerization journey enables organizations to transform their monolithic applications into more agile, scalable, and resilient microservices-based systems. Despite the benefits, it's also essential to acknowledge that transitioning from monolithic applications to containerized microservices may introduce challenges, such as increased operational complexity and potential issues in managing distributed systems. As you reflect on your own organization's containerization journey, consider the following questions:

Which components of your monolithic application will benefit the most from containerization?
What strategy will you adopt for migrating to containers: phased approach, hybrid architecture, big bang approach, or re-platforming?
How will you ensure effective container orchestration and management in your new architecture?

With the above questions answered, the blueprint of the transformation journey is already defined. What comes next is implementing the chosen strategy, monitoring the progress, and iterating as needed to refine the process.

This is an article from DZone's 2023 Containers Trend Report. For more: Read the Report
It’s funny how a seemingly meaningless event in one’s life can lead to an unexpected change. For me, one of those events happened in July 2021 when my flight home was delayed by so much that I paid for my very first Uber. I was so impressed by the experience that I wrote about the underlying payment processor in my “Leveraging Marqeta to Build a Payment Service in Spring Boot” article. I continued to dive deeper into the Marqeta platform, writing about how to create a rewards card and even an article about building a buy now, pay later payment solution. It was at that point I started talking with Marqeta about becoming a part of their engineering team. About six weeks later, I hung up my consultant hat and started as a full-time employee of a Fintech-based market disruptor that has created the world’s first modern card issuing platform. It’s been an exciting journey along a challenging, winding road. I love it! In my spare time, I have continued to dive deeper into the world of web3, and I am always eager to learn more about Fintech too. So exploring where web3 and financial services intersect is natural for me! For this article, I wanted to see how easy it is for a web2 developer to use Java in order to perform some Fintech transactions using web3 and USDC over the Ethereum blockchain. My plan is to use the Circle Java SDK, Java 17, and a Spring Boot 3 RESTful API. About Circle and USDC We can’t talk about USDC (USD Coin) without first talking about Circle, since they are the company that manages the stablecoin. Circle was founded in 2013 (around the same time as Marqeta) with a focus on peer-to-peer payments. As a private company, Circle has reached some key milestones: Circle Pay (mobile payments) allowed users to hold, send, and receive traditional fiat currencies. Later, they became a Bitcoin digital wallet service – allowing consumers to buy/sell Bitcoins. By 2015, a Circle account could be funded in USD via US-issued Visa and Mastercard credit cards and debit cards. Similar functionality was extended to Europe a year later. In 2018, Circle raised venture capital to establish USDC with the promise that its coin is backed by fully reserved assets. USDC is a digital (crypto) currency pegged to (and backed by) the US dollar. Basically, this means that 1 USDC is always equal to 1 US dollar. So why USDC then instead of US dollars? USDC can be sent, in any amount, for just a couple of dollars. USDC can be sent globally, and nearly instantly. Because USDC is a digital currency, it can be sent, received, and settled at any time. You don’t have to worry about banking hours. And since USDC is digital – there’s an API and SDK to work with it (which is what we’ll explore in this article). USDC is issued by a private entity (it’s not a central bank digital currency) and is primarily available as an Ethereum ERC-20 token. With USDC, Circle aims to be a disruptor by giving customers the ability to avoid bank hours, processing times, and costly fees – all while building a business with the USDC digital currency. Using USDC To Make and Receive Payments With Java? Yes, Please! For this publication, let’s assume your service caters to those interested in using USDC to perform financial transactions. From what you’ve observed, the current fees associated with using the Circle platform will allow your service to still be profitable.
Still trying to keep this scenario realistic, let’s also assume that your infrastructure has its roots in Java-based microservices written in Spring Boot. Your infrastructure supports a proven infrastructure of web2 applications and services. To keep things simple, we will introduce a new service – called circle-sdk-demo – which will act as the integration with the Circle platform. As a result, we will plan to explore the Java SDK by Circle – which is still currently in a beta state. Demonstration Before getting started with the service, we need to navigate to the Circle’s Developer site and obtain an API key for sandbox use. All I had to do is fill out this form: And then I received an API key for their sandbox: Keep the API key value handy, as it will be required before we start our new service. Creating a Spring Boot Service For this demonstration, I thought I would plan to use Spring Boot 3 for the first time. While I used the Spring Initializr in IntelliJ IDEA, the results of my dependencies are noted in the following pom.xml file for the circle-sdk-demo service: <dependencies> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-web</artifactId> </dependency> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-configuration-processor</artifactId> <optional>true</optional> </dependency> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-validation</artifactId> </dependency> <dependency> <groupId>org.projectlombok</groupId> <artifactId>lombok</artifactId> <optional>true</optional> </dependency> <dependency> <groupId>com.circle</groupId> <artifactId>circle</artifactId> <version>0.1.0-beta.3</version> </dependency> <dependency> <groupId>org.apache.commons</groupId> <artifactId>commons-lang3</artifactId> <version>3.12.0</version> </dependency> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-test</artifactId> <scope>test</scope> </dependency> </dependencies> For this demonstration, I had to manually add the circle and commons-lang3 artifacts. Externalizing Circle Configuration In order to externalize the API key value, the CircleConfigurationProperties class was created: @Data @Validated @Configuration("circleConfigurationProperties") @ConfigurationProperties("circle") public class CircleConfigurationProperties { @NotBlank() private String apiKey; } For reference, here is a copy of my application.yaml configuration file: circle: api-key: server: port: 8585 error: include-message: always You can use this link to determine how to externalize the population of the circle.api-key value with the API key created above. I certainly do not recommend populating the API key value directly into the configuration file above. 
Next, I created the following CircleConfiguration: @Slf4j @RequiredArgsConstructor @Component public class CircleConfiguration { private final CircleConfigurationProperties circleConfigurationProperties; @Bean public Circle getCircle() { log.info("======================="); log.info("Initializing Circle SDK"); log.info("======================="); log.info("basePath={}", Circle.SANDBOX_BASE_URL); log.info("circle.api-key={}", SecurityUtils.maskCredentialsRevealPrefix( circleConfigurationProperties.getApiKey(), 7, '*')); Circle circle = Circle.getInstance() .setBasePath(Circle.SANDBOX_BASE_URL) .setApiKey(circleConfigurationProperties.getApiKey()); log.info("circle={}", circle); log.info("=================================="); log.info("Circle SDK Initialization Complete"); log.info("=================================="); return circle; } } With the API key value set, starting the circle-sdk-demo Spring Boot 3.x service appears as shown below: Adding a Circle Integration Service To communicate with the Circle platform, I created a simple pass-through service called CircleIntegrationService: @Slf4j @Service public class CircleIntegrationService { private final BalancesApi balancesApi = new BalancesApi(); private final CryptoPaymentIntentsApi cryptoPaymentIntentsApi = new CryptoPaymentIntentsApi(); public ListBalancesResponse getBalances() throws ApiException { ListBalancesResponse listBalancesResponse = balancesApi.listBalances(); log.info("listBalancesResponse={}", listBalancesResponse); return listBalancesResponse; } public CreatePaymentIntentResponse createPayment(SimplePayment simplePayment) throws ApiException { CreatePaymentIntentRequest createPaymentIntentRequest = new CreatePaymentIntentRequest(new PaymentIntentCreationRequest() .idempotencyKey(UUID.randomUUID()) .amount( new CryptoPaymentsMoney() .amount(simplePayment.getAmount()) .currency(simplePayment.getCurrency()) ) .settlementCurrency(simplePayment.getSettlementCurrency()) .paymentMethods( Collections.singletonList( new PaymentMethodBlockchain() .chain(Chain.ETH) .type(PaymentMethodBlockchain.TypeEnum.BLOCKCHAIN) ) )); CreatePaymentIntentResponse createPaymentIntentResponse = cryptoPaymentIntentsApi.createPaymentIntent(createPaymentIntentRequest); log.info("createPaymentIntentResponse={}", createPaymentIntentResponse); return createPaymentIntentResponse; } public GetPaymentIntentResponse getPayment(String id) throws ApiException { UUID paymentIntentId = UUID.fromString(id); log.info("paymentIntentId={} from id={}", paymentIntentId, id); GetPaymentIntentResponse getPaymentIntentResponse = cryptoPaymentIntentsApi.getPaymentIntent(paymentIntentId); log.info("getPaymentIntentResponse={}", getPaymentIntentResponse); return getPaymentIntentResponse; } } This service allows the following functionality to be performed: Obtain a list of balances for my API key Create a new payment Get an existing payment by ID Creating RESTful URIs In the example scenario, the circle-sdk-demo will act as middleware between my existing services and the Circle platform. Next, basic controllers were created for the following URIs: GET /balances POST /payments GET /payments/{id} For this example, I simply created the BalancesController and PaymentsController classes to house these URIs. A more realistic design would employ an API First approach, similar to what I noted in my “Exploring the API-First Design Pattern” publication. 
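The article does not list the controller code itself, so here is a rough sketch of what the PaymentsController might look like. The method signatures, the use of ResponseEntity, and the Lombok constructor injection are my assumptions rather than the article's actual code; the controller simply delegates to the CircleIntegrationService shown above.

Java
// Hypothetical sketch — not the article's actual controller code.
@RequiredArgsConstructor
@RestController
public class PaymentsController {

    private final CircleIntegrationService circleIntegrationService;

    // POST /payments — create a new payment intent on the Circle platform
    @PostMapping("/payments")
    public ResponseEntity<CreatePaymentIntentResponse> createPayment(
            @RequestBody SimplePayment simplePayment) throws ApiException {
        return ResponseEntity.ok(circleIntegrationService.createPayment(simplePayment));
    }

    // GET /payments/{id} — look up an existing payment intent by ID
    @GetMapping("/payments/{id}")
    public ResponseEntity<GetPaymentIntentResponse> getPayment(
            @PathVariable String id) throws ApiException {
        return ResponseEntity.ok(circleIntegrationService.getPayment(id));
    }
}

A BalancesController for GET /balances would follow the same delegation pattern. In a production service, you would likely translate ApiException into a meaningful HTTP error response (for example, via an @ControllerAdvice exception handler) rather than letting it surface as a generic 500.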
circle-sdk-demo Service In Action With the circle-sdk-demo service running, I was able to perform some cURL commands against my local service, which interacted with the Circle platform via the Java SDK. Getting a list of balances: curl --location 'localhost:8585/balances' Results in the following payload response from Circle: { "data": { "available": [], "unsettled": [] } } Creating a payment: curl --location 'localhost:8585/payments' \ --header 'Content-Type: application/json' \ --data '{ "currency" : "USD", "amount" : "1.67", "settlement_currency": "USD" }' Results in the following payload response from Circle: { "data": { "id": "60b9ff8b-f28c-40cf-9a1c-207d12a5350b", "amount": { "amount": "1.67", "currency": "USD" }, "amountPaid": { "amount": "0.00", "currency": "USD" }, "amountRefunded": { "amount": "0.00", "currency": "USD" }, "settlementCurrency": "USD", "paymentMethods": [ { "type": "blockchain", "chain": "ETH", "address": null } ], "fees": null, "paymentIds": [], "refundIds": [], "timeline": [ { "status": "created", "context": null, "reason": null, "time": "2023-03-28T12:26:39.607607Z" } ], "expiresOn": null, "updateDate": "2023-03-28T12:26:39.604637Z", "createDate": "2023-03-28T12:26:39.604637Z", "merchantWalletId": "1013833795" } } Getting an existing payment by ID: curl --location 'localhost:8585/payments/60b9ff8b-f28c-40cf-9a1c-207d12a5350b' \ --header 'Content-Type: application/json' Results in the following response payload from Circle: { "data": { "id": "60b9ff8b-f28c-40cf-9a1c-207d12a5350b", "amount": { "amount": "1.67", "currency": "USD" }, "amountPaid": { "amount": "0.00", "currency": "USD" }, "amountRefunded": { "amount": "0.00", "currency": "USD" }, "settlementCurrency": "USD", "paymentMethods": [ { "type": "blockchain", "chain": "ETH", "address": "0xa7fa0314e4a3f946e9c8a5f404bb9819ed442079" } ], "fees": [ { "type": "blockchainLeaseFee", "amount": "0.00", "currency": "USD" } ], "paymentIds": [], "refundIds": [], "timeline": [ { "status": "pending", "context": null, "reason": null, "time": "2023-03-28T12:26:42.346901Z" }, { "status": "created", "context": null, "reason": null, "time": "2023-03-28T12:26:39.607607Z" } ], "expiresOn": "2023-03-28T20:26:42.238810Z", "updateDate": "2023-03-28T12:26:42.345054Z", "createDate": "2023-03-28T12:26:39.604637Z", "merchantWalletId": "1013833795" } } From the Circle developer logs screen, I can see a summary of all of my requests, including the response payload: Conclusion Readers of my publications may recall that I have been focused on the following mission statement, which I feel can apply to any IT professional: “Focus your time on delivering features/functionality that extends the value of your intellectual property. Leverage frameworks, products, and services for everything else.” - J. Vester When I look back on each of my web3 publications, I am always taken aback by the number of steps or complexity involved in getting from point A to point B. While I am sure this was the case when I started working with web2, I feel like the learning cost is much higher now. This is where Circle really seems to bridge the gap. In the example illustrated above, I was able to leverage Java and Spring Boot to integrate a RESTful API into the Circle platform and start making real-time, online, secure payments. As a result, Circle is helping me adhere to my mission statement. 
Things tend to move fast in technology fields, and early adopters are often faced with challenges like: Documentation that is not polished, accurate, or even available Tools and technologies with steep learning curves Inability to easily integrate with existing frameworks, services, and applications From what I found in this exercise, Circle has avoided these pitfalls – giving me the option to avoid bank hours, processing times, and costly fees – while building my business with the USDC digital currency. In addition to USDC, it also supports card payments, cryptocurrencies, and other digital payment methods. And it has distinct advantages over other payment technologies, such as Apple Pay, PayPal, and Google Pay. If you are interested in the source code for this publication, it can be found over at GitLab. Have a really great day!
Elasticsearch is renowned for its robust search capabilities, making it a popular choice for building high-performance search engines. One of the key features that contribute to its efficiency is the use of aliases. Elasticsearch aliases provide a powerful mechanism for optimizing search operations, improving query performance, and enabling dynamic index management. In this article, we will explore how aliases can be leveraged to create an efficient search engine in Elasticsearch and demonstrate their practical usage through an architecture diagram and examples.

What Are Aliases?
Aliases in Elasticsearch are secondary names or labels associated with one or more indexes. They act as a pointer or reference to the actual index, allowing you to interact with the index using multiple names. An alias abstracts the underlying index name, providing flexibility and decoupling between applications and indexes.

Benefits of Using Aliases in Elasticsearch
Aliases in Elasticsearch provide a range of benefits that enhance index management, deployment strategies, search efficiency, and data organization. Let's explore the advantages of using aliases in more detail:
Index Abstraction: Aliases abstract the underlying index names behind user-defined names. This abstraction shields the application from changes in index names, making it easier to switch or update indexes without modifying the application code. By using aliases, the application can refer to indexes using consistent, meaningful names that remain unchanged even when the underlying index changes.
Index Management: Aliases simplify index management tasks. When creating a new index or replacing an existing one, you can update the alias to point to the new index. This approach enables seamless transitions and reduces the impact on application configurations. Instead of updating the application code with new index names, you only need to modify the alias, making index updates more manageable and less error-prone.
Blue-Green Deployments: Aliases are particularly useful in blue-green deployment strategies. In such strategies, you maintain two sets of indexes: the "blue" set represents the current production version, while the "green" set represents the new version being deployed. By assigning aliases to different versions of indexes, you can seamlessly switch traffic from the old version to the new version by updating the alias. This process ensures zero downtime during deployments and enables easy rollback if necessary.
Index Rollover: Elasticsearch's index rollover feature allows new indexes to be created automatically based on defined criteria, such as size or time. Aliases can be used to consistently reference the latest active index, simplifying queries and eliminating the need to update index names in the application. With aliases, you can query the alias instead of referencing specific index names, ensuring that the application always works with the latest data without requiring manual intervention.
Data Partitioning: Aliases enable efficient data partitioning across multiple indexes based on specific criteria. By associating aliases with subsets of indexes that share common characteristics, such as time ranges or categories, you can narrow down the search space and improve search performance. For example, you can create aliases that include only documents from a specific time range, allowing you to search or aggregate data within that partition more efficiently.
Filtering and Routing: Aliases can be associated with filters or routing values, providing additional flexibility in search operations. By defining filters within aliases, you can perform searches or aggregations on subsets of documents that match specific criteria. This lets you focus search operations on relevant subsets of data, improving search efficiency and reducing unnecessary data processing. Similarly, routing values associated with aliases allow you to direct search queries to specific indexes based on predefined rules, further optimizing search performance.

Scenario
To better understand aliases in action, let's consider a practical example of an e-commerce platform that handles product data and uses a search microservice for searching product information. The platform maintains an index named "index1" to store product information, as shown in Figure 1. Now, let's assume we want to introduce versioning, which involves indexing new product information to make it available for customer searches. The goal is to seamlessly transition to the new version without any disruptions to the application.

Figure 1: Swapping of the alias between the indexes

Step 1: Initial Index Setup
The e-commerce platform begins with an index named "index1," which holds the existing product data.

Step 2: Creating an Alias
To ensure a smooth transition, the platform creates an alias called "readAlias" and associates it with the "index1" index. This alias acts as the primary reference point for the application, abstracting the underlying index name.

Step 3: Introducing a New Index Version
To accommodate updates and modifications, a new version of the index, "index2," is created. This new version will store the updated product information. Meanwhile, the running search application continues to read data from "index1" through readAlias.

Step 4: Updating the Alias
To seamlessly switch to the new index version, the platform updates the alias "readAlias" to point to the "index2" index. This change ensures that the application can interact with the new index without requiring any modifications to the existing codebase.

Step 5: Dropping the Older Index
Once the alias is successfully updated, the platform can safely drop the older index, "index1," as it is no longer actively used.

By updating the alias, the application can seamlessly interact with the new index without any code modifications. Additionally, the platform can employ filtering or routing techniques through aliases to perform specific operations on subsets of products based on categories, availability, or other criteria.

Create an Alias in Elasticsearch Using the Elasticsearch REST API

PUT /_aliases
{
  "actions": [
    { "add": { "index": "index1", "alias": "readAlias" } }
  ]
}

Updating the Alias and Dropping the Older Index
To switch the alias to the new index version and drop the older index, we can perform multiple actions within a single _aliases operation. The following command removes "index1" from the "readAlias" alias and adds "index2" to it:

POST _aliases
{
  "actions": [
    { "remove": { "index": "index1", "alias": "readAlias" } },
    { "add": { "index": "index2", "alias": "readAlias" } }
  ]
}

With these operations, the alias "readAlias" now points to the "index2" version, effectively transitioning to the new product data.

Elasticsearch Aliases in a Spring Boot Application
To use Elasticsearch aliases in a Spring Boot application, first configure the Elasticsearch connection properties. Then, create an Elasticsearch entity class and annotate it with mapping annotations.
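The entity class itself is not shown in the article. As a rough sketch — assuming a product document with just an id and a name field (matching the findByName query used below), and assuming that reads should go through the readAlias alias — it might look like this:

Java
// Hypothetical entity sketch: the field names and the alias binding are assumptions,
// not taken from the original article.
@Document(indexName = "readAlias")   // one option: map the entity to the read alias
public class Product {

    @Id
    private String id;

    @Field(type = FieldType.Text)
    private String name;

    // getters and setters omitted for brevity
}

Binding the entity to the alias (rather than a concrete index name) is what lets the repository keep working unchanged while the alias is swapped from "index1" to "index2" behind the scenes.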
Next, define a repository interface that extends the appropriate Spring Data Elasticsearch interface, and programmatically create the aliases in a separate Spring bean using the ElasticsearchOperations bean and AliasActions. Finally, perform search and CRUD operations using the alias by invoking methods from the repository interface. With these steps, you can seamlessly utilize Elasticsearch aliases in your Spring Boot application, improving index management and search functionality. Note that the alias setup cannot live inside the repository interface itself — a Spring Data repository is only an interface, so injected fields and @PostConstruct methods belong in a separate component.

Java
@Repository
public interface ProductRepository extends ElasticsearchRepository<Product, String> {

    // Derived query used by the service layer below
    List<Product> findByName(String name);
}

// In a separate file:
@Component
public class AliasInitializer {

    @Autowired
    private ElasticsearchOperations elasticsearchOperations;

    @PostConstruct
    public void createAliases() {
        String indexV1 = "index1";
        String indexV2 = "index2";

        AliasActions aliasActions = new AliasActions();

        // Creating an alias for indexV1
        aliasActions.add(new AliasAction.Add(AliasActionParameters.builder()
                .withIndices(indexV1)
                .withAliases("readAlias")
                .build()));

        // Creating an alias for indexV2
        aliasActions.add(new AliasAction.Add(AliasActionParameters.builder()
                .withIndices(indexV2)
                .withAliases("readAlias")
                .build()));

        // Applying the alias actions
        elasticsearchOperations.indexOps(Product.class).alias(aliasActions);
    }
}

In this example, the createAliases() method is annotated with @PostConstruct, ensuring that the aliases are created when the application starts up. It uses the AliasActions class to define the alias actions, adding the alias "readAlias" for both the "index1" and "index2" indexes.

Java
@Service
public class ProductService {

    @Autowired
    private ProductRepository productRepository;

    public List<Product> searchProducts(String query) {
        return productRepository.findByName(query);
    }

    // Other service methods for CRUD operations
}

In the ProductService class, we can invoke methods from the ProductRepository to perform search operations based on the "readAlias" alias. The Spring Data Elasticsearch repository will route the queries to the appropriate index based on the alias configuration.

Conclusion
Elasticsearch aliases provide a valuable tool for managing indexes, enabling seamless transitions, versioning, and efficient data retrieval in e-commerce platforms. By utilizing aliases, e-commerce businesses can ensure uninterrupted service, leverage search microservices, and enhance their overall data management capabilities. Embracing aliases empowers organizations to evolve their product indexes while maintaining a stable and performant application environment.
Everything Bad is Good for You is a pop culture book that points out that some things we assume are bad (like TV) have tremendous benefits to our well-being. I love the premise of disrupting the conventional narrative and was reminded of that constantly when debating some of the more controversial features and problems in Java. It’s a feature, not a bug… One of my favourite things about Java is its tendency to move slowly and deliberately. It doesn’t give us what we want right away. The Java team understands the requirements and looks at the other implementations, then learns from them. I’d say Java’s driving philosophy is that the early bird is swallowed by a snake. Checked Exceptions One of the most universally hated features in Java is checked exceptions. They are the only innovative feature Java introduced as far as I recall. Most of the other concepts in Java existed in other languages, checked exceptions are a brand new idea that other languages rejected. They aren’t a “fun” feature, I get why people don’t like them. But they are an amazing tool. The biggest problem with checked exceptions is the fact that they don’t fit nicely into functional syntax. This is true for nullability as well (which I will discuss shortly). That’s a fair complaint. Functional programming support was tacked onto Java and in terms of exception handling it was poorly done. The Java compiler could have detected checked exceptions and required an error callback. This was a mistake made when these capabilities were introduced in Java 8. E.g. if these APIs were better introduced into Java we could have written code like this: Java api.call1() .call2(() -> codeThatThrowsACheckedException()) .errorHandler(ex -> handleError(ex)) .finalCall(); The compiler could force us to write the errorHandler callback if it was missing which would satisfy the spirit of the checked exceptions perfectly. This is possible because checked exceptions are a feature of the compiler, not the JVM. A compiler could detect a checked exception in the lambda and require a specially annotated exception handling callback. Why wasn’t something like this added? This is probably because of the general dislike of checked exceptions. No one attempted to come up with an alternative. No one likes them because no one likes the annoying feature that forces you to tidy up after yourself. We just want to code, checked exceptions force us to be responsible even when we just want to write a simple hello world… This is, to a great extent, a mistake… We can declare that main throws an exception and create a simple hello world without handling checked exceptions. In large application frameworks like Spring, checked SQLException is wrapped with a RuntimeException version of the same class. You might think I’m against that but I’m not. It’s a perfect example of how we can use checked exceptions to clean up after the fact. Cleanup is performed internally by Spring, at this point the exception-handling logic is no longer crucial and can be converted to a runtime exception. I think a lot of the hate towards the API comes from bad versions of this exception such as MalformedURLException or encoding exceptions. These exceptions are often thrown for constant input that should never fail. That’s just redundant and bad use of language capabilities. Checked exceptions should only be thrown when there’s cleanup we can do. That’s an API problem, not a problem with the language feature. Null Pouring hate on null has been trending for the past 15+ years. 
Yes, I know that quote. I think people misuse it. Null is a fact of life today, whether you like it or not. It’s inherent in everything: databases, protocols, formats, etc. Null is a deep part of programming and will not go away in the foreseeable future. The debate over null is pointless. The debate that matters is whether the cure is better than the disease and I’m yet unconvinced. What matters isn’t if null was a mistake, what matters is what we do now. To be fair, this directly correlates to your love of functional programming paradigms. Null doesn’t play nicely in FP which is why it became a punching bag for the FP guys. But are we stepping back or stepping forward? Let’s break this down into three separate debates: Performance Failures Ease of programming Performance Null is fast. Superfast. Literally free. The CPU performs a null check for us and handles exceptions as interrupts. We don’t need to write code to handle null. The alternatives can be very low overhead and can sometimes translate to null for CPU performance benefits. But this is harder to tune. Abstractions leak and null is the way our hardware works. For most intents and purposes, it is better. There is a caveat. We need the ability to mark some objects as non-null for better memory layout (as Valhalla plans to do). This will allow for better memory layout and can help speed up code. Notice that we can accomplish this while maintaining object semantics, a marker would be enough. I would argue that null takes this round. Failures People hate NullPointerException. This baffles me. NullPointerException is one of the best errors to get. It’s the fail-fast principle. The error is usually simple to understand and even when it isn’t; it isn’t far off. It’s an easy bug to fix. The alternative might include initializing an empty object which we need to verify or setting a dummy object to represent null. Open a database that has been around long enough and search for “undefined”. I bet it has quite a few entries… That’s the problem with non-null values. You might not get a failure immediately. You will get something far worse. A stealth bug that crawls through the system and pollutes your data. Since null is so simple and easy to detect there’s a vast number of tools that can deal with it both in runtime and during development. When people mention getting a null pointer exception in production I usually ask: what would have been the alternative? If you could have initialized the value to begin with then why didn’t you do it? Java has the final keyword, you can use that to keep non-null stateful values. Mutable values are the main reason for uninitialized or null values. It’s very possible that a non-null language wouldn’t fail. But would its result be worse? In my experience, corrupt data in storage is far worse. The problem is insidious and hides under the surface. There’s no clue as to the origin of the problem and we need to set “traps” to track it down. Give me a fail-fast any day. In my opinion, null has this one hands down… Ease of Programming An important point to understand is that null is a requirement of modern computing. Our entire ecosystem is built on top of null. Languages like Kotlin demonstrate this perfectly, they have null and non-null objects. This means we have duplication. Every concept related to objects is expressed twice, and we need to maintain semantics between null and non-null. This raises the complexity bar for developers new to such languages and makes for some odd syntax. 
This in itself would be fine if the complexity paid off. Unfortunately, such features only resolve the most trivial non-issue cases of null. The complex objects aren’t supported since they contain null retrieved from external sources. We’re increasing language complexity for limited benefit. Boilerplate This used to be a bigger issue, but looking at a typical Java file vs. TypeScript or JavaScript, the difference isn’t as big. Still, people nitpick. A smart engineer I know online called the use of semicolons in languages: "Laziness". I don’t get that. I love the semicolon requirement and am always baffled by people who have a problem with that. As an author it lets me format my code while ignoring line length. I can line break wherever I want, the semicolon is the part that matters. If anything, I would have loved to cancel the ability to write conditional statements without the curly braces e.g.: Java if(..) x(); else y(); That’s terrible. I block these in my style requirements; they are a recipe for disaster with an unclear beginning or end. Java forces organization, this is a remarkable thing. Classes must be in a specific file and packages map to directories. This might not matter when your project is tiny, but as you handle a huge code base, this becomes a godsend. You would instantly know where to look for clues. That is a powerful tool. Yet, it leads to some verbosity and some deep directory structures. But Java was designed by people who build 1M LoC projects, it scales nicely thanks to the boilerplate. We can’t say the same for some other languages. Moving Fast Many things aren’t great in Java, especially when building more adventurous startup projects. That’s why I’m so excited about Manifold. I think it’s a way to patch Java with all the “cool stuff” we want while keeping the performance, compatibility and stability we love. This can let the community move forward faster and experiment, while Java as a platform can take the slow and steady route. Final Word Conventional wisdom is problematic. Especially when it is so one-sided and presents a single-dimension argument in which a particular language feature is inferior. There are tradeoffs to be made and my bias probably shines through my words. However, the cookie-cutter counterpoints don’t cut it. The facts don’t present a clear picture to their benefit. There’s always a tradeoff and Java has walked a unique tightrope. Even a slight move in the wrong direction can produce a fast tumbling-down effect. Yet it maintains its traction despite the efforts of multiple different groups to display it as old-fashioned. This led to a ridiculous perception among developers of Python and JavaScript as “newer” languages. I think the solution for that is two-fold. We need to educate about the benefits of Java's approach to these solutions. We also need solutions like Manifold to explore potential directions freely. Without the encumberment of the JCP. A working proof of concept will make integrating new ideas into Java much easier.
Sometimes, it is necessary to examine the behavior of a system to determine which process has utilized its resources, such as memory or CPU time. These resources are often scarce and may not be easily replenished, making it important for the system to record its status in a file. By doing so, it becomes feasible to identify the most resource-intensive process in the past. If the system has not encountered the Out-of-Memory (OOM) killer (whose activity can be found in the syslog), this recorded information can be used to further pinpoint the problematic process.

Atop Tool: An Overview
There is a special tool that can be used both for real-time monitoring of system usage and for collecting system status into logs in the background. This is atop. With atop, you can gather information on CPU and memory usage, which can also be collected by other popular monitoring tools like top and htop. Additionally, atop provides insights into I/O and network usage, eliminating the need to install additional tools for network and I/O monitoring, such as iftop and iostat. In my opinion, atop is a versatile tool for many tasks. Atop is an open-source project and is available for most Linux distributions.

What Is Atop Used For?
Atop can be used for incident investigations in a Linux environment. Atop is a system resource monitor that can provide detailed information about system activity, including CPU, memory, and disk usage, as well as process-level activity. During an incident investigation, atop can help you identify which processes were running at the time of the incident, how many resources they were consuming, and whether there were any spikes in resource usage that may have contributed to the incident. You can also use atop to monitor specific system components, such as network activity, and track changes over time. Basic use cases are listed below:
Real-time resource monitoring
Incident analysis of system behavior
Capacity planning
Resource allocation
For most of the cases in the list, you can use modern monitoring systems like Zabbix and Prometheus. In my personal experience, I find atop to be a useful tool for troubleshooting and identifying the root cause of issues. While special monitoring systems can provide consolidated data on resource usage, they may not be able to answer specific questions about which processes led to server inaccessibility. Atop, on the other hand, can provide detailed information on individual processes, making it easier to differentiate between them and understand their impact on system performance. General principles of working with atop:
Real-time monitoring
Incident investigation
The first approach can be helpful for debugging or profiling your application, providing insights into its behavior and performance. On the other hand, the second approach is more useful for incident investigations, allowing you to identify the root cause of system failures or performance issues.

Setting Up
To write logs, you should launch the daemon: Shell # systemctl start atop
It is recommended to change the interval for collecting data: Shell # vi /lib/systemd/system/atop.service
You can find the env variable: Shell LOGINTERVAL=60
Change this value (in seconds) and reload the systemd unit configuration: Shell # systemctl daemon-reload
Then restart the service: Shell # systemctl restart atop
After that, atop will write info into a log file every 60 seconds (as above).

Real-Time Monitoring Practical Examples
Launching
1. To launch the utility, type the following in a terminal and track resource consumption: Shell # atop
2. In order to change the interval, press 'i' and enter the number in seconds. I prefer to set up an interval of 1-2 seconds.
3. In case the consumption of server resources reaches a critical value, it will be marked with a specific color: red if consumption is critical, cyan if consumption is almost critical (80% of critical). The amount considered critical varies for different resources: 90% utilization of CPU, 70% usage of disk, 90% of network bandwidth, 90% of memory occupation, 80% of SWAP. Of course, these parameters can be modified. Note that the CPU has two cores, and you can see how utilization is distributed among these cores.
4. For killing a process, press 'k' and then type the PID of the process to be killed (it's similar to 'top'). Further, you can specify a signal to be sent to the process.

Output Options
Resource-Related Output
1. To show the full command lines as they were run, press 'c':
2. If you would like to show everything about memory, use the 'm' key:
3. There is 'g' for showing generic output. It might be needed when you want to revert to the initial output. This is the default output.
4. For disk-related output, press 'd':
5. For network-related output (UDP, TCP, and bandwidth), press 'n': Please take into account that the kernel module netatop must be installed. Otherwise, atop won't output network-related information. This module allows us to show network activity per process. Refer to the official web page.
So far, we have considered the basic options, which are enough for most cases. Also, there are interesting options I recommend considering:
'y' — for showing output per thread. It is very useful functionality for examining the behavior of multi-threaded applications (or for debugging such apps).
'e' — shows GPU utilization
'o' — if you'd like to customize the output, you can define your own view in ~/.atoprc and then switch to it just by pressing 'o'
'z' — if you need to pause your atop

Aggregation Functions
Top Resource Eaters
1. To show output accumulated per user, press 'u':
2. For output accumulated per process, press 'p':
3. For output of processes accumulated per Docker container, there is the 'j' key: here, 'host' means native host processes. For observing only a specific container, use 'J'.

Sorting Options
1. For sorting by CPU usage, press Shift + 'c' (capital C). This is the default behavior.
2. To sort by memory usage, hit Shift + 'm' (capital M).
3. To sort by disk usage, hit Shift + 'd' (capital D).
4. For sorting by network utilization, use Shift + 'n' (capital N).
5. If you are tracking threads, there is the option 'Y' to aggregate threads by process.
Note: sorting and output modifiers are different and should be used in combination.

Examining Incidents (Looking to the Past)
All the rules for real-time monitoring also apply when looking for events in logs. Initially, we need to start reading logs instead of the real-time status output: Shell # atop -r /var/log/atop/atop.log
This will read the log file.
Navigating
Navigate within the file using the 't' (forward) and Shift + 't' (back) keys. This allows you to go to the next sample or go back to the previous one.
Time Limit
There are options to limit the time range: Shell # atop -r /var/log/atop/atop.log -b 1400
Opens atop from 14:00 of the current day to the end of the current log file.
Shell # atop -r /var/log/atop/atop_20230523.log -b 1400
Opens the file written on 23 May 2023 from 14:00 and navigates until 23:59 of the same day.
Shell # atop -r /var/log/atop/atop_20230525 -b 1400 -e 1600
You'll see records from 14:00 until 16:00 written on 25 May 2023.
In case your system does not rotate logs, you can use atop's begin and end options in the following form: Shell [-b [YYYYMMDD]hhmm ] [-e [YYYYMMDD]hhmm ]
As mentioned above, sorting, aggregating data, and showing resource-specific output all work perfectly in this mode as well.

Other Atop Capabilities
Atop has a unique feature that allows users to create charts directly in their terminal. To use this feature, you only need Python and pip: install the atopsar-plot package, and you are able to visualize historical data. While this feature may not be particularly useful for modern systems that are already under monitoring, it's worth noting as an additional capability of the program.

Monitoring Process Resource Consumption
When it comes to monitoring a server, having the right tools in place is crucial to ensure optimal performance and identify potential issues. Two popular systems for server monitoring are Zabbix and Prometheus, both of which are capable of monitoring process resource consumption such as memory, CPU, and disk usage. These systems can extract information about a process from the /proc filesystem and send it to the server for storage. Note, however, that such monitoring systems typically extract resource consumption either for specific, pre-configured processes or for all processes in aggregate, with no per-process differentiation. Atop, in this case, is a powerful tool.

Atop vs. Top
While both atop and top are system performance monitoring tools, they differ in their capabilities and level of detail. Top is a simple command-line utility that provides a basic overview of the system's current processes and their resource usage. It is useful for quickly identifying processes that are consuming significant resources, but it does not provide detailed information on system activity. Atop, on the other hand, provides a more detailed report of system activity, including CPU usage, memory usage, and disk I/O. It can also monitor system activity over a period of time, making it useful for analyzing long-term trends and identifying patterns.

Conclusion
Atop is a powerful tool for system performance monitoring and analysis. It provides detailed information on system activity and can be used to diagnose and troubleshoot performance issues, plan for future capacity requirements, monitor security and compliance, and allocate resources effectively. While it may be more complex than traditional tools like top, it offers greater insight into system activity and can be an invaluable tool for system administrators and IT professionals.