A picture is worth a thousand words.
One of the best ways to understand what is happening in an environment, whether for security, observability or any other use case, is to visualise the data. Network data is especially valuable: it tells us how systems are connected to each other, and understanding it improves resiliency across the organisation. In this blog post, we use the Splunk App for Data Science and Deep Learning (DSDL) and Graphistry to visualise network data and represent the current state of the environment. A deep understanding of the network can help us identify and stop security threats earlier and improve application performance.
Network data is always challenging to visualise due to its high dimensionality. In any given environment, there can be tens of thousands of IPs, MAC addresses and other entities, and visualising them can be both time-consuming and difficult to interpret. Splunk is an extensible data platform that specialises in analysing unstructured data and providing unparalleled visibility into the environment. There are several apps available on Splunkbase to monitor network traffic, with a focus on statistical analysis of the data. But that does not always tell the full story.
Graph visualisation is necessary to understand and interpret network data. This can also be achieved in Splunk, and Splunkbase comes to the rescue! The 3D graph network topology visualisation app is excellent for visualising network data, but it takes increasingly longer to render the image as the number of nodes grows. And this is exactly what Graphistry solves! Graphistry is an online visual graph platform that leverages GPU acceleration to build and visualise large graph networks quickly. Banking on Splunk’s extensibility, we can use Jupyter Notebooks in DSDL to integrate with Graphistry: we use the PyGraphistry library in a Jupyter Notebook running in a DSDL container to call the Graphistry API. My esteemed colleague Philipp Drieger provides an excellent overview of how Graphistry can be used for cybersecurity in his blog "Supercharge Cybersecurity Investigations with Splunk and Graphistry: A Powerful Combination for Interactive Graph Exploration".
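Before any of the notebook code later in this post will run, the PyGraphistry client needs to authenticate against a Graphistry server. Here is a minimal sketch, assuming a Graphistry Hub account; the server and credentials are placeholders for your own deployment:
Python
import graphistry

# Authenticate once per notebook session. Server and credentials are
# placeholders; a self-hosted Graphistry server works the same way.
graphistry.register(
    api=3,                        # the JWT-based 3.x API
    protocol="https",
    server="hub.graphistry.com",
    username="YOUR_USERNAME",
    password="YOUR_PASSWORD",
)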
This blog focuses on the following two generic use cases, which should apply to anyone who is interested in understanding network traffic:
1. Visualising the traffic flows between all hosts in the environment.
2. Identifying the most important nodes in the network and visualising the traffic flowing to and from them.
In order to explore the data with Graphistry, we are using the Splunk BOTS V3 dataset. BOTS V3 is a rich open-source security dataset with over 100 source types. Since our focus is on network data, we are using the Network Traffic data model. Data models are a key Splunk capability: they normalise similar data types into the Common Information Model (CIM) so that we can correlate across them and process large volumes of data quickly. Make sure to install all the relevant apps as specified here, since the apps make the BOTS V3 data CIM compliant.
Let’s dive into the first use case.
For this use case, we run the following search to sum up all the bytes sent between hosts. We do a few additional steps to enrich the data:
1. Filter out the broadcast addresses 0.0.0.0 and 255.255.255.255, and keep only IPv4 addresses (fewer than 16 characters long).
2. Tag each IP as internal or external by matching it against the RFC 1918 private address ranges with cidrmatch.
3. Convert the byte counts to gigabytes for readability.
The | fit command sends the data to the DSDL container so that the data can be utilised in the Jupyter Notebook. You can find more details on staging data in DSDL here.
SPL
| tstats sum(All_Traffic.bytes) from datamodel=Network_Traffic where NOT (All_Traffic.src_ip="0.0.0.0" OR All_Traffic.src_ip="255.255.255.255" OR All_Traffic.dest_ip="0.0.0.0" OR All_Traffic.dest_ip="255.255.255.255") BY All_Traffic.src_ip All_Traffic.dest_ip
| rename All_Traffic.src_ip as src_ip All_Traffic.dest_ip as dest_ip sum(All_Traffic.bytes) as bytes
| eval src_len=len(src_ip), dest_len=len(dest_ip)
| where src_len < 16 AND dest_len < 16
| fields - *len
| eval src_isLocal = if(cidrmatch("192.168.0.0/16",src_ip) OR cidrmatch("172.16.0.0/12",src_ip) OR cidrmatch("10.0.0.0/8",src_ip), "yes", "no"), dest_isLocal = if(cidrmatch("192.168.0.0/16",dest_ip) OR cidrmatch("172.16.0.0/12",dest_ip) OR cidrmatch("10.0.0.0/8",dest_ip), "yes", "no")
| eval gb = round(bytes/(1024*1024*1024),1)
| fields src_ip, dest_ip, gb, src_isLocal, dest_isLocal
| sort -gb
| fit MLTKContainer mode=stage algo=graphistry_notebook from * into app:ImportantIPs
The data can now be graphically represented in Graphistry, thanks to the power of DSDL. DSDL makes it extremely easy to integrate data in Splunk with other systems like Graphistry. It provides the flexibility to run any Docker container with custom libraries and frameworks, making Splunk infinitely extensible. The code below, run in a Jupyter Notebook in DSDL, renders the graph network shown underneath.
Python
import pandas as pd
import graphistry  # assumes graphistry.register() has already been called

# stage() is a helper defined in the DSDL notebook template
# (a sketch of it follows this block); it loads the staged search results
df, param = stage("ImportantIPs")

# Build a node table: every IP seen as a source or destination, tagged internal/external
src_df = df[['src_ip', 'src_isLocal']].rename(columns={"src_ip": "ip", "src_isLocal": "isLocal"})
dest_df = df[['dest_ip', 'dest_isLocal']].rename(columns={"dest_ip": "ip", "dest_isLocal": "isLocal"})
ip_df = pd.concat([src_df, dest_df]).drop_duplicates(subset='ip')

# Bind the flows as edges (weighted by gigabytes) and the IPs as nodes
g = graphistry.edges(df).bind(source='src_ip', destination='dest_ip', edge_weight='gb').nodes(ip_df, 'ip')

# Colour nodes by in-degree and edges by traffic volume, then render
g.encode_point_color(
    'degree_in',
    categorical_mapping={
        '5': 'red',
        '4': 'blue',
        '3': 'purple',
        '2': 'orange',
        '1': 'green',
    },
    default_mapping='white'
).encode_edge_color('gb', ['green', 'yellow', 'red'], as_continuous=True).plot()
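For reference, stage() is not a library import: it is a small helper defined in the DSDL notebook template. A minimal sketch of what it does, assuming the container's default data directory layout:
Python
import json
import pandas as pd

def stage(name):
    # '| fit MLTKContainer mode=stage ... into app:<name>' writes the search
    # results and parameters into the container's data directory instead of
    # training a model; this helper simply reads them back
    df = pd.read_csv(f"data/{name}.csv")
    with open(f"data/{name}.json") as f:
        param = json.load(f)
    return df, param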
The nodes with internal IP addresses have been coloured white, whereas all external nodes have been coloured red to make them abundantly visible. The current dataset has over 3,000 nodes and 4,000 edges, which are drawn in seconds and presented inside the Jupyter Notebook! Graphistry constructs the shape of the network based on the degree of each node and the weight of the edges; in this case, the weight of an edge is the amount of data sent between the nodes. From the diagram, it is clear that the external node with IP 34.215.24.225 is important. The edges are also coloured by the volume of bytes transmitted, with green being the least and red the most.
What is also impressive about Graphistry is that we can interact with the visualisation. Hovering over the IP in question clearly shows the other nodes that it is connected to.
For this use case, we again start with a Splunk search. This time around, we have a subsearch that determines the most important nodes. To measure importance, we use eigenvector centrality, which calculates the influence of a node by taking into account how influential the nodes it is connected to are. This can be achieved with the | fit command in one line of SPL.
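To build intuition for what eigenvector centrality measures, here is a small illustration using networkx; it is not part of the pipeline (the GraphCentrality algorithm in the search below performs the equivalent computation inside DSDL), and the IPs are made up:
Python
import networkx as nx

# A tiny undirected graph: 10.0.1.9 ties several hosts together
edges = [
    ("10.0.1.5", "10.0.1.9"),
    ("10.0.1.6", "10.0.1.9"),
    ("10.0.1.7", "10.0.1.9"),
    ("10.0.1.9", "34.215.24.225"),
    ("10.0.1.7", "34.215.24.225"),
]
G = nx.Graph(edges)

# A node scores highly when it is connected to other high-scoring nodes,
# not merely when it has many neighbours
centrality = nx.eigenvector_centrality(G, max_iter=1000)
for ip, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{ip}\t{score:.3f}")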
SPL
| tstats sum(All_Traffic.bytes) from datamodel=Network_Traffic where NOT (All_Traffic.src_ip="0.0.0.0" OR All_Traffic.src_ip="255.255.255.255" OR All_Traffic.dest_ip="0.0.0.0" OR All_Traffic.dest_ip="255.255.255.255") BY All_Traffic.src_ip All_Traffic.dest_ip
| rename All_Traffic.src_ip as src_ip All_Traffic.dest_ip as dest_ip sum(All_Traffic.bytes) as bytes
| eval src_len=len(src_ip), dest_len=len(dest_ip)
| where src_len < 16 AND dest_len < 16
| fields - *len
| eval src_isLocal = if(cidrmatch("192.168.0.0/16",src_ip) OR cidrmatch("172.16.0.0/12",src_ip) OR cidrmatch("10.0.0.0/8",src_ip), "yes", "no"), dest_isLocal = if(cidrmatch("192.168.0.0/16",dest_ip) OR cidrmatch("172.16.0.0/12",dest_ip) OR cidrmatch("10.0.0.0/8",dest_ip), "yes", "no")
| eval gb = round(bytes/(1024*1024*1024),1)
| fields src_ip, dest_ip, gb, src_isLocal, dest_isLocal
| sort -gb
| eval temp_ip = mvappend(src_ip, dest_ip)
| mvexpand temp_ip
| join temp_ip
[| tstats sum(All_Traffic.bytes) from datamodel=Network_Traffic where NOT (All_Traffic.src_ip="0.0.0.0" OR All_Traffic.src_ip="255.255.255.255" OR All_Traffic.dest_ip="0.0.0.0" OR All_Traffic.dest_ip="255.255.255.255") BY All_Traffic.src_ip All_Traffic.dest_ip
| rename All_Traffic.src_ip as src_ip All_Traffic.dest_ip as dest_ip sum(All_Traffic.bytes) as bytes
| eval src_len=len(src_ip), dest_len=len(dest_ip)
| where src_len > 0 AND dest_len > 0 AND src_len < 16 AND dest_len < 16
| fields src_ip dest_ip bytes
| fit GraphCentrality src_ip dest_ip compute="eigenvector_centrality"
| sort - eigenvector_centrality
| table src_ip eigenvector_centrality
| dedup src_ip
| sort 5 -eigenvector_centrality
| rename src_ip as temp_ip]
| fields src_ip, dest_ip, gb, src_isLocal, dest_isLocal
| where gb > 1
| fit MLTKContainer mode=stage algo=graphistry_notebook from * into app:ImportantIPs
Once the subsearch has determined the top 5 important nodes, we pass the list to the outer search (the same search used in use case 1) to find all the flows to and from only these nodes. We limit the complexity by only looking at flows over 1 GB. The network graph is again visualised in a Jupyter Notebook in DSDL by leveraging Graphistry.
Python
# Same plumbing as in use case 1: stage the data, build the node table, plot
df, param = stage("ImportantIPs")
src_df = df[['src_ip', 'src_isLocal']].rename(columns={"src_ip": "ip", "src_isLocal": "isLocal"})
dest_df = df[['dest_ip', 'dest_isLocal']].rename(columns={"dest_ip": "ip", "dest_isLocal": "isLocal"})
ip_df = pd.concat([src_df, dest_df]).drop_duplicates(subset='ip')
g = graphistry.edges(df).bind(source='src_ip', destination='dest_ip', edge_weight='gb').nodes(ip_df, 'ip')
g.encode_point_color(
    'degree_in',
    categorical_mapping={
        '5': 'red',
        '4': 'blue',
        '3': 'purple',
        '2': 'orange',
        '1': 'green',
    },
    default_mapping='white'
).encode_edge_color('gb', ['green', 'yellow', 'red'], as_continuous=True).plot()
The most important nodes are all internal nodes, as expected, marked by their white colour and large size in the diagram. Using Graphistry’s flexible framework, we colour-code the external nodes by the number of connections coming in from the important nodes. The colour coding lets us explore the network graph and gain deep insight into what is happening in the network with just a few clicks.
A radial axis can be used to differentiate between internal and external nodes. This works best when there are not too many nodes to represent.
Python
import math

# Stage a smaller dataset for the radial layout
df, param = stage("ImportantIPs_short")
src_df = df[['src_ip', 'src_isLocal']].rename(columns={"src_ip": "ip", "src_isLocal": "isLocal"})
dest_df = df[['dest_ip', 'dest_isLocal']].rename(columns={"dest_ip": "ip", "dest_isLocal": "isLocal"})
ip_df = pd.concat([src_df, dest_df]).drop_duplicates(subset='ip')
g = graphistry.edges(df).bind(source='src_ip', destination='dest_ip', edge_weight='gb').nodes(ip_df, 'ip')

# radius of the internal and external rings
IN = 500
OUT = 1000

# assign the radial axes
g2 = g.encode_axis([
    {"r": IN, "internal": True},
    {"r": OUT, "external": True}
])

# initialise ring and idx series, mapping the radial axis to isLocal
g2 = g2.nodes(
    g2._nodes.assign(
        idx=g2._nodes.reset_index().index,
        ring=g2._nodes["isLocal"].map({'yes': IN, 'no': OUT})
    )
)

# use ring and idx to set x,y: each node sits on its ring at angle idx (radians)
g3 = (g2.nodes(
    g2._nodes.assign(
        x=g2._nodes.ring * g2._nodes.idx.apply(math.cos),
        y=g2._nodes.ring * g2._nodes.idx.apply(math.sin)
    )).settings(url_params={'lockedR': True, 'play': 0})
)

g3.encode_point_color(
    'degree_in',
    categorical_mapping={
        '5': 'red',
        '4': 'blue',
        '3': 'purple',
        '2': 'orange',
        '1': 'green',
    },
    default_mapping='white'
).encode_edge_color('gb', ['green', 'yellow', 'red'], as_continuous=True).plot()
Visualising network data is crucial for understanding complex environments in security, observability and other domains. By using Splunk DSDL and Graphistry together, we can process and visualise large and complex network datasets quickly and efficiently, leading to a more resilient and secure environment.