Well, NetFlow provides a lens into the why of network use: What applications are being used, what is driving observed load on a network, and what are my users ultimately trying to accomplish with the network access? By combining NetFlow data with other, lower-level network and infrastructure data, you can build a rich picture of network use end-to-end and cross-layer, peering into the applications and services that the network infrastructure supports! This can help with myriad use cases, including capacity management, peering strategies, security, performance optimization, and more.
And NetFlow is (mostly) standard and it’s ubiquitous! Essentially every IP network device supports NetFlow. As a rich, passively generated, standard data source available out of the box from your gear, it’s just waiting to be added to the mix, providing some additional insights and goodness!
So, now that we understand the importance of NetFlow data, let’s see how Splunk can help us. Ready to take this journey? Let’s go!
First of all, let me introduce you to our sherpa in this Himalayan ascent, my good friend Splunk Stream.
The Splunk Stream app lets you capture, filter, index, and analyze streams of network event data. The software acts as a network traffic sniffing tool. Through a simple web GUI, you can filter which protocol metadata you want to index, aggregate event data for statistical analysis, collect NetFlow data, capture full packet streams, monitor network trends and app performance, and much more!
So, we will set up our base camp by deploying the Splunk Stream app in our Splunk environment.
A solid base camp is crucial for the success of our ascent. There are different deployment architectures available for deploying Splunk Stream both on-premises and on Splunk Cloud. We'll assume that we have a functioning Splunk on-premises distributed environment where we need to deploy Splunk Stream components.
In the following figure you can see the general architecture for Splunk Stream in a distributed Splunk deployment:
The figure above shows a typical multilayer Splunk deployment with a search head layer, an indexing layer, and a forwarding layer, each extended with the Splunk Stream components required to make Splunk Stream work.
Note that for the forwarding layer we can set up Splunk universal forwarders, Splunk heavy forwarders, or Independent Stream Forwarders (ISF). The ISF is a standalone Stream forwarder. It sends captured network data to Splunk using the HTTP Event Collector and does not require a Splunk universal forwarder to collect wire data. It is helpful in networks and deployments where a universal forwarder cannot be installed.
For the network input layer, network equipment such as routers or switches will send NetFlow records directly to the forwarding layer via UDP. Universal forwarders are unlikely to be installed on this type of equipment. Moreover, Splunk’s best practices for scaling flow ingestion include using Independent Stream Forwarders instead of universal forwarders. Therefore we will use Independent Stream Forwarders as our forwarding layer for NetFlow traffic. The ISF will forward NetFlow traffic to the indexers via the HTTP Event Collector (HEC) configured at the indexer layer.
Taking into account all the facts explained above, the specific architecture for Splunking NetFlow in a distributed Splunk environment should be similar to this:
Now that we know specifically the architecture we need to set up, we will pick up the specific steps we need to follow from the “Install Splunk Stream in a distributed deployment general guide”.
The sub-steps to be taken:
Now we will configure Splunk Stream to receive data from remote machines via the HTTP Event Collector (HEC). To do that, log in to the search head and launch the Stream app. You will get a prompt to configure Stream for the first time:
We will check Collect data from other machines.
Then we will click on Let’s get started.
Now we will install a standalone Stream forwarder and configure it to send captured network data to the indexing layer using the HTTP Event Collector. Splunk provides the binaries required to install the ISF on compatible Linux machines.
To perform the installation and configure the forwarder to push data to HEC, you simply need to follow this guide: “Install an Independent Stream Forwarder”
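Before installing the forwarder, it can be handy to confirm that the HEC endpoint at the indexing layer accepts events for our index. The following is only a minimal sketch: it assumes HEC listens on its default port 8088, and the host name, token, and the `hec_smoke_test` sourcetype are hypothetical placeholders, not values from this setup.

```python
import json
import urllib.request


def build_hec_request(host, token, event, index="netflow_index", port=8088):
    """Build (but do not send) a request that posts one test event to HEC.

    host and token are placeholders for your own indexer (or load balancer)
    and HEC token; the sourcetype below is a made-up name for testing.
    """
    url = f"https://{host}:{port}/services/collector/event"
    payload = {"event": event, "index": index, "sourcetype": "hec_smoke_test"}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Splunk {token}"},
        method="POST",
    )


# To actually send it (requires network access and a valid token):
#   urllib.request.urlopen(build_hec_request("my-indexer", "my-token", "hello"))
```

If your HEC endpoint uses a self-signed certificate, `urlopen` would additionally need a suitable SSL context, and the token must be allowed to write to the chosen index.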
Cool! We have successfully settled our base camp. Now we have our environment ready to start configuring NetFlow collection!
Now we need to climb to the first Everest camp: camp 1. For that, we will use the helpful Splunk documentation on Using Splunk Stream to ingest NetFlow and IPFIX data, extracting the specific steps we need to get there quickly and safely.
In this step, we will configure a new NetFlow stream in the Stream app. Its data will be collected via HEC from the ISF and indexed in netflow_index.
The sub-steps to be taken:
1) Log in to the search head where the Splunk App for Stream is installed.
2) Navigate to the Splunk App for Stream, then click Configuration > Configure Streams.
3) Click New Stream > Metadata.
4) Enter Name as netflow_test.
5) Select NetFlow as the protocol.
   Note: The NetFlow option works for NetFlow, sFlow, jFlow, and IPFIX protocols.
6) Enter a description, then click Next.
7) Select No in the Aggregation box then click Next.
8) (Optional) Deselect any fields that do not apply to your use case then click Next.
9) (Optional) Develop filters to reduce noise from high traffic devices then click Next.
10) Select the index for this collection, for example netflow_index (which you should have created beforehand), click Enable, then click Next.
11) Select the Default group and click Create Stream.
Optionally you could set a Forwarder Group at Configuration > Distributed Forwarder Management to help you manage your NetFlow dedicated ISF.
Also optionally you could set aggregation options. Aggregated streams group events into aggregation buckets, with one bucket allocated for each unique collection of Key fields. At the end of the time interval, the app emits an object that represents each bucket.
For example, if you apply the mean and values aggregation functions to the bytes_in field over a 60 second interval and select src_ip as the only Key field, Stream aggregates the mean and values of bytes_in into a separate bucket for each src_ip seen in the selected interval.
Splunk Stream lets you apply aggregation to network data at capture-time on the collection endpoint before data is sent to indexers. You can use aggregation to enhance your data with a variety of statistics that provide additional insight into activities on your network. When you apply aggregation to a Stream, only the aggregated data is sent to indexers. Using aggregation can help you decrease both storage requirements and license usage.
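To make the bucket idea more concrete, here is a small Python sketch of the example above: mean and values of bytes_in, keyed by src_ip, over 60-second intervals. It only illustrates the concept and is not how Stream implements aggregation; the `timestamp` field and the output field names are assumptions made for this illustration.

```python
from collections import defaultdict
from statistics import mean


def aggregate(events, key_field="src_ip", value_field="bytes_in", interval=60):
    """Group events into one bucket per (time interval, key) and summarize each."""
    buckets = defaultdict(list)
    for ev in events:
        # Align each event to the start of its aggregation interval.
        window_start = int(ev["timestamp"] // interval) * interval
        buckets[(window_start, ev[key_field])].append(ev[value_field])

    # Emit one object per bucket, as happens at the end of each interval.
    return [
        {
            "interval_start": start,
            key_field: key,
            "mean": mean(vals),
            "values": sorted(set(vals)),
        }
        for (start, key), vals in sorted(buckets.items())
    ]


sample_events = [
    {"timestamp": 0, "src_ip": "10.0.0.1", "bytes_in": 100},
    {"timestamp": 10, "src_ip": "10.0.0.1", "bytes_in": 300},
    {"timestamp": 20, "src_ip": "10.0.0.2", "bytes_in": 50},
    {"timestamp": 70, "src_ip": "10.0.0.1", "bytes_in": 100},
]
summary = aggregate(sample_events)
```

Four raw events collapse into three emitted objects here, which is the same effect that reduces storage and license usage when only aggregated data reaches the indexers.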
In this step, we will configure the Independent Stream Forwarder on your Splunk platform deployment. To ingest flow data, configure streamfwd to receive data at a specific IP address and port and specify the flow protocol. To do this, add a set of flow configuration parameters to streamfwd.conf as follows:
1) Edit local/streamfwd.conf.
2) Add the following parameters to specify the IP address to bind to, the port number to bind to, and the flow protocol.
For example, to receive NetFlow and sFlow data at IP address 172.23.245.122 on ports 9995 and 6343 respectively, configure streamfwd.conf as shown:
3) For high volumes of NetFlow data, configure additional NetFlow processing threads as shown:
4) Save your changes.
5) Restart your Splunk platform deployment.
6) On your Independent Stream Forwarder host, open the /etc/sysctl.conf file and adjust your kernel settings to increase buffer sizes for high-volume packet capture.
7) Reload the settings:
8) Restart the streamfwd service:
To see more configuration options (e.g., clustered indexers), have a look at the “Use Splunk Stream to ingest Netflow and IPFIX data” guide.
Finally, configure your network devices to send NetFlow to your ISF receiver IP address and NetFlow receiver port.
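As a rough illustration of the receiver settings described in step 2, the sketch below renders a [streamfwd] stanza for the NetFlow and sFlow example (172.23.245.122, ports 9995 and 6343). The parameter names (netflowReceiver.N.ip, .port, .decoder, .decodingThreads) are written as we recall them from the Splunk Stream documentation, so treat them as an assumption and check them against the streamfwd.conf spec for your version. The helper function itself is hypothetical.

```python
def render_flow_receivers(receivers, decoding_threads=None):
    """Render a [streamfwd] stanza with one numbered receiver per flow protocol.

    receivers: list of (ip, port, decoder) tuples.
    decoding_threads: optional thread count added to every receiver,
    intended for high-volume NetFlow (assumed parameter name).
    """
    lines = ["[streamfwd]"]
    for n, (ip, port, decoder) in enumerate(receivers):
        prefix = f"netflowReceiver.{n}"
        lines.append(f"{prefix}.ip = {ip}")
        lines.append(f"{prefix}.port = {port}")
        lines.append(f"{prefix}.decoder = {decoder}")
        if decoding_threads is not None:
            lines.append(f"{prefix}.decodingThreads = {decoding_threads}")
    return "\n".join(lines) + "\n"


stanza = render_flow_receivers(
    [("172.23.245.122", 9995, "netflow"), ("172.23.245.122", 6343, "sflow")]
)
print(stanza)
```

The printed text is meant to be reviewed and pasted into local/streamfwd.conf by hand rather than written there automatically.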
If you do not have any network device ready yet or you just want to test your setup before going to production you can use a NetFlow simulator. This NetFlow simulator is really easy to set up and works perfectly well for testing purposes.
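If no simulator is at hand, a single hand-built NetFlow v5 datagram can serve as a quick smoke test. The sketch below packs one flow record in the NetFlow v5 layout (24-byte header plus one 48-byte record) and can send it over UDP. The addresses, ports, and counters are made-up sample values, and whether your forwarder accepts such a minimal packet is something to verify in your own environment.

```python
import socket
import struct
import time

# NetFlow v5 layouts in network byte order: 24-byte header, 48-byte flow record.
HEADER_FMT = "!HHIIIIBBH"
RECORD_FMT = "!4s4s4sHHIIIIHHBBBBHHBBH"


def build_netflow_v5(src_ip, dst_ip, src_port, dst_port, packets, octets,
                     protocol=6, now=None):
    """Return a NetFlow v5 datagram carrying a single flow record."""
    now = time.time() if now is None else now
    uptime_ms = 60_000  # pretend the exporter booted one minute ago
    header = struct.pack(
        HEADER_FMT,
        5, 1,              # version, record count
        uptime_ms,         # sysUptime in milliseconds
        int(now), 0,       # export time: seconds, residual nanoseconds
        0,                 # flow sequence
        0, 0, 0,           # engine type, engine id, sampling interval
    )
    record = struct.pack(
        RECORD_FMT,
        socket.inet_aton(src_ip), socket.inet_aton(dst_ip),
        socket.inet_aton("0.0.0.0"),   # next hop
        0, 0,                          # input / output interface index
        packets, octets,
        uptime_ms - 1000, uptime_ms,   # flow start / end (sysUptime ms)
        src_port, dst_port,
        0, 0, protocol, 0,             # pad, TCP flags, IP protocol, ToS
        0, 0, 0, 0, 0,                 # src AS, dst AS, masks, pad
    )
    return header + record


def send_flow(packet, host, port=9995):
    """Send the datagram to the flow receiver over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))


# Example (replace with your ISF receiver address):
#   send_flow(build_netflow_v5("10.0.0.1", "10.0.0.2", 12345, 443, 10, 1500),
#             "172.23.245.122")
```

A dedicated simulator remains the better choice for realistic volumes; this only checks that the receiver port is reachable and that a flow event shows up downstream.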
Before going straight to the Splunk Web UI and running searches, open the Independent Stream Forwarder’s web interface (http://ISF_IP:8889 by default) and check that traffic is going out (Bytes Out increases over time):
If traffic is going out, open the “Search & Reporting” app and have a look at what is being indexed by clicking Data Summary and filtering the sources by NetFlow:
Finally, check that NetFlow data is being indexed in the netflow_index index we created previously:
Remember that Splunk offers a reduced-cost license that lets you ingest the NetFlow sourcetype at a lower per-GB cost than your normal license. If you want to read more about it and learn which other sourcetypes have a reduced-cost license available, have a look at Splunk Licensed Capacity.
Credit to Raúl Marín for creating much of this content and providing a detailed step-by-step guide on deploying a single instance with an Independent Stream Forwarder: “NetFlow traffic ingestion with Splunk Stream and an Independent Stream Forwarder”
Credit to Matt Olson for his guidance and support in the publication of this blog series and his introduction on why anybody should care about NetFlow.
Awesome, we have reached Everest camp 1.
Now, imagine you want some help to quickly get value from your NetFlow data in Splunk, and you want to be able to play not only with real-time data but also with long-term data (e.g., months) to spot trends or even apply advanced analytics. Does it sound good to you? Do you want to reach Everest camp 2? Then do not miss the next chapter of this series!
Meanwhile...happy Splunking!
----------------------------------------------------
Thanks!
Lucas Alados