January 05, 2024

Ghost in the Web Shell: Introducing ShellSweep

By Michael Haag, Splunk Threat Research Team

In the cyber realm, where digital defense and offense are locked in an ongoing game of cat and mouse, one of the most potent weapons in an attacker's arsenal is the web shell: a seemingly innocuous piece of code that, once embedded in a server, allows an attacker to maintain access and control. The hidden danger of web shells lies in their stealth and versatility, which make them a challenging threat to uncover and neutralize.

My initiation into the world of web shells took place during my time at a Managed Security Service Provider (MSSP). There, we operated a public-facing JBoss server that was a magnet for cyber attacks. Given how exposed such a system is, it was no surprise when I noticed a steady stream of attack attempts while monitoring alerts. Web shells…web shells everywhere. Over time, I developed an appreciation for the creativity and ingenuity of cybercriminals in their manipulation of web shells, like an artist with a flair for the dark side of the internet. One striking example was a JPEG image hiding an embedded Base64-encoded script – a wolf in sheep's clothing, if you will.

Web shells continue to pose a significant threat in the cybersecurity landscape. Adversaries frequently wield them in sophisticated campaigns; one such case is the recent compromise of Progress Software's MOVEit Transfer and MOVEit Cloud, which involved an ASPX web shell. These instances underscore the importance of continuous vigilance and the need for innovative methods to detect and mitigate these threats.

In response to the evolving threat landscape, Splunk has developed a suite of utilities designed to help organizations detect, catalog, and combat malicious web shells. Today, we're thrilled to introduce ShellSweep, a powerful tool designed to hunt down and flag potential web shells lurking on your web servers. The ShellSweep project encompasses three utilities: ShellScan, ShellCSV, and ShellSweep. Working in tandem, they identify potential web shells and help you build a comprehensive defense against them in your environments.

This blog post provides an overview of entropy and its role in detecting web shells, followed by deeper dives into ShellScan, ShellCSV, and ShellSweep that show how they operate and how you can use them to fortify your defenses.

What is Entropy?

Entropy, in the context of information theory or data science, is a measure of the unpredictability, randomness, or disorder in a set of data. The concept was introduced by Claude Shannon in his 1948 paper A Mathematical Theory of Communication.

When applied to a file or a string of text, entropy can help assess the randomness of the data. Here's how it works:

- If a file consists of completely random data (each byte is just as likely to be any value between 0 and 255), the entropy is high, close to 8 (since log2(256) = 8).
- If a file consists of highly structured data (for example, a text file where most bytes are ASCII characters), the entropy is lower.

For purposes of detecting potential web shells or malicious files, entropy can be a useful indicator:
- Many obfuscated scripts or encrypted payloads can have high entropy because the obfuscation or encryption process makes the data look random.
- A normal text file or HTML file would generally have lower entropy because human-readable text has patterns and structure (certain letters are more common, words are usually separated by spaces, etc.).

Hence, a file with unusually high entropy might appear suspicious and warrant further investigation. However, it's not a surefire indicator of maliciousness – there are plenty of legitimate reasons a file might have high entropy, and plenty of ways malware might avoid causing high entropy. Entropy is just one tool in a larger toolbox for detecting potential threats.

ShellSweep includes a Get-Entropy function that calculates the entropy of a file's contents by:

1. Counting how often each character appears in the file.
2. Using these frequencies to calculate the probability of each character.
3. Summing -p*log2(p) for each character, where p is the character's probability. This is the formula for entropy in information theory.
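The three steps above can be sketched in a few lines of PowerShell. This is only an illustration of the calculation, not the Get-Entropy function shipped with ShellSweep: the name Get-ShannonEntropy is hypothetical, and returning 0 for empty input is an assumption made for the sketch.

```powershell
# Hypothetical helper illustrating the steps above; not ShellSweep's own Get-Entropy.
function Get-ShannonEntropy {
    param([string]$Text)

    # Assumption for this sketch: empty input has an entropy of 0.
    if ([string]::IsNullOrEmpty($Text)) { return 0.0 }

    # Step 1: count how often each character appears (keys are [char], so 'a' and 'A' differ).
    $counts = New-Object System.Collections.Hashtable
    foreach ($ch in $Text.ToCharArray()) {
        if ($counts.ContainsKey($ch)) { $counts[$ch]++ } else { $counts[$ch] = 1 }
    }

    # Steps 2 and 3: turn each count into a probability and sum -p * log2(p).
    $entropy = 0.0
    foreach ($count in $counts.Values) {
        $p = $count / $Text.Length
        $entropy -= $p * [Math]::Log($p, 2)
    }
    return $entropy
}
```

For a file, the contents could be passed in as a single string, for example Get-ShannonEntropy (Get-Content -Raw -Path $file). A string made of one repeated character scores 0, while a string that alternates evenly between two characters scores 1 bit per character.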

ShellScan, ShellCSV and ShellSweep

Working together, the ShellScan, ShellCSV, and ShellSweep utilities offer the ability to scan multiple directories at once, filter known good files by SHA256 or path, scan specific file extensions, and evaluate the potential maliciousness of these files based on their entropy.

The workflow of ShellSweep is as follows:

1. Baseline the directory or directories of your website(s) using ShellScan and ShellCSV.
2. In ShellSweep, modify the entropy values, file extensions, and paths, and add any filtering needed.
3. Run ShellSweep on a regular basis to identify suspicious files written to disk.

Next we’ll take a look at each utility in more detail.

ShellScan

ShellScan can scan multiple directories at once, including directories of known bad web shells, and outputs the average, median, minimum, and maximum entropy values by file extension. These values drive the thresholds that are configured in ShellSweep later in the workflow.
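As a rough illustration of the per-extension summary described here, the sketch below aggregates entropy values that have already been calculated. It is not ShellScan's code: the function name Get-EntropySummary, the record shape (Extension and Entropy properties), and the sample numbers are all made up for the example.

```powershell
# Hypothetical helper: summarizes precomputed entropy values per file extension.
function Get-EntropySummary {
    param([object[]]$Records)   # each record is assumed to have Extension and Entropy

    $Records | Group-Object -Property Extension | ForEach-Object {
        # Sort the group's values so the median can be read from the middle.
        $values = @($_.Group | ForEach-Object { [double]$_.Entropy } | Sort-Object)
        $n = $values.Count
        $mid = [int][Math]::Floor($n / 2)

        # Median: the middle value, or the mean of the two middle values.
        if ($n % 2 -eq 1) {
            $median = $values[$mid]
        } else {
            $median = ($values[$mid - 1] + $values[$mid]) / 2
        }

        $stats = $values | Measure-Object -Average -Minimum -Maximum
        [pscustomobject]@{
            Extension = $_.Name
            Count     = $n
            Average   = $stats.Average
            Median    = $median
            Minimum   = $stats.Minimum
            Maximum   = $stats.Maximum
        }
    }
}

# Made-up sample values, only to show the shape of the input and output.
$records = @(
    [pscustomobject]@{ Extension = '.aspx'; Entropy = 1.0 }
    [pscustomobject]@{ Extension = '.aspx'; Entropy = 3.0 }
    [pscustomobject]@{ Extension = '.aspx'; Entropy = 5.0 }
    [pscustomobject]@{ Extension = '.aspx'; Entropy = 7.0 }
    [pscustomobject]@{ Extension = '.php';  Entropy = 2.0 }
    [pscustomobject]@{ Extension = '.php';  Entropy = 4.0 }
    [pscustomobject]@{ Extension = '.php';  Entropy = 9.0 }
)
$summary = Get-EntropySummary -Records $records
```

Reporting the median alongside the average is useful because a handful of heavily obfuscated samples can pull the average up without shifting the median much.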

To generate default entropy values, the Splunk Threat Research Team (STRT) used a large number of known web shells found on GitHub (this is not an exhaustive list of every repository out there). Sources included:

In addition to the sources above, the STRT also obtained the samples related to the recent MOVEit exploitation, which used a web shell dubbed LEMURLOOT. The STRT also obtained the ProxyShell web shell written to disk and compared its values. The findings showed that the Metasploit web shell written to disk has very low entropy due to its padding, which required the STRT to modify ShellSweep to support both greater-than and less-than comparisons.

ShellCSV

ShellCSV is similar to ShellScan, but it is meant to be run against a server that is already running: it collects the files and their entropy values and outputs them to CSV.

ShellCSV assists with identifying the entropy of good files on disk. Defenders can run it on web servers to gather all files and entropy values and better understand which paths and extensions are most prominent in their working environment.

Here is a working example of ShellCSV.

ShellSweep

ShellSweep calculates the entropy of file contents to estimate the likelihood of a file being a web shell, a process based on the principle that high entropy is characteristic of the encrypted or obfuscated code often found in web shells.

The script targets specific file extensions commonly used in web shells and provides options to exclude certain directories or ignore files with specific hashes during the scan. After running the script, it produces output detailing any potential web shells discovered along with their file names, entropy values, and hashes. If no potential web shells are found, the script communicates this with a message.

Furthermore, the script contains an entropy threshold function for various file extensions and calculates the entropy of a given string. It includes predefined directories to scan and exclude as well as file hashes to ignore. If a large number of hashes need to be ignored, a text file containing these can be read directly into the script.
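Reading a large ignore list from a text file might look something like the sketch below. It is not taken from ShellSweep: the function name Get-UnignoredFile and its parameters are hypothetical, and the file format (one SHA256 hash per line) is an assumption.

```powershell
# Hypothetical helper: returns files under $Path whose SHA256 hash is not on the ignore list.
function Get-UnignoredFile {
    param(
        [string]$Path,
        [string]$IgnoreListPath   # assumed format: one SHA256 hash per line
    )

    # Load the hashes, trimming whitespace and skipping blank lines.
    # A @{} hashtable matches string keys case-insensitively, so hash casing does not matter.
    $ignored = @{}
    Get-Content -Path $IgnoreListPath |
        ForEach-Object { $_.Trim() } |
        Where-Object { $_ } |
        ForEach-Object { $ignored[$_] = $true }

    # Keep only the files whose hash is not in the ignore list.
    Get-ChildItem -Path $Path -File -Recurse | Where-Object {
        $hash = (Get-FileHash -LiteralPath $_.FullName -Algorithm SHA256).Hash
        -not $ignored.ContainsKey($hash)
    }
}
```

Loading the hashes into a hashtable keeps each lookup cheap even when the list of known good files is long.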

The default entropy values were determined by using ShellCSV and ShellScan against a Windows Exchange Server and known bad web shells. In addition, testing with the ProxyShell Metasploit module confirmed that ShellSweep needed the ability to flag web shells using both greater-than and less-than comparisons: the ProxyShell web shell's entropy was below 1.0.


$fileExtensions = @{
    '.asp' = @(
        @{ 'operation' = 'lt'; 'value' = 0.805376867704514 },
        @{ 'operation' = 'gt'; 'value' = 5.51268104400858 }
    )
    '.ashx' = @(@{ 'operation' = 'gt'; 'value' = 3.75840459657413 })
    '.asax' = @(@{ 'operation' = 'gt'; 'value' = 3.7288741494524 })
    '.jspx' = @(@{ 'operation' = 'gt'; 'value' = 4.87651397975203 })
    '.html' = @(@{ 'operation' = 'gt'; 'value' = 4.8738392644771 })
    '.aspx' = @(
        @{ 'operation' = 'lt'; 'value' = 0.805376867704514 },
        @{ 'operation' = 'gt'; 'value' = 4.15186444439319 }
    )
    '.php' = @(@{ 'operation' = 'gt'; 'value' = 4.23015141285636 })
    '.jsp' = @(@{ 'operation' = 'gt'; 'value' = 4.40958415652662 })
    '.js' = @(@{ 'operation' = 'gt'; 'value' = 4.25868439013462 })
}

Visually, we can see how the low and high entropy values look and where the expected (known) values reside.

Users can modify these values as needed by removing or changing the entropy based on their unique needs and web server environment(s).
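To make the lt and gt operations concrete, here is a minimal sketch of how a table shaped like the one above could be evaluated. It is not the comparison code from ShellSweep.ps1: the function name Test-EntropyThreshold is hypothetical, the table is a two-extension subset with values rounded for readability, and treating extensions without rules as never flagged is an assumption.

```powershell
# Subset of the default thresholds, rounded for readability.
$thresholds = @{
    '.aspx' = @(
        @{ 'operation' = 'lt'; 'value' = 0.8054 },
        @{ 'operation' = 'gt'; 'value' = 4.1519 }
    )
    '.php' = @(@{ 'operation' = 'gt'; 'value' = 4.2302 })
}

# Hypothetical helper: returns $true when the entropy trips any rule for the extension.
function Test-EntropyThreshold {
    param(
        [hashtable]$Thresholds,
        [string]$Extension,
        [double]$Entropy
    )

    # Assumption for this sketch: extensions without rules are never flagged.
    if (-not $Thresholds.ContainsKey($Extension)) { return $false }

    foreach ($rule in $Thresholds[$Extension]) {
        switch ($rule.operation) {
            'lt' { if ($Entropy -lt $rule.value) { return $true } }
            'gt' { if ($Entropy -gt $rule.value) { return $true } }
        }
    }
    return $false
}
```

With this table, an .aspx file is flagged when its entropy falls below the low threshold or rises above the high one, while a .php file is flagged only on the high side. Adding an lt rule to another extension is enough to start catching padded, low-entropy files there as well.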

Making the Most of ShellSweep

The effectiveness of any security tool such as ShellSweep largely depends on the specific context in which it is used. This includes the nature of the threats it encounters, the environment in which it operates, and the parameters set by the user.

ShellSweep uses a heuristic-based approach, measuring the entropy (or randomness) of file contents to identify potential web shells. While high entropy can be indicative of obfuscated or encrypted code often found in web shells, there may also be legitimate files with high entropy, leading to potential false positives. Conversely, not all malicious scripts will have high entropy, potentially leading to false negatives.

In terms of feature set, ShellSweep has some clear strengths. It is capable of scanning multiple directories, ignoring certain paths, and excluding files based on their hash values. These features provide a level of flexibility that can help customize the scan to the specifics of the system environment.

ShellSweep In Your Splunk

If you made it this far, you may be asking yourself “How do I get this data in Splunk?” Great question!

First, find a home for ShellSweep or create a new App. Within the App, two items are required:

- inputs.conf
- The ShellSweep.ps1 script in the bin path

To begin, add this to your inputs.conf:

[powershell://ShellSweep]
script = . "$SplunkHome\etc\apps\win_inputs_app\bin\ShellSweep.ps1"
disabled = false
sourcetype = shellsweep
schedule = 0 0 * * *
index = win

Change the script path to point to your bin path and adjust the schedule to your liking. The default schedule, 0 0 * * *, runs once a day at midnight.

Now, add ShellSweep to the bin path. The latest version is located here - https://github.com/MHaggis/ShellSweep/blob/main/ShellSweep.ps1

By default, ShellSweep outputs JSON, which allows for easy ingestion and field extraction by Splunk.

Once the input and script are set up, restart the universal forwarder and data should begin showing up based on the scheduled time.

In Splunk, that data will generally appear as:

The four fields are parsed, and anyone can now query the data.

Aside from baselining and updating the entropy values in ShellSweep, another use case may be to capture every file and use Splunk to view the entropy values by FilePath.

In Summary

Web shells pose a substantial threat to the security of servers and systems due to their stealthy nature and the ability to be independently controlled by an attacker. The introduction of ShellSweep is a helpful leap forward for detecting these threats.

With its heuristic-based approach utilizing entropy calculations, ShellSweep offers a unique method to hunt for potential web shells lurking on your servers. However, it's important to remember that no security tool is a silver bullet. While ShellSweep is a powerful addition to your defense arsenal, it should be used in conjunction with other security measures and regular audits to ensure the utmost protection against web shells and other cyber threats.

Moreover, integrating ShellSweep with Splunk simplifies data ingestion, making it easier to monitor and analyze the security of your servers in real time. By taking advantage of Splunk and ShellSweep together, you can quickly and efficiently identify potential threats and take the necessary actions.

The Splunk Threat Research Team is excited about the possibilities of ShellSweep and is committed to further refining and expanding its capabilities. We believe it will serve as a valuable tool in the ongoing battle against web shells and we look forward to learning from your feedback and experiences using ShellSweep.

In the end, the fight against web shells and other cybersecurity threats is a collaborative effort. Sharing tools, knowledge, and experience allows cybersecurity defenders to work together to create a safer and more secure digital world. ShellSweep is our contribution to this collective effort, and we hope it serves you well in your defense endeavors.

Stay vigilant, stay protected, and happy sweeping!

Michael Haag

Michael Haag is a Principal Threat Research Engineer at Splunk. Michael led the development of Atomic Red Team, an open-source testing platform that security teams can use to assess detection coverage. An avid researcher, he is passionate about understanding and evaluating the limits of defensive systems. His background includes security analysis, threat research, and incident handling.

Splunk Threat Research Team

The Splunk Threat Research Team is an active part of a customer’s overall defense strategy by enhancing Splunk security offerings with verified research and security content such as use cases, detection searches, and playbooks. We help security teams around the globe strengthen operations by providing tactical guidance and insights to detect, investigate, and respond to the latest threats. The Splunk Threat Research Team focuses on understanding how threats, actors, and vulnerabilities work, and the team replicates attacks, which are stored as datasets in the Attack Data repository.

Our goal is to provide security teams with research they can leverage in their day to day operations and to become the industry standard for SIEM detections. We are a team of industry-recognized experts who are encouraged to improve the security industry by sharing our work with the community via conference talks, open-sourcing projects, and writing white papers or blogs. You will also find us presenting our research at conferences such as Defcon, Blackhat, RSA, and many more.
