MAY 15, 2025

Excel(ent) Obfuscation: Regex Gone Rogue

Join Ido Kringel and the Deep Instinct Threat Research Team in this deep dive into a recently discovered, Office-based regex evasion technique.

Microsoft Office-based attacks have long been a favored tactic among cybercriminals, and for good reason. Office documents are widely trusted: files such as Word or Excel documents are commonly exchanged in business and personal settings. They can also carry hidden malicious code, embedded macros, and external links that execute code when opened, especially if users are tricked into enabling features like macros.

Moreover, Office documents support advanced techniques like remote template injection, obfuscated macros, and legacy features like Excel 4.0 macros. These allow attackers to bypass antivirus detection and trigger multi-stage payloads such as ransomware or information-stealing malware.

Since Office files are familiar to users and often appear legitimate (e.g., invoices, resumes, or reports), they’re also highly effective tools in phishing and social engineering attacks.

This combination of user trust and advanced attack capabilities, together with cross-platform compatibility and integration with scripting languages, makes Office files ideal for initiating sophisticated attacks with minimal user suspicion.

New Excel Regex Functions

Last year, Microsoft announced the availability of three new functions (REGEXTEST, REGEXEXTRACT, and REGEXREPLACE) that use Regular Expressions (regex) to help parse text more easily:

Figure 1: New Regex functions

Regular expressions are sequences of characters that define search patterns, primarily used for string matching and manipulation. They enable efficient text processing by allowing complex searches, replacements, and validations based on specific criteria.

For example, regex can identify email addresses, phone numbers, or specific word patterns within a text. They are widely used in programming languages like Python, JavaScript, and Perl, and are essential for tasks such as data validation, parsing, and text editing.
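
As a minimal sketch of the kind of matching described above, the following Python snippet finds email addresses and phone numbers in free text. The patterns, the helper name find_contacts, and the sample string are illustrative assumptions, deliberately simplified and not suitable for strict validation.

```python
import re

# Deliberately simple patterns for illustration only (assumptions, not strict validators).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

def find_contacts(text):
    """Return (emails, phones) found anywhere in the given text."""
    return EMAIL.findall(text), PHONE.findall(text)

sample = "Contact jane.doe@example.com or call 555-123-4567 before Friday."
emails, phones = find_contacts(sample)
print(emails)  # ['jane.doe@example.com']
print(phones)  # ['555-123-4567']
```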

The example below demonstrates a practical application, using REGEXEXTRACT to isolate only names from a mixed-text column:

Figure 2: Legitimate use of REGEXEXTRACT function
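
For readers without access to the new functions, the same idea can be approximated in Python. This is a rough analog that returns the first match, not Excel's implementation. The row contents and the name pattern are hypothetical, since the data in the figure is not reproduced here.

```python
import re

def regex_extract(text, pattern):
    """Return the first match of pattern in text, or None if there is no match.

    Rough stand-in for extracting the first match from a cell; None plays
    the role of an error value when nothing matches.
    """
    match = re.search(pattern, text)
    return match.group(0) if match else None

# Hypothetical mixed-text column: an ID, a name, and a date in each row.
rows = [
    "ID-1042 Alice Johnson 2024-03-01",
    "ID-2077 Bob Smith 2024-04-15",
    "no name here 123",
]

# Two capitalized words separated by a space (a simplifying assumption about names).
NAME = r"[A-Z][a-z]+ [A-Z][a-z]+"

names = [regex_extract(row, NAME) for row in rows]
print(names)  # ['Alice Johnson', 'Bob Smith', None]
```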

Proof of Concept: Weaponizing Regex Functions

To demonstrate the security implications of these new Excel functions, we developed a proof of concept that leverages regex functions as an obfuscation technique. Our experiment began by establishing a baseline attack scenario using traditional methods.

First, we created a standard macro-enabled Excel document (XLSM) containing unobfuscated VBA code. This macro uses the "WScript.Shell" object to execute PowerShell commands, which in turn downloads and runs a batch file hosted on Pastebin.

Figure 3: Tested attack flow

The macro below demonstrates the core functionality: a simple downloader that can retrieve and execute arbitrary payloads.

Figure 4: Simple plain-text VBA Downloader

When submitted to VirusTotal, this plain-text sample triggered significant alerts, with 22 different security vendors flagging it as malicious:

Figure 5: VirusTotal result for the plain-text sample

Threat actors typically employ various obfuscation techniques to mask malicious code and evade widespread detection. To demonstrate this technique, we applied the Macro-pack obfuscation tool to our test document, resulting in VBA code that becomes deliberately challenging for both human analysts and automated security tools to interpret.

Figure 6: Macro-pack VBA snippet

When analyzed with VirusTotal, this traditionally obfuscated sample triggered more detections than the plain-text version. This increased detection rate is expected, as security vendors have developed specific heuristics to identify common obfuscation patterns:

Figure 7: VirusTotal result for Macro-pack-obfuscated sample

Next, we created another document, but this time we used the Excel REGEXEXTRACT function to obfuscate the VBA code.

Unlike traditional VBA obfuscation methods, this approach stores and dynamically reconstructs malicious code components using regular expression pattern matching, creating a significantly more evasive payload.

Our first step was to add a large block of text to cell A1 and hide our PowerShell command and the other required strings inside that text, as follows:

Figure 8: Simple obfuscation of "WScript.Shell"

Then, we created a function that uses REGEXEXTRACT to retrieve these hidden strings from the text. Combined with the REPLACE function, this allows dynamic reconstruction of the payload at runtime:

Figure 9: Macro1 calls getval function to return the hidden value from cell A1

The implementation extracts each component using tailored regex patterns and assigns them to intentionally obscured variable names (getval0-2), making static analysis challenging. When executed, the macro seamlessly reconstructs and runs the PowerShell command that downloads and executes our remote batch file.
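
The extract-then-replace idea can be illustrated outside Excel with a few lines of Python. This is a conceptual sketch only: the "[[...]]" markers, the "~" padding character, and the filler text are assumptions made for illustration and are not the encoding used in our sample. Nothing is executed; the snippet only shows why a literal string search over the stored text finds nothing while the string is still recoverable.

```python
import re

PADDING = "~"  # hypothetical padding character, not taken from the article's sample

# Filler text with one fragment hidden between hypothetical [[ ]] markers.
cell_a1 = (
    "Quarterly figures attached. "
    "[[W~S~c~r~i~p~t~.~S~h~e~l~l]] "
    "Please review by Friday."
)

def reveal(cell_text):
    """Extract the marked fragment with a regex, then strip the padding."""
    match = re.search(r"\[\[(.+?)\]\]", cell_text)
    return match.group(1).replace(PADDING, "") if match else None

print("WScript.Shell" in cell_a1)  # False: the literal string is never stored
print(reveal(cell_a1))             # WScript.Shell
```

The point for defenders is that the sensitive string exists only after extraction and replacement, so any scan of the stored text or the macro source has to look for the reconstruction pattern rather than the string itself.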

The evasion effectiveness was remarkable: VirusTotal detection dropped from 22 vendors with the plain-text sample to just two with our regex-obfuscated version.

Figure 10: VirusTotal detections for our specially crafted XLSM

We also analyzed both samples using OLEVBA, a specialized tool for VBA macro analysis that’s widely used in security operations. While OLEVBA easily identified high-risk indicators in our original sample (including PowerShell usage, Shell object creation, and suspicious string operations), it failed to detect any of these indicators in our regex-obfuscated version. The tool couldn’t identify critical indicators like PowerShell execution or WScript.Shell object instantiation because these strings never appear directly in the code; they are dynamically constructed at runtime from regex pattern matches.
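
The gap can be sketched with a toy static scanner. The code below is not OLEVBA's logic; it is a simplified keyword check plus one hypothetical supplementary heuristic that flags macro source which references an Excel regex function and also passes a non-literal argument to CreateObject. Both VBA snippets are invented for illustration, and the way the macro invokes the worksheet function (Application.Evaluate) is an assumption.

```python
import re

SUSPICIOUS_KEYWORDS = ("powershell", "wscript.shell", "cmd.exe")

# References to the new Excel regex worksheet functions inside macro source.
REGEX_FUNCS = re.compile(r"\bREGEX(EXTRACT|TEST|REPLACE)\b", re.IGNORECASE)
# CreateObject called with something other than a quoted string literal.
DYNAMIC_CREATE = re.compile(r'\bCreateObject\s*\(\s*[^"\s)]', re.IGNORECASE)

def keyword_hits(vba_source):
    """Naive literal scan, similar in spirit to keyword-based indicators."""
    lowered = vba_source.lower()
    return [k for k in SUSPICIOUS_KEYWORDS if k in lowered]

def regex_obfuscation_suspected(vba_source):
    """Hypothetical heuristic: regex function use plus a dynamic CreateObject."""
    return bool(REGEX_FUNCS.search(vba_source) and DYNAMIC_CREATE.search(vba_source))

# Invented, harmless snippets standing in for the two samples.
plain_macro = (
    'Set sh = CreateObject("WScript.Shell")\n'
    'sh.Run "powershell -Command Write-Output hi"\n'
)
obfuscated_macro = (
    'v0 = Replace(Application.Evaluate("REGEXEXTRACT(A1,""p0"")"), "~", "")\n'
    "Set sh = CreateObject(v0)\n"
    "sh.Run v1\n"
)

print(keyword_hits(plain_macro))                      # ['powershell', 'wscript.shell']
print(keyword_hits(obfuscated_macro))                 # []
print(regex_obfuscation_suspected(obfuscated_macro))  # True
```

A rule like this would need tuning against legitimate workbooks before use, since regex functions and late-bound objects also appear in benign macros.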

This demonstrates why this technique is particularly concerning: it defeats not just signature-based detection, but also many heuristic analysis methods that security tools rely on.

Figures 11 and 12: OLEVBA output for the original sample (above) vs. our crafted sample (below)

Current Limitations & Deployment Status

While this technique demonstrates significant potential for security evasion, several factors currently limit its immediate threat:

  • Since 2022, Microsoft has blocked VBA macros by default in Office files obtained from the internet, so the user must take explicit action before macros in a downloaded document can run
  • The new regex functions have limited deployment, currently available only to Beta Channel users on:
    • Windows: Version 2406 (Build 17715.20000) or later
    • Mac: Version 16.86 (Build 24051422) or later

As these functions roll out to the general release channels, the potential attack surface will expand significantly.
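
For asset inventory, the build thresholds listed above can be turned into a simple check. This is a sketch under stated assumptions: it expects the build as a dotted numeric string, the function names are hypothetical, and meeting the minimum build is necessary but not sufficient, because the functions are also limited to the Beta Channel at the time of writing.

```python
# Minimum builds taken from the list above, stored as integer tuples.
MIN_BUILD = {
    "windows": (17715, 20000),  # Version 2406 (Build 17715.20000)
    "mac": (24051422,),         # Version 16.86 (Build 24051422)
}

def parse_build(build):
    """Convert a dotted build string such as '17715.20000' into a tuple of ints."""
    return tuple(int(part) for part in build.split("."))

def meets_min_build(platform, build):
    """True if the build is at or above the minimum that ships the regex functions."""
    return parse_build(build) >= MIN_BUILD[platform]

print(meets_min_build("windows", "17715.20000"))  # True
print(meets_min_build("windows", "17628.20110"))  # False
print(meets_min_build("mac", "24051422"))         # True
```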

Prevention

At the time of writing, we have not observed this technique being used in the wild. And while most legacy antivirus tools fail to detect regex-obfuscated malicious files, Deep Instinct’s deep-learning agent detects and prevents all three files presented in this article. Additionally, Deep Instinct’s Artificial Neural Network Assistant (DIANNA) can easily detect the use of regex obfuscation in documents.

Figure 13: DIANNA analysis

Organizations, with or without Deep Instinct, should also implement the following protective measures:

  • Maintain strict macro security policies, especially “Block macros from running in Office files from the Internet”
  • Deploy advanced endpoint protection with behavioral analysis capabilities
  • Consider application control solutions that restrict Excel’s ability to invoke system commands
  • Implement network monitoring to detect unusual outbound connections from Office applications
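
Because the obfuscation only changes what is visible statically, behavioral rules remain useful. As a minimal sketch of the behavioral and application-control measures above, the snippet below flags process-creation events in which an Office application launches a shell or scripting host. The event format (dictionaries with "parent" and "child" image names) and the process lists are assumptions; real telemetry would come from an EDR or system monitoring tool with its own schema.

```python
# Hypothetical allow/deny sets; extend to match the environment.
OFFICE_PARENTS = {"excel.exe", "winword.exe", "powerpnt.exe"}
SCRIPT_CHILDREN = {"powershell.exe", "cmd.exe", "wscript.exe", "cscript.exe"}

def suspicious_spawns(events):
    """Return events where an Office process launched a shell or scripting host."""
    return [
        e for e in events
        if e["parent"].lower() in OFFICE_PARENTS
        and e["child"].lower() in SCRIPT_CHILDREN
    ]

# Invented sample events for illustration.
events = [
    {"parent": "EXCEL.EXE", "child": "powershell.exe"},
    {"parent": "explorer.exe", "child": "EXCEL.EXE"},
    {"parent": "EXCEL.EXE", "child": "splwow64.exe"},
]

flagged = suspicious_spawns(events)
print(flagged)  # [{'parent': 'EXCEL.EXE', 'child': 'powershell.exe'}]
```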

Future Use

The regex-based obfuscation technique demonstrated here represents just the beginning of potential exploitation. While our proof of concept used relatively simple VBA code, this approach could easily be combined with more sophisticated attack techniques:

  • Multi-stage execution chains that further obscure malicious intent
  • Advanced persistence mechanisms to maintain access after initial compromise
  • Privilege escalation techniques hidden behind regex-extracted components
  • Data exfiltration methods that leverage the same obfuscation principles

Additionally, Microsoft’s introduction of Python functionality in Excel creates another potential avenue for attack. While this feature runs calculations in Microsoft’s cloud environment and has inherent latency limitations, it introduces yet another powerful scripting language into the Office ecosystem that determined threat actors could weaponize.


Indicators of Compromise

sample1_re_new.xlsm - dedbe856891dd633ce3dd66ecc120ef4f1ae0a61a37dbb4cc6a59f7eae7019d9
sample1.xlsm - 2c99e702609d549440952ef72f2386a74e0da1462df65ab4206f44c94e8dbc72
sample1_mp.xlsm - 5af1bd3d95e6307d95e9973aa4a084ae210f9038cbea2235d14b02d97abd4f2b
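
The SHA-256 values above can be checked against local files with a short script. This is a minimal sketch: the function names are hypothetical, and the file path is supplied by the caller.

```python
import hashlib

# SHA-256 indicators copied from the list above.
IOC_SHA256 = {
    "dedbe856891dd633ce3dd66ecc120ef4f1ae0a61a37dbb4cc6a59f7eae7019d9": "sample1_re_new.xlsm",
    "2c99e702609d549440952ef72f2386a74e0da1462df65ab4206f44c94e8dbc72": "sample1.xlsm",
    "5af1bd3d95e6307d95e9973aa4a084ae210f9038cbea2235d14b02d97abd4f2b": "sample1_mp.xlsm",
}

def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def match_ioc(path):
    """Return the matching sample name if the file's hash is a known indicator, else None."""
    return IOC_SHA256.get(sha256_of(path))
```

Calling match_ioc on a file path returns the sample name for a known hash and None otherwise. Hash matching only catches these exact files, so it complements rather than replaces the behavioral measures described earlier.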

References

https://github.com/sevagas/macro_pack
https://techcommunity.microsoft.com/blog/microsoft365insiderblog/new-regular-expression-regex-functions-in-excel/4226334