ModSecurity Breach

ModSecurity: Features: Set-Based Matching

Introduction

ModSecurity 2.5 introduces two new operators, @pm and @pmFromFile, which implement set-based pattern matching using the Aho-Corasick algorithm. For lists of fixed, plain-text keywords, set-based matching can be much faster than a regular expression operator, and the advantage grows as the word lists grow larger. The price of this speed is some of the flexibility that comes with regular expressions, so set-based matching is not appropriate for every type of rule. For users who care about performance (that is, about limiting the latency a legitimate client experiences), set-based pattern matching is a great enhancement. The idea is that the fast set-based rules analyze each transaction first and pre-qualify it, determining whether any of the individual, more expensive regular expression rules might match. If a set-based rule matches, the transaction proceeds through the more detailed individual rules. If, on the other hand, the transaction does not match the set-based pre-qualifier rule(s), it skips the more expensive individual rules. It is important to understand the differences between set-based matching and regular expressions so that you know when to use each one on its own and when to use them together.

  1. Set-based inspection algorithms, which perform high-speed, multi-pattern, plain-text content searches in HTTP payloads.

    • Set-based matching operators search the transaction content for fixed text strings.

    • The text strings used should be the core strings that indicate there might be an attack.

  2. Regular expression inspection techniques, which allow for complex logic checks.

    • Regular expression operators are valuable because they can express the complex logic that most security rules require.

    • When combined with a set-based matching operator, they are mainly used to further qualify the core text string matched by the set-based rule.

    • Regular expressions can help to reduce both false positives and false negatives.

  3. Combining Set-Based and Regular Expression Rule Inspection - Prequalification

    • In many scenarios it is advantageous to combine the speed of set-based matching with the logic of the regular expression operators.

Set-based inspection provides a much-needed performance boost for inspecting modern, high-traffic web applications. ModSecurity's standard parameterized inspection with regular expressions allows the rule language to be extended without affecting the high-performance inspection engine. Combining the two strategies applies each one where it works best and keeps the ModSecurity rule language flexible for future enhancements. ModSecurity's Multi-Rule Inspection Engine can inspect high-speed traffic without significantly increasing latency, while detecting and logging events using very large rule sets.
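Since @pm and @pmFromFile are built on the Aho-Corasick algorithm, it helps to see why one pass over the input can test thousands of phrases at once. The following is a minimal textbook Aho-Corasick automaton in Python. It is an illustrative sketch, not ModSecurity's C implementation: the class name AhoCorasick and its search method are hypothetical, and it compares characters case-sensitively, whereas @pm ignores case, so a caller would lowercase both the phrases and the input.

```python
from collections import deque


class AhoCorasick:
    """Textbook Aho-Corasick automaton that finds every phrase in one pass."""

    def __init__(self, phrases):
        self.goto = [{}]    # goto[state][char] -> next state; state 0 is the root
        self.fail = [0]     # failure link: longest proper suffix that is also a trie prefix
        self.out = [set()]  # phrases recognised on reaching this state
        for phrase in phrases:
            self._add(phrase)
        self._link()

    def _add(self, phrase):
        state = 0
        for ch in phrase:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append(set())
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].add(phrase)

    def _link(self):
        # Breadth-first order guarantees shallower failure targets are finished first.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # A phrase that is a suffix of this path also matches here.
                self.out[nxt] |= self.out[self.fail[nxt]]

    def search(self, text):
        """Return the set of phrases that occur anywhere in text."""
        found, state = set(), 0
        for ch in text:
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            found |= self.out[state]
        return found


# Usage: build once (like phrase lists loaded at startup), then scan many inputs.
matcher = AhoCorasick(["sys.tab", "select", "charindex"])
print(sorted(matcher.search("1 union select name from sys.tables")))  # ['select', 'sys.tab']
```

The automaton is built once; each scan then walks the input a single time, so the cost of a lookup depends mainly on the input length rather than on how many phrases are in the list.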

Set Based Rule Inspection

ModSecurity 2.5 introduces the phrase matching operator (@pm) to match against a list of phrases. The new operator uses the Aho-Corasick algorithm and is many times faster than the default regular expression operator (@rx) for large lists of OR'd phrases. For example, if you want to accept only GET, POST and HEAD requests, the following rules serve the same purpose, but the second is faster (even more so as the list grows). They are not strictly identical, because @pm looks for each phrase anywhere in the input and ignores case, while the regular expression is anchored and case-sensitive:

SecRule REQUEST_METHOD "!^(?:GET|POST|HEAD)$" t:none,deny
SecRule REQUEST_METHOD "!@pm GET POST HEAD" t:none,deny
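To see where the two rules above agree and where they differ, here is a small Python model. It assumes @pm behaves as a case-insensitive search for any listed phrase anywhere in the value, and it uses Python's re module in place of PCRE; the function names are hypothetical.

```python
import re

# First rule: deny unless the whole method is exactly GET, POST or HEAD.
ALLOWED_RX = re.compile(r"^(?:GET|POST|HEAD)$")
# Second rule: @pm modelled as "any phrase occurs anywhere in the value,
# ignoring case". This approximates @pm; it is not ModSecurity's code.
ALLOWED_PHRASES = ["GET", "POST", "HEAD"]


def denied_by_rx_rule(method: str) -> bool:
    # The leading "!" negates the operator: deny when the regex does NOT match.
    return ALLOWED_RX.search(method) is None


def denied_by_pm_rule(method: str) -> bool:
    value = method.lower()
    return not any(phrase.lower() in value for phrase in ALLOWED_PHRASES)


for method in ["GET", "DELETE", "GETX", "get"]:
    print(method, denied_by_rx_rule(method), denied_by_pm_rule(method))
```

Under this model both rules allow GET and deny DELETE, but only the anchored regular expression rejects GETX or a lowercase get.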

The new @pm operator should be used for static lists of phrases (blacklists, whitelists, spam keywords, etc.). For large lists, however, inline phrases can make a rule difficult to read and maintain. In that case, use the alternate form, @pmFromFile, which accepts a list of files: place the phrases in one or more files (one phrase per line) instead of inline. The phrase file(s) are read at startup. To allow for easy inclusion of third-party phrase lists, a filename given with a relative path is looked up starting in the directory of the file that contains the rule. For example:

SecRule REQUEST_METHOD "!@pmFromFile allowed_methods.txt" t:none,deny
### And in allowed_methods.txt in the same directory:
GET
POST
HEAD
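The relative-path behavior described above can be illustrated with a short Python sketch. The helper names resolve_phrase_file and load_phrases are hypothetical, and trimming whitespace and skipping blank lines are assumptions of this sketch rather than documented ModSecurity behavior.

```python
import os
import tempfile


def resolve_phrase_file(rule_file: str, phrase_file: str) -> str:
    """Relative phrase-file paths are resolved against the directory of the
    configuration file that contains the rule; absolute paths are kept."""
    if os.path.isabs(phrase_file):
        return phrase_file
    return os.path.join(os.path.dirname(os.path.abspath(rule_file)), phrase_file)


def load_phrases(rule_file: str, *phrase_files: str) -> list:
    """Read one phrase per line from each file (done once, as at startup).
    Assumption: surrounding whitespace is trimmed and blank lines are skipped."""
    phrases = []
    for name in phrase_files:
        with open(resolve_phrase_file(rule_file, name), encoding="utf-8") as fh:
            phrases.extend(line.strip() for line in fh if line.strip())
    return phrases


# Usage: recreate the allowed_methods.txt example in a scratch directory.
with tempfile.TemporaryDirectory() as conf_dir:
    rule_file = os.path.join(conf_dir, "modsecurity_crs_15_customrules.conf")
    with open(os.path.join(conf_dir, "allowed_methods.txt"), "w", encoding="utf-8") as fh:
        fh.write("GET\nPOST\nHEAD\n")
    print(load_phrases(rule_file, "allowed_methods.txt"))  # ['GET', 'POST', 'HEAD']
```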

Regular Expression Rule Inspection

The most popular mode of inspection used by ModSecurity rule writers is the regular expression operator, which is tested sequentially against each variable in the rule's target list. If the operator matches none of the variables (either because it did not match or because the variable was not present), the rule does not match. The complexity and performance of such a search is generally a function of the number of rules (individual SecRule directives), the number of variables inspected by each rule, the chosen operator, and the combined size of all of the parameters. The strength of regular expression inspection is its flexible inspection language; its cost is that processing time grows linearly with the number of rules, which becomes a performance problem for large rule sets.
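To make the linear cost concrete, here is a small Python model that applies each regex rule to each target variable in turn and counts the regex evaluations. The function run_rules and the 1,000-entry domain list are hypothetical; the model does not measure ModSecurity itself, it only shows that the work scales with rules times variables.

```python
import re


def run_rules(rules, variables):
    """Apply every rule's regex to every target variable, one rule after
    another. Returns (ids of rules that matched, number of regex evaluations)."""
    matched, evaluations = [], 0
    for rule_id, pattern in rules:
        hit = False
        for value in variables.values():
            evaluations += 1
            if pattern.search(value):
                hit = True  # the rule matches if any one target matches
        if hit:
            matched.append(rule_id)
    return matched, evaluations


# Hypothetical word list turned into one regex rule per entry.
domains = [f"spam-site-{i}.example" for i in range(1000)]
rules = [(i, re.compile(re.escape(d))) for i, d in enumerate(domains)]
request = {"ARGS:q": "hello world", "REQUEST_HEADERS:Referer": "http://www.example.com/"}

print(run_rules(rules, request))  # ([], 2000): 1000 rules x 2 variables, no match
```

Doubling the word list doubles the evaluation count, even for a clean request that matches nothing.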

Combining Set-Based and Parameterized Rule Inspection - Prequalification

A set-based search is a highly efficient technique for quickly identifying text strings. However, while set-based matching is very fast, it lacks the logic needed to validate an attack. For this reason, a good approach is to combine set-based matching with regular expression rules by chaining the individual rules together. The first part of the chained rule uses the set-based matching operator as a pre-qualifier that very quickly checks whether the transaction data is likely to match. Only if it matches is the second part of the chained rule (which uses the standard regular expression) executed. As a result, the latency of running all of the ModSecurity inspection rules decreases for normal, non-malicious users, whose requests usually stop at the pre-qualifier.
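The pre-qualification flow can be modelled in a few lines of Python. This is a hypothetical sketch: the ChainedRule class is illustrative, @pm is approximated by a case-insensitive substring test, and the regex is one alternative taken from the Core Rule Set example that follows. It counts how often the expensive second link actually runs.

```python
import re


class ChainedRule:
    """Hypothetical model of a two-link chain: a cheap phrase pre-qualifier
    followed by an expensive regex that runs only when the first link hits."""

    def __init__(self, phrases, pattern):
        self.phrases = [p.lower() for p in phrases]
        self.pattern = re.compile(pattern)
        self.regex_runs = 0  # how often the expensive second link was evaluated

    def prequalifies(self, value):
        # Stand-in for @pm: does any phrase occur anywhere in the value?
        lowered = value.lower()
        return any(p in lowered for p in self.phrases)

    def matches(self, value):
        if not self.prequalifies(value):
            return False  # most benign traffic stops here; the regex never runs
        self.regex_runs += 1
        return self.pattern.search(value) is not None


rule = ChainedRule(["select", "charindex"], r"select\b.{0,40}\b(?:substring|ascii|user)")
traffic = ["q=shoes", "page=2&sort=price", "id=1 and select ascii(substring(user,1,1))"]
print([rule.matches(v) for v in traffic], rule.regex_runs)  # [False, False, True] 1
```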

Here is an example of a normal regular expression rule from the Core Rule Set that looks for Blind SQL Injection attacks:

SecRule REQUEST_FILENAME|ARGS|ARGS_NAMES|REQUEST_HEADERS|XML:/*|!REQUEST_HEADERS:Referer "(?:\b(?:(?:s(?:ys\.(?:user_(?:(?:t(?:ab(?:_column|le)|rigger)|object|view)s|c(?:onstraints|atalog))|all_tables|tab)|elect\b.{0,40}\b(?:substring|ascii|user))|m(?:sys(?:(?:queri|ac)e|relationship|column|object)s|ysql\.user)|c(?:onstraint_type|harindex)|waitfor\b\W*?\bdelay|attnotnull)\b|(?:locate|instr)\W+\()|\@\@spid\b)" \
"capture,t:htmlEntityDecode,t:lowercase,t:replaceComments,ctl:auditLogParts=+E,log,auditlog,msg:'Blind SQL Injection Attack. Matched signature <%{TX.0}>',id:'950007',severity:'2'"

The following updated rule chains a set-based matching rule and a regular expression rule so that the data is pre-qualified:

SecRule REQUEST_FILENAME|ARGS|ARGS_NAMES|REQUEST_HEADERS|XML:/*|!REQUEST_HEADERS:Referer "@pm sys.user_triggers sys.user_objects @@spid msysaces instr sys.user_views sys.tab charindex sys.user_catalog constraint_type locate select msysobjects attnotnull sys.user_tables sys.user_tab_columns sys.user_constraints mysql.user sys.all_tables msysrelationships msyscolumns msysqueries" \
"chain, ctl:auditLogParts=+E,log,auditlog,msg:'Blind SQL Injection Attack. Matched signature <%{TX.0}>',id:'950007',severity:'2'"
SecRule REQUEST_FILENAME|ARGS|ARGS_NAMES|REQUEST_HEADERS|XML:/*|!REQUEST_HEADERS:Referer "(?:\b(?:(?:s(?:ys\.(?:user_(?:(?:t(?:ab(?:_column|le)|rigger)|object|view)s|c(?:onstraints|atalog))|all_tables|tab)|elect\b.{0,40}\b(?:substring|ascii|user))|m(?:sys(?:(?:queri|ac)e|relationship|column|object)s|ysql\.user)|c(?:onstraint_type|harindex)|waitfor\b\W*?\bdelay|attnotnull)\b|(?:locate|instr)\W+\()|\@\@spid\b)" \
"capture,t:replaceComments"
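Because the chained regular expression only runs for input that first matches the @pm list, every alternative in the regex needs a corresponding phrase. In the rule above, the regex's waitfor\b\W*?\bdelay alternative has no counterpart: neither waitfor nor delay appears in the @pm list. The Python sketch below copies the phrase list and regex from the rules above, approximates @pm as a substring test, and assumes the input has already been lowercased; the payload and function names are hypothetical. It shows a time-based payload that the original single rule would catch but the chain would not.

```python
import re

# Phrase list and regex copied from the two chained rules above.
PM_PHRASES = ("sys.user_triggers sys.user_objects @@spid msysaces instr sys.user_views "
              "sys.tab charindex sys.user_catalog constraint_type locate select msysobjects "
              "attnotnull sys.user_tables sys.user_tab_columns sys.user_constraints mysql.user "
              "sys.all_tables msysrelationships msyscolumns msysqueries").split()
BLIND_SQLI_RX = re.compile(
    r"(?:\b(?:(?:s(?:ys\.(?:user_(?:(?:t(?:ab(?:_column|le)|rigger)|object|view)s"
    r"|c(?:onstraints|atalog))|all_tables|tab)|elect\b.{0,40}\b(?:substring|ascii|user))"
    r"|m(?:sys(?:(?:queri|ac)e|relationship|column|object)s|ysql\.user)"
    r"|c(?:onstraint_type|harindex)|waitfor\b\W*?\bdelay|attnotnull)\b"
    r"|(?:locate|instr)\W+\()|\@\@spid\b)"
)


def single_rule_fires(value: str) -> bool:
    return BLIND_SQLI_RX.search(value.lower()) is not None


def chain_fires(value: str) -> bool:
    v = value.lower()  # assumption: input already lowercased by transformations
    # The regex link is only reached when some @pm phrase occurs in the input.
    return any(p in v for p in PM_PHRASES) and BLIND_SQLI_RX.search(v) is not None


payload = "1'; waitfor delay '0:0:5'--"  # hypothetical time-based payload
print(single_rule_fires(payload), chain_fires(payload))  # True False
```

Adding the missing keyword (for example waitfor) to the phrase list would close this particular gap; the general point is that the pre-qualifier must be at least as broad as the regex it guards.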

Let's now look at the rule operator performance measurements to see the difference in processing time between these two rules. These are the contents of the custom rule file used for testing (the line numbers were added for display; they identify the rules in the modsec_debug.log output below):

1. SecRule REQUEST_FILENAME|ARGS|ARGS_NAMES|REQUEST_HEADERS|XML:/*|!REQUEST_HEADERS:Referer "(?:\b(?:(?:s(?:ys\.(?:user_(?:(?:t(?:ab(?:_column|le)|rigger)|object|view)s|c(?:onstraints|atalog))|all_tables|tab)|elect\b.{0,40}\b(?:substring|ascii|user))|m(?:sys(?:(?:queri|ac)e|relationship|column|object)s|ysql\.user)|c(?:onstraint_type|harindex)|waitfor\b\W*?\bdelay|attnotnull)\b|(?:locate|instr)\W+\()|\@\@spid\b)" \
2. "capture,t:htmlEntityDecode,t:lowercase,t:replaceComments,ctl:auditLogParts=+E,log,auditlog,msg:'Blind SQL Injection Attack. Matched signature <%{TX.0}>',id:'950007',severity:'2'"
3.
4. SecRule REQUEST_FILENAME|ARGS|ARGS_NAMES|REQUEST_HEADERS|XML:/*|!REQUEST_HEADERS:Referer "@pm sys.user_triggers sys.user_objects @@spid msysaces instr sys.user_views sys.tab charindex sys.user_catalog constraint_type locate select msysobjects attnotnull sys.user_tables sys.user_tab_columns sys.user_constraints mysql.user sys.all_tables msysrelationships msyscolumns msysqueries" \
5. "chain, ctl:auditLogParts=+E,log,auditlog,msg:'Blind SQL Injection Attack. Matched signature <%{TX.0}>',id:'950007',severity:'2'"
6.
7. SecRule REQUEST_FILENAME|ARGS|ARGS_NAMES|REQUEST_HEADERS|XML:/*|!REQUEST_HEADERS:Referer "(?:\b(?:(?:s(?:ys\.(?:user_(?:(?:t(?:ab(?:_column|le)|rigger)|object|view)s|c(?:onstraints|atalog))|all_tables|tab)|elect\b.{0,40}\b(?:substring|ascii|user))|m(?:sys(?:(?:queri|ac)e|relationship|column|object)s|ysql\.user)|c(?:onstraint_type|harindex)|waitfor\b\W*?\bdelay|attnotnull)\b|(?:locate|instr)\W+\()|\@\@spid\b)" \
8. "capture,t:replaceComments"

The SecRuleEngine was set to DetectionOnly so that both sets of rules would get an equal opportunity to inspect the requests. The following test request, which triggers these Blind SQL Injection rules, was then sent to the web server:

http://www.example.com/cgi-bin/foo.cgi?LoginEmail='%20or%201=convert(int,(select%20@@version%2b'/'%2b@@servername%2b'/'%2bdb_name()%2b'/'%2bsys.user_objects))--sp_password
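To see why this request triggers both versions of the rule, the following Python sketch decodes the query string with urllib.parse (an approximation of how ModSecurity populates ARGS, not its actual decoder) and lists which phrases from the @pm rule above occur in the LoginEmail value.

```python
from urllib.parse import parse_qsl, urlsplit

# Phrase list copied from the @pm rule above.
PM_PHRASES = ("sys.user_triggers sys.user_objects @@spid msysaces instr sys.user_views "
              "sys.tab charindex sys.user_catalog constraint_type locate select msysobjects "
              "attnotnull sys.user_tables sys.user_tab_columns sys.user_constraints mysql.user "
              "sys.all_tables msysrelationships msyscolumns msysqueries").split()

url = ("http://www.example.com/cgi-bin/foo.cgi?LoginEmail='%20or%201=convert(int,(select"
       "%20@@version%2b'/'%2b@@servername%2b'/'%2bdb_name()%2b'/'%2bsys.user_objects))--sp_password")

# parse_qsl URL-decodes each name and value; this approximates how ARGS is filled.
args = dict(parse_qsl(urlsplit(url).query))
value = args["LoginEmail"]
print(value)
print([p for p in PM_PHRASES if p in value.lower()])  # ['sys.user_objects', 'select']
```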

After the request was sent and ModSecurity completed its inspection, we reviewed the modsec_debug.log file to identify the processing time of each rule.

# cat /usr/local/apache/logs/modsec_debug.log
[20/Jan/2008:00:19:15 --0500] [www.example.com/sid#88caf48][rid#89db5b0][/cgi-bin/foo.cgi][1] Phase 1: 20 usec
[20/Jan/2008:00:19:15 --0500] [www.example.com/sid#88caf48][rid#89db5b0][/cgi-bin/foo.cgi][1] Rule 895ae60 [id "-"][file "/usr/local/apache/conf/rules/modsecurity_crs_10_config.conf"][line "86"]: 11 usec
[20/Jan/2008:00:19:27 --0500] [www.example.com/sid#88caf48][rid#89db5b0][/cgi-bin/foo.cgi][1] Phase 2: 1172 usec
[20/Jan/2008:00:19:27 --0500] [www.example.com/sid#88caf48][rid#89db5b0][/cgi-bin/foo.cgi][1] Rule 895bf88 [id "950007"][file "/usr/local/apache/conf/rules/modsecurity_crs_15_customrules.conf"][line "2"]: 514 usec
[20/Jan/2008:00:19:27 --0500] [www.example.com/sid#88caf48][rid#89db5b0][/cgi-bin/foo.cgi][1] Rule 896ea30 [id "950007"][file "/usr/local/apache/conf/rules/modsecurity_crs_15_customrules.conf"][line "5"]: 338 usec
[20/Jan/2008:00:19:27 --0500] [www.example.com/sid#88caf48][rid#89db5b0][/cgi-bin/foo.cgi][1] Rule 896fa20 [id "-"][file "/usr/local/apache/conf/rules/modsecurity_crs_15_customrules.conf"][line "8"]: 279 usec
[20/Jan/2008:00:19:30 --0500] [www.example.com/sid#88caf48][rid#89db5b0][/cgi-bin/foo.cgi][1] Phase 3: 32 usec
[20/Jan/2008:00:19:30 --0500] [www.example.com/sid#88caf48][rid#89db5b0][/cgi-bin/foo.cgi][1] Rule 895b8a0 [id "-"][file "/usr/local/apache/conf/rules/modsecurity_crs_10_config.conf"][line "301"]: 26 usec
[20/Jan/2008:00:19:30 --0500] [www.example.com/sid#88caf48][rid#89db5b0][/cgi-bin/foo.cgi][1] Phase 4: 0 usec
[20/Jan/2008:00:19:30 --0500] [www.example.com/sid#88caf48][rid#89db5b0][/cgi-bin/foo.cgi][1] Phase 5: 0 usec
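Reading per-rule timings out of a long debug log by eye gets tedious. A small Python sketch like the following extracts phase totals and per-rule timings, including phase entries that share a line. The function name parse_timings is hypothetical, and the patterns are written against the log lines shown above rather than against a documented log format.

```python
import re

# Patterns written against the debug-log lines shown above.
RULE_RE = re.compile(r'Rule \w+ \[id "([^"]*)"\]\[file "([^"]*)"\]\[line "(\d+)"\]: (\d+) usec')
PHASE_RE = re.compile(r"Phase (\d): (\d+) usec")


def parse_timings(lines):
    """Return ([(config file, line, rule id, usec), ...], {phase: usec})."""
    rules, phases = [], {}
    for line in lines:
        # finditer also catches two phase entries fused onto one line.
        for m in PHASE_RE.finditer(line):
            phases[int(m.group(1))] = int(m.group(2))
        m = RULE_RE.search(line)
        if m:
            rule_id, path, lineno, usec = m.groups()
            rules.append((path.rsplit("/", 1)[-1], int(lineno), rule_id, int(usec)))
    return rules, phases


prefix = "[20/Jan/2008:00:19:27 --0500] [www.example.com/sid#88caf48][rid#89db5b0][/cgi-bin/foo.cgi][1] "
conf = "/usr/local/apache/conf/rules/modsecurity_crs_15_customrules.conf"
sample = [
    prefix + "Phase 2: 1172 usec",
    prefix + f'Rule 895bf88 [id "950007"][file "{conf}"][line "2"]: 514 usec',
    prefix + f'Rule 896ea30 [id "950007"][file "{conf}"][line "5"]: 338 usec',
    prefix + f'Rule 896fa20 [id "-"][file "{conf}"][line "8"]: 279 usec',
]
rules, phases = parse_timings(sample)
for name, lineno, rule_id, usec in rules:
    print(f"{name}:{lineno} id={rule_id} {usec} usec")
print("phase 2 total:", phases[2], "usec")
```

Against a real log, an open file handle for /usr/local/apache/logs/modsec_debug.log can be passed to parse_timings in place of the sample list.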

As the modsec_debug.log data shows, line 2 of our configuration (the single @rx rule) completed in 514 usec, while line 5 (the @pm pre-qualifier) took 338 usec. Because this test request is malicious, it matched the pre-qualifier, so the chained regular expression on line 8 also ran (279 usec) and the chain as a whole took 617 usec. The saving therefore applies to requests that do not match the phrase list, which only pay for the @pm check. This difference may not seem like much and would not be noticeable to a client, but the efficiency of the set-based matching operator becomes much more noticeable when you use large rule sets and very large word list files (which we describe in the next section).
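The timings above also allow a rough break-even estimate. Assuming (a strong assumption) that the per-rule costs measured for this one malicious request are representative of every request, the chain costs the @pm time when the pre-qualifier misses and the @pm time plus the chained regex time when it hits, versus 514 usec for the single rule. The Python arithmetic below computes the fraction of requests that would have to match the @pm list before the chain becomes slower; the function name is hypothetical.

```python
# Per-rule timings taken from the debug log above (a single malicious request).
SINGLE_RX = 514  # line 2: original @rx rule
PM_LINK = 338    # line 5: @pm pre-qualifier
RX_LINK = 279    # line 8: chained @rx, runs only after a @pm hit


def expected_chain_cost(hit_rate: float) -> float:
    """Average chain cost (usec) if a fraction hit_rate of requests match @pm."""
    return PM_LINK + hit_rate * RX_LINK


chain_when_hit = PM_LINK + RX_LINK            # cost when the pre-qualifier matches
break_even = (SINGLE_RX - PM_LINK) / RX_LINK  # hit rate at which both cost the same

print(chain_when_hit)                       # 617
print(round(break_even, 2))                 # 0.63
print(round(expected_chain_cost(0.01), 2))  # 340.79
```

Under these assumptions the chain is cheaper whenever fewer than about 63% of requests match the phrase list. Benign requests differ in size and content from this test request, so real traffic should be measured rather than extrapolated from one sample.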

Please refer to this separate document, which explains how to compile ModSecurity with an additional flag to measure performance statistics, along with the testing steps.

Large Wordlist Example

You will find the greatest benefit from the set-based matching operators when you need to look for an extremely large word list in the variable data. A perfect example is searching request content for SPAM keywords or references to known SPAM hosting locations. The GotRoot rule set includes a rule file called blacklist.conf that contains approximately 7,600 individual rules similar to the following:

SecRule HTTP_Referer|ARGS "best-deals-blackjack\.info"
SecRule HTTP_Referer|ARGS "best-deals-casino\.info"
SecRule HTTP_Referer|ARGS "best-deals-cheap-airline-tickets\.info"
SecRule HTTP_Referer|ARGS "best-deals-diet\.info"
SecRule HTTP_Referer|ARGS "best-deals-flowers\.info"
SecRule HTTP_Referer|ARGS "best-deals-hotels\.info"
SecRule HTTP_Referer|ARGS "best-deals-online-gambling\.info"
SecRule HTTP_Referer|ARGS "best-deals-online-poker\.info"
SecRule HTTP_Referer|ARGS "best-deals-poker\.info"
SecRule HTTP_Referer|ARGS "best-deals-roulette\.info"
SecRule HTTP_Referer|ARGS "best-deals-weight-loss\.info"
SecRule HTTP_Referer|ARGS "bestdims\.com"
SecRule HTTP_Referer|ARGS "bestdvdclubs\.com"
SecRule HTTP_Referer|ARGS "best-e-site\.com"
SecRule HTTP_Referer|ARGS "best-gambling\.biz"
SecRule HTTP_Referer|ARGS "bestgamblinghouseonline\.com"
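Because each of these rules is just an escaped domain name, the list can be converted mechanically into the one-phrase-per-line format that @pmFromFile reads. The following Python sketch (the helper rule_to_phrase is hypothetical) keeps only patterns whose sole regex syntax is \. escapes and unescapes them; anything else is left for an @rx rule. Keep in mind that @pm ignores case, which the original regular expressions do not do by themselves.

```python
import re

RULE_LINE = re.compile(r'^SecRule\s+\S+\s+"([^"]+)"')
# Regex metacharacters that must not remain once "\." escapes are removed.
META = set(".^$*+?()[]{}|\\")


def rule_to_phrase(line: str):
    """Return the fixed string a single-pattern SecRule looks for, or None
    if the line is not such a rule or its pattern uses real regex syntax."""
    m = RULE_LINE.match(line.strip())
    if not m:
        return None
    pattern = m.group(1)
    if any(ch in META for ch in pattern.replace("\\.", "")):
        return None  # not a plain literal; keep it as an @rx rule
    return pattern.replace("\\.", ".")


blacklist_lines = [
    'SecRule HTTP_Referer|ARGS "best-deals-blackjack\\.info"',
    'SecRule HTTP_Referer|ARGS "bestdims\\.com"',
    'SecRule HTTP_Referer|ARGS "(?:casino|poker)\\.example"',  # hypothetical regex rule
]
phrases = [p for p in map(rule_to_phrase, blacklist_lines) if p]
print("\n".join(phrases))  # one phrase per line, ready for a @pmFromFile list
```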

Let's see how long it takes ModSecurity to run through all of these individual rules in phase:2.

# head -3 /usr/local/apache/logs/modsec_debug.log
[20/Jan/2008:02:45:49 --0500] [www.example.com/sid#903df48][rid#9f9dab8][/cgi-bin/foo.cgi][1] Phase 1: 18 usec
[20/Jan/2008:02:45:49 --0500] [www.example.com/sid#903df48][rid#9f9dab8][/cgi-bin/foo.cgi][1] Rule 918e140 [id "-"][file "/usr/local/apache/conf/rules/modsecurity_crs_10_config.conf"][line "86"]: 10 usec
[20/Jan/2008:02:59:47 --0500] [www.example.com/sid#903df48][rid#9f9dab8][/cgi-bin/foo.cgi][1] Phase 2: 83751 usec

So, it took 83751 usec to process the ~7600 individual rules. Now let's run a similar test, but this time we will use the @pmFromFile operator with an input file containing approximately the same number of entries. Instead of thousands of individual SecRule lines, I will use this single rule:

SecRule REQUEST_HEADERS:Referer|ARGS "@pmFromFile spam_domains.txt"

The spam_domains.txt file contains approximately 6900 lines such as these:

01-beltonen.com
01-klingeltoene.at
01-klingeltoene.de
01-loghi.com
01-logo.com
01-logot.com
01-logotyper.com
01-melodia.com
01-melodias.com
01-ringetone.com

When I run the same test with this new rule that uses the @pmFromFile operator, you can see the dramatic difference in processing time:

# head -4 /usr/local/apache/logs/modsec_debug.log
[20/Jan/2008:03:20:45 --0500] [webapphoneypot/sid#8971f48][rid#923bf58][/cgi-bin/foo.cgi][1] Phase 1: 20 usec
[20/Jan/2008:03:20:45 --0500] [webapphoneypot/sid#8971f48][rid#923bf58][/cgi-bin/foo.cgi][1] Rule 9202980 [id "-"][file "/usr/local/apache/conf/rules/modsecurity_crs_10_config.conf"][line "86"]: 11 usec
[20/Jan/2008:03:20:45 --0500] [webapphoneypot/sid#8971f48][rid#923bf58][/cgi-bin/foo.cgi][1] Phase 2: 10 usec
[20/Jan/2008:03:20:45 --0500] [webapphoneypot/sid#8971f48][rid#923bf58][/cgi-bin/foo.cgi][1] Rule 9203890 [id "-"][file "/usr/local/apache/conf/rules/modsecurity_crs_15_customrules.conf"][line "1"]: 6 usec

As you can see, the @pmFromFile set-based matching check took only 6 usec, and all of phase 2 took just 10 usec, compared with 83751 usec for the ~7600 individual rules. That is a gigantic improvement in overall performance.

Conclusion

Set-based pattern matching can increase the overall performance of your ModSecurity rules when used in the proper circumstances. Whenever you need to inspect data against a large word list, you should try to leverage these new operators.