An alternative introduction to rspamd configuration: Modules (2/4)

Updated August 24th, 2018

We already learned that the choice of modules basically determines which tests rspamd executes in analyzing a message. But what are the modules we can choose from? The official documentation has an alphabetical list but let’s instead approach them differently here. If we group them by functionality we can identify the following categories:

  1. Modules acting on the sender or recipient domain/IP
  2. Modules verifying message signatures (against DNS records; they can also add signatures)
  3. Modules comparing message contents (or parts of the contents) against (DNS) blacklists
  4. Modules analyzing messages without DNS interaction
  5. Modules pushing metadata such as sender domains and scores to external databases
  6. Modules that implement actions instead of tests, and
  7. Other modules

I’ll walk through each of these categories now, listing the modules we associate with them. For each module I give a brief description and indicate whether it is built in (linked into the rspamd binary) or an external Lua module, whether it is enabled in the default configuration or requires additional configuration in order to do any good, and finally whether it requires Redis.

Redis is a key-value store: it associates keys with values (a giant array or hash table, you could say). Because it specializes in this and keeps its store in memory, it is fast enough that external applications use it as a simple database or a message broker. It’s also trivial to set up. Many rspamd modules require Redis for temporary and permanent storage.

Last but not least, I’ll also try to give some personal comments on module selection.

Modules acting on sender or recipient domain/IP

This category consists of the following modules:

| Name | Description | Built-in? | Enabled by default? | Requires Redis? |
| --- | --- | --- | --- | --- |
| asn | Looks up Autonomous System Number (ASN), country code and subnets of sender IP address, allowing for ASN/geographic discrimination | No | Yes | No |
| ip_score | Tracks the number of messages received from a given IP, subnet, ASN or country | No | Yes | Yes |
| mx_check | Checks if the sender domain in the message envelope has at least one connectable MX | No | No | Yes |
| ratelimit | Implements a Leaky bucket algorithm to rate-limit messages from certain senders/recipients | No | Requires config | Yes |
| rbl | Checks a message’s sender IP address against Realtime Blackhole Lists (RBLs), reverse DNS names, “Received:” addresses and HELO/EHLO parameters | No | Yes | No |
| spf | Checks the proclaimed sender domain’s Sender Policy Framework (SPF) policy | Yes | Yes | No |

In this category you will want to keep the two major modules enabled, rbl and spf. I know blacklisting is controversial to some, but I personally don’t think you can do without it (note that some blacklists have usage restrictions). And SPF might have its shortcomings, but there’s no reason not to use it.
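To make the blacklist check a bit less abstract, here is a minimal sketch of how a DNS blacklist query name is conventionally built: the IP address is reversed and prepended to the list's zone, and a listed address then resolves in DNS. The function name and the zone `dnsbl.example.org` are made up for illustration, and this is not rspamd's actual code.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for `ip` in the blacklist `zone`."""
    addr = ipaddress.ip_address(ip)
    # reverse_pointer yields e.g. "1.2.0.192.in-addr.arpa" for 192.0.2.1;
    # we only need the reversed address part, not the arpa suffix.
    suffix = ".in-addr.arpa" if addr.version == 4 else ".ip6.arpa"
    reversed_part = addr.reverse_pointer[: -len(suffix)]
    return f"{reversed_part}.{zone}"


# Hypothetical zone; real lists have their own zones and usage terms.
print(dnsbl_query_name("192.0.2.1", "dnsbl.example.org"))
```

The sketch stops at building the name. The actual lookup would be an ordinary DNS query for that name.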

As for the other modules, asn looks very useful to me since you can use it to penalize not so much ASNs but especially countries from which you receive next to no regular mail but a lot of spam. ip_score looks interesting but also complex to set up. mx_check being disabled by default looks like a warning sign to me, so for the time being I’ll do without it. The same goes for ratelimit, which gives me a bit of a headache: I can’t think of a case yet where I’d want to limit a good guy, and the bad guys are not that easily targetable (think botnets).
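For those who have not met the leaky bucket algorithm that ratelimit is built on, here is a rough sketch of the idea: every message adds one unit to a bucket that drains at a steady rate, and once the bucket is full further messages are refused. The class and the parameter names `burst` and `leak_rate` are my own choices for illustration and are not meant to mirror the module's configuration.

```python
class LeakyBucket:
    """Toy leaky bucket: each message adds 1 unit, the bucket drains over time."""

    def __init__(self, burst, leak_rate):
        self.burst = burst          # maximum fill level before refusing
        self.leak_rate = leak_rate  # units drained per second
        self.level = 0.0
        self.last_seen = None

    def allow(self, now):
        """Return True if a message arriving at time `now` (seconds) may pass."""
        if self.last_seen is not None:
            elapsed = now - self.last_seen
            # Drain the bucket for the time that has passed, never below empty.
            self.level = max(0.0, self.level - elapsed * self.leak_rate)
        self.last_seen = now
        if self.level + 1 > self.burst:
            return False
        self.level += 1
        return True


bucket = LeakyBucket(burst=3, leak_rate=1.0)
print([bucket.allow(0.0) for _ in range(4)])  # the fourth message exceeds the burst
```

In a real setup you would keep one bucket per sender or recipient, which is where the Redis requirement comes from.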

Modules verifying message signatures against DNS records (can also add signatures)

This category consists of the following modules:

| Name | Description | Built-in? | Enabled by default? | Requires Redis? |
| --- | --- | --- | --- | --- |
| arc | Verifies Authenticated Received Chain (ARC) signatures added by mail relays (augments DKIM signatures) | No | Yes | No |
| dkim | Verifies/Adds DomainKeys Identified Mail (DKIM) signatures to validate a mail really comes from the proclaimed domain | Yes | Yes | No |
| dmarc | Looks up DMARC TXT records to determine how to handle messages after DKIM/SPF checks failed | No | Requires config | Yes |
| mid | Suppresses penalties for invalid/missing “Message-Id” headers for DKIM-signed messages from certain configured domains (i.e. to work around broken setups) | No | Yes | No |
| whitelist | Negates or increases scores for messages from trusted sources based on DKIM/DMARC/SPF properties | No | Yes | No |

As with spf you’ll want dkim. arc tries to address one of DKIM’s shortcomings, forwarded mail, so you want that, too. I’m not sure if you want to enable dmarc: I like to keep control over the policies my mail setup implements.

mid looks like a helper if you know domains with broken Message-Ids and DKIM signing. I personally don’t, so I’m disabling it. whitelist is for those who want to play it safe on important messages, but personally I still like to believe that a mail setup should be able to work without having to resort to these mechanisms.

Modules comparing message contents (or parts of the contents) against (DNS) blacklists

This category consists of the following modules:

| Name | Description | Built-in? | Enabled by default? | Requires Redis? |
| --- | --- | --- | --- | --- |
| dcc | Performs Distributed Checksum Clearinghouses (DCC) lookups to determine if a message is a bulk email | No | Requires config | No |
| emails | Extracts email addresses and compares them against static maps or DNS blacklists | No | Requires config | No |
| fuzzy_check | Applies fuzzy matching of patterns against messages, trying to catch variants of spammy word phrases | Yes | Yes | No |
| phishing | Attempts to detect phishy URLs in HTML text parts with optional OpenPhish support | No | Yes | No |
| surbl | Extracts URLs and domains from these URLs and compares them against (restricted use) blacklists such as surbl.org, uribl.com, rambler.ru, spamhaus.org and spameatingmonkey.net | Yes | Yes | No |
| url_reputation | Experimental module to extract URLs and assign reputations to their domains | No | No | Yes |

dcc has license restrictions and requires an external daemon, which is why I think I can do without it. The same goes for the emails module, since I personally find it easier to focus on other aspects than on the email addresses contained in spam messages. fuzzy_check looks powerful but also complex: you need a separate worker and a separate learning step, so let’s see if we can do without it, too.

The phishing module looks useful, so I’d keep it enabled. surbl and url_reputation both deal with URLs in messages, but the latter is (for good reasons) experimental, so I’ll keep it disabled. I’ll leave the former enabled but will have to watch it for some time to learn about its impact.

Modules analyzing messages without DNS interaction

This category consists of the following modules:

| Name | Description | Built-in? | Enabled by default? | Requires Redis? |
| --- | --- | --- | --- | --- |
| antivirus | Passes the message to external virus scanners such as ClamAV | No | Yes, but requires config | No |
| chartable | Penalizes conspicuously frequent charset (e.g. latin, chinese) changes within words | Yes | Yes | No |
| maillist | Neutralizes certain rules if a message was sent by mailing list software | No | Yes | No |
| mime_types | Implements MIME type sanity checks and checks archive contents | No | Yes | No |
| neural_networks | Experimental module that implements a neural network for message classification | No | No | Yes |
| once_received | Implements checks for messages with a single “Received:” header | No | Yes | No |
| regexp | Implements regular expression matches against message headers and body | Yes | Yes | No |
| replies | Tracks replies to own messages via the “Message-Id” header | No | Yes | Yes |
| reputation | An experimental generic reputation module that has not been documented so far | No | No | No |
| trie | Searches for configurable strings in messages “blazingly fast” using the Aho–Corasick algorithm | No | Requires config | No |

I used to run virus scanners on my MX all the time but I can’t recall a single case where they actually prevented a virus mail from arriving, so I’ll not use the antivirus module. I’d never have guessed that chartable could make an impact but apparently it does, plus it’s cheap, so I’ll leave it enabled. maillist sounds like a wise choice, so I’ll leave it enabled as well. Unlike Apache, where you’d mostly care about MIME types to define custom ones, the mime_types module merely does consistency checks. You want that.
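To get a feel for what chartable is after, here is a rough sketch that counts how often the script changes inside a single word, for example when a Cyrillic letter is smuggled into an otherwise Latin word. It takes the first word of each character's Unicode name as a crude stand-in for its script. That is purely my own simplification and certainly not how the built-in module does it.

```python
import unicodedata


def script_of(ch):
    """Crude script guess: first word of the Unicode name, letters only."""
    if not ch.isalpha():
        return None
    try:
        return unicodedata.name(ch).split()[0]  # e.g. "LATIN", "CYRILLIC"
    except ValueError:
        return None  # character has no Unicode name


def script_changes(word):
    """Count transitions between different scripts within one word."""
    changes = 0
    previous = None
    for ch in word:
        current = script_of(ch)
        if current is None:
            continue  # skip digits, punctuation and unnamed characters
        if previous is not None and current != previous:
            changes += 1
        previous = current
    return changes


print(script_changes("paypal"))            # plain Latin word
print(script_changes("p\u0430yp\u0430l"))  # same word with Cyrillic "а" mixed in
```

A word that looks identical to the eye but scores several changes is exactly the kind of thing a spammer would use to dodge simple string matches.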

neural_networks sounds like “the hottest shit”, of course, but I don’t need the “hottest shit” adding more complexity, so I’ll try to live without it. once_received might have a small but positive impact and comes cheap, so keep it as well. The same goes for regexp, which implements essential core functionality.

replies can help ensure that mail threads don’t get torn apart due to rspamd filtering too heavily, so keep it but probably increase the default expiry time since replies don’t necessarily happen within 24 hours. reputation is not only experimental but most of all undocumented, so keep it disabled.
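The idea behind replies, and why the expiry time matters, can be sketched in a few lines: remember the Message-Id of each outgoing mail for a while, and treat an incoming mail whose “In-Reply-To” header matches a remembered id as a reply. The class and method names are hypothetical and the in-memory dict merely stands in for Redis. This is an illustration of the concept, not the module's code.

```python
class ReplyTracker:
    """Remembers Message-Ids of outgoing mail for a limited time."""

    def __init__(self, expiry_seconds):
        self.expiry_seconds = expiry_seconds
        self._sent = {}  # Message-Id -> time the message was sent

    def record_outgoing(self, message_id, now):
        self._sent[message_id] = now

    def is_reply_to_own_message(self, in_reply_to, now):
        sent_at = self._sent.get(in_reply_to)
        if sent_at is None:
            return False
        if now - sent_at > self.expiry_seconds:
            del self._sent[in_reply_to]  # entry has expired, forget it
            return False
        return True


tracker = ReplyTracker(expiry_seconds=24 * 3600)
tracker.record_outgoing("<abc@example.org>", now=0)
print(tracker.is_reply_to_own_message("<abc@example.org>", now=3600))       # within a day
print(tracker.is_reply_to_own_message("<abc@example.org>", now=25 * 3600))  # a bit too late
```

With a 24 hour window, a reply that arrives the next afternoon is no longer recognized, which is the reason for raising the expiry.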

trie sounds totally superdope except that I only understand half of it which tells me I probably don’t need it.
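For the curious, here is a compact sketch of the Aho–Corasick algorithm the trie module is named after: all search strings are put into one trie, failure links are added, and the text is then scanned once for all strings simultaneously. This is a textbook-style version written for illustration. The function names are mine and it has nothing to do with rspamd's actual implementation.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links and outputs for `patterns`."""
    goto = [{}]    # goto[state] maps a character to the next state; 0 is the root
    fail = [0]     # fail[state] is the state to fall back to on a mismatch
    output = [[]]  # output[state] lists the patterns that end in this state
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(pattern)
    # Breadth-first pass: shallower states get their failure links first.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            fallback = fail[state]
            while fallback and ch not in goto[fallback]:
                fallback = fail[fallback]
            fail[nxt] = goto[fallback].get(ch, 0)
            # Inherit the patterns that end at the fallback state.
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def find_all(text, automaton):
    """Return (start_index, pattern) for every match, scanning `text` once."""
    goto, fail, output = automaton
    state = 0
    matches = []
    for index, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in output[state]:
            matches.append((index - len(pattern) + 1, pattern))
    return matches


automaton = build_automaton(["he", "she", "his", "hers"])
print(find_all("ushers", automaton))
```

The point is that the scan time depends on the length of the text and the number of matches, not on how many strings you search for, which is presumably where the “blazingly fast” comes from.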

Modules pushing metadata such as sender domains and scores to external databases

This category consists of the following modules:

| Name | Description | Built-in? | Enabled by default? | Requires Redis? |
| --- | --- | --- | --- | --- |
| clickhouse | Pushes metadata to a ClickHouse instance | No | Requires config | No |
| elastic | Pushes metadata to an Elasticsearch instance | No | Requires config | No |
| history_redis | Stores history data in a Redis database with optional compression and cluster support | No | Yes | Yes |
| metadata_exporter | Exports metadata via Redis Pub/Sub channels, HTTP POST URLs and SMTP messages | No | Requires config | No |
| metric_exporter | Exports metrics to an external monitoring/graphing system (currently only Graphite) | No | Requires config | No |

All but one of these modules are relevant only if you want to perform advanced analysis of rspamd operations, so in our case we can keep them disabled. history_redis could come in handy, but unfortunately the docs say little about the practical differences between enabling and disabling it.

Modules that implement actions instead of tests

This category consists of the following modules:

| Name | Description | Built-in? | Enabled by default? | Requires Redis? |
| --- | --- | --- | --- | --- |
| dkim_signing | Not really an action because it’s not triggered by a score; intended as a simple alternative to the dkim module’s sign_condition | No | Requires config | No |
| force_actions | Forces an action if particular symbols are found or not found (independently of a message’s total spamminess score) | No | Requires config | No |
| greylisting | Implements greylisting (temporarily refusing messages so a legitimate sender has to retry later) as an action | No | Yes | Yes |
| milter_headers | Adds/removes message headers such as “x-spamd-bar” and “authentication-results” | No | Requires config | No |

You will want dkim_signing unless your outgoing mail does not pass through rspamd. I haven’t found a use case for force_actions yet. We do enable greylisting, of course, and milter_headers is also handy so we can inspect mail headers to see test results.
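If greylisting is new to you, the principle fits into a few lines: the first delivery attempt for an unknown combination is temporarily refused, and a retry after a minimum delay is accepted. The sketch below uses the classic triplet of client IP, sender and recipient as the key. That key, the class name and the `min_delay` parameter are assumptions on my part for illustration. How rspamd identifies a retried message may well differ.

```python
class Greylist:
    """Toy greylist keyed on (client IP, envelope sender, envelope recipient)."""

    def __init__(self, min_delay):
        self.min_delay = min_delay  # seconds a sender must wait before retrying
        self._first_seen = {}       # triplet -> time of the first attempt

    def check(self, client_ip, sender, recipient, now):
        key = (client_ip, sender, recipient)
        first_seen = self._first_seen.get(key)
        if first_seen is None:
            self._first_seen[key] = now
            return "tempfail"  # unknown triplet: ask the sender to retry later
        if now - first_seen < self.min_delay:
            return "tempfail"  # retried too quickly
        return "accept"


greylist = Greylist(min_delay=300)
attempt = ("192.0.2.1", "alice@example.org", "bob@example.net")
print(greylist.check(*attempt, now=0))    # first attempt
print(greylist.check(*attempt, now=600))  # proper retry
```

A legitimate mail server retries on its own after a temporary failure, while a lot of spamware never does, which is the whole trick.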

Other modules

This category consists of the following modules:

| Name | Description | Built-in? | Enabled by default? | Requires Redis? |
| --- | --- | --- | --- | --- |
| bayes_expiry | Provides expiration of Bayes tokens | No | Requires config | Yes |
| multimap | Handles maps dynamically reloaded from files and URLs | No | Yes | No |
| rspamd_update | Loads new rspamd rules, symbol scores and actions without full daemon restart | No | Yes | No |
| spamassassin | Imports SpamAssassin rules | No | Requires config | No |
| url_redirector | A helper module for the surbl module to dereference redirects in URLs | No | Requires config | Yes |
| url_tags | Experimental module that caches URL tags (whatever that is) in Redis | No | No | Yes |

bayes_expiry still confuses me, as Bayes operation itself is not part of this module and does not seem to be implemented by any other module either, but inside rspamd itself. I’m not sure if we really need the expiry part. Keep multimap and rspamd_update enabled; they are useful helper modules. spamassassin is only needed if you really want to migrate an existing SpamAssassin setup, which I’d try to avoid.

url_redirector is used by surbl, so keep it. Last but not least, url_tags being experimental means we probably don’t need it.

