Updated August 24th, 2018
We already learned that the choice of modules basically determines which tests rspamd executes when analyzing a message. But what are the modules we can choose from? The official documentation has an alphabetical list, but let’s approach them differently here. If we group them by functionality, we can identify the following categories:
- Modules acting on the sender or recipient domain/IP
- Modules verifying message signatures (against DNS records; they can also add signatures)
- Modules comparing message contents (or parts of the contents) against (DNS) blacklists
- Modules analyzing messages without DNS interaction
- Modules pushing metadata such as sender domains and scores to external databases
- Modules that implement actions instead of tests, and
- Other modules
I’ll walk through each of these categories now, listing the modules we associate with them. For each module I give a brief description and indicate whether it is built-in (linked into the rspamd binary) or an external Lua module, whether it is enabled in the default configuration, whether it requires additional configuration in order to do any good, and finally whether it requires Redis.
Redis is a key-value store: it associates keys with values (a giant array or hash table, you could say). Because it specializes in this and keeps its data in memory, it is fast enough that external applications use it as a simple database or a message broker. It’s also trivial to set up. Many rspamd modules require Redis for temporary and permanent storage.
Last but not least, I’ll also try to give some personal comments on module selection.
Modules acting on sender or recipient domain/IP
This category consists of the following modules:
Name | Description | Built-in? | Enabled by default? | Requires Redis? |
asn | Looks up Autonomous System Number (ASN), country code and subnets of sender IP address, allowing for ASN/geographic discrimination | No | Yes | No |
ip_score | Tracks the number of messages received from a given IP, subnet, ASN or country | No | Yes | Yes |
mx_check | Checks if the sender domain in the message envelope has at least one connectable MX | No | No | Yes |
ratelimit | Implements a leaky bucket algorithm to rate-limit messages from certain senders/recipients | No | Requires config | Yes |
rbl | Checks a message’s sender IP address against Realtime Blackhole Lists (RBLs), reverse DNS names, “Received:” addresses and HELO/EHLO parameters | No | Yes | No |
spf | Checks the proclaimed sender domain’s Sender Policy Framework (SPF) policy | Yes | Yes | No |
In this category you will want to keep the two major modules enabled, rbl and spf. I know blacklisting is controversial to some, but I personally don’t think you can do without it (note that some blacklists have usage restrictions). And SPF might have its shortcomings, but there’s no reason not to use it.
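To make the mechanics of an RBL check a bit more tangible: for IPv4, a DNS blacklist lookup conventionally reverses the octets of the sender IP, appends the zone of the list and asks DNS for an A record under that name. The sketch below only builds the query name and performs no DNS lookup. The function name and the zone `dnsbl.example.org` are placeholders of mine, and this illustrates the general convention, not rspamd’s own code.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name a blacklist lookup would query for an IPv4 address."""
    addr = ipaddress.ip_address(ip)  # raises ValueError on malformed input
    if addr.version != 4:
        raise ValueError("this sketch only handles IPv4 addresses")
    # 192.0.2.99 becomes 99.2.0.192, then the list's zone is appended
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


# Placeholder zone, not a real blacklist
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
# prints 99.2.0.192.dnsbl.example.org
```

If the resulting name resolves, the address is listed; if the lookup returns "no such domain", it is not.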
As for the other modules, asn looks very useful to me since you can use it to penalize not so much ASNs but especially countries from which you receive next to no regular mail but a lot of spam. ip_score looks interesting but also complex to set up. mx_check being disabled by default looks like a warning sign to me, and for the time being I’ll do without. The same goes for ratelimit, which gives me a bit of a headache: I can’t think of a case yet where I’d want to limit a good guy, and the bad guys are not that easily targetable (think botnets).
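For readers who haven’t met the leaky bucket algorithm that ratelimit is built on, here is a minimal sketch of the idea: every message adds one unit to a bucket, the bucket drains at a constant rate, and a message is refused if it would overflow the bucket. The class and parameter names (`burst`, `rate`) are my own choice for illustration and not necessarily what rspamd uses, and rspamd keeps its buckets in Redis rather than in process memory.

```python
class LeakyBucket:
    """Each message adds 1 to the level; the level drains at `rate` per second."""

    def __init__(self, burst: float, rate: float):
        self.burst = burst   # maximum level the bucket may reach
        self.rate = rate     # units drained per second
        self.level = 0.0
        self.last = None     # timestamp of the previous check

    def allow(self, now: float) -> bool:
        if self.last is not None:
            # Drain the bucket for the time that has passed since the last check
            self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + 1 > self.burst:
            return False     # message would overflow the bucket: rate-limited
        self.level += 1
        return True


bucket = LeakyBucket(burst=3, rate=1.0)
print([bucket.allow(now=t) for t in (0.0, 0.1, 0.2, 0.3, 2.5)])
# prints [True, True, True, False, True]
```

The fourth message arrives before the bucket has drained enough and is refused; after a pause of about two seconds there is room again.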
Modules verifying message signatures against DNS records (can also add signatures)
This category consists of the following modules:
Name | Description | Built-in? | Enabled by default? | Requires Redis? |
arc | Verifies Authenticated Received Chain (ARC) signatures added by mail relays (augments DKIM signatures) | No | Yes | No |
dkim | Verifies/Adds DomainKeys Identified Mail (DKIM) signatures to validate that a mail really comes from the proclaimed domain | Yes | Yes | No |
dmarc | Looks up DMARC TXT records to determine how to handle messages after DKIM/SPF checks failed | No | Requires config | Yes |
mid | Suppresses penalties for invalid/missing “Message-Id” headers for DKIM-signed messages from certain configured domains (i.e. to workaround broken setups) | No | Yes | No |
whitelist | Negates or increases scores for messages from trusted sources based on DKIM/DMARC/SPF properties | No | Yes | No |
As with spf you’ll want dkim. arc tries to address one of DKIM’s shortcomings, forwarded mail, so you want that, too. I’m not sure if you want to enable dmarc: I like to keep control over the policies my mail setup implements.
mid looks like a helper if you know of domains that combine broken Message-Ids with DKIM signing. I personally don’t, so I’m disabling it. whitelist is for those who want to play it safe on important messages, but personally I still like to believe that a mail setup should be able to work without having to resort to these mechanisms.
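To illustrate what "the policies" means in the dmarc case: a sender domain publishes a TXT record under `_dmarc.<domain>` consisting of semicolon-separated `tag=value` pairs, where `p` states the requested policy (`none`, `quarantine` or `reject`). The sketch below parses such a record from a string; it does no DNS lookup, the example record and address are made up, and the function name is hypothetical. It is not how rspamd itself parses these records.

```python
def parse_dmarc_record(txt: str) -> dict:
    """Split a DMARC TXT record into a dict of its tag=value pairs."""
    tags = {}
    for part in txt.split(";"):
        key, sep, value = part.strip().partition("=")
        if not sep:
            continue  # skip empty or malformed fragments
        tags[key.strip().lower()] = value.strip()
    if tags.get("v") != "DMARC1":
        raise ValueError("not a DMARC record")
    return tags


# Made-up example record
record = "v=DMARC1; p=quarantine; pct=50; rua=mailto:reports@example.org"
tags = parse_dmarc_record(record)
print(tags["p"], tags["pct"])
# prints quarantine 50
```

Honoring `p` is exactly the point where you hand part of your filtering decisions to the sender domain, which is why I hesitate to enable the module.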
Modules comparing message contents (or parts of the contents) against (DNS) blacklists
This category consists of the following modules:
Name | Description | Built-in? | Enabled by default? | Requires Redis? |
dcc | Performs Distributed Checksum Clearinghouses (DCC) lookups to determine if a message is a bulk email | No | Requires config | No |
emails | Extracts email addresses and compares them against static maps or DNS blacklists | No | Requires config | No |
fuzzy_check | Applies fuzzy matching of patterns against messages, trying to catch variants of spammy word phrases | Yes | Yes | No |
phishing | Attempts to detect phishy URLs in HTML text parts with optional OpenPhish support | No | Yes | No |
surbl | Extracts URLs and domains from these URLs and compares them against (restricted use) blacklists such as surbl.org, uribl.com, rambler.ru, spamhaus.org and spameatingmonkey.net | Yes | Yes | No |
url_reputation | Experimental module to extract URLs and assign reputations to their domains | No | No | Yes |
dcc has license restrictions and requires an external daemon, which is why I think I can do without it. The same goes for the emails module, since I personally find it easier to focus on other aspects than on the email addresses contained in spam messages. fuzzy_check looks powerful but also complex: you need a separate worker and a separate learning step, so let’s see if we can do without it, too.
The phishing module looks useful, so I’d keep it enabled. surbl and url_reputation both deal with URLs in messages, but the latter is (for good reasons) experimental, so I’ll keep it disabled. I’ll leave the former enabled but will have to watch it for some time to learn about its impact.
Modules analyzing messages without DNS interaction
This category consists of the following modules:
Name | Description | Built-in? | Enabled by default? | Requires Redis? |
antivirus | Passes the message to external virus scanners such as ClamAV | No | Yes, but requires config | No |
chartable | Penalizes conspicuously frequent charset (e.g. Latin, Chinese) changes within words | Yes | Yes | No |
maillist | Neutralizes certain rules if a message was sent by mailing list software | No | Yes | No |
mime_types | Implements MIME type sanity checks and checks archive contents | No | Yes | No |
neural_networks | Experimental module that implements a neural network for message classification | No | No | Yes |
once_received | Implements checks for messages with a single “Received:” header | No | Yes | No |
regexp | Implements regular expression matches against message headers and body | Yes | Yes | No |
replies | Tracks replies to own messages via the “Message-Id” header | No | Yes | Yes |
reputation | An experimental generic reputation module that has not been documented yet | No | No | No |
trie | Searches for configurable strings in messages “blazingly fast” using the Aho–Corasick algorithm | No | Requires config | No |
I used to run virus scanners on my MX all the time, but I can’t recall a single case where they actually prevented a virus mail from arriving, so I’ll not use the antivirus module. I’d never have guessed that chartable could make an impact, but apparently it does, plus it’s cheap, so I’ll leave it enabled. maillist sounds like a wise choice, so I’ll leave it enabled as well. Unlike with Apache, where you’d mostly care about MIME types in order to define custom ones, the mime_types module merely does consistency checks, and you want that.
neural_networks sounds like “the hottest shit”, of course, but I don’t need the “hottest shit” adding more complexity, so I’ll try to live without it. once_received might have a small but positive impact and comes cheap, so keep it as well. The same goes for regexp, which implements essential core functionality.
replies can help ensure that mail threads don’t get torn apart by rspamd filtering too heavily, so keep it, but probably increase the default expiry time since replies don’t necessarily happen within 24 hours. reputation is not only experimental but above all undocumented, so keep it disabled.
trie sounds totally superdope, except that I only understand half of it, which tells me I probably don’t need it.
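Since chartable surprised me, here is a rough sketch of the idea as I understand it from its description: count how often the script changes inside a single word, which is typical for look-alike tricks such as a Cyrillic "а" hidden in a Latin word. This is my own approximation and not rspamd’s algorithm; in particular, taking the first word of the Unicode character name as "the script" is a crude shortcut, and all function names are made up.

```python
import unicodedata


def script_of(ch: str) -> str:
    # Crude approximation: first word of the Unicode name, e.g. LATIN, CYRILLIC
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def script_changes(word: str) -> int:
    """Number of script switches between neighbouring letters of a word."""
    scripts = [script_of(ch) for ch in word if ch.isalpha()]
    return sum(1 for a, b in zip(scripts, scripts[1:]) if a != b)


def mixed_script_ratio(text: str) -> float:
    """Share of whitespace-separated words containing at least one switch."""
    words = text.split()
    if not words:
        return 0.0
    return sum(1 for w in words if script_changes(w) > 0) / len(words)


print(script_changes("paypal"))                    # prints 0
print(script_changes("p\u0430ypal"))               # prints 2 (U+0430 is Cyrillic)
print(mixed_script_ratio("p\u0430ypal account"))   # prints 0.5
```

A filter could then add a penalty once the ratio exceeds some threshold; which threshold makes sense is something I would not dare to guess.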
Modules pushing metadata such as sender domains and scores to external databases
This category consists of the following modules:
Name | Description | Built-in? | Enabled by default? | Requires Redis? |
clickhouse | Pushes metadata to a ClickHouse instance | No | Requires config | No |
elastic | Pushes metadata to an Elasticsearch instance | No | Requires config | No |
history_redis | Stores history data in a Redis database with optional compression and cluster support | No | Yes | Yes |
metadata_exporter | Exports metadata via Redis Pub/Sub channels, HTTP POST URLs and SMTP messages | No | Requires config | No |
metric_exporter | Exports metrics to an external monitoring/graphing system (currently only Graphite) | No | Requires config | No |
All but one of these modules are relevant only if you want to perform advanced analysis of rspamd operations, so in our case we can keep them disabled. history_redis could come in handy, but unfortunately the docs say little about the practical differences between disabling and enabling it.
Modules that implement actions instead of tests
This category consists of the following modules:
Name | Description | Built-in? | Enabled by default? | Requires Redis? |
dkim_signing | Not really an action because it’s not triggered by a score; intended as a simple alternative to the dkim module’s sign_condition | No | Requires config | No |
force_actions | Forces an action if particular symbols are found or not found (independently from a message’s total spaminess score) | No | Requires config | No |
greylisting | Implements greylisting (temporarily refusing messages so a legitimate sender has to retry later) as an action | No | Yes | Yes |
milter_headers | Adds/removes message headers such as “x-spamd-bar” and “authentication-results” | No | Requires config | No |
You will want dkim_signing unless your outgoing mail does not pass through rspamd. I haven’t found a use case for force_actions yet. We do enable greylisting, of course, and milter_headers is also handy so we can inspect mail headers to see test results.
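As a reminder of how greylisting works in principle, here is a small sketch: the first attempt for an unknown combination of sender IP, envelope sender and recipient is temporarily refused, and a retry is accepted once a minimum delay has passed. What exactly rspamd keys on, which delay it uses and when it decides to greylist at all are not covered here; the class, the tuple key and the 300 seconds are assumptions for illustration, and rspamd stores this state in Redis rather than in a Python dict.

```python
class Greylist:
    """Temporarily refuse unknown senders until they retry after `delay` seconds."""

    def __init__(self, delay: float = 300.0):
        self.delay = delay
        self.first_seen = {}  # (ip, sender, recipient) -> time of first attempt

    def check(self, ip: str, sender: str, recipient: str, now: float) -> str:
        key = (ip, sender, recipient)
        first = self.first_seen.get(key)
        if first is None:
            self.first_seen[key] = now
            return "tempfail"   # first contact: ask the sender to retry later
        if now - first < self.delay:
            return "tempfail"   # retried too early
        return "accept"         # a legitimate MTA retried after the delay


gl = Greylist(delay=300.0)
attempt = ("192.0.2.99", "alice@example.org", "bob@example.net")
print(gl.check(*attempt, now=0.0))    # prints tempfail
print(gl.check(*attempt, now=60.0))   # prints tempfail
print(gl.check(*attempt, now=600.0))  # prints accept
```

The bet is that real mail servers retry while most spam software does not bother.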
Other modules
This category consists of the following modules:
Name | Description | Built-in? | Enabled by default? | Requires Redis? |
bayes_expiry | Provides expiration of Bayes tokens | No | Requires config | Yes |
multimap | Handles maps dynamically reloaded from files and URLs | No | Yes | No |
rspamd_update | Loads new rspamd rules, symbol scores and actions without full daemon restart | No | Yes | No |
spamassassin | Imports SpamAssassin rules | No | Requires config | No |
url_redirector | A helper module for the surbl module to dereference redirects in URLs | No | Requires config | Yes |
url_tags | Experimental module that caches URL tags (whatever that is) in Redis | No | No | Yes |
bayes_expiry still confuses me: Bayes operation itself is not part of this module and does not seem to be implemented by any other module either, but inside rspamd itself. I’m not sure we really need the expiry part. Keep multimap and rspamd_update enabled, as they are useful helper modules. spamassassin is only needed if you really want to migrate an existing SpamAssassin setup, which I’d try to avoid.
url_redirector is used by surbl, so keep it. Last but not least, url_tags being experimental means we probably don’t need it.
Blog post series index:
- Part 1: Introduction
- Part 2: Modules
- Part 3: Scores
- Part 4: Configuration file structure