An alternative introduction to rspamd configuration: Modules (2/4)

Updated August 24th, 2018

We already learned that the choice of modules basically determines which tests rspamd executes when analyzing a message. But what are the modules we can choose from? The official documentation lists them alphabetically, but let’s approach them differently here. If we group them by functionality, we can identify the following categories:

Modules acting on the sender or recipient domain/IP
Modules verifying message signatures (against DNS records; they can also add signatures)
Modules comparing message contents (or parts of the contents) against (DNS) blacklists
Modules analyzing messages without DNS interaction
Modules pushing metadata such as sender domains and scores to external databases
Modules that implement actions instead of tests, and
Other modules

I’ll walk through each of these categories now, listing the modules we associate with them. For each module I give a brief description and indicate whether it is built-in (linked into the rspamd binary) or an external Lua module, whether it is enabled in the default configuration or requires additional configuration to do any good, and finally whether it requires Redis.

Redis is a key-value store: it associates keys with values (a giant array or hash table, you could say). Because it is specialized for this and keeps its data in memory, it is fast enough that many applications use it as a simple database or message broker. It’s also trivial to set up. Many rspamd modules require Redis for temporary and permanent storage.
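
To make “store keys with values, in memory” concrete, here is a minimal Python sketch of the kind of operation modules rely on: set a key with an optional time-to-live, then read it back until it expires. This is a toy in-process dict written for illustration. It is not Redis and not its client API. The class name `ExpiringStore` and the injectable clock are assumptions of the sketch.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class ExpiringStore:
    """Toy in-memory key-value store with per-key expiry (illustration, not Redis)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        # key -> (value, absolute expiry time or None for "never expires")
        self._data: Dict[str, Tuple[Any, Optional[float]]] = {}
        self._clock = clock

    def set(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        expires_at = self._clock() + ttl if ttl is not None else None
        self._data[key] = (value, expires_at)

    def get(self, key: str) -> Optional[Any]:
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and self._clock() >= expires_at:
            # Lazily drop expired keys when they are accessed.
            del self._data[key]
            return None
        return value


if __name__ == "__main__":
    now = [0.0]
    store = ExpiringStore(clock=lambda: now[0])
    store.set("greylist:192.0.2.1", "seen", ttl=300)
    print(store.get("greylist:192.0.2.1"))  # seen
    now[0] = 301.0
    print(store.get("greylist:192.0.2.1"))  # None
```

Real Redis adds persistence, networking and many data types on top of this. The expiring key is the part that matters most for the temporary storage mentioned above.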

Last but not least, I’ll also try to give some personal comments on module selection.

Modules acting on sender or recipient domain/IP

This category consists of the following modules:

Name	Description	Built-in?	Enabled by default?	Requires Redis?
asn	Looks up Autonomous System Number (ASN), country code and subnets of sender IP address, allowing for ASN/geographic discrimination	No	Yes	No
ip_score	Tracks the number of messages received from a given IP, subnet, ASN or country	No	Yes	Yes
mx_check	Checks if the sender domain in the message envelope has at least one connectable MX	No	No	Yes
ratelimit	Implements a leaky bucket algorithm to rate-limit messages from certain senders/recipients	No	Requires config	Yes
rbl	Checks a message’s sender IP address against Realtime Blackhole Lists (RBLs), reverse DNS names, “Received:” addresses and HELO/EHLO parameters	No	Yes	No
spf	Checks the proclaimed sender domain’s Sender Policy Framework (SPF) policy	Yes	Yes	No

In this category you will want to keep the two major modules, rbl and spf, enabled. I know blacklisting is controversial to some, but I personally don’t think you can do without it (note that some blacklists have usage restrictions). SPF might have its shortcomings, but there’s no reason not to use it.
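
For readers who haven’t looked at how a DNS blacklist lookup works, the common convention (RFC 5782) has three steps:

1. Reverse the octets of the sender’s IPv4 address.
2. Append the blacklist zone.
3. Resolve the resulting name.

An answer in 127.0.0.0/8 means “listed”; NXDOMAIN means “not listed”. The Python sketch below shows this for IPv4 only and is not how the rbl module is implemented internally. The zone `zen.spamhaus.org` is only an example, and like many lists it has usage restrictions. The helper names are hypothetical.

```python
import ipaddress
import socket
from typing import Optional


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL query name for an IPv4 address (RFC 5782 convention)."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for invalid input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def dnsbl_lookup(ip: str, zone: str) -> Optional[str]:
    """Return the listing's A record (e.g. '127.0.0.2') or None if not listed."""
    try:
        return socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # NXDOMAIN, but also any resolution failure, is treated as "not listed" here.
        return None


if __name__ == "__main__":
    print(dnsbl_query_name("192.0.2.1", "zen.spamhaus.org"))
    # 1.2.0.192.zen.spamhaus.org
```

Many lists return different 127.0.0.x codes for different listing reasons. The rbl module can map those codes to separate symbols, and this sketch ignores that distinction.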

As for the other modules, asn looks very useful to me because you can use it to penalize not so much ASNs as countries from which you receive next to no legitimate mail but a lot of spam. ip_score looks interesting but also complex to set up. mx_check being disabled by default looks like a warning sign to me, so for the time being I’ll do without it. The same goes for ratelimit, which gives me a bit of a headache: I can’t think of a case yet where I’d want to limit a good guy, and the bad guys are not that easily targeted (think botnets).
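
Since ratelimit is described as a leaky bucket, here is what that algorithm does. Each sender (or recipient) gets a bucket that fills by one unit per message and drains at a constant rate. A message that would overflow the bucket is limited. The Python sketch below keeps buckets in a plain dict, whereas the rspamd module keeps its state in Redis and is configured differently. The `burst` and `rate` parameter names are illustrative assumptions here.

```python
from typing import Dict, Tuple


class LeakyBucket:
    """Leaky bucket rate limiter: capacity `burst`, drains `rate` units/second."""

    def __init__(self, burst: float, rate: float) -> None:
        self.burst = burst
        self.rate = rate
        # key -> (current fill level, timestamp of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, key: str, now: float) -> bool:
        level, last = self._buckets.get(key, (0.0, now))
        # Drain the bucket for the time elapsed since the last message.
        level = max(0.0, level - (now - last) * self.rate)
        if level + 1 > self.burst:
            # Adding this message would overflow: limit it, keep the drained level.
            self._buckets[key] = (level, now)
            return False
        self._buckets[key] = (level + 1, now)
        return True


if __name__ == "__main__":
    bucket = LeakyBucket(burst=3, rate=1 / 60)  # 3 at once, then about 1 per minute
    print([bucket.allow("sender@example.com", t) for t in (0, 1, 2, 3)])
    # [True, True, True, False]
    print(bucket.allow("sender@example.com", 63))  # True: a slot has drained
```

The sketch also shows why botnets are hard to target this way. Every infected host gets its own, mostly empty bucket.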

Modules verifying message signatures against DNS records (can also add signatures)

This category consists of the following modules:

Name	Description	Built-in?	Enabled by default?	Requires Redis?
arc	Verifies Authenticated Received Chain (ARC) signatures added by mail relays (augments DKIM signatures)	No	Yes	No
dkim	Verifies/adds DomainKeys Identified Mail (DKIM) signatures to validate that a mail really comes from the proclaimed domain	Yes	Yes	No
dmarc	Looks up DMARC TXT records to determine how to handle messages after DKIM/SPF checks failed	No	Requires config	Yes
mid	Suppresses penalties for invalid/missing “Message-Id” headers in DKIM-signed messages from certain configured domains (e.g. to work around broken setups)	No	Yes	No
whitelist	Negates or increases scores for messages from trusted sources based on DKIM/DMARC/SPF properties	No	Yes	No

As with spf, you’ll want dkim. arc tries to address one of DKIM’s shortcomings, forwarded mail, so you want that, too. I’m not sure whether you want to enable dmarc: I like to keep control over the policies my mail setup implements.
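
It helps to know what dmarc actually looks up. A domain publishes its policy as a TXT record at `_dmarc.<domain>`: a semicolon-separated list of tag=value pairs such as `p=reject`. The Python sketch below only parses such a record string and picks the policy. The DNS lookup, alignment checks and reporting are left out. It is a simplified reading of the RFC 7489 syntax, not rspamd’s parser, and the function names are hypothetical.

```python
from typing import Dict, Optional


def parse_dmarc_record(txt: str) -> Optional[Dict[str, str]]:
    """Parse a DMARC TXT record into a tag -> value dict.

    Simplified: returns None unless the record starts with v=DMARC1 and
    contains p=; individual tag values are not validated.
    """
    tags: Dict[str, str] = {}
    for part in txt.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate trailing or doubled semicolons
        name, sep, value = part.partition("=")
        if not sep:
            return None  # malformed tag
        tags[name.strip().lower()] = value.strip()
    # v=DMARC1 must be the first tag, and p= is required for a policy record.
    first = txt.strip().split(";", 1)[0].replace(" ", "")
    if first != "v=DMARC1" or "p" not in tags:
        return None
    return tags


def policy_for(tags: Dict[str, str], is_subdomain: bool) -> str:
    # sp= applies to subdomains and falls back to p= when absent.
    if is_subdomain:
        return tags.get("sp", tags["p"])
    return tags["p"]


if __name__ == "__main__":
    rec = parse_dmarc_record(
        "v=DMARC1; p=reject; sp=quarantine; rua=mailto:dmarc@example.com")
    print(rec["p"], policy_for(rec, True))  # reject quarantine
```

This is the “policy” the domain owner asks receivers to apply. Enabling the module means letting that request influence your own scoring.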

mid looks like a helper for when you know of domains with broken Message-Ids and DKIM signing; I personally don’t, so I’m disabling it. whitelist is for those who want to play it safe with important messages, but I still like to believe that a mail setup should be able to work without having to resort to such mechanisms.

Modules comparing message contents (or parts of the contents) against (DNS) blacklists

This category consists of the following modules:

Name	Description	Built-in?	Enabled by default?	Requires Redis?
dcc	Performs Distributed Checksum Clearinghouses (DCC) lookups to determine if a message is a bulk email	No	Requires config	No
emails	Extracts email addresses and compares them against static maps or DNS blacklists	No	Requires config	No
fuzzy_check	Applies fuzzy matching of patterns against messages, trying to catch variants of spammy word phrases	Yes	Yes	No
phishing	Attempts to detect phishy URLs in HTML text parts with optional OpenPhish support	No	Yes	No
surbl	Extracts URLs and domains from these URLs and compares them against (restricted use) blacklists such as surbl.org, uribl.com, rambler.ru, spamhaus.org and spameatingmonkey.net	Yes	Yes	No
url_reputation	Experimental module to extract URLs and assign reputations to their domains	No	No	Yes

dcc has license restrictions and requires an external daemon, which is why I think I can do without it. The same goes for the emails module, since I personally find it easier to focus on other aspects than on the email addresses contained in spam messages. fuzzy_check looks powerful but also complex: you need a separate worker and a separate learning step, so let’s see if we can do without it, too.
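
Here is the general idea behind “fuzzy” matching of message variants. Instead of comparing exact hashes, the text is cut into overlapping word n-grams (“shingles”). Two messages count as near-duplicates if their shingle sets overlap strongly. The Python sketch below computes plain Jaccard similarity over word 3-grams to illustrate that general technique. rspamd’s fuzzy_check and its storage worker use their own hashing and storage scheme. The n-gram size and any threshold you would apply are assumptions.

```python
import re
from typing import Set, Tuple


def shingles(text: str, n: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of overlapping word n-grams of a lower-cased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity of the two texts' shingle sets (0.0 .. 1.0)."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    spam1 = "Buy cheap watches now at our online store today"
    spam2 = "Buy cheap watches now at our web store today"
    print(similarity(spam1, spam2))  # 0.4: shares 4 of 10 distinct shingles
```

An exact hash would treat the two sample texts as completely different. The shingle comparison still sees a substantial overlap, and on longer messages a one-word change matters even less.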

The phishing module looks useful, so I’d keep it enabled. surbl and url_reputation both deal with URLs in messages, but the latter is (for good reasons) experimental, so I’ll keep it disabled. The former I’ll leave enabled, but I’ll have to watch it for some time to learn about its impact.
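
At its core, surbl extracts URLs from the message, reduces them to domains and looks those domains up in DNS blacklist zones. Below is a rough Python sketch of the extraction step. It makes two simplifying assumptions:

- URLs are found with a naive regular expression.
- The registered domain is approximated by the last two labels of the host name. Real implementations typically consult the Public Suffix List, so `example.co.uk` would be mishandled here.

The zone `multi.surbl.org` serves only as an example, and the function names are hypothetical.

```python
import re
from typing import List
from urllib.parse import urlsplit

# Naive URL matcher: http(s) scheme followed by non-space, non-quote, non-bracket chars.
URL_RE = re.compile(r"""https?://[^\s"'<>]+""", re.IGNORECASE)


def extract_domains(body: str) -> List[str]:
    """Return unique, lower-cased approximate domains of URLs in body, in order."""
    seen: List[str] = []
    for url in URL_RE.findall(body):
        host = urlsplit(url).hostname  # drops port/userinfo and lower-cases
        if not host:
            continue
        # Simplification: last two labels instead of a Public Suffix List lookup.
        domain = ".".join(host.split(".")[-2:])
        if domain not in seen:
            seen.append(domain)
    return seen


def uribl_query_names(body: str, zone: str = "multi.surbl.org") -> List[str]:
    # Domain-based lists are queried as <domain>.<zone>.
    return [f"{d}.{zone}" for d in extract_domains(body)]


if __name__ == "__main__":
    body = ('Click <a href="http://Login.Example.com:8080/x">here</a> '
            "or https://www.example.com/y and http://evil.test/z")
    print(uribl_query_names(body))
    # ['example.com.multi.surbl.org', 'evil.test.multi.surbl.org']
```

Reducing host names to domains matters because spammers rotate subdomains freely. It also explains why surbl needs some time to observe: a popular domain appearing in spam links can affect legitimate mail, too.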

Modules analyzing messages without DNS interaction

This category consists of the following modules:

Name	Description	Built-in?	Enabled by default?	Requires Redis?
antivirus	Passes the message to external virus scanners such as clamav	No	Yes, but requires config	No
chartable	Penalizes conspicuously frequent charset (e.g. Latin, Chinese) changes within words	Yes	Yes	No
maillist	Neutralizes certain rules if a message was sent by mailing list software	No	Yes	No
mime_types	Implements MIME type sanity checks and checks archive contents	No	Yes	No
neural_networks	Experimental module that implements a neural network for message classification	No	No	Yes
once_received	Implements checks for messages with a single “Received:” header	No	Yes	No
regexp	Implements regular expression matches against message headers and body	Yes	Yes	No
replies	Tracks replies to own messages via the “Message-Id” header	No	Yes	Yes
reputation	An experimental generic reputation module that has not been documented yet	No	No	No
trie	Searches for configurable strings in messages “blazingly fast” using the Aho–Corasick algorithm	No	Requires config	No

I used to run virus scanners on my MX all the time, but I can’t recall a single case where they actually prevented a virus mail from arriving, so I won’t use the antivirus module. I’d never have guessed that chartable could make an impact, but apparently it does, plus it’s cheap, so I’ll leave it enabled. maillist sounds like a wise choice, so I’ll leave it enabled as well. Unlike in Apache, where you mostly care about MIME types in order to define custom ones, the mime_types module merely performs consistency checks, and you want that.
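
The idea behind chartable is easy to illustrate. Words that mix scripts, such as Latin letters with look-alike Cyrillic ones, are rare in legitimate text but common in obfuscated spam. The Python sketch below counts script changes within words. It classifies each letter by the first word of its Unicode character name (e.g. LATIN, CYRILLIC). That crude heuristic and the absence of any scoring are assumptions of the sketch, not chartable’s actual logic.

```python
import re
import unicodedata
from typing import Optional


def script_of(ch: str) -> Optional[str]:
    """Crude script guess for a letter: first word of its Unicode name."""
    if not ch.isalpha():
        return None  # digits, punctuation etc. are ignored
    name = unicodedata.name(ch, "")
    return name.split(" ", 1)[0] if name else None


def script_changes(text: str) -> int:
    """Count script switches between consecutive letters inside each word."""
    changes = 0
    for word in re.findall(r"\w+", text):
        prev = None
        for ch in word:
            cur = script_of(ch)
            if cur is None:
                continue
            if prev is not None and cur != prev:
                changes += 1
            prev = cur
    return changes


if __name__ == "__main__":
    print(script_changes("Hello world"))   # 0
    print(script_changes("p\u0430ypal"))  # 2: Cyrillic "а" inside a Latin word
```

The check only needs a single pass over the text and no network lookups. That is consistent with calling chartable cheap.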

neural_networks sounds like “the hottest shit”, of course, but I don’t need the “hottest shit” adding more complexity, so I’ll try to live without it. once_received might have a small but positive impact and comes cheap, so keep it as well. The same goes for regexp, which implements essential core functionality.
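
once_received rests on a simple observation. Mail sent through a normal client and submission server usually carries several “Received:” headers. Mail injected directly by a bot often carries only the one added by your own MX. Python’s standard `email` package makes the counting part straightforward. The sketch below only counts; whatever further checks the module applies are not modeled. The helper names are hypothetical.

```python
from email import message_from_string
from email.message import Message


def received_count(raw: str) -> int:
    """Number of Received: headers in a raw RFC 5322 message."""
    msg: Message = message_from_string(raw)
    return len(msg.get_all("Received", []))


def has_single_received(raw: str) -> bool:
    # The condition once_received is named after.
    return received_count(raw) == 1


if __name__ == "__main__":
    raw = ("Received: from bot.example ([192.0.2.7]) by mx.example.org\n"
           "From: a@example.com\n"
           "Subject: hi\n"
           "\n"
           "body\n")
    print(received_count(raw), has_single_received(raw))  # 1 True
```

The signal is weak on its own, which matches the “small but positive impact” expected above. The check costs practically nothing.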

replies can help ensure that mail threads don’t get torn apart by rspamd filtering too heavily, so keep it, but probably increase the default expiry time, since replies don’t necessarily happen within 24 hours. reputation is not only experimental but, above all, undocumented, so keep it disabled.
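
The mechanism behind replies works roughly like this:

1. Remember the Message-Id of every outgoing message for a while.
2. When an incoming message’s In-Reply-To header matches a remembered ID, treat it as a reply to your own mail.

The Python sketch below keeps IDs in a dict with an expiry time, whereas rspamd keeps them in Redis. The 24-hour default mirrors the expiry mentioned above. The class and method names are hypothetical.

```python
import time
from typing import Callable, Dict, Optional


class ReplyTracker:
    """Remember outgoing Message-Ids and recognize replies to them (sketch)."""

    def __init__(self, expiry: float = 24 * 3600,
                 clock: Callable[[], float] = time.time) -> None:
        self.expiry = expiry
        self._clock = clock
        self._sent: Dict[str, float] = {}  # Message-Id -> expiry timestamp

    @staticmethod
    def _norm(msgid: str) -> str:
        # Compare IDs without surrounding whitespace and angle brackets.
        return msgid.strip().strip("<>")

    def record_outgoing(self, message_id: str) -> None:
        self._sent[self._norm(message_id)] = self._clock() + self.expiry

    def is_reply(self, in_reply_to: Optional[str]) -> bool:
        if not in_reply_to:
            return False
        key = self._norm(in_reply_to)
        expires_at = self._sent.get(key)
        if expires_at is None:
            return False
        if self._clock() >= expires_at:
            del self._sent[key]  # too old: forget it
            return False
        return True
```

With the default, a reply that arrives two days later is no longer recognized. Constructing the tracker with e.g. `expiry=7 * 24 * 3600` corresponds to the longer expiry suggested above.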

trie sounds totally superdope, except that I only understand half of it, which tells me I probably don’t need it.
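
For anyone in the same boat, here is the other half. The Aho–Corasick algorithm builds a trie of all search strings plus “failure links” that say where to continue after a mismatch. A single pass over the text then finds every occurrence of every pattern, however many patterns there are. The Python sketch below is the textbook construction, not rspamd’s own implementation.

```python
from collections import deque
from typing import Dict, List, Tuple


class AhoCorasick:
    """Textbook Aho-Corasick automaton for multi-pattern substring search."""

    def __init__(self, patterns: List[str]) -> None:
        self.goto: List[Dict[str, int]] = [{}]  # trie edges per state
        self.fail: List[int] = [0]              # failure link per state
        self.out: List[List[str]] = [[]]        # patterns recognized at state
        for pat in patterns:
            self._insert(pat)
        self._build_failure_links()

    def _insert(self, pat: str) -> None:
        state = 0
        for ch in pat:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(pat)

    def _build_failure_links(self) -> None:
        # Breadth-first: a state's failure link points to the longest proper
        # suffix of its string that is also a path in the trie.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit patterns that end at the failure target.
                self.out[nxt] += self.out[self.fail[nxt]]

    def search(self, text: str) -> List[Tuple[int, str]]:
        """Return (start_index, pattern) for every match in text."""
        matches: List[Tuple[int, str]] = []
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pat in self.out[state]:
                matches.append((i - len(pat) + 1, pat))
        return matches


if __name__ == "__main__":
    ac = AhoCorasick(["he", "she", "his", "hers"])
    print(ac.search("ushers"))  # [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The search cost grows with the length of the text plus the number of matches, not with the number of patterns. That is what makes this approach fast for large lists of configurable strings.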

Modules pushing metadata such as sender domains and scores to external databases

This category consists of the following modules:

Name	Description	Built-in?	Enabled by default?	Requires Redis?
clickhouse	Pushes metadata to a ClickHouse instance	No	Requires config	No
elastic	Pushes metadata to an Elasticsearch instance	No	Requires config	No
history_redis	Stores history data in a Redis database with optional compression and cluster support	No	Yes	Yes
metadata_exporter	Exports metadata via Redis Pub/Sub channels, HTTP POST URLs and SMTP messages	No	Requires config	No
metric_exporter	Exports metrics to an external monitoring/graphing system (currently only Graphite)	No	Requires config	No

All but one of these modules are only relevant if you want to perform advanced analysis of rspamd operations, so in our case we can keep them disabled. history_redis could come in handy, but unfortunately the docs say little about what difference enabling or disabling it makes in practice.

Modules that implement actions instead of tests

This category consists of the following modules:

Name	Description	Built-in?	Enabled by default?	Requires Redis?
dkim_signing	Not really an action, since it is not triggered by a score; intended as a simple alternative to the dkim module’s `sign_condition`	No	Requires config	No
force_actions	Forces an action if particular symbols are found or not found (independently from a message’s total spaminess score)	No	Requires config	No
greylisting	Implements greylisting (temporarily refusing messages so a legitimate sender has to retry later) as an action	No	Yes	Yes
milter_headers	Adds/removes message headers such as “x-spamd-bar” and “authentication-results”	No	Requires config	No

You will want dkim_signing unless your outgoing mail does not pass through rspamd. I haven’t found a use case for force_actions yet. We do enable greylisting, of course, and milter_headers is also handy, since it lets us inspect mail headers to see test results.
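
Classic greylisting works on a triplet: sender IP, envelope sender and envelope recipient. The first delivery attempt for an unknown triplet is temporarily rejected, and a retry after a minimum delay is accepted. The Python sketch below shows that classic triplet logic with an in-memory dict. The delay and expiry values are arbitrary. rspamd’s module differs in details: it is applied as an action and keeps its state in Redis. Treat this only as an illustration of the idea.

```python
import time
from typing import Callable, Dict, Tuple

Triplet = Tuple[str, str, str]  # (client IP, envelope sender, envelope recipient)


class Greylist:
    """Classic triplet greylisting (illustrative, in-memory)."""

    def __init__(self, delay: float = 300, expiry: float = 24 * 3600,
                 clock: Callable[[], float] = time.time) -> None:
        self.delay = delay    # minimum seconds before a retry is accepted
        self.expiry = expiry  # forget triplets first seen longer ago than this
        self._clock = clock
        self._first_seen: Dict[Triplet, float] = {}

    def check(self, ip: str, sender: str, rcpt: str) -> str:
        """Return 'accept' or 'tempfail' (an SMTP 4xx asking the client to retry)."""
        now = self._clock()
        # Simplification: addresses are compared case-insensitively.
        key = (ip, sender.lower(), rcpt.lower())
        first = self._first_seen.get(key)
        if first is None or now - first > self.expiry:
            self._first_seen[key] = now  # new (or expired) triplet
            return "tempfail"
        if now - first < self.delay:
            return "tempfail"  # retried too early
        return "accept"
```

A legitimate MTA queues the message after the 4xx response and retries later, so it gets through with a delay. Many spam bots never retry at all.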

Other modules

This category consists of the following modules:

Name	Description	Built-in?	Enabled by default?	Requires Redis?
bayes_expiry	Provides expiration of Bayes tokens	No	Requires config	Yes
multimap	Handles maps dynamically reloaded from files and URLs	No	Yes	No
rspamd_update	Loads new rspamd rules, symbol scores and actions without full daemon restart	No	Yes	No
spamassassin	Imports SpamAssassin rules	No	Requires config	No
url_redirector	A helper module for the surbl module to dereference redirects in URLs	No	Requires config	Yes
url_tags	Experimental module that caches URL tags (whatever that is) in Redis	No	No	Yes

bayes_expiry still confuses me, as Bayes classification itself is not part of this module, nor does it seem to be implemented by any other module; it lives inside rspamd itself. I’m not sure we really need the expiry part. Keep multimap and rspamd_update enabled; they are useful helper modules. spamassassin is only needed if you really want to migrate an existing SpamAssassin setup, which I’d try to avoid.
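
The “dynamically reloaded” maps that multimap handles follow a simple pattern. A list of entries (IPs, domains and so on) lives in a file, and the file is re-read when it changes, without restarting the daemon. The minimal Python sketch below re-reads a file-backed set whenever the file’s modification time changes. The one-entry-per-line format with `#` comments and the class name `FileMap` are assumptions of the sketch. rspamd’s own map handling, including URL-backed maps, is more elaborate.

```python
import os
from typing import Optional, Set


class FileMap:
    """Set of entries loaded from a file and reloaded when its mtime changes."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._mtime: Optional[float] = None
        self._entries: Set[str] = set()

    def _maybe_reload(self) -> None:
        mtime = os.stat(self.path).st_mtime
        if mtime == self._mtime:
            return  # unchanged since the last load
        entries: Set[str] = set()
        with open(self.path, encoding="utf-8") as fh:
            for line in fh:
                line = line.split("#", 1)[0].strip()  # drop comments and blanks
                if line:
                    entries.add(line.lower())
        self._entries, self._mtime = entries, mtime

    def __contains__(self, item: str) -> bool:
        self._maybe_reload()
        return item.lower() in self._entries
```

Checking the modification time on every lookup keeps the sketch short. A daemon would more likely check periodically or watch the file for changes.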

url_redirector is used by surbl, so keep it. Last but not least, url_tags being experimental means we probably don’t need it.
