An alternative introduction to rspamd configuration: Introduction (1/4)

rspamd is a mighty spam filtering solution, but it can be hard to get a grip on its configuration. For this reason I’m starting a short series of blog posts to write down a few things I have learned. Everything here is current for rspamd version 1.7.x.

When I learn about a new piece of software as an admin, there are two important questions that I like to be able to answer, at least roughly, before I can say “I know the software”:

  1. “What is there to configure?” and
  2. “How do I configure things?”

If we take the Apache webserver as an example, the “what” could be answered with “listening sockets, virtual hosts, directory permissions, authentication configurations” and a whole lot of other things. The “how” could be (partially) answered with “files beneath /etc/apache2”: you can use a single monolithic httpd.conf and put everything in there, but most distributions use a setup where individual aspects are split out into separate files such as listen.conf.

For rspamd, the question of what there is to configure can be answered as follows:

  1. which modules to activate. Modules are units of code that analyze messages and implement certain tests on them or actions that act upon them. Each test is associated with a symbol, i.e. a short string (uppercase by convention) that will be included in the report explaining a message’s total “spaminess” rating.

    Example: There is a test that compares the sender address in a message’s envelope with the one in the message’s “From:” header. If these differ, the sender address is assumed to be forged and the symbol FORGED_SENDER is set.

    rspamd features a core set of six modules that are written in C and compiled into the rspamd binaries. The majority of the modules, however, are written in Lua, a scripting language especially suited for extending programs. We will discuss these modules in depth later.

    By the way, there is one exception where functionality appears to have been put into rspamd itself rather than into a module: the Bayes classifier.

  2. module-specific configuration such as feature toggles and paths.

    Example: the antivirus module needs to know which virus scanners it should use and how it can use them.

  3. the weights or scores. A weight (or score) is a floating-point value configured for a symbol. It is added to (or subtracted from) a message’s total spaminess rating if the associated test evaluates to true.

    Example: In the stock rspamd configuration, the symbol FORGED_SENDER is assigned a weight of 0.30. This means that if the sender addresses in the mail envelope and in the “From:” header differ, this fact alone contributes a penalty of 0.30 to the message’s “spaminess” rating.

  4. the actions to perform when certain scores are reached.

    Example: The stock configuration applies greylisting to a message if a score of 4.0 is reached, adds X-Spam-* headers to the message at a score of 6.0 and rejects the message outright at a score of 15.0.

  5. the workers to use and their configuration. rspamd operation is split into several cooperating processes, the workers. Each worker needs to be configured to listen on a different socket, e.g. a different TCP port.

  6. last but not least, general options such as timeouts for DNS operations.
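
To make the interplay of symbols, weights and actions from points 1, 3 and 4 a bit more tangible, here is a small Python sketch. It is a toy model of the idea only, not rspamd’s actual code (which is C and Lua). The function names, the sample addresses and the action labels are my own inventions; only the FORGED_SENDER weight of 0.30 and the thresholds 4.0, 6.0 and 15.0 are taken from the examples above.

```python
from email.utils import parseaddr

# Weight per symbol. Only the 0.30 for FORGED_SENDER comes from the text above.
WEIGHTS = {"FORGED_SENDER": 0.30}

# Score thresholds from the stock configuration example, highest first.
# The action labels are illustrative, not necessarily rspamd's own names.
ACTIONS = [(15.0, "reject"), (6.0, "add header"), (4.0, "greylist")]


def forged_sender(envelope_from, header_from):
    """Toy test: does the envelope sender differ from the From: header address?"""
    _, header_addr = parseaddr(header_from)
    return envelope_from.strip().lower() != header_addr.strip().lower()


def total_score(symbols, weights=WEIGHTS):
    """Sum the weights of all symbols that were set; unknown symbols count as 0."""
    return sum(weights.get(symbol, 0.0) for symbol in symbols)


def pick_action(score, actions=ACTIONS):
    """Return the action with the highest threshold that the score reaches."""
    for threshold, name in actions:
        if score >= threshold:
            return name
    return "no action"


# A message whose envelope sender and From: header disagree.
symbols = []
if forged_sender("bounce@mailer.example", "Alice <alice@example.org>"):
    symbols.append("FORGED_SENDER")

score = total_score(symbols)
print(symbols, score, pick_action(score))
```

With only FORGED_SENDER set, the total stays far below the first threshold, so nothing happens to the message. A single low-weight symbol never decides a message’s fate on its own; it takes the sum of many symbols to reach an action.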

So while the last points are more or less technical details, you should have learned by now that the choice of modules and the choice of scores are key. rspamd ships with a default config that has almost all modules enabled, but only some of them have actually been assigned a non-zero score, i.e. only some have an effect. rspamd also lets you update the default config, or “rule set”, dynamically from upstream rspamd servers without having to restart or even upgrade rspamd itself, so you can benefit from collectively improved rules as known from SpamAssassin.

Yet as a “real” admin you’ll probably want to know the fussy details and that’s what this blog post series will focus on.


Blog post series index: