Protect Mastodon against AI bots with Anubis
This English article is also available in French here.
Introduction
In a previous article in French, I explained my motivations and choices to protect Mastodon (and other Fediverse applications) against AI bots. This new article, targeting administrators, details the technical steps to achieve that; it is also available in French here.
For reference, my objectives are to:
- Protect our users against systematic bulk scraping of their content;
- Protect our servers against massive AI bot attacks;
- Maintain public visibility of our servers, as it is to me a core purpose of social networks, each user retaining control over the visibility of published content;
- Not use a dedicated online service such as CloudFlare, as one of my values is to protect our users’ privacy, and I consider that depending on an external solution which has full traffic visibility, including credentials in clear text and browsing content, is not acceptable;
- Go beyond “standard” filtering and target AI bots whatever their IP address, even when they pose as human users (that is, pretend to use a browser).
I therefore chose to install Anubis, an open-source application (available here under the MIT license) which I self-host so as to be autonomous and to guarantee security and confidentiality of browsing.
Challenges ahead
The official Anubis website is well designed, and details how to install the application and protect a standard website. However, for Mastodon and more generally for Fediverse applications, I made some choices and had to solve some specific pitfalls, namely:
- One common Anubis instance to protect all applications hosted on the same server;
- Configured using subrequest authentication;
- Return a 200 code and a fake page to rejected bots (rather than a 403 code);
- Deal with the streaming API and potentially legitimate IP address changes;
- Deal with application authentication using OAuth;
- Deal with embedded media;
- Operate in failsafe mode (if Anubis is down, services remain operational);
- Metrics showing rejection rates of clients.
I will describe all these aspects, using an Ubuntu 24.04 LTS Linux server with Nginx and a native installation (no Docker). I leave it to you to adapt these instructions to your own configuration.
Installing Anubis
Please refer to the official site’s documentation for installing Anubis on your server. We will be using the subrequest authentication configuration.
In this mode, Nginx remains internet-facing and queries Anubis to authorize each request. Anubis returns 200 if the client is authorized or already holds an authorization cookie, 403 if it is rejected, and 401 if it requires verification. In that last case, the request is redirected (307) to the Anubis page, where the client must complete the challenge to obtain an authorization cookie; on success, it finally reaches the target page.
As described on the official site, you must modify the Anubis policy so that DENY returns a 403 code instead of the default 200, so that Nginx interprets the result correctly:
status_codes:
  CHALLENGE: 200
  DENY: 403
We shall however improve this part, as some AI bots may keep trying when receiving a 403 code.
Configuring Anubis
Regarding Anubis configuration, the first part is about environment variables and the second about policies.
Environment variables
Here is an extract of the environment variables I use; the aspects that matter for our subject are described just after:
BIND=/run/anubis/anubis.sock
BIND_NETWORK=unix
SOCKET_MODE=0660
COOKIE_EXPIRATION_TIME=168h
JWT_RESTRICTION_HEADER="User-Agent"
METRICS_BIND=:8240
METRICS_BIND_NETWORK=tcp
POLICY_FNAME=/usr/local/etc/botPolicies.yaml
REDIRECT_DOMAINS="example.net,*.example.net"
SERVE_ROBOTS_TXT=false
TARGET=" "
The first three lines define the Anubis listening endpoint. I elected to use a Unix socket; you could also use a TCP port, in which case you will need to adapt the Nginx configurations that follow. Make sure the www-data user can read and write the socket, which connecting to it requires (with SOCKET_MODE=0660, www-data must belong to the group that runs Anubis).
Cookie expiration is set to one week. You could reduce this duration, but every time the cookie expires, the Mastodon timeline flow breaks in the browser and the user has to refresh the page. So you need to find the right balance between protection and user experience.
For the same reason, JWT_RESTRICTION_HEADER is tied to the User-Agent rather than the user IP address (default). The issue with the IP address is that it may dynamically change when browsing on a mobile device or when using some VPN networks, in which case the Mastodon timeline breaks, sometimes only after a few minutes, which makes the service unusable. Thus we need to adapt the protection level by ignoring the IP address.
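To illustrate why the choice of restriction header matters, here is a minimal Python sketch of a token bound to one request attribute. It is not Anubis's implementation: the HMAC construction, the `SECRET` key and the function names are assumptions made purely for illustration.

```python
import hashlib
import hmac

# Hypothetical signing key, for illustration only.
SECRET = b"illustration-only"


def issue_token(bound_value: str) -> str:
    """Sign the value the token is tied to (an IP address or a User-Agent)."""
    return hmac.new(SECRET, bound_value.encode(), hashlib.sha256).hexdigest()


def token_is_valid(token: str, current_value: str) -> bool:
    """The token only verifies while the bound value stays the same."""
    return hmac.compare_digest(token, issue_token(current_value))


# Bound to the IP address: a mobile client that switches networks is rejected.
ip_token = issue_token("203.0.113.7")
print(token_is_valid(ip_token, "198.51.100.42"))  # False

# Bound to the User-Agent: the same browser keeps its token across networks.
ua = "Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0"
ua_token = issue_token(ua)
print(token_is_valid(ua_token, ua))  # True
```

The trade-off is visible here: a token tied to the User-Agent survives network changes, but it no longer distinguishes two clients that send the same User-Agent string.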
The next four variables cover observability with Prometheus (discussed further down), the policy file, and the authorized redirect domains, which you should set to the domains of the applications you protect.
The SERVE_ROBOTS_TXT variable is not meaningful in the Anubis mode we use; instead, we will serve our own robots.txt file directly from Nginx.
Finally, the value TARGET=" " is essential for subrequest authentication mode to work.
Policies
I use only one Anubis instance to protect all my applications, so policies must be adapted to all of those, which in practice is no issue.
The critical point is the one mentioned previously about status_codes. Otherwise, default policies make perfect sense, possibly adapted to your liking. Remember that when no policy matches the request, it will be accepted by default: so federation between servers, for example, will not be an issue.
You might however want to explicitly accept some flows, such as federation or RSS feeds:
- name: federation
  action: ALLOW
  user_agent_regex: >-
    ^(Mastodon|Pleroma|Akkoma|Misskey|Calckey|Firefish|gotosocial|Friendica|Hubzilla|Lemmy|Kbin|PeerTube|Pixelfed|BookWyrm|WriteFreely|Mobilizon|Funkwhale|Sharkey|Loops|AodeRelay)
  expression:
    all:
      - '"Accept" in headers'
      - 'headers["Accept"].matches("^application/(activity\\+json|jrd\\+json|ld\\+json)(\\s*;.*)?$")'
- name: rss-feeds
  action: ALLOW
  expression:
    all:
      - '"Accept" in headers'
      - 'headers["Accept"].matches("^application/(rss|atom|rdf)\\+xml(\\s*;.*)?$")'
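Before deploying rules like these, it can help to try the Accept patterns against a few header values. The sketch below does so with Python's `re` module, assuming that these simple patterns behave the same way in the policy engine's regex dialect; the sample header values are made up for illustration.

```python
import re

# Patterns copied from the two rules above, with the YAML/CEL escaping removed.
FEDERATION_ACCEPT = re.compile(
    r"^application/(activity\+json|jrd\+json|ld\+json)(\s*;.*)?$"
)
FEED_ACCEPT = re.compile(r"^application/(rss|atom|rdf)\+xml(\s*;.*)?$")

# Illustrative Accept header values.
samples = [
    "application/activity+json",
    'application/ld+json; profile="https://www.w3.org/ns/activitystreams"',
    "application/activity+json, application/ld+json",
    "application/rss+xml",
    "text/html",
]

for accept in samples:
    federation = bool(FEDERATION_ACCEPT.search(accept))
    feed = bool(FEED_ACCEPT.search(accept))
    print(f"{accept!r}: federation={federation} feed={feed}")
```

Because both patterns are anchored, they only match a single media type (optionally followed by parameters). A comma-separated Accept list, such as the third sample, does not match and simply falls through to the remaining policies.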
Let’s just remember that Anubis’ objective is not to block all bots but specifically those which want to mass-scrape your internet pages. For malicious bots, I also use other tools such as iptables and ipset to block toxic IP addresses and fail2ban for brute-force attacks (but that’s a different story).
Configuring Nginx
Now that our Anubis service is ready, let’s move to Nginx, which remains the public, internet-facing server. In the file relating to Mastodon (or any other application to protect), for example /etc/nginx/sites-available/mastodon, we shall add the following directives.
General configuration
Here are the lines to add, slightly adapting the official documentation:
location ^~ /.within.website/ {
    proxy_pass http://unix:/run/anubis/anubis.sock:;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $http_host;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-Forwarded-Host $http_host;
    proxy_set_header X-Original-URI $request_uri;
    proxy_set_header Accept $http_accept;
    proxy_set_header User-Agent $http_user_agent;
    add_header Cache-Control "no-store";
    proxy_pass_request_body off;
    proxy_set_header Content-Length "";
    auth_request off;
    proxy_cache off;
    proxy_connect_timeout 1s;
    proxy_read_timeout 1s;
    proxy_send_timeout 1s;
}
location @redirectToAnubis {
    return 307 /.within.website/?redir=$scheme://$host$request_uri;
    auth_request off;
    proxy_cache off;
}
Note that we explicitly instruct not to use a cache when directing the request to Anubis (which would impair verification) and that we reduce the timeout so failsafe configuration can work acceptably from a user’s standpoint.
Optional: serve fake pages to AI bots
You can reproduce the behavior Anubis has in proxy mode and avoid returning a 403 code to rejected bots (as they may retry again and again). There are three possible options.
Basic option: static page
You can serve a static text whilst returning a 200 code by adding the following lines to the same file:
location @fake200 {
    default_type text/html;
    return 200 "<h2>Service not available</h2>\n";
    auth_request off;
}
Advanced option: dynamic page
You can also serve dynamic pages generated on the fly with Lua scripting. Why do that? Because it allows for variable content that looks like real HTML pages (content that differs between bots but stays stable for each one, varying jitter, realistic content). The objective here is that the AI bot finds content and concludes: “nothing interesting here”.
To achieve that, add the following lines to the same file instead (replacing the previous option):
location @fake200 {
    access_by_lua_file /var/www/html/fake_200.lua;
    auth_request off;
}
For this to work, two conditions must be met: the Lua module must be installed with Nginx (it is not by default on Ubuntu 24.04 LTS), and a file named fake_200.lua must be present under /var/www/html/ (or any other directory of your choice). To install the module on Ubuntu:
$ sudo apt install libnginx-mod-http-lua
If you are interested in my complete fake_200.lua file, please contact me. Its skeleton looks like this:
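local ngx = ngx
--
-- Logic: cache, fingerprinting, random content generation, noise, padding, jitter
-- (elided here; the placeholder body below only stands in for the generated page)
--
local body = "<html><body><p>Nothing to see here.</p></body></html>"
ngx.say(body)
return ngx.exit(ngx.HTTP_OK)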
Nuclear option: zip bomb
It is possible to go even further, and rather than serve deceptive content to the bot, serve it really toxic content. To achieve this, there are some simple techniques such as “zip bombs” which will be very costly resource-wise for the robot, or even trigger a failure.
You can easily find examples on the internet and adapt the previous configurations. As for myself, I am reluctant to use these techniques, but it’s up to you…
Nonetheless, here is an example implementation (inspired by Ibrahim Diallo’s blog) that serves a 10 MB file which decompresses to 10 GB, usually enough to crash unsuspecting bots. Generate the file:
$ dd if=/dev/zero bs=1G count=10 | gzip -c > /var/www/html/10GB.gz
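The size ratio quoted above can be checked on a much smaller sample. This Python sketch compresses 10 MiB of zeros with the standard `gzip` module; the sample size is an arbitrary choice for a quick test, not part of the original recipe.

```python
import gzip

# 10 MiB of zeros stands in for the 10 GB stream produced by dd.
raw = bytes(10 * 1024 * 1024)
compressed = gzip.compress(raw)

ratio = len(raw) / len(compressed)
print(f"{len(raw)} bytes -> {len(compressed)} bytes (about {ratio:.0f}:1)")
```

The same mechanism explains why the file stays small on disk and in transit, while a client that inflates the response has to handle the full uncompressed size.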
And now, add the following lines to the same Nginx configuration file (replacing previous options):
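location @fake200 {
    root /var/www/html;
    add_header Content-Encoding gzip;
    # Serve the file itself: "return 200 /10GB.gz" would only send that path as the body
    try_files /10GB.gz =404;
    auth_request off;
}
location = /10GB.gz {
    root /var/www/html;
    auth_request off;
}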
Optional: failsafe mode
With Anubis virtually sitting in front of all my applications, a failure would make them all unavailable at once. Even though that has never happened so far, I prefer that users can keep using my applications should an issue occur with Anubis, until it is resolved. You simply need to add the following lines to the same file:
location @allowOnFailure {
    try_files $uri @mastodon;
    auth_request off;
}
The line above auth_request off; (here try_files $uri @mastodon;) must be adapted to what is served by default on the protected paths (only the root / for Mastodon).
Note that, contrary to the official Mastodon documentation, I don’t add add_header Strict-Transport-Security "max-age=63072000; includeSubDomains"; to each location block, as I set this header at the server level using always:
server {
    ...
    add_header Strict-Transport-Security "max-age=63072000; includeSubDomains" always;
    ...
}
OAuth flow
The OAuth authorization flow breaks for apps that don’t use a browser and don’t follow a 307 redirect (Tusky, for example). To allow this type of flow, you must explicitly bypass Anubis by adding the following to the Nginx configuration file for Mastodon (to be adapted for other Fediverse applications):
location ^~ /oauth/ {
    try_files $uri @mastodon;
    auth_request off;
}
location ^~ /api/v1/apps {
    try_files $uri @mastodon;
    auth_request off;
}
location ^~ /auth/sign_in {
    try_files $uri @mastodon;
    auth_request off;
}
It may seem counter-intuitive to bypass Anubis for the path /auth/sign_in, but once again, Anubis is not a protection against brute-force attacks; use fail2ban or an equivalent for that.
Embedded media
Embedded media are also an issue as recent browsers do not allow iframe-type embedded pages to store their cookies. Thus if you want to embed posts (e.g. Mastodon) or media (e.g. PeerTube), you must also bypass Anubis. Here are the lines you should add in Nginx for Mastodon:
location ~ ^/@[^/]+/\d+/embed$ {
    try_files $uri @mastodon;
    auth_request off;
}
location ~ ^/api/v1/statuses/\d+$ {
    try_files $uri @mastodon;
    auth_request off;
}
location = /embed.js {
    try_files $uri @mastodon;
    auth_request off;
}
This functionality is less used on Mastodon so you could choose to do without, but it remains necessary for PeerTube. Without those lines, the embedded window will display the Anubis “unauthorized” page.
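To check which paths the embed-related locations above actually cover, the two regular expressions can be tried against sample URIs. This is a sketch with Python's `re` module, assuming it agrees with Nginx's regex engine on these simple patterns; the account name and status ID are invented.

```python
import re

# Same expressions as in the two regex locations above.
EMBED_PAGE = re.compile(r"^/@[^/]+/\d+/embed$")
STATUS_API = re.compile(r"^/api/v1/statuses/\d+$")


def bypasses_anubis(path: str) -> bool:
    """Return True if the path falls in one of the embed-related locations."""
    return (
        path == "/embed.js"
        or EMBED_PAGE.search(path) is not None
        or STATUS_API.search(path) is not None
    )


# Illustrative paths only.
for path in [
    "/@alice/112233445566778899/embed",
    "/api/v1/statuses/112233445566778899",
    "/api/v1/statuses/112233445566778899/context",
    "/@alice/112233445566778899",
    "/embed.js",
]:
    print(path, bypasses_anubis(path))
```

Only the embed page, the single-status API call and embed.js are exempted; the regular post page and deeper API paths remain behind Anubis.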
Main part
On each path to protect, we can now add the following lines in the appropriate Nginx location. For Mastodon, this only needs to be done on the root directory /:
location / {
    auth_request /.within.website/x/cmd/anubis/api/check;
    auth_request_set $anubis_status $upstream_status;
    error_page 401 = @redirectToAnubis;
    error_page 403 = @fake200;
    error_page 500 502 503 504 = @allowOnFailure;
    try_files $uri @mastodon;
}
Note the auth_request_set line, which is used for observability: it makes the Anubis status available to the Nginx logs, as we will see just after. If you are not interested, you can omit this line.
For all other location directives, we add auth_request off; to avoid verification. This usually covers static assets and other pages which need not or should not be protected (internal proxying, streaming…).
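As a summary of the location / block, the small Python function below maps the status of the authorization subrequest to the branch Nginx takes. It is only a reading aid for the configuration in this article, not code that runs anywhere in the setup; the function name is made up.

```python
def nginx_outcome(auth_status: int) -> str:
    """Map the status of the auth subrequest to the branch taken in `location /`."""
    if 200 <= auth_status < 300:
        return "try_files $uri @mastodon"  # access allowed
    if auth_status == 401:
        return "@redirectToAnubis"  # client must solve the challenge
    if auth_status == 403:
        return "@fake200"  # denied bot gets the decoy page
    # Any other subrequest result is treated by auth_request as an error (500),
    # which error_page hands to the failsafe location.
    return "@allowOnFailure"


for status in (200, 401, 403, 502):
    print(status, "->", nginx_outcome(status))
```

The last branch is what makes the failsafe work: when Anubis is down or times out, the subrequest fails and the request is served as if there were no check.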
The robots.txt file
Finally, for consistency, and even though many AI bots disregard it, you can serve your own robots.txt file in place of the minimalistic one served by Mastodon (or any other Fediverse application). Here is an example of lines to add in Nginx:
location = /robots.txt {
    auth_request off;
    root /var/www/html;
    default_type text/plain;
    access_log off;
    log_not_found off;
}
I put my own robots.txt file to be consistent with Anubis policies. I can provide it on request.
Let’s go!
Everything is now ready. Of course, if you are using S3 object storage behind an Nginx reverse proxy, do not modify anything in its configuration file: Anubis should not be used there.
If not done already, after setting everything up:
$ sudo systemctl restart anubis
$ sudo systemctl reload nginx
You should now see the Anubis verification page on your first visit to the site, and then you will be fine for the duration you configured (one week if you followed this guide exactly).
After the cookie expires, the Mastodon timeline flow in a browser will break: if the browser is still open, Mastodon displays an error message inviting the user to refresh the page, which resolves the situation. It’s a minor hindrance that I could not get rid of, but it seems acceptable. Of course, this won’t happen in client apps.
This configuration has worked fine for me for several months, with no visible impact on performance, with Anubis protecting five Fediverse applications on each of my two main servers. The toll taken by AI bots has significantly decreased, and the share of unresolved challenges varies from 70% to 95% depending on the period, which hints at the large proportion of AI bots pretending to be legitimate users!
Bonus: observability
If you want to add some metrics to evaluate efficiency, you can use two complementary sources.
Prometheus
If you configured Anubis as described, you will find on port 8240 Prometheus metrics you may use:
curl http://127.0.0.1:8240/metrics
As for myself, I export those using Alloy to a Prometheus / Grafana server. Contact me if you want more details.
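If you only want a quick look at the figures without a full Prometheus stack, the text returned by the endpoint is easy to parse. The sketch below handles the plain-text exposition format in a simplified way (no timestamps); the metric names in `EXAMPLE` are placeholders, not the names Anubis actually exports.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition lines into {"name{labels}": value}."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        # The value is the last whitespace-separated field
        # (this sketch ignores optional timestamps).
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


# Hypothetical payload: these metric names are placeholders, not Anubis's.
EXAMPLE = """\
# HELP example_challenges_total Challenges issued (placeholder name)
# TYPE example_challenges_total counter
example_challenges_total 1200
example_results_total{action="ALLOW"} 340
example_results_total{action="DENY"} 25
"""

for series, value in parse_metrics(EXAMPLE).items():
    print(series, value)
```

Against a live instance, the text could come from `urllib.request.urlopen("http://127.0.0.1:8240/metrics").read().decode()` instead of the `EXAMPLE` string.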
Nginx
By using the $anubis_status variable configured above, you can record in the logs the “real” status returned by Anubis for each Nginx request (200, 401, 403…) and enrich statistics (e.g. by country).
As for myself, I upload filtered logs using Alloy to the same server with Loki / Grafana. Again, contact me for more details.
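For a rough idea of the proportions without Loki, the logged status can also be counted with a few lines of Python. This sketch assumes a hypothetical log_format that ends each line with `anubis=$anubis_status`, and that requests which skipped the check are logged with `-`; the sample lines are made up.

```python
import re
from collections import Counter

# Assumes a hypothetical log_format that ends each line with `anubis=$anubis_status`.
ANUBIS_FIELD = re.compile(r"anubis=(\d{3}|-)\s*$")


def count_statuses(lines: list[str]) -> Counter:
    """Count Anubis statuses; "-" stands for requests that skipped the check."""
    counts: Counter = Counter()
    for line in lines:
        match = ANUBIS_FIELD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def share(counts: Counter, status: str) -> float:
    """Fraction of checked requests (status other than "-") with this status."""
    checked = sum(n for key, n in counts.items() if key != "-")
    return counts[status] / checked if checked else 0.0


# Made-up log lines, for illustration only.
SAMPLE = [
    '203.0.113.7 "GET /@alice HTTP/2.0" 200 anubis=200',
    '198.51.100.9 "GET /explore HTTP/1.1" 307 anubis=401',
    '198.51.100.9 "GET /about HTTP/1.1" 200 anubis=403',
    '203.0.113.7 "GET /packs/app.js HTTP/2.0" 200 anubis=-',
]

counts = count_statuses(SAMPLE)
print(dict(counts))
print(f"challenged: {share(counts, '401'):.0%}")
```

Note that the third sample line shows a 200 in the access log while Anubis answered 403: this is exactly the case where the fake page is served, and why the extra field is needed to see the real decision.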
Result
For an example of what you can expect, you can browse our Mastodon server status page here. Corresponding statistics are on the last section.
For the other Fediverse applications
The same logic applies to other applications, but configurations and paths in Nginx differ and require minor adaptations. Don’t hesitate to contact me if you want all details.
I successfully use Anubis to protect Mastodon, Pixelfed, PeerTube, Lemmy, Friendica, WriteFreely, Loops, Mobilizon, BookWyrm, Funkwhale, as well as FreshRSS and Movim blogs.
I welcome any feedback and suggestions. And down with the AI bots!
Techno-Fil et faits divers

The blog of a computer scientist who runs multiple Fediverse social networks, a system administrator versed in IT security and privacy protection.
I will publish articles about computing, security, and personal data protection, aiming to make them as accessible as possible, mostly in French but not exclusively.
#infosec #security #privacy #dataprivacy #opensource #sysadmin #linux #fediverse #hardware #watercooling