Setting Up & Testing Spotlight + Samba + Elasticsearch

Samba can use Elasticsearch to give macOS Spotlight server-side search on SMB shares. It's annoying to set up and debug, and this blog post aims to document everything I've learned.

Versions

It's critical to understand that Samba is only compatible with a narrow range of Elasticsearch versions. I'm not sure why this is, but in my experience Samba 4.23.1 is:

  • not compatible with Elasticsearch 9.x
  • compatible with Elasticsearch 8.8.2

The main symptom of a version incompatibility is that the server returns no search results. You'll see this when testing with mdsearch on the server side (see Testing & Debugging below).

You'll probably want a dedicated Elasticsearch server for Samba's use, since this and the other Samba-related limitations described below reduce the flexibility you'd want if the same server also supported other applications.

Elasticsearch Configuration

Please see the deploy-example/elasticsearch directory in my samba-docker repo for example Elasticsearch configuration and a docker-compose file.

Note that to make things easier, I've disabled HTTPS on this Elasticsearch setup. This is fine for me since it's only accessible over Tailscale; I recommend you do something similar. Otherwise you'll need to set up Samba to trust your Elasticsearch TLS certificate somehow (this might be easy if you let Tailscale handle certificate issuance; otherwise 🤷‍♂️).
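
To give a sense of the shape, here's a rough sketch of a single-node Elasticsearch service with HTTP TLS turned off (this is not the repo's actual compose file; the password, port mapping, and volume name are placeholders):

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.8.2
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=true
      # HTTP without TLS; tolerable here because the host is only reachable over Tailscale
      - xpack.security.http.ssl.enabled=false
      - ELASTIC_PASSWORD=my_elastic_password
    ports:
      - "9200:9200"
    volumes:
      - esdata:/usr/share/elasticsearch/data

volumes:
  esdata: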

Importantly, note that Samba supports neither basic auth nor API-key authentication for Elasticsearch. This means you need to allow anonymous ES users access to the relevant indexes, which is done by assigning anonymous users a role and then granting that role read & monitor access on the relevant ES indexes.
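
As a sketch of what that can look like (the role name is arbitrary and the index pattern is a placeholder; match it to your FSCrawler index names), anonymous requests are mapped to a role in elasticsearch.yml, and the role is then created with read & monitor privileges:

# elasticsearch.yml: hand unauthenticated requests a role
xpack.security.authc.anonymous.username: anonymous
xpack.security.authc.anonymous.roles: spotlight-anonymous
xpack.security.authc.anonymous.authz_exception: true

# Create the role (run once, as the elastic user)
curl -XPUT \
    -u "elastic:my_elastic_password" \
    http://es.tailnet-XXYY.ts.net:9200/_security/role/spotlight-anonymous \
    -H 'Content-Type: application/json' \
    -d '{ "indices": [ { "names": ["fscrawler-*"], "privileges": ["read", "monitor"] } ] }'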

Running a Recent Samba Version

Running a recent Samba version is the easiest part of this setup by far. This is accomplished using my samba-docker repository; see the README for setup instructions and the deploy-example directory for a docker-compose stack including supporting files.
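
For orientation only, the general shape of such a compose service looks something like this; it is not the actual file from the repo, and the image name and paths are placeholders (follow the samba-docker README for the real thing):

services:
  samba:
    image: my-samba-image:latest             # placeholder; use the image described in the samba-docker README
    network_mode: host                       # SMB is simplest to run on the host network
    volumes:
      - ./smb.conf:/etc/samba/smb.conf:ro    # the smb.conf configured below
      - /srv/my-share:/srv/my-share          # placeholder path to the data being shared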

FSCrawler Setup

The samba-docker repository also includes an example FSCrawler setup, complete with docker-compose file and FSCrawler configs. Refer to the FSCrawler docs for more information.

You'll need an API key to allow FSCrawler to authenticate to Elasticsearch. I created an API key for the elastic user for this purpose using the Kibana UI (available by default on port 5601 on the same host as Elasticsearch).
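
If you'd rather skip Kibana, the same key can be created with the Elasticsearch security API; a sketch (the key name is arbitrary, and the response contains the generated key for FSCrawler to use):

curl -XPOST \
    -u "elastic:my_elastic_password" \
    http://es.tailnet-XXYY.ts.net:9200/_security/api_key \
    -H 'Content-Type: application/json' \
    -d '{ "name": "fscrawler" }'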

This was easy enough to do but is not a secure setup. A future update to this setup will add a dedicated fscrawler Elasticsearch user with just the permissions needed to monitor & update the relevant ES indexes.
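
A sketch of what that future setup might look like; I haven't verified the minimal privilege set, so treat the privileges below as a guess and check the FSCrawler docs (role, user, and index names are placeholders):

# Role limited to the fscrawler indexes (privilege list is a guess, not verified)
curl -XPUT \
    -u "elastic:my_elastic_password" \
    http://es.tailnet-XXYY.ts.net:9200/_security/role/fscrawler-writer \
    -H 'Content-Type: application/json' \
    -d '{ "cluster": ["monitor"], "indices": [ { "names": ["fscrawler-*"], "privileges": ["create_index", "read", "write", "manage"] } ] }'

# Dedicated user that holds only that role
curl -XPUT \
    -u "elastic:my_elastic_password" \
    http://es.tailnet-XXYY.ts.net:9200/_security/user/fscrawler \
    -H 'Content-Type: application/json' \
    -d '{ "password": "a_strong_password", "roles": ["fscrawler-writer"] }'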

Samba Configuration

Set up Elasticsearch in the [global] section of your smb.conf:

   spotlight backend = elasticsearch
   elasticsearch:address = es.tailnet-XXYY.ts.net
   elasticsearch:port = 9200
   elasticsearch:max results = 50
   elasticsearch:ignore unknown attribute = yes
   elasticsearch:ignore unknown type = yes

And then, in smb.conf, configure each share indexed by FSCrawler:

   spotlight = yes
   elasticsearch:index = fscrawler-index-name-for-this-share

See the example smb.conf in my samba-docker repo for a complete example.
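
For illustration, a full share section might end up looking like this (the share name, path, and index name are placeholders):

   [my-share]
       path = /srv/my-share
       read only = no
       spotlight = yes
       elasticsearch:index = fscrawler-my-share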

Testing & Debugging

Testing and debugging this setup is by far the most time-consuming and annoying part of the process. It will (unfortunately) require you to know how to use the tools involved here — Docker, Docker Compose, Linux, and Elasticsearch — and I can't possibly provide a comprehensive guide.

However, the basic outline is this:

  1. Verify FSCrawler is writing results into ES (check FSCrawler logs and the Kibana UI)
  2. Verify that the Samba server can read results from ES (mdsearch command, within the Samba Docker container)
  3. Verify your Mac client is using Server Search for your Samba share (mdutil -s /Volumes/my-share on your Mac)
  4. Verify your Mac is getting results from Samba (mdfind -onlyin /Volumes/my-share <search-term> on your Mac)

To run mdsearch within the Samba Docker container (and therefore test whether Samba can get search results from ES):

docker compose exec samba mdsearch localhost "my_samba_share" '*=="my_search_term"' -U "my_samba_username"
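
Steps 3 and 4 run on the Mac itself; assuming the share is mounted at /Volumes/my-share, they look like this:

# Check whether macOS is using server-side search for the mounted share
mdutil -s /Volumes/my-share

# Ask Spotlight for results scoped to the share
mdfind -onlyin /Volumes/my-share my_search_term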

The fs2es-indexer README contains a nice troubleshooting guide that I've found very valuable.

Make Single-Node FSCrawler Indexes Green

If you set up monitoring software on this Elasticsearch stack, it'll be unhappy because your fscrawler indexes are yellow (i.e. unhealthy). This is because each index is configured with at least one replica by default, but on a single-node cluster the replica shards can never be assigned.

You could fix this by deploying another Elasticsearch node. But this is overkill for indexing a home NAS.

I fix it by issuing the following command for each FSCrawler index:

curl -XPUT \
    -u "elastic:my_elastic_password" \
    http://es.tailnet-XXYY.ts.net:9200/fscrawler-index-name/_settings \
    -H 'Content-Type: application/json' \
    -d '{ "index": {"number_of_replicas" : 0 } }'
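
Alternatively, an index template can default every future fscrawler index to zero replicas, so new shares come up green without the per-index command. This is a technique I'm suggesting rather than part of my setup, and the index pattern is a placeholder that has to match your FSCrawler index names:

curl -XPUT \
    -u "elastic:my_elastic_password" \
    http://es.tailnet-XXYY.ts.net:9200/_index_template/fscrawler-no-replicas \
    -H 'Content-Type: application/json' \
    -d '{ "index_patterns": ["fscrawler-*"], "template": { "settings": { "number_of_replicas": 0 } } }'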

Resources & Reference Materials

On Samba/Elasticsearch Version Compatibility