Posted 11 Sep 2017 in Swiftlang

Failing Actors Reading Series

Brief thoughts on lessons we can learn from Erlang when designing reliable actors in Swift.


This post was automatically migrated from my old blogging software, and I have not reviewed it for problems yet. Please contact me if you notice any important issues.

Toward the end of Fatal Error episode 42 (teaser), Soroush and I discussed the fault isolation section of Chris Lattner’s Swift concurrency manifesto.

I recently reread an older post about how Elixir (via the Erlang VM) provides fault tolerance, and I think it offers some useful ideas for deciding how the “reliable” actor model Chris proposes should handle failures.

The manifesto proposes two options to handle an actor’s failure or crash:

Option 1. Provide a standard library API to register failure handlers for actors, allowing higher level reasoning about how to process and respond to those failures. …

Option 2. Force all actor methods to throw, with the semantics that they only throw if the actor has crashed. …
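To make the difference between the two options concrete, here is a minimal sketch. It is not from the manifesto, which describes both options only in prose, and Swift in 2017 had no actors. `ActorCrash`, `SimulatedActor`, `onFailure`, `send`, and `askWorker` are all hypothetical names, and the "actor" is modeled as a plain synchronous class. Option 1 appears as handlers registered on the actor. Option 2 appears as a `send` method that throws only once the actor has crashed, plus a call site that has to catch the crash and forward it somewhere.

```swift
// Hypothetical sketch: `ActorCrash`, `SimulatedActor`, and `askWorker` are
// illustrative names, not Swift APIs. The "actor" is a plain class.

/// Why a simulated actor stopped running.
struct ActorCrash: Error {
    let actorName: String
    let reason: String
}

final class SimulatedActor {
    let name: String
    private(set) var isCrashed = false
    // Option 1: closures registered here run once, when the actor crashes.
    private var failureHandlers: [(ActorCrash) -> Void] = []

    init(name: String) { self.name = name }

    func onFailure(_ handler: @escaping (ActorCrash) -> Void) {
        failureHandlers.append(handler)
    }

    // Option 2: every message send throws, and it throws only if the
    // actor has crashed (while handling this message or earlier).
    func send(_ message: String) throws -> String {
        if isCrashed {
            throw ActorCrash(actorName: name, reason: "already crashed")
        }
        if message == "boom" {  // stand-in for any fatal error inside the actor
            let failure = ActorCrash(actorName: name, reason: "received boom")
            isCrashed = true
            failureHandlers.forEach { $0(failure) }
            throw failure
        }
        return "\(name) handled \(message)"
    }
}

/// Option 2 from a caller's point of view: each call site must catch the
/// crash and, in practice, forward it to some central place via `report`.
func askWorker(_ worker: SimulatedActor, _ message: String,
               report: (ActorCrash) -> Void) -> String? {
    do {
        return try worker.send(message)
    } catch let failure as ActorCrash {
        report(failure)
        return nil
    } catch {
        return nil  // `send` only throws ActorCrash, but `throws` is untyped here
    }
}
```

With Option 1, the handler is written once, next to the actor. With Option 2, something like `askWorker` (or an inline `do`/`catch`) has to appear at every call site, and each copy needs some `report` destination to send the failure to.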

We plan to revisit this in detail in an upcoming episode of Fatal Error, but before we record this episode we’d like interested listeners to read the following articles and send us their thoughts and questions:

Concurrent and Distributed Programming with Erlang and Elixir: Part 1
Errors and Processes
Who Supervises The Supervisors?
Fault Tolerance doesn't come out of the box
Updated 2017-09-12 to add: Concurrency in Erlang & Scala: The Actor Model

Please read those links and let us know your thoughts. We record tomorrow evening.

My gut feeling, having read these articles, is that something like Option 1 is the right way to handle actor failures. Option 2 shifts too much potentially complex responsibility onto call sites, which are likely to delegate failure handling to some more central, supervisor-like object anyway.

But the exact proposal in Option 1 is too simplistic, especially for server-side code that may use distributed actors. We need either language-level or standard library tools for linking dependent actors together and for specifying respawn/retry behaviors for groups of actors. If the language or standard library doesn't provide these tools, every application will end up reimplementing some subset of them, probably poorly.
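For a rough idea of what "respawn/retry behaviors for actor groups" could mean, here is a small sketch modeled on Erlang/OTP supervisors. It covers two of OTP's restart strategies: one-for-one, which restarts only the failed child, and one-for-all, which restarts every child in the group on the assumption that they depend on each other. It also includes a restart limit that escalates the failure to the supervisor's own parent once exceeded. OTP's real limit allows at most a given number of restarts within a time window. This sketch simplifies that to a lifetime count. `RestartStrategy`, `Child`, `Supervisor`, and `SupervisorError` are hypothetical names, not Swift or stdlib APIs, and children are plain synchronous objects rather than real actors.

```swift
// Hypothetical sketch inspired by Erlang/OTP supervisors; none of these
// types exist in Swift. "Actors" are modeled as plain synchronous objects.

/// A subset of OTP-style restart strategies.
enum RestartStrategy {
    case oneForOne   // restart only the child that failed
    case oneForAll   // restart every child when any one fails
}

/// A child the supervisor can start, stop, and restart.
final class Child {
    let name: String
    private(set) var generation = 0   // incremented on every (re)start
    private(set) var isRunning = false

    init(name: String) { self.name = name }

    func start() { generation += 1; isRunning = true }
    func stop() { isRunning = false }
}

enum SupervisorError: Error, Equatable {
    /// Too many restarts: escalate to whoever supervises this supervisor.
    case restartLimitExceeded
}

final class Supervisor {
    let strategy: RestartStrategy
    let maxRestarts: Int            // simplified: a lifetime count, no time window
    private(set) var children: [Child]
    private var restarts = 0

    init(strategy: RestartStrategy, maxRestarts: Int, children: [Child]) {
        self.strategy = strategy
        self.maxRestarts = maxRestarts
        self.children = children
        children.forEach { $0.start() }
    }

    /// Called when a child crashes. Restarts according to `strategy`, or
    /// stops the whole group and throws once the restart budget is spent.
    func childCrashed(named name: String) throws {
        guard let failed = children.first(where: { $0.name == name }) else { return }
        failed.stop()
        restarts += 1
        guard restarts <= maxRestarts else {
            children.forEach { $0.stop() }
            throw SupervisorError.restartLimitExceeded
        }
        switch strategy {
        case .oneForOne:
            failed.start()
        case .oneForAll:
            // Children are assumed to be linked, so the whole group restarts.
            children.forEach { $0.stop() }
            children.forEach { $0.start() }
        }
    }
}
```

Because `childCrashed(named:)` throws when the budget runs out, a supervisor can itself be a child of another supervisor, which is the shape of OTP's supervision trees. Building this once, in the language or standard library, would save each application from writing its own version of it.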

As always, I welcome discussion and feedback; I’m @cdzombak on Twitter.