Part 1: Welcome to Apple’s Sandbox

Before I got into Icelandic language technology, I spent most of my career in embedded software. That world is all about writing code for devices and machines – usually without any graphical interface. Think microcontrollers in washing machines, or compute clusters in rockets.

You learn to live with constraints early on. Embedded systems are almost always a tight marriage of software and hardware, built for a specific purpose – and squeezed onto minimal resources because every cent counts. Limited memory. Limited processing power. Limited options.

If you’re lucky, you get a say while the hardware is still being designed. A bit more RAM here, a faster chip there. More often, you don’t. And then it’s all about creative problem-solving and relentless optimization to make things work anyway.

An iPhone Is Not a Washing Machine

When Apple opened up voice interfaces to third-party developers with iOS 16 in 2022, we took note. But we had other priorities first: getting Símarómur to run on Android with a modern neural TTS model. By early 2024, that was done. Time to tackle iOS.

Apple’s TTS interface is built on something called Audio Unit Extensions (AUE) – a framework borrowed from the music world, where it powers audio effects and instruments that plug into other apps.

An AUE is an odd beast. It spins up as a separate process inside whatever app needs it, yet it physically lives inside another app – the “Containing App.” In our case, that’s Símarómur.

                              ┌─────────────────────┐
                              │  Símarómur          │
┌─────────────┐               │  (Containing App)   │
│  Safari     │               │                     │
│  ┌───────┐  │    loaded     │  ┌───────────────┐  │
│  │ AUE ◄─┼──┼───────────────┼──│               │  │
│  └───────┘  │               │  │  TTS          │  │
└─────────────┘               │  │  Extension    │  │
                              │  │               │  │
┌─────────────┐               │  │ (Code + Data) │  │
│  Books      │               │  │               │  │
│  ┌───────┐  │    loaded     │  │               │  │
│  │ AUE ◄─┼──┼───────────────┼──│               │  │
│  └───────┘  │               │  └───────────────┘  │
└─────────────┘               │                     │
     │                        └─────────────────────┘
     ▼
 runs here                         lives here
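
To make the right-hand box a little more concrete: since iOS 16, the principal class of such a TTS extension builds on Apple’s AVSpeechSynthesisProviderAudioUnit. The following is a minimal sketch of what that class can look like. The class name, voice name, and identifier are placeholders for illustration, not Símarómur’s actual code, and the real synthesis and audio buffering are left out.

import AVFoundation
import AudioToolbox

// Minimal sketch of a speech synthesis provider extension, assuming the
// AVSpeechSynthesisProviderAudioUnit API introduced with iOS 16.
// All names below are placeholders, not Simaromur's real code.
public class ExampleSynthesisAudioUnit: AVSpeechSynthesisProviderAudioUnit {

    // The voices this extension offers to the system and to host apps.
    public override var speechVoices: [AVSpeechSynthesisProviderVoice] {
        get {
            [AVSpeechSynthesisProviderVoice(
                name: "Example Voice",                  // placeholder
                identifier: "com.example.voice.female", // placeholder
                primaryLanguages: ["is-IS"],
                supportedLanguages: ["is-IS"])]
        }
        set { }
    }

    // Called in the extension process with the text to speak, delivered as SSML.
    public override func synthesizeSpeechRequest(_ speechRequest: AVSpeechSynthesisProviderRequest) {
        // speechRequest.ssmlRepresentation carries the text; run the TTS model
        // on it and buffer the resulting PCM samples. The system pulls them
        // afterwards through the render block below.
    }

    // Called when the host no longer needs the current utterance.
    public override func cancelSpeechRequest() {
        // Drop buffered audio and stop the synthesis pipeline.
    }

    // Audio leaves the extension through the regular AUAudioUnit render path.
    public override var internalRenderBlock: AUInternalRenderBlock {
        { _, _, frameCount, _, outputData, _, _ in
            // Copy up to `frameCount` already-synthesized samples into `outputData`.
            return noErr
        }
    }
}

Host apps such as Safari or Books never instantiate this class themselves: from their point of view, the voice simply shows up in the system’s voice list, and the system loads the extension on their behalf.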

On Android, things work differently. A TTS engine runs as a standalone service that any app can call:

┌─────────────┐     ┌─────────────┐
│  Browser    │     │  E-Reader   │
└──────┬──────┘     └──────┬──────┘
       │                   │
       └─────────┬─────────┘
                 ▼
┌─────────────────────────────────┐
│  Android TTS Service            │
│  (Símarómur)                    │
│                                 │
│  · runs here                    │
│  · lives here                   │
│  · all in one place             │
└─────────────────────────────────┘

Two platforms, two architectures. What they have in common: the sample code provided by Apple and Google barely hints at the actual complexity involved. In both cases, the demo projects are outdated, rely on deprecated interfaces, and won’t even compile without significant rework.

That aside:

We came prepared. Years of embedded experience. A working Android implementation. We knew how to deal with constraints.

What we didn’t expect: an iPhone has better specs than most of the embedded systems I’ve worked on. But the constraints Apple imposes? A modern washing machine couldn’t run within them.


Next: The Rules Apple Doesn’t Document →