Part 3: Welcome to the Desert

Search the App Store for TTS apps that work as actual system voices, and you won’t be searching long. Not because it’s well-organized – but because there’s almost nothing there.

Apart from Símarómur, we’re only aware of eSpeakNG and CereProc offering system voices through Audio Unit Extensions.

That’s not a coincidence.

The Mirage

Most TTS apps in the Store aren’t AUE implementations at all. They’re standalone apps that typically rely on cloud connectivity – which regular apps are allowed to use.

These standalone solutions dodge the restrictions, but they come with major trade-offs:

  • No other app can use those voices
  • No integration with iOS Accessibility
  • No internet, no speech

If you want a voice that truly integrates into the system, works from any app, and runs offline – you have to take the hard road through Audio Unit Extensions.

Crossing the Dunes

The technical bar is high. The documentation has holes. The margins are razor-thin.

We took this path anyway. Símarómur runs. On a platform that doesn’t make it easy, as a real system voice, offline, in under 60 megabytes. It has been available on the App Store since June 2025.

But how? Well … the same way you’ve shipped embedded systems for 25 years: refuse to quit, keep optimizing, keep the coffee coming.

Everything gets scrutinized. Model architecture. Runtime behavior. Every third-party library. In classic embedded work, you know exactly how much memory you have, so static allocation works fine. Here, there’s no room to keep things around. Every processing step needs to release its memory before the next one can start. We went with C++, leaning heavily on dynamic memory management – allocate, process, release, move on.
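That allocate-process-release pattern maps naturally onto C++ move semantics. Here is a minimal sketch of the idea – the stage names and buffer sizes are hypothetical, not Símarómur’s actual pipeline: each stage takes its input by value, so the previous stage’s buffer is destroyed as soon as the stage returns, and only one intermediate buffer is alive at a time.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical pipeline stages. Each consumes its input by value:
// ownership moves in, and the buffer is freed when the stage returns.
static std::vector<float> textToFeatures(std::vector<char> text) {
    std::vector<float> features(text.size(), 0.0f); // placeholder work
    return features;                                // 'text' is freed here
}

static std::vector<float> featuresToMel(std::vector<float> features) {
    std::vector<float> mel(features.size() * 2, 0.0f);
    return mel;                                     // 'features' is freed here
}

static std::vector<int16_t> melToPcm(std::vector<float> mel) {
    std::vector<int16_t> pcm(mel.size() * 4, 0);
    return pcm;                                     // 'mel' is freed here
}

std::vector<int16_t> synthesize(std::vector<char> text) {
    // std::move hands each buffer to the next stage; at any point,
    // only one intermediate allocation exists.
    auto features = textToFeatures(std::move(text));
    auto mel      = featuresToMel(std::move(features));
    return melToPcm(std::move(mel));
}
```

The design choice is that ownership, not a cache, flows through the pipeline: peak memory is bounded by the largest single intermediate, rather than the sum of all of them.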

What we didn’t compromise on: voice quality and speed. The goal was never “make it fit and sound acceptable.” It was “make it fit and sound right.” Getting there forced us to develop an approach that takes model size off the critical path.

Today, with everything in place, we run at under 0.15 seconds of latency on a single CPU core, in under 40 MB of RAM.
There’s even headroom left.

The Oasis?

Apple’s restrictions were a gift in disguise. They forced us to stop treating models as black boxes.

Shipping ML isn’t about throwing data at a model and waiting for magic. It’s about controlling and understanding the whole chain: data, model, architecture, and the engineering that ties it together. Python gets you started. But domain knowledge and classic software engineering get you to the finish line.

Or should we just wait for the LLMs to figure out 60 MB constraints?
We at Grammatek aren’t holding our breath.