Our demand for ever-higher-quality audio devices continues to drive innovation. Hearables (earbuds, wireless headphones, and gaming headsets) are exploiting this opportunity, while also driving advances in hearing aids and over-the-counter hearing-enhancement devices. After a slow 2023, hearable product volumes are taking off again, reaching 106M units in Q2 2024, with 10%+ year-over-year growth. True wireless stereo (TWS) earbuds and wireless headphones are cited as the drivers of this growth. Naturally, product designers are eager to add AI capabilities to differentiate their hearable products. This is already starting to happen, through intelligently enhancing the quality of the audio experience rather than through more exotic AI features.
AI enhancing audio quality
Audio quality in hearables is a powerful differentiating feature, especially in noisy environments where the quality of the components can only do so much to block out unwanted noise. This is where AI and software come into play. For example, all hearables support some level of noise reduction, but basic implementations can usually only suppress steady background noise. More advanced adaptive noise cancellation solutions use AI to analyze ambient noise and adjust cancellation parameters in real time.
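To make the mechanics concrete, the classical core of adaptive cancellation is an adaptive filter that continuously retunes itself against a reference microphone. The sketch below is a toy normalized-LMS (NLMS) canceller, not any vendor's implementation; AI-based ANC steers or replaces this kind of adaptation with a learned model.

```cpp
// Toy normalized-LMS adaptive filter: the classical core of adaptive noise
// cancellation. A reference mic samples ambient noise (x); the filter learns
// to predict the noise component in the primary signal (d) and subtracts it.
#include <array>
#include <cstddef>

constexpr std::size_t kTaps = 32;

class NlmsCanceller {
 public:
  // d: primary mic sample (speech + noise), x: reference mic sample (noise).
  // Returns the error signal, i.e. the noise-reduced output.
  float Process(float d, float x) {
    // Shift the new reference sample into the delay line.
    for (std::size_t i = kTaps - 1; i > 0; --i) x_[i] = x_[i - 1];
    x_[0] = x;

    // Predicted noise = dot(weights, delay line).
    float y = 0.0f, energy = 1e-6f;  // small floor avoids divide-by-zero
    for (std::size_t i = 0; i < kTaps; ++i) {
      y += w_[i] * x_[i];
      energy += x_[i] * x_[i];
    }
    const float e = d - y;  // residual after cancelling predicted noise

    // Normalized-LMS weight update with step size mu.
    const float mu = 0.1f;
    for (std::size_t i = 0; i < kTaps; ++i) w_[i] += mu * e * x_[i] / energy;
    return e;
  }

 private:
  std::array<float, kTaps> w_{};  // adaptive filter weights
  std::array<float, kTaps> x_{};  // reference-mic delay line
};
```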
Similarly, isolating speech from background noise is critical when on a phone call using a headset or earbuds. Mobility gives us the freedom to talk anywhere, but the ambient sounds – street noise, crowd chatter – as we walk or drive can easily drown out the conversation we want to hear. Speech is a distinguishable signal compared with other audio, but it takes AI to extract that signal and filter out the background noise. This capability is also important in hearing aids, where the hearing impaired quickly lose conversation threads amid even modest background noise.
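A common way neural speech enhancers work is mask-based filtering: a model predicts, for each frequency bin of each frame, how speech-dominated that bin is, and the spectrum is scaled accordingly. Here is a minimal sketch of the mask-application step only; the enhancement model itself is assumed, not shown.

```cpp
// Mask-based speech enhancement, application step: an ML model (not shown)
// predicts a per-bin gain in [0,1] -- near 1 where speech dominates, near 0
// where noise dominates -- applied to the frame's spectrum before the
// inverse FFT.
#include <algorithm>
#include <complex>
#include <cstddef>
#include <vector>

// spectrum: FFT of one frame of the noisy mic signal.
// mask: per-bin speech-presence gains predicted by the enhancement model.
void ApplySpeechMask(std::vector<std::complex<float>>& spectrum,
                     const std::vector<float>& mask) {
  const std::size_t n = std::min(spectrum.size(), mask.size());
  for (std::size_t k = 0; k < n; ++k) {
    spectrum[k] *= mask[k];  // attenuate noise-dominated bins, keep speech
  }
}
```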
Personalization is another area where AI is used, applying audio processing techniques to match user preferences and to compensate for frequency bands where a user's hearing is impaired. Personalization is also useful in specific applications such as gaming, say to enhance footsteps in first-person shooter games or in-game chat dialogue in multiplayer games.
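One building block such personalization can drive is a parametric EQ stage. Below is a sketch of a peaking-EQ biquad using the widely known Audio EQ Cookbook formulas; the center frequency, gain, and Q values are illustrative, not taken from any real hearing profile.

```cpp
// Peaking-EQ biquad (Audio EQ Cookbook formulas) that boosts a band where
// the user's hearing is weaker. A personalization engine would fit several
// of these to an audiogram or to user preference.
#include <cmath>

struct Biquad {
  float b0, b1, b2, a1, a2;   // normalized coefficients (a0 folded in)
  float z1 = 0.f, z2 = 0.f;   // transposed direct-form II state

  float Process(float x) {
    const float y = b0 * x + z1;
    z1 = b1 * x - a1 * y + z2;
    z2 = b2 * x - a2 * y;
    return y;
  }
};

// Peaking EQ centered at f0 Hz, gain_db boost/cut, bandwidth set by q.
Biquad MakePeakingEq(float fs, float f0, float gain_db, float q) {
  constexpr float kPi = 3.14159265f;
  const float A = std::pow(10.f, gain_db / 40.f);
  const float w0 = 2.f * kPi * f0 / fs;
  const float alpha = std::sin(w0) / (2.f * q);
  const float a0 = 1.f + alpha / A;
  Biquad bq;
  bq.b0 = (1.f + alpha * A) / a0;
  bq.b1 = -2.f * std::cos(w0) / a0;
  bq.b2 = (1.f - alpha * A) / a0;
  bq.a1 = -2.f * std::cos(w0) / a0;
  bq.a2 = (1.f - alpha / A) / a0;
  return bq;
}

// Example (illustrative values): +6 dB around 3 kHz at a 16 kHz sample rate.
// Biquad eq = MakePeakingEq(16000.f, 3000.f, 6.0f, 1.0f);
```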
It is also worth noting that both Android and iOS aim to support Bluetooth 6 in sync with certified device releases, in anticipation of the opportunities in all these areas.
Building a competitive product
One important learning from TWS earbuds this year is that hearable demand can be very price sensitive, while consumers still expect a complete solution to fit in a tiny footprint and run for many hours before needing to recharge. You need to stand out (or at least stay level) against other hearable options. How is that possible? Running these AI features on a general-purpose MCU won't work – that platform would be too slow and power hungry.
At a minimum you need a DSP for high-quality audio processing and to execute end-to-end applications. Streaming audio should also support the latest Bluetooth® standards and codecs for maximum quality. Better still is a processor that can handle both DSP and AI functions, efficiently fusing inputs from multiple sensors to handle TWS-quality music streams, speech, and ambient noise. Immersive spatial audio must also fuse IMU-based head tracking with audio to position the sound source correctly against head pose, a task also trending toward AI-based methods. And all of this must fit into an earbud footprint supported by a tiny battery.
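The fusion step itself can be stated simply: the head pose from the IMU is used to re-aim the virtual source so it stays fixed in the world as the head turns. The toy sketch below does this with constant-power stereo panning; production spatial audio would use HRTF filtering per ear, but the pose/audio fusion logic is the same idea.

```cpp
// Toy head-tracked panning: subtract the IMU head yaw from the source
// azimuth so the virtual source stays world-fixed as the head turns, then
// apply constant-power stereo gains.
#include <cmath>

struct StereoGain { float left, right; };

// source_az_rad: world-frame source azimuth; head_yaw_rad: yaw from the IMU.
StereoGain HeadTrackedPan(float source_az_rad, float head_yaw_rad) {
  constexpr float kPi = 3.14159265f;
  // Azimuth relative to the listener's nose; positive = to the right.
  const float rel = source_az_rad - head_yaw_rad;
  // Map [-pi/2, pi/2] onto a constant-power pan position in [0, 1].
  float p = rel / kPi + 0.5f;               // 0 = hard left, 1 = hard right
  p = p < 0.f ? 0.f : (p > 1.f ? 1.f : p);  // clamp behind-the-head cases
  return {std::cos(p * kPi / 2.f), std::sin(p * kPi / 2.f)};
}
```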
Is your product plan ready for AI in hearables?
The trick in making this work is to pack all that AI functionality into a small, even very small, ultra-low-power footprint while still preserving the low latencies needed for a high-quality audio experience. This demands an embedded NPU core able to handle all the processing elements of a standalone DSP + NPU, including code execution and memory management. It should be fully programmable, serving feature extraction, DSP functions, and ML (machine learning) processing, plus control code. Equally, it must manage power very carefully across applications, especially limiting power-hungry data flow between the device and DRAM. In always-on mode it must be able to drop to tiny power levels.
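One familiar pattern for hitting those always-on power levels is staged wake-up: a trivial, always-running detector gates the expensive neural path, so the NPU (and any DRAM traffic) is only powered up when there is something worth analyzing. A sketch follows, with hypothetical platform hooks and illustrative thresholds; a real design would tune both and likely use a tiny neural VAD as the first stage.

```cpp
// Staged always-on wake-up: a cheap energy gate runs continuously and only
// wakes the NPU for full inference when the frame looks interesting.
#include <cstddef>
#include <cstdint>

constexpr std::size_t kFrameLen = 160;         // 10 ms frame at 16 kHz
constexpr int64_t kWakeThreshold = 1000000;    // illustrative energy gate

// Hypothetical platform hook -- stubbed here; a real port would power up
// the NPU domain and run the full model.
void NpuWakeAndRunInference(const int16_t* frame, std::size_t len) {
  (void)frame;
  (void)len;
}

// Called per frame from the always-on audio path.
void AlwaysOnFrameHandler(const int16_t* frame) {
  int64_t energy = 0;
  for (std::size_t i = 0; i < kFrameLen; ++i) {
    energy += static_cast<int64_t>(frame[i]) * frame[i];
  }
  if (energy > kWakeThreshold) {
    // Only now spend the power budget on the neural model.
    NpuWakeAndRunInference(frame, kFrameLen);
  }
  // Otherwise stay asleep: no DRAM traffic, no NPU activation.
}
```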
Naturally, the NPU should support today's advanced machine learning data types and operators for CNN, DNN, and native transformer models, fully interoperable with leading open-source inference frameworks such as TensorFlow Lite for Microcontrollers (TFLM) and microTVM. To minimize product development time, developers should demand a robust Model Zoo of pre-trained and optimized ML models covering the voice and sensing use cases important in hearable applications, plus a comprehensive portfolio of optimized runtime libraries and off-the-shelf application-specific software.
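For a sense of what TFLM interoperability looks like in practice, here is a minimal inference skeleton using the standard TFLM C++ flow. The model symbol (g_model_data), operator list, arena size, and output shape are assumptions for illustration; this is the generic framework usage, not a Ceva-specific runtime.

```cpp
// Minimal TFLM inference skeleton for a small quantized model compiled into
// firmware as g_model_data (hypothetical name).
#include <cstdint>

#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

extern const unsigned char g_model_data[];  // flatbuffer baked into flash

constexpr int kArenaSize = 20 * 1024;       // sized for the model's tensors
alignas(16) static uint8_t tensor_arena[kArenaSize];

TfLiteStatus RunOnce(const int8_t* features, int feature_len, int8_t* scores) {
  const tflite::Model* model = tflite::GetModel(g_model_data);

  // Register only the operators the model actually uses, to save code size.
  static tflite::MicroMutableOpResolver<3> resolver;
  resolver.AddConv2D();
  resolver.AddFullyConnected();
  resolver.AddSoftmax();

  static tflite::MicroInterpreter interpreter(model, resolver, tensor_arena,
                                              kArenaSize);
  if (interpreter.AllocateTensors() != kTfLiteOk) return kTfLiteError;

  // Copy features in, run, copy class scores out.
  TfLiteTensor* input = interpreter.input(0);
  for (int i = 0; i < feature_len; ++i) input->data.int8[i] = features[i];
  if (interpreter.Invoke() != kTfLiteOk) return kTfLiteError;

  TfLiteTensor* output = interpreter.output(0);  // assumed shape [1, classes]
  for (int i = 0; i < output->dims->data[1]; ++i) {
    scores[i] = output->data.int8[i];
  }
  return kTfLiteOk;
}
```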
Ready to re-examine your product plan? Check out Ceva's NPU IP for embedded AI, or give us a call to discuss your goals and the suggestions we can offer.