If you grew up in the 90s, chances are Siri was your first encounter with AI. Introduced with the iPhone 4S in 2011, Siri quickly became a household name, simplifying everyday tasks and adding a playful element to interacting with a phone. In recent years, however, Siri has received few significant updates. With AI gaining traction, particularly since the launch of OpenAI’s ChatGPT, reports suggest that Siri may soon get much smarter.
Reports that Apple is working on generative AI features for Siri have circulated for some time. A research paper posted by Apple researchers to the arXiv preprint server (hosted by Cornell University) describes a new multimodal large language model (MLLM) called Ferret-UI, which aims to understand how a phone’s interface functions. While MLLMs have made rapid progress, they still struggle to interact effectively with user interfaces.
Ferret-UI is designed to understand UI screens and, potentially, how the apps on a phone operate, with what the paper calls “referring, grounding, and reasoning capabilities.” One major challenge is that smartphone displays have elongated aspect ratios and densely packed visual elements such as icons and small text. Ferret-UI addresses this by magnifying details and drawing on enhanced visual features, and the paper reports that it surpasses existing models at understanding and interacting with app interfaces.
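The “magnifying details” idea works roughly as follows: rather than squeezing a tall or wide screenshot into one square image, the screen is divided along its longer axis into two sub-images that are encoded alongside the full screen. As a rough illustration only, not Apple’s implementation, a minimal sketch of that splitting step in Python:

```python
def split_screen(width, height):
    """Divide a screenshot into two crop boxes along its longer axis,
    loosely mirroring Ferret-UI's sub-image idea: portrait screens are
    split into top/bottom halves, landscape screens into left/right.
    Each box is an (left, top, right, bottom) tuple in pixels."""
    if height >= width:
        # Portrait: top half and bottom half
        mid = height // 2
        return [(0, 0, width, mid), (0, mid, width, height)]
    else:
        # Landscape: left half and right half
        mid = width // 2
        return [(0, 0, mid, height), (mid, 0, width, height)]

# Example: an iPhone-sized portrait screenshot (390 x 844 points)
print(split_screen(390, 844))
```

Each sub-image is then processed at higher effective resolution, which is what lets the model resolve tiny icons and labels that would be lost in a single downscaled view.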
If integrated into Siri, Ferret-UI could make the digital assistant far more capable. It might allow Siri to carry out multi-step tasks within apps, such as booking a flight or making a reservation, by interacting directly with the corresponding app’s interface.
Ferret-UI builds on Ferret, an open-source multimodal large language model that grew out of a collaboration between Apple and Cornell University and was released for research purposes last year. Ferret can answer queries about images and specific regions within them, much as ChatGPT or Gemini answer text queries. Bringing this line of work into Siri could transform the capabilities of Apple’s voice assistant.
As AI continues to evolve, Siri’s ability to understand and interact with iPhone apps may improve dramatically, enhancing both user experience and functionality.