WWDC, Apple, and AI: Waiting for the gift
I will sit right down (waiting for the gift of sound and vision)
And I will sing (waiting for the gift of sound and vision)
— David Bowie
Apple is planning to sponsor and present 14 AI research papers at the annual IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) in Denver next week, just days before it introduces major new AI features at its Worldwide Developer Conference (WWDC).
The fresh research explores topics such as using LLMs in image generation, quality testing, and user interface prototyping. For months, supply chain rumors have hinted at a radical evolution for the ubiquitous AirPods in the form of built-in ambient cameras. With this in mind, it’s noteworthy that one of the research papers, “From Where Things Are to What They’re For: Benchmarking Spatial–Functional Intelligence for Multimodal LLMs,” specifically seems to cater for such use cases.
Accessibility for the people
In application, this tech promises profound potential for accessibility. It suggests that someone with limited vision might be able to get their AirPods to guide them through an unfamiliar room. This is something that should fit well inside the company’s ongoing narrative around machine vision intelligence and accessibility.
Accessibility is central to a second presentation to be made during the Generative AI for Sign Language Workshop at the conference. It will be led by Apple’s Colin Lea, who presented a session on speech tech for people with speech disabilities at a similar event. This focus on machine vision intelligence and accessibility is entirely deliberate.
Indeed, even though the industry and critics condemn Apple for lagging behind others in the AI space, the publication of these 14 papers at a key industry event just before WWDC shows the company has been doing a great deal of foundational work behind the scenes. We expect this work to bear its first fruit at WWDC, and it is important to understand the disclosures as a power move. Apple is using the show to celebrate its strengths in AI development, and given its decade of work on Apple Car, many of those strengths relate to machine vision intelligence.
Apple is so advanced in the field it is already deploying advanced models that empower consumers. Just last week, it promised to introduce a new tool called Image Explorer in VoiceOver to help partially sighted customers later this year. Among many other features, this will arrive alongside a system to let disabled users control compatible wheelchairs with spoken word commands.
Apple is pushing boundaries all the way. Its paper “VSAS-Bench: Real-Time Evaluation of Visual Streaming Assistant Models” suggests it is actively refining models to process live video instantly on consumer hardware.
What matters, the human or the machine?
The difference between Apple and its competitors is deep and philosophical. I’d argue that while others build cloud-dependent chatbots, Apple is embedding AI tools that solve real human problems in its systems.
This extends to its plans at WWDC, where it will introduce a raft of AI tools made with help from Google Gemini and a host of AI services it has developed in house. The latter will include a great many accessibility tools of the type it will discuss at the CVPR event, the beauty of which is that they will run privately and on-device. You could argue that while other tech giants are using AI to automate white-collar jobs or build a surveillance dystopia, Apple is searching for applications of machine intelligence that solve real human problems.
The company seems pretty realistic about the ongoing AI transformation. It recognizes that its own ecosystem must become a peer player in the emerging AI-augmented environment the tech industry seems intent on building.
With that in mind, Apple is willing to engage in strategic, mutually beneficial partnerships, such as permitting Siri to use third-party AI services to handle requests. But even as it does that, it is also focusing on those areas in which it can make a unique difference, such as the accessibility features Apple as a platform has always provided.
Open up
As the Vision Pro demonstrated, and as these mythical video-enabled AirPods may one day suggest, computers are steadily getting smarter. The way we use them is changing too, as we move away from the rigid boundaries of keyboards, mice, and touchscreens. Apple’s quest for ambient computing began long before the sudden gold rush for generative AI chatbots.
In the end, as the latter services become commodified, the way humans interact with them will define the next generation of hardware. That’s exciting for Apple, given that product design is where it excels. The era of sound and vision may finally have arrived.
You can follow me on social media! Join me on BlueSky, LinkedIn, Mastodon, and MeWe.