New Alibaba model Qwen3-Omni heightens competition in multimodal AI

Alibaba is taking direct aim at US tech giants with Qwen3-Omni, a new open-source AI model that processes text, images, audio, and video, and is freely available under the enterprise-friendly Apache 2.0 license.

The release positions Alibaba as a potential alternative to OpenAI and Google by offering enterprises a no-cost way to deploy multimodal AI at scale.

“Qwen3-Omni adopts the Thinker-Talker architecture,” Alibaba said in a blog post. “Thinker is tasked with text generation while Talker focuses on generating streaming speech tokens by [receiving] high-level representations directly from Thinker. To achieve ultra–low-latency streaming, Talker autoregressively predicts a multi-codebook sequence.”
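To make that division of labor concrete, the toy Python sketch below mimics the data flow Alibaba describes: a "Thinker" yields text tokens along with a high-level representation, and a "Talker" turns each representation into one frame of speech-codebook indices, conditioned on the frame before it. It is an illustration only, not Alibaba's implementation. The function names, the two-number "hidden" vector, the four codebooks, and the arithmetic standing in for a neural network are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple

# Hypothetical sizes chosen for illustration; the real model's values differ.
NUM_CODEBOOKS = 4
CODEBOOK_SIZE = 1024


@dataclass
class ThinkerStep:
    text_token: str       # the text the Thinker generates at this step
    hidden: List[float]   # stand-in for the high-level representation


def thinker(prompt: str) -> Iterator[ThinkerStep]:
    """Toy Thinker: yields one text token and a fake hidden vector per step."""
    for position, word in enumerate(prompt.split()):
        yield ThinkerStep(text_token=word, hidden=[float(len(word)), float(position)])


def talker(steps: Iterator[ThinkerStep]) -> Iterator[Tuple[str, List[int]]]:
    """Toy Talker: emits one multi-codebook frame per Thinker step.

    Each frame depends on the current hidden vector and on the previous
    frame, which is what makes the prediction autoregressive.
    """
    previous_frame = [0] * NUM_CODEBOOKS
    for step in steps:
        frame = []
        for k in range(NUM_CODEBOOKS):
            # Simple arithmetic stands in for a learned prediction.
            seed = int(sum(step.hidden)) + previous_frame[k] + k
            frame.append((seed * 31 + 7) % CODEBOOK_SIZE)
        previous_frame = frame
        yield step.text_token, frame


def stream(prompt: str) -> List[Tuple[str, List[int]]]:
    """Run the whole pipeline and collect every (text token, frame) pair."""
    return list(talker(thinker(prompt)))


if __name__ == "__main__":
    for token, frame in talker(thinker("hello multimodal world")):
        print(token, frame)
```

Because both stages are generators, the first speech frame is available as soon as the Thinker produces its first step, which loosely mirrors the low-latency streaming the company describes.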

According to the company, Qwen3-Omni performed on par with single-modal models in its Qwen series and showed stronger results in audio tasks.

The model also posted the top score among open-source models on 32 benchmarks and the top score overall on 22, ahead of closed-source models such as Google’s Gemini 2.5 Pro, Seed-ASR, and OpenAI’s GPT-4o-Transcribe, according to Alibaba. If accurate, these results suggest enterprises could expect stronger performance in speech recognition, transcription, and multimodal reasoning compared with many closed-source rivals.

Open-source advantage widens reach

Analysts say Alibaba’s release of Qwen3-Omni strengthens its position in the open-source AI market and will help expand its global partner ecosystem.

“Making Qwen3-Omni available under a permissive Apache 2.0 license materially changes the options on the table for enterprises,” said Tulika Sheel, senior VP at Kadence International. “It removes vendor lock-in and lowers the barrier to experimentation and customization; companies can run, adapt, and integrate the model inside their own environments without licensing friction.”

Others point to Alibaba Cloud’s track record in releasing models. “While OpenAI and Google have also made some of their models open source, they lag behind Alibaba Cloud,” said Lian Jye Su, chief analyst at Omdia. “Alibaba Cloud has made its Qwen model family open source from day one, releasing more than 300 models. The Qwen family has also seen widespread adoption, with over 400 million downloads worldwide.”

Su noted that developers have created more than 140,000 Qwen-based derivative models on Hugging Face, adding that enterprises seeking mature open-source options are increasingly viewing Alibaba Cloud as a leading choice.

Shaping enterprise AI strategy

If Qwen3-Omni’s hybrid reasoning, multimodal capabilities, and strong benchmark results translate into real-world performance, it could accelerate two key shifts in enterprise AI strategy.

“First, organizations will increasingly adopt multi-model stacks, blending open and proprietary models to match capabilities to tasks,” said Sheel. “Second, expect more investment in internal capabilities (MLOps, fine-tuning, safety testing, and infrastructure) so firms can operationalize high-performance open models on-prem or in trusted clouds.”

Su pointed out that handling all data modalities within one model could reduce resource demands and shorten the time needed to train and manage multiple domain-specific systems.

However, analysts also caution that technology maturity must be matched with safeguards.

“Technically, there is no difference between the Chinese models and those from the rest of the world,” said Charlie Dai, VP and principal analyst at Forrester. “Whether it’s the GPT series, Llama, Mistral, or Qwen by Alibaba, enterprise leaders should ensure guardrails for security, privacy, and regulatory compliance.”

That said, Dai expects support for multi-model to remain a central focus. “Multi-model support is the key area of model development and related technical domains, from data infrastructure to agentic AI applications, in the next 12 months. There will be more to come from leading vendors around the globe,” he said.

Source: Computerworld