Microsoft slashes prices 60% on genAI tech that understands audio, video, and text

July 8, 2025 Yanac

Microsoft is cutting prices by 60% on generative AI (genAI) technologies in Azure designed to provide better understanding and insights for videos, text, speech, and images.

Azure AI Content Understanding analyzes information from multimedia documents, extracts insights, and validates the information. Organizations can then use these insights to build AI agents.

“Whether you’re extracting layout from documents or identifying actions in a video, the new pricing structure delivers up to 60% cost savings for many typical tasks and more control over your spending,” Microsoft said in a blog post Monday.

The feature transforms unstructured information across all types of documents into more usable data for human workers, Vinod Kurpad, group product manager for Azure AI, said during a presentation about the technology at the recent Build conference.

“It’s designed to process multi-modal data including documents, images, audio, and video and transform all of that multi-modal and mixed-modal content into actionable insights,” Kurpad said.

Users can feed audio, video, images, or text into the tool, which will extract content from the documents. They can also set up a template that specifies the type of information to extract and understand and what kind of summaries to generate. An analyzer provides a confidence score, which validates the understanding extracted from the document.

The feature can also run sentiment analysis on audio and video files, which is useful for customer support. It can be integrated into the agents that workers already use to automate workflows.

Azure AI Content Understanding works as a three-step process, with each step priced individually depending on the content type and engagement levels with documents. The steps are content extraction (which includes speaker recognition, identity verification, and layout and structure); field extraction (which includes tuning and genAI processing); and contextualization (which includes validating the information).

For example, Microsoft said content extraction from a 1,000-page document will cost $5, compared to $13 previously. Field extraction for a 1,000-page document now costs $14.14, down from $30 under the old pricing model.
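
To put those figures in perspective, the discount on each task can be worked out from the old and new prices. The short Python sketch below does that arithmetic using only the four prices quoted above; the function name `percent_savings` and the dictionary layout are illustrative choices, not part of any Microsoft tool.

```python
def percent_savings(old_price: float, new_price: float) -> float:
    """Return the price reduction as a percentage of the old price."""
    return (old_price - new_price) / old_price * 100


# Prices per 1,000-page document, as quoted in the article: (old, new).
examples = {
    "content extraction": (13.00, 5.00),
    "field extraction": (30.00, 14.14),
}

for task, (old, new) in examples.items():
    print(f"{task}: {percent_savings(old, new):.1f}% cheaper")
```

By this calculation, content extraction is about 61.5% cheaper and field extraction about 52.9% cheaper, which is roughly in line with the 60% headline figure.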

AI Content Understanding for video now costs $3.83 for one hour of video, which includes content extraction, genAI processing, and contextualization.

The pricing is based on tokens, a billing model that Google and OpenAI also use for developers. “We’ve restructured how you pay for document, audio, and video analysis — moving from rigid field-based pricing to a flexible, token-based system that lets you pay only for what you use,” Microsoft said.

This technology can be customized for agents in specialized verticals such as the finance industry, compliance, and healthcare.

ASC Technologies AG is already using the technology to analyze all Microsoft 365 communications, including emails and chat. The results are delivered five times faster and provide a clear view of costs, Tobias Fengler, chief engineering officer at the Germany-based company, said during the Build presentation.

“We have 30% less R&D effort because we have to work with fewer services, and we have implemented a few new agentic AI workflows,” Fengler said.

One Microsoft customer in the finance sector, Ramp, uses Content Understanding to automatically transform receipts, bills, and multi-line invoices into structured data, the presenters said.