Optimized speech transcription in Python using AssemblyAI's Universal-1




Jesse Ellis
25 November 2024 at 17:52

Learn how to transcribe audio files in Python with AssemblyAI's Universal-1, a model that offers near-human accuracy and multiple pricing tiers to meet diverse needs.




AssemblyAI has introduced its latest speech recognition model, Universal-1, which it presents as setting a new standard for automatic speech recognition (ASR) accuracy. According to AssemblyAI, the model is designed to achieve near-human transcription accuracy, even in difficult audio conditions involving accents, background noise, and complex phrases. Universal-1 can be accessed through the same web API as previous ASR models.

New pricing tiers for Universal-1

Along with the launch of Universal-1, AssemblyAI has unveiled two new pricing tiers: Best and Nano. The Best tier is optimized for maximum accuracy, while the Nano tier provides a cost-effective option that supports transcription in 99 languages. Developers can choose whichever balance between accuracy and cost suits their needs.

Getting started with the AssemblyAI Python SDK

To make transcription easier, AssemblyAI provides an official Python SDK. Developers can install the SDK with the following command:

pip install --upgrade assemblyai

After installation, users need to register for an AssemblyAI account to obtain an API key, which is necessary to authorize API calls in Python scripts.
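One common way to keep the key out of source code is to read it from an environment variable. The sketch below assumes a variable named ASSEMBLYAI_API_KEY; both that name and the load_api_key helper are illustrative choices, not something the article specifies.

```python
import os


def load_api_key(env=None, var_name="ASSEMBLYAI_API_KEY"):
    """Return the API key stored in an environment variable.

    `var_name` is an assumed name; use whatever your deployment defines.
    `env` defaults to the process environment and can be replaced by a
    plain dict, which is handy for testing.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key


# Usage (requires the assemblyai package):
#   import assemblyai as aai
#   aai.settings.api_key = load_api_key()
```

Failing early with a clear message is usually easier to debug than an authorization error returned later by the API.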

Transcribing audio files using Universal-1

Once set up, developers can transcribe audio files with a short Python script. By default, the SDK uses the Best tier for transcription, which gives the highest accuracy. The process consists of importing the SDK, configuring the API client with the API key, and specifying the audio file URL or local path.

import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"
transcriber = aai.Transcriber()
audio_file = "https://storage.googleapis.com/aai-web-samples/5_common_sports_injuries.mp3"
transcript = transcriber.transcribe(audio_file)

if transcript.error:
    print(transcript.error)
else:
    print(transcript.text)

Running the script prints the transcribed text to the console, or the error message if the transcription failed.
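The error check in the script can be wrapped in a small helper so that callers receive an exception instead of inspecting the transcript themselves. This is only a sketch: transcribe_or_raise and TranscriptionFailed are hypothetical names, and the helper relies on nothing beyond the transcribe() method and the .error and .text attributes used in the script above.

```python
class TranscriptionFailed(Exception):
    """Raised when the returned transcript carries an error message."""


def transcribe_or_raise(transcriber, source):
    """Transcribe `source` (a URL or local path) and return the text.

    `transcriber` is any object with a `transcribe(source)` method whose
    result exposes `.error` and `.text`, as aai.Transcriber does in the
    script above.
    """
    transcript = transcriber.transcribe(source)
    if transcript.error:
        raise TranscriptionFailed(transcript.error)
    return transcript.text


# Usage (requires the assemblyai package and a configured API key):
#   text = transcribe_or_raise(aai.Transcriber(), audio_file)
```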

Exploring the Nano tier

For those looking for a more economical option, switching to the Nano tier is straightforward. Developers can create a TranscriptionConfig object that selects the Nano model by setting its speech_model parameter to "nano".

config = aai.TranscriptionConfig(speech_model="nano")
transcriber = aai.Transcriber(config=config)
transcript = transcriber.transcribe(audio_file)

This flexibility allows for efficient use of resources while still taking advantage of AssemblyAI's transcription capabilities.
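If the tier needs to be chosen at run time, for example from a settings file, the choice can be mapped to TranscriptionConfig keyword arguments. In this minimal sketch, config_kwargs_for is a hypothetical helper; it returns no override for "best" because the SDK is described above as using the Best tier by default, and it sets speech_model to "nano" exactly as in the snippet above.

```python
def config_kwargs_for(tier):
    """Return keyword arguments for aai.TranscriptionConfig for a tier.

    "best" needs no override (it is the default tier according to the
    article); "nano" sets speech_model as shown in the snippet above.
    """
    tier = tier.strip().lower()
    if tier == "best":
        return {}
    if tier == "nano":
        return {"speech_model": "nano"}
    raise ValueError(f"Unknown tier: {tier!r}")


# Usage (requires the assemblyai package):
#   config = aai.TranscriptionConfig(**config_kwargs_for("nano"))
#   transcriber = aai.Transcriber(config=config)
```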

Beyond transcription: additional features

AssemblyAI's offerings extend beyond basic transcription. The platform provides advanced features such as entity detection, content moderation, personally identifiable information (PII) redaction, and applying large language models (LLMs) to audio data. These capabilities enhance the utility of the transcription service, making it suitable for a wide range of applications.

Developers interested in taking advantage of these features can explore AssemblyAI's documentation and research resources for more ideas on building advanced speech AI solutions.
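As a rough sketch of how such features might be switched on, the flags can be collected in a dictionary and passed to TranscriptionConfig. The flag names entity_detection and content_safety are assumptions based on AssemblyAI's API parameters and should be checked against the current documentation; feature_flags is a hypothetical helper. PII redaction is left out here because it also requires a list of redaction policies.

```python
def feature_flags(entities=False, moderation=False):
    """Build optional keyword arguments for aai.TranscriptionConfig.

    The key names are assumed to match the SDK's parameters for entity
    detection and content moderation; verify them in the documentation.
    """
    flags = {}
    if entities:
        flags["entity_detection"] = True
    if moderation:
        flags["content_safety"] = True
    return flags


# Usage (requires the assemblyai package):
#   config = aai.TranscriptionConfig(**feature_flags(entities=True))
#   transcript = aai.Transcriber(config=config).transcribe(audio_file)
```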
