Meta has created an AI model for the metaverse

Artificial intelligence will be the backbone of virtual worlds.

AI can be combined with a variety of related technologies in the metaverse, such as computer vision, natural language processing, blockchain, and digital twins.

In February, Zuckerberg showed off what the metaverse might look like at Inside the Lab, the company's first virtual event. He said the company is developing a range of new generative AI models that will let users generate their own virtual reality avatars simply by describing them.

Zuckerberg announced a number of upcoming projects, such as Project CAIRaoke, a fully end-to-end neural model for on-device voice assistants that will help users interact with them more naturally.

At the same time, Meta is working to build a universal speech translator that will provide direct speech-to-speech translation for all languages.

A few months later, Meta made good on its promise.

However, Meta isn't the only tech company with skin in the game.

Companies like Nvidia have also released their own AI models to provide richer metaverse experiences.

GANverse 3D

GANverse 3D, developed by Nvidia AI Research, is a model that uses deep learning to turn 2D images into animated 3D versions. Described in research papers presented at last year's ICLR and CVPR, the tool can generate simulations faster and at lower cost.

The model uses StyleGAN to automatically generate multiple views from a single image. The application can be imported as an extension to NVIDIA Omniverse to accurately render 3D objects in virtual worlds.
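To make the pipeline concrete, here is a minimal sketch of the idea behind GANverse 3D: a GAN stage synthesizes extra viewpoints from a single photo, and an inverse-graphics stage regresses 3D geometry from those views. The class names and toy layers below are hypothetical stand-ins, not Nvidia's actual code, which ships as an Omniverse extension.

```python
# Conceptual sketch of a GANverse-3D-style pipeline (hypothetical classes;
# the real tool is an NVIDIA Omniverse extension, not a Python package).
import torch
import torch.nn as nn


class MultiViewSynthesizer(nn.Module):
    """Stand-in for the StyleGAN stage that hallucinates extra viewpoints
    of an object from a single photograph."""

    def __init__(self, num_views: int = 8):
        super().__init__()
        self.num_views = num_views
        self.conv = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # (B, 3, H, W) -> (B, num_views, 3, H, W); a real StyleGAN would
        # render the object from rotated camera poses.
        views = [self.conv(image) for _ in range(self.num_views)]
        return torch.stack(views, dim=1)


class InverseGraphicsNet(nn.Module):
    """Stand-in for the renderer-inversion stage that regresses 3D mesh
    vertices from the synthesized views."""

    def __init__(self, num_vertices: int = 642):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Linear(3, num_vertices * 3)

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        b, v, c, h, w = views.shape
        feats = self.pool(views.view(b * v, c, h, w)).view(b, v, c).mean(dim=1)
        return self.head(feats).view(b, -1, 3)  # (B, num_vertices, 3)


if __name__ == "__main__":
    photo = torch.rand(1, 3, 256, 256)           # a single 2D image
    views = MultiViewSynthesizer()(photo)        # multiple synthetic viewpoints
    mesh_vertices = InverseGraphicsNet()(views)  # animatable 3D geometry
    print(views.shape, mesh_vertices.shape)
```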

Nvidia's Omniverse helps users turn their ideas into simulations in a shared virtual environment.

The production of 3D models has become a key element in the construction of the metaverse. Retailers such as Nike and Forever 21 have set up virtual stores in the metaverse to drive e-commerce sales.

Visual Acoustic Matching Model (AViTAR)

Meta's Reality Labs team worked with the University of Texas at Austin to build an AI model that improves sound quality in the metaverse. The model helps match the audio in a scene to its visuals.

It transforms audio clips to make them sound as if they were recorded in a specific environment. The model is trained with self-supervised learning on data extracted from random online videos.
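As an illustration, the sketch below shows the kind of interface a visual acoustic matching model exposes at inference time: source audio plus a photo of the target space in, "room-matched" audio out. The AcousticMatcher class and its internals are hypothetical placeholders, not Meta's released implementation.

```python
# Minimal sketch of visual acoustic matching at inference time
# (hypothetical AcousticMatcher; Meta's code has its own interfaces).
import torch
import torch.nn as nn


class AcousticMatcher(nn.Module):
    """Toy stand-in: re-synthesizes source audio so it sounds as if it were
    recorded in the environment shown in the target image."""

    def __init__(self):
        super().__init__()
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(8, 1),
        )

    def forward(self, audio: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        # Predict a per-clip gain from the image as a placeholder for the
        # learned room-acoustics transformation (reverb, coloration, ...).
        room_gain = torch.sigmoid(self.image_encoder(image))  # (B, 1)
        return audio * room_gain.unsqueeze(-1)


if __name__ == "__main__":
    source_audio = torch.randn(1, 1, 16000)    # 1 s of mono audio at 16 kHz
    target_image = torch.rand(1, 3, 224, 224)  # photo of the target room
    matched = AcousticMatcher()(source_audio, target_image)
    print(matched.shape)  # same waveform length, now "room-matched"
```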

Ideally, users should be able to watch their favorite memories on their AR glasses and listen to the exact sounds generated during the actual experience.

Meta AI released AViTAR as open source along with two other acoustic models, a rare move considering that sound is an often overlooked part of the metaverse experience.

Visually-Informed Dereverberation of Audio (VIDA)

The second acoustic model released by Meta AI removes reverberation from audio.

The model is trained on a large-scale dataset of realistic audio renderings derived from 3D models of homes. Reverberation not only degrades audio quality, making speech harder to understand, but also reduces the accuracy of automatic speech recognition.

VIDA is unique in that it uses visual cues as well as audio. Improving on the typical audio-only approach, VIDA enhances speech and makes it easier to recognize both what is said and who is speaking.
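A rough sketch of how audio-visual dereverberation can be set up: the model predicts a time-frequency mask over the reverberant spectrogram, with an image of the room as an additional cue. The architecture below is an illustrative assumption, not VIDA's actual network.

```python
# Sketch of audio-visual dereverberation in the spirit of VIDA: a mask over
# the reverberant spectrogram, conditioned on a photo of the room.
# Names and architecture are hypothetical.
import torch
import torch.nn as nn

N_FFT = 512
HOP = 128


class AudioVisualDereverb(nn.Module):
    def __init__(self):
        super().__init__()
        self.visual = nn.Sequential(
            nn.Conv2d(3, 8, 3, stride=2, padding=1),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(8, N_FFT // 2 + 1),
        )
        self.audio = nn.Conv1d(N_FFT // 2 + 1, N_FFT // 2 + 1, kernel_size=1)

    def forward(self, wav: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        window = torch.hann_window(N_FFT, device=wav.device)
        spec = torch.stft(wav, N_FFT, HOP, window=window, return_complex=True)
        mag = spec.abs()                                # (B, F, T)
        visual_bias = self.visual(image).unsqueeze(-1)  # (B, F, 1)
        mask = torch.sigmoid(self.audio(mag) + visual_bias)
        clean_spec = spec * mask                        # suppress reverberant energy
        return torch.istft(clean_spec, N_FFT, HOP, window=window,
                           length=wav.shape[-1])


if __name__ == "__main__":
    reverberant = torch.randn(1, 16000)      # 1 s of reverberant speech
    room_image = torch.rand(1, 3, 224, 224)  # view of the recording space
    dry = AudioVisualDereverb()(reverberant, room_image)
    print(dry.shape)                         # torch.Size([1, 16000])
```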

VisualVoice

VisualVoice, the third acoustic model released by Meta AI, can extract speech from video. Like VIDA, VisualVoice is trained on audio-visual cues from unlabeled videos, which it uses to automatically separate speech from the rest of the soundtrack.
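Conceptually, target-speaker separation of this kind conditions a mask over the mixture spectrogram on an embedding of the speaker's face. The sketch below is illustrative only; the class and layer choices are assumptions rather than Meta's released VisualVoice code.

```python
# Sketch of audio-visual speech separation in the style of VisualVoice:
# a face embedding of the target speaker conditions a mask over the
# mixture spectrogram. All names are illustrative.
import torch
import torch.nn as nn


class TargetSpeakerSeparator(nn.Module):
    def __init__(self, freq_bins: int = 257):
        super().__init__()
        self.face_encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(16, freq_bins),
        )
        self.mix_encoder = nn.Conv1d(freq_bins, freq_bins, kernel_size=3, padding=1)

    def forward(self, mixture_spec: torch.Tensor, face: torch.Tensor) -> torch.Tensor:
        # mixture_spec: (B, F, T) magnitude spectrogram of the noisy mixture
        speaker_vec = self.face_encoder(face).unsqueeze(-1)       # (B, F, 1)
        mask = torch.sigmoid(self.mix_encoder(mixture_spec) + speaker_vec)
        return mixture_spec * mask                                # target speaker only


if __name__ == "__main__":
    mixture = torch.rand(1, 257, 200)       # two voices + background noise
    face_crop = torch.rand(1, 3, 112, 112)  # frame of the speaker we want
    separated = TargetSpeakerSeparator()(mixture, face_crop)
    print(separated.shape)
```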

This model has important applications, such as assistive technology for the hearing-impaired, enhanced sound for wearable AR devices, and transcription of speech in noisy online videos.

Audio2Face

Last year, Nvidia released an open beta of Omniverse Audio2Face, which generates AI-powered facial animation to match any voice-over track. The tool simplifies the long and tedious process of animating characters for games and visual media. The app also supports multiple languages.

Earlier this year, Nvidia released an update to the tool that added features such as BlendShape Generation, which helps users create a set of blendshapes from a neutral avatar. Streaming audio player capabilities were also added, allowing audio data to be streamed from text-to-speech applications.

Audio2Face comes with a 3D character model that can be animated with an audio track: the audio is fed into a deep neural network, which drives the character's facial motion. Users can also edit the character and adjust its performance in post-processing.
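The sketch below illustrates that pipeline in miniature: audio features go in, per-frame blendshape weights come out and can drive a facial rig. The network, feature choice, and blendshape names are made-up placeholders, not Audio2Face's actual implementation.

```python
# Illustrative sketch of audio-driven facial animation: map a window of
# audio features to per-frame blendshape weights. The blendshape names and
# network are made up; the real Audio2Face runs inside NVIDIA Omniverse.
import torch
import torch.nn as nn

BLENDSHAPES = ["jawOpen", "mouthSmile", "browUp"]  # illustrative subset


class AudioToBlendshapes(nn.Module):
    def __init__(self, n_mels: int = 80, n_shapes: int = len(BLENDSHAPES)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_mels, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(64, n_shapes, kernel_size=5, padding=2),
            nn.Sigmoid(),  # blendshape weights live in [0, 1]
        )

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        # mel: (B, n_mels, frames) -> (B, frames, n_shapes)
        return self.net(mel).transpose(1, 2)


if __name__ == "__main__":
    mel_features = torch.rand(1, 80, 120)  # ~2 s of audio features
    weights = AudioToBlendshapes()(mel_features)
    for name, value in zip(BLENDSHAPES, weights[0, 0]):
        print(f"{name}: {value.item():.2f}")  # drive the rig frame by frame
```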
