About: Audio-visual speech recognition

An Entity of Type : owl:Thing, within Data Space : el.dbpedia.org associated with source document(s)

Audio-visual speech recognition (AVSR) is a technique that uses image processing for lip reading to help speech recognition systems recognize non-deterministic phones or to break ties among decisions with nearly equal probability.
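
As an illustration of how visual evidence can settle a decision between phones that the audio scores almost equally, here is a minimal sketch. The weighted log-linear combination rule, the function name fuse_posteriors, the weight of 0.7, and the probability values are all assumptions made for this example; the source does not specify how the two systems' outputs are combined.

```python
import math


def fuse_posteriors(audio_probs, visual_probs, audio_weight=0.7):
    """Combine per-phone probabilities from two classifiers.

    Uses a weighted log-linear rule (an assumed choice, not taken from the
    source) and renormalises so the fused scores sum to 1.
    """
    visual_weight = 1.0 - audio_weight
    floor = 1e-12  # avoids log(0) when a classifier rules a phone out entirely
    scores = {}
    for phone, p_audio in audio_probs.items():
        p_visual = visual_probs[phone]
        scores[phone] = (audio_weight * math.log(max(p_audio, floor))
                         + visual_weight * math.log(max(p_visual, floor)))
    # Softmax over the combined log scores.
    peak = max(scores.values())
    exp_scores = {phone: math.exp(s - peak) for phone, s in scores.items()}
    total = sum(exp_scores.values())
    return {phone: value / total for phone, value in exp_scores.items()}


# Hypothetical outputs: the audio model barely prefers /n/ over /m/,
# while the lip-reading model sees closed lips and strongly prefers /m/.
audio_probs = {"m": 0.48, "n": 0.52}
visual_probs = {"m": 0.90, "n": 0.10}

audio_only_choice = max(audio_probs, key=audio_probs.get)
fused_probs = fuse_posteriors(audio_probs, visual_probs)
fused_choice = max(fused_probs, key=fused_probs.get)
```

With these made-up numbers the audio-only decision is /n/, while the fused decision shifts to /m/, because the visual stream is far more confident than the audio stream is.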

Attributes	Values
rdfs:label	Audio-visual speech recognition (en)
rdfs:comment	Audio-visual speech recognition (AVSR) is a technique that uses image processing for lip reading to help speech recognition systems recognize non-deterministic phones or to break ties among decisions with nearly equal probability. (en)
sameAs	Audio-visual speech recognition Audio-visual speech recognition
dbp:wikiPageUsesTemplate	dbt:Comp-ling-stub
Subject	Applications of computer vision Computational linguistics Speech recognition Multimodal interaction
Link from a Wikipage to an external page	https://ai.googleblog.com/2018/04/looking-to-listen-audio-visual-speech.html https://arxiv.org/abs/1804.03619 https://web.archive.org/web/20060910004053/http:/www.research.ibm.com/AVSTG/
prov:wasDerivedFrom	http://en.wikipedia.org/wiki/Audio-visual_speech_recognition?oldid=1017609551&ns=0
Wikipage page ID	6990718 (xsd:integer)
page length (characters) of wiki page	1387 (xsd:nonNegativeInteger)
Wikipage revision ID	1017609551 (xsd:integer)
Link from a Wikipage to another Wikipage	Applications of computer vision Computational linguistics Speech recognition Multimodal interaction Digital image processing Lip reading Phone (phonetics) Speech recognition dbr:Feature_fusion
has abstract	Audio-visual speech recognition (AVSR) is a technique that uses image processing for lip reading to help speech recognition systems recognize non-deterministic phones or to break ties among decisions with nearly equal probability. The lip reading and speech recognition systems each work separately, and their results are then combined at the feature fusion stage. As the name suggests, AVSR has two parts: an audio part and a visual part. In the audio part, features such as the log-mel spectrogram or MFCCs are extracted from the raw audio samples, and a model turns them into a feature vector. In the visual part, some variant of a convolutional neural network is generally used to compress the image into a feature vector. The two vectors (audio and visual) are then concatenated and used to predict the target. (en)
foaf:isPrimaryTopicOf	http://en.wikipedia.org/wiki/Audio-visual_speech_recognition
is Wikipage redirect of	Visual speech recognition Avsr Audio visual speech recognition Audiovisual speech recognition AVSR
is Link from a Wikipage to another Wikipage of	Automated Lip Reading LipNet Avsr Audio visual speech recognition Audiovisual speech recognition AVSR
is foaf:primaryTopic of	http://en.wikipedia.org/wiki/Audio-visual_speech_recognition
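
The pipeline described in the abstract (an audio feature vector, a visual feature vector, concatenation, then prediction) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the method of any particular system: per-frame log energy stands in for log-mel or MFCC features, average pooling stands in for a convolutional encoder, and the linear classifier uses untrained placeholder weights. All function names and input values are hypothetical.

```python
import math


def log_energy_features(samples, frame_size):
    """Stand-in for log-mel/MFCC extraction: log energy of each
    non-overlapping frame of the waveform."""
    features = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energy = sum(x * x for x in frame) / frame_size
        features.append(math.log(energy + 1e-10))
    return features


def pooled_image_features(image, block):
    """Stand-in for a CNN encoder: average pooling over block x block tiles
    of a grayscale mouth crop given as a list of rows."""
    rows, cols = len(image), len(image[0])
    features = []
    for r in range(0, rows - block + 1, block):
        for c in range(0, cols - block + 1, block):
            tile = [image[r + i][c + j] for i in range(block) for j in range(block)]
            features.append(sum(tile) / len(tile))
    return features


def predict(fused, weights, biases):
    """Linear layer followed by softmax over the fused feature vector."""
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, biases)]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy inputs: a short sine wave and an 8x8 gradient image.
audio = [math.sin(0.1 * n) for n in range(64)]
mouth = [[(r + c) / 14.0 for c in range(8)] for r in range(8)]

audio_vec = log_energy_features(audio, frame_size=16)   # 4 values
visual_vec = pooled_image_features(mouth, block=4)      # 4 values
fused_vec = audio_vec + visual_vec                      # concatenation: 8 values

# Untrained placeholder parameters for a 3-class classifier.
weights = [[0.1 * (k + 1)] * len(fused_vec) for k in range(3)]
biases = [0.0, 0.0, 0.0]
probs = predict(fused_vec, weights, biases)
```

In a real system the two encoders and the classifier would be trained jointly or separately on labelled audio-visual data; here they only show where the concatenation step sits between feature extraction and prediction.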
