Support for Video Retrieval

The video retrieval component extends the multimedia capabilities of the webLyzard platform, building upon existing content acquisition services to gather and analyze user-generated content from YouTube and other social media channels. The new component (i) collects and processes the transcripts of YouTube videos, (ii) makes them available to the platform’s analytic tools, and (iii) provides playback functionality not only for entire videos, but also for individual fragments at the more granular sentence level.

Video Playback

Video is an increasingly important content type in social media. Users can play full-length videos directly within the portal, either by clicking on the icons that appear on mouse-over in the list of search results, or by using the “Play” button in the full text view.

A separate data source allows searching for and displaying specific sentences within video transcripts. If such video fragments are available for a given search term, the system automatically adds a video column with a play button for each listed fragment.
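
To illustrate how such fragment-level playback can be realized, the following minimal sketch builds an embed URL using the start and end parameters of YouTube’s embedded player, which restrict playback to the requested time span. The video ID and fragment boundaries shown are placeholders, and this is only one possible approach, not necessarily the portal’s internal mechanism:

    import math

    def fragment_embed_url(video_id: str, start_sec: float, end_sec: float) -> str:
        """Embed URL that plays only the given temporal fragment of a video."""
        # The "start" and "end" parameters take whole seconds; round the end
        # up so the fragment is never cut short.
        return (f"https://www.youtube.com/embed/{video_id}"
                f"?start={int(start_sec)}&end={math.ceil(end_sec)}")

    # A sentence-level fragment from 42.3 s to 47.9 s ("VIDEO_ID" is a placeholder):
    print(fragment_embed_url("VIDEO_ID", 42.3, 47.9))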

Data Collection and Transcription

While YouTube allows video owners to provide their own transcript files for closed captioning, it has also offered automatically generated transcripts for some time. This automated transcription technology has improved steadily over the years and now identifies key terms in speech with high accuracy. webLyzard processes the URLs of YouTube videos to gather the transcript of each video, identify temporal fragments, and annotate the named entities that occur in each fragment.
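
The sketch below illustrates this acquisition step under simplifying assumptions; it is not webLyzard’s production pipeline. It assumes the open-source youtube-transcript-api package, whose classic get_transcript call returns caption snippets with text, start and duration fields, and groups the snippets into sentence-level temporal fragments with a simple punctuation heuristic:

    from youtube_transcript_api import YouTubeTranscriptApi

    def sentence_fragments(video_id: str) -> list:
        """Group caption snippets into fragments ending at sentence boundaries."""
        fragments, parts, frag_start = [], [], None
        for snippet in YouTubeTranscriptApi.get_transcript(video_id):
            if frag_start is None:
                frag_start = snippet["start"]
            parts.append(snippet["text"].replace("\n", " ").strip())
            # Heuristic boundary check; automatic transcripts may lack
            # punctuation, in which case a different segmentation is needed.
            if parts[-1].endswith((".", "!", "?")):
                fragments.append({
                    "text": " ".join(parts),
                    "start": frag_start,
                    "end": snippet["start"] + snippet["duration"],
                })
                parts, frag_start = [], None
        return fragments  # trailing, unterminated snippets are dropped here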

Metadata Generation

Initially developed together with MediaMixer, a support action funded by the European Commission’s 7th Framework Programme, and further extended as part of the InVID Horizon 2020 Innovation Action, the transcript search (Nixon et al., 2014) is deployed in combination with innovative multimedia solutions, enabling the platform to:

  • split videos into temporal fragments, generally corresponding to the sentence level in speech;
  • annotate text fragments, using Linked Data to provide a unique identifier for each concept, thereby resolving ambiguities in natural language and connecting annotations to additional metadata about the identified entities;
  • create machine-processable video annotations that connect temporal fragments to the annotated entities, enabling advanced semantic search over video material at the fragment level (see the sketch below).
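
One possible machine-processable encoding of such annotations, shown here purely as an illustration rather than the platform’s internal format, is a W3C Web Annotation in JSON-LD whose target addresses the temporal fragment via the Media Fragments URI syntax (t=start,end). The video ID and the DBpedia entity below are placeholders:

    import json

    def fragment_annotation(video_id: str, start: float, end: float,
                            entity_uri: str) -> dict:
        """Web Annotation connecting a temporal video fragment to an entity."""
        return {
            "@context": "http://www.w3.org/ns/anno.jsonld",
            "type": "Annotation",
            "body": entity_uri,  # Linked Data URI of the identified concept
            "target": {
                "source": f"https://www.youtube.com/watch?v={video_id}",
                "selector": {
                    "type": "FragmentSelector",
                    "conformsTo": "http://www.w3.org/TR/media-frags/",
                    "value": f"t={start:g},{end:g}",  # Media Fragments syntax
                },
            },
        }

    # Placeholder video ID and DBpedia entity:
    print(json.dumps(fragment_annotation(
        "VIDEO_ID", 42.3, 47.9,
        "http://dbpedia.org/resource/Climate_change"), indent=2))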

References

Nixon, L., Bauer, M. and Scharl, A. (2014). Enhancing Web Intelligence with the Content of Online Video Fragments. Proceedings of the Posters and Demonstrations Track, International Semantic Web Conference (ISWC-2014). Riva del Garda, Italy. CEUR Vol. 1272: 109-112.

Last Update as of release 2018-12 (Queensland Lizard)