Video Retrieval

Initially released in April 2014, the video retrieval component extends the multimedia capabilities of the webLyzard platform, building upon existing content acquisition services to gather and analyze user-generated content from YouTube and other social media channels. The new component (i) collects and processes the transcripts of YouTube videos, (ii) makes them available to the platform’s analytic tools, and (iii) provides playback functionality not only for entire videos, but also for individual fragments at the more granular sentence level.

Video Playback (Source Tab: “Social Media”). Video is becoming an increasingly important type of social media content. Users can play full-length videos directly within the portal, either by clicking on the icons that appear on mouseover in the list of search results or by using the “Play” button in the full text view.

Video Fragment Playback (Source Tab: “Video”). A separate source tab lets users search for and play specific sentences within video transcripts. If such video fragments are available for a given search term, the system automatically adds a video column with a play button for each listed fragment.
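Mapping a transcript sentence to a playable fragment only requires the sentence’s start and end offsets. A minimal sketch, assuming offsets are stored in seconds with each fragment and using YouTube’s documented start/end embed-player parameters (the video ID is a hypothetical placeholder):

    import math

    def fragment_embed_url(video_id: str, start: float, end: float) -> str:
        """Build a YouTube embed URL that plays a single transcript fragment.

        The embed player's start/end parameters take whole seconds, so the
        interval is rounded outward to avoid clipping the spoken sentence.
        """
        return (f"https://www.youtube.com/embed/{video_id}"
                f"?start={math.floor(start)}&end={math.ceil(end)}&autoplay=1")

    # A fragment covering seconds 72.4-78.9 of a hypothetical video:
    print(fragment_embed_url("VIDEO_ID", 72.4, 78.9))
    # -> https://www.youtube.com/embed/VIDEO_ID?start=72&end=79&autoplay=1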

Data Collection and Transcription. While YouTube allows video owners to provide their own transcript files for closed captioning, it has also offered automatically generated transcripts for some time. This transcription technology has improved steadily over the years and now identifies key terms in speech with high accuracy. webLyzard processes the URLs of YouTube videos to gather the transcript of each video, identify temporal fragments, and annotate the named entities that occur in each fragment.
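The collection step can be approximated with the community-maintained youtube-transcript-api package; this is an illustrative assumption, as webLyzard’s own acquisition service is not public. The sketch fetches a video’s (automatic) transcript, whose caption snippets carry text, start and duration fields, and derives explicit temporal fragments:

    from youtube_transcript_api import YouTubeTranscriptApi

    def fetch_fragments(video_id: str) -> list[dict]:
        """Return the video's caption snippets as timed fragments.

        Each fragment gets an explicit end offset so that downstream
        components can address it as a temporal interval.
        """
        snippets = YouTubeTranscriptApi.get_transcript(video_id)
        return [
            {"text": s["text"],
             "start": s["start"],
             "end": s["start"] + s["duration"]}
            for s in snippets
        ]

Sentence-level fragments would additionally require merging or splitting these snippets at sentence boundaries, e.g. with a sentence tokenizer.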

Metadata Generation. webLyzard collaborates with Media Mixer, a support action funded by the European Commission’s 7th Framework Programme, to combine these transcripts with innovative multimedia solutions. The collaboration with the Media Mixer consortium has enabled us to:

  • split videos into temporal fragments, generally corresponding to the sentence level in speech;
  • annotate text fragments, using Linked Data to provide a unique identification for each concept, thereby resolving ambiguities in natural language and connecting annotations to additional metadata about identified entities;
  • create machine-processable video annotations that connect temporal fragments to the annotated entities, enabling advanced semantic search capabilities for video material at the fragment level (see the sketch below).
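What such a machine-processable annotation might look like can be sketched with rdflib, pairing a W3C Media Fragments URI (the #t=start,end suffix addresses a temporal interval) as the annotation target with a DBpedia resource as its Linked Data body. The Open Annotation vocabulary and all URIs below are illustrative assumptions, not webLyzard’s actual schema:

    from rdflib import Graph, Namespace, URIRef
    from rdflib.namespace import RDF

    OA = Namespace("http://www.w3.org/ns/oa#")

    g = Graph()
    g.bind("oa", OA)

    # Seconds 72-79 of a hypothetical video, addressed via a Media
    # Fragments URI; the DBpedia resource uniquely identifies the
    # concept mentioned in that interval, resolving its ambiguity.
    fragment = URIRef("https://www.youtube.com/watch?v=VIDEO_ID#t=72,79")
    entity = URIRef("http://dbpedia.org/resource/Climate_change")
    annotation = URIRef("http://example.org/annotation/1")

    g.add((annotation, RDF.type, OA.Annotation))
    g.add((annotation, OA.hasTarget, fragment))
    g.add((annotation, OA.hasBody, entity))

    print(g.serialize(format="turtle"))

Because both the fragment and the entity are URIs, a triple store can answer queries such as “all fragments mentioning a given entity,” which is what enables semantic search at the fragment level.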

References

Nixon, L., Bauer, M. and Scharl, A. (2014). Enhancing Web Intelligence with the Content of Online Video Fragments. International Semantic Web Conference (ISWC-2014), Proceedings of the Posters and Demonstrations Track. Riva del Garda, Italy. CEUR Workshop Proceedings, Vol. 1272: 109-112.

Available as of release 2013-03.1 (Desert Iguana)