Story detection identifies and describes groups of related documents, called stories, in digital content streams. webLyzard extracts a rich set of metadata for each story identified. This includes the origin of the story in terms of publication time and author, its impact on the public debate, the temporal distribution of related publications and the best keywords to summarize the story’s content.
The following figure summarizes the results of a query on “Tesla” as an interactive Story Graph. In the screenshot of the full dashboard at the end of this article, it is shown together with a list of top stories. Each story includes a headline with the keywords and size of the cluster, a characteristic lead article, and a list of related documents. A short video tutorial shows the integration of the Story Graph and other visual tools of the InVID project into webLyzard’s visual analytics dashboard.
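To make the story metadata described above more concrete, the following Python sketch derives a story's origin, size and temporal distribution from its documents. The `Document` fields and the `summarize_story` helper are hypothetical names chosen for illustration and do not reflect webLyzard's actual data model. The impact measure is left out because the article does not define how it is computed.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Document:
    # Hypothetical minimal schema; a real platform stores far more fields.
    title: str
    author: str
    published: datetime


def summarize_story(documents):
    """Derive origin, size and per-day document counts for one story."""
    if not documents:
        raise ValueError("a story needs at least one document")
    # Assumption: the origin of a story is its earliest published document.
    first = min(documents, key=lambda d: d.published)
    # Temporal distribution: number of documents per calendar day.
    per_day = Counter(d.published.date().isoformat() for d in documents)
    return {
        "origin_time": first.published,
        "origin_author": first.author,
        "size": len(documents),
        "documents_per_day": dict(sorted(per_day.items())),
    }


docs = [
    Document("Follow-up report", "bob", datetime(2020, 6, 2, 9, 0)),
    Document("Initial report", "alice", datetime(2020, 6, 1, 8, 0)),
    Document("Commentary", "carol", datetime(2020, 6, 2, 12, 0)),
]
summary = summarize_story(docs)
```

Treating the earliest document as the origin is the simplest possible choice. A production system might also take syndication or citation links into account.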
Clustering Digital Content Streams
The story detection component builds on highly scalable methods to cluster documents in real time. These methods work across multiple content sources and languages (English, French, German and Spanish). They are also robust to noisy data such as user postings from social media platforms or the results of speech-to-text conversion. webLyzard’s clustering produces high-quality results even when applied to documents of very different structure and length. Three keywords per cluster serve as a label to describe its contents.
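The article does not disclose the clustering algorithm itself. As a rough illustration of the general idea, the Python sketch below uses a simple single-pass approach: each incoming document joins the most similar existing cluster (Jaccard similarity on token sets) or starts a new one, and every cluster is labeled with its three most frequent terms. The stopword list, the `threshold` value and all function names are assumptions made for this example, not webLyzard's implementation.

```python
import re
from collections import Counter

# Tiny illustrative English stopword list (assumption, not a real resource).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "on", "for", "is", "its"}


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


class Cluster:
    def __init__(self):
        self.documents = []
        self.term_counts = Counter()

    def add(self, text, tokens):
        self.documents.append(text)
        self.term_counts.update(tokens)

    def label(self, n=3):
        # Most frequent terms first; ties are broken alphabetically.
        ranked = sorted(self.term_counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return [term for term, _ in ranked[:n]]


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_stream(texts, threshold=0.2):
    """Assign each text to the best-matching cluster, or open a new cluster."""
    clusters = []
    for text in texts:
        tokens = tokenize(text)
        token_set = set(tokens)
        best, best_score = None, 0.0
        for cluster in clusters:
            score = jaccard(token_set, set(cluster.term_counts))
            if score > best_score:
                best, best_score = cluster, score
        if best is None or best_score < threshold:
            best = Cluster()
            clusters.append(best)
        best.add(text, tokens)
    return clusters


stream = [
    "Tesla unveils new battery",
    "Tesla battery production expands",
    "Football final ends in draw",
    "New Tesla battery plant",
]
clusters = cluster_stream(stream)
labels = [c.label() for c in clusters]
```

A single-pass scheme like this processes each document once, which is what makes it suitable for streams. The price is sensitivity to document order and to the chosen threshold.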
Story Detection – Graph Representation
- Tooltips. Hovering over an individual story shows its duration, the number of documents that belong to it, and the associated keywords. A synchronization mechanism automatically highlights the corresponding story in the Story View as well. Users can use the tooltip to focus on this particular story or exclude its content from the query.
- Settings. The settings icon in the upper right corner provides various graph rendering options. These options include labels, the underlying metric (document count vs. weight) and methods to stack (silhouette, expand, zero, wiggle) and sort (default, inside-out, reverse) the stories.
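The stacking options listed above share their names with the stack offsets known from D3's stack layout. Whether the dashboard uses D3 internally is an assumption. The Python sketch below illustrates how three of these offsets change the baseline of a stacked graph: `zero` starts every column at 0, `expand` normalizes each column to the range 0 to 1, and `silhouette` centers each column around 0. The `wiggle` offset, which minimizes changes in slope across layers, is omitted for brevity. The function name `stack` and the input format are hypothetical.

```python
def stack(series, offset="zero"):
    """Compute (lower, upper) bounds per layer and time step.

    series: list of layers (stories), each a list of non-negative values
            such as document counts, all of the same length.
    """
    if not series:
        return []
    steps = len(series[0])
    if any(len(layer) != steps for layer in series):
        raise ValueError("all layers must have the same number of time steps")

    result = [[] for _ in series]
    for t in range(steps):
        column = [layer[t] for layer in series]
        total = sum(column)
        if offset == "zero":
            baseline = 0.0
        elif offset == "expand":
            # Normalize so that each column fills the range [0, 1].
            column = [v / total if total else 0.0 for v in column]
            baseline = 0.0
        elif offset == "silhouette":
            # Shift the baseline down so the column is centered on 0.
            baseline = -total / 2
        else:
            raise ValueError(f"unsupported offset: {offset}")

        lower = baseline
        for i, value in enumerate(column):
            result[i].append((lower, lower + value))
            lower += value
    return result


counts = [[1, 2], [3, 2]]  # two stories, two time steps
zero = stack(counts, "zero")
expand = stack(counts, "expand")
silhouette = stack(counts, "silhouette")
```

With `expand`, the graph shows each story's share of the total rather than absolute counts, which helps when overall volume varies strongly over time.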
Story Detection – Table Representation
- Clicking on the title or snippet of an article activates its full-text view.
- Clicking on the name of the source opens a separate window with the original article or posting.
- Clicking on the number of articles in the grey headline triggers a search for those articles. The down arrow expands the list of articles shown.
Last major update with release 2020-06 (Sagebrush Lizard).