Cluster Map

The Cluster Map groups search results by topic. It arranges documents by their semantic similarity, combining a clustering method [1] with a force-directed layout algorithm [2]. The system assigns each document to a specific cluster, which acts as a local gravity point: the largest node of the cluster rests at its center and attracts the other nodes that belong to the cluster.
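The gravity-point idea can be illustrated with a minimal sketch. It covers only the attraction toward the largest node of each cluster, not the repulsion or collision handling that a complete force-directed layout also needs. The function name `attract_to_anchor`, its parameters, and the `strength` factor are hypothetical and are not taken from the system described here.

```python
def attract_to_anchor(positions, sizes, cluster_of, strength=0.1):
    """Move every node one step toward the largest node of its cluster.

    positions:  {node: (x, y)}
    sizes:      {node: size}, the largest node per cluster is the anchor
    cluster_of: {node: cluster id}
    strength:   fraction of the remaining distance covered per step (assumed)
    """
    # Pick the largest node of each cluster as its gravity point.
    anchors = {}
    for node, cluster in cluster_of.items():
        if cluster not in anchors or sizes[node] > sizes[anchors[cluster]]:
            anchors[cluster] = node

    # Pull each node toward its anchor; the anchor itself does not move.
    new_positions = {}
    for node, (x, y) in positions.items():
        ax, ay = positions[anchors[cluster_of[node]]]
        new_positions[node] = (x + strength * (ax - x), y + strength * (ay - y))
    return new_positions


positions = {"a": (0.0, 0.0), "b": (10.0, 0.0), "c": (0.0, 10.0)}
sizes = {"a": 5, "b": 1, "c": 2}
cluster_of = {"a": 0, "b": 0, "c": 0}
step = attract_to_anchor(positions, sizes, cluster_of, strength=0.5)
```

Repeating this step moves the smaller nodes ever closer to the anchor, so a real layout would balance it against a repulsive force to keep nodes from overlapping.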

Layout. The Cluster Map highlights each group of similar documents with a convex hull that visually holds the group's nodes together. The size of the hull is dynamic and depends on the number of nodes it contains. Each node represents one of the documents returned by the search function and varies in size and color:

  • Node size is proportional to the reach of the document’s media source (a CNN.com article, for example, is rendered larger than a report published on a local community site).
  • Node color reflects normalized document sentiment, ranging from red (negative) through grey (neutral) to green (positive). The saturation depends on the degree of polarity: vivid colors indicate emotional articles, lower saturation indicates more factual coverage.
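The two visual encodings above can be sketched as follows. This is an illustration under stated assumptions, not the system's actual rendering code: sentiment is assumed to be normalized to [-1, 1], the RGB anchor colors are arbitrary choices, and the minimum radius (which keeps low-reach sources visible) is an addition that makes the scaling linear rather than strictly proportional. The names `node_color` and `node_radius` are hypothetical.

```python
GREY = (153, 153, 153)   # neutral sentiment (assumed anchor color)
RED = (204, 0, 0)        # fully negative (assumed anchor color)
GREEN = (0, 153, 0)      # fully positive (assumed anchor color)


def node_color(sentiment: float) -> str:
    """Blend from grey toward red or green; stronger polarity is more vivid."""
    s = max(-1.0, min(1.0, sentiment))       # clamp to the assumed range
    target = RED if s < 0 else GREEN
    weight = abs(s)                          # degree of polarity
    channels = [round(g + (t - g) * weight) for g, t in zip(GREY, target)]
    return "#{:02x}{:02x}{:02x}".format(*channels)


def node_radius(reach: float, max_reach: float,
                min_radius: float = 4.0, max_radius: float = 20.0) -> float:
    """Scale the radius linearly with the reach of the document's source."""
    if max_reach <= 0:
        return min_radius
    share = max(0.0, min(1.0, reach / max_reach))
    return min_radius + (max_radius - min_radius) * share
```

Blending toward grey lowers the saturation, so a mildly positive article appears as a muted green while a strongly positive one appears in full color.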

Three keyword labels per cluster describe its contents. The system renders nodes and cluster hulls with reduced opacity to decrease the visual load and increase the labels’ readability. The labels are computed from the keywords of the documents within the cluster and take the reach of the documents’ sources into account.
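One plausible reading of this label computation is a reach-weighted keyword ranking, sketched below. The exact scoring used by the system is not specified here, so summing the reach per keyword is an assumption, as are the function name `cluster_labels` and the `keywords` and `reach` fields.

```python
from collections import defaultdict


def cluster_labels(documents, top_n=3):
    """Return the top_n keywords of a cluster, weighted by source reach.

    documents: list of dicts with "keywords" (list of str) and "reach" (float).
    """
    scores = defaultdict(float)
    for doc in documents:
        for keyword in doc["keywords"]:
            # Assumed scoring: each occurrence counts with the reach of its source.
            scores[keyword] += doc["reach"]
    # Highest score first; ties are broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [keyword for keyword, _ in ranked[:top_n]]


docs = [
    {"keywords": ["election", "senate"], "reach": 0.9},
    {"keywords": ["election", "poll"], "reach": 0.4},
    {"keywords": ["poll", "turnout"], "reach": 0.2},
]
labels = cluster_labels(docs)
```

In this example a keyword from a single high-reach source ("senate") outranks one that appears in two low-reach sources ("poll").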

Interactive Features

  • Hovering over a cluster hides its keywords and highlights its shape and nodes through higher opacity. Node colors within a highlighted cluster become more vivid.
  • Clicking on a cluster triggers a new search, narrowing down the set of results to documents within the selected cluster.
  • Hovering over a single node highlights this node with an orange stroke and shows a tooltip with document keywords and the favicon of the source.

Clustering. Users can choose between two methods. K-means divides the collection of documents into a fixed number of clusters; each document belongs to the cluster with the nearest centroid. In contrast to K-means, agglomerative hierarchical clustering is deterministic (a given set of documents always results in the same layout) and uses an iterative “bottom-up” approach that merges pairs of clusters into a tree-like structure.
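The two methods can be contrasted with a small pure-Python sketch. It shows only the nearest-centroid assignment rule of K-means, not the full iterative algorithm, and a bottom-up merge loop that uses single linkage on Euclidean distance. The linkage criterion and distance measure are assumptions, since neither is specified here, and the function names are hypothetical.

```python
import math


def nearest_centroid(point, centroids):
    """K-means assignment rule: index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))


def agglomerative(points, n_clusters):
    """Bottom-up clustering: repeatedly merge the two closest clusters.

    Uses single linkage (distance between the closest members), an assumption.
    Returns the final clusters (lists of point indices) and the merge history,
    which corresponds to the tree-like structure.
    """
    clusters = [[i] for i in range(len(points))]   # start with one cluster per point
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merges.append((tuple(clusters[a]), tuple(clusters[b])))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges


points = [(0, 0), (0, 1), (5, 5), (5, 6)]
clusters, merges = agglomerative(points, n_clusters=2)
assigned = nearest_centroid((1, 1), [(0, 0), (5, 5)])
```

The merge loop involves no random choices, so the same input always yields the same clusters. K-means implementations, by comparison, commonly start from randomly chosen centroids, which is why repeated runs can produce different results.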

References

  1. Jain, A.K. (2010). Data Clustering: 50 Years Beyond K-means, Pattern Recognition Letters, 31(8): 651-666.
  2. Syed, K.A.A., Kröll, M., Sabol, V., Scharl, A. and Gindl, S. (2012). Incremental and Scalable Computation of Dynamic Topography Information Landscapes, Journal of Multimedia Processing and Technologies, 3(1): 49-65.