Cluster Map

Cluster visualizations are an intuitive way to group search results by topic. Their structured representation arranges documents by semantic similarity, using different clustering methods [1] in combination with a force-directed layout algorithm [2]. Each document is assigned to a specific cluster, which acts as a local gravity point. The largest node of each cluster rests at its center and attracts the other nodes that belong to this cluster.
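
The interplay of cluster gravity points and a force-directed layout can be sketched as follows. This is a minimal illustration and not the algorithm described in [2]: the field names ('id', 'cluster', 'size'), the force constants and the capped step size are all assumptions chosen for readability.

```python
import math
import random

def cluster_layout(nodes, steps=300, attraction=0.1, repulsion=0.01,
                   max_step=0.1, seed=0):
    """Return {node_id: (x, y)} for nodes given as dicts with the
    hypothetical keys 'id', 'cluster' and 'size'."""
    rng = random.Random(seed)
    pos = {n["id"]: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}

    # Assumption: the largest node of each cluster is its gravity point.
    anchor = {}
    for n in nodes:
        best = anchor.get(n["cluster"])
        if best is None or n["size"] > best["size"]:
            anchor[n["cluster"]] = n

    for _ in range(steps):
        force = {n["id"]: [0.0, 0.0] for n in nodes}
        # Every pair of nodes repels, which spreads the nodes apart.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a["id"]][0] - pos[b["id"]][0]
                dy = pos[a["id"]][1] - pos[b["id"]][1]
                d2 = dx * dx + dy * dy + 1e-9
                fx, fy = repulsion * dx / d2, repulsion * dy / d2
                force[a["id"]][0] += fx
                force[a["id"]][1] += fy
                force[b["id"]][0] -= fx
                force[b["id"]][1] -= fy
        # Each node is pulled toward the anchor of its own cluster.
        for n in nodes:
            ax, ay = pos[anchor[n["cluster"]]["id"]]
            force[n["id"]][0] += attraction * (ax - pos[n["id"]][0])
            force[n["id"]][1] += attraction * (ay - pos[n["id"]][1])
        # Cap the displacement per step so the simulation stays stable.
        for n in nodes:
            fx, fy = force[n["id"]]
            length = math.hypot(fx, fy)
            scale = min(1.0, max_step / length) if length > 0 else 0.0
            pos[n["id"]][0] += fx * scale
            pos[n["id"]][1] += fy * scale
    return {node_id: (x, y) for node_id, (x, y) in pos.items()}
```

In this sketch the anchor node feels no attraction of its own, so it stays near the middle of its cluster while the remaining nodes settle around it.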

Layout. The Cluster Map highlights each group of similar documents with a convex hull that visually holds its nodes together. The size of this orange shape is dynamic and depends on the number of nodes it contains. Each node represents one of the documents returned by the search function and is drawn as a circle of variable size and color:

  • Node size is proportional to the reach of the document’s media source (a CNN.com article, for example, is rendered larger than a report published on a local community site).
  • Node color reflects normalized document sentiment, ranging from red (negative) to grey (neutral) and green (positive). Sentiment is shown with variable saturation, depending on the degree of polarity – vivid colors indicate emotional articles, while lower saturation indicates more factual coverage.
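
The two visual encodings above could be computed along the following lines. This is a sketch under stated assumptions: sentiment is taken to be normalized to the range -1 to 1, reach is scaled linearly between a minimum and a maximum radius, and the function names and pixel values are hypothetical.

```python
import colorsys

def node_radius(reach, max_reach, min_radius=3.0, max_radius=20.0):
    """Scale the radius linearly with the reach of the media source."""
    share = max(0.0, min(1.0, reach / max_reach))
    return min_radius + (max_radius - min_radius) * share

def sentiment_color(sentiment):
    """Map a sentiment in [-1, 1] to an (r, g, b) tuple of 0..255 values."""
    s = max(-1.0, min(1.0, sentiment))
    # Red hue for negative values, green hue for positive ones.
    hue = 0.0 if s < 0 else 1.0 / 3.0
    # Saturation grows with polarity; zero saturation yields grey.
    r, g, b = colorsys.hls_to_rgb(hue, 0.5, abs(s))
    return tuple(round(c * 255) for c in (r, g, b))
```

A strongly negative article then maps to pure red, a strongly positive one to pure green, and a neutral one to a grey with equal channel values.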

Three keywords per cluster serve as a label that describes its contents. These labels are extracted from the ordered list of all document keywords within the cluster, taking the reach of the documents’ sources into account. Nodes and cluster hulls are rendered with reduced opacity to decrease the visual load and increase the readability of the cluster labels.
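
One plausible way to derive such a label is to weight each keyword by the reach of the documents that contain it. The exact ranking used by the Cluster Map is not specified here, so the scoring rule and the field names ('keywords', 'reach') below are assumptions.

```python
from collections import Counter

def cluster_label(documents, size=3):
    """Return the top keywords of a cluster, weighted by source reach."""
    scores = Counter()
    for doc in documents:
        for keyword in doc["keywords"]:
            # Assumption: each occurrence counts with the reach of its source.
            scores[keyword] += doc["reach"]
    return [keyword for keyword, _ in scores.most_common(size)]

docs = [
    {"keywords": ["election", "senate", "vote"], "reach": 0.9},
    {"keywords": ["election", "budget"], "reach": 0.4},
    {"keywords": ["senate", "budget", "tax"], "reach": 0.2},
]
label = cluster_label(docs)
```

With this sample data, "vote" outranks "budget" even though it appears in fewer documents, because its only source has a much higher reach.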

Interactive Features

  • Hovering over a cluster hides its keywords and highlights its shape and nodes through higher opacity. Node colors within a highlighted cluster become more vivid.
  • Clicking on a cluster triggers a new search, narrowing down the set of results to documents within the selected cluster.
  • Hovering over a single node highlights this node with an orange stroke and shows a tooltip with document keywords and the favicon of the source.

Clustering. Users can choose between two methods. K-means divides the collection of documents into a fixed number of clusters, with each document belonging to the cluster with the nearest centroid. Agglomerative hierarchical clustering, in contrast, is deterministic (a given set of documents always results in the same layout) and uses an iterative “bottom-up” approach that merges pairs of clusters into a tree-like structure.
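
The difference between the two methods can be illustrated with a small sketch on two-dimensional points. It assumes random centroid initialization for K-means (the source of its non-determinism) and average linkage for the agglomerative variant. Neither choice is confirmed for the Cluster Map, and real documents would be compared by semantic similarity rather than by Euclidean distance.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=None):
    """Assign each point to the nearest of k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random start: results may vary
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members)
                                     for x in zip(*members))
    return labels

def agglomerative(points, k):
    """Merge the two closest clusters bottom-up until k clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average distance between all cross-cluster pairs of points.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]],
                                                 clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for index in members:
            labels[index] = label
    return labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
```

Calling agglomerative(points, 2) repeatedly always yields the same assignment, whereas kmeans(points, 2) depends on the initial centroids unless a seed is fixed.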

References

  1. Jain, A.K. (2010). Data Clustering: 50 Years Beyond K-means, Pattern Recognition Letters, 31(8): 651-666.
  2. Syed, K.A.A., Kröll, M., Sabol, V., Scharl, A. and Gindl, S. (2012). Incremental and Scalable Computation of Dynamic Topography Information Landscapes, Journal of Multimedia Processing and Technologies, 3(1): 49-65.