Registered users can use the topic management sidebar to create and edit topics, combine multiple topics into complex queries, or set customized email alerts. On mouse-over, a gear icon that activates the floating menu replaces the frequency count in the right column.
Defining and Accessing Topics
Users have access to the full set of advanced search options when defining or revising a topic. This includes the ability to filter by metadata elements, and to use logical operators (AND, OR, AND NOT) to combine various filters and restrictions. Even complex queries to search in the full text, title or URL of documents can be stored, revised and later accessed at the click of a button.
It is possible to exclude certain aspects of recent coverage, for example by restricting queries to one or more countries or by selecting content from a specific set of Web sites. Clicking on a topic name triggers a search for the stored query, while clicking on the small rectangular marker activates (or deactivates) the topic in the trend chart. All matching documents are included in the list of search results and are used for computing various charts and metrics. The topic label (i.e., the name displayed in the topic management section) itself is not considered in the matching process.
Dealing with Ambiguous Topics
For ad-hoc queries, simple text fields typically suffice. Defining and disambiguating topics, however, often requires a larger number of terms. To properly describe abstract concepts like “climate change” or popular but ambiguous brand names such as “Amazon”, “Apple”, “Gap” and “Three”, one needs to consider synonyms, singular and plural versions of a term, grammatical variations, lists of related products and services, etc.
The built-in editor allows users to define and manage such lists. It expects one word or phrase per line. There is no need to use quotation marks to mark phrases such as big data. The column on the right shows the number of documents matching the query defined by this particular line – considering the currently selected content source(s) and time interval. The lines can be sorted alphabetically, or by the number of matching documents.
Each line can contain (i) a single word, (ii) a phrase, or (iii) a regular expression (RegExp) that supports optional wildcards for defining queries more effectively. The Topic Editor uses the following simplified RegExp notation:
- Question marks instruct the system to treat the preceding token as optional; ‘networks?’, for example, considers both the singular (‘network’) as well as the plural (‘networks’) of the term.
- Brackets support the grouping of tokens. While ‘networks?’ is identical to ‘network(s)?’, brackets are mandatory to mark more than one character as optional; e.g. ‘network(ed)?’ > ‘network’, ‘networked’.
- Vertical bars ‘|’ represent an “or” operator, so that a document is considered whenever at least one of the operands matches; e.g. ‘network(s|ed)?’ > ‘network’, ‘networks’, ‘networked’.
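The three rules above can be illustrated with a short Python sketch that expands a line into the list of phrases it matches. This is not the Topic Editor's own implementation; the function name `expand` and its error handling are assumptions made for illustration, and only the three constructs described above (question marks, brackets and vertical bars) are handled.

```python
def expand(pattern: str) -> list[str]:
    """Expand a simplified RegExp line into every phrase it matches.

    Hypothetical helper: supports only '?', '(...)' and '|'.
    """
    pos = 0  # current read position in the pattern

    def parse_alternation() -> list[str]:
        # One or more sequences separated by '|'.
        nonlocal pos
        results = parse_sequence()
        while pos < len(pattern) and pattern[pos] == "|":
            pos += 1
            results += parse_sequence()
        return results

    def parse_sequence() -> list[str]:
        # Tokens read left to right until '|', ')' or the end of the line.
        nonlocal pos
        results = [""]
        while pos < len(pattern) and pattern[pos] not in "|)":
            if pattern[pos] == "(":
                pos += 1
                options = parse_alternation()
                if pos >= len(pattern) or pattern[pos] != ")":
                    raise ValueError("unbalanced bracket")
                pos += 1
            else:
                options = [pattern[pos]]
                pos += 1
            # A trailing '?' makes the preceding token optional.
            if pos < len(pattern) and pattern[pos] == "?":
                options = [""] + options
                pos += 1
            results = [prefix + option for prefix in results for option in options]
        return results

    expanded = parse_alternation()
    if pos != len(pattern):
        raise ValueError("unbalanced bracket")
    # Remove duplicates while keeping the first-seen order.
    return list(dict.fromkeys(expanded))


print(expand("networks?"))        # ['network', 'networks']
print(expand("network(ed)?"))     # ['network', 'networked']
print(expand("network(s|ed)?"))   # ['network', 'networks', 'networked']
```

The recursive structure mirrors the notation: brackets open a nested group, vertical bars collect alternatives within that group, and a question mark adds the empty string as an additional alternative for the token before it.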
On mouse-over, the editor provides an expand/shrink option to preview a list of all phrases matching the regular expression. The gear icon opens a tooltip to access the Topic Wizard, a visual editor for regular expressions. If a line is not in a valid RegExp format supported by the editor, the editor shows a brief help text and disables the wizard for this particular line.
Below the text input lines, users can (i) activate on-the-fly negation detection with standard prefix sets for different parts of speech, and (ii) specify the minimum number of RegExp lines that a document must match to be included in the search results. Raising this minimum can improve the precision of the query at the cost of lower recall, especially for terms that are ambiguous without additional context information.
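The effect of the minimum-match setting can be sketched in a few lines of Python. The sketch assumes that each line is matched as a whole word or phrase and case-insensitively; both are assumptions for illustration, as are the function names `count_matching_lines` and `passes_threshold`. It relies on the fact that the simplified notation (question marks, brackets, vertical bars) is also valid syntax for Python's `re` module, and it does not cover negation detection.

```python
import re


def count_matching_lines(lines: list[str], text: str) -> int:
    """Count how many topic lines match the text at least once."""
    count = 0
    for line in lines:
        # Word boundaries keep 'mac' from matching inside 'machine'.
        if re.search(rf"\b(?:{line})\b", text, flags=re.IGNORECASE):
            count += 1
    return count


def passes_threshold(lines: list[str], text: str, min_matches: int) -> bool:
    """Return True if the text matches at least min_matches lines."""
    return count_matching_lines(lines, text) >= min_matches


# Hypothetical topic definition for the ambiguous brand name "Apple".
topic_lines = ["apple", "iphones?", "mac(book)?"]

brand_text = "Apple unveiled two new iPhones today."
fruit_text = "An apple a day keeps the doctor away."

print(passes_threshold(topic_lines, brand_text, 2))  # True
print(passes_threshold(topic_lines, fruit_text, 2))  # False
```

With a minimum of one matching line, both example sentences would be included. Requiring two lines drops the sentence about fruit, which shows the precision gain, but it would also drop a relevant document that mentions only the brand name, which is the recall cost.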
Both the regular expression list and the expanded term list can be exported as a comma-separated values (CSV) or plain text file. To import existing term lists, users can copy/paste text into the editor, which automatically creates the required number of lines.