Methodology

Step 1

Identify and monitor novel online content for a certain topic

Based on a simple user-defined search query that defines the topic of interest (for example “industry 4.0”), the Risk Radar searches its field of sources for relevant content. The field of sources covers the largest online blogging platforms, namely WordPress and Blogspot. The blogs provide expert opinions on issues the blogging experts deem relevant, and harvesting such opinions provides early signals of potentially upcoming issues and risks. Expert blogs are often more insightful than journalistic reports and usually appear earlier; they certainly appear much earlier than scientific journal articles. Every month, the Risk Radar retrieves and stores relevant blog content based on the user’s query, which can be narrowed or broadened using Boolean combinations of search terms (for instance “industry 4.0 AND energy”). The Risk Radar distinguishes novel content from content it has already retrieved, so the user does not need to go through “hits” that were already acknowledged earlier.
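
The monthly retrieval and de-duplication described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Risk Radar's actual implementation: the `BlogPost` type and the functions `parse_query`, `matches` and `monthly_update` are hypothetical, a post's URL is assumed to identify it uniquely, and query phrases are matched as case-insensitive substrings. The query grammar is deliberately tiny (upper-case `AND` binds tighter than `OR`; parentheses and `NOT` are not supported).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlogPost:
    url: str   # assumption: the URL uniquely identifies a post
    text: str


def parse_query(query: str) -> list[list[str]]:
    """Parse 'a AND b OR c' into OR-of-AND clauses: [['a', 'b'], ['c']].

    Operators must be upper case; AND binds tighter than OR.
    """
    return [
        [phrase.strip().lower() for phrase in clause.split(" AND ")]
        for clause in query.split(" OR ")
    ]


def matches(post: BlogPost, clauses: list[list[str]]) -> bool:
    """True if every phrase of at least one AND-clause occurs in the post."""
    text = post.text.lower()
    return any(all(phrase in text for phrase in clause) for clause in clauses)


def monthly_update(posts: list[BlogPost], query: str, seen_urls: set[str]) -> list[BlogPost]:
    """Return matching posts not retrieved before, and mark them as seen."""
    clauses = parse_query(query)
    novel = []
    for post in posts:
        if post.url not in seen_urls and matches(post, clauses):
            novel.append(post)
            seen_urls.add(post.url)  # later months will skip this post
    return novel


if __name__ == "__main__":
    seen: set[str] = set()
    march = [BlogPost("a", "Industry 4.0 and energy efficiency"),
             BlogPost("b", "Industry 4.0 robots")]
    print([p.url for p in monthly_update(march, "industry 4.0 AND energy", seen)])  # ['a']
    april = march + [BlogPost("c", "Energy use in Industry 4.0 plants")]
    print([p.url for p in monthly_update(april, "industry 4.0 AND energy", seen)])  # ['c']
```

Keeping the set of already-seen identifiers outside the function makes the monthly state explicit; a persistent store would take its place in a real deployment.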

Step 2

Identify potential high-impact trends - evaluating the retrieved online content with an unsupervised, quantitative big data analysis to identify the texts and documents that have the highest potential impact on the insurance industry

Methodologically, the first step is to construct a so-called term-document matrix for the retrieved documents. This matrix can be seen as a network with two sets of nodes: the nodes in one set correspond to documents and the nodes in the other set correspond to terms. A link in this network always goes from a document to a term, and the presence (absence) of a link indicates that the document contains (does not contain) the term. Documents are ranked high in their potential for future impact if they contain a large number of terms (have many links) or contain terms that are relevant for a large number of other documents. To classify reports according to their novelty, we can compare the bag of words of a given report with the known bags of words of prior reports. A further consideration is that the aim is to retrieve the documents with the largest potential impact on the insurance industry.
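
A minimal sketch of the term-document network and a simple ranking, assuming regex tokenization and the hypothetical helper names below. The scoring rule is an assumption, since no exact formula is given here: each document's score is the sum of the document frequencies of its terms, which equals its number of links plus, for each of its terms, the number of other documents that share that term. Novelty is illustrated with Jaccard similarity between word sets, which is one possible way to compare bags of words, not necessarily the one used by the Risk Radar.

```python
import re
from collections import Counter


def terms(text: str) -> set[str]:
    """Lower-cased set of word tokens in a document (its links in the network)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def term_document_matrix(docs: dict[str, str]) -> dict[str, set[str]]:
    """Bipartite network stored as adjacency sets: document id -> its terms."""
    return {doc_id: terms(text) for doc_id, text in docs.items()}


def impact_scores(tdm: dict[str, set[str]]) -> dict[str, int]:
    """Score = sum of document frequencies of the document's terms.

    sum_t df(t) = (number of links) + sum_t (df(t) - 1), so both many terms
    and terms shared with many other documents raise the score.
    """
    df = Counter(t for term_set in tdm.values() for t in term_set)
    return {doc_id: sum(df[t] for t in term_set) for doc_id, term_set in tdm.items()}


def novelty(term_set: set[str], prior_term_sets: list[set[str]]) -> float:
    """1 - highest Jaccard similarity to any prior report (1.0 if none)."""
    best = 0.0
    for prior in prior_term_sets:
        union = term_set | prior
        if union:
            best = max(best, len(term_set & prior) / len(union))
    return 1.0 - best


if __name__ == "__main__":
    docs = {
        "d1": "sensor data risk in smart factories",
        "d2": "smart factories and cyber risk",
        "d3": "holiday recipes",
    }
    tdm = term_document_matrix(docs)
    scores = impact_scores(tdm)
    print(sorted(scores, key=scores.get, reverse=True))  # ['d1', 'd2', 'd3']
    print(novelty(tdm["d2"], [tdm["d1"]]))               # 0.625
```

In practice, stop words such as "in" and "and" would likely be removed before building the matrix, and an iterative link-analysis score (for example HITS-style hub/authority updates on the bipartite network) is one alternative to the plain sum; both are design choices rather than part of the described method.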

Step 3

Visualize recommendations - providing recommendations for future risk notions by identifying those topics within the high-impact documents that have the highest novelty value

The third module of the Risk Radar will provide recommendations for future risk notions based on the documents identified in the second module. To this end, the topics of the potential high-impact documents will be extracted and ranked according to their novelty values. This can be done by comparing the frequency of a given term in the current month with its frequency in past months: a strong increase between these two frequencies indicates substantial novelty value within the high-impact documents. Equally, a term could be assigned a high novelty value if, rather than its frequency, the words that co-occur with it in the texts change. To extract topics from these terms, we can again use co-occurrences of terms, in the sense that a topic can be described by a “bag of words” that are often used in texts about that topic. The output of the Risk Radar’s recommendation system consists of topics that both (i) have potential for high future impact, in particular for the insurance industry, and (ii) are new issues in the sense that they have not been discussed in the given context before. These topics can then be visualized, for example, as tag clouds showing the terms that drive the results, together with links to the resources (documents, blog posts) from which the topics were extracted. These serve as recommendations for future risk notions on a particular topic.
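
The two term-level novelty signals described above can be sketched as follows, under stated assumptions: documents are pre-tokenized lists of terms, "frequency" is taken to mean the number of documents containing a term, past months are averaged, and add-one smoothing (an assumption) keeps scores finite and comparable for terms that never appeared before. The co-occurrence signal is measured as one minus the Jaccard overlap of the words co-occurring with a term now versus before. All function names are hypothetical.

```python
from collections import Counter


def term_frequencies(docs: list[list[str]]) -> Counter:
    """Number of documents in which each term occurs."""
    return Counter(t for doc in docs for t in set(doc))


def frequency_novelty(current_docs: list[list[str]],
                      past_months: list[list[list[str]]]) -> dict[str, float]:
    """Smoothed ratio of current-month frequency to mean past-month frequency."""
    current = term_frequencies(current_docs)
    past = Counter()
    for month_docs in past_months:
        past.update(term_frequencies(month_docs))
    n_past = max(len(past_months), 1)
    return {t: (c + 1) / (past[t] / n_past + 1) for t, c in current.items()}


def cooccurring(term: str, docs: list[list[str]]) -> set[str]:
    """Words that appear in the same document as `term`."""
    words: set[str] = set()
    for doc in docs:
        if term in doc:
            words.update(doc)
    words.discard(term)
    return words


def context_shift(term: str, current_docs: list[list[str]],
                  past_docs: list[list[str]]) -> float:
    """1 - Jaccard overlap of the term's co-occurring words, now vs. before."""
    now, before = cooccurring(term, current_docs), cooccurring(term, past_docs)
    union = now | before
    return 1.0 - len(now & before) / len(union) if union else 0.0


if __name__ == "__main__":
    past = [[["robot", "factory"]], [["robot", "factory", "safety"]]]
    now = [["robot", "cyber", "attack"], ["cyber", "attack", "insurance"]]
    print(frequency_novelty(now, past))
    # "robot" keeps a steady frequency but appears in an entirely new context:
    print(context_shift("robot", now, [d for month in past for d in month]))  # 1.0
```

Terms with a high score on either signal could then be grouped into topics via their co-occurrences and shown as a tag cloud; that grouping step is not sketched here.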
