AI Engine for Text Research

AI engine for text understanding.

Our algorithms for text similarity, tabular data extraction, domain-specific entity representation learning, and entity disambiguation and linking measure up to the best in the world.

On top of that, we build a comprehensive knowledge graph containing all entities and their linkages, so that humans can learn from it, use it, and give feedback to the system. Applying these techniques to scientific and technical text is a challenge few others can meet.

Content-based search
Bypass keywords and explore across disciplines with a content-based recommendation engine.

Your starting point is giving the machine a text – either your own description of the problem you are trying to map out, or another research paper or document. The tool identifies the most meaning-bearing words in your text, enriches them with contextual synonyms and hypernyms, and turns all of this into a “fingerprint”, which is then matched against the fingerprint of every paper – federated across all sources you have selected.
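As a rough illustration of the idea, the sketch below uses TF-IDF vectors and cosine similarity as a stand-in for the engine's proprietary fingerprints; the corpus, the query, and the omission of the synonym/hypernym enrichment step are all simplifying assumptions.

```python
# Minimal sketch of content-based matching: TF-IDF vectors stand in for
# the engine's "fingerprints"; corpus and query are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "Graphene-based electrodes for lithium-ion batteries.",
    "Deep learning approaches to protein structure prediction.",
    "Solid-state electrolytes and dendrite suppression in Li batteries.",
]
query = "Improving lithium-ion battery electrodes with carbon nanomaterials."

vectorizer = TfidfVectorizer(stop_words="english")
doc_vectors = vectorizer.fit_transform(corpus)    # one "fingerprint" per paper
query_vector = vectorizer.transform([query])      # fingerprint of your own text

scores = cosine_similarity(query_vector, doc_vectors)[0]
for score, doc in sorted(zip(scores, corpus), reverse=True):
    print(f"{score:.2f}  {doc}")
```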

Context filtering

Filter down a document set based on criteria you can explain in a sentence but cannot capture in a single keyword.

You can write your own context descriptions of roughly 50–100 words, each of which is matched against every article in your content list. You can add as many contexts as you like and use each for either inclusion or exclusion. Think of a Venn diagram of contexts applied to your reading list, letting you rapidly filter it down.
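A hypothetical sketch of how this could be wired up is shown below, using a publicly available sentence-embedding model to score one inclusion and one exclusion context against each article; the model choice, thresholds, and example texts are assumptions rather than the product's actual pipeline.

```python
# Hypothetical context filtering: embed ~50-100 word context descriptions,
# score them against every article, and combine inclusion/exclusion like a
# Venn diagram. Model, thresholds, and texts are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

articles = {
    "A1": "Randomised trial of a wearable sensor for atrial fibrillation "
          "detection in 400 elderly patients in primary care.",
    "A2": "In-vitro study of antimicrobial polymer coatings for orthopaedic "
          "implant surfaces.",
}
include_ctx = ("Clinical studies in human patients evaluating cardiac "
               "monitoring devices in real-world or primary-care settings.")
exclude_ctx = ("Purely laboratory, in-vitro, or animal studies without any "
               "human participants.")

article_embs = model.encode(list(articles.values()), convert_to_tensor=True)
include_scores = util.cos_sim(model.encode(include_ctx, convert_to_tensor=True), article_embs)[0]
exclude_scores = util.cos_sim(model.encode(exclude_ctx, convert_to_tensor=True), article_embs)[0]

kept = [
    article_id
    for article_id, inc, exc in zip(articles, include_scores, exclude_scores)
    if inc > 0.4 and exc < 0.3   # illustrative thresholds, not product defaults
]
print(kept)
```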

Data filtering

Filter based on information extracted from documents like specific entities, data points or data ranges.

Whether you need to filter on a named entity, a specific data point, or a data range, the advanced filtering extracts and identifies the exact information from the documents, and you can then use the identified variables to filter down the list.
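The toy example below shows the filtering half of that workflow, assuming entities and numeric values have already been extracted into structured records; the field names and data are made up for illustration.

```python
# Filtering on an extracted named entity and a numeric range. The records
# below pretend the extraction step has already run; all values are made up.
from dataclasses import dataclass

@dataclass
class ExtractedRecord:
    doc_id: str
    material: str                 # named entity, e.g. an alloy or compound
    tensile_strength_mpa: float   # data point pulled from text or tables

records = [
    ExtractedRecord("D1", "Ti-6Al-4V", 950.0),
    ExtractedRecord("D2", "AISI 316L", 580.0),
    ExtractedRecord("D3", "Ti-6Al-4V", 1020.0),
]

# Keep only documents mentioning a given entity within a given value range.
hits = [
    r.doc_id
    for r in records
    if r.material == "Ti-6Al-4V" and 900.0 <= r.tensile_strength_mpa <= 1100.0
]
print(hits)   # ['D1', 'D3']
```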

Extracting and systematizing data

Automatically extract and systematize any key data points from text and tables into a table layout of your own design.

Manually extracting – and linking – data from PDFs containing free text, tables, graphs, figures, and a plethora of layouts requires major effort from highly skilled people. The Extract tool fetches and links all the key data from these documents into a tabular, machine-readable, systematic format. A full month of data extraction work can be done in minutes, at 90% accuracy.
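As a very reduced sketch of the idea, the snippet below pulls two key data points out of free text with regular expressions and drops them into a user-defined table layout; the patterns and column names are assumptions, and the real Extract tool also handles tables, figures, and entity linking that this toy version ignores.

```python
# Reduced sketch of "extract and systematize": pull key data points out of
# free text and place them into a user-defined table layout. Patterns and
# column names are illustrative assumptions.
import re
import pandas as pd

documents = {
    "P1": "The alloy was annealed at 450 °C for 2 h, yielding a hardness of 310 HV.",
    "P2": "Samples were annealed at 520 °C; hardness reached 295 HV after ageing.",
}

schema = {
    "anneal_temp_c": r"annealed at (\d+)\s*°C",
    "hardness_hv": r"hardness (?:of |reached )?(\d+)\s*HV",
}

rows = []
for doc_id, text in documents.items():
    row = {"doc_id": doc_id}
    for column, pattern in schema.items():
        match = re.search(pattern, text)
        row[column] = float(match.group(1)) if match else None
    rows.append(row)

print(pd.DataFrame(rows))
```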

Document set analysis

Analyze a large set of documents to form an overview of the content – and decide what to include and exclude.

Summarization

You can ask the machine to summarize single or multiple documents to get a quick overview, or to kickstart writing. The AI Engine comes with a configurable summarization module that can rapidly produce summaries of multiple abstracts, one full text, or multiple full-text documents. These summaries are great for rapidly reviewing larger sets of similar documents – or for kickstarting scientific writing.

After analysing a document set – anything from a handful of documents up to 20,000 is supported – you are presented with a variety of results, such as topic groups of the literature list: both global topics (which topics the articles fall within at an overall scientific level) and specific topics (which topics the articles fall within inside this reading list). Since one article can be part of multiple topic groups, this is a helpful way to select groups for inclusion and exclusion without missing any relevant documents.

You can also explore the most meaning-bearing words of the document set, the rare words that may carry special meaning in this context, and all their related synonyms.
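A rough sketch of the topic-grouping step described above could look like the following, with TF-IDF features and k-means clustering standing in for the engine's own topic modelling; the corpus and number of topics are illustrative.

```python
# Illustrative topic grouping: cluster a document set into topic groups.
# TF-IDF + k-means stand in for the engine's own topic modelling.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [
    "Lithium-ion battery degradation under fast charging conditions.",
    "Electrode materials for lithium-ion battery energy storage.",
    "Deep learning models for medical image segmentation.",
    "Convolutional deep learning approaches to medical image analysis.",
]

features = TfidfVectorizer(stop_words="english").fit_transform(docs)
topics = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

for topic, doc in zip(topics, docs):
    print(f"topic {topic}: {doc}")
```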

The Summary tool does abstractive summarization – meaning the tool writes its own summary, as opposed to stitching copied sentences together (known as extractive summarization). This means the summary flows well and contains the most important points the document(s) have in common. The summaries can also be configured – is it a short, two-sentence summary of 10 documents you are after, or a one-page summary of a 20-page document?
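The snippet below gives a feel for configurable abstractive summarization, using a publicly available model as a stand-in for the engine's own module; the model name, example abstracts, and length settings are assumptions.

```python
# Configurable abstractive summarization with a public stand-in model.
# max_length / min_length play the role of the "short vs. long" configuration.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

abstracts = [
    "We evaluated a wearable ECG patch for atrial fibrillation detection in "
    "1,200 outpatients over six months, reporting sensitivity and specificity "
    "against 12-lead ECG.",
    "A smartphone photoplethysmography algorithm was validated for rhythm "
    "monitoring in a primary-care cohort, with performance compared to "
    "Holter recordings.",
]

# Short overview of several abstracts at once; raise the length limits for a
# one-page summary of a long full text instead.
result = summarizer(" ".join(abstracts), max_length=60, min_length=20, do_sample=False)
print(result[0]["summary_text"])
```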

Monitoring and alerts

Set the system to rerun searches, filters, and extractions on your content on a regular basis.

When the initial document set comes from one or more proxy datasets (i.e., “live” sources) – such as an internal repository, PubMed, or USPTO – each search, filter, analysis, and even extraction can be set up to monitor the proxy for new documents fitting the desired criteria.

This goes far beyond subscribing to a topic or keyword in a search engine. Here you can automate the entire process to run at regular intervals and receive a notification when, for example, a new article with clinical data exactly fits your inclusion/exclusion criteria. For any field requiring ongoing reviews – e.g., post-market surveillance and regulatory and safety reviews – this is a game-changer.
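A hypothetical sketch of the monitoring loop is shown below; fetch_new_documents, matches_criteria, and notify are placeholders for whatever source connector, saved filters, and alert channel are actually in use, and the one-day interval is an arbitrary choice.

```python
# Hypothetical monitoring loop: rerun a saved search against a "live" proxy
# source at a regular interval and alert on new matches. All helpers below
# are placeholders, not the product's real API.
import time
from datetime import datetime, timedelta

CHECK_EVERY = timedelta(days=1)
seen_ids = set()

def fetch_new_documents(since):
    """Placeholder: query PubMed, USPTO, or an internal repository."""
    return []   # e.g. [{"id": "...", "title": "...", "text": "..."}]

def matches_criteria(doc):
    """Placeholder: apply the saved search, context, and data filters."""
    return True

def notify(doc):
    print(f"[{datetime.now():%Y-%m-%d %H:%M}] New match: {doc['title']}")

while True:
    for doc in fetch_new_documents(since=datetime.now() - CHECK_EVERY):
        if doc["id"] not in seen_ids and matches_criteria(doc):
            seen_ids.add(doc["id"])
            notify(doc)
    time.sleep(CHECK_EVERY.total_seconds())
```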