LEVERAGING UNSTRUCTURED DATA IN THE FIGHT AGAINST CANCER
Our work with annotators:
- Improved accuracy
- A cleaner system
Our work with drug label analysis:
- Early results: approximately 95% precision and 75% recall for the extracted entities.
- Work to improve accuracy is ongoing.
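The precision and recall figures above are entity-level scores. Precision is the share of extracted entities that are correct; recall is the share of true entities that were extracted. The sketch below computes both under an assumed exact-match criterion, where a prediction counts only if its span and label both match a gold annotation. The source does not say which matching criterion was used, and the entity tuples and labels are made up for illustration.

```python
def entity_precision_recall(gold, predicted):
    """Exact-match, entity-level precision and recall.

    Each entity is a (start_char, end_char, label) tuple. A predicted entity
    is a true positive only if the same tuple appears in the gold set.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    # Guard against division by zero when either set is empty.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical annotations for one drug label (offsets are illustrative).
GOLD = [(0, 10, "DRUG"), (45, 58, "CONDITION"), (70, 76, "CONDITION"), (80, 90, "DOSAGE")]
PREDICTED = [(0, 10, "DRUG"), (45, 58, "CONDITION"), (70, 76, "CONDITION"), (20, 25, "DRUG")]

print(entity_precision_recall(GOLD, PREDICTED))  # (0.75, 0.75)
```

Here one gold entity is missed (lowering recall) and one spurious entity is extracted (lowering precision), which is why the two metrics are reported separately.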
Behind the scenes
To help the client find and extract the information its research requires from large volumes of unstructured data, Klarrio’s first task was to assess an existing set of annotators and improve their accuracy. A further objective was to make the annotators easier and more efficient to maintain.
The same approach was then used to assess and improve further existing annotators and to develop new ones.
For the analysis of drug labels, Klarrio used spaCy, an open-source NLP library, combining its machine-learning named entity recognizer, which detects the entities, with deep sentence parsing and rule-based matching, which detect the relations between those entities. The extracted entities and relationships are written back to a relational database for further analysis by researchers and clinicians.
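The combination of entity recognition and rule-based relation matching can be sketched with spaCy as follows. This is a minimal, hypothetical illustration, not Klarrio's implementation. To keep it runnable without downloading a trained model, spaCy's rule-based `entity_ruler` stands in for the machine-learning NER, and the deep dependency-parsing step is omitted because it needs a trained parser. The entity labels (`DRUG`, `CONDITION`), the trigger phrases, the relation names, and the drug name `examplinib` are all invented for the example. Relations are found with spaCy's `Matcher`: it locates a trigger phrase, then pairs the nearest drug before it with the first condition after it in the same sentence.

```python
import spacy
from spacy.language import Language
from spacy.matcher import Matcher

# Hypothetical relation triggers: relation name -> list of token patterns.
TRIGGERS = {
    "INDICATED_FOR": [[{"LOWER": "indicated"}, {"LOWER": "for"}]],
    "MAY_CAUSE": [[{"LOWER": "may"}, {"LOWER": "cause"}]],
}


def build_nlp() -> Language:
    # Blank English pipeline, so no trained model download is needed.
    nlp = spacy.blank("en")
    # Rule-based sentence splitting (a trained parser would also set sentences).
    nlp.add_pipe("sentencizer")
    # Stand-in for a trained NER component: hand-written entity patterns.
    ruler = nlp.add_pipe("entity_ruler")
    ruler.add_patterns([
        {"label": "DRUG", "pattern": [{"LOWER": "examplinib"}]},
        {"label": "CONDITION", "pattern": [{"LOWER": "breast"}, {"LOWER": "cancer"}]},
        {"label": "CONDITION", "pattern": [{"LOWER": "nausea"}]},
    ])
    return nlp


def extract_relations(nlp: Language, text: str) -> list[tuple[str, str, str]]:
    doc = nlp(text)
    matcher = Matcher(nlp.vocab)
    for relation, patterns in TRIGGERS.items():
        matcher.add(relation, patterns)

    triples = []
    for match_id, start, end in matcher(doc):
        relation = nlp.vocab.strings[match_id]
        sent = doc[start].sent
        # DRUG entities ending before the trigger, in the same sentence.
        drugs = [e for e in sent.ents if e.label_ == "DRUG" and e.end <= start]
        # CONDITION entities starting after the trigger, in the same sentence.
        conditions = [e for e in sent.ents if e.label_ == "CONDITION" and e.start >= end]
        if drugs and conditions:
            # Nearest drug before the trigger, first condition after it.
            triples.append((drugs[-1].text, relation, conditions[0].text))
    return triples


if __name__ == "__main__":
    nlp = build_nlp()
    text = ("Examplinib is indicated for the treatment of breast cancer. "
            "Examplinib may cause nausea.")
    for triple in extract_relations(nlp, text):
        print(triple)
```

In a fuller version, `spacy.blank("en")` would be replaced by a trained pipeline whose `ner` and `parser` components supply the entities and the dependency tree. Relation rules could then use spaCy's `DependencyMatcher` to follow grammatical links rather than relying on word order, which is more robust for long or nested sentences.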
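The final step, writing extracted relationships back to a relational database, might look like the sketch below. The source does not name the database or its schema, so SQLite (via Python's standard `sqlite3` module) is used as a stand-in, and the `label_relations` table, its columns, and the sample triples are hypothetical. Each relationship is stored as a subject–relation–object row keyed by the label it came from.

```python
import sqlite3

# Hypothetical schema: one row per (label, subject, relation, object) triple.
SCHEMA = """
CREATE TABLE IF NOT EXISTS label_relations (
    label_id TEXT NOT NULL,  -- identifier of the source drug label
    subject  TEXT NOT NULL,  -- e.g. the drug entity
    relation TEXT NOT NULL,  -- e.g. INDICATED_FOR
    object   TEXT NOT NULL,  -- e.g. the condition entity
    PRIMARY KEY (label_id, subject, relation, object)
)
"""


def save_relations(conn: sqlite3.Connection, label_id: str, triples) -> None:
    conn.execute(SCHEMA)
    # INSERT OR IGNORE skips rows that would violate the primary key,
    # so reprocessing the same label does not create duplicates.
    conn.executemany(
        "INSERT OR IGNORE INTO label_relations VALUES (?, ?, ?, ?)",
        [(label_id, s, r, o) for s, r, o in triples],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
triples = [
    ("examplinib", "INDICATED_FOR", "breast cancer"),
    ("examplinib", "MAY_CAUSE", "nausea"),
]
save_relations(conn, "label-001", triples)
save_relations(conn, "label-001", triples)  # re-running is idempotent

# Researchers can then query the relationships with ordinary SQL.
indications = conn.execute(
    "SELECT subject, relation, object FROM label_relations WHERE relation = ?",
    ("INDICATED_FOR",),
).fetchall()
```

Storing triples in a narrow table keeps the schema stable as new relation types are added, at the cost of needing joins or pivots for wide, per-drug views.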