Abstract

Evaluation of SURUS: a Named Entity Recognition Model for Unstructured Text of Interventional Trials

by Casper Peeters, Koen Vijverberg, Tobias van Rossum, Jesús García Bascuñan, Marianne Pouwer (Medstone Holding BV), and Suzan Verberne (Leiden University)

Systematic literature review (SLR) and meta-analysis are crucial for medical decision-making. Both require the systematic selection of all medical evidence from trials and studies relevant to a specific research question. Selecting studies and extracting study elements are manual, labour-intensive processes that may involve screening more than 3,000 scientific abstracts. Natural language processing can assist researchers in literature selection and data extraction.
We present SURUS, a BERT-based model that identifies and extracts study design, Population, Intervention, Comparison and Outcome (PICO) elements, and other elements from trial publication records. Automated identification of PICO elements speeds up the systematic screening process: because SURUS recognizes important study elements in context, the number of articles that require manual screening is substantially reduced.
SURUS is a state-of-the-art named entity recognition model. It was fine-tuned on a high-quality training set of more than 800 scientific abstracts describing pharmaceutical interventions for 7 different disease indications, manually annotated by experts in the pharmaceutical field. The model was validated on a 10% sample of the dataset for recognition of 26 entity categories relevant to PICO and study design extraction.
Overall, SURUS achieves an average F1 score of 0.926 (±0.0016) when trained on 90% and validated on the remaining 10% of the dataset. We are currently evaluating SURUS on the task of literature screening by reviewers performing an SLR. The first results indicate that SURUS greatly improves the efficiency of this task, saving valuable time.
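For readers unfamiliar with how an F1 score is obtained for named entity recognition, the following is a minimal sketch, not the evaluation code used for SURUS. It assumes BIO-style tags and exact-match, entity-level scoring; the abstract does not state which tagging scheme or averaging method was used. The labels `POP` and `INT` and the function names are hypothetical, chosen only for illustration.

```python
from typing import List, Set, Tuple

# (entity label, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed tagging scheme)."""
    spans: Set[Span] = set()
    label, start = None, 0
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.add((label, start, i))
            label = None
        if tag.startswith("B-"):
            label, start = tag[2:], i
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Exact-match entity-level F1 for one pair of tag sequences."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_spans)
    recall = true_positives / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: POP = population, INT = intervention.
gold = ["B-POP", "I-POP", "O", "B-INT", "O"]
pred = ["B-POP", "I-POP", "O", "B-INT", "I-INT"]
score = entity_f1(gold, pred)
```

In this toy example the predicted population span matches the annotation exactly, while the predicted intervention span is one token too long and therefore counts as both a false positive and a false negative, so precision, recall and F1 are all 0.5.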
