SnomedTranslate.nl: a high-quality clinical notes translation engine based on Transformers
by François Remy, Peter De Jaeger and Kris Demuynck (Ghent University)
It is estimated that up to 80% of the relevant information in electronic health records (EHRs) can only be found in the unstructured text typed by clinicians, despite the introduction of many structured-information sources over the years. As a result, a wide variety of models and datasets have been developed to take advantage of these notes for tasks such as entity recognition, natural language inference and question answering. Unfortunately, most resources usable for training remain available in English only, so there are few medical models of note in other languages.
A possible solution is to translate clinical letters from Dutch to English. So far, however, commercially available translation pipelines have not met the quality standard needed to make EHR translation compatible with the strict requirements of the medical domain. Patrick Davies noted in 2014 that “Google Translate has only 57.7% accuracy when used for medical phrase translations and should not be trusted for important medical communications”.
In this presentation, we explain how we tackled the various challenges of Dutch clinical translation. We then showcase our results across multiple quality metrics, including downstream tasks such as clinical term extraction. Finally, we use ablation studies to show the impact of our data sourcing and of the sentence augmentation applied to a concept translation dictionary.