A Comparative Study of Fuzzy Topic Models and LDA for Interpretability in Text Classification of Clinical Notes
by Emil Rijcken
Inpatient violence in psychiatry departments is a common and severe problem. Professionals who are victims of violent behaviour typically face adverse reactions such as emotional distress, symptoms of post-traumatic stress disorder, and a negative impact on work functioning. It is therefore important to assess a patient's risk of violent behaviour and take precautionary measures.
The psychiatry department of the University Medical Center Utrecht uses questionnaires to predict the likelihood of patients becoming violent. Filling out these forms is time-consuming and partly subjective. Automated machine-learning approaches based on existing patient information could instead reduce the time burden and help make more objective predictions. Various automated approaches utilizing clinical notes in the electronic health record yield more accurate predictions than questionnaires (Menger et al., 2018; Mosteiro et al., 2021). However, an intuitive understanding of these models' inner workings is missing, as the clinical notes are represented numerically by high-dimensional dense matrices with unknown semantic meaning.
A more intuitive and potentially interpretable approach is topic modeling, in which clinical notes are represented as a collection of topics. To do so, a topic model is trained on all the written notes (average note length: 1,481 words) to find k topics. Each topic consists of the n words most likely associated with that topic, together with a weight for each word. After training the topic model, all the documents associated with one patient can be represented by a k-length vector. The assumption is that if the generated topics are readily interpretable by humans, the model's decision-making may be more explainable.
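As a loose illustration of this pipeline (not the paper's implementation), the sketch below trains a topic model and represents a single note as a k-length topic vector. It uses gensim's LdaModel as a stand-in for the topic model; the tokenized notes, k, and all parameter values are invented placeholders.

```python
# Minimal sketch: train a topic model on tokenized notes, then represent one
# note as a dense k-length topic vector. `notes` and `k` are placeholders.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

notes = [
    ["patient", "agitated", "during", "evening", "round"],
    ["calm", "cooperative", "no", "incidents", "reported"],
]  # placeholder tokenized clinical notes

k = 10  # number of topics to extract

dictionary = Dictionary(notes)
corpus = [dictionary.doc2bow(note) for note in notes]
model = LdaModel(corpus=corpus, id2word=dictionary, num_topics=k)

# Represent one document as a k-length vector of topic weights.
bow = dictionary.doc2bow(notes[0])
vector = [0.0] * k
for topic_id, weight in model.get_document_topics(bow, minimum_probability=0.0):
    vector[topic_id] = weight
```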
With this goal in mind, we propose two new fuzzy topic models: FLSA-W and FLSA-V. Both models are derived from the topic model Fuzzy Latent Semantic Analysis (FLSA). After training each model ten times, we use the mean coherence score to compare the different models with the benchmark models Latent Dirichlet Allocation (LDA) and FLSA. Our proposed models generally achieve higher coherence scores and lower standard deviations than the benchmarks. They are particularly useful as topic embeddings in text classification, since their coherence scores do not drop for high numbers of topics, in contrast to the decay observed with LDA and FLSA.
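Continuing the sketch above, the snippet below illustrates one way to run this evaluation protocol: train a model ten times and report the mean and standard deviation of coherence. The choice of gensim's CoherenceModel with the 'c_v' measure is an assumption for illustration, not the paper's stated setup.

```python
# Hedged sketch of the evaluation loop: ten training runs, mean coherence.
# Reuses `corpus`, `dictionary`, `notes`, and `k` from the sketch above.
from statistics import mean, stdev
from gensim.models import CoherenceModel, LdaModel

scores = []
for seed in range(10):
    run = LdaModel(corpus=corpus, id2word=dictionary,
                   num_topics=k, random_state=seed)
    cm = CoherenceModel(model=run, texts=notes,
                        dictionary=dictionary, coherence="c_v")
    scores.append(cm.get_coherence())

print(f"mean coherence: {mean(scores):.3f}, std: {stdev(scores):.3f}")
```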
This research is part of the COVIDA (COmputing Visits Data) consortium.