O1-H Semantic Deep Learning for Electronic Health Records

PI: Thien Nguyen

We propose a novel “Semantic Deep Learning” method for analyzing the electronic health records (EHRs) of real patients. Our previous work successfully applied a hypergraph-based approach to the clinical text notes in Stanford Hospital’s Clinical Data Warehouse (STRIDE). Those experiments, run on EHRs annotated with ontologies (i.e., domain knowledge), show that hypergraph mining is successful in finding semantic (i.e., indirect) associations. The proposed method builds on that success by replacing the basic hypergraphs of the previous approaches with deep learning-based embeddings. The findings of this study will provide guidance to medical researchers for further investigations. We will create an ontology-based semantic system that combines rule/knowledge embedding with deep learning to better analyze EHRs. Our goal is to devise a novel method for generating biomedical knowledge embeddings with Apache SystemML and to improve the performance of information extraction systems (e.g., SystemT) in general. The learned embeddings can then be used to find related entities (e.g., closely related drugs that can act as substitutes when a patient has medical complications with a prescribed drug) or entities that are linked by a relation (e.g., diseases that can potentially be treated with a particular drug). Another objective is to develop new entity linking algorithms for the biomedical domain, which is underexplored in the literature. Given the rapid development of large pre-trained language models, we are also pre-training several versions of a ClinicalT5 model. These models can also be used in downstream applications in the medical domain, e.g., readmission prediction and mortality prediction.
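
To make the two embedding queries above concrete, the following is a minimal sketch, not the project's implementation. It assumes entities and relations already have vector embeddings. The entity names (`drug_a`, `disease_x`, and so on), the three-dimensional vectors, and the `treats` relation are all hypothetical and hand-picked for illustration. Related entities are found by cosine similarity, and relation-linked entities are scored with a translation-style rule (head + relation ≈ tail), which is an assumed stand-in rather than the scoring method used in this work.

```python
import math

# Hypothetical toy embeddings, hand-picked for illustration only.
# A real system would learn these from ontology-annotated EHRs.
ENTITY_VECS = {
    "drug_a": [0.9, 0.1, 0.0],
    "drug_b": [0.8, 0.2, 0.1],
    "drug_c": [0.0, 0.9, 0.4],
    "disease_x": [0.9, 0.1, 1.0],
    "disease_y": [0.1, 0.2, 0.1],
}
RELATION_VECS = {"treats": [0.0, 0.0, 1.0]}

DRUGS = ["drug_a", "drug_b", "drug_c"]
DISEASES = ["disease_x", "disease_y"]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def most_similar(name, candidates):
    """Return the candidate (other than `name`) closest to `name` by cosine."""
    query = ENTITY_VECS[name]
    others = [c for c in candidates if c != name]
    return max(others, key=lambda c: cosine(query, ENTITY_VECS[c]))


def translation_distance(head, relation, tail):
    """Euclidean distance between (head + relation) and tail.

    Assumed translation-style scoring: smaller means a more plausible link.
    """
    h = ENTITY_VECS[head]
    r = RELATION_VECS[relation]
    t = ENTITY_VECS[tail]
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(h, r, t)))


def best_tail(head, relation, candidates):
    """Return the candidate tail entity with the smallest translation distance."""
    return min(candidates, key=lambda c: translation_distance(head, relation, c))


# Query 1: which drug is most closely related to drug_a (a possible substitute)?
substitute = most_similar("drug_a", DRUGS)

# Query 2: which disease is most plausibly linked to drug_a by "treats"?
treated = best_tail("drug_a", "treats", DISEASES)

print(substitute, treated)
```

In this toy setup the first query stands in for substitute-drug lookup and the second for drug-to-disease relation lookup. With learned embeddings the same two queries would run over the full entity vocabulary rather than a handful of hand-written vectors.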