Mason Archival Repository Service

A Study of Administrative Data Representation for Machine Learning

dc.contributor.advisor Wojtusiak, Janusz
dc.contributor.author Asadzadehzanjani, Negin
dc.creator Asadzadehzanjani, Negin
dc.date.accessioned 2022-08-03T20:18:38Z
dc.date.available 2022-08-03T20:18:38Z
dc.date.issued 2022
dc.description.abstract Administrative data, including medical claims, are frequently used to train machine learning models that predict patient outcomes. Despite many efforts to use administrative codes (medical codes) in claims data, little systematic work has been done to understand how the codes in such data should be represented before model construction. Traditionally, the presence or absence of codes representing diagnoses or procedures over a fixed period, typically one year, is used (Binary Representation). More recently, some studies have incorporated temporal information into the data representation, for example by counting, calculating time from diagnosis, and using multiple time windows. However, these methods do not comprehensively capture the temporal information in the data: much of it, such as the exact time at which an event occurred and the exact sequence of events, is lost. This dissertation presents the development and investigation of two additional methods of administrative data representation, Temporal Min-Max and Trajectory Representation, which are specific to diagnoses extracted from claims data and are applied before machine learning algorithms. It then presents a large-scale experimental evaluation of these methods, comparing them with the traditional Binary Representation on four classification problems: prediction of one-year mortality, high utilization of medical services, chronic kidney disease, and congestive heart failure. The results show that the optimal way of representing the data is problem-dependent, so optimization of representation parameters is required as part of the modeling.
dc.format.extent 188 pages
dc.language.iso en
dc.rights Copyright 2022 Negin Asadzadehzanjani
dc.subject Public health
dc.subject Artificial intelligence
dc.subject Health sciences
dc.subject Data Preprocessing
dc.subject Health Informatics
dc.subject Medical Claims
dc.subject Supervised Learning
dc.subject Temporal Machine Learning
dc.title A Study of Administrative Data Representation for Machine Learning
dc.type Dissertation
thesis.degree.name Ph.D. in Health Services Research
thesis.degree.level Ph.D.
thesis.degree.discipline Health Services Research
thesis.degree.grantor George Mason University
