The Scalable Computing Software Laboratory (SCS) at Illinois Tech is seeking talented, self-motivated students to work on an interdisciplinary project spanning biological science and data analytics. The project is a partnership with the Department of Food Science and Nutrition. The positions are semester-based and may continue into the following semester. Applicants should have the ability and curiosity to learn a range of modeling methods and machine learning (ML) techniques. Experience with Python is strongly preferred. Selected applicants may have the opportunity to publish research papers and travel to workshops and conferences.
Project Summary
Dietary habits have a significant impact on overall health. A broad understanding of how different diets affect metabolic health can help reduce disease and dietary disorders. Over the past few decades, researchers have collected many datasets measuring how particular food types affect human metabolic responses. However, these datasets have all been analyzed in isolation using manual statistical methods. This approach produces a large number of small studies with narrow, simplistic findings, and it is unreasonable to expect a dietitian to know and fully use every such study. By combining these datasets and analyzing them with machine learning techniques, we can identify trends that have evaded manual methods. These findings can then be used to consolidate the wealth of information in these studies into a food recommendation system that automatically suggests diets based on a person's initial metabolic and demographic readings.
Preliminary experiments used a Random Forest Regressor combined with Recursive Feature Elimination (RFE). The model predicted the levels of key metabolic features (insulin, glucose, etc.) with an accuracy score above 0.8.
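A minimal sketch of this kind of pipeline in scikit-learn follows. It assumes a synthetic dataset generated with make_regression in place of the project's real study data, and the feature counts and hyperparameters are illustrative choices, not the settings used in the preliminary experiments. Note that for a regressor, scikit-learn's default score is R² rather than classification accuracy.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split

# Synthetic stand-in (assumption) for a combined study table: 200 subjects,
# 20 candidate features, of which only 5 actually drive the target.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# RFE repeatedly fits the forest and drops the least important feature
# (ranked by the forest's feature_importances_) until 5 features remain.
selector = RFE(
    estimator=RandomForestRegressor(n_estimators=100, random_state=0),
    n_features_to_select=5,
    step=1,
)
selector.fit(X_train, y_train)

selected = np.flatnonzero(selector.support_)
print("selected feature indices:", selected)
# RFE.score reduces X to the selected features and returns the fitted
# forest's score, which is R^2 for a regressor.
print("held-out R^2:", round(selector.score(X_test, y_test), 3))
```

RFE is a wrapper method: the number of features kept is a tuning choice, and scikit-learn's RFECV can pick it by cross-validation instead of fixing it by hand.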
Project Challenges
There are two primary challenges: preparing well-structured datasets and producing accurate, unbiased models of the combined dataset.
Datasets for machine learning should be well structured. The raw experimental datasets were collected by various researchers for different research purposes. A good dataset is clean, with correct data types and valid values. Additionally, different studies do not always contain the same core measurements, so some form of data imputation is required.
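The cleaning and imputation steps can be sketched with pandas and scikit-learn as below. The two study tables, their column names (such as glucose_mg_dl and insulin_uU_ml), and the choice of median imputation are all hypothetical illustrations, not the project's actual data or method.

```python
import pandas as pd
from sklearn.impute import SimpleImputer

# Two hypothetical study tables with overlapping but not identical measurements.
study_a = pd.DataFrame({
    "subject_id": ["a1", "a2", "a3"],
    "age": ["34", "51", "n/a"],          # ages stored as text, one invalid entry
    "glucose_mg_dl": [92.0, 105.0, 99.0],
    "insulin_uU_ml": [8.1, 12.4, 9.7],
})
study_b = pd.DataFrame({
    "subject_id": ["b1", "b2"],
    "age": [45, 29],
    "glucose_mg_dl": [110.0, 88.0],      # this study never measured insulin
})

# Stack the studies, tagging each row with its source study; columns missing
# from one study are filled with NaN.
combined = pd.concat(
    [study_a.assign(study="A"), study_b.assign(study="B")],
    ignore_index=True,
)

# Coerce to the correct data type: unparseable values such as "n/a" become NaN.
combined["age"] = pd.to_numeric(combined["age"], errors="coerce")

# Median imputation of the missing numeric values (one simple choice among many).
numeric_cols = ["age", "glucose_mg_dl", "insulin_uU_ml"]
imputer = SimpleImputer(strategy="median")
combined[numeric_cols] = imputer.fit_transform(combined[numeric_cols])
print(combined)
```

In a real pipeline the imputer should be fit on the training split only, so that statistics from test subjects do not leak into the model.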
After data pre-processing, the next challenge is to model the dataset. Even after the datasets are combined, the amount of data available for a particular training target may be small compared with other ML problems. We must identify and train models that keep bias low despite the limited data.
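One way to check whether a model trained on pooled studies generalizes, rather than memorizing study-specific quirks, is to hold out whole studies during cross-validation. The sketch below compares random K-fold with leave-one-study-out evaluation. The data is synthetic, and the four-study layout and per-study offset are assumptions made only to illustrate the technique; this is not presented as the project's evaluation protocol.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in (assumption): 4 studies of 30 subjects each, 6 features.
n_studies, per_study, n_features = 4, 30, 6
X = rng.normal(size=(n_studies * per_study, n_features))
study = np.repeat(np.arange(n_studies), per_study)
# Target depends on two features plus a per-study offset (a "batch effect").
y = 2.0 * X[:, 0] - X[:, 1] + 0.5 * study + rng.normal(scale=0.5, size=len(X))

model = RandomForestRegressor(n_estimators=100, random_state=0)

# Random K-fold: every fold mixes subjects from all studies.
kfold_scores = cross_val_score(
    model, X, y, cv=KFold(n_splits=4, shuffle=True, random_state=0))
# Leave-one-study-out: each fold is scored on a study unseen during training.
logo_scores = cross_val_score(
    model, X, y, groups=study, cv=LeaveOneGroupOut())

print("random 4-fold R^2:      ", kfold_scores.round(3))
print("leave-one-study-out R^2:", logo_scores.round(3))
```

A large gap between the two sets of scores would suggest that the model depends on study-level effects, which matters when the goal is recommending diets for new people rather than for subjects from studies already in the training data.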
Essential job functions and key responsibilities
Minimum requirements
Knowledge, Skills, and Abilities required
Knowledge, Skills, and Abilities preferred