Professor, Mathematical Sciences and Computer Science & Associate Director, Institute for Data Exploration and Application (IDEA) | Mathematical Sciences
Troy, NY, UNITED STATES
Extracts information from data using novel predictive or descriptive mathematical models
Ph.D., Computer Sciences
Albany Business Review print
CDPHP is working with Kristin Bennett at Rensselaer Polytechnic Institute to use artificial intelligence to figure out which patients could benefit from more personalized care.
The Daily Gazette print
Collaboration with Kristin Bennett at Rensselaer Polytechnic Institute seeks to help policyholders with the greatest needs.
Data Science Imposters
In this episode, Dr. Bennett takes us back to school and teaches us a few things about machine learning, artificial intelligence, data analytics, and visualization. Along the way, we discuss how to incorporate teaching of these topics in colleges and high schools and some of the moral issues that may arise with artificial intelligence.
Kristin P Bennett, Elisabeth M Brown, Hannah De los Santos, Matthew Poegel, Thomas R Kiehl, Evan W Patton, Spencer Norris, Sally Temple, John Erickson, Deborah L McGuinness, Nathan C Boles
Increased understanding of developmental disorders of the brain has shown that genetic mutations, environmental toxins and biological insults typically act during developmental windows of susceptibility. Identifying these vulnerable periods is a necessary and vital step for safeguarding women and their fetuses against disease-causing agents during pregnancy and for developing timely interventions and treatments for neurodevelopmental disorders. We analyzed developmental time-course gene expression data derived from human pluripotent stem cells, together with disease-association, pathway, and protein-interaction databases, to identify windows of disease susceptibility during development and the time periods for productive interventions. The results are displayed as interactive Susceptibility Windows Ontological Transcriptome (SWOT) Clocks illustrating disease susceptibility over developmental time. Using this method, we determine the likely windows of susceptibility for multiple neurological disorders, including autism spectrum disorder, schizophrenia, and Zika-virus-induced microcephaly, using known disease-associated genes and genes derived from RNA-sequencing studies. SWOT Clocks provide a valuable tool for integrating data from multiple databases in a developmental context with data generated from next-generation sequencing to help identify windows of susceptibility.
Borja Seijo-Pardo, Amparo Alonso-Betanzos, Kristin P Bennett, Verónica Bolón-Canedo, Julie Josse, Mehreen Saeed, Isabelle Guyon
Feature selection is of great importance for two possible scenarios: (1) prediction, i.e., improving (or minimally degrading) the predictions of a target variable while discarding redundant or uninformative features and (2) discovery, i.e., identifying features that are truly dependent on the target and may be genuine causes, to be confirmed by experimental verification (for example, for the task of drug target discovery in genomics). In both cases, if variables have a large number of missing values, imputing them may lead to false positives: features that are not associated with the target become dependent as a result of imputation. In the first scenario, this may not harm prediction, but in the second one, it will erroneously select irrelevant features. In this paper, we study the risk/benefit trade-off of missing value imputation in the context of feature selection, using causal graphs to characterize when structural bias arises. Our aim is also to investigate situations in which imputing missing values may be beneficial to reduce false negatives, a situation that might arise when there is a dependency between feature and target, but the dependency is below the significance level when only complete cases are considered. However, the benefits of reducing false negatives must be balanced against the increased number of false positives. In the case of a binary target variable and continuous features, the t-test is often used for univariate feature selection. In this paper, we also introduce a de-biased version of the t-test allowing us to reap the benefits of imputation, while not incurring the penalty of increasing the number of false positives.
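As a rough illustration of the complete-case baseline this abstract refers to, the sketch below ranks continuous features against a binary target with a two-sample t-statistic, dropping missing values rather than imputing them. It is not the de-biased test introduced in the paper, whose form is not given here. The use of Welch's unequal-variance statistic, the `None` encoding for missing values, and the function names `welch_t` and `rank_features` are all assumptions made for illustration.

```python
import math


def welch_t(feature, target):
    """Welch's t-statistic for one continuous feature and a 0/1 target.

    Missing feature values are encoded as None (an assumption of this
    sketch) and are dropped: this is complete-case analysis, no imputation.
    """
    groups = {0: [], 1: []}
    for x, y in zip(feature, target):
        if x is not None:
            groups[y].append(x)
    a, b = groups[0], groups[1]
    if len(a) < 2 or len(b) < 2:
        raise ValueError("need at least two observed values per class")

    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    # Unbiased sample variances within each class.
    var_a = sum((v - mean_a) ** 2 for v in a) / (len(a) - 1)
    var_b = sum((v - mean_b) ** 2 for v in b) / (len(b) - 1)

    se = math.sqrt(var_a / len(a) + var_b / len(b))
    if se == 0:
        raise ValueError("zero variance in both classes")
    return (mean_b - mean_a) / se


def rank_features(columns, target):
    """Return feature names sorted by decreasing |t| (hypothetical helper)."""
    scores = {name: abs(welch_t(col, target)) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    target = [0, 0, 0, 0, 1, 1, 1, 1]
    columns = {
        "signal": [1.0, 2.0, 3.0, None, 5.0, 6.0, 7.0, None],
        "noise": [1.0, 5.0, 3.0, 2.0, 2.0, 4.0, 3.0, None],
    }
    print(rank_features(columns, target))
```

Because each feature is scored only on its observed rows, a feature with many missing values is tested on fewer samples; that loss of power is the false-negative risk the abstract weighs against the false positives that imputation can introduce.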
Alexander New, Kristin P. Bennett
We consider the problem in precision health of grouping people into subpopulations based on their degree of vulnerability to a risk factor. These subpopulations cannot be discovered with traditional clustering techniques because their quality is evaluated with a supervised metric: the ease of modeling a response variable for observations within them. Instead, we apply the more appropriate supervised cadre model (SCM). We extend the SCM formalism so that it may be applied to multivariate regression and binary classification problems and develop a way to use conditional entropy to assess the confidence in the process by which a subject is assigned their cadre. Using the SCM, we generalize the environment-wide association study (EWAS) to be able to model heterogeneity in population risk. In our EWAS, we consider more than two hundred environmental exposure factors and find their association with diastolic blood pressure, systolic blood pressure, and hypertension. This requires adapting the SCM to be applicable to data generated by a complex survey design. After correcting for false positives, we found 25 exposure variables that had a significant association with at least one of our response variables. Eight of these were significant for a discovered subpopulation but not for the overall population. Some of these associations have been identified by previous researchers, while others appear to be novel. We examine discovered subpopulations in detail, finding that they are interpretable and suggestive of further research questions.
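The idea of using entropy to gauge confidence in a cadre assignment can be sketched as follows. This is a minimal illustration, not the authors' formulation: it assumes each subject comes with a vector of cadre-membership probabilities (conditional on that subject's features) and computes the Shannon entropy of that vector. The function name `assignment_entropy` and the optional normalization to the range [0, 1] are assumptions of this sketch.

```python
import math


def assignment_entropy(probs, normalize=True):
    """Shannon entropy of one subject's cadre-membership probabilities.

    Low entropy means the probability mass sits on one cadre (a confident
    assignment); high entropy means the subject is spread across cadres.
    With normalize=True the result is divided by log(number of cadres),
    so it lies between 0 and 1. The normalization is an assumption of
    this sketch, not something stated in the abstract.
    """
    if not probs or any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative and non-empty")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")

    # Terms with p == 0 contribute nothing (the limit of p*log(p) is 0).
    h = -sum(p * math.log(p) for p in probs if p > 0)
    if normalize and len(probs) > 1:
        h /= math.log(len(probs))
    return h


if __name__ == "__main__":
    confident = [0.9, 0.05, 0.05]
    uncertain = [0.4, 0.3, 0.3]
    print(assignment_entropy(confident), assignment_entropy(uncertain))
```

Under these assumptions, subjects whose entropy is close to 1 would be the ones whose cadre membership deserves the least trust when interpreting a discovered subpopulation.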