Awarded by the Systems, Man, and Cybernetics Society, 2016-01-01
Ph.D., Computer Science
M.S., Electrical Engineering
Richmond Times-Dispatch
“It is an unfulfilled need,” said Krys Cios, the chairman of Virginia Commonwealth University’s computer science department. “Everyone is looking for employees with computer science skills.”
Richmond Times-Dispatch
“Everybody’s using computer science. We can’t live without computers,” Cios said. “This just creates more opportunities for students and gets them more prepared.”
Down syndrome (DS) is a chromosomal abnormality (trisomy of human chromosome 21) associated with intellectual disability and affecting approximately one in 1000 live births worldwide. The overexpression of genes encoded by the extra copy of a normal chromosome in DS is believed to be sufficient to perturb normal pathways and normal responses to stimulation, causing learning and memory deficits. In this work, we have designed a strategy based on the unsupervised clustering method Self-Organizing Maps (SOM) to identify biologically important differences in protein levels in mice exposed to context fear conditioning (CFC). We analyzed expression levels of 77 proteins obtained from normal genotype control mice and from their trisomic littermates (Ts65Dn), both with and without treatment with the drug memantine. Control mice learn successfully while the trisomic mice fail, unless they are first treated with the drug, which rescues their learning ability. The SOM approach identified reduced subsets of proteins predicted to make the most critical contributions to normal, failed, and rescued learning, and provides a visual representation of the data that allows the user to extract patterns that may underlie novel biological responses to the different kinds of learning and to memantine treatment. These results suggest that applying SOM to new experimental data sets of complex protein profiles can identify common critical protein responses, which in turn may aid in identifying potentially more effective drug targets.
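The clustering technique named above can be illustrated with a minimal sketch. The code below is a generic Self-Organizing Map trained on synthetic expression-like vectors, not the authors' pipeline; the grid size, learning-rate and neighborhood schedules, and the data are illustrative assumptions.

```python
import numpy as np

def train_som(data, grid_w=4, grid_h=4, epochs=100, lr0=0.5, sigma0=2.0, seed=0):
    """Train a minimal Self-Organizing Map on row-vector samples."""
    rng = np.random.default_rng(seed)
    n_features = data.shape[1]
    # One weight vector per grid node, initialized randomly.
    weights = rng.random((grid_h, grid_w, n_features))
    # Grid coordinates, used for neighborhood distances on the map.
    ys, xs = np.mgrid[0:grid_h, 0:grid_w]
    coords = np.stack([ys, xs], axis=-1).astype(float)
    for epoch in range(epochs):
        # Exponentially decaying learning rate and neighborhood radius.
        lr = lr0 * np.exp(-epoch / epochs)
        sigma = sigma0 * np.exp(-epoch / epochs)
        for x in rng.permutation(data):
            # Best-matching unit: node whose weights are closest to x.
            dists = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)
            # Gaussian neighborhood around the BMU on the grid.
            grid_d2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-grid_d2 / (2 * sigma ** 2))[..., None]
            # Pull the BMU and its neighbors toward the sample.
            weights += lr * h * (x - weights)
    return weights

def map_samples(data, weights):
    """Assign each sample to its best-matching grid node."""
    return [np.unravel_index(np.argmin(np.linalg.norm(weights - x, axis=-1)),
                             weights.shape[:2]) for x in data]
```

After training, samples (e.g., per-mouse protein profiles) that land on the same or adjacent nodes can be inspected together, which is the kind of visual pattern extraction the abstract describes.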
One-class learning algorithms are used when training data are available for only one class, called the target class. Data for the other class(es), called outliers, are not available. One-class learning algorithms are used for detecting outliers, or novelty, in the data. The common approach in one-class learning is to use density estimation techniques, or to adapt standard classification algorithms, to define a decision boundary that encompasses only the target data. In this paper, we introduce the OneClass-DS learning algorithm, which combines rule-based classification with a greedy search algorithm based on the density of features. Its performance was tested on 25 data sets and compared with eight other one-class algorithms; the results show that it performs on par with those algorithms.
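Since OneClass-DS itself is not reproduced here, the sketch below illustrates only the general one-class setting the abstract describes — learning a decision boundary from target data alone and flagging everything outside it as an outlier — using a simple nearest-neighbor distance threshold. The quantile parameter and the data are illustrative assumptions, not the paper's method.

```python
import numpy as np

def fit_one_class(targets, quantile=0.95):
    """Fit a toy one-class model: a point is accepted as 'target' if its
    distance to the nearest training target is within a threshold learned
    from the targets themselves (no outlier examples needed)."""
    targets = np.asarray(targets, dtype=float)
    # Leave-one-out nearest-neighbor distance for each training point.
    d = np.linalg.norm(targets[:, None, :] - targets[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = d.min(axis=1)
    # Threshold chosen so most training targets fall inside the boundary.
    threshold = np.quantile(nn, quantile)
    return targets, threshold

def predict_one_class(model, points):
    """Return True for points inside the target boundary, False for outliers."""
    targets, threshold = model
    points = np.asarray(points, dtype=float)
    d = np.linalg.norm(points[:, None, :] - targets[None, :, :], axis=-1)
    return d.min(axis=1) <= threshold
```

Note that, as in the abstract, the model is fit without any outlier examples; only at prediction time do unseen points get labeled target or outlier.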
Supervised discretization is one of the basic data preprocessing techniques used in data mining. CAIM (class-attribute interdependence maximization) is a discretization algorithm for data whose classes are known. However, newly arising challenges, such as the presence of unbalanced data sets, call for new algorithms capable of handling them in addition to balanced data. This paper presents a new discretization algorithm named ur-CAIM, which improves on the CAIM algorithm in three important ways. First, it generates more flexible discretization schemes while producing a small number of intervals. Second, the quality of the intervals is improved based on the data class distribution, which leads to better classification performance on balanced and, especially, unbalanced data. Third, its runtime is lower than CAIM's. The algorithm is parameter-free and self-adapts to the problem complexity and the data class distribution. ur-CAIM was compared with nine well-known discretization methods on 28 balanced and 70 unbalanced data sets. The results were contrasted through non-parametric statistical tests, which show that our proposal outperforms CAIM and many of the other methods on both types of data, but especially on unbalanced data, which is its main advantage.
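For context, the greedy interval-selection scheme of CAIM — the baseline that ur-CAIM improves on — can be sketched as follows. This is a simplified reimplementation of the published CAIM criterion (mean over intervals of the squared dominant-class count divided by the interval size), not the ur-CAIM algorithm, and the stopping rule is abbreviated.

```python
import numpy as np

def caim_value(cuts, x, y):
    """CAIM criterion for a set of cut points: mean over intervals of
    (count of the dominant class)^2 / (total count in the interval)."""
    edges = np.concatenate(([-np.inf], sorted(cuts), [np.inf]))
    total, n = 0.0, len(edges) - 1
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (x > lo) & (x <= hi)
        if mask.sum() == 0:
            return -np.inf  # a scheme with an empty interval is invalid
        counts = np.bincount(y[mask])
        total += counts.max() ** 2 / mask.sum()
    return total / n

def caim_discretize(x, y):
    """Greedy CAIM-style discretization of one attribute x w.r.t. labels y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y)
    # Candidate boundaries: midpoints between consecutive distinct values.
    vals = np.unique(x)
    candidates = list((vals[:-1] + vals[1:]) / 2)
    cuts, best = [], -np.inf
    n_classes = len(np.unique(y))
    while candidates:
        # Try each remaining candidate and keep the one with the best CAIM.
        score, c = max((caim_value(cuts + [c], x, y), c) for c in candidates)
        if not np.isfinite(score):
            break
        # Accept while CAIM improves, or until there is at least one
        # interval per class (as the CAIM heuristic requires).
        if score > best or len(cuts) + 1 < n_classes:
            cuts.append(c)
            candidates.remove(c)
            best = max(best, score)
        else:
            break
    return sorted(cuts)
```

On a toy attribute whose low values belong to class 0 and high values to class 1, the greedy search places a single cut between the two groups and then stops, since further splits lower the criterion — the "small number of intervals" behavior the abstract refers to.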