Abstract:
Functional magnetic resonance imaging (fMRI) is widely used to study human brain
function, measuring brain activity both spatially and temporally. The past decade
has witnessed growing interest within the fMRI community in constructing accurate
predictive models. Though achieving high prediction accuracy is crucial for building strong
diagnostic models (e.g., for brain functional disorders), information mapping or model
interpretability is equally critical for advancing a fundamental understanding of brain function.
Recently, two notable multivariate methods, recursive feature elimination using support
vector machine (RFESVM) and logistic regression with an elastic net penalty (LREN),
have been applied to meet the challenge of simultaneous classification and mapping of fMRI
data patterns. However, both methods have limitations. First, they suffer from the curse
of dimensionality because they solve the classification and feature selection tasks directly in
the whole feature space. Second, feature selection in both methods depends critically on the
sampled values of the tuning parameters, and neither method offers control over false selections.
In this dissertation, I seek to address both limitations within a random subspace framework.
The random subspace method is an effective approach to lessening the curse of dimensionality
and exploring data patterns from different local perspectives. It has been applied separately,
and successfully, to classification and to feature selection, but no previous study has
attempted to integrate the two, possibly because of the high computational cost of the
double-loop cross-validation scheme needed to avoid parameter selection bias.
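The random subspace idea described above can be sketched in a few lines: each base classifier is trained on a small random subset of features, and predictions are combined by majority vote. The following is a minimal illustration on synthetic data, not the dissertation's implementation; in particular, the nearest-centroid base classifier is an assumption made for brevity (the dissertation uses LREN and RFESVM as base learners).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 40 samples, 100 features; only the first 5 features carry signal.
n, p = 40, 100
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :5] += 1.5            # class mean shift on the informative features

def nearest_centroid_predict(Xtr, ytr, Xte):
    """Minimal base classifier: assign each point to the closer class centroid."""
    c0, c1 = Xtr[ytr == 0].mean(0), Xtr[ytr == 1].mean(0)
    d0 = ((Xte - c0) ** 2).sum(1)
    d1 = ((Xte - c1) ** 2).sum(1)
    return (d1 < d0).astype(int)

def random_subspace_ensemble(Xtr, ytr, Xte, n_models=50, k=10):
    """Train each base classifier on a random k-feature subset; majority-vote."""
    votes = np.zeros(len(Xte))
    for _ in range(n_models):
        idx = rng.choice(Xtr.shape[1], size=k, replace=False)
        votes += nearest_centroid_predict(Xtr[:, idx], ytr, Xte[:, idx])
    return (votes > n_models / 2).astype(int)

# Resubstitution predictions, for illustration only.
pred = random_subspace_ensemble(X, y, X)
print((pred == y).mean())
```

Because each base learner sees only k features, every fit is a low-dimensional problem, and subspaces that happen to contain informative features dominate the vote.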
In Chapter 2, I address these methodological issues. I first extend an efficient method,
which uses only a single K-fold cross-validation procedure, to alleviate the parameter
selection bias of an ensemble classifier formed by the random subspace method. The extension
allows independent tuning of the base classifiers, making feature selection more adaptive to
the local data structure. I then integrate a random probe method into the random subspace
framework to control false selections: a selection threshold is derived from the distribution
of scores of permuted artificial variables.
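The random probe idea described above can be illustrated concretely: permuted copies of the real features serve as null "probes" whose scores estimate the score distribution of uninformative features, and a high quantile of that distribution becomes the selection threshold. The sketch below is a hedged, simplified stand-in; the absolute t-like score and the 0.99 quantile are illustrative choices, not the dissertation's actual scoring method or threshold rule.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 60 samples, 50 features; only the first 4 features are informative.
n, p = 60, 50
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :4] += 1.2

# Random probes: a row-permuted copy of the real features. Permuting samples
# breaks any feature-label association while preserving each feature's
# marginal distribution, so probe scores approximate the null distribution.
probes = X[rng.permutation(n)]

def t_scores(Z, y):
    """Absolute two-sample t-like statistic per feature (illustrative scoring)."""
    a, b = Z[y == 0], Z[y == 1]
    se = np.sqrt(a.var(0, ddof=1) / len(a) + b.var(0, ddof=1) / len(b))
    return np.abs(a.mean(0) - b.mean(0)) / se

real_scores = t_scores(X, y)
probe_scores = t_scores(probes, y)

# Threshold: a high quantile of the probe (null) scores; real features that
# score above it are selected, which limits false selections.
threshold = np.quantile(probe_scores, 0.99)
selected = np.where(real_scores > threshold)[0]
print(selected)
```

Raising the quantile trades recall for a stricter bound on false selections, which is the control the original RFESVM and LREN pipelines lack.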
In Chapters 3 and 4, using extensive simulations, I empirically evaluate the developed
random subspace framework as applied to LREN and RFESVM, respectively. I find
that (1) the developed random subspace framework can boost the performance of both LREN
and RFESVM in classification and feature selection; (2) the random probe method can
effectively control false selection rates; (3) the proposed feature scoring method can
rank informative features by their individual discriminative capacities; and (4) the random
subspace framework can correctly determine informative features' discrimination directions.