sortVariablesNB: Naive Bayes ranking of variables, scored by a leave-one-subject-out cross-validation accuracy criterion. The full input/output description appears in the header comments of the source below.
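A minimal usage sketch (the feature matrix, labels, and per-subject sample counts below are hypothetical placeholders; the helpers getLeave1OutLabels and findRedundancies must be on the MATLAB path):

% hypothetical data: 50 features, 3 subjects with 20 + 25 + 15 = 60 samples
featureVect = randn( 50, 60 );
classLabels = randi( [0, 2], 1, 60 );   % 0 = not learned, 1 = learned, 2 = unsure
numSamplesPerSubj = [ 20, 25, 15 ];

% rank all 50 variables and keep the 5 best
[ bestVars, ordering, numCorrect ] = sortVariablesNB( featureVect, classLabels, ...
    numSamplesPerSubj, 5 );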
% Naive Bayes ranking of variables (using a cross-validation accuracy criterion)
%
% syntax: [ bestVariables, bestToWorst, accuracy ] = sortVariablesNB( featureVect, classLabels, ...
%                                           numSamplesPerSubj, topVarsToKeep )
%
% Inputs:
%   featureVect: all the data samples, (dim x numSamples)
%   classLabels: all class labels (0 for not learned, 1 for learned, 2 for unsure).
%       Samples labeled class 2 are not used.
%   numSamplesPerSubj: featureVect is assumed to be arranged so that each
%       subject's samples (counted by the entries of numSamplesPerSubj) are
%       grouped consecutively. This parameter is needed for the
%       leave-one-subject-out cross validation.
%   topVarsToKeep: number of top variables to return (default 10)
%
% Outputs:
%   bestVariables: indices of the top variables for separating the classes
%   bestToWorst: index ordering of all the variables (not just the top)
%   accuracy: associated number of correct predictions for those indices

function [ bestVariables, bestToWorst, accuracy ] = sortVariablesNB( featureVect, classLabels, ...
    numSamplesPerSubj, topVarsToKeep )

if nargin < 4 || isempty( topVarsToKeep )
    topVarsToKeep = 10;
end

% leave-one-subject-out cross validation
[ dim, numSamples ] = size( featureVect );
expLabels = getLeave1OutLabels( numSamples, numSamplesPerSubj );
numTrials = length( expLabels );
accuracy = zeros( dim, 1 );

% center and scale variables to unit variance (constant variables stay centered only)
featureVect = featureVect - repmat( mean( featureVect, 2 ), [1, numSamples] );
featStdev = std( featureVect, 0, 2 );
featureVect( featStdev ~= 0, : ) = featureVect( featStdev ~= 0, : ) ./ ...
    repmat( featStdev( featStdev ~= 0 ), [1, numSamples] );

for i1 = 1:numTrials

    % split into training and test folds, dropping the unsure (class 2) samples
    trainLabels = classLabels( :, expLabels(i1).train );
    trainFeatures = featureVect( :, expLabels(i1).train );
    trainFeatures( :, trainLabels == 2 ) = [];
    trainLabels( :, trainLabels == 2 ) = [];

    testLabels = classLabels( :, expLabels(i1).test );
    testFeatures = featureVect( :, expLabels(i1).test );
    testFeatures( :, testLabels == 2 ) = [];
    testLabels( :, testLabels == 2 ) = [];

    % score each feature on its own
    for i2 = 1:dim
        % skip features that are (near) constant overall or within either
        % class; the per-class Gaussian fit degenerates in that case
        if var( trainFeatures(i2,:) ) < 1e-5 || ...
                var( trainFeatures(i2, trainLabels == 0) ) < 1e-5 || ...
                var( trainFeatures(i2, trainLabels == 1) ) < 1e-5
            continue
        end
        % rows are observations; 'Prior','uniform' weights both classes
        % equally so the count is not biased by class imbalance
        nb = NaiveBayes.fit( trainFeatures(i2,:)', trainLabels', 'Prior', 'uniform' );
        estimate = nb.predict( testFeatures(i2,:)' );
        accuracy(i2) = accuracy(i2) + sum( estimate == testLabels' );
    end
end

% sort variables in order best to worst
[ accuracy, bestToWorst ] = sort( accuracy, 1, 'descend' );

% remove redundancies; keep accuracy aligned with the surviving indices
unqIdx = findRedundancies( featureVect( bestToWorst, : ) );
bestToWorst = bestToWorst( unqIdx );
accuracy = accuracy( unqIdx );

bestVariables = bestToWorst( 1:min( topVarsToKeep, length( bestToWorst ) ) );
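The helper getLeave1OutLabels is defined elsewhere; the sketch below is an assumption reconstructed only from how it is called above. It must return a struct array with one element per subject, whose .train and .test fields index the samples, holding out each subject's consecutive block in turn.

function expLabels = getLeave1OutLabels( numSamples, numSamplesPerSubj )
% Assumed implementation sketch: one cross-validation fold per subject.
% Each subject's consecutive block of samples becomes the test set once.
numSubj = length( numSamplesPerSubj );
blockEnds = cumsum( numSamplesPerSubj(:)' );
blockStarts = [ 1, blockEnds(1:end-1) + 1 ];
expLabels = struct( 'train', cell(1,numSubj), 'test', cell(1,numSubj) );
for s = 1:numSubj
    testIdx = blockStarts(s):blockEnds(s);
    expLabels(s).test = testIdx;
    expLabels(s).train = setdiff( 1:numSamples, testIdx );
end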
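Note on the classifier call: NaiveBayes.fit belongs to the old Statistics Toolbox class API, which has since been replaced by fitcnb. On recent MATLAB releases the equivalent of the fit/predict pair above would be:

% modern replacement for NaiveBayes.fit / nb.predict (same uniform prior)
nb = fitcnb( trainFeatures(i2,:)', trainLabels', 'Prior', 'uniform' );
estimate = predict( nb, testFeatures(i2,:)' );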