
sortVariablesNB

PURPOSE

Naive Bayes ranking of variables (using a cross-validation accuracy criterion)

SYNOPSIS

function [bestVariables, bestToWorst, accuracy] = sortVariablesNB(featureVect, classLabels, numSamplesPerSubj, topVarsToKeep)

DESCRIPTION

 Naive Bayes ranking of variables (using a cross-validation accuracy criterion)
 
 syntax: [bestVariables, bestToWorst, accuracy] = sortVariablesNB( featureVect, classLabels, ...
                 numSamplesPerSubj, topVarsToKeep )
 
 Inputs:
   featureVect: all the data samples, as a (dim x numSamples) matrix
   classLabels: all class labels (0 for not learned, 1 for learned, 2 for
      unsure). Samples labeled class 2 are not used.
   numSamplesPerSubj: featureVect is assumed to be arranged so that each
      subject has some number of samples (given by the corresponding entry
      of numSamplesPerSubj) and those samples are grouped consecutively.
      This parameter is needed for the leave-one-subject-out cross-validation.
   topVarsToKeep: number of best variables to return (default 10)
 
 Outputs:
   bestVariables: indices of the top variables for separating the classes
   bestToWorst: indices ordering all the variables (not just the top ones)
   accuracy: the number of correct cross-validation predictions associated
      with each of those indices
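
 Example (a minimal sketch with synthetic data; the sizes and label values
 below are illustrative only):

   numSamplesPerSubj = [4 4 4];                            % 3 subjects, 4 samples each
   featureVect = randn(20, sum(numSamplesPerSubj));        % dim x numSamples
   classLabels = randi([0 1], 1, sum(numSamplesPerSubj));  % 0/1 labels
   [bestVariables, bestToWorst, accuracy] = ...
       sortVariablesNB(featureVect, classLabels, numSamplesPerSubj, 5);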

CROSS-REFERENCE INFORMATION

This function calls:
This function is called by:

SOURCE CODE

% Naive Bayes ranking of variables (using a cross-validation accuracy criterion)
%
% syntax: [bestVariables, bestToWorst, accuracy] = sortVariablesNB( featureVect, classLabels, ...
%                 numSamplesPerSubj, topVarsToKeep )
%
% Inputs:
%   featureVect: all the data samples, as a (dim x numSamples) matrix
%   classLabels: all class labels (0 for not learned, 1 for learned, 2 for
%      unsure). Samples labeled class 2 are not used.
%   numSamplesPerSubj: featureVect is assumed to be arranged so that each
%      subject has some number of samples (given by the corresponding entry
%      of numSamplesPerSubj) and those samples are grouped consecutively.
%      This parameter is needed for the leave-one-subject-out cross-validation.
%   topVarsToKeep: number of best variables to return (default 10)
%
% Outputs:
%   bestVariables: indices of the top variables for separating the classes
%   bestToWorst: indices ordering all the variables (not just the top ones)
%   accuracy: the number of correct cross-validation predictions associated
%      with each of those indices
%

function [ bestVariables, bestToWorst, accuracy ] = sortVariablesNB( featureVect, classLabels, ...
                numSamplesPerSubj, topVarsToKeep )

if nargin < 4 || isempty( topVarsToKeep)
    topVarsToKeep = 10;
end

% leave-one-subject-out cross-validation
[ dim, numSamples] = size( featureVect);
expLabels = getLeave1OutLabels( numSamples, numSamplesPerSubj);
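% getLeave1OutLabels is a project helper (source not shown here); it is
% assumed to return a struct array in which expLabels(k).train and
% expLabels(k).test hold the column indices of the training and held-out
% samples for fold k, with each fold holding out all samples of one subject.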
numTrials = length(expLabels);
accuracy = zeros( dim, 1);    % per-variable count of correct CV predictions

% center each variable and scale it to unit variance
featureVect = featureVect - repmat( mean(featureVect,2), [1,numSamples] );
featStdev = std( featureVect, 0, 2);
featureVect( featStdev ~= 0,:) = featureVect( featStdev ~= 0,:) ./ ...
    repmat( featStdev(featStdev ~= 0), [1,numSamples]);
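% variables with zero standard deviation are left unscaled so the division
% above cannot produce NaN/Inf; such flat variables are skipped later by the
% variance guard inside the scoring loop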

for i1 = 1:numTrials

    % training samples for this fold, with 'unsure' (class 2) samples dropped
    trainLabels = classLabels(:,expLabels(i1).train);
    trainFeatures = featureVect(:,expLabels(i1).train);
    trainFeatures( :, trainLabels==2) = [];
    trainLabels( :, trainLabels==2) = [];

    % held-out samples for this fold, again dropping class 2
    testLabels = classLabels(:,expLabels(i1).test);
    testFeatures = featureVect(:,expLabels(i1).test);
    testFeatures( :, testLabels==2) = [];
    testLabels( :, testLabels==2) = [];

    % score each variable on its own
    for i2 = 1:dim
        % skip variables that are (near-)constant overall or within either
        % class, where the per-class normal model would be degenerate
        if var( trainFeatures(i2,:)) >= 1e-5 && ...
                var( trainFeatures(i2,trainLabels==0)) >= 1e-5 && ...
                var( trainFeatures(i2,trainLabels==1)) >= 1e-5
            nb = NaiveBayes.fit( trainFeatures(i2,:)', trainLabels', 'Prior', 'uniform');
            estimate = nb.predict( testFeatures(i2,:)');
            % count the correct predictions on the held-out subject
            accuracy(i2) = accuracy(i2) + sum( estimate == testLabels');
        end
    end
    % NaiveBayes.fit expects the training data as an N-by-D matrix (rows are
    % observations, columns are features) together with a grouping variable
    % giving the class of each row; its default 'empirical' prior estimates
    % class probabilities from their relative frequencies in the training
    % data, which the 'Prior','uniform' option above overrides.
end

% sort the variables from best to worst
[ accuracy, bestToWorst] = sort( accuracy, 1, 'descend');

% remove redundant variables, keeping the first (best-ranked) of each group
unqIdx = findRedundancies( featureVect( bestToWorst,:) );
bestToWorst = bestToWorst(unqIdx);
accuracy = accuracy(unqIdx);    % keep accuracy aligned with bestToWorst
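% findRedundancies is a project helper (source not shown here); it is assumed
% to return the indices of rows that are not (near-)duplicates of a
% higher-ranked row, so only the best-ranked variable of each redundant
% group is kept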

bestVariables = bestToWorst( 1:min( topVarsToKeep, length(bestToWorst)) );

Generated on Wed 20-Jan-2016 11:50:43 by m2html © 2005