Classify Videos Using Deep Learning in MATLAB
OK, this is based on the MATLAB samples. I wanted to classify video sequences but got an out-of-memory error. So I combined two examples: one works on the videos, extracts features, and saves them to disk; the other creates batches of features for the network.
Compute and Store Features
Since I hit the memory limit, I kept only the maximum of every 8 features out of the 1024 (the code below uses a factor of 8, which gives 128 values per frame).
netCNN = googlenet;
We take GoogLeNet. To view the network, use:
analyzeNetwork(netCNN)
We take layer 167, named layerName = "pool5-7x7_s1", which has an activation size of 1024.
To list all the files, use:
[files,labels] = hmdb51Files(dataFolder);
Randomization
numObservations = numel(labels);
N = floor(0.9 * numObservations);
idx = randperm(numObservations);
idxTrain = idx(1:N);
labelsTrain = labels(idxTrain);
idxValidation = idx(N+1:end);
labelsValidation = labels(idxValidation);
It's better to store these groups so that the same split is kept between runs.
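One way to keep the groups is to fix the random seed and save the index vectors to a MAT-file. This is only a minimal sketch: the observation count is made up, and splitFile is a hypothetical file name that does not appear in the original code.

```matlab
% Minimal sketch: make the train/validation split repeatable and store it.
% numObservations and splitFile are illustrative assumptions.
numObservations = 100;            % stand-in for numel(labels)
rng(0);                           % fix the seed so randperm is repeatable
idx = randperm(numObservations);
N = floor(0.9 * numObservations);
idxTrain = idx(1:N);
idxValidation = idx(N+1:end);

splitFile = [tempname '.mat'];    % hypothetical location for the saved split
save(splitFile,'idxTrain','idxValidation');

% Later (for example after restarting MATLAB), reload the same groups.
s = load(splitFile);
```

With the split on disk, the feature extraction loop can be interrupted and resumed without videos moving between the training and validation folders.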
Loop over Videos
The decim function is taken from the File Exchange.
It reduces memory use by decimating the features of each frame from 1024 to 128.
https://in.mathworks.com/matlabcentral/fileexchange/54099-fast-data-decimation?s_tid=srchtitle
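For readers who do not want the extra download, the same kind of reduction can be sketched with built-in functions only. This assumes that decim with the 'max' option returns the maximum of each consecutive group of 8 samples; I have not verified that against the File Exchange code, so treat it as an approximation. The frame count and the random data are placeholders.

```matlab
% Sketch of max-decimation by a factor of 8 using only built-in functions.
% Assumption: decim(x,8,'max') takes the maximum of each consecutive
% group of 8 samples; check the File Exchange code before relying on this.
nf = 5;                                   % illustrative number of frames
Xorg = rand(1024,nf,'single');            % stand-in for squeeze(sequences)
factor = 8;

% Reshape to 8 x 128 x nf, take the maximum along the first dimension,
% then reshape the 1 x 128 x nf result back to 128 x nf.
X = reshape(max(reshape(Xorg,factor,[],nf),[],1),[],nf);

bytesBefore = numel(Xorg)*4;              % single precision, 4 bytes each
bytesAfter  = numel(X)*4;
fprintf('%d -> %d bytes per video (%dx smaller)\n', ...
    bytesBefore,bytesAfter,bytesBefore/bytesAfter);
```

This handles all frames of a video in one call instead of one frame at a time, and it keeps the features in single precision.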
Set a limit on the video length in frames:
frame_limit = 400;
for i = 1:numel(files) % for each video (look carefully at where we clear memory)
video = readVideo(files(i),frame_limit);
if(isempty(video))
continue;
end
video = centerCrop(video,inputSize);% 224 224 3 nframes
sequences = activations(netCNN,video,layerName);%1*1*1024*nf
clear video;
Xorg=squeeze(sequences);%1024*nf
X=[];
for m=1:size(Xorg,2)
xt=decim(Xorg(:,m).',8,'max'); % here we reduce the vector size by taking the maximum
X(:,m)=xt;
end
foldTrainCur = fullfile(trainFolder,string(labels(i)));
if(~exist(foldTrainCur,'dir'))
mkdir(foldTrainCur);
end
foldValidCur = fullfile(validationFolder,string(labels(i)));
if(~exist(foldValidCur,'dir'))
mkdir(foldValidCur);
end
if(ismember(i,idxTrain))
save(fullfile(foldTrainCur,num2str(i)),"X","-v7.3");
else
save(fullfile(foldValidCur,num2str(i)),"X","-v7.3");
end
clear X;
clear Xorg;
clear sequences;
end
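The loop above leaves one MAT-file per video, stored in a subfolder named after its label. The sketch below rebuilds a tiny version of that layout in a temporary folder and reads it back, taking each label from the folder name. The class names and the dummy features are made up, and I am only assuming that the datastore used later infers labels in a similar way.

```matlab
% Sketch: recover file names and labels from the <label>/<index>.mat layout.
% The class names and the temporary root folder are made up for illustration.
rootFolder = tempname;
classNames = ["jump","run"];
for c = 1:numel(classNames)
    folder = fullfile(rootFolder,classNames(c));
    mkdir(folder);
    X = rand(128,10);                       % dummy 128 x nFrames features
    save(fullfile(folder,num2str(c)),"X","-v7.3");
end

% List every MAT-file below the root and take each label from its folder.
listing = dir(fullfile(rootFolder,'**','*.mat'));
filesFound = strings(numel(listing),1);
labelsFound = strings(numel(listing),1);
for k = 1:numel(listing)
    filesFound(k) = fullfile(listing(k).folder,listing(k).name);
    [~,labelsFound(k)] = fileparts(listing(k).folder);
end
labelsFound = categorical(labelsFound);

% Load one stored sequence and check its size.
s = load(filesFound(1));
disp(size(s.X))
```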
LSTM Part
Now that a feature sequence is stored for each video, build the LSTM network.
numFeatures = 128; % feature length per frame after decimation (X was already cleared in the loop)
numClasses = numel(categories(labelsTrain)); % 13
layers = [
sequenceInputLayer(numFeatures,'Name','sequence')
bilstmLayer(2000,'OutputMode','last','Name','bilstm')
dropoutLayer(0.5,'Name','drop')
fullyConnectedLayer(numClasses,'Name','fc')
softmaxLayer('Name','softmax')
classificationLayer('Name','classification')];
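Since memory is the recurring problem here, it can help to estimate how large this network is. The sketch below counts the learnable parameters by hand. It assumes that a BiLSTM layer holds input weights of size 8H x D, recurrent weights of size 8H x H, and a bias of size 8H x 1 (H hidden units, D input features), and it uses the 128 features and 13 classes noted in the code comments.

```matlab
% Sketch: count the learnable parameters of the layers above.
% Assumes bilstmLayer stores input weights (8H x D), recurrent weights
% (8H x H) and a bias (8H x 1), where H is the number of hidden units.
D = 128;      % numFeatures
H = 2000;     % hidden units per direction
C = 13;       % numClasses

bilstmParams = 8*H*(D + H + 1);   % two directions, four gates each
fcParams     = C*(2*H) + C;       % the fully connected layer sees 2H inputs
totalParams  = bilstmParams + fcParams;

fprintf('BiLSTM: %d, FC: %d, total: %d parameters\n', ...
    bilstmParams,fcParams,totalParams);
fprintf('About %.0f MB in single precision, before optimizer state\n', ...
    totalParams*4/1e6);
```

Almost all of the parameters sit in the recurrent weights, so reducing the number of hidden units is probably a more effective way to save memory than reducing the feature size further.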
Training
The fun part! Let's do it, yes we can!
The batch size is up to you; I kept it small because of my memory limits.
miniBatchSize = 16;
options = trainingOptions('adam', ...
'ExecutionEnvironment','gpu', ...
'MaxEpochs',15, ...
'MiniBatchSize',miniBatchSize, ...
'GradientThreshold',1, ...
'Verbose',0, ...
'Plots','training-progress');
I used the GPU since my CPU got stuck. Of course, the learning rate can be set to change during training.
dsTrain = sequenceDatastore(trainFolder);
dsTrain.MiniBatchSize = miniBatchSize;
net = trainNetwork(dsTrain,layers,options);
The result wasn't the best, but there was some progress over the epochs, along with many fluctuations, probably due to the step size or the small batch size. If the step size were reduced over the epochs, training would probably be more stable.
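Reducing the step size over the epochs can be tried with the piecewise learning-rate schedule in trainingOptions. The sketch below is a variation on the options used here, not something I trained with: the initial rate, drop factor, and drop period are illustrative guesses, and the execution environment is left at its default.

```matlab
% Sketch: the same solver with a piecewise learning-rate schedule.
% The initial rate, drop factor and drop period are illustrative guesses,
% not tuned values.
miniBatchSize = 16;
options = trainingOptions('adam', ...
    'InitialLearnRate',1e-3, ...
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropFactor',0.5, ...      % halve the rate ...
    'LearnRateDropPeriod',5, ...        % ... every 5 epochs
    'MaxEpochs',15, ...
    'MiniBatchSize',miniBatchSize, ...
    'GradientThreshold',1, ...
    'Verbose',0, ...
    'Plots','training-progress');
```

With these values the rate would be halved twice over the 15 epochs. Whether that actually removes the fluctuations would have to be checked by training again.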
The test:
dsTest = sequenceDatastore(validationFolder);
dsTest.MiniBatchSize = miniBatchSize;
Test accuracy
YPred = classify(net,dsTest,'MiniBatchSize',miniBatchSize);
YTest = dsTest.Labels;
acc = sum(YPred == YTest)./numel(YTest) %0.61
[m,order] = confusionmat(YTest,YPred)
figure
cm = confusionchart(m,order);
Train accuracy
YPred = classify(net,dsTrain,'MiniBatchSize',miniBatchSize);
YTrain = dsTrain.Labels;
acc = sum(YPred == YTrain)./numel(YTrain) % 0.62
[m,order] = confusionmat(YTrain,YPred)
figure
cm = confusionchart(m,order);
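The confusion matrix holds more than the overall accuracy. As a small sketch, the per-class recall can be read from its rows. The labels below are made up so that the snippet runs without a trained network.

```matlab
% Sketch: overall accuracy and per-class recall from a confusion matrix.
% YTest and YPred are made-up labels, not results from the trained network.
YTest = categorical(["run";"run";"jump";"jump";"jump";"wave"]);
YPred = categorical(["run";"jump";"jump";"jump";"wave";"wave"]);

[m,order] = confusionmat(YTest,YPred);   % rows: true class, columns: predicted

acc = sum(diag(m))/sum(m(:));            % same value as mean(YPred == YTest)
recallPerClass = diag(m)./sum(m,2);      % fraction of each true class found

disp(table(order,recallPerClass))
```

Classes with low recall are the ones worth inspecting first, for example by checking how many training videos they have.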
Summary
This post presented an approach to video sequence classification using MATLAB with a GPU.
The features are computed with GoogLeNet (Inception), reduced from 1024 to 128 per frame because of memory issues, and then fed to an LSTM network.
The accuracy for both train and test was about 61%, which isn't great. I'm fairly sure that with more memory and more features from the CNN the results would be better. We could also train the LSTM for more epochs and change the learning rate during training.