Classify Videos Using Deep Learning in MATLAB

Michael Sheinfeild
Feb 14, 2021


This post is based on the MATLAB examples. I wanted to classify video sequences but ran into out-of-memory errors, so I combined two examples: one loops over the videos, extracts features, and saves them to disk; the other trains a network on batches of those stored features.

Sample classes: "brush hair" and "cartwheel"

Compute and Store Features

Because of the memory limit, I reduced each 1024-element feature vector by taking the maximum over every 8 consecutive values, leaving 128 features per frame.

netCNN = googlenet;

To view the network architecture we use:

analyzeNetwork(netCNN)

Part of GoogLeNet

We use layer 167, named "pool5-7x7_s1", whose activations are vectors of size 1024:

layerName = "pool5-7x7_s1";

To list all files, use:

[files,labels] = hmdb51Files(dataFolder);
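hmdb51Files is a helper function from the MATLAB example: it lists the video files and takes each file's parent folder name as its label. Roughly, it could look like this (a sketch of the idea, not the exact helper code):

```matlab
function [files, labels] = hmdb51Files(dataFolder)
    % List all .avi files under dataFolder; each file's parent
    % folder name is taken as its class label.
    listing = dir(fullfile(dataFolder, "**", "*.avi"));
    files = fullfile(string({listing.folder}), string({listing.name}))';
    [~, folderNames] = cellfun(@fileparts, {listing.folder}, 'UniformOutput', false);
    labels = categorical(string(folderNames))';
end
```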

Labels

Randomization

numObservations = numel(labels);
N = floor(0.9 * numObservations);
idx = randperm(numObservations);
idxTrain = idx(1:N);
labelsTrain = labels(idxTrain);
idxValidation = idx(N+1:end);
labelsValidation = labels(idxValidation);

It is better to save these index groups to disk, so the same train/validation split can be reused across runs.
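One way to persist the split (my suggestion, not part of the original example) is to save the indices and labels next to the features:

```matlab
% Save the random split so the same train/validation groups
% can be reloaded instead of re-randomized on every run.
save('split.mat','idx','idxTrain','idxValidation', ...
     'labelsTrain','labelsValidation');

% In a later session, restore the same split with:
% load('split.mat');
```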

Loop Over the Videos

The decim function is taken from the MATLAB File Exchange. It reduces memory use by decimating each frame's feature vector from 1024 down to 128 values:

https://in.mathworks.com/matlabcentral/fileexchange/54099-fast-data-decimation?s_tid=srchtitle
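If you prefer not to download the File Exchange function, the max-decimation used here (1024 values reduced by a factor of 8 to 128) can be written in a couple of lines; this is my own equivalent sketch, not the File Exchange code:

```matlab
% Max-decimate a row vector x by factor r: take the maximum of
% each non-overlapping group of r consecutive values.
x = rand(1,1024);                     % e.g. one frame's GoogLeNet feature vector
r = 8;
xt = max(reshape(x, r, []), [], 1);   % 1 x 128 result
```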

Set a limit on the number of frames read per video:

frame_limit = 400;

For each video, run the frames through GoogLeNet, decimate the features, and save them to the train or validation folder (look carefully at where memory is cleared):

for i = 1:numel(files)

    video = readVideo(files(i),frame_limit);
    if isempty(video)
        continue;
    end
    video = centerCrop(video,inputSize);             % 224 x 224 x 3 x nframes
    sequences = activations(netCNN,video,layerName); % 1 x 1 x 1024 x nframes
    clear video;
    Xorg = squeeze(sequences);                       % 1024 x nframes
    X = [];
    for m = 1:size(Xorg,2)
        xt = decim(Xorg(:,m).',8,'max');  % reduce the vector by taking maxima
        X(:,m) = xt;
    end

    foldTrainCur = fullfile(trainFolder,string(labels(i)));
    if ~exist(foldTrainCur,'dir')
        mkdir(foldTrainCur);
    end
    foldValidCur = fullfile(validationFolder,string(labels(i)));
    if ~exist(foldValidCur,'dir')
        mkdir(foldValidCur);
    end

    if ismember(i,idxTrain)
        save(fullfile(foldTrainCur,num2str(i)),"X","-v7.3");
    else
        save(fullfile(foldValidCur,num2str(i)),"X","-v7.3");
    end

    clear X;
    clear Xorg;
    clear sequences;
end
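The readVideo and centerCrop helpers come from MATLAB's "Classify Videos Using Deep Learning" example (readVideo extended here with a frame limit). Roughly, they do the following; treat this as a sketch rather than the exact example code:

```matlab
function video = readVideo(filename, frameLimit)
    % Read up to frameLimit frames into an H x W x 3 x N array.
    vr = VideoReader(filename);
    i = 0;
    video = [];
    while hasFrame(vr) && i < frameLimit
        i = i + 1;
        video(:,:,:,i) = readFrame(vr);
    end
end

function videoResized = centerCrop(video, inputSize)
    % Crop the larger spatial dimension to a centered square,
    % then resize to the network input size (224 x 224 for GoogLeNet).
    sz = size(video);
    if sz(1) < sz(2)
        idx = floor((sz(2) - sz(1))/2);
        video = video(:, idx+1:idx+sz(1), :, :);
    elseif sz(2) < sz(1)
        idx = floor((sz(1) - sz(2))/2);
        video = video(idx+1:idx+sz(2), :, :, :);
    end
    videoResized = imresize(video, inputSize(1:2));
end
```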

LSTM Part

Now that a feature vector sequence is stored for each video, we build the LSTM network.

numFeatures = size(X,1); % 128

numClasses = numel(categories(labelsTrain));%13

layers = [
    sequenceInputLayer(numFeatures,'Name','sequence')
    bilstmLayer(2000,'OutputMode','last','Name','bilstm')
    dropoutLayer(0.5,'Name','drop')
    fullyConnectedLayer(numClasses,'Name','fc')
    softmaxLayer('Name','softmax')
    classificationLayer('Name','classification')];

Training

The fun part! Let's do it.


options = trainingOptions('adam', ...
    'ExecutionEnvironment','gpu', ...
    'MaxEpochs',15, ...
    'MiniBatchSize',miniBatchSize, ...
    'GradientThreshold',1, ...
    'Verbose',0, ...
    'Plots','training-progress');

I used the GPU since training on the CPU was far too slow. Of course, the learning rate can be modified so that it changes during training.
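For example, a piecewise schedule that drops the learning rate during training could look like this (the initial rate, drop factor, and drop period are illustrative values, not what I actually used):

```matlab
options = trainingOptions('adam', ...
    'ExecutionEnvironment','gpu', ...
    'MaxEpochs',15, ...
    'MiniBatchSize',miniBatchSize, ...
    'InitialLearnRate',1e-3, ...
    'LearnRateSchedule','piecewise', ...  % reduce the rate during training
    'LearnRateDropFactor',0.5, ...        % halve the rate...
    'LearnRateDropPeriod',5, ...          % ...every 5 epochs
    'GradientThreshold',1, ...
    'Verbose',0, ...
    'Plots','training-progress');
```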

Set the mini-batch size according to your memory limits; I used:

miniBatchSize = 16;

dsTrain = sequenceDatastore(trainFolder);
dsTrain.MiniBatchSize = miniBatchSize;

net = trainNetwork(dsTrain,layers,options);

Training progress

The run wasn't the best, but there was still some progress across epochs, along with many fluctuations, probably because of the small batch size and fixed step size. If the learning rate were reduced over the epochs, training would probably be more stable.

The test:

dsTest = sequenceDatastore(validationFolder);
dsTest.MiniBatchSize = miniBatchSize;

Test accuracy:

YPred = classify(net,dsTest,'MiniBatchSize',miniBatchSize);
YTest = dsTest.Labels;
acc = sum(YPred == YTest)./numel(YTest) % 0.61

[m,order] = confusionmat(YTest,YPred)
figure
cm = confusionchart(m,order);

Test Confusion Matrix

Train accuracy

YPred = classify(net,dsTrain,'MiniBatchSize',miniBatchSize);
YTrain = dsTrain.Labels;
acc = sum(YPred == YTrain)./numel(YTrain) % 0.62

[m,order] = confusionmat(YTrain,YPred)
figure
cm = confusionchart(m,order);

Train confusion matrix

Summary

This post presented an approach to video sequence classification using MATLAB with a GPU.

Features were computed with GoogLeNet (Inception) and reduced from 1024 to 128 values per frame because of memory limits, then fed to an LSTM network.

The accuracy on both train and test was about 61%, which isn't great. With more memory I could have kept more of the CNN features, which would likely have improved the results; training the LSTM for more epochs and decaying the learning rate during training should also help.

not the end
