(Deep) Learning from Frames

Jônatas Wehrmann

doi:10.1235/bracis.vi.121

Published Jul 2, 2017

Abstract

Learning content from videos is difficult, and traditional machine learning approaches for computer vision struggle to do it satisfactorily. However, in the past couple of years the machine learning community has seen the rise of deep learning methods, such as Convolutional Neural Networks (ConvNets), that significantly improve the accuracy of several computer vision applications. In this paper, we explore the suitability of ConvNets for the movie trailer genre classification problem. Assigning genres to movies is particularly challenging because genre is an immaterial feature that is not physically present in a movie frame, so off-the-shelf image detection models cannot be directly applied to this context. Hence, we propose a novel classification method, namely CoNNECT, that encapsulates multiple distinct ConvNets to perform genre classification, where each ConvNet learns features that capture distinct aspects of the movie frames. We compare our approach with the current state-of-the-art techniques for movie classification, which make use of well-known image descriptors and low-level handcrafted features. Results show that CoNNECT significantly outperforms the state-of-the-art approaches in this task, moving towards effectively solving the genre classification problem.
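
The abstract does not say how the outputs of the individual ConvNets are combined, so the following is only a minimal sketch of one common option (late fusion by averaging), not the CoNNECT method itself. It assumes each network has already produced per-frame class probabilities for a trailer. The genre list, the network names, and the function names are all hypothetical and chosen for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical label set; the abstract does not list the genres used.
GENRES = ["action", "comedy", "drama", "horror"]


def average_over_frames(frame_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame class probabilities into one trailer-level vector."""
    n_frames = len(frame_probs)
    n_classes = len(frame_probs[0])
    return [sum(frame[c] for frame in frame_probs) / n_frames
            for c in range(n_classes)]


def fuse_networks(per_network: Sequence[Sequence[float]]) -> List[float]:
    """Average the trailer-level vectors of several networks (late fusion).

    Averaging is an assumption made for this sketch; the source does not
    describe the fusion strategy.
    """
    n_nets = len(per_network)
    n_classes = len(per_network[0])
    return [sum(net[c] for net in per_network) / n_nets
            for c in range(n_classes)]


def predict_genre(outputs_by_network: Dict[str, Sequence[Sequence[float]]]) -> str:
    """Return the genre with the highest fused probability."""
    trailer_level = [average_over_frames(fp)
                     for fp in outputs_by_network.values()]
    fused = fuse_networks(trailer_level)
    best = max(range(len(fused)), key=fused.__getitem__)
    return GENRES[best]


# Toy per-frame probabilities for two hypothetical networks, two frames each.
toy_outputs = {
    "net_a": [[0.7, 0.1, 0.1, 0.1], [0.5, 0.2, 0.2, 0.1]],
    "net_b": [[0.2, 0.1, 0.6, 0.1], [0.4, 0.1, 0.4, 0.1]],
}
print(predict_genre(toy_outputs))
```

Averaging keeps the sketch simple and treats every network and frame equally. A learned combiner over the per-network outputs would be another plausible design, but nothing in the abstract indicates which choice the paper makes.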

How to Cite

WEHRMANN, Jônatas. (Deep) Learning from Frames. BRACIS, [S.l.], july 2017. Available at: <http://143.54.25.88/index.php/bracis/article/view/121>. Date accessed: 10 july 2026. doi: https://doi.org/10.1235/bracis.vi.121.

Issue

2016: BRACIS

Section

Articles
