|Title||High Level Semantic Video Understanding|
|Institution||University of Zurich|
|Faculty||Faculty of Business, Economics and Informatics|
|Abstract Text||High-level semantic video understanding deals with the problem of extracting basic insights from movies, such as interpersonal relationships, relationships to other entities, and interpersonal interactions. The Deep Video Understanding (DVU) Challenge focuses on this problem and organizes an annual competition in which participants answer a predefined set of queries. This thesis was written in the context of the Deep Video Understanding Challenge 2021 and describes a pipeline that answers these queries at the movie level and at the scene level. The pipeline begins with a scene segmentation engine that splits the scenes into individual shots and keyframes. These are then processed by two streams, each consisting of several feature extraction models: one stream focuses on the visual component, the other on the audio component. The extracted features are then combined and processed further. Numerous classifiers are trained and used to predict interpersonal relationships, relationships with other entities, and interpersonal interactions. At the movie level, a knowledge graph is then built that reflects all relationships between all entities of a movie, and this graph is used to answer the movie-level queries. With this approach, 8% of the movie-level queries and 1.5% of the scene-level queries were answered correctly. The other pipelines in the DVU Challenge 2021 achieved better results: the lowest movie-level score among them was 17% of queries answered correctly, and the lowest scene-level score was 27%.|
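The abstract states that classifier outputs are collected into a movie-level knowledge graph, which is then queried. The sketch below illustrates that general idea only, assuming predictions are stored as directed (subject, relation, object) triples. The class name `MovieKnowledgeGraph`, its methods, the relation labels, and the character names are all hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


class MovieKnowledgeGraph:
    """Hypothetical movie-level knowledge graph of (subject, relation, object) triples."""

    def __init__(self):
        # subject -> list of (relation, object) edges
        self._edges = defaultdict(list)

    def add_relation(self, subject, relation, obj):
        # Store one predicted relationship as a directed, labeled edge.
        self._edges[subject].append((relation, obj))

    def relations_between(self, subject, obj):
        # Answer "How is <subject> related to <obj>?" by listing edge labels.
        return [rel for rel, o in self._edges.get(subject, []) if o == obj]

    def objects_of(self, subject, relation):
        # Answer "Who is <relation> of <subject>?" by following labeled edges.
        return [o for rel, o in self._edges.get(subject, []) if rel == relation]


# Hypothetical classifier outputs for a single movie.
kg = MovieKnowledgeGraph()
kg.add_relation("Anna", "parent_of", "Ben")
kg.add_relation("Anna", "works_with", "Carl")
kg.add_relation("Ben", "friend_of", "Carl")

print(kg.relations_between("Anna", "Ben"))  # ['parent_of']
print(kg.objects_of("Anna", "works_with"))  # ['Carl']
```

Keeping edges directed lets asymmetric relations such as `parent_of` be answered correctly in both directions. A real pipeline would also need to resolve entity identities across scenes before adding edges.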