Deep Aggregation of Regional Convolutional Activations for Content Based Image Retrieval
Abstract
A key challenge in deep-learning-based image retrieval remains the aggregation of convolutional activations into a single, highly representative feature vector. Ideally, this descriptor should encode semantic, spatial and low-level information. Even though off-the-shelf pre-trained neural networks can already produce good representations in combination with aggregation methods, appropriate fine-tuning for the task of image retrieval has been shown to boost retrieval performance significantly. In this paper, we present a simple yet effective supervised aggregation method built on top of existing regional pooling approaches. In addition to the maximum activation of a given region, we calculate regional average activations of the extracted feature maps. A weight is then learned for each pooled feature vector, and the vectors are combined by weighted aggregation into a single feature vector. Furthermore, we apply our newly proposed NRA loss function for deep metric learning to fine-tune the backbone neural network and to learn the aggregation weights. Our method achieves state-of-the-art results on the INRIA Holidays data set and competitive results on the Oxford Buildings and Paris data sets while significantly reducing the training time.
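The aggregation step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The region layout, the L2 normalization of each pooled vector, the function names, and the constant weights are all assumptions made for illustration. In the paper the weights are learned jointly with the backbone using the NRA loss, which is not reproduced here.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    # Scale vectors along the last axis to unit length.
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)

def regional_pool(fmap, regions):
    # fmap: feature map of shape (C, H, W).
    # regions: list of (y0, y1, x0, x1) boxes; this layout is hypothetical.
    # Returns an array of shape (2 * R, C): for each region, the max-pooled
    # vector followed by the average-pooled vector.
    pooled = []
    for y0, y1, x0, x1 in regions:
        patch = fmap[:, y0:y1, x0:x1]
        pooled.append(patch.max(axis=(1, 2)))
        pooled.append(patch.mean(axis=(1, 2)))
    return np.stack(pooled)

def aggregate(fmap, regions, weights):
    # Weighted sum of the pooled regional vectors, one weight per vector.
    # Normalizing each vector before the sum is an assumption, not taken
    # from the source.
    vectors = l2_normalize(regional_pool(fmap, regions))
    combined = (weights[:, None] * vectors).sum(axis=0)
    return l2_normalize(combined)

rng = np.random.default_rng(0)
fmap = rng.random((8, 6, 6))                      # toy feature map, C=8
regions = [(0, 6, 0, 6), (0, 3, 0, 3), (3, 6, 3, 6)]
weights = np.ones(2 * len(regions))               # placeholder for learned weights
descriptor = aggregate(fmap, regions, weights)
```

With all weights set to one, the sketch reduces to a plain sum of the max-pooled and average-pooled regional vectors. Learning the weights lets training decide how much each region and each pooling type contributes to the final descriptor.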
BibTeX
If you use our work in your research, please cite our publication:
@INPROCEEDINGS{8901787,
author={Schall, Konstantin and Barthel, Kai Uwe and Hezel, Nico and Jung, Klaus},
booktitle={2019 IEEE 21st International Workshop on Multimedia Signal Processing (MMSP)},
title={Deep Aggregation of Regional Convolutional Activations for Content Based Image Retrieval},
year={2019},
volume={},
number={},
pages={1-6},
keywords={Computational and artificial intelligence;Multilayer neural network;Image retrieval;Content-based retrieval;Machine learning;Feature extraction;Machine learning algorithms;Nearest neighbor searches;Computer vision},
doi={10.1109/MMSP.2019.8901787}
}