Image Feature Extraction Using a Siamese Network Model

We propose a method for computing robust image features by efficiently training a Siamese network model. The proposed method feeds two image patches into the model and trains it on the error between the output features. Furthermore, by categorizing input patches according to how difficult they are to discriminate and preferentially training on the hard ones, we extract features that are more robust than SIFT.

Approach

Siamese Network

Our approach consists of training a Convolutional Neural Network (CNN) to build a feature representation of an image patch. We train on two patches at a time, which either correspond to the same physical point, and should therefore have similar features, or to different points, and should have dissimilar features. We optimize this with a Siamese architecture: both patches are passed through the same network, and we minimize the L2 distance between their features if they correspond to the same point and maximize it if they correspond to different points. In order to learn efficient and discriminative representations, we propose a positive and negative mining approach that proves critical for performance.
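As a rough illustration of this setup, below is a minimal PyTorch sketch, not the paper's exact architecture, loss constants, or training schedule: a small convolutional network maps each 64x64 grayscale patch to a 128-D descriptor, and a hinge-style loss on the L2 distance pulls corresponding pairs together while pushing non-corresponding pairs apart. The layer sizes, margin, and learning rate are illustrative placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchDescriptor(nn.Module):
    """Small CNN mapping a grayscale patch to a 128-D descriptor
    (layer sizes simplified for illustration)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=7, stride=2), nn.Tanh(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=6), nn.Tanh(),
            nn.Conv2d(64, 128, kernel_size=5), nn.Tanh(),
        )

    def forward(self, x):
        x = self.features(x)                 # (N, 128, h, w)
        x = F.adaptive_avg_pool2d(x, 1)      # pool spatial dimensions away
        return x.flatten(1)                  # (N, 128) descriptors

def siamese_hinge_loss(desc_a, desc_b, same, margin=4.0):
    """Minimize L2 distance for corresponding pairs; push non-corresponding
    pairs at least `margin` apart (margin value is a placeholder)."""
    d = F.pairwise_distance(desc_a, desc_b)
    return (same * d + (1.0 - same) * F.relu(margin - d)).mean()

# One training step on a batch of 64x64 patch pairs; `same` is 1 for
# corresponding pairs and 0 for non-corresponding ones.
net = PatchDescriptor()
opt = torch.optim.SGD(net.parameters(), lr=0.01)
patches_a = torch.randn(16, 1, 64, 64)
patches_b = torch.randn(16, 1, 64, 64)
same = torch.randint(0, 2, (16,)).float()
loss = siamese_hinge_loss(net(patches_a), net(patches_b), same)
opt.zero_grad(); loss.backward(); opt.step()
```

The key point of the Siamese setup is that both patches pass through the same network with shared weights, so only a single descriptor network remains at test time.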

Results

We train and evaluate our approach on the Brown dataset [1]. We also provide evaluations on other datasets, including the DaLI dataset, and show that we significantly outperform existing approaches. Furthermore, our approach produces 128-dimensional descriptors that can be compared directly with the L2 distance, making it suitable as a drop-in replacement for SIFT.
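To make the drop-in aspect concrete, here is a brief sketch, assuming PyTorch and the `net` descriptor network from the sketch above, of matching two sets of patches by L2 nearest neighbour with a Lowe-style ratio test. `match_descriptors` and the patch tensors are illustrative placeholders, and the ratio test is standard descriptor-matching practice rather than something prescribed by the paper.

```python
import torch

def match_descriptors(desc_1, desc_2, ratio=0.8):
    """L2 nearest-neighbour matching with a ratio test.
    desc_1: (N, 128), desc_2: (M, 128)."""
    dists = torch.cdist(desc_1, desc_2)           # (N, M) pairwise L2 distances
    best2 = dists.topk(2, dim=1, largest=False)   # two nearest neighbours per row
    keep = best2.values[:, 0] < ratio * best2.values[:, 1]
    return torch.nonzero(keep).squeeze(1), best2.indices[keep, 0]

# Placeholder patch stacks standing in for patches cropped around keypoints.
patches_1 = torch.randn(200, 1, 64, 64)
patches_2 = torch.randn(180, 1, 64, 64)
with torch.no_grad():
    idx_1, idx_2 = match_descriptors(net(patches_1), net(patches_2))
```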

Precision-Recall Area Under Curve values for the Brown dataset.
Split   SIFT    BGM     L-BGM   BinBoost256   VGG     Ours
ND      0.349   0.487   0.495   0.549         0.663   0.667
YO      0.425   0.495   0.517   0.533         0.709   0.545
LY      0.226   0.268   0.355   0.410         0.558   0.608
All     0.370   0.440   0.508   0.550         0.693   0.756

For full details and results please refer to the full paper.

Publications

2015

Discriminative Learning of Deep Convolutional Feature Point Descriptors
Edgar Simo-Serra*, Eduard Trulls*, Luis Ferraz, Iasonas Kokkinos, Pascal Fua, Francesc Moreno-Noguer (* equal contribution)
International Conference on Computer Vision (ICCV), 2015
Deep learning has revolutionized image-level tasks such as image classification, but patch-level tasks, such as point correspondence, still rely on hand-crafted features such as SIFT. In this paper we use Convolutional Neural Networks (CNNs) to learn discriminant patch representations, and in particular train a Siamese network with pairs of (non-)corresponding patches. We deal with the large number of non-corresponding patches by combining stochastic sampling of the training set with an aggressive mining strategy biased towards patches that are hard to classify. Our models are fully convolutional, efficient to compute, amenable to modern GPUs, and can be used as a drop-in replacement for SIFT. We obtain consistent performance gains over the state of the art, and most importantly generalize well against scaling and rotation, perspective transformation, non-rigid deformation, and illumination changes.
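The mining strategy mentioned in the abstract can be sketched roughly as follows, reusing `net`, `siamese_hinge_loss`, `opt`, and the patch tensors from the earlier sketch. The general idea shown is forwarding an oversampled batch without gradients and backpropagating only through the hardest fraction of pairs; `keep_frac` and the batch sizes are placeholders, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def mine_hard_pairs(net, patches_a, patches_b, same, keep_frac=0.25):
    """Keep only the hardest pairs of an oversampled batch: matches with the
    largest descriptor distance and non-matches with the smallest."""
    with torch.no_grad():
        d = F.pairwise_distance(net(patches_a), net(patches_b))
    # Matches are hard when d is large; non-matches are hard when d is small.
    hardness = torch.where(same.bool(), d, -d)
    k = max(1, int(keep_frac * len(d)))
    idx = hardness.topk(k).indices
    return patches_a[idx], patches_b[idx], same[idx]

# Mine the hardest pairs, then run the usual Siamese update on them only.
pa, pb, s = mine_hard_pairs(net, patches_a, patches_b, same)
loss = siamese_hinge_loss(net(pa), net(pb), s)
opt.zero_grad(); loss.backward(); opt.step()
```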
@InProceedings{SimoSerraICCV2015,
   author    = {Edgar Simo-Serra and Eduard Trulls and Luis Ferraz and Iasonas Kokkinos and Pascal Fua and Francesc Moreno-Noguer},
   title     = {{Discriminative Learning of Deep Convolutional Feature Point Descriptors}},
   booktitle = {Proceedings of the International Conference on Computer Vision (ICCV)},
   year      = 2015,
}

Software

Deep Descriptor
Deep Descriptor, 1.0 (February 2016)
Deep Convolutional Feature Point Descriptors
This code is the implementation of the "Discriminative Learning of Deep Convolutional Feature Point Descriptors" paper. It contains the pre-trained models and example usage code.
[1] S. Winder, G. Hua, and M. Brown. Picking the Best DAISY. In CVPR, 2009.