Feature Extraction Research

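Feature extraction is a valuable area of information processing in which raw data is compressed and transformed into a useful representation. This research applies feature extraction to a variety of tasks, such as patch matching, and aims to extract better features.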

  • Image Feature Extraction Using a Siamese Network Model

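    We propose a method for computing robust image features by efficiently training a Siamese network model. In the proposed method, two image patches are fed to the model, and the model is trained on the error between the output features. In addition, the input patches are categorized by how difficult they are to discriminate, and the patches that are hard to discriminate are trained on preferentially, which yields features that are more robust than SIFT features.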

  • Geodesic Mixture Models

    There are many cases in which data is distributed on a Riemannian manifold. In these cases, Euclidean metrics are not applicable and one needs to resort to geodesic distances consistent with the manifold geometry. For this purpose, we draw inspiration from a variant of the expectation-maximization algorithm that uses a minimum message length criterion to automatically estimate the optimal number of components from multivariate data lying on a Euclidean space. In order to use this approach on Riemannian manifolds, we propose a formulation in which each component is defined on a different tangent space, thus avoiding the loss of accuracy produced when linearizing the manifold with a single tangent space. Our approach can be applied to any type of manifold for which it is possible to estimate its tangent space.

  • Deformation and Illumination Invariant Features

    DaLI descriptors are local image patch representations that have been shown to be robust to deformation and strong illumination changes. These descriptors are constructed by treating the image patch as a 3D surface and then simulating the diffusion of heat along the surface for different intervals of time. Small time intervals represent local deformation properties, while large time intervals represent global deformation properties. Additionally, by performing a logarithmic sampling and then a Fast Fourier Transform, it is possible to obtain robustness against non-linear illumination changes. For evaluation, we have created the first feature point dataset that focuses on deformation and illumination changes of real-world objects, on which we show that the DaLI descriptors outperform all the widely used descriptors.
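The role of the logarithmic sampling followed by a Fourier transform can be illustrated in isolation. The sketch below is not the DaLI implementation: the `signature` curve, the sampling base `2**(k/4)`, and the scale factor are assumptions chosen for illustration. It only shows the general property that a scaling of the time axis becomes a shift after logarithmic sampling, and that the Fourier magnitude discards (circular) shifts.

```python
import numpy as np

def signature(t):
    """Toy decay curve standing in for a heat-diffusion response (hypothetical)."""
    return np.exp(-0.3 * t) + 0.5 * np.exp(-2.0 * t)

# Logarithmic sampling of the time axis: t_k = 2**(k / 4).
k = np.arange(-32, 33)
t = 2.0 ** (k / 4.0)

base = signature(t)
scaled = signature(2.0 * t)  # same curve with the time axis scaled by 2

# Scaling time by 2 = 2**(4/4) moves the log-sampled curve by exactly 4 samples.
is_shift = np.allclose(scaled[:-4], base[4:])

def shift_invariant(samples):
    """Magnitude of the discrete Fourier transform of the samples."""
    return np.abs(np.fft.fft(samples))

# The Fourier magnitude is unchanged by a circular shift of the samples.
is_invariant = np.allclose(shift_invariant(np.roll(base, -4)),
                           shift_invariant(base))
```

On real signals the shift is not circular, so the invariance holds only approximately.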

Publications

3D Human Pose Tracking Priors using Geodesic Mixture Models
Edgar Simo-Serra, Carme Torras, Francesc Moreno-Noguer
International Journal of Computer Vision (IJCV) 122(2):388-408, 2016
We present a novel approach for learning a finite mixture model on a Riemannian manifold in which Euclidean metrics are not applicable and one needs to resort to geodesic distances consistent with the manifold geometry. For this purpose, we draw inspiration from a variant of the expectation-maximization algorithm that uses a minimum message length criterion to automatically estimate the optimal number of components from multivariate data lying on a Euclidean space. In order to use this approach on Riemannian manifolds, we propose a formulation in which each component is defined on a different tangent space, thus avoiding the problems associated with the loss of accuracy produced when linearizing the manifold with a single tangent space. Our approach can be applied to any type of manifold for which it is possible to estimate its tangent space. Additionally, we consider using shrinkage covariance estimation to improve the robustness of the method, especially when dealing with very sparsely distributed samples. We evaluate the approach in a number of situations, ranging from data clustering on manifolds to combining pose and kinematics of articulated bodies for 3D human pose tracking. In all cases, we demonstrate remarkable improvement compared to several chosen baselines.
@Article{SimoSerraIJCV2016,
   author    = {Edgar Simo-Serra and Carme Torras and Francesc Moreno-Noguer},
   title     = {{3D Human Pose Tracking Priors using Geodesic Mixture Models}},
   journal   = {International Journal of Computer Vision (IJCV)},
   volume    = {122},
   number    = {2},
   pages     = {388--408},
   year      = 2016,
}
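The shrinkage covariance estimation mentioned in the abstract above can be sketched as follows. This is a minimal illustration, assuming shrinkage toward a scaled identity matrix with a hand-picked weight `alpha`; the abstract does not state which shrinkage target or weight selection is used, so both are assumptions, as is the function name.

```python
import numpy as np

def shrinkage_covariance(samples, alpha):
    """Blend the sample covariance with a scaled identity (alpha in [0, 1]).

    The identity target and the fixed alpha are illustrative assumptions.
    """
    samples = np.asarray(samples, dtype=float)
    dim = samples.shape[1]
    cov = np.atleast_2d(np.cov(samples, rowvar=False))
    # Target with the same trace as the sample covariance.
    target = np.trace(cov) / dim * np.eye(dim)
    return (1.0 - alpha) * cov + alpha * target

# Very sparse data: 3 samples in 5 dimensions give a singular sample covariance.
rng = np.random.default_rng(0)
samples = rng.normal(size=(3, 5))
cov = np.cov(samples, rowvar=False)
shrunk = shrinkage_covariance(samples, alpha=0.2)

rank_before = np.linalg.matrix_rank(cov)
rank_after = np.linalg.matrix_rank(shrunk)
```

With so few samples the plain estimate is rank deficient and cannot be inverted, while the shrunk estimate is full rank and keeps the same trace.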
Fashion Style in 128 Floats: Joint Ranking and Classification using Weak Data for Feature Extraction
Edgar Simo-Serra and Hiroshi Ishikawa
Conference on Computer Vision and Pattern Recognition (CVPR), 2016
We propose a novel approach for learning features from weakly-supervised data by joint ranking and classification. In order to exploit data with weak labels, we jointly train a feature extraction network with a ranking loss and a classification network with a cross-entropy loss. We obtain high-quality compact discriminative features with few parameters, learned on relatively small datasets without additional annotations. This enables us to tackle tasks with specialized images not very similar to the more generic ones in existing fully-supervised datasets. We show that the resulting features in combination with a linear classifier surpass the state-of-the-art on the Hipster Wars dataset despite using features only 0.3% of the size. Our proposed features significantly outperform those obtained from networks trained on ImageNet, despite being 32 times smaller (128 single-precision floats), trained on noisy and weakly-labeled data, and using only 1.5% of the number of parameters.
@InProceedings{SimoSerraCVPR2016,
   author    = {Edgar Simo-Serra and Hiroshi Ishikawa},
   title     = {{Fashion Style in 128 Floats: Joint Ranking and Classification using Weak Data for Feature Extraction}},
   booktitle = "Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR)",
   year      = 2016,
}
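The joint training objective described in the abstract above, a ranking loss on the features plus a cross-entropy loss on a classifier, can be sketched as below. This is an illustration under assumptions, not the paper's formulation: the hinge-based triplet loss, the `margin`, and the `weight` balancing the two terms are stand-ins chosen for simplicity.

```python
import numpy as np

def triplet_ranking_loss(anchor, similar, dissimilar, margin=1.0):
    """Hinge ranking loss: similar items should be closer than dissimilar ones."""
    d_pos = np.linalg.norm(anchor - similar, axis=1)
    d_neg = np.linalg.norm(anchor - dissimilar, axis=1)
    return np.maximum(0.0, margin + d_pos - d_neg).mean()

def cross_entropy_loss(logits, labels):
    """Mean softmax cross-entropy, computed in a numerically stable way."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def joint_loss(anchor, similar, dissimilar, logits, labels, weight=1.0):
    """Sum of the ranking term and the weighted classification term."""
    ranking = triplet_ranking_loss(anchor, similar, dissimilar)
    return ranking + weight * cross_entropy_loss(logits, labels)

# A 128-dimensional single-precision feature occupies 512 bytes.
feature_bytes = np.zeros(128, dtype=np.float32).nbytes
```

In practice both terms would be backpropagated through the networks with an automatic differentiation framework; the NumPy version only makes the structure of the objective explicit.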
Discriminative Learning of Deep Convolutional Feature Point Descriptors
Edgar Simo-Serra*, Eduard Trulls*, Luis Ferraz, Iasonas Kokkinos, Pascal Fua, Francesc Moreno-Noguer (* equal contribution)
International Conference on Computer Vision (ICCV), 2015
Deep learning has revolutionized image-level tasks, e.g., image classification, but patch-level tasks, e.g., point correspondence, still rely on hand-crafted features such as SIFT. In this paper, we use Convolutional Neural Networks (CNNs) to learn discriminant patch representations and, in particular, train a Siamese network with pairs of (non-)corresponding patches. We deal with the large number of non-corresponding patches through a combination of stochastic sampling of the training set and an aggressive mining strategy biased towards patches that are hard to classify. Our models are fully convolutional, efficient to compute and amenable to modern GPUs, and can be used as a drop-in replacement for SIFT. We obtain consistent performance gains over the state of the art, and most importantly generalize well against scaling and rotation, perspective transformation, non-rigid deformation, and illumination changes.
@InProceedings{SimoSerraICCV2015,
   author    = {Edgar Simo-Serra and Eduard Trulls and Luis Ferraz and Iasonas Kokkinos and Pascal Fua and Francesc Moreno-Noguer},
   title     = {{Discriminative Learning of Deep Convolutional Feature Point Descriptors}},
   booktitle = "Proceedings of the International Conference on Computer Vision (ICCV)",
   year      = 2015,
}
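The training scheme in the abstract above, pairs of (non-)corresponding patches with mining biased towards hard examples, can be sketched on precomputed descriptors. This is a minimal sketch under assumptions: the hinge-style pair loss, the `margin`, and the `keep_ratio` are illustrative choices, and the random arrays stand in for the outputs of the two network branches.

```python
import numpy as np

def pair_losses(desc_a, desc_b, is_match, margin=4.0):
    """Per-pair loss on the L2 distance between two descriptors."""
    dist = np.linalg.norm(desc_a - desc_b, axis=1)
    # Corresponding pairs are pulled together; non-corresponding pairs are
    # pushed apart until their distance reaches `margin`.
    return np.where(is_match, dist, np.maximum(0.0, margin - dist))

def mine_hard_pairs(losses, keep_ratio=0.25):
    """Indices of the pairs with the largest loss, i.e. the hardest ones."""
    n_keep = max(1, int(round(len(losses) * keep_ratio)))
    return np.argsort(losses)[::-1][:n_keep]

# Toy batch: random vectors stand in for the descriptors of each branch.
rng = np.random.default_rng(0)
desc_a = rng.normal(size=(8, 128))
desc_b = rng.normal(size=(8, 128))
is_match = np.array([True, False] * 4)

losses = pair_losses(desc_a, desc_b, is_match)
hard = mine_hard_pairs(losses, keep_ratio=0.25)
batch_loss = losses[hard].mean()  # only the hardest pairs contribute
```

Restricting the update to the hardest pairs keeps the many easy non-corresponding pairs, whose loss is already zero, from dominating the batch.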
Lie Algebra-Based Kinematic Prior for 3D Human Pose Tracking
Edgar Simo-Serra, Carme Torras, Francesc Moreno-Noguer
International Conference on Machine Vision Applications (MVA) [best paper], 2015
We propose a novel kinematic prior for 3D human pose tracking that allows predicting the position in subsequent frames given the current position. We first define a Riemannian manifold that models the pose and extend it with its Lie algebra to also be able to represent the kinematics. We then learn a joint Gaussian mixture model of both the human pose and the kinematics on this manifold. Finally, by conditioning the kinematics on the pose, we are able to obtain a distribution of poses for subsequent frames that can be used as a reliable prior in 3D human pose tracking. Our model scales well to large amounts of data and can be sampled at over 100,000 samples/second. We show it outperforms the widely used Gaussian diffusion model on the challenging Human3.6M dataset.
@InProceedings{SimoSerraMVA2015,
   author    = {Edgar Simo-Serra and Carme Torras and Francesc Moreno-Noguer},
   title     = {{Lie Algebra-Based Kinematic Prior for 3D Human Pose Tracking}},
   booktitle = "International Conference on Machine Vision Applications (MVA)",
   year      = 2015,
}
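The conditioning step in the abstract above can be illustrated for a single Gaussian component. The sketch assumes a flat vector space with the variables ordered as [pose, kinematics]; the paper works with a mixture on a manifold, so this shows only the standard Gaussian conditioning formula, and the function name and argument layout are hypothetical.

```python
import numpy as np

def condition_gaussian(mean, cov, n_pose, pose):
    """Distribution of the kinematics given an observed pose.

    `mean` and `cov` describe a joint Gaussian over [pose, kinematics];
    the first `n_pose` entries are the pose variables.
    """
    mu_p, mu_k = mean[:n_pose], mean[n_pose:]
    s_pp = cov[:n_pose, :n_pose]
    s_kp = cov[n_pose:, :n_pose]
    s_kk = cov[n_pose:, n_pose:]
    # gain = S_kp S_pp^{-1}, computed with a linear solve (S_pp is symmetric).
    gain = np.linalg.solve(s_pp, s_kp.T).T
    cond_mean = mu_k + gain @ (pose - mu_p)
    cond_cov = s_kk - gain @ s_kp.T
    return cond_mean, cond_cov

# One pose variable and one kinematic variable with correlation 0.8.
mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])
cond_mean, cond_cov = condition_gaussian(mean, cov, n_pose=1, pose=np.array([1.0]))
```

Observing the pose shifts the expected kinematics towards it and reduces its variance, which is what makes the conditional distribution usable as a prior for the next frame.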
DaLI: Deformation and Light Invariant Descriptor
Edgar Simo-Serra, Carme Torras, Francesc Moreno-Noguer
International Journal of Computer Vision (IJCV) 115(2):136-154, 2015
Recent advances in 3D shape analysis and recognition have shown that heat diffusion theory can be effectively used to describe local features of deforming and scaling surfaces. In this paper, we show how this description can be used to characterize 2D image patches, and introduce DaLI, a novel feature point descriptor with high resilience to non-rigid image transformations and illumination changes. In order to build the descriptor, 2D image patches are initially treated as 3D surfaces. Patches are then described in terms of a heat kernel signature, which captures both local and global information, and shows a high degree of invariance to non-linear image warps. In addition, by further applying a logarithmic sampling and a Fourier transform, invariance to photometric changes is achieved. Finally, the descriptor is compacted by mapping it onto a low-dimensional subspace computed using Principal Component Analysis, allowing for efficient matching. A thorough experimental validation demonstrates that DaLI is significantly more discriminative and robust to illumination changes and image transformations than state-of-the-art descriptors, even those specifically designed to describe non-rigid deformations.
@Article{SimoSerraIJCV2015,
   author    = {Edgar Simo-Serra and Carme Torras and Francesc Moreno-Noguer},
   title     = {{DaLI: Deformation and Light Invariant Descriptor}},
   journal   = {International Journal of Computer Vision (IJCV)},
   volume    = {115},
   number    = {2},
   pages     = {136--154},
   year      = 2015,
}
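The final compaction step in the abstract above, mapping descriptors onto a low-dimensional subspace found with Principal Component Analysis, can be sketched as follows. The descriptor dimension, the number of components, and the random data are assumptions for illustration, and the function names are hypothetical.

```python
import numpy as np

def fit_pca(descriptors, n_components):
    """Return the mean and the top principal directions of the descriptors."""
    mean = descriptors.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(descriptors - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(descriptors, mean, basis):
    """Map descriptors onto the low-dimensional subspace."""
    return (descriptors - mean) @ basis.T

# Toy data: 50 descriptors of dimension 20, compacted to 5 dimensions.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(50, 20))
mean, basis = fit_pca(descriptors, n_components=5)
compact = project(descriptors, mean, basis)
```

Matching then operates on the shorter vectors, which reduces both storage and the cost of each distance computation.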
Geodesic Finite Mixture Models
Edgar Simo-Serra, Carme Torras, Francesc Moreno-Noguer
British Machine Vision Conference (BMVC), 2014
We present a novel approach for learning a finite mixture model on a Riemannian manifold in which Euclidean metrics are not applicable and one needs to resort to geodesic distances consistent with the manifold geometry. For this purpose, we draw inspiration from a variant of the expectation-maximization algorithm that uses a minimum message length criterion to automatically estimate the optimal number of components from multivariate data lying on a Euclidean space. In order to use this approach on Riemannian manifolds, we propose a formulation in which each component is defined on a different tangent space, thus avoiding the problems associated with the loss of accuracy produced when linearizing the manifold with a single tangent space. Our approach can be applied to any type of manifold for which it is possible to estimate its tangent space. In particular, we show results on synthetic examples of a sphere and a quadric surface and on a large and complex dataset of human poses, where the proposed model is used as a regression tool for hypothesizing the geometry of occluded parts of the body.
@InProceedings{SimoSerraBMVC2014,
   author    = {Edgar Simo-Serra and Carme Torras and Francesc Moreno-Noguer},
   title     = {{Geodesic Finite Mixture Models}},
   booktitle = "Proceedings of the British Machine Vision Conference (BMVC)",
   year      = 2014,
}
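The motivation for giving each component its own tangent space can be illustrated on the unit sphere, one of the synthetic examples mentioned in the abstract above. The sketch below is not the GFMM code; it assumes the unit sphere in 3D with its standard logarithmic and exponential maps, and the function names are hypothetical.

```python
import numpy as np

def log_map(p, x):
    """Map a point x on the unit sphere to the tangent space at p.

    Undefined when x is antipodal to p.
    """
    cos_theta = np.clip(np.dot(p, x), -1.0, 1.0)
    theta = np.arccos(cos_theta)        # geodesic distance from p to x
    if theta < 1e-12:
        return np.zeros_like(p)
    direction = x - cos_theta * p       # part of x orthogonal to p
    return theta * direction / np.linalg.norm(direction)

def exp_map(p, v):
    """Map a tangent vector v at p back onto the unit sphere."""
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return p.copy()
    return np.cos(theta) * p + np.sin(theta) * v / theta

north = np.array([0.0, 0.0, 1.0])
a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])

geodesic = np.arccos(np.dot(a, b))                                     # true distance
dist_at_pole = np.linalg.norm(log_map(north, a) - log_map(north, b))  # far tangent space
dist_at_a = np.linalg.norm(log_map(a, a) - log_map(a, b))             # local tangent space
```

Here the true geodesic distance between `a` and `b` is pi/2. The tangent space at the distant pole stretches it to pi/2 times the square root of 2, while the tangent space centered at `a` reproduces it exactly, which is why a tangent space centered on each component loses less accuracy than a single shared one.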

Software

GFMM, 2.0 (July 2016)
Geodesic Finite Mixture Models
This code is an implementation of the Geodesic Finite Mixture Models written in MATLAB. The core of the algorithm consists of a single file that can be called to perform the clustering. Additionally, several examples are provided to generate the figures from the paper.
StyleNet, 1.0 (June 2016)
Fashion style in 128 floats
This code is the implementation of the paper "Fashion Style in 128 Floats: Joint Ranking and Classification using Weak Data for Feature Extraction". It contains the best performing feature extraction model explained in the paper.
Deep Descriptor, 1.0 (February 2016)
Deep Convolutional Feature Point Descriptors
This code is the implementation of the "Discriminative Learning of Deep Convolutional Feature Point Descriptors" paper. It contains the pre-trained models and example usage code.
DaLI, 1.0 (January 2015)
Deformation and Light Invariant feature point descriptor
This is an implementation of the Deformation and Light Invariant (DaLI) descriptor. The core of the library is written in C. Additionally, a Matlab/Octave interface is provided.
ceigs, 1.1 (January 2012)
C Wrapper for the ARPACK (Arnoldi Iteration) Library
This is a simple C frontend for ARPACK. It allows easy access to calculating a subset of eigenvectors and eigenvalues of sparse matrices. Specifically, it can solve two problems:
  - Av = vd
  - Av = Mvd
where A and M are sparse matrices, v is the subset of eigenvectors, and d is the diagonal matrix of eigenvalues.
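The two problems can be checked numerically on a small example. The sketch below does not use ceigs or ARPACK: it assumes small dense symmetric matrices and plain NumPy, computes all eigenpairs rather than a subset, and reduces the generalized problem to a standard one through a Cholesky factorization of M. The matrices are made up for illustration.

```python
import numpy as np

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
M = np.diag([1.0, 2.0, 3.0])  # symmetric positive definite

# Standard problem: A v = v d
evals, v = np.linalg.eigh(A)
d = np.diag(evals)
residual_standard = np.abs(A @ v - v @ d).max()

# Generalized problem: A v = M v d
# With M = L L^T, solve the standard problem for C = L^{-1} A L^{-T},
# then map the eigenvectors back with v = L^{-T} y.
L = np.linalg.cholesky(M)
L_inv = np.linalg.inv(L)
C = L_inv @ A @ L_inv.T
gen_evals, y = np.linalg.eigh(C)
gen_v = L_inv.T @ y
gen_d = np.diag(gen_evals)
residual_generalized = np.abs(A @ gen_v - M @ gen_v @ gen_d).max()
```

Both residuals are at the level of floating-point round-off, confirming that the columns of `v` and `gen_v` satisfy the two relations. For large sparse matrices, an Arnoldi-based solver such as ARPACK computes only the requested subset instead of the full decomposition.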

Datasets

DaLI Dataset
Dataset for evaluating the illumination and deformation invariance of local image patch feature descriptors.
We present a dataset for the evaluation of deformation and illumination invariance of local image patch feature point descriptors. The dataset consists of 192 unique 640x480 grayscale images corresponding to 12 different objects. Points of interest were obtained with the Difference of Gaussians (DoG) detector and were manually matched in order to obtain a ground truth mapping.