Feature Extraction Research

Feature extraction is a fundamental part of data processing: it converts raw data into compact, useful representations and applies to many different types of problems. The research here focuses on extracting compact and discriminative features for a variety of problems, such as matching image patches or finding images with a similar style.

  • Deep Convolutional Feature Point Descriptors

    We learn compact discriminative feature point descriptors using a convolutional neural network. We directly optimize for the L2 distance by training a Siamese architecture with pairs of corresponding and non-corresponding patches, which should map to small and large distances respectively. We deal with the large number of potential pairs by combining stochastic sampling of the training set with an aggressive mining strategy biased towards patches that are hard to classify. The resulting 128-dimensional descriptor can be used as a drop-in replacement in any task involving SIFT. We show that this descriptor generalizes well to various datasets.

  • Geodesic Finite Mixture Models

    There are many cases in which data is distributed on a Riemannian manifold. In these cases, Euclidean metrics are not applicable and one needs to resort to geodesic distances consistent with the manifold geometry. For this purpose, we draw inspiration from a variant of the expectation-maximization algorithm that uses a minimum message length criterion to automatically estimate the optimal number of components from multivariate data lying on a Euclidean space. In order to use this approach on Riemannian manifolds, we propose a formulation in which each component is defined on a different tangent space, thus avoiding the loss of accuracy produced when linearizing the manifold with a single tangent space. Our approach can be applied to any type of manifold for which it is possible to estimate its tangent space.

  • Deformation and Light Invariant Descriptor

    DaLI descriptors are local image patch representations that have been shown to be robust to deformation and strong illumination changes. These descriptors are constructed by treating the image patch as a 3D surface and then simulating the diffusion of heat along the surface for different intervals of time. Small time intervals represent local deformation properties while large time intervals represent global deformation properties. Additionally, by performing a logarithmic sampling and then a Fast Fourier Transform, it is possible to obtain robustness against non-linear illumination changes. To evaluate the descriptors, we created the first feature point dataset that focuses on deformation and illumination changes of real-world objects, on which the DaLI descriptors outperform all the widely used descriptors.
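
As a rough illustration of the heat diffusion idea behind DaLI, the following Python sketch computes a heat kernel signature at logarithmically spaced times. It is not the DaLI implementation: it assumes a toy unweighted path graph (whose Laplacian eigenpairs are known in closed form) in place of the patch surface mesh, it omits the Fourier transform step, and all function names are hypothetical.

```python
import math


def path_graph_eigenpairs(n):
    """Closed-form Laplacian eigenpairs of an unweighted path graph with n nodes.

    Assumption for illustration only: a real patch would be a 2D mesh, not a path.
    """
    eigenvalues = [2.0 - 2.0 * math.cos(math.pi * k / n) for k in range(n)]
    eigenvectors = []
    for k in range(n):
        # Normalisation that makes each eigenvector unit length.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        eigenvectors.append(
            [scale * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n)]
        )
    return eigenvalues, eigenvectors


def heat_kernel_signature(eigenvalues, eigenvectors, node, times):
    """HKS(node, t) = sum_k exp(-lambda_k * t) * phi_k(node)^2 for each t."""
    return [
        sum(
            math.exp(-lam * t) * phi[node] ** 2
            for lam, phi in zip(eigenvalues, eigenvectors)
        )
        for t in times
    ]


def log_spaced_times(t_min, t_max, count):
    """Times spaced uniformly in log(t), covering short and long diffusion scales."""
    step = (math.log(t_max) - math.log(t_min)) / (count - 1)
    return [math.exp(math.log(t_min) + i * step) for i in range(count)]


eigenvalues, eigenvectors = path_graph_eigenpairs(16)
times = log_spaced_times(0.1, 1000.0, 8)
signature_end = heat_kernel_signature(eigenvalues, eigenvectors, 0, times)
signature_mid = heat_kernel_signature(eigenvalues, eigenvectors, 8, times)
```

At short times the endpoint of the path retains more heat than an interior node because it has fewer neighbours, while at long times both signatures approach the same uniform value. This is the sense in which short intervals capture local structure and long intervals capture global structure.
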

Publications

3D Human Pose Tracking Priors using Geodesic Mixture Models
Edgar Simo-Serra, Carme Torras, Francesc Moreno-Noguer
International Journal of Computer Vision (IJCV) 122(2):388-408, 2016
We present a novel approach for learning a finite mixture model on a Riemannian manifold in which Euclidean metrics are not applicable and one needs to resort to geodesic distances consistent with the manifold geometry. For this purpose, we draw inspiration from a variant of the expectation-maximization algorithm that uses a minimum message length criterion to automatically estimate the optimal number of components from multivariate data lying on a Euclidean space. In order to use this approach on Riemannian manifolds, we propose a formulation in which each component is defined on a different tangent space, thus avoiding the loss of accuracy produced when linearizing the manifold with a single tangent space. Our approach can be applied to any type of manifold for which it is possible to estimate its tangent space. Additionally, we consider using shrinkage covariance estimation to improve the robustness of the method, especially when dealing with very sparsely distributed samples. We evaluate the approach in a number of situations, ranging from data clustering on manifolds to combining pose and kinematics of articulated bodies for 3D human pose tracking. In all cases, we demonstrate remarkable improvement compared to several chosen baselines.
@Article{SimoSerraIJCV2016,
   author    = {Edgar Simo-Serra and Carme Torras and Francesc Moreno-Noguer},
   title     = {{3D Human Pose Tracking Priors using Geodesic Mixture Models}},
   journal   = {International Journal of Computer Vision (IJCV)},
   volume    = {122},
   number    = {2},
   pages     = {388--408},
   year      = 2016,
}
Fashion Style in 128 Floats: Joint Ranking and Classification using Weak Data for Feature Extraction
Edgar Simo-Serra and Hiroshi Ishikawa
Conference on Computer Vision and Pattern Recognition (CVPR), 2016
We propose a novel approach for learning features from weakly-supervised data by joint ranking and classification. In order to exploit data with weak labels, we jointly train a feature extraction network with a ranking loss and a classification network with a cross-entropy loss. We obtain high-quality compact discriminative features with few parameters, learned on relatively small datasets without additional annotations. This enables us to tackle tasks with specialized images not very similar to the more generic ones in existing fully-supervised datasets. We show that the resulting features in combination with a linear classifier surpass the state-of-the-art on the Hipster Wars dataset despite using features only 0.3% of the size. Our proposed features significantly outperform those obtained from networks trained on ImageNet, despite being 32 times smaller (128 single-precision floats), trained on noisy and weakly-labeled data, and using only 1.5% of the number of parameters.
@InProceedings{SimoSerraCVPR2016,
   author    = {Edgar Simo-Serra and Hiroshi Ishikawa},
   title     = {{Fashion Style in 128 Floats: Joint Ranking and Classification using Weak Data for Feature Extraction}},
   booktitle = "Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR)",
   year      = 2016,
}
Discriminative Learning of Deep Convolutional Feature Point Descriptors
Edgar Simo-Serra*, Eduard Trulls*, Luis Ferraz, Iasonas Kokkinos, Pascal Fua, Francesc Moreno-Noguer (* equal contribution)
International Conference on Computer Vision (ICCV), 2015
Deep learning has revolutionized image-level tasks, e.g. image classification, but patch-level tasks, e.g. point correspondence, still rely on hand-crafted features, such as SIFT. In this paper we use Convolutional Neural Networks (CNNs) to learn discriminant patch representations and in particular train a Siamese network with pairs of (non-)corresponding patches. We deal with the large number of non-corresponding patches by combining stochastic sampling of the training set with an aggressive mining strategy biased towards patches that are hard to classify. Our models are fully convolutional, efficient to compute and amenable to modern GPUs, and can be used as a drop-in replacement for SIFT. We obtain consistent performance gains over the state of the art, and most importantly generalize well against scaling and rotation, perspective transformation, non-rigid deformation, and illumination changes.
@InProceedings{SimoSerraICCV2015,
   author    = {Edgar Simo-Serra and Eduard Trulls and Luis Ferraz and Iasonas Kokkinos and Pascal Fua and Francesc Moreno-Noguer},
   title     = {{Discriminative Learning of Deep Convolutional Feature Point Descriptors}},
   booktitle = "Proceedings of the International Conference on Computer Vision (ICCV)",
   year      = 2015,
}
Lie Algebra-Based Kinematic Prior for 3D Human Pose Tracking
Edgar Simo-Serra, Carme Torras, Francesc Moreno-Noguer
International Conference on Machine Vision Applications (MVA) [best paper], 2015
We propose a novel kinematic prior for 3D human pose tracking that allows predicting the position in subsequent frames given the current position. We first define a Riemannian manifold that models the pose and extend it with its Lie algebra to also be able to represent the kinematics. We then learn a joint Gaussian mixture model of both the human pose and the kinematics on this manifold. Finally, by conditioning the kinematics on the pose, we are able to obtain a distribution of poses for subsequent frames that can be used as a reliable prior in 3D human pose tracking. Our model scales well to large amounts of data and can be sampled at over 100,000 samples/second. We show it outperforms the widely used Gaussian diffusion model on the challenging Human3.6M dataset.
@InProceedings{SimoSerraMVA2015,
   author    = {Edgar Simo-Serra and Carme Torras and Francesc Moreno-Noguer},
   title     = {{Lie Algebra-Based Kinematic Prior for 3D Human Pose Tracking}},
   booktitle = "International Conference on Machine Vision Applications (MVA)",
   year      = 2015,
}
DaLI: Deformation and Light Invariant Descriptor
Edgar Simo-Serra, Carme Torras, Francesc Moreno-Noguer
International Journal of Computer Vision (IJCV) 115(2):136-154, 2015
Recent advances in 3D shape analysis and recognition have shown that heat diffusion theory can be effectively used to describe local features of deforming and scaling surfaces. In this paper, we show how this description can be used to characterize 2D image patches, and introduce DaLI, a novel feature point descriptor with high resilience to non-rigid image transformations and illumination changes. In order to build the descriptor, 2D image patches are initially treated as 3D surfaces. Patches are then described in terms of a heat kernel signature, which captures both local and global information, and shows a high degree of invariance to non-linear image warps. In addition, by further applying a logarithmic sampling and a Fourier transform, invariance to photometric changes is achieved. Finally, the descriptor is compacted by mapping it onto a low-dimensional subspace computed using Principal Component Analysis, allowing for efficient matching. A thorough experimental validation demonstrates that DaLI is significantly more discriminative and robust to illumination changes and image transformations than state-of-the-art descriptors, even those specifically designed to describe non-rigid deformations.
@Article{SimoSerraIJCV2015,
   author    = {Edgar Simo-Serra and Carme Torras and Francesc Moreno-Noguer},
   title     = {{DaLI: Deformation and Light Invariant Descriptor}},
   journal   = {International Journal of Computer Vision (IJCV)},
   volume    = {115},
   number    = {2},
   pages     = {136--154},
   year      = 2015,
}
Geodesic Finite Mixture Models
Edgar Simo-Serra, Carme Torras, Francesc Moreno-Noguer
British Machine Vision Conference (BMVC), 2014
We present a novel approach for learning a finite mixture model on a Riemannian manifold in which Euclidean metrics are not applicable and one needs to resort to geodesic distances consistent with the manifold geometry. For this purpose, we draw inspiration from a variant of the expectation-maximization algorithm that uses a minimum message length criterion to automatically estimate the optimal number of components from multivariate data lying on a Euclidean space. In order to use this approach on Riemannian manifolds, we propose a formulation in which each component is defined on a different tangent space, thus avoiding the loss of accuracy produced when linearizing the manifold with a single tangent space. Our approach can be applied to any type of manifold for which it is possible to estimate its tangent space. In particular, we show results on synthetic examples of a sphere and a quadric surface and on a large and complex dataset of human poses, where the proposed model is used as a regression tool for hypothesizing the geometry of occluded parts of the body.
@InProceedings{SimoSerraBMVC2014,
   author    = {Edgar Simo-Serra and Carme Torras and Francesc Moreno-Noguer},
   title     = {{Geodesic Finite Mixture Models}},
   booktitle = "Proceedings of the British Machine Vision Conference (BMVC)",
   year      = 2014,
}

Source Code

GFMM
GFMM, 2.0 (Jul, 2016)
Geodesic Finite Mixture Models
This code is an implementation of the Geodesic Finite Mixture Models written in MATLAB. The core of the algorithm consists of a single file which can be called to perform the clustering. Additionally, several examples are provided to generate the figures from the paper.
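
To illustrate what defining a mixture component on its own tangent space involves, here is a minimal Python sketch of the logarithmic and exponential maps on the unit sphere. It is not taken from the GFMM code; the choice of the unit sphere, the function names, and the tolerance are assumptions made for illustration.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def log_map(base, point):
    """Map a point on the unit sphere to the tangent space at `base`.

    The length of the returned tangent vector equals the geodesic distance.
    Undefined for antipodal points; this sketch returns zeros in that case.
    """
    cos_angle = max(-1.0, min(1.0, dot(base, point)))
    angle = math.acos(cos_angle)
    # Component of `point` orthogonal to `base` gives the tangent direction.
    ortho = [p - cos_angle * b for p, b in zip(point, base)]
    norm = math.sqrt(dot(ortho, ortho))
    if norm < 1e-12:  # hypothetical tolerance
        return [0.0] * len(base)
    return [angle * o / norm for o in ortho]


def exp_map(base, tangent):
    """Map a tangent vector at `base` back onto the unit sphere."""
    norm = math.sqrt(dot(tangent, tangent))
    if norm < 1e-12:
        return list(base)
    return [
        math.cos(norm) * b + math.sin(norm) * t / norm
        for b, t in zip(base, tangent)
    ]


north_pole = [0.0, 0.0, 1.0]
sample = [1.0, 0.0, 0.0]
tangent = log_map(north_pole, sample)      # tangent coordinates at the pole
recovered = exp_map(north_pole, tangent)   # round trip back to the sphere
```

In a tangent-space mixture of this kind, samples near a component's mean would be sent through `log_map` at that mean, modelled there with an ordinary Gaussian, and mapped back with `exp_map`.
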
StyleNet
StyleNet, 1.0 (Jun, 2016)
Fashion style in 128 floats
This code is the implementation of the "Fashion Style in 128 Floats: Joint Ranking and Classification using Weak Data for Feature Extraction" paper. It contains the best performing feature extraction model explained in the paper.
Deep Descriptor
Deep Descriptor, 1.0 (Feb, 2016)
Deep Convolutional Feature Point Descriptors
This code is the implementation of the "Discriminative Learning of Deep Convolutional Feature Point Descriptors" paper. It contains the pre-trained models and example usage code.
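
Because the descriptors are compared with the L2 distance, using them in place of SIFT typically amounts to a nearest-neighbour search. The Python sketch below shows that matching step on toy 4-dimensional vectors; it does not use the released models or their interface, and the function names and example values are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(query, reference):
    """For each query descriptor, return (index of nearest reference, distance)."""
    matches = []
    for q in query:
        distances = [l2_distance(q, r) for r in reference]
        best = min(range(len(reference)), key=distances.__getitem__)
        matches.append((best, distances[best]))
    return matches


# Toy stand-ins for descriptors; real ones would have 128 dimensions.
reference = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0], [5.0, 5.0, 5.0, 5.0]]
query = [[0.9, 1.1, 1.0, 1.0], [4.0, 5.0, 5.0, 6.0]]
matches = match_descriptors(query, reference)
```

A brute-force search like this is quadratic in the number of descriptors; larger sets would normally use an approximate nearest-neighbour index instead.
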
DaLI
DaLI, 1.0 (Jan, 2015)
Deformation and Light Invariant feature point descriptor
This is an implementation of the Deformation and Light Invariant (DaLI) descriptor. The core of the library is written in C. Additionally, a MATLAB/Octave interface is provided.
ceigs
ceigs, 1.1 (Jan, 2012)
C Wrapper for the ARPACK (Arnoldi Iteration) Library
This is a simple C frontend for ARPACK. It provides easy access to computing a subset of the eigenvectors and eigenvalues of sparse matrices. Specifically, it can solve two problems:
  • Av = vd
  • Av = Mvd
where A and M are sparse matrices, v is the subset of eigenvectors, and d is the diagonal matrix of eigenvalues.
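
For readers unfamiliar with the standard problem Av = vd, the Python sketch below finds a single eigenpair of a small sparse symmetric matrix with plain power iteration. This is a much simpler method than the Arnoldi iteration used by ARPACK and does not reflect the ceigs interface; the coordinate-list storage, function names, and example matrix are assumptions for illustration.

```python
import math


def sparse_matvec(entries, vector):
    """Multiply a sparse matrix, given as (row, col, value) triplets, by a vector."""
    result = [0.0] * len(vector)
    for row, col, value in entries:
        result[row] += value * vector[col]
    return result


def power_iteration(entries, size, iterations=200):
    """Approximate the eigenpair whose eigenvalue has the largest magnitude.

    Assumes a symmetric matrix and a start vector that is not orthogonal
    to the dominant eigenvector.
    """
    vector = [1.0 / math.sqrt(size)] * size
    for _ in range(iterations):
        product = sparse_matvec(entries, vector)
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    # Rayleigh quotient v^T A v for a unit-length v.
    eigenvalue = sum(
        v * w for v, w in zip(vector, sparse_matvec(entries, vector))
    )
    return eigenvalue, vector


# 3x3 tridiagonal matrix with 2 on the diagonal and -1 next to it.
entries = [
    (0, 0, 2.0), (0, 1, -1.0),
    (1, 0, -1.0), (1, 1, 2.0), (1, 2, -1.0),
    (2, 1, -1.0), (2, 2, 2.0),
]
eigenvalue, eigenvector = power_iteration(entries, 3)

# Largest entry of A v - v d, which should be close to zero.
residual = max(
    abs(a - eigenvalue * v)
    for a, v in zip(sparse_matvec(entries, eigenvector), eigenvector)
)
```

Only matrix-vector products are needed, which is why this family of methods suits sparse matrices. Arnoldi-type solvers build on the same products to recover several eigenpairs at once.
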

Datasets

DaLI Dataset
Local image patch feature descriptor illumination and deformation invariance evaluation dataset.
We present a dataset for the evaluation of deformation and illumination invariance of local image patch feature point descriptors. The dataset consists of 192 unique 640x480 grayscale images corresponding to 12 different objects. Points of interest are obtained with the Difference of Gaussians (DoG) detector and were manually matched in order to obtain a ground truth mapping.