Alternating Direction Graph Matching

    By Khue,

    In this paper, we introduce a graph matching method that can account for constraints of arbitrary order, with arbitrary potential functions. Unlike previous decomposition approaches that rely on the graph structures, we introduce a decomposition of the matching constraints. Graph matching is then reformulated as a non-convex non-separable optimization problem that can be split into smaller and much-easier-to-solve subproblems, by means of the alternating direction method of multipliers. The proposed framework is modular, scalable, and can be instantiated into different variants. Two instantiations are studied exploring pairwise and higher-order constraints. Experimental results on widely adopted benchmarks involving synthetic and real examples demonstrate that the proposed solutions outperform existing pairwise graph matching methods, and are competitive with the state of the art in higher-order settings.

    D. Khuê Lê-Huu and Nikos Paragios. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
    arXiv version 

    • 0 replies

    Updated list of Internship/PhD positions

    By Khue,

    New offers will be updated frequently in this topic.

    • 7 replies

    HIRING: deep reinforcement learning startup in SF

    By derik pridmore,
    • 0 replies

    Deep Residual Networks re-implemented by Facebook and CornellTech

    By Khue,

    The post was co-authored by Sam Gross from Facebook AI Research and Michael Wilber from CornellTech.

    In this blog post we implement Deep Residual Networks (ResNets) and investigate ResNets from a model-selection and optimization perspective. We also discuss multi-GPU optimizations and engineering best practices in training ResNets. Finally, we compare ResNets to GoogLeNet and VGG networks.

    We release training code on GitHub, as well as pre-trained models for download with instructions for fine-tuning on your own datasets.

    Our released pre-trained models have a higher accuracy than the models in the original paper.

    At the end of last year, Microsoft Research Asia released a paper titled "Deep Residual Learning for Image Recognition", authored by Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun. The paper achieved state-of-the-art results in image classification and detection, winning the ImageNet and COCO competitions.

    The central idea of the paper itself is simple and elegant. They take a standard feed-forward ConvNet and add skip connections that bypass (or shortcut) a few convolution layers at a time. Each bypass gives rise to a residual block in which the convolution layers predict a residual that is added to the block's input tensor.

    An example residual block is shown in the figure below.


    Deep feed-forward conv nets tend to suffer from optimization difficulty. Beyond a certain depth, adding extra layers results in higher training error and higher validation error, even when batch normalization is used. The authors of the ResNet paper argue that this underfitting is unlikely to be caused by vanishing gradients, since this difficulty occurs even with batch normalized networks. The residual network architecture solves this by adding shortcut connections that are summed with the output of the convolution layers.
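    The residual computation itself is tiny: the block output is relu(x + F(x)), where F is the convolutional branch. Below is a minimal pure-Python sketch of that equation; the `branch` function is a toy stand-in for the two convolution layers, not the released Torch code.

```python
def relu(v):
    """Elementwise ReLU on a list of numbers."""
    return [max(t, 0.0) for t in v]

def residual_block(x, branch):
    """y = ReLU(x + F(x)): the branch predicts a residual that is
    added back to the block's input via the shortcut connection."""
    fx = branch(x)
    return relu([xi + fi for xi, fi in zip(x, fx)])

# A toy branch: one "layer" that scales and shifts each component.
branch = lambda v: [0.5 * t - 1.0 for t in v]
print(residual_block([2.0, -1.0, 0.0], branch))  # [2.0, 0.0, 0.0]
```

    Note that if the branch outputs all zeros, the block reduces to relu(x): the identity shortcut makes "doing nothing" easy to learn, which is the intuition behind why very deep residual nets remain trainable.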

    This post gives some data points for those trying to understand residual nets in more detail from an optimization perspective. It also investigates the effects of certain design decisions on the effectiveness of the resulting networks.

    After the paper was published on arXiv, both of us who authored this post independently started investigating and reproducing its results. After learning about each other's efforts, we decided to collectively write a single post combining our experiences.

    Read more on Torch blog...

    The authors of the paper also released their original (Caffe) pre-trained models a few days ago.

    • 6 replies

    Microsoft releases CNTK, its open source deep learning toolkit, on GitHub

    By Khue,

    Microsoft is making the tools that its own researchers use to speed up advances in artificial intelligence available to a broader group of developers by releasing its Computational Network Toolkit on GitHub.

    The researchers developed the open-source toolkit, dubbed CNTK, out of necessity. Xuedong Huang, Microsoft’s chief speech scientist, said he and his team were anxious to make faster improvements to how well computers can understand speech, and the tools they had to work with were slowing them down.

    So, a group of volunteers set out to solve this problem on their own, using a homegrown solution that stressed performance over all else.

    The effort paid off.

    In internal tests, Huang said CNTK has proved more efficient than four other popular computational toolkits that developers use to create deep learning models for things like speech and image recognition, because it has better communication capabilities.

    “The CNTK toolkit is just insanely more efficient than anything we have ever seen,” Huang said.

    Those types of performance gains are incredibly important in the fast-moving field of deep learning, because some of the biggest deep learning tasks can take weeks to finish.

    A comparison of toolkit speed rates

    Over the past few years, the field of deep learning has exploded as more researchers have started running machine learning algorithms using deep neural networks, which are systems that are inspired by the biological processes of the human brain. Many researchers see deep learning as a very promising approach for making artificial intelligence better.

    Those gains have allowed researchers to create systems that can accurately recognize and even translate conversations, as well as ones that can recognize images and even answer questions about them.

    Internally, Microsoft is using CNTK on a set of powerful computers that use graphics processing units, or GPUs.

    Although GPUs were designed for computer graphics, researchers have found that they also are ideal for processing the kind of algorithms that are leading to these major advances in technology that can speak, hear and understand speech, and recognize images and movements.

    Chris Basoglu, a principal development manager at Microsoft who also worked on the toolkit, said one of the advantages of CNTK is that it can be used by anyone from a researcher on a limited budget, with a single computer, to someone who has the ability to create their own large cluster of GPU-based computers. The researchers say it can scale across more GPU-based machines than other publicly available toolkits, providing a key advantage for users who want to do large-scale experiments or calculations.


    Xuedong Huang (Photography by Scott Eklund/Red Box Pictures)

    Huang said it was important for his team to be able to address Microsoft’s internal needs with a tool like CNTK, but they also want to provide the same resources to other researchers who are making similar advances in deep learning.

    That’s why they decided to make the tools available via open source licenses to other researchers and developers.

    Last April, the researchers made the toolkit available to academic researchers, via Codeplex and under a more restricted open-source license.

    But starting Monday it also will be available, via an open-source license, to anyone else who wants to use it. The researchers say it could be useful to anyone from deep learning startups to more established companies that are processing a lot of data in real time.

    “With CNTK, they can actually join us to drive artificial intelligence breakthroughs,” Huang said.

    Microsoft Blog

    • 0 replies

    Next Big Test for AI: Making Sense of the World

    By Khue,

    A few years ago, a breakthrough in machine learning suddenly enabled computers to recognize objects shown in photographs with unprecedented—almost spooky—accuracy. The question now is whether machines can make another leap, by learning to make sense of what’s actually going on in such images.


    A new image database, called Visual Genome, could push computers toward this goal, and help gauge the progress of computers attempting to better understand the real world. Teaching computers to parse visual scenes is fundamentally important for artificial intelligence. It might not only spawn more useful vision algorithms, but also help train computers how to communicate more effectively, because language is so intimately tied to representation of the physical world.

    Visual Genome was developed by Fei-Fei Li, a professor who specializes in computer vision and who directs the Stanford Artificial Intelligence Lab, together with several colleagues. “We are focusing very much on some of the hardest questions in computer vision, which is really bridging perception to cognition,” Li says. “Not just taking pixel data in and trying to make sense of its color, shading, those sorts of things, but really turn that into a fuller understanding of the 3-D as well as the semantic visual world.”


    Li and colleagues previously created ImageNet, a database containing more than a million images tagged according to their contents. Each year, the ImageNet Large Scale Visual Recognition Challenge tests the ability of computers to automatically recognize the contents of images.

    In 2012, a team led by Geoffrey Hinton at the University of Toronto built a large and powerful neural network that could categorize images far more accurately than anything created previously. The technique used to enable this advance, known as deep learning, involves feeding thousands or millions of examples into a many-layered neural network, gradually training each layer of virtual neurons to respond to increasingly abstract characteristics, from the texture of a dog’s fur, say, to its overall shape.

    The Toronto team’s achievement marked both a boom of interest in deep learning and a sort of a renaissance in artificial intelligence more generally. And deep learning has since been applied in many other areas, making computers better at other important tasks, such as processing audio and text.

    The images in Visual Genome are tagged more richly than in ImageNet, including the names and details of various objects shown in an image; the relationships between those objects; and information about any actions that are occurring. This was achieved using a crowdsourcing approach developed by one of Li’s colleagues at Stanford, Michael Bernstein. The plan is to launch an ImageNet-style challenge using the data set in 2017.

    Algorithms trained using examples in Visual Genome could do more than just recognize objects, and ought to have some ability to parse more complex visual scenes.

    “You’re sitting in an office, but what’s the layout, who’s the person, what is he doing, what are the objects around, what event is happening?” Li says. “We are also bridging [this understanding] to language, because the way to communicate is not by assigning numbers to pixels—you need to connect perception and cognition to language.” 

    Li believes that deep learning will likely play a key role in enabling computers to parse more complex scenes, but that other techniques will help advance the state of the art.

    The resulting AI algorithms could perhaps help organize images online or in personal collections, but they might have more significant uses, enabling robots or self-driving cars to understand a scene properly. They could conceivably also be used to teach computers more common sense, by helping them appreciate which concepts are physically plausible and which are not.

    Richard Socher, a machine-learning expert and the founder of an AI startup called MetaMind, says this could be the most important aspect of the project. “A large part of language is about describing the visual world,” he says. “This data set provides a new scalable way to combine the two modalities and test new models.”

    Visual Genome isn’t the only complex image database out there for researchers to experiment with. Microsoft, for example, has a database called Common Objects in Context, which shows the names and position of multiple objects in images. Google, Facebook, and others are also pushing the ability of AI algorithms to parse visual scenes. Research published by Google in 2014 showed an algorithm that can provide basic captions for images, with varying levels of accuracy (see “Google’s Brain-Inspired Software Describes What It Sees in Complex Images”). And, more recently, Facebook showed a question-and-answer system that can answer very simple queries about images (see “Facebook App Can Answer Basic Questions About What’s in Photos”).

    Aude Oliva, a professor at MIT who studies machine and human vision, has developed a database called Places2, which contains more than 10 million images of different specific scenes. This project is meant to inspire the development of algorithms capable of describing the same scene in multiple ways, as humans tend to do. Oliva says Visual Genome and similar databases will help advance machine vision, but she believes that AI researchers will need to draw inspiration from biology if they want to build machines with truly human-like capabilities.

    “Humans draw their decision and intuition on lots of knowledge, common sense, sensory experiences, memories, and ‘thoughts’ that are not necessarily translated into language, speech, or text,” Oliva says. “Without knowing how the human brain creates thoughts, it will be difficult to teach common sense and visual understanding to an artificial system. Neuroscience and computer science are the two sides of the AI coin.”

    MIT Technology Review

    • 0 replies

    Google DeepMind's AlphaGo beats European Go champion 5-0

    By Khue,

    The Google DeepMind team published their paper Mastering the game of Go with deep neural networks and tree search in Nature on 28 January 2016. It describes a new approach to computer Go that combines Monte Carlo tree search with deep neural networks trained by supervised learning from human expert games and by reinforcement learning from games of self-play. This is the first time a computer program has defeated a human professional player.

    The game of Go is widely viewed as an unsolved “grand challenge” for artificial intelligence. Despite decades of work, the strongest computer Go programs still only play at the level of human amateurs. In this paper they describe a Go program called AlphaGo. This program was based on general-purpose AI methods, using deep neural networks to mimic expert players, and further improving the program by learning from games played against itself. AlphaGo won over 99% of games against the strongest other Go programs. It also defeated the human European champion by 5–0 in tournament games, a feat previously believed to be at least a decade away.

    In March 2016, AlphaGo will face its ultimate challenge: a 5-game challenge match in Seoul against the legendary Lee Sedol, the top Go player in the world over the past decade.

    Paper: deepmind-mastering-go.pdf




    • 0 replies

    Approximate Primal-Dual Method for Markov Random Fields Optimization

    By Khue,

    In this topic we will discuss Nikos Komodakis's work on approximate primal-dual methods for solving MRF optimization. I will describe four algorithms: PD1, PD2, PD3, and FastPD.

    The MRF optimization (or MAP Inference) problem:
    $\begin{align} \mbox{minimize}\quad & \sum_{p\in V}\sum_{l\in L} \theta_p(l)x_p(l) + \sum_{pq\in E}\sum_{(l,l')\in L^2} \theta_{pq}(l,l')x_{pq}(l,l')\\
    \mbox{subject to}\quad & \sum_{l\in L} x_p(l) = 1 \quad \forall p\in V,\\
    &\sum_{l'\in L} x_{pq}(l,l') = x_p(l) \quad\forall pq\in E, l\in L, \\
    &\sum_{l\in L} x_{pq}(l,l') = x_q(l') \quad\forall pq\in E, l'\in L, \\
    &x_p(l)\in\set{0,1},\ x_{pq}(l,l')\in\set{0,1}\quad\forall pq\in E, l\in L, l'\in L, \end{align}$
    where the pairwise variable $x_{pq}(l,l')$ represents the product $x_p(l)x_q(l')$.

    Replacing the integrality constraints in the above problem by $x_p(l)\ge 0, x_{pq}(l,l')\ge 0$, we obtain its LP relaxation. The dual of this LP relaxation is given by:

    $\begin{align} \mbox{maximize}\quad & \sum_{p\in V}\min_{a\in L}h_p(a) \label{eq:dual-objective}\\
    \mbox{subject to}\quad &y_{pq}(a) + y_{qp}(b) \le \theta_{pq}(a,b) \quad \forall pq\in E, a\in L, b\in L, \label{eq:dual-feasibility}\end{align}$
    where the function $h_p(\cdot)$ is defined by $$h_p(a) = \theta_p(a) + \sum_{q:pq\in E} y_{pq}(a).$$
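    To make the dual objective concrete, here is a small sketch on a hypothetical 2-node, 2-label MRF (the graph and the numbers are illustrative, not from the paper): it evaluates $h_p(a) = \theta_p(a) + \sum_{q:pq\in E} y_{pq}(a)$ and then $\sum_{p\in V}\min_{a\in L} h_p(a)$.

```python
def h(p, a, theta, y, neighbors):
    """h_p(a) = theta_p(a) + sum over neighbors q of y_pq(a)."""
    return theta[p][a] + sum(y[(p, q)][a] for q in neighbors[p])

def dual_objective(nodes, labels, theta, y, neighbors):
    """The dual objective: sum over p of min over a of h_p(a)."""
    return sum(min(h(p, a, theta, y, neighbors) for a in labels)
               for p in nodes)

# Toy 2-node chain p -- q with labels {0, 1}.
nodes = ['p', 'q']
labels = [0, 1]
neighbors = {'p': ['q'], 'q': ['p']}
theta = {'p': {0: 1.0, 1: 2.0}, 'q': {0: 0.5, 1: 1.5}}   # unary potentials
y = {('p', 'q'): {0: 0.2, 1: -0.1},                       # dual variables
     ('q', 'p'): {0: -0.2, 1: 0.1}}

print(dual_objective(nodes, labels, theta, y, neighbors))  # 1.5
```

    Raising this quantity while maintaining (relaxed) feasibility is exactly what the dual updates of the PD algorithms below aim to do.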

    Denote by $x_p$ the chosen label for the node $p$.



    PD1 Algorithm

    This algorithm, at termination, returns a primal-dual pair $(\mathbf{x},\mathbf{y})$ satisfying the following conditions ($\mathbf{x}$ is always chosen to be a labeling, so primal feasibility always holds and hereafter we will not state it explicitly):

    1. Relaxed dual feasibility:
      \begin{equation}y_{pq}(a) \le \frac{1}{2}\theta_{pq}^{\mathrm{min}}\quad\forall pq\in E,a\in L. \label{eq:relaxed-dual-feasibility} \end{equation}
      Clearly, this implies dual feasibility, i.e. \eqref{eq:dual-feasibility}.
    2. Relaxed complementary slackness conditions: \begin{align} h_p(x_p) &= \min_ah_p(a) \quad \forall p\in V \label{eq:primal-csc-p} \\ y_{pq}(x_p) + y_{qp}(x_q) &\ge \frac{1}{f_\mathrm{app}} \theta_{pq}(x_p,x_q) \quad \forall pq\in E. \label{eq:relaxed-primal-csc-pq} \end{align}

    During its iterations, the algorithm ensures all the above conditions, except \eqref{eq:primal-csc-p}, which will be satisfied at the end.

    Dual variables update

    Suppose $(\mathbf{x},\mathbf{y})$ satisfies \eqref{eq:relaxed-dual-feasibility} and \eqref{eq:relaxed-primal-csc-pq}. If $(\mathbf{x},\mathbf{y})$ also satisfies \eqref{eq:primal-csc-p}, then we have obtained an approximate solution. Otherwise, we can increase the dual objective \eqref{eq:dual-objective} as follows.

    Since \eqref{eq:primal-csc-p} does not hold for all $p$, there exists a non-empty set $V_0 \subseteq V$ of nodes $p$ where $h_p(x_p) > \min_ah_p(a)$.

    • 0 replies

    Maximum Flow and Minimum Cut

    By Khue,

    Weak duality. The value $|f|$ of any s-t flow $f$ is no greater than the capacity $c(S,T)$ of any s-t cut $(S,T)$.

    Strong duality (Max-flow min-cut theorem). The maximum value of an s-t flow is equal to the minimum capacity over all s-t cuts.  



    The max-flow problem can be written as the following linear program, where $\delta^+(i)$ and $\delta^-(i)$ denote the sets of edges leaving and entering node $i$, respectively:

    $\begin{align} \mbox{maximize}\quad & v \\
    \mbox{subject to}\quad & \sum_{e\in\delta^+(i)}x_e - \sum_{e\in\delta^-(i)}x_e =
    \begin{cases}v &\text{if } i=s \\ -v &\text{if } i=t \\ 0 &\text{if } i\in V\setminus\set{s,t} \end{cases} \quad\forall i\in V, \\
    & 0\le x_e \le c_e \quad \forall e\in E. \end{align}$


    A cut $(S,T)$ can be represented by indicator variables $z_i$ such that $z_i=0$ if $i\in S$ and $z_i=1$ if $i\in T$. The chosen edges are represented by indicator variables $y_e$. Note that we choose only edges that go from $S$ to $T$, thus $y_{ij}=1$ if and only if $z_i=0$ and $z_j=1$. The minimum cut can therefore be formulated as:

    $\begin{align} \mbox{minimize}\quad & \sum_{e\in E}y_e c_e \\ \mbox{subject to}\quad &y_{ij} =1 \text{ if } (z_i,z_j)=(0,1) \text{ and } 0 \text{ otherwise}, \quad \forall ij\in E,\\ &z_s=0, z_t=1,\\ & z_i\in\set{0,1}\quad\forall i\in V, y_e\in\set{0,1}\quad\forall e\in E.\end{align}$

    The dual of the max-flow problem is:

    $\begin{align} \mbox{minimize}\quad & \sum_{e\in E}y_e c_e \\ \mbox{subject to}\quad &y_{ij} +z_i-z_j &&\ge 0 \quad \forall ij\in E,\\ &z_t-z_s &&= 1,\\ & y_e &&\ge 0 \quad \forall e\in E.\end{align}$

    This is a relaxation of the min-cut problem.
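    Strong duality can be checked numerically on a small example. The sketch below implements a basic Edmonds-Karp max-flow (augment along shortest residual paths) and a brute-force minimum s-t cut; both are standard textbook procedures written from scratch here for illustration, and the example network is made up. On any small network the two values coincide.

```python
from collections import deque
from itertools import combinations

def max_flow(cap, s, t):
    """Edmonds-Karp: repeatedly augment along a shortest s-t path
    in the residual graph. cap[u][v] is the capacity of edge (u, v)."""
    res = {u: dict(vs) for u, vs in cap.items()}   # residual capacities
    nodes = set(cap)
    for vs in cap.values():
        nodes.update(vs)
    for u in nodes:
        res.setdefault(u, {})
    for u in list(res):
        for v in list(res[u]):
            res[v].setdefault(u, 0)                # reverse residual arcs
    flow = 0
    while True:
        parent = {s: None}                         # BFS for augmenting path
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        path, v = [], t                            # recover the path
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        b = min(res[u][v] for u, v in path)        # bottleneck capacity
        for u, v in path:
            res[u][v] -= b
            res[v][u] += b
        flow += b

def min_cut_value(cap, s, t):
    """Brute force over all s-t cuts (S, T): total capacity of S -> T edges."""
    inner = [v for v in {u for u in cap} | {v for vs in cap.values() for v in vs}
             if v not in (s, t)]
    best = float('inf')
    for k in range(len(inner) + 1):
        for extra in combinations(inner, k):
            S = {s} | set(extra)
            val = sum(c for u, vs in cap.items() if u in S
                      for v, c in vs.items() if v not in S)
            best = min(best, val)
    return best

cap = {'s': {'a': 3, 'b': 2}, 'a': {'b': 1, 't': 2}, 'b': {'t': 3}}
print(max_flow(cap, 's', 't'), min_cut_value(cap, 's', 't'))  # 5 5
```

    The equality of the two printed values on this instance is exactly the max-flow min-cut theorem stated above.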

    Complementary slackness condition. $(x,v)$ and $(y,z)$ are both optimal solutions (to the primal and the dual, respectively) if and only if:

    1. $(x,v)$ is primal feasible;
    2. $(y,z)$ is dual feasible;
    3. $x_{ij}(y_{ij}+z_i-z_j)=0 \quad\forall ij\in E$;
    4. $y_e(c_e-x_e)=0 \quad\forall e\in E$. 


    Recover an optimal solution of the max-flow from an optimal solution of the min-cut. 

    From this, we can recover

    • 0 replies

    Maximum Matching Problem

    By Khue,

    A matching $M$ in an undirected graph $G=(V,E)$ is a set of pairwise disjoint edges, i.e. no two edges in $M$ share a vertex. A vertex $v$ that is not incident to any edge in $M$ is said to be exposed; otherwise $v$ is matched. The maximum matching problem is to find a matching $M$ of $G$ of maximum cardinality.


    The image shows an example of a matching with cardinality $3$, the three edges are $(1,6), (2,7)$ and $(3,8)$. The vertices $4,5,9$ and $10$ are exposed.

    An alternating path with respect to $M$ (or an $M$-alternating path) is a path that alternates between edges in $M$ and edges in $E\setminus M$. An augmenting path with respect to $M$ (or an $M$-augmenting path) is an alternating path whose first and last vertices are both exposed.

    Theorem. A matching $M$ in a graph $G$ is maximum if and only if there is no $M$-augmenting path in $G$.

    It is not difficult to prove the above theorem using the following lemma.

    Lemma. If $M$ is a matching in $G$ and $P$ is an $M$-augmenting path then $M' := M\triangle P$ (which is the symmetric difference of $M$ and $P$, defined by $M\triangle P = (M\cup P)\setminus (M\cap P)$) is also a matching and $|M'|=|M|+1$.
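    The lemma can be checked on a toy instance (the vertex numbers below are illustrative). Representing undirected edges as frozensets, the symmetric difference $M\triangle P$ is just Python's `^` operator:

```python
def augment(M, P):
    """M' = M (symmetric difference) P: drop the matched edges of the
    augmenting path and add its unmatched edges."""
    return M ^ P

e = lambda u, v: frozenset((u, v))  # an undirected edge

M = {e(1, 6), e(2, 7)}                             # a matching of size 2
# Augmenting path 5-6-1-7-2-8: endpoints 5 and 8 are exposed,
# and its edges alternate between E \ M and M.
P = {e(5, 6), e(6, 1), e(1, 7), e(7, 2), e(2, 8)}

M2 = augment(M, P)
print(sorted(sorted(edge) for edge in M2))  # [[1, 7], [2, 8], [5, 6]]
```

    The result is again a matching, and its cardinality is $|M|+1 = 3$, as the lemma asserts.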

    By the above theorem, the maximum matching problem can be reduced to the problem of finding an augmenting path with respect to a given matching $M$.

    We will first discuss the matching problem for bipartite graphs before moving on to general graphs. A graph $G=(V\cup U,E)$, where $V$ and $U$ are disjoint, is called bipartite if every edge in $E$ has one endpoint in $V$ and the other in $U$. 

    We present an algorithm that searches for an $M$-augmenting path (if such a path exists), given a bipartite graph $G$ and a matching $M$ in it:

    • Begin with an exposed node $v\in V$.
    • Find each neighbor $u$ of $v$:
      • If $u$ is exposed, then we have found an augmenting path (namely the single-edge path $(v,u)$);
      • If $u$ is not exposed, then denote by $v'$ its mate (i.e. $(u,v')$ is a matched edge). Continue the search with $v'$, by ignoring the vertices already encountered previously.
    • During the search, if we encounter an exposed vertex, then we have found an augmenting path. If we do not encounter any exposed vertex, then the given matching is already maximum. 
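    The search above is essentially Kuhn's augmenting-path algorithm for bipartite matching. Here is a compact sketch; the adjacency format (left vertex to list of right neighbors) and the example graph are illustrative, not from the references.

```python
def max_bipartite_matching(adj):
    """adj maps each left vertex to its right-side neighbors.
    Starting from each exposed left vertex, search for an augmenting
    path, following mates of matched vertices as in the procedure above."""
    match_right = {}                    # right vertex -> its current mate

    def try_augment(v, visited):
        for u in adj[v]:                # scan each neighbor u of v
            if u in visited:
                continue                # ignore vertices already encountered
            visited.add(u)
            # u is exposed, or its mate can be re-matched elsewhere:
            if u not in match_right or try_augment(match_right[u], visited):
                match_right[u] = v      # flip edges along the augmenting path
                return True
        return False

    size = 0
    for v in adj:                       # each left vertex starts exposed
        if try_augment(v, set()):
            size += 1
    return size, match_right

adj = {1: [6, 7], 2: [6], 3: [8, 9], 4: [9], 5: [9]}
size, matching = max_bipartite_matching(adj)
print(size)  # 4
```

    Each successful call to `try_augment` finds an augmenting path and applies the symmetric-difference step of the lemma, growing the matching by one edge; when no exposed left vertex admits such a path, the matching is maximum by the theorem above.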



    [1]     A. Schrijver, Combinatorial Optimization: Polyhedra and Efficiency, Springer 2012.

    [2]     C. H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity, Dover 1998.

    • 0 replies

  • Posts

    • Open PhD and Postdoc positions in Schmidhuber's group, on Deep Reinforcement Learning and Recurrent Neural Networks:
    • 4 open PhD positions in Deep Learning for Natural Language Processing at INRIA Nancy. Our team at Inria Nancy offers several PhD topics on deep learning for: speech enhancement; robust speaker recognition; robust speech recognition; language-model adaptation. Profile: research Master's degree in computer science, machine learning, signal processing, or applied mathematics; programming experience in Python or C/C++. Working environment: Multispeech is a team of 30 people covering various fields of speech research involving machine learning and signal processing. Application deadlines: 31 May for doctoral-school grants; 30 June for other grants. To apply: open the form at the bottom of the web page.
    • PhD in Deep Learning at THALES.
    • PhD in Computational Light Fields Imaging at INRIA Rennes. Starting September 2016.
    • OPEN PHD POSITION AT BELL LABS FRANCE / EURECOM IN MACHINE LEARNING /

      The Mathematics of Complex and Dynamic Networks Department of
      Alcatel-Lucent Bell Labs France and the Data Science Department of
      EURECOM invite applications for a fully funded PhD position on "Learning
      in Blotto games and applications to modeling attention in social
      networks." The PhD will be co-supervised by Dr. Alonso Silva
      ( and Prof. Patrick Loiseau

      A complete description of the position is available at

      The PhD topic is at the intersection of machine learning and game
      theory, with application to modeling social networks. Candidates should
      have a strong background in mathematics (probability and preferably
      learning or game theory or both) and an interest in the application to
      social networks; and should hold a MSc degree (or equivalent) by the
      start of the PhD (expected Fall 2016).

      Interested candidates are encouraged to apply before May 16, 2016, by
      sending the following documents to [email protected] and
      [email protected]: a detailed CV, a list of courses and grades in
      the last two years (at least), the name of 2-3 references willing to
      provide a recommendation letter for their application, a short statement
      of interest and any other information useful to evaluate the application.

      Any question regarding the position can also be sent directly to the
      supervisors at [email protected] and [email protected]