
Relaxed dual feasibility:
\begin{equation}y_{pq}(a) \le \frac{1}{2}\theta_{pq}^{\mathrm{min}}\quad\forall pq\in E,a\in L. \label{eq:relaxeddualfeasibility} \end{equation}
Clearly, this implies dual feasibility, i.e. \eqref{eq:dualfeasibility}.  Relaxed complementary slackness conditions: \begin{align} h_p(x_p) &= \min_ah_p(a) \quad \forall p\in V \label{eq:primalcscp} \\ y_{pq}(x_p) + y_{qp}(x_q) &\ge \frac{1}{f_\mathrm{app}} \theta_{pq}(x_p,x_q) \quad \forall pq\in E. \label{eq:relaxedprimalcscpq} \end{align}
 $(x,v)$ is primal feasible;
 $(y,z)$ is dual feasible;
 $x_{ij}(y_{ij}+z_i-z_j)=0 \quad\forall ij\in E$;
 $y_e(c_e-x_e)=0 \quad\forall e\in E$.
 Begin with an exposed node $v\in V$.

Examine each neighbor $u$ of $v$:
 If $u$ is exposed, then we have found an augmenting path (namely $(v,u)$);
 If $u$ is not exposed, then denote by $v'$ its mate (i.e. $(u,v')$ is a matched edge) and continue the search from $v'$, ignoring vertices already visited.
 During the search, if we encounter an exposed vertex, then we have found an augmenting path. If no exposed vertex is ever encountered, then the given matching is already maximum.
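The search above can be sketched in Python. This is a hypothetical helper (not code from the text), assuming the bipartite graph is given as an adjacency list over the $V$ side and the matching as a dict mapping each matched vertex to its mate:

```python
def find_augmenting_path(graph, match, v):
    """Search for an augmenting path from exposed vertex v in a bipartite
    graph given as {vertex: [neighbors]}; `match` maps each matched vertex
    to its mate. Returns the path as a list of vertices, or None."""
    def dfs(v, visited):
        for u in graph[v]:
            if u in visited:
                continue
            visited.add(u)
            if u not in match:            # u is exposed: augmenting path found
                return [v, u]
            mate = match[u]               # continue the search from u's mate
            visited.add(mate)
            path = dfs(mate, visited)
            if path is not None:
                return [v, u] + path
    return dfs(v, {v})

def augment(match, path):
    """Flip matched/unmatched edges along an augmenting path (M triangle P)."""
    for i in range(0, len(path) - 1, 2):
        a, b = path[i], path[i + 1]
        match[a], match[b] = b, a
```

Augmenting along the returned path increases the matching size by one, as the lemma below guarantees.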
Alternating Direction Graph Matching
In this paper, we introduce a graph matching method that can account for constraints of arbitrary order, with arbitrary potential functions. Unlike previous decomposition approaches that rely on the graph structures, we introduce a decomposition of the matching constraints. Graph matching is then reformulated as a non-convex, non-separable optimization problem that can be split into smaller and much easier-to-solve subproblems, by means of the alternating direction method of multipliers. The proposed framework is modular, scalable, and can be instantiated into different variants. Two instantiations are studied, exploring pairwise and higher-order constraints. Experimental results on widely adopted benchmarks involving synthetic and real examples demonstrate that the proposed solutions outperform existing pairwise graph matching methods and are competitive with the state of the art in higher-order settings.
D. Khuê Lê-Huu and Nikos Paragios. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
arXiv version
Deep Residual Networks reimplemented by Facebook and Cornell Tech
The post was co-authored by Sam Gross from Facebook AI Research and Michael Wilber from Cornell Tech.
In this blog post we implement Deep Residual Networks (ResNets) and investigate ResNets from a model-selection and optimization perspective. We also discuss multi-GPU optimizations and engineering best practices in training ResNets. Finally, we compare ResNets to GoogLeNet and VGG networks.
We release training code on GitHub, as well as pretrained models for download with instructions for finetuning on your own datasets.
Our released pretrained models have a higher accuracy than the models in the original paper.
At the end of last year, Microsoft Research Asia released a paper titled "Deep Residual Learning for Image Recognition", authored by Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun. The paper achieved state-of-the-art results in image classification and detection, winning the ImageNet and COCO competitions.
The central idea of the paper itself is simple and elegant. They take a standard feed-forward ConvNet and add skip connections that bypass (or shortcut) a few convolution layers at a time. Each bypass gives rise to a residual block in which the convolution layers predict a residual that is added to the block's input tensor.
An example residual block is shown in the figure below.
Deep feedforward conv nets tend to suffer from optimization difficulty. Beyond a certain depth, adding extra layers results in higher training error and higher validation error, even when batch normalization is used. The authors of the ResNet paper argue that this underfitting is unlikely to be caused by vanishing gradients, since this difficulty occurs even with batch normalized networks. The residual network architecture solves this by adding shortcut connections that are summed with the output of the convolution layers.
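As a toy illustration only (the actual architecture uses convolutions and batch normalization, not the plain matrix multiplies assumed here), a residual block with an identity shortcut can be sketched on feature vectors:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Toy residual block: two stacked 'layers' (matrix multiplies standing
    in for convolutions) predict a residual F(x) that is summed with the
    block's input before the final nonlinearity: y = ReLU(x + F(x))."""
    residual = w2 @ relu(w1 @ x)   # F(x)
    return relu(x + residual)      # identity shortcut
```

Note that when the weights are zero the block reduces to the identity on nonnegative inputs, which is one intuition for why stacking many residual blocks does not hurt optimization the way plain layers do.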
This post gives some data points for anyone trying to understand residual nets in more detail from an optimization perspective. It also investigates the contributions of certain design decisions to the effectiveness of the resulting networks.
After the paper was published on arXiv, both of us who authored this post independently started investigating and reproducing the results of the paper. After learning about each other's efforts, we decided to collectively write a single post combining our experiences.
The authors of the paper also released their original (Caffe) pretrained models a few days ago: https://github.com/KaimingHe/deep-residual-networks.
Microsoft releases CNTK, its open source deep learning toolkit, on GitHub
Microsoft is making the tools that its own researchers use to speed up advances in artificial intelligence available to a broader group of developers by releasing its Computational Network Toolkit on GitHub.
The researchers developed the open-source toolkit, dubbed CNTK, out of necessity. Xuedong Huang, Microsoft’s chief speech scientist, said he and his team were anxious to make faster improvements to how well computers can understand speech, and the tools they had to work with were slowing them down.
So, a group of volunteers set out to solve this problem on their own, using a homegrown solution that stressed performance over all else.
The effort paid off.
In internal tests, Huang said, CNTK has proved more efficient than four other popular computational toolkits that developers use to create deep learning models for things like speech and image recognition, because it has better communication capabilities.
“The CNTK toolkit is just insanely more efficient than anything we have ever seen,” Huang said.
Those types of performance gains are incredibly important in the fast-moving field of deep learning, because some of the biggest deep learning tasks can take weeks to finish.
Over the past few years, the field of deep learning has exploded as more researchers have started running machine learning algorithms using deep neural networks, which are systems that are inspired by the biological processes of the human brain. Many researchers see deep learning as a very promising approach for making artificial intelligence better.
Those gains have allowed researchers to create systems that can accurately recognize and even translate conversations, as well as ones that can recognize images and even answer questions about them.
Internally, Microsoft is using CNTK on a set of powerful computers that use graphics processing units, or GPUs.
Although GPUs were designed for computer graphics, researchers have found that they also are ideal for processing the kind of algorithms that are leading to these major advances in technology that can speak, hear and understand speech, and recognize images and movements.
Chris Basoglu, a principal development manager at Microsoft who also worked on the toolkit, said one of the advantages of CNTK is that it can be used by anyone from a researcher on a limited budget, with a single computer, to someone who has the ability to create their own large cluster of GPU-based computers. The researchers say it can scale across more GPU-based machines than other publicly available toolkits, providing a key advantage for users who want to do large-scale experiments or calculations.
Huang said it was important for his team to be able to address Microsoft’s internal needs with a tool like CNTK, but they also want to provide the same resources to other researchers who are making similar advances in deep learning.
That’s why they decided to make the tools available via open source licenses to other researchers and developers.
Last April, the researchers made the toolkit available to academic researchers via CodePlex, under a more restricted open-source license.
But starting Monday it also will be available, via an open-source license, to anyone else who wants to use it. The researchers say it could be useful to anyone from deep learning startups to more established companies that are processing a lot of data in real time.
“With CNTK, they can actually join us to drive artificial intelligence breakthroughs,” Huang said.
Next Big Test for AI: Making Sense of the World
A few years ago, a breakthrough in machine learning suddenly enabled computers to recognize objects shown in photographs with unprecedented—almost spooky—accuracy. The question now is whether machines can make another leap, by learning to make sense of what’s actually going on in such images.
A new image database, called Visual Genome, could push computers toward this goal, and help gauge the progress of computers attempting to better understand the real world. Teaching computers to parse visual scenes is fundamentally important for artificial intelligence. It might not only spawn more useful vision algorithms, but also help train computers how to communicate more effectively, because language is so intimately tied to representation of the physical world.
Visual Genome was developed by Fei-Fei Li, a professor who specializes in computer vision and who directs the Stanford Artificial Intelligence Lab, together with several colleagues. “We are focusing very much on some of the hardest questions in computer vision, which is really bridging perception to cognition,” Li says. “Not just taking pixel data in and trying to make sense of its color, shading, those sorts of things, but really turning that into a fuller understanding of the 3D as well as the semantic visual world.”
Li and colleagues previously created ImageNet, a database containing more than a million images tagged according to their contents. Each year, the ImageNet Large Scale Visual Recognition Challenge tests the ability of computers to automatically recognize the contents of images.
In 2012, a team led by Geoffrey Hinton at the University of Toronto built a large and powerful neural network that could categorize images far more accurately than anything created previously. The technique used to enable this advance, known as deep learning, involves feeding thousands or millions of examples into a manylayered neural network, gradually training each layer of virtual neurons to respond to increasingly abstract characteristics, from the texture of a dog’s fur, say, to its overall shape.
The Toronto team’s achievement marked both a boom of interest in deep learning and a sort of a renaissance in artificial intelligence more generally. And deep learning has since been applied in many other areas, making computers better at other important tasks, such as processing audio and text.
The images in Visual Genome are tagged more richly than in ImageNet, including the names and details of various objects shown in an image; the relationships between those objects; and information about any actions that are occurring. This was achieved using a crowdsourcing approach developed by one of Li’s colleagues at Stanford, Michael Bernstein. The plan is to launch an ImageNet-style challenge using the data set in 2017.
Algorithms trained using examples in Visual Genome could do more than just recognize objects, and ought to have some ability to parse more complex visual scenes.
“You’re sitting in an office, but what’s the layout, who’s the person, what is he doing, what are the objects around, what event is happening?” Li says. “We are also bridging [this understanding] to language, because the way to communicate is not by assigning numbers to pixels—you need to connect perception and cognition to language.”
Li believes that deep learning will likely play a key role in enabling computers to parse more complex scenes, but that other techniques will help advance the state of the art.
The resulting AI algorithms could perhaps help organize images online or in personal collections, but they might have more significant uses, enabling robots or selfdriving cars to understand a scene properly. They could conceivably also be used to teach computers more common sense, by appreciating which concepts are physically likely or more implausible.
Richard Socher, a machine-learning expert and the founder of an AI startup called MetaMind, says this could be the most important aspect of the project. “A large part of language is about describing the visual world,” he says. “This data set provides a new scalable way to combine the two modalities and test new models.”
Visual Genome isn’t the only complex image database out there for researchers to experiment with. Microsoft, for example, has a database called Common Objects in Context, which shows the names and position of multiple objects in images. Google, Facebook, and others are also pushing the ability of AI algorithms to parse visual scenes. Research published by Google in 2014 showed an algorithm that can provide basic captions for images, with varying levels of accuracy (see “Google’s Brain-Inspired Software Describes What It Sees in Complex Images”). And, more recently, Facebook showed a question-and-answer system that can answer very simple queries about images (see “Facebook App Can Answer Basic Questions About What’s in Photos”).
Aude Oliva, a professor at MIT who studies machine and human vision, has developed a database called Places2, which contains more than 10 million images of different specific scenes. This project is meant to inspire the development of algorithms capable of describing the same scene in multiple ways, as humans tend to do. Oliva says Visual Genome and similar databases will help advance machine vision, but she believes that AI researchers will need to draw inspiration from biology if they want to build machines with truly humanlike capabilities.
“Humans draw their decision and intuition on lots of knowledge, common sense, sensory experiences, memories, and ‘thoughts’ that are not necessarily translated into language, speech, or text,” Oliva says. “Without knowing how the human brain creates thoughts, it will be difficult to teach common sense and visual understanding to an artificial system. Neuroscience and computer science are the two sides of the AI coin.”
Google DeepMind's AlphaGo beats European Go champion 5-0
The Google DeepMind team published their paper, "Mastering the game of Go with deep neural networks and tree search," in Nature on 28 January 2016. It describes a new approach to computer Go that combines Monte Carlo tree search with deep neural networks that have been trained by supervised learning from human expert games, and by reinforcement learning from games of self-play. This is the first time ever that a computer program has defeated a human professional player.
The game of Go is widely viewed as an unsolved “grand challenge” for artificial intelligence. Despite decades of work, the strongest computer Go programs still only play at the level of human amateurs. In this paper they describe a Go program called AlphaGo. This program was based on general-purpose AI methods, using deep neural networks to mimic expert players, and further improving the program by learning from games played against itself. AlphaGo won over 99% of games against the strongest other Go programs. It also defeated the human European champion by 5–0 in tournament games, a feat previously believed to be at least a decade away.
In March 2016, AlphaGo will face its ultimate challenge: a 5-game challenge match in Seoul against the legendary Lee Sedol, the top Go player in the world over the past decade.
Paper: deepmindmasteringgo.pdf
Approximate PrimalDual Method for Markov Random Fields Optimization
In this topic we will discuss Nikos Komodakis's work on the approximate primal-dual method for solving MRF optimization. I will describe four algorithms: PD1, PD2, PD3 and FastPD.
The MRF optimization (or MAP Inference) problem:
\begin{equation}
\begin{aligned}
&\text{minimize} \quad && \sum_{p\in V}\sum_{l\in L} \theta_p(l)x_p(l) + \sum_{pq\in E}\sum_{(l,l')\in L^2} \theta_{pq}(l,l')x_{pq}(l,l')\\
&\text{subject to} \quad && \sum_{l\in L} x_p(l) = 1 \quad \forall p\in V,\\
& &&\sum_{l'\in L} x_{pq}(l,l') = x_p(l) \quad\forall pq\in E, l\in L, \\
& &&\sum_{l\in L} x_{pq}(l,l') = x_q(l') \quad\forall pq\in E, l'\in L, \\
& &&x_p(l)\in\set{0,1}, x_{pq}(l,l')\in\set{0,1}\quad\forall pq\in E, l\in L, l'\in L.
\end{aligned}
\end{equation}
Replacing the integrality constraints in the above problem by $x_p(l)\ge 0$, $x_{pq}(l,l')\ge 0$, we obtain its LP relaxation. The dual of this LP relaxation is given by:
\begin{align}
\mbox{maximize}\quad & \sum_{p\in V}\min_{a\in L}h_p(a) \label{eq:dualobjective}\\
\mbox{subject to}\quad &y_{pq}(a) + y_{qp}(b) \le \theta_{pq}(a,b) \quad \forall pq\in E, a\in L,b\in L, \label{eq:dualfeasibility}\end{align}
where the function $h_p(\cdot)$ is defined by $$h_p(a) = \theta_p(a) + \sum_{q:pq\in E} y_{pq}(a).$$
Denote by $x_p$ the chosen label for the node $p$.
...
PD1 Algorithm
This algorithm, at termination, gives a primal-dual pair $(\mathbf{x},\mathbf{y})$ satisfying ($\mathbf{x}$ is always chosen as a labeling, thus primal feasibility always holds and hereafter we will not state it explicitly):
During its iterations, the algorithm ensures all the above conditions, except \eqref{eq:primalcscp}, which will be satisfied at the end.
Dual variables update
Suppose $(\mathbf{x},\mathbf{y})$ satisfies \eqref{eq:relaxeddualfeasibility} and \eqref{eq:relaxedprimalcscpq}. If $(\mathbf{x},\mathbf{y})$ also satisfies \eqref{eq:primalcscp}, then we obtain an approximate solution. Otherwise, we can increase the dual objective \eqref{eq:dualobjective} as follows.
Since \eqref{eq:primalcscp} does not hold for all $p$, there exists a nonempty set $V_0 \subseteq V$ of nodes $p$ where $h_p(x_p) > \min_ah_p(a)$.
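This test can be sketched directly from the definition of $h_p$. The sketch below assumes a hypothetical data layout (not from the paper): unary potentials `theta` as a dict of dicts, dual variables `y` keyed by ordered pairs `(p, q)`, and an adjacency map `neighbors`:

```python
def height(p, label, theta, y, neighbors):
    """h_p(a) = theta_p(a) + sum over neighbors q of y_pq(a)."""
    return theta[p][label] + sum(y[(p, q)][label] for q in neighbors[p])

def unsatisfied_nodes(x, labels, theta, y, neighbors):
    """V_0: nodes whose current label x_p does not attain min_a h_p(a),
    i.e. nodes violating the relaxed complementary slackness condition."""
    V0 = set()
    for p in theta:
        h = {a: height(p, a, theta, y, neighbors) for a in labels}
        if h[x[p]] > min(h.values()):
            V0.add(p)
    return V0
```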
Maximum Flow and Minimum Cut
Weak duality. The value $v(f)$ of any $s$-$t$ flow $f$ is no greater than the capacity $c(S,T)$ of any $s$-$t$ cut $(S,T)$.
Strong duality (max-flow min-cut theorem). The maximum value of an $s$-$t$ flow is equal to the minimum capacity over all $s$-$t$ cuts.
Maxflow:
$\begin{align} \mbox{maximize}\quad & v \\
\mbox{subject to}\quad & \sum_{e\in\delta^+(i)}x_e - \sum_{e\in\delta^-(i)}x_e =
\begin{cases}v &\text{if } i=s \\ -v &\text{if } i=t \\ 0 &\text{if } i\in V\setminus\set{s,t} \end{cases} \quad\forall i\in V, \\
& 0\le x_e \le c_e \quad \forall e\in E. \end{align}$
Mincut:
A cut $(S,T)$ can be represented by indicator variables $z_i$ such that $z_i=0$ if $i\in S$ and $z_i=1$ if $i\in T$. The chosen edges are represented by indicator variables $y_e$. Note that we choose only edges that go from $S$ to $T$, thus $y_{ij}=1$ if and only if $z_i=0$ and $z_j=1$. The minimum cut can therefore be formulated as:
$\begin{align} \mbox{minimize}\quad & \sum_{e\in E}y_e c_e \\ \mbox{subject to}\quad &y_{ij} =1 \text{ if } (z_i,z_j)=(0,1) \text{ and } 0 \text{ otherwise}, \quad \forall ij\in E,\\ &z_s=0, z_t=1,\\ & z_i\in\set{0,1}\quad\forall i\in V, y_e\in\set{0,1}\quad\forall e\in E.\end{align}$
The dual of the maxflow problem is:
$\begin{align} \mbox{minimize}\quad & \sum_{e\in E}y_e c_e \\ \mbox{subject to}\quad &y_{ij} + z_i - z_j \ge 0 \quad \forall ij\in E,\\ &z_t - z_s = 1,\\ & y_e \ge 0 \quad \forall e\in E.\end{align}$
This is a relaxation of the mincut problem.
Complementary slackness condition. $(x,v)$ and $(y,z)$ are both optimal solutions (to the primal and the dual, respectively) if and only if:
Recover an optimal solution of the maxflow from an optimal solution of the mincut.
From this, we can recover
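To make the primal side concrete, the max-flow problem can be solved with a standard augmenting-path method. Below is a compact Edmonds-Karp sketch (a generic textbook algorithm, not taken from the discussion above), using a dense residual-capacity matrix:

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp: repeatedly augment along a shortest s-t path in the
    residual graph. `cap` is an n x n list-of-lists of edge capacities."""
    n = len(cap)
    residual = [row[:] for row in cap]
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph
        parent = [None] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] is None:
            i = queue.popleft()
            for j in range(n):
                if parent[j] is None and residual[i][j] > 0:
                    parent[j] = i
                    queue.append(j)
        if parent[t] is None:          # no augmenting path: flow is maximum
            return flow
        # bottleneck capacity along the path
        bottleneck = float('inf')
        j = t
        while j != s:
            bottleneck = min(bottleneck, residual[parent[j]][j])
            j = parent[j]
        # push flow: decrease forward residuals, increase reverse ones
        j = t
        while j != s:
            residual[parent[j]][j] -= bottleneck
            residual[j][parent[j]] += bottleneck
            j = parent[j]
        flow += bottleneck
```

At termination, the nodes reachable from $s$ in the final residual graph form the $S$ side of a minimum cut, which is how strong duality is typically made constructive.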
Maximum Matching Problem
A matching $M$ in an undirected graph $G=(V,E)$ is a set of pairwise disjoint edges. A vertex $v$ that is not incident to any edge in $M$ is said to be exposed; otherwise $v$ is matched. Given a graph $G=(V,E)$, the maximum matching problem is to find a maximum (cardinality) matching $M$ of $G$.
The image shows an example of a matching with cardinality $3$, the three edges are $(1,6), (2,7)$ and $(3,8)$. The vertices $4,5,9$ and $10$ are exposed.
An alternating path with respect to $M$ (or an $M$-alternating path) is a path that alternates between edges in $M$ and edges in $E\setminus M$. An augmenting path with respect to $M$ (or an $M$-augmenting path) is an $M$-alternating path whose first and last vertices are exposed.
Theorem. A matching $M$ in a graph $G$ is maximum if and only if there is no $M$-augmenting path in $G$.
It is not difficult to prove the above theorem using the following lemma.
Lemma. If $M$ is a matching in $G$ and $P$ is an $M$-augmenting path, then $M' := M\triangle P$ (the symmetric difference of $M$ and $P$, defined by $M\triangle P = (M\cup P)\setminus (M\cap P)$) is also a matching and $|M'|=|M|+1$.
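The lemma can be checked on a tiny example; a sketch representing undirected edges as frozensets:

```python
def symmetric_difference(M, P):
    """M triangle P = (M | P) - (M & P), on sets of undirected edges."""
    return (M | P) - (M & P)

# Matching {2-3}; augmenting path 1-2-3-4 (endpoints 1 and 4 are exposed).
M = {frozenset({2, 3})}
P = {frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})}
M2 = symmetric_difference(M, P)   # the larger matching {1-2, 3-4}
```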
By the above theorem, the maximum matching problem can be reduced to the problem of repeatedly finding an augmenting path with respect to a given matching $M$.
We will first discuss the matching problem for bipartite graphs before moving on to general graphs. A graph $G=(V\cup U,E)$, where $V$ and $U$ are disjoint, is called bipartite if every edge in $E$ has one endpoint in $V$ and the other in $U$.
We present an algorithm that searches for an $M$-augmenting path (if such a path exists), given a bipartite graph $G$ and a matching $M$ in it: