You are here



Sanjeev Arora, Princeton University

Linear Algebraic Structure of Word Meanings


In Natural Language Processing (NLP), semantic word embeddings use vectors to capture the meanings of words. They are often constructed using nonlinear/nonconvex techniques such as deep nets and energy‐based models. Recently, Mikolov et al. (2013) showed that such   embeddings exhibit linear structure that can be used to solve "word analogy tasks" such as man: woman :: king: ??.  Subsequently, Levy and Goldberg (2014) and Pennington et al. (2014) tried to explain why such a linear structure should arise in embeddings derived from nonlinear methods.  

We provide a new generative model for language that gives a different explanation for how such linear algebraic structure arises. This new model also casts new light on older methods for generating word embeddings, such as the PMI method of Church and Hanks (1990).   The model has surprising predictions, which are empirically verified. It also suggests a new linear algebraic method to detect polysemy (words having multiple meanings).

We think that our methodology and generative model may be useful for other NLP tasks and understanding the efficacy of other neural models. 

Joint work with Yuanzhi Li, Yingyu Liang, Tengyu Ma, and Andrej Risteski (listed in alphabetical order).

Hari Balakrishnan, 

Massachusets Institute of Technology

TCP Ex Machina: Computer-Synthesized Congestion Control


In computer networks, a fundamental problem is congestion control: how should endpoints transmit data to ensure good performance for their transfers and allocate network resources properly? This problem has received much attention over the past thirty years. The design of such protocols, and indeed similar algorithms in all fields of computer science, has largely been a creative, human effort. In this talk, I ask: Is it possible for a computer program "design" (or "discover") the right rules for congestion control? Should computers, rather than humans, be tasked with developing protocols? And just how well can we make computers perform this task? I will describe Remy, a software program and research agenda to answer these questions. Remy seeks to optimize explicitly-stated objectives for network applications and create rules that endpoints can follow. The most surprising result we have found is that, in many cases, Remy not only matches what the best network researchers have invented, but handily surpasses them in performance. I will also describe a new protocol designed by humans using insights from Remy.


This project is joint work with my students Keith Winstein (now on the faculty at Stanford), Anirudh Sivaraman, Venkat Arun, and Pratiksha Thaker.


Avrim Blum,

Carnegie Mellon University

Reconstructing Preferences from Opaque Transactions



In this talk I will discuss the problem of learning the preferences  of agents in a complex system by observing the outcomes of a series  of joint interactions. 

For example, consider an allocation process that aims to allocate resources to agents who each have different preferences or requirements, and where the allocator has some unknown priority ordering over the agents.  From observing the outcome of a series of allocations involving different subsets of agents, can you learn the preferences of the agents and the allocator's priority ordering?  Or consider observing an auction in which agents valuations are drawn from their own personal probability distributions D_i, but the only information that can be observed is the identity of the winner (highest bidder). From repeated observations, plus the ability to enter the auction and win or lose) yourself, can you reconstruct the relevant parts of the individual distributions? In this talk I will discuss algorithms in the context of both of these problems. In the process we will see connections to decision-list learning in learning theory and Kaplan-Meier estimators in medical statistics.

This is joint work with Yishay Mansour and Jamie Morgenstern.


Robert Calderbank,

Duke University

GraphConnect: A Framework for Regularizing Neural Networks



Deep neural networks have proved very successful in domains where large training sets are available, but when the number of training samples is small, their performance suffers from overfitting. Prior methods of reducing overfitting such as weight decay, DropOut and DropConnect are data-independent. We shall describe GraphConnect, a complementary method for regularizing deep networks that is data-dependent, and is motivated by the observation that data of interest typically lie close to a manifold. Our proposed method encourages the relationships between the learned decisions to resemble a graph representing the original manifold structure. Essentially GraphConnect is designed to learn attributes that are present in data samples in contrast to weight decay, DropOut and DropConnect which are simply designed to make it more difficult to fit to random error or noise. Analysis of empirical Rademacher complexity suggests that GraphConnect is stronger than weight decay as a regularization. Experimental results on several benchmark datasets validate the theoretical analysis, and show that when the number of training samples is small, GraphConnect is able to significantly improve performance over weight decay, and when used in isolation, is competitive with DropOut.

Peter Glynn,

Stanford University

Multi-level Monte Carlo Algorithms for Computing Gradients and Relative Value Functions


In this talk, we start with an overview of existing Monte Carlo methods related to computation of sensitivities/gradients for Markov chains in which the transition dynamics depend on a parameterized family of policies. We then discuss a new class of algorithms that use multi-level Monte Carlo (MLMC) ideas to facilitate the computation of such sensitivities and that exhibit faster rates of convergence for steady-state quantities. We also show how MLMC can be used to efficiently compute the relative value function for a given policy at a given state x, and discuss other related applications of MLMC ideas. This work is joint with Zeyu Zheng. 


Mor Harchol-Balter,

Carnegie Mellon University

Queueing with Redundant Requests: A More Realistic Model 


Recent computer systems research has proposed using redundant requests to reduce latency. The idea is to replicate a request so that it joins the queue at multiple servers.   The request is considered complete as soon as any one copy of the request completes.

Redundancy is beneficial because it allows us to overcome server-side variability -- the fact that the server we choose might be temporarily slow, due to factors like background load, network interrupts, garbage collection, and so on.   When server-side variability dominates runtime, replicating requests can greatly reduce their response times. 

In the past few years, queueing theorists have begun to study redundancy, first via approximations, and, more recently, via exact analysis.   Unfortunately, for analytical tractability, all the theoretical analysis has assumed models with independent service times.  This is unrealistic because a job with large size should actually remain large at all servers, rather than getting a new independent size at each server.   The unrealistic independence assumption has led to theoretical results which can be at odds with computer systems implementation results.

In this talk, we introduce a much more realistic model of redundancy.   Our model allows us to decouple the inherent job size (X) from the server-side slowdown (S), where we track both S and X for each job.  Analysis within the S&X model is, of course, much more difficult.  Nevertheless, we derive a policy which is both analytically tractable within the S&X model and has provably excellent performance.

Joint work with: Kristy Gardner and Alan Scheller-Wolf 

Babak Hassibi, 

California Institute of Technology

Crowdsourcing for Clustering: Algorithms and Information-Theoretic Lower Bounds


We consider the problem of clustering unlabelled data using a crowd of non-experts. Take, for example, a data base of dog (or bird) images that we would like to cluster into breeds (or species) with the collective help of workers who are not experts in dog breeds (or bird species). Questions of interest include what queries to ask, how many, and what algorithms to use to reliably cluster the data. We focus on simple comparison queries (on pairs, triples and, perhaps, more items) and on the use of convex optimization clustering algorithms (that attempt to decompose a partially-observed adjacency matrix into low rank and sparse components). In this setting, we obtain necessary and sufficient conditions for success of the clustering algorithm as a function of the number of queries, the number of data points, the number of clusters, the number of outliers, the size of the minimum cluster and the quality of the workers. We contrast this with information-theoretic lower bounds on the required number of queries obtained by viewing the crowdsourced clustering problem as one of joint source channel coding. A key feature of the approach is to quantify the cost of a query with its entropy. We also present various empirical results to put the theoretical results into perspective.


Alfred Hero, University of Michigan at Ann Arbor    

Foundational Principles for Large Scale Graph Mining



Many problems in data science depend on mining large graphs for patterns such as communities and high centrality nodes. In most cases the graphs are noisy and negatively affect accuracy as measured by the error rates of putative discoveries and the downstream effect of such errors on the decisionmaker.  Foundational principles are essential for specifying conditions under which such errors can be controlled and minimized.  

Rong Jin,

Michigan State University and Alibaba

 Large-scale Robust Online Matching and Its Application in E-commerce


This talk will be focused on large-scale matching problem, a key problem that has found numerous applications in e-commerce but yet has not been studied extensively. I will particularly focus on two challenges of large-scale matching problem. The first challenge is robustness of solution as most quantities used in optimization are based on estimation that are often inaccurate and unreliable. I will introduce two different technique for robust matching, one based on robust optimization theory and the other based on the theory of two-sided matching. The second challenge is how to make matching decision in an online fashion. I will introduce an approach based on theory of primal-dual online algorithm, where the key is to introduce appropriate discounted awards to account for the future unseen events. For all these algorithms, I will introduce their basic theory, performance guarantee, and empirical studies in in Alibaba e-commerce platform.



Steven Low, California Institute of Technology

Online optimization of power networks



We are at the cusp of a historical transformation of our power systems into a more sustainable, dynamic, intelligent, and distributed form with hundreds of millions of intelligent endpoints.  The optimization and control of such a network requires solving power flow equations, which is well-known to be hard. The grid, however, solves them in real-time at scale, and we propose to exploit it explicitly to carry out part of our optimization algorithm.  This approach not only improves scalability, but also naturally adapts to evolving network conditions. In this talk, we present two examples.


The first example presents an online algorithm to solve an optimal power flow problem at a slow timescale on a radial network where the controllable devices continuously interact with the network that implicitly computes a power flow solution given a control action. Collectively these devices and the network implement a gradient projection algorithm in real time. We prove that the proposed algorithm converges to a set of local optima and provide sufficient conditions under which it converges to a global optimum. We derive an upper bound on the suboptimality gap of any local optimum. This bound suggests that any local optimum is almost as good as any strictly feasible point.


In the second example, the online algorithm integrates primary frequency regulation, secondary frequency regulation, and congestion management at a fast timescale. The algorithm is distributed in that it requires communication only between neighboring buses. Collectively, the controllable devices and the swing dynamics of the network implement a primal-dual algorithm to rebalance power, restore nominal frequency and inter-area flows, and enforce line limits at a minimum control cost. We prove sufficient conditions under which the algorithm converges to a global optimum.

R Srikant,

University of Illinois at Urbana-Champaign

Clustering Using Pairwise Comparisons


We consider a pairwise comparisons model with n users and m items. Each user is shown a few pairs of items, and when a pair of items is shown to a user, he or she expresses a preference for one of the items based on a probabilistic model. The goal is to group users into clusters so that users within each cluster have similar preferences. We will present an algorithm which clusters all users correctly with high probability using a number of pairwise comparisons which is within a polylog factor of an information-theoretic lower bound. Joint work with Barbara Dembin and Siddhartha Satpathi. 


Demosthenis Teneketzis, University of Michigan at Ann Arbor

Dynamic Games with Asymmetric Information: Common Information Based Perfect Bayesian Equilibria and Sequential Decomposition


We formulate and analyze a general class of stochastic dynamic games with asymmetric information arising in dynamic systems. In such games, multiple strategic agents control the system dynamics and have different information about the system over time. Because of the presence of asymmetric information, each agent needs to form beliefs about other agents' private information. Therefore, the specification of the agents' beliefs along with their strategies is necessary to study the dynamic game. We use Perfect Bayesian Equilibrium (PBE) as our solution concept. A PBE consists of a pair of strategy profile and belief system. In a PBE, every agent's strategy should be a best response under the belief system, and the belief system depends on the agents' strategy profile when there is signaling among agents. Therefore, the circular dependence between strategy profile and belief system makes it difficult to compute PBE. Using the common information among agents, we introduce a subclass of PBE called Common Information Based Perfect Bayesian Equilibria ( CIB-PBE), and provide a sequential decomposition of the dynamic game. Such a decomposition leads to a backward induction algorithm to compute CIB-PBE. We illustrate the sequential decomposition with an example of a multiple access broadcast game. We prove the existence of CIB-PBE for a subclass of dynamic games.



Jean Walrand, University of California at Berkeley

Learning in Networks

Many control policies for networks are derived when the parameters of the system are known.  This talk examines policies for networks with unknown parameters, such as robust and adaptive policies.  Topics include adaptive maximum weight scheduling and queue-based scheduling. The problem of learning the delay function of a network is also discussed.