High-Dimensional Sampling Algorithms
Santosh Vempala
Algorithms and Randomness Center Georgia Tech
We study the complexity, in high dimension, of basic algorithmic problems such as optimization, integration, rounding and sampling. A suitable convexity assumption allows polynomial-time algorithms for these problems, while still including very interesting special cases such as linear programming, volume computation and many instances of discrete optimization. We will survey the breakthroughs that led to the current state of the art and pay special attention to the discovery that all of the above problems can be reduced to the problem of *sampling* efficiently. In the process of establishing upper and lower bounds on the complexity of sampling in high dimension, we will encounter geometric random walks, isoperimetric inequalities, generalizations of convexity, probabilistic proof techniques and other methods bridging geometry, probability and complexity.
High-Dimensional Sampling Algorithms
Santosh Vempala
Algorithms and Randomness Center Georgia Tech
1
Format
Please ask questions. Indicate that I should go faster or slower. Feel free to ask for more examples, and for more proofs.
Exercises along the way.
2
High-dimensional problems
Input: A set of points S in R^n, or a distribution on R^n. A function f that maps points to real values (could be the indicator of a set).
3
Algorithmic Geometry
What is the complexity of computational problems as the dimension grows? Dimension = number of variables
Typically, size of input is a function of the dimension.
4
Problem 1: Optimization
Input: function f: R^n -> R specified by an oracle, a point x, error parameter ε. Output: point y such that f(y) ≤ min f + ε.
5
Problem 2: Integration
Input: function f: R^n -> R+ specified by an oracle, a point x, error parameter ε. Output: number A such that (1 - ε)·∫f ≤ A ≤ (1 + ε)·∫f.
6
Problem 3: Sampling
Input: function f: R^n -> R+ specified by an oracle, a point x, error parameter ε. Output: a point y from a distribution within distance ε of the distribution with density proportional to f.
7
Problem 4: Rounding
Input: function f: R^n -> R+ specified by an oracle, a point x, error parameter ε. Output: an affine transformation that approximately sandwiches f between concentric balls.
8
Problem 5: Learning
Input: i.i.d. points with labels from an unknown distribution, error parameter ε. Output: a rule to correctly label 1 - ε of the input distribution. (Generalizes integration.)
9
Sampling
Generate a uniform random point from a set S or with density proportional to function f. Numerous applications in diverse areas: statistics, networking, biology, computer vision, privacy, operations research etc. This course: mathematical and algorithmic foundations of sampling and its applications.
10
Course Outline
Lecture 1. Introduction to sampling, high-dimensional geometry and complexity. Lecture 2. Algorithms based on sampling. Lecture 3. Sampling algorithms.
11
Lecture 1: Introduction
Computational problems in high dimension
The challenges of high dimensionality
Convex bodies, logconcave functions
Brunn-Minkowski and its variants
Isotropy
Summary of applications
12
Lecture 2: Algorithmic Applications
Convex Optimization
Rounding
Volume Computation
Integration
13
Lecture 3: Sampling Algorithms
Sampling by random walks
Conductance
Grid walk, Ball walk, Hit-and-run
Isoperimetric inequalities
Rapid mixing
14
High-dimensional problems are hard
P1. Optimization. Find the minimum of f over a set.
P2. Integration. Find the average (or integral) of f.
These problems are intractable (hard) in general, i.e., for arbitrary sets and general functions: intractable for arbitrary sets even with linear functions, and intractable for polytopes even with quadratic functions.
P1 is NP-hard or worse:
min number of unsatisfied clauses in a 3-SAT formula
P2 is #P-hard or worse
Count number of satisfying solutions to a 3-SAT formula
15
High-dimensional Algorithms
P1. Optimization. Find the minimum of f over the set S. Ellipsoid algorithm [Yudin-Nemirovski; Shor; Khachiyan; GLS] works when S is a convex set and f is a convex function. P2. Integration. Find the integral of f. Dyer-Frieze-Kannan algorithm works when f is the indicator function of a convex set.
16
A glimpse of the complexity frontier
1. Are the entries of a given matrix A the inner products of a set of vectors, i.e., is A = BB^T? (semidefinite program)
2. Are they the inner products of a set of nonnegative vectors, i.e., is A = BB^T with B ≥ 0? (completely positive)
17
Structure
Q. What geometric structure makes algorithmic problems computationally tractable?
(i.e., solvable with polynomial complexity)
Convexity often suffices. Is convexity the frontier of polynomial-time solvability? Appears to be in many cases of interest
18
Convexity
(Indicator functions of) Convex sets: ∀x, y ∈ K, λ ∈ [0,1]: λx + (1-λ)y ∈ K
Concave functions: f(λx + (1-λ)y) ≥ λf(x) + (1-λ)f(y)
Logconcave functions: f(λx + (1-λ)y) ≥ f(x)^λ f(y)^{1-λ}
Quasiconcave functions: f(λx + (1-λ)y) ≥ min{f(x), f(y)}
Star-shaped sets: ∃x_0 ∈ S s.t. ∀y ∈ S, λ ∈ [0,1]: λx_0 + (1-λ)y ∈ S
19
How to specify a convex set?
Explicit list of constraints, e.g., a linear program: Ax ≤ b.
What about the set of positive semidefinite matrices? Or the set of vectors on the edges of a graph that have weight at least one on every cut?
20
Structure I: Separation Oracle
Either x ∈ K, or there is a halfspace containing K and not x.
21
Convex sets have separation oracles
If x is not in K, let y be the point in K that is closest to x.
y is unique: if y_1 and y_2 are both closest, then their midpoint (y_1 + y_2)/2 is in K and is strictly closer to x.
Take the hyperplane through y normal to x - y: {z : (x - y)^T z = (x - y)^T y}. It contains K on one side and x strictly on the other.
22
Separation oracles
For an LP, simply check all the linear constraints. For a ball or ellipsoid, find the tangent plane. For the SDP cone, check whether the eigenvalues are all nonnegative; if not, an eigenvector with a negative eigenvalue gives a separating hyperplane. For the cut example, find a minimum cut to check whether all cuts have weight at least 1.
23
Example: Learning by Sampling
Sequence of points X_1, X_2, …; unknown -1/1 function f. We get X_i and have to guess f(X_i). Goal: minimize the number of wrong guesses.
24
Learning Halfspaces
Unknown -1/1 function f: f(X) = 1 if ⟨w, X⟩ > 0 and f(X) = -1 otherwise,
for an unknown vector w, with each component w_i being a b-bit integer. What is the minimum number of mistakes?
25
Majority algorithm
After X_1, X_2, …, X_k, the set of consistent functions f corresponds to S_k = {w : f(X_i)·⟨w, X_i⟩ > 0 for i = 1, 2, …, k}. Guess f(X_{k+1}) to be the majority of the predictions of the w's in S_k. Claim. The number of wrong guesses is at most bn, since each wrong guess at least halves |S_k|, which starts at 2^{bn}. But how to compute the majority?? |S_k| could have 2^{bn} elements!
26
Random algorithm
Pick a random w in S_k. Guess f(X_{k+1}) = sign(⟨w, X_{k+1}⟩).
27
Random algorithm
Pick a random w in S_k; guess f(X_{k+1}) = sign(⟨w, X_{k+1}⟩). Lemma 1. E(#wrong guesses) ≤ 2bn. Proof idea. Every time the random guess is wrong, the majority algorithm has probability at least 1/2 of being wrong. Exercise 1. Prove Lemma 1.
28
Learning by Sampling
How to pick a random w in S_k? S_k is a convex set! It can be sampled efficiently.
29
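A runnable sketch of this randomized learner (all names and parameters below are illustrative assumptions; crude rejection sampling from the cube stands in for an efficient convex-body sampler such as hit-and-run):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_consistent_w(X, y, n, tries=100000):
    """Pick a random w consistent with the labeled history.
    Rejection sampling from [-1,1]^n is exponentially slow in general;
    it stands in for a proper random-walk sampler over the convex set S_k."""
    for _ in range(tries):
        w = rng.uniform(-1, 1, n)
        if all(yi * np.dot(w, xi) > 0 for xi, yi in zip(X, y)):
            return w
    raise RuntimeError("rejection sampler failed; use a random-walk sampler")

def online_learn(points, f, n):
    """Guess each label with a random consistent w; by Lemma 1 the expected
    number of mistakes is at most 2bn for b-bit integer weights."""
    X, y, mistakes = [], [], 0
    for x in points:
        w = sample_consistent_w(X, y, n) if X else rng.uniform(-1, 1, n)
        guess = 1 if np.dot(w, x) > 0 else -1
        if guess != f(x):
            mistakes += 1
        X.append(x); y.append(f(x))
    return mistakes

# toy run: hidden integer-weight halfspace in n = 5 dimensions (assumption)
n = 5
w_star = np.array([3., -7., 2., 5., -1.])
f = lambda x: 1 if np.dot(w_star, x) > 0 else -1
print("mistakes:", online_learn([rng.standard_normal(n) for _ in range(50)], f, n))
```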
Structure of Convex Bodies
Volume(unit cube) = 1. Volume(unit ball) = π^{n/2} / Γ(n/2 + 1) ≈ (2πe/n)^{n/2}: drops exponentially with n.
For any central hyperplane, most of the mass of the ball is within distance O(1/√n) of it.
30
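A quick Monte Carlo illustration of this concentration (the dimensions and the slab width 2/√n are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def uniform_in_ball(n, m):
    """m uniform random points in the unit ball: random direction, radius ~ U^(1/n)."""
    g = rng.standard_normal((m, n))
    g /= np.linalg.norm(g, axis=1, keepdims=True)
    return g * (rng.random((m, 1)) ** (1.0 / n))

for n in (2, 20, 200):
    x = uniform_in_ball(n, 100000)
    frac = np.mean(np.abs(x[:, 0]) <= 2 / np.sqrt(n))
    print(f"n={n:3d}: mass within 2/sqrt(n) of the hyperplane x1=0: {frac:.3f}")
```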
Structure of Convex Bodies
Volume(unit cube) = 1. Volume(unit ball) = π^{n/2} / Γ(n/2 + 1): drops exponentially with n.
Most of the volume is near the boundary: vol((1 - ε)K) = (1 - ε)^n·vol(K) ≤ e^{-εn}·vol(K). So, for ε ≫ 1/n, all but a vanishing fraction of the volume lies in the shell K \ (1 - ε)K.
31
Structure II: Volume Distribution
A, B sets in R^n; their Minkowski sum is A + B = {x + y : x ∈ A, y ∈ B}.
For a convex body K, the hyperplane section at (x + y)/2 contains (K_x + K_y)/2, the Minkowski average of the sections K_x and K_y at x and y. What is the volume distribution?
32
Brunn-Minkowski inequality
Thm. For compact sets A, B in R^n: vol(A + B)^{1/n} ≥ vol(A)^{1/n} + vol(B)^{1/n}.
Equivalently, for λ ∈ [0, 1]: vol(λA + (1-λ)B)^{1/n} ≥ λ·vol(A)^{1/n} + (1-λ)·vol(B)^{1/n}; this follows by taking the sets to be λA and (1-λ)B.
33
Brunn-Minkowski inequality
Thm. For compact sets A, B in R^n: vol(A + B)^{1/n} ≥ vol(A)^{1/n} + vol(B)^{1/n}.
Proof. First take A, B to be cuboids, i.e., A with side lengths (a_1, …, a_n) and B with side lengths (b_1, …, b_n). Then A + B is a cuboid with side lengths (a_1 + b_1, …, a_n + b_n), and the inequality follows from the AM-GM inequality.
34
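The AM-GM step for cuboids, written out (A, B with side lengths a_i, b_i as above):

```latex
\frac{\mathrm{vol}(A)^{1/n} + \mathrm{vol}(B)^{1/n}}{\mathrm{vol}(A+B)^{1/n}}
  = \prod_{i=1}^{n}\Big(\frac{a_i}{a_i+b_i}\Big)^{1/n}
  + \prod_{i=1}^{n}\Big(\frac{b_i}{a_i+b_i}\Big)^{1/n}
  \le \frac{1}{n}\sum_{i=1}^{n}\frac{a_i}{a_i+b_i}
  + \frac{1}{n}\sum_{i=1}^{n}\frac{b_i}{a_i+b_i} = 1,
```

where each product is bounded by the arithmetic mean of its factors (AM-GM).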
Brunn-Minkowski inequality
Thm. For compact sets A, B in R^n: vol(A + B)^{1/n} ≥ vol(A)^{1/n} + vol(B)^{1/n}.
Proof. Next take A, B to be finite unions of disjoint cuboids, and induct on the total number of cuboids: pick an axis-aligned hyperplane separating two cuboids of A, translate B so that the same hyperplane splits its volume in the same proportion, and apply the inductive hypothesis to each side.
Finally, note that any compact set can be approximated to arbitrary accuracy by the union of a finite set of cuboids.
35
Logconcave functions
f: R^n -> R+ is logconcave if for all x, y and λ ∈ [0, 1], f(λx + (1-λ)y) ≥ f(x)^λ·f(y)^{1-λ}; i.e., f is nonnegative and its logarithm is concave.
36
Logconcave functions
Examples:
Indicator functions of convex sets are logconcave; so are the Gaussian density function and the exponential function.
Level sets of f, {x : f(x) ≥ t}, are convex. Many other useful geometric properties.
37
Prekopa-Leindler inequality
Prekopa-Leindler: if f, g, h: R^n -> R+ satisfy h(λx + (1-λ)y) ≥ f(x)^λ·g(y)^{1-λ} for all x, y, then ∫h ≥ (∫f)^λ·(∫g)^{1-λ}.
Functional version of [B-M], equivalent to it.
38
Properties of logconcave functions
For two logconcave functions f and g: their sum might not be logconcave, but their product h(x) = f(x)·g(x) is logconcave, and so is their minimum h(x) = min{f(x), g(x)}.
39
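A numeric illustration (a sketch; the two shifted Gaussians and the grid are arbitrary choices): on a uniform grid, g is logconcave iff g(x)^2 ≥ g(x-h)·g(x+h) at interior points.

```python
import numpy as np

x = np.linspace(-4, 4, 801)          # uniform grid, spacing h = 0.01

def is_logconcave_1d(g, tol=1e-12):
    """Discrete midpoint test for logconcavity on a uniform grid."""
    return bool(np.all(g[1:-1] ** 2 >= g[:-2] * g[2:] - tol))

f = np.exp(-(x - 2) ** 2)            # Gaussian centered at +2: logconcave
g = np.exp(-(x + 2) ** 2)            # Gaussian centered at -2: logconcave

print("product:", is_logconcave_1d(f * g))             # True
print("min    :", is_logconcave_1d(np.minimum(f, g)))  # True
print("sum    :", is_logconcave_1d(f + g))             # False (bimodal)
```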
Properties of logconcave functions
The convolution h(x) = ∫ f(y)·g(x - y) dy of two logconcave functions is logconcave.
And so is any marginal: h(x_1, …, x_k) = ∫ f(x_1, …, x_k, y) dy.
Exercise 2. Prove the above properties using the Prekopa-Leindler inequality.
40
Isotropic position
Affine transformations preserve convexity and logconcavity. What can one use as a canonical position? E.g., ellipsoids map to balls, parallelepipeds map to cubes. What about general convex bodies? Logconcave functions?
41
Isotropic position
Let x be a random point from a convex body K. z = E(x) is the center of gravity (or centroid). Shift so that z = 0. Now consider the covariance matrix A = E(xx^T).
A has bounded entries; it is positive semidefinite; it has full rank unless K lies in a lower-dimensional subspace.
42
Isotropic position
Let B = A^{-1/2}. Then a random point y = Bx from BK satisfies E(y) = 0 and E(yy^T) = I, for this n x n matrix B.
Such a body is in isotropic position.
43
Isotropic position: Exercises
Exercise 3. Find R s.t. the origin-centered cube of side length 2R is isotropic. Exercise 4. Show that for a random point x from a set in isotropic position, for any unit vector v, we have E(⟨v, x⟩²) = 1.
44
Isotropic position and sandwiching
For any convex body K (in fact, any set/distribution with bounded second moments), we can apply an affine transformation so that a random point x from K satisfies E(x) = 0 and E(xx^T) = I.
Thus K looks like a ball up to second moments. How close is it really to a ball? Can it be sandwiched between two balls of similar radii? Yes!
45
Sandwiching
Thm (John). Any convex body K has an ellipsoid E s.t. E ⊆ K ⊆ nE.
The maximum volume ellipsoid contained in K can be used. Thm (KLS). For a convex body K in isotropic position, √((n+2)/n)·B_n ⊆ K ⊆ √(n(n+2))·B_n.
Also a factor-n sandwiching, but with a different ellipsoid (a ball, in isotropic position). As we will see, isotropic sandwiching (rounding) is algorithmically efficient while the classical approach is not.
46
Lecture 2: Algorithmic Applications
Convex Optimization
Rounding
Volume Computation
Integration
47
Lecture 3: Sampling Algorithms
Sampling by random walks
Conductance
Grid walk, Ball walk, Hit-and-run
Isoperimetric inequalities
Rapid mixing
48
High-Dimensional Sampling Algorithms
Santosh Vempala
Algorithms and Randomness Center Georgia Tech
49
Format
Please ask questions. Indicate that I should go faster or slower. Feel free to ask for more examples, and for more proofs.
Exercises along the way.
50
High-dimensional problems
Input: A set of points S in R^n, or a distribution on R^n. A function f that maps points to real values (could be the indicator of a set).
51
Algorithmic Geometry
What is the complexity of computational problems as the dimension grows? Dimension = number of variables
Typically, size of input is a function of the dimension.
52
Problem 1: Optimization
Input: function f: R^n -> R specified by an oracle, a point x, error parameter ε. Output: point y such that f(y) ≤ min f + ε.
53
Problem 2: Integration
Input: function f: R^n -> R+ specified by an oracle, a point x, error parameter ε. Output: number A such that (1 - ε)·∫f ≤ A ≤ (1 + ε)·∫f.
54
Problem 3: Sampling
Input: function f: R^n -> R+ specified by an oracle, a point x, error parameter ε. Output: a point y from a distribution within distance ε of the distribution with density proportional to f.
55
Problem 4: Rounding
Input: function f: R^n -> R+ specified by an oracle, a point x, error parameter ε. Output: an affine transformation that approximately sandwiches f between concentric balls.
56
Problem 5: Learning
Input: i.i.d. points (with labels) from an unknown distribution, error parameter ε. Output: a rule to correctly label 1 - ε of the input distribution. (Generalizes integration.)
57
Sampling
Generate a uniform random point from a set S or with density proportional to function f. Numerous applications in diverse areas: statistics, networking, biology, computer vision, privacy, operations research etc. This course: mathematical and algorithmic foundations of sampling and its applications.
58
Lecture 2: Algorithmic Applications
Given a blackbox for sampling, we will study algorithms for:
Rounding
Convex Optimization
Volume Computation
Integration
59
High-dimensional Algorithms
P1. Optimization. Find minimum of f over the set S. Ellipsoid algorithm [Yudin-Nemirovski; Shor] works when S is a convex set and f is a convex function. P2. Integration. Find the integral of f. Dyer-Frieze-Kannan algorithm works when f is the indicator function of a convex set.
60
Structure
Q. What geometric structure makes algorithmic problems computationally tractable?
(i.e., solvable with polynomial complexity)
Convexity often suffices. Is convexity the frontier of polynomial-time solvability? Appears to be in many cases of interest
61
Convexity
(Indicator functions of) Convex sets: ∀x, y ∈ K, λ ∈ [0,1]: λx + (1-λ)y ∈ K
Concave functions: f(λx + (1-λ)y) ≥ λf(x) + (1-λ)f(y)
Logconcave functions: f(λx + (1-λ)y) ≥ f(x)^λ f(y)^{1-λ}
Quasiconcave functions: f(λx + (1-λ)y) ≥ min{f(x), f(y)}
Star-shaped sets: ∃x_0 ∈ S s.t. ∀y ∈ S, λ ∈ [0,1]: λx_0 + (1-λ)y ∈ S
62
Sandwiching
Thm (John). Any convex body K has an ellipsoid E s.t. E ⊆ K ⊆ nE.
The maximum volume ellipsoid contained in K can be used. Thm (KLS). For a convex body K in isotropic position, √((n+2)/n)·B_n ⊆ K ⊆ √(n(n+2))·B_n.
Also a factor-n sandwiching, but with a different ellipsoid (a ball, in isotropic position). As we will see, isotropic sandwiching (rounding) is algorithmically efficient while the classical approach is not.
63
Rounding via Sampling
1. Sample m random points from K.
2. Compute the sample mean z and sample covariance matrix A.
3. Compute B = A^{-1/2}. Applying B to K - z achieves near-isotropic position.
Thm. C(ε)·n random points suffice to achieve (1 ± ε)-isotropic position for isotropic K.
[Adamczak et al.; Srivastava-Vershynin; improving on Bourgain; Rudelson]
I.e., for any unit vector v, 1 - ε ≤ (1/m)·Σ_i ⟨v, x_i⟩² ≤ 1 + ε.
64
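A minimal sketch of steps 1-3. Real samples would come from a random-walk sampler over K; here a skewed ellipsoid point cloud (the matrix T is an arbitrary assumption) plays that role:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 10, 5000

# stand-in for "m random points from K": uniform points in T(unit ball)
T = rng.standard_normal((n, n))
g = rng.standard_normal((m, n))
g /= np.linalg.norm(g, axis=1, keepdims=True)
samples = (g * rng.random((m, 1)) ** (1.0 / n)) @ T.T

z = samples.mean(axis=0)                      # 2. sample mean
A = np.cov(samples, rowvar=False)             # 2. sample covariance
w, V = np.linalg.eigh(A)                      # 3. B = A^{-1/2}
B = V @ np.diag(w ** -0.5) @ V.T
y = (samples - z) @ B.T                       # near-isotropic point cloud

print("covariance eigenvalue range before:", w.min(), w.max())
print("after rounding ~ I:", np.allclose(np.cov(y, rowvar=False), np.eye(n), atol=1e-8))
```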
Convex Feasibility
Input: Separation oracle for a convex body K, with the guarantee that if K is nonempty, it contains a ball of radius r and is contained in the ball of radius R centered at the origin. Output: A point x in K. Complexity: #oracle calls and #arithmetic operations. To be efficient, the complexity of an algorithm should be bounded by poly(n, log(R/r)).
65
Convex optimization reduces to feasibility
To minimize a convex (or even quasiconvex) function f, we can reduce to the feasibility problem via binary search over the optimal value; the level sets {x : f(x) ≤ t} maintain convexity.
66
How to choose oracle queries?
67
Convex feasibility via sampling
[Bertsimas-V. 02]
1. Let z = 0 and P = [-R, R]^n.
2. Does z ∈ K? If yes, output z ∈ K.
3. If no, let H = {x : ⟨a, x⟩ ≤ ⟨a, z⟩} be a halfspace containing K, obtained from the separation oracle.
4. Let P = P ∩ H.
5. Sample x_1, …, x_m uniformly from P.
6. Let z = (1/m)·Σ_i x_i. Go to Step 2.
68
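A sketch of the whole loop on a toy instance (the target ball, its radius, the sample and step counts are all illustrative assumptions; hit-and-run over the current polytope P supplies the samples):

```python
import numpy as np

rng = np.random.default_rng(3)

def hit_and_run(A, b, x, steps=200):
    """Approximately uniform sample from the polytope {x : Ax <= b}."""
    for _ in range(steps):
        u = rng.standard_normal(A.shape[1]); u /= np.linalg.norm(u)
        s, r = b - A @ x, A @ u
        hi = np.min(s[r > 1e-12] / r[r > 1e-12])    # chord endpoint along +u
        lo = np.max(s[r < -1e-12] / r[r < -1e-12])  # chord endpoint along -u
        x = x + rng.uniform(lo, hi) * u
    return x

def bv_feasibility(oracle, n, R=1.0, m=40, iters=60):
    A = np.vstack([np.eye(n), -np.eye(n)])  # 1. P = [-R, R]^n, z = 0
    b = np.full(2 * n, R)
    z = np.zeros(n)
    for _ in range(iters):
        ok, a = oracle(z)                   # 2. is z in K?
        if ok:
            return z
        A = np.vstack([A, a])               # 3-4. cut: a.x <= a.z contains K
        b = np.append(b, a @ z)
        x0 = z - 1e-3 * a                   # interior start for the sampler
        pts = [hit_and_run(A, b, x0) for _ in range(m)]   # 5. sample P
        z = np.mean(pts, axis=0)            # 6. new query point
    return None

# toy separation oracle: K = ball of radius 0.1 around a hidden center
c = np.array([0.5, -0.3, 0.2, 0.1, -0.4])
def oracle(x):
    d = x - c
    return np.linalg.norm(d) <= 0.1, d / (np.linalg.norm(d) + 1e-12)

print("point found in K:", bv_feasibility(oracle, n=5))
```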
Centroid algorithm
[Levin 65]. Use the centroid of the surviving set as the query point in each iteration. #iterations = O(n·log(R/r)). Best possible. Problem: how to find the centroid? It is #P-hard! [Rademacher 2007]
69
Why does centroid work?
It does not cut the volume in half, but it does cut off a constant fraction. Thm [Grunbaum 60]. For any halfspace H containing the centroid of a convex body K, vol(K ∩ H) ≥ (1/e)·vol(K).
70
Centroid cuts are balanced
K convex; assume its centroid is the origin. Fix the normal vector of the halfspace to be e_1. Let K_t be the slice of K at x_1 = t.
Symmetrize K: replace each slice K_t with an (n-1)-dimensional ball of the same volume as K_t, centered on the x_1-axis. Claim. The resulting set is convex. Pf. Use Brunn-Minkowski.
71
Centroid cuts are balanced
Transform K to a cone while making the halfspace volume no larger. For a cone, the lower bound of the theorem holds.
72
Centroid cuts are balanced
Transform K to a cone. Maintain volume of right half. Centroid moves right, so halfspace through centroid has smaller mass.
73
Centroid cuts are balanced
Complete K to a cone. Again centroid moves right. So cone has smaller halfspace volume than K.
74
Cone volume
Exercise 1. Show that for a cone, the volume of a halfspace containing its centroid can be as small as (n/(n+1))^n ≥ 1/e times its volume, but no smaller.
75
Convex optimization via Sampling
How many iterations for the sampling-based algorithm? If we use only 1 random sample in each iteration, then the number of iterations could be exponential! Do poly(n) samples suffice?
76
Approximating the centroid
Let x_1, …, x_m be uniform random points from K and y be their average. Suppose K is isotropic. Then E(y) = 0 and E(‖y‖²) = n/m (each coordinate of y has variance 1/m). So m = O(n) samples give a point y within constant distance of the origin, IF K is isotropic. Is this good enough? What if K is not isotropic?
77
Robust Grunbaum: cuts near centroid are also balanced
Lemma [BV02]. For an isotropic convex body K and a halfspace H containing a point within distance t of the origin, vol(K ∩ H) ≥ (1/e - t)·vol(K).
Thm [BV02]. For any convex body K and halfspace H containing the average of m random points from K, E(vol(K ∩ H)) ≥ (1/e - √(n/m))·vol(K).
78
Robust Grunbaum: cuts near centroid are also balanced
Lemma. For an isotropic convex body K and a halfspace H containing a point within distance t of the origin, vol(K ∩ H) ≥ (1/e - t)·vol(K).
The proof uses similar ideas as Grunbaum, with more structural properties. In particular: Lemma. For any 1-dimensional isotropic logconcave function f, max f < 1.
79
Optimization via Sampling
Thm. For any convex body K and halfspace H containing the average of m random points from K, E(vol(K ∩ H)) ≥ (1/e - √(n/m))·vol(K).
Proof. We can assume K is isotropic, since affine transformations preserve vol(K ∩ H)/vol(K). The distance of y, the average of the random samples, from the centroid is at most √(n/m) in expectation. So O(n) samples suffice in each iteration.
80
Optimization via Sampling
Thm [BV02]. Convex feasibility can be solved using O(n·log(R/r)) oracle calls. The ellipsoid algorithm takes O(n²·log(R/r)) calls; Vaidya's algorithm also takes O(n·log(R/r)).
With sampling, one can solve convex optimization using only a membership oracle and a starting point in K. We will see this later.
81
Integration
We begin with the important special case of volume computation: given a convex body K and parameter ε, find a number A s.t. (1 - ε)·vol(K) ≤ A ≤ (1 + ε)·vol(K).
82
Volume via Rounding
Using the John ellipsoid or the inertial ellipsoid, E ⊆ K ⊆ nE gives a polytime algorithm with an n^{O(n)} approximation to the volume.
Can we do better?
83
Complexity of Volume Estimation
Thm [E86, BF87]. For any deterministic algorithm that uses at most n^a membership calls to the oracle for a convex body K and computes two numbers A and B such that A ≤ vol(K) ≤ B, there is some convex body for which the ratio B/A is at least (c·n / (a·log n))^{n/2},
where c is an absolute constant.
84
Complexity of Volume Estimation
Thm [BF]. For deterministic algorithms, a polynomial number of oracle calls allows only an approximation factor of (cn/log n)^{Θ(n)}.
Thm [DV12]. Matching upper bound: a (1 + ε)^n approximation in time (1/ε)^{O(n)}·poly(n).
85
Volume computation
[DFK89]. Polynomial-time randomized algorithm that estimates the volume to within relative error ε, with probability at least 1 - δ, in time poly(n, 1/ε, log(1/δ)).
86
Volume by Random Sampling
Pick random samples from a ball/cube containing K. Compute the fraction c of samples that fall in K. Output c·vol(outer ball).
Problem: the fraction can be exponentially small, so this needs too many samples.
87
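A tiny experiment showing the collapse (K = unit ball inside the cube [-1, 1]^n; the sample size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
for n in (2, 5, 10, 20):
    x = rng.uniform(-1, 1, (200000, n))
    hits = np.mean(np.linalg.norm(x, axis=1) <= 1)  # fraction landing in the ball
    print(f"n={n:2d}: hit fraction {hits:.2e}")      # essentially 0 by n = 20
```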
Volume via Sampling
Let B(0, r) ⊆ K ⊆ B(0, R) and define K_i = K ∩ B(0, r·2^{i/n}) for i = 0, 1, …, m with m = n·log₂(R/r), so K_0 = B(0, r) and K_m = K. Then
vol(K) = vol(K_0) · ∏_i vol(K_i)/vol(K_{i-1}).
Estimate each ratio with random samples.
88
Volume via Sampling
vol(K) = vol(K_0) · ∏_i vol(K_i)/vol(K_{i-1}), with each ratio at most 2.
Claim. Total #samples = O*(m²) = O*(n²).
89
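A sketch of the telescoping estimator. To keep it runnable without a random-walk sampler, K is itself a ball, so every K_i = K ∩ 2^{i/n}B is a ball we can sample exactly; for a general convex body the sampling step is where the random walk goes:

```python
import numpy as np
from math import gamma, pi, ceil, log2

rng = np.random.default_rng(5)

def uniform_in_ball(n, radius, m):
    g = rng.standard_normal((m, n))
    g /= np.linalg.norm(g, axis=1, keepdims=True)
    return g * radius * (rng.random((m, 1)) ** (1.0 / n))

n, R, k = 8, 4.0, 20000                      # K = B(0, R); k samples per phase
radii = [2 ** (i / n) for i in range(ceil(n * log2(R)) + 1)]  # m = n log2(R) phases
est = pi ** (n / 2) / gamma(n / 2 + 1)       # vol(K_0) = vol(unit ball), known
for r_prev, r_next in zip(radii, radii[1:]):
    pts = uniform_in_ball(n, r_next, k)      # sample K_i (exactly, in this toy case)
    est /= np.mean(np.linalg.norm(pts, axis=1) <= r_prev)  # / vol(K_{i-1})/vol(K_i)

true = pi ** (n / 2) / gamma(n / 2 + 1) * R ** n
print(f"estimate {est:.4e} vs true vol(B(0,R)) {true:.4e}")
```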
Variance of product
Exercise 2. Let Y be the product estimator Y = W_1·W_2⋯W_m, with each W_i the average of k independent samples estimating vol(K_i)/vol(K_{i-1}), i = 1, 2, …, m, so that E(W_i²)/E(W_i)² ≤ 1 + 3/k.
Show that Var(Y) ≤ ((1 + 3/k)^m - 1)·E(Y)².
90
Appears to be optimal
n phases, O*(n) samples in each phase. If we only took m < n phases, then the ratio to be estimated in some phase could be as large as 2^{n/m}, which is superpolynomial for m = o(n). Is O*(n²) total samples the best possible?
91
Simulated Annealing [Kalai-V. 04, Lovasz-V. 03]
To estimate ∫f, consider a sequence f_0, f_1, …, f_m = f, with ∫f_0 being easy, e.g., a constant function over a ball. Then
∫f = ∫f_0 · ∏_i (∫f_i / ∫f_{i-1}).
Each ratio can be estimated by sampling:
1. Sample X with density proportional to f_{i-1}.
2. Compute W_i = f_i(X)/f_{i-1}(X). Then E(W_i) = ∫f_i / ∫f_{i-1}.
92
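The identity E_{f_i}[f_{i+1}(X)/f_i(X)] = ∫f_{i+1}/∫f_i, checked in one dimension where each density is a Gaussian we can sample exactly (the schedule and sample sizes are arbitrary choices):

```python
import numpy as np
from math import sqrt, pi

rng = np.random.default_rng(6)

# f_a(x) = exp(-a x^2) has Z(a) = integral of f_a = sqrt(pi/a), so we can
# check the telescoping-product estimator against the exact answer.
a_seq = [1.25 ** i for i in range(11)]       # a: 1 -> ~9.3, geometric schedule
k = 50000
est = sqrt(pi / a_seq[0])                    # Z(a_0) known in closed form
for a, b in zip(a_seq, a_seq[1:]):
    x = rng.normal(0.0, sqrt(1 / (2 * a)), k)   # X ~ density prop. to f_a
    est *= np.mean(np.exp(-(b - a) * x ** 2))   # W estimates Z(b)/Z(a)

print(f"estimate {est:.5f} vs exact {sqrt(pi / a_seq[-1]):.5f}")
```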
A tight reduction [LV03]
Define f_i(x) = e^{-a_i·‖x‖} restricted to K, with the rates a_i decreasing geometrically: a_{i+1} = a_i·(1 - 1/√n), starting from a large a_0 (so that ∫f_0 is easy) and ending near uniform.
Then m ~ √n·log(2n/ε) phases suffice.
93
Volume via Annealing
Lemma. E(W_i²)/E(W_i)² is bounded by a constant,
for large enough n.
Although the expectation of Y can be large (exponential, even), it has small variance!
94
Proof via logconcavity
Exercise 3. For a logconcave function f: R^n -> R+, let Z(a) = ∫ e^{-a‖x‖}·f(x) dx for a > 0. Show that a^n·Z(a) is a logconcave function of a.
[Hint: Change variables, y = ax.]
95
Proof via logconcavity
By the exercise, a^n·Z(a) is a logconcave function of a, where Z(a) = ∫_K e^{-a‖x‖} dx; applying logconcavity at a_i, a_{i+1} and 2a_{i+1} - a_i bounds E(W_i²)/E(W_i)² = Z(2a_{i+1} - a_i)·Z(a_i)/Z(a_{i+1})² by a constant.
96
Progress on volume
Dyer-Frieze-Kannan 91: n^23 (everything)
Lovász-Simonovits 90: n^16 (localization)
Applegate-Kannan 90: n^10 (logconcave integration)
Lovász 90: n^10 (ball walk)
DF 91: n^8 (error analysis)
LS 93: n^7 (multiple improvements)
KLS 97: n^5 (speedy walk, isotropy)
LV 03,04: n^4 (annealing, wt. isoper.)
LV 06: n^4 (integration, local analysis)
(Power = exponent of n in the running time; new ideas in parentheses.)
97
Optimization via Annealing
We can minimize a quasiconvex function f over a convex set S given only by a membership oracle and a starting point in S [KV04, LV06]. Almost the same algorithm, in reverse: to find max f, define a sequence of functions starting at nearly uniform and getting more and more concentrated on points of near-optimal objective value.
98
Lecture 3: Sampling Algorithms
Sampling by random walks Conductance Grid walk, Ball walk, Hit-and-run Isoperimetric inequalities Rapid mixing
99
High-Dimensional Sampling Algorithms
Santosh Vempala
Algorithms and Randomness Center Georgia Tech
100
Sampling
Generate a uniform random point from a set S or with density proportional to function f. Numerous applications in diverse areas: statistics, networking, biology, computer vision, privacy, operations research etc. This course: mathematical and algorithmic foundations of sampling and its applications.
101
Structure
Q. What geometric structure makes algorithmic problems computationally tractable?
(i.e., solvable with polynomial complexity)
Convexity often suffices. Is convexity the frontier of polynomial-time solvability? Appears to be in many cases of interest
102
Convexity
(Indicator functions of) Convex sets: ∀x, y ∈ K, λ ∈ [0,1]: λx + (1-λ)y ∈ K
Concave functions: f(λx + (1-λ)y) ≥ λf(x) + (1-λ)f(y)
Logconcave functions: f(λx + (1-λ)y) ≥ f(x)^λ f(y)^{1-λ}
Quasiconcave functions: f(λx + (1-λ)y) ≥ min{f(x), f(y)}
Star-shaped sets: ∃x_0 ∈ S s.t. ∀y ∈ S, λ ∈ [0,1]: λx_0 + (1-λ)y ∈ S
103
Annealing
Integration: f_i(x) = f(x)^{a_i}, with a_0 ≈ 0 (nearly uniform) and a_{i+1} = a_i·(1 + 1/√n). Sample X with density proportional to f_i; estimate W_i = f_{i+1}(X)/f_i(X) ~ ∫f_{i+1}/∫f_i; output A = ∫f_0 · ∏_i W_i.
Optimization: the same sequence f_i(x) = f(x)^{a_i}, a_{i+1} = a_i·(1 + 1/√n), continued until the distribution concentrates near the maximum. Sample with density proportional to f_i.
Output X with max f(X).
104
How to sample?
Take a random walk in K. Grid (lattice) walk: consider a lattice (e.g., δ·Z^n) intersected with K. At grid point x, pick a random y among the neighboring grid points; if y is in K, go to y.
105
Ball walk
At x, pick a random point y from the ball of radius δ centered at x; if y is in K, go to y.
106
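A minimal ball-walk implementation against a membership oracle (the toy body, step size δ = 1/√n and step count are illustrative assumptions; no warm start or convergence check):

```python
import numpy as np

rng = np.random.default_rng(7)

def ball_walk(in_K, x, delta, steps):
    """At x, propose y uniform in the delta-ball around x; move if y is in K."""
    n = len(x)
    for _ in range(steps):
        u = rng.standard_normal(n)
        u *= (rng.random() ** (1.0 / n)) / np.linalg.norm(u)  # uniform in unit ball
        y = x + delta * u
        if in_K(y):
            x = y                                  # else stay put (wasted step)
    return x

n = 10
in_cube = lambda y: bool(np.all(np.abs(y) <= 1))   # toy body: [-1,1]^n
print(ball_walk(in_cube, np.zeros(n), delta=1 / np.sqrt(n), steps=2000))
```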
Hit-and-run
[Boneh; Smith] At x: pick a random chord L through x (uniformly random direction); go to a uniformly random point y on L.
107
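A matching hit-and-run sketch; the chord endpoints are found by bisection on the membership oracle (assumes K is contained in a ball of radius t_max around the current point):

```python
import numpy as np

rng = np.random.default_rng(8)

def chord_end(in_K, x, u, t_max=1e3, iters=40):
    """Bisect for the boundary of K along direction u, starting from x in K."""
    lo, hi = 0.0, t_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if in_K(x + mid * u) else (lo, mid)
    return lo

def hit_and_run(in_K, x, steps):
    """At x, pick a uniformly random chord through x, then a uniform point on it."""
    n = len(x)
    for _ in range(steps):
        u = rng.standard_normal(n); u /= np.linalg.norm(u)
        t = rng.uniform(-chord_end(in_K, x, -u), chord_end(in_K, x, u))
        x = x + t * u
    return x

n = 10
in_cube = lambda y: bool(np.all(np.abs(y) <= 1))
print(hit_and_run(in_cube, np.zeros(n), steps=500))
```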
Markov chains
State space K, with a set of measurable subsets that form a σ-algebra, i.e., closed under finite unions and intersections; a next-step distribution P_u associated with each point u in the state space; and a starting point.
108
Convergence
Stationary distribution Q; the ergodic flow of a subset A is defined as Φ(A) = ∫_A P_u(K \ A) dQ(u).
For a stationary distribution Q, we have Φ(A) = Φ(K \ A).
109
Random walks in K
For both walks, the distribution of the current point tends to the uniform distribution over K. The uniform distribution is stationary; in fact, both walks are reversible with respect to it.
Exercise 1. Show that the uniform distribution is stationary for hit-and-run. Question: How many steps are needed?
110
Rate of convergence?
Ergodic flow: Φ(A) = ∫_A P_u(K \ A) dQ(u).
Conductance: φ(A) = Φ(A) / min{Q(A), Q(K \ A)};
φ = inf_A φ(A).
111
Conductance
The mixing rate cannot be faster than 1/φ, since it takes this many steps to even escape from some subsets. Does φ give an upper bound? Yes, for discrete Markov chains:
Thm [Jerrum-Sinclair]. φ²/2 ≤ 1 - λ₂ ≤ 2φ,
where λ₂ is the second eigenvalue of the transition matrix.
Thus, mixing rate = O(1/φ²).
112
Rate of convergence
High conductance => rapid mixing Proof does not go through eigenvalue gap
113
How to bound conductance?
The conductance of the ball walk is not bounded! The local conductance ℓ(u) = vol(K ∩ B(u, δ)) / vol(B(u, δ)) can be arbitrarily small. What can we do? Modify K slightly, or start with a nearly random point in K.
114
Smoothing a convex body
Each point of the original body has a small ball around it. What about new points? No worse than the local conductance of boundary points of a small ball. Choosing the step radius appropriately will ensure that every point has local conductance at least a fixed constant.
115
Conductance
Consider an arbitrary measurable subset S.
We need to show that the escape probability from S is large.
116
Conductance
Need: points that do not cross over are far from each other; if two subsets are far apart, then the rest of the set is large.
117
One-step distributions
If ‖P_u - P_v‖ is large, then the balls around u and v have small intersection, so u and v must be geometrically far apart.
118
Probabilistic distance vs. geometric distance.
Lemma. For the ball walk with δ-steps: if ‖u - v‖ ≤ c·δ/√n, then ‖P_u - P_v‖_TV ≤ 1 - c′, for absolute constants c, c′ > 0.
119
Coupling 1-step distributions
If u and v are close (‖u - v‖ ≤ c·δ/√n), the one-step distributions P_u and P_v overlap in a constant fraction of their measure, so ‖P_u - P_v‖_TV is bounded away from 1.
120
Isoperimetry
Thm [Lovász-Simonovits]. Let a convex body K with diameter D be partitioned into S₁, S₂, S₃ with d(S₁, S₂) = d. Then vol(S₃) ≥ (2d/D)·min{vol(S₁), vol(S₂)}. Extends to logconcave densities.
121
Conductance
Thm. The conductance of the ball walk with step size δ is at least c·δ/(√n·D).
We can use δ ~ 1/√n (after smoothing).
So φ = Ω(1/(nD)), and the mixing rate is O(1/φ²) = O(n²D²).
122
Conductance
Thm. The conductance of the ball walk with step size δ is at least c·δ/(√n·D).
Pf. Consider a partition K = S₁ ∪ S₂. Let S₁′ = {x ∈ S₁ : P_x(S₂) < 1/4} and S₂′ = {x ∈ S₂ : P_x(S₁) < 1/4} be the points unlikely to cross over. If vol(S₁′) ≤ vol(S₁)/2, then the ergodic flow out of S₁ is already at least vol(S₁)/8, and we are done; so assume not.
123
Conductance
Thm. The conductance of the ball walk with step size δ is at least c·δ/(√n·D).
Pf (contd.). For any u ∈ S₁′ and v ∈ S₂′, ‖P_u - P_v‖_TV ≥ 1 - P_u(S₂) - P_v(S₁) > 1/2, so by the coupling lemma ‖u - v‖ ≥ c·δ/√n; i.e., d(S₁′, S₂′) ≥ c·δ/√n.
124
Conductance
Thm. The conductance of the ball walk with step size δ is at least c·δ/(√n·D).
Pf (contd.). By the isoperimetric inequality, vol(K \ S₁′ \ S₂′) ≥ (2/D)·d(S₁′, S₂′)·min{vol(S₁′), vol(S₂′)}; every such point crosses over with probability at least 1/4, and combining gives conductance Ω(δ/(√n·D)).
125
KLS hyperplane conjecture
A: covariance matrix of the stationary distribution. Conjecture [KLS]. The isoperimetric coefficient of any logconcave density is, up to an absolute constant, achieved by a halfspace cut; equivalently, it is Ω(1/√‖A‖).
126
Thin shell conjecture
Theorem [Bobkov]. The isoperimetric coefficient of a logconcave density is bounded below in terms of the concentration of ‖x‖ (the thin-shell width).
Conj. (Thin shell). For an isotropic logconcave random point x, Var(‖x‖) = O(1); alternatively, ‖x‖ concentrates in a shell of constant width around √n. Current best bound [Guedon-E. Milman]: shell width O(n^{1/3}).
127
KLS-Slicing-Thin-shell
(Table: known vs. conjectured bounds for the thin-shell, slicing and KLS constants.)
Moreover, KLS implies the others [Ball], and thin-shell implies slicing [Eldan-Klartag 10].
128
Convergence
Thm. [LS93, KLS97] If S is convex, then the ball walk with an M-warm start reaches an (independent) nearly random point in poly(n, D, M) steps. Strictly speaking, this is not rapid mixing! How to get the first random point? Better dependence on diameter D?
129
Is rapid mixing possible?
Ball walk can have bad starts, but Hit-and-run escapes from corners
Isoperimetry based on the minimum distance between subsets is too coarse.
130
Average distance isoperimetry
How to average distance?
Theorem [LV04; Dieker-V. 12]. The isoperimetric inequality holds with the minimum distance d(S₁, S₂) replaced by an average distance between points of the two subsets.
131
Average distance Isoperimetry
132
Hit-and-run
Thm [LV04]. Hit-and-run mixes in polynomial time from any starting point inside a convex body. Conductance = Ω(1/(nD)).
This gives an O*(n²D²)-step (O*(n³) for a well-rounded body) sampling algorithm.
133
Multi-point random walks
Maintain m points. For each point X:
pick a random combination of the m points, and use this to update X.
Stationary distribution: m uniform random points!
134
Sampling
Q1. Is starting at a nice point faster? E.g., does ball walk mix rapidly starting at a single point, e.g., the centroid? Q2. How to check convergence to stationarity on the fly? Does it suffice to check that the measures of all halfspaces have converged?
(Note: a poly(n)-size sample can estimate all halfspace measures approximately.)
135
Sampling: current status
Can be sampled efficiently:
Convex bodies
Logconcave distributions
(1/(n-1))-harmonic-concave distributions
Near-logconcave distributions
Star-shaped bodies ??
----------------------
Cannot be sampled efficiently:
Quasiconcave distributions
136
High-dimensional sampling algorithms
Sampling manifolds
Random reflections
Deterministic sampling?
Other applications
137