High-Dimensional Sampling Algorithms
Santosh Vempala
Algorithms and Randomness Center Georgia Tech
We study the complexity, in high dimension, of basic algorithmic problems such as optimization, integration, rounding and sampling. A suitable convexity assumption allows polynomial-time algorithms for these problems, while still including very interesting special cases such as linear programming, volume computation and many instances of discrete optimization. We will survey the breakthroughs that led to the current state of the art and pay special attention to the discovery that all of the above problems can be reduced to the problem of *sampling* efficiently. In the process of establishing upper and lower bounds on the complexity of sampling in high dimension, we will encounter geometric random walks, isoperimetric inequalities, generalizations of convexity, probabilistic proof techniques and other methods bridging geometry, probability and complexity.
High-Dimensional Sampling Algorithms
Santosh Vempala
Algorithms and Randomness Center Georgia Tech
1
Format
Please ask questions. Indicate that I should go faster or slower. Feel free to ask for more examples, and for more proofs.
Exercises along the way.
2
High-dimensional problems
Input: A set of points S in R^n, or a distribution on R^n. A function f that maps points to real values (could be the indicator of a set).
3
Algorithmic Geometry
What is the complexity of computational problems as the dimension grows? Dimension = number of variables
Typically, size of input is a function of the dimension.
4
Problem 1: Optimization
Input: function f: R^n -> R specified by an oracle, a point x, error parameter ε. Output: point y such that f(y) ≤ min f + ε.
5
Problem 2: Integration
Input: function f: R^n -> R+ specified by an oracle, a point x, error parameter ε. Output: number A such that (1 - ε)·∫f ≤ A ≤ (1 + ε)·∫f.
6
Problem 3: Sampling
Input: function f: R^n -> R+ specified by an oracle, a point x, error parameter ε. Output: a point y from a distribution within distance ε of the distribution with density proportional to f.
7
Problem 4: Rounding
Input: function f: R^n -> R+ specified by an oracle, a point x, error parameter ε. Output: an affine transformation that approximately sandwiches f between concentric balls.
8
Problem 5: Learning
Input: i.i.d. points with labels from an unknown distribution, error parameter ε. Output: a rule to correctly label 1 - ε of the input distribution. (Generalizes integration.)
9
Sampling
Generate a uniform random point from a set S or with density proportional to function f. Numerous applications in diverse areas: statistics, networking, biology, computer vision, privacy, operations research etc. This course: mathematical and algorithmic foundations of sampling and its applications.
10
Course Outline
Lecture 1. Introduction to sampling, high-dimensional geometry and complexity. Lecture 2. Algorithms based on sampling. Lecture 3. Sampling algorithms.
11
Lecture 1: Introduction
Computational problems in high dimension
The challenges of high dimensionality
Convex bodies, logconcave functions
Brunn-Minkowski and its variants
Isotropy
Summary of applications
12
Lecture 2: Algorithmic Applications
Convex Optimization
Rounding
Volume Computation
Integration
13
Lecture 3: Sampling Algorithms
Sampling by random walks
Conductance
Grid walk, Ball walk, Hit-and-run
Isoperimetric inequalities
Rapid mixing
14
High-dimensional problems are hard
P1. Optimization. Find the minimum of f over a set.
P2. Integration. Find the average (or integral) of f.
These problems are intractable (hard) in general, i.e., for arbitrary sets and general functions: intractable for arbitrary sets even with linear functions, and intractable for polytopes even with quadratic functions.
P1 is NP-hard or worse:
min number of unsatisfied clauses in a 3-SAT formula
P2 is #P-hard or worse
Count number of satisfying solutions to a 3-SAT formula
15
High-dimensional Algorithms
P1. Optimization. Find the minimum of f over the set S. Ellipsoid algorithm [Yudin-Nemirovski; Shor; Khachiyan; GLS] works when S is a convex set and f is a convex function. P2. Integration. Find the integral of f. Dyer-Frieze-Kannan algorithm works when f is the indicator function of a convex set.
16
A glimpse of the complexity frontier
1. Are the entries of a given matrix A the inner products of a set of vectors, i.e., is A = BB^T? (semidefinite program)
2. Are they the inner products of a set of nonnegative vectors, i.e., is A = BB^T with B ≥ 0? (completely positive)
17
Structure
Q. What geometric structure makes algorithmic problems computationally tractable?
(i.e., solvable with polynomial complexity)
Convexity often suffices. Is convexity the frontier of polynomial-time solvability? Appears to be in many cases of interest
18
Convexity
(Indicator functions of) Convex sets: ∀x, y ∈ K, λ ∈ [0,1]: λx + (1-λ)y ∈ K
Concave functions: f(λx + (1-λ)y) ≥ λf(x) + (1-λ)f(y)
Logconcave functions: f(λx + (1-λ)y) ≥ f(x)^λ f(y)^{1-λ}
Quasiconcave functions: f(λx + (1-λ)y) ≥ min{f(x), f(y)}
Star-shaped sets: ∃x_0 ∈ S s.t. ∀y ∈ S, λ ∈ [0,1]: λx_0 + (1-λ)y ∈ S
19
How to specify a convex set?
Explicit list of constraints, e.g., a linear program: Ax ≤ b.
What about the set of positive semidefinite matrices? Or the set of vectors on the edges of a graph that have weight at least one on every cut?
20
Structure I: Separation Oracle
Either x ∈ K, or there is a halfspace containing K and not x.
21
Convex sets have separation oracles
If x is not in K, let y be the point in K that is closest to x.
y is unique: if y_1 and y_2 are both closest, then their midpoint (y_1 + y_2)/2 is in K and is strictly closer to x.
Take the hyperplane through y normal to x - y: {z : (x - y)^T z = (x - y)^T y}. It contains K on one side and x strictly on the other.
22
Separation oracles
For an LP, simply check all the linear constraints. For a ball or ellipsoid, find the tangent plane. For the SDP cone, check whether the eigenvalues are all nonnegative; if not, an eigenvector with a negative eigenvalue gives a separating hyperplane. For the cut example, find a minimum cut to check whether all cuts have weight at least 1.
23
Example: Learning by Sampling
Sequence of points X_1, X_2, …; unknown -1/1 function f. We get X_i and have to guess f(X_i). Goal: minimize the number of wrong guesses.
24
Learning Halfspaces
Unknown -1/1 function f: f(X) = 1 if ⟨w, X⟩ > 0 and f(X) = -1 otherwise,
for an unknown vector w, with each component w_i being a b-bit integer. What is the minimum number of mistakes?
25
Majority algorithm
After X_1, X_2, …, X_k, the set of consistent functions f corresponds to S_k = {w : f(X_i)·⟨w, X_i⟩ > 0 for i = 1, 2, …, k}. Guess f(X_{k+1}) to be the majority of the predictions of the w's in S_k. Claim. The number of wrong guesses is at most bn, since each wrong guess at least halves |S_k|, which starts at 2^{bn}. But how to compute the majority?? |S_k| could have 2^{bn} elements!
26
Random algorithm
Pick a random w in S_k. Guess f(X_{k+1}) = sign(⟨w, X_{k+1}⟩).
27
Random algorithm
Pick a random w in S_k; guess f(X_{k+1}) = sign(⟨w, X_{k+1}⟩). Lemma 1. E(#wrong guesses) ≤ 2bn. Proof idea. Every time the random guess is wrong, the majority algorithm has probability at least 1/2 of being wrong. Exercise 1. Prove Lemma 1.
28
Learning by Sampling
How to pick a random w in S_k? S_k is a convex set! It can be sampled efficiently.
29
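A runnable sketch of this randomized learner (all names and parameters below are illustrative assumptions; crude rejection sampling from the cube stands in for an efficient convex-body sampler such as hit-and-run):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_consistent_w(X, y, n, tries=100000):
    """Pick a random w consistent with the labeled history.
    Rejection sampling from [-1,1]^n is exponentially slow in general;
    it stands in for a proper random-walk sampler over the convex set S_k."""
    for _ in range(tries):
        w = rng.uniform(-1, 1, n)
        if all(yi * np.dot(w, xi) > 0 for xi, yi in zip(X, y)):
            return w
    raise RuntimeError("rejection sampler failed; use a random-walk sampler")

def online_learn(points, f, n):
    """Guess each label with a random consistent w; by Lemma 1 the expected
    number of mistakes is at most 2bn for b-bit integer weights."""
    X, y, mistakes = [], [], 0
    for x in points:
        w = sample_consistent_w(X, y, n) if X else rng.uniform(-1, 1, n)
        guess = 1 if np.dot(w, x) > 0 else -1
        if guess != f(x):
            mistakes += 1
        X.append(x); y.append(f(x))
    return mistakes

# toy run: hidden integer-weight halfspace in n = 5 dimensions (assumption)
n = 5
w_star = np.array([3., -7., 2., 5., -1.])
f = lambda x: 1 if np.dot(w_star, x) > 0 else -1
print("mistakes:", online_learn([rng.standard_normal(n) for _ in range(50)], f, n))
```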
Structure of Convex Bodies
Volume(unit cube) = 1. Volume(unit ball) = π^{n/2} / Γ(n/2 + 1) ≈ (2πe/n)^{n/2}: drops exponentially with n.
For any central hyperplane, most of the mass of the ball is within distance O(1/√n) of it.
30
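A quick Monte Carlo illustration of this concentration (the dimensions and the slab width 2/√n are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def uniform_in_ball(n, m):
    """m uniform random points in the unit ball: random direction, radius ~ U^(1/n)."""
    g = rng.standard_normal((m, n))
    g /= np.linalg.norm(g, axis=1, keepdims=True)
    return g * (rng.random((m, 1)) ** (1.0 / n))

for n in (2, 20, 200):
    x = uniform_in_ball(n, 100000)
    frac = np.mean(np.abs(x[:, 0]) <= 2 / np.sqrt(n))
    print(f"n={n:3d}: mass within 2/sqrt(n) of the hyperplane x1=0: {frac:.3f}")
```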
Structure of Convex Bodies
Volume(unit cube) = 1. Volume(unit ball) = π^{n/2} / Γ(n/2 + 1): drops exponentially with n.
Most of the volume is near the boundary: vol((1 - ε)K) = (1 - ε)^n·vol(K) ≤ e^{-εn}·vol(K). So, for ε ≫ 1/n, all but a vanishing fraction of the volume lies in the shell K \ (1 - ε)K.
31
Structure II: Volume Distribution
A, B sets in R^n; their Minkowski sum is A + B = {x + y : x ∈ A, y ∈ B}.
For a convex body K, the hyperplane section at (x + y)/2 contains (K_x + K_y)/2, the Minkowski average of the sections K_x and K_y at x and y. What is the volume distribution?
32
Brunn-Minkowski inequality
Thm. For compact sets A, B in R^n: vol(A + B)^{1/n} ≥ vol(A)^{1/n} + vol(B)^{1/n}.
Equivalently, for λ ∈ [0, 1]: vol(λA + (1-λ)B)^{1/n} ≥ λ·vol(A)^{1/n} + (1-λ)·vol(B)^{1/n}; this follows by taking the sets to be λA and (1-λ)B.
33
Brunn-Minkowski inequality
Thm. For compact sets A, B in R^n: vol(A + B)^{1/n} ≥ vol(A)^{1/n} + vol(B)^{1/n}.
Proof. First take A, B to be cuboids, i.e., A with side lengths (a_1, …, a_n) and B with side lengths (b_1, …, b_n). Then A + B is a cuboid with side lengths (a_1 + b_1, …, a_n + b_n), and the inequality follows from the AM-GM inequality.
34
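The AM-GM step for cuboids, written out (A, B with side lengths a_i, b_i as above):

```latex
\frac{\mathrm{vol}(A)^{1/n} + \mathrm{vol}(B)^{1/n}}{\mathrm{vol}(A+B)^{1/n}}
  = \prod_{i=1}^{n}\Big(\frac{a_i}{a_i+b_i}\Big)^{1/n}
  + \prod_{i=1}^{n}\Big(\frac{b_i}{a_i+b_i}\Big)^{1/n}
  \le \frac{1}{n}\sum_{i=1}^{n}\frac{a_i}{a_i+b_i}
  + \frac{1}{n}\sum_{i=1}^{n}\frac{b_i}{a_i+b_i} = 1,
```

where each product is bounded by the arithmetic mean of its factors (AM-GM).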
Brunn-Minkowski inequality
Thm. For compact sets A, B in R^n: vol(A + B)^{1/n} ≥ vol(A)^{1/n} + vol(B)^{1/n}.
Proof. Next take A, B to be finite unions of disjoint cuboids, and induct on the total number of cuboids: pick an axis-aligned hyperplane separating two cuboids of A, translate B so that the same hyperplane splits its volume in the same proportion, and apply the inductive hypothesis to each side.
Finally, note that any compact set can be approximated to arbitrary accuracy by the union of a finite set of cuboids.
35
Logconcave functions
f: R^n -> R+ is logconcave if for all x, y and λ ∈ [0, 1], f(λx + (1-λ)y) ≥ f(x)^λ·f(y)^{1-λ}; i.e., f is nonnegative and its logarithm is concave.
36
Logconcave functions
Examples:
Indicator functions of convex sets are logconcave; so are the Gaussian density function and the exponential function.
Level sets of f, {x : f(x) ≥ t}, are convex. Many other useful geometric properties.
37
Prekopa-Leindler inequality
Prekopa-Leindler: if f, g, h: R^n -> R+ satisfy h(λx + (1-λ)y) ≥ f(x)^λ·g(y)^{1-λ} for all x, y, then ∫h ≥ (∫f)^λ·(∫g)^{1-λ}.
Functional version of [B-M], equivalent to it.
38
Properties of logconcave functions
For two logconcave functions f and g: their sum might not be logconcave, but their product h(x) = f(x)·g(x) is logconcave, and so is their minimum h(x) = min{f(x), g(x)}.
39
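A numeric illustration (a sketch; the two shifted Gaussians and the grid are arbitrary choices): on a uniform grid, g is logconcave iff g(x)^2 ≥ g(x-h)·g(x+h) at interior points.

```python
import numpy as np

x = np.linspace(-4, 4, 801)          # uniform grid, spacing h = 0.01

def is_logconcave_1d(g, tol=1e-12):
    """Discrete midpoint test for logconcavity on a uniform grid."""
    return bool(np.all(g[1:-1] ** 2 >= g[:-2] * g[2:] - tol))

f = np.exp(-(x - 2) ** 2)            # Gaussian centered at +2: logconcave
g = np.exp(-(x + 2) ** 2)            # Gaussian centered at -2: logconcave

print("product:", is_logconcave_1d(f * g))             # True
print("min    :", is_logconcave_1d(np.minimum(f, g)))  # True
print("sum    :", is_logconcave_1d(f + g))             # False (bimodal)
```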
Properties of logconcave functions
The convolution h(x) = ∫ f(y)·g(x - y) dy of two logconcave functions is logconcave.
And so is any marginal: h(x_1, …, x_k) = ∫ f(x_1, …, x_k, y) dy.
Exercise 2. Prove the above properties using the Prekopa-Leindler inequality.
40
Isotropic position
Affine transformations preserve convexity and logconcavity. What can one use as a canonical position? E.g., ellipsoids map to balls, parallelepipeds map to cubes. What about general convex bodies? Logconcave functions?
41
Isotropic position
Let x be a random point from a convex body K. z = E(x) is the center of gravity (or centroid). Shift so that z = 0. Now consider the covariance matrix A = E(xx^T).
A has bounded entries; it is positive semidefinite; it has full rank unless K lies in a lower-dimensional subspace.
42
Isotropic position
Let B = A^{-1/2}. Then a random point y = Bx from BK satisfies E(y) = 0 and E(yy^T) = I, for this n x n matrix B.
Such a body is in isotropic position.
43
Isotropic position: Exercises
Exercise 3. Find R s.t. the origin-centered cube of side length 2R is isotropic. Exercise 4. Show that for a random point x from a set in isotropic position, for any unit vector v, we have E(⟨v, x⟩²) = 1.
44
Isotropic position and sandwiching
For any convex body K (in fact, any set/distribution with bounded second moments), we can apply an affine transformation so that a random point x from K satisfies E(x) = 0 and E(xx^T) = I.
Thus K looks like a ball up to second moments. How close is it really to a ball? Can it be sandwiched between two balls of similar radii? Yes!
45
Sandwiching
Thm (John). Any convex body K has an ellipsoid E s.t. E ⊆ K ⊆ nE.
The maximum volume ellipsoid contained in K can be used. Thm (KLS). For a convex body K in isotropic position, √((n+2)/n)·B_n ⊆ K ⊆ √(n(n+2))·B_n.
Also a factor-n sandwiching, but with a different ellipsoid (a ball, in isotropic position). As we will see, isotropic sandwiching (rounding) is algorithmically efficient while the classical approach is not.
46
Lecture 2: Algorithmic Applications
Convex Optimization
Rounding
Volume Computation
Integration
47
Lecture 3: Sampling Algorithms
Sampling by random walks
Conductance
Grid walk, Ball walk, Hit-and-run
Isoperimetric inequalities
Rapid mixing
48
High-Dimensional Sampling Algorithms
Santosh Vempala
Algorithms and Randomness Center Georgia Tech
49
Format
Please ask questions. Indicate that I should go faster or slower. Feel free to ask for more examples, and for more proofs.
Exercises along the way.
50
High-dimensional problems
Input: A set of points S in R^n, or a distribution on R^n. A function f that maps points to real values (could be the indicator of a set).
51
Algorithmic Geometry
What is the complexity of computational problems as the dimension grows? Dimension = number of variables
Typically, size of input is a function of the dimension.
52
Problem 1: Optimization
Input: function f: R^n -> R specified by an oracle, a point x, error parameter ε. Output: point y such that f(y) ≤ min f + ε.
53
Problem 2: Integration
Input: function f: R^n -> R+ specified by an oracle, a point x, error parameter ε. Output: number A such that (1 - ε)·∫f ≤ A ≤ (1 + ε)·∫f.
54
Problem 3: Sampling
Input: function f: R^n -> R+ specified by an oracle, a point x, error parameter ε. Output: a point y from a distribution within distance ε of the distribution with density proportional to f.
55
Problem 4: Rounding
Input: function f: R^n -> R+ specified by an oracle, a point x, error parameter ε. Output: an affine transformation that approximately sandwiches f between concentric balls.
56
Problem 5: Learning
Input: i.i.d. points (with labels) from an unknown distribution, error parameter ε. Output: a rule to correctly label 1 - ε of the input distribution. (Generalizes integration.)
57
Sampling
Generate a uniform random point from a set S or with density proportional to function f. Numerous applications in diverse areas: statistics, networking, biology, computer vision, privacy, operations research etc. This course: mathematical and algorithmic foundations of sampling and its applications.
58
Lecture 2: Algorithmic Applications
Given a blackbox for sampling, we will study algorithms for:
Rounding
Convex Optimization
Volume Computation
Integration
59
High-dimensional Algorithms
P1. Optimization. Find minimum of f over the set S. Ellipsoid algorithm [Yudin-Nemirovski; Shor] works when S is a convex set and f is a convex function. P2. Integration. Find the integral of f. Dyer-Frieze-Kannan algorithm works when f is the indicator function of a convex set.
60
Structure
Q. What geometric structure makes algorithmic problems computationally tractable?
(i.e., solvable with polynomial complexity)
Convexity often suffices. Is convexity the frontier of polynomial-time solvability? Appears to be in many cases of interest
61
Convexity
(Indicator functions of) Convex sets: ∀x, y ∈ K, λ ∈ [0,1]: λx + (1-λ)y ∈ K
Concave functions: f(λx + (1-λ)y) ≥ λf(x) + (1-λ)f(y)
Logconcave functions: f(λx + (1-λ)y) ≥ f(x)^λ f(y)^{1-λ}
Quasiconcave functions: f(λx + (1-λ)y) ≥ min{f(x), f(y)}
Star-shaped sets: ∃x_0 ∈ S s.t. ∀y ∈ S, λ ∈ [0,1]: λx_0 + (1-λ)y ∈ S
62
Sandwiching
Thm (John). Any convex body K has an ellipsoid E s.t. E ⊆ K ⊆ nE.
The maximum volume ellipsoid contained in K can be used. Thm (KLS). For a convex body K in isotropic position, √((n+2)/n)·B_n ⊆ K ⊆ √(n(n+2))·B_n.
Also a factor-n sandwiching, but with a different ellipsoid (a ball, in isotropic position). As we will see, isotropic sandwiching (rounding) is algorithmically efficient while the classical approach is not.
63
Rounding via Sampling
1. Sample m random points from K.
2. Compute the sample mean z and sample covariance matrix A.
3. Compute B = A^{-1/2}. Applying B to K - z achieves near-isotropic position.
Thm. C(ε)·n random points suffice to achieve (1 ± ε)-isotropic position for isotropic K.
[Adamczak et al.; Srivastava-Vershynin; improving on Bourgain; Rudelson]
I.e., for any unit vector v, 1 - ε ≤ (1/m)·Σ_i ⟨v, x_i⟩² ≤ 1 + ε.
64
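A minimal sketch of steps 1-3. Real samples would come from a random-walk sampler over K; here a skewed ellipsoid point cloud (the matrix T is an arbitrary assumption) plays that role:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 10, 5000

# stand-in for "m random points from K": uniform points in T(unit ball)
T = rng.standard_normal((n, n))
g = rng.standard_normal((m, n))
g /= np.linalg.norm(g, axis=1, keepdims=True)
samples = (g * rng.random((m, 1)) ** (1.0 / n)) @ T.T

z = samples.mean(axis=0)                      # 2. sample mean
A = np.cov(samples, rowvar=False)             # 2. sample covariance
w, V = np.linalg.eigh(A)                      # 3. B = A^{-1/2}
B = V @ np.diag(w ** -0.5) @ V.T
y = (samples - z) @ B.T                       # near-isotropic point cloud

print("covariance eigenvalue range before:", w.min(), w.max())
print("after rounding ~ I:", np.allclose(np.cov(y, rowvar=False), np.eye(n), atol=1e-8))
```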
Convex Feasibility
Input: Separation oracle for a convex body K, with the guarantee that if K is nonempty, it contains a ball of radius r and is contained in the ball of radius R centered at the origin. Output: A point x in K. Complexity: #oracle calls and #arithmetic operations. To be efficient, the complexity of an algorithm should be bounded by poly(n, log(R/r)).
65
Convex optimization reduces to feasibility
To minimize a convex (or even quasiconvex) function f, we can reduce to the feasibility problem via binary search over the optimal value; the level sets {x : f(x) ≤ t} maintain convexity.
66
How to choose oracle queries?
67
Convex feasibility via sampling
[Bertsimas-V. 02]
1. Let z = 0 and P = [-R, R]^n.
2. Does z ∈ K? If yes, output z ∈ K.
3. If no, let H = {x : ⟨a, x⟩ ≤ ⟨a, z⟩} be a halfspace containing K, obtained from the separation oracle.
4. Let P = P ∩ H.
5. Sample x_1, …, x_m uniformly from P.
6. Let z = (1/m)·Σ_i x_i. Go to Step 2.
68
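A sketch of the whole loop on a toy instance (the target ball, its radius, the sample and step counts are all illustrative assumptions; hit-and-run over the current polytope P supplies the samples):

```python
import numpy as np

rng = np.random.default_rng(3)

def hit_and_run(A, b, x, steps=200):
    """Approximately uniform sample from the polytope {x : Ax <= b}."""
    for _ in range(steps):
        u = rng.standard_normal(A.shape[1]); u /= np.linalg.norm(u)
        s, r = b - A @ x, A @ u
        hi = np.min(s[r > 1e-12] / r[r > 1e-12])    # chord endpoint along +u
        lo = np.max(s[r < -1e-12] / r[r < -1e-12])  # chord endpoint along -u
        x = x + rng.uniform(lo, hi) * u
    return x

def bv_feasibility(oracle, n, R=1.0, m=40, iters=60):
    A = np.vstack([np.eye(n), -np.eye(n)])  # 1. P = [-R, R]^n, z = 0
    b = np.full(2 * n, R)
    z = np.zeros(n)
    for _ in range(iters):
        ok, a = oracle(z)                   # 2. is z in K?
        if ok:
            return z
        A = np.vstack([A, a])               # 3-4. cut: a.x <= a.z contains K
        b = np.append(b, a @ z)
        x0 = z - 1e-3 * a                   # interior start for the sampler
        pts = [hit_and_run(A, b, x0) for _ in range(m)]   # 5. sample P
        z = np.mean(pts, axis=0)            # 6. new query point
    return None

# toy separation oracle: K = ball of radius 0.1 around a hidden center
c = np.array([0.5, -0.3, 0.2, 0.1, -0.4])
def oracle(x):
    d = x - c
    return np.linalg.norm(d) <= 0.1, d / (np.linalg.norm(d) + 1e-12)

print("point found in K:", bv_feasibility(oracle, n=5))
```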
Centroid algorithm
[Levin 65]. Use the centroid of the surviving set as the query point in each iteration. #iterations = O(n·log(R/r)). Best possible. Problem: how to find the centroid? It is #P-hard! [Rademacher 2007]
69
Why does centroid work?
It does not cut the volume in half, but it does cut off a constant fraction. Thm [Grunbaum 60]. For any halfspace H containing the centroid of a convex body K, vol(K ∩ H) ≥ (1/e)·vol(K).
70
Centroid cuts are balanced
K convex; assume its centroid is the origin. Fix the normal vector of the halfspace to be e_1. Let K_t be the slice of K at x_1 = t.
Symmetrize K: replace each slice K_t with an (n-1)-dimensional ball of the same volume as K_t, centered on the x_1-axis. Claim. The resulting set is convex. Pf. Use Brunn-Minkowski.
71
Centroid cuts are balanced
Transform K to a cone while making the halfspace volume no larger. For a cone, the lower bound of the theorem holds.
72
Centroid cuts are balanced
Transform K to a cone. Maintain volume of right half. Centroid moves right, so halfspace through centroid has smaller mass.
73
Centroid cuts are balanced
Complete K to a cone. Again centroid moves right. So cone has smaller halfspace volume than K.
74
Cone volume
Exercise 1. Show that for a cone, the volume of a halfspace containing its centroid can be as small as (n/(n+1))^n ≥ 1/e times its volume, but no smaller.
75
Convex optimization via Sampling
How many iterations for the sampling-based algorithm? If we use only 1 random sample in each iteration, then the number of iterations could be exponential! Do poly(n) samples suffice?
76
Approximating the centroid
Let x_1, …, x_m be uniform random points from K and y be their average. Suppose K is isotropic. Then E(y) = 0 and E(‖y‖²) = n/m (each coordinate of y has variance 1/m). So m = O(n) samples give a point y within constant distance of the origin, IF K is isotropic. Is this good enough? What if K is not isotropic?
77
Robust Grunbaum: cuts near centroid are also balanced
Lemma [BV02]. For an isotropic convex body K and a halfspace H containing a point within distance t of the origin, vol(K ∩ H) ≥ (1/e - t)·vol(K).
Thm [BV02]. For any convex body K and halfspace H containing the average of m random points from K, E(vol(K ∩ H)) ≥ (1/e - √(n/m))·vol(K).
78
Robust Grunbaum: cuts near centroid are also balanced
Lemma. For an isotropic convex body K and a halfspace H containing a point within distance t of the origin, vol(K ∩ H) ≥ (1/e - t)·vol(K).
The proof uses similar ideas as Grunbaum, with more structural properties. In particular: Lemma. For any 1-dimensional isotropic logconcave function f, max f < 1.
79
Optimization via Sampling
Thm. For any convex body K and halfspace H containing the average of m random points from K, E(vol(K ∩ H)) ≥ (1/e - √(n/m))·vol(K).
Proof. We can assume K is isotropic, since affine transformations preserve vol(K ∩ H)/vol(K). The distance of y, the average of the random samples, from the centroid is at most √(n/m) in expectation. So O(n) samples suffice in each iteration.
80
Optimization via Sampling
Thm [BV02]. Convex feasibility can be solved using O(n·log(R/r)) oracle calls. The ellipsoid algorithm takes O(n²·log(R/r)) calls; Vaidya's algorithm also takes O(n·log(R/r)).
With sampling, one can solve convex optimization using only a membership oracle and a starting point in K. We will see this later.
81
Integration
We begin with the important special case of volume computation: given a convex body K and parameter ε, find a number A s.t. (1 - ε)·vol(K) ≤ A ≤ (1 + ε)·vol(K).
82
Volume via Rounding
Using the John ellipsoid or the inertial ellipsoid, E ⊆ K ⊆ nE gives a polytime algorithm with an n^{O(n)} approximation to the volume.
Can we do better?
83
Complexity of Volume Estimation
Thm [E86, BF87]. For any deterministic algorithm that uses at most n^a membership calls to the oracle for a convex body K and computes two numbers A and B such that A ≤ vol(K) ≤ B, there is some convex body for which the ratio B/A is at least (c·n / (a·log n))^{n/2},
where c is an absolute constant.
84
Complexity of Volume Estimation
Thm [BF]. For deterministic algorithms, a polynomial number of oracle calls allows only an approximation factor of (cn/log n)^{Θ(n)}.
Thm [DV12]. Matching upper bound: a (1 + ε)^n approximation in time (1/ε)^{O(n)}·poly(n).
85
Volume computation
[DFK89]. Polynomial-time randomized algorithm that estimates the volume to within relative error ε, with probability at least 1 - δ, in time poly(n, 1/ε, log(1/δ)).
86
Volume by Random Sampling
Pick random samples from a ball/cube containing K. Compute the fraction c of samples that fall in K. Output c·vol(outer ball).
Problem: the fraction can be exponentially small, so this needs too many samples.
87
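A tiny experiment showing the collapse (K = unit ball inside the cube [-1, 1]^n; the sample size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
for n in (2, 5, 10, 20):
    x = rng.uniform(-1, 1, (200000, n))
    hits = np.mean(np.linalg.norm(x, axis=1) <= 1)  # fraction landing in the ball
    print(f"n={n:2d}: hit fraction {hits:.2e}")      # essentially 0 by n = 20
```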
Volume via Sampling
Let B(0, r) ⊆ K ⊆ B(0, R) and define K_i = K ∩ B(0, r·2^{i/n}) for i = 0, 1, …, m with m = n·log₂(R/r), so K_0 = B(0, r) and K_m = K. Then
vol(K) = vol(K_0) · ∏_i vol(K_i)/vol(K_{i-1}).
Estimate each ratio with random samples.
88
Volume via Sampling
vol(K) = vol(K_0) · ∏_i vol(K_i)/vol(K_{i-1}), with each ratio at most 2.
Claim. Total #samples = O*(m²) = O*(n²).
89
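A sketch of the telescoping estimator. To keep it runnable without a random-walk sampler, K is itself a ball, so every K_i = K ∩ 2^{i/n}B is a ball we can sample exactly; for a general convex body the sampling step is where the random walk goes:

```python
import numpy as np
from math import gamma, pi, ceil, log2

rng = np.random.default_rng(5)

def uniform_in_ball(n, radius, m):
    g = rng.standard_normal((m, n))
    g /= np.linalg.norm(g, axis=1, keepdims=True)
    return g * radius * (rng.random((m, 1)) ** (1.0 / n))

n, R, k = 8, 4.0, 20000                      # K = B(0, R); k samples per phase
radii = [2 ** (i / n) for i in range(ceil(n * log2(R)) + 1)]  # m = n log2(R) phases
est = pi ** (n / 2) / gamma(n / 2 + 1)       # vol(K_0) = vol(unit ball), known
for r_prev, r_next in zip(radii, radii[1:]):
    pts = uniform_in_ball(n, r_next, k)      # sample K_i (exactly, in this toy case)
    est /= np.mean(np.linalg.norm(pts, axis=1) <= r_prev)  # / vol(K_{i-1})/vol(K_i)

true = pi ** (n / 2) / gamma(n / 2 + 1) * R ** n
print(f"estimate {est:.4e} vs true vol(B(0,R)) {true:.4e}")
```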
Variance of product
Exercise 2. Let Y be the product estimator Y = W_1·W_2⋯W_m, with each W_i the average of k independent samples estimating vol(K_i)/vol(K_{i-1}), i = 1, 2, …, m, so that E(W_i²)/E(W_i)² ≤ 1 + 3/k.
Show that Var(Y) ≤ ((1 + 3/k)^m - 1)·E(Y)².
90
Appears to be optimal
n phases, O*(n) samples in each phase. If we only took m < n phases, then the ratio to be estimated in some phase could be as large as 2^{n/m}, which is superpolynomial for m = o(n). Is O*(n²) total samples the best possible?
91
Simulated Annealing [Kalai-V. 04, Lovasz-V. 03]
To estimate ∫f, consider a sequence f_0, f_1, …, f_m = f, with ∫f_0 being easy, e.g., a constant function over a ball. Then
∫f = ∫f_0 · ∏_i (∫f_i / ∫f_{i-1}).
Each ratio can be estimated by sampling:
1. Sample X with density proportional to f_{i-1}.
2. Compute W_i = f_i(X)/f_{i-1}(X). Then E(W_i) = ∫f_i / ∫f_{i-1}.
92
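The identity E_{f_i}[f_{i+1}(X)/f_i(X)] = ∫f_{i+1}/∫f_i, checked in one dimension where each density is a Gaussian we can sample exactly (the schedule and sample sizes are arbitrary choices):

```python
import numpy as np
from math import sqrt, pi

rng = np.random.default_rng(6)

# f_a(x) = exp(-a x^2) has Z(a) = integral of f_a = sqrt(pi/a), so we can
# check the telescoping-product estimator against the exact answer.
a_seq = [1.25 ** i for i in range(11)]       # a: 1 -> ~9.3, geometric schedule
k = 50000
est = sqrt(pi / a_seq[0])                    # Z(a_0) known in closed form
for a, b in zip(a_seq, a_seq[1:]):
    x = rng.normal(0.0, sqrt(1 / (2 * a)), k)   # X ~ density prop. to f_a
    est *= np.mean(np.exp(-(b - a) * x ** 2))   # W estimates Z(b)/Z(a)

print(f"estimate {est:.5f} vs exact {sqrt(pi / a_seq[-1]):.5f}")
```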
A tight reduction [LV03]
Define f_i(x) = e^{-a_i·‖x‖} restricted to K, with the rates a_i decreasing geometrically: a_{i+1} = a_i·(1 - 1/√n), starting from a large a_0 (so that ∫f_0 is easy) and ending near uniform.
Then m ~ √n·log(2n/ε) phases suffice.
93
Volume via Annealing
Lemma. E(W_i²)/E(W_i)² is bounded by a constant,
for large enough n.
Although the expectation of Y can be large (exponential, even), it has small variance!
94
Proof via logconcavity
Exercise 3. For a logconcave function f: R^n -> R+, let Z(a) = ∫ e^{-a‖x‖}·f(x) dx for a > 0. Show that a^n·Z(a) is a logconcave function of a.
[Hint: Change variables, y = ax.]
95
Proof via logconcavity
By the exercise, a^n·Z(a) is a logconcave function of a, where Z(a) = ∫_K e^{-a‖x‖} dx; applying logconcavity at a_i, a_{i+1} and 2a_{i+1} - a_i bounds E(W_i²)/E(W_i)² = Z(2a_{i+1} - a_i)·Z(a_i)/Z(a_{i+1})² by a constant.
96
Progress on volume
Dyer-Frieze-Kannan 91: n^23 (everything)
Lovász-Simonovits 90: n^16 (localization)
Applegate-Kannan 90: n^10 (logconcave integration)
Lovász 90: n^10 (ball walk)
DF 91: n^8 (error analysis)
LS 93: n^7 (multiple improvements)
KLS 97: n^5 (speedy walk, isotropy)
LV 03,04: n^4 (annealing, wt. isoper.)
LV 06: n^4 (integration, local analysis)
(Power = exponent of n in the running time; new ideas in parentheses.)
97
Optimization via Annealing
We can minimize a quasiconvex function f over a convex set S given only by a membership oracle and a starting point in S [KV04, LV06]. Almost the same algorithm, in reverse: to find max f, define a sequence of functions starting at nearly uniform and getting more and more concentrated on points of near-optimal objective value.
98
Lecture 3: Sampling Algorithms
Sampling by random walks Conductance Grid walk, Ball walk, Hit-and-run Isoperimetric inequalities Rapid mixing
99
High-Dimensional Sampling Algorithms
Santosh Vempala
Algorithms and Randomness Center Georgia Tech
100
Sampling
Generate a uniform random point from a set S or with density proportional to function f. Numerous applications in diverse areas: statistics, networking, biology, computer vision, privacy, operations research etc. This course: mathematical and algorithmic foundations of sampling and its applications.
101
Structure
Q. What geometric structure makes algorithmic problems computationally tractable?
(i.e., solvable with polynomial complexity)
Convexity often suffices. Is convexity the frontier of polynomial-time solvability? Appears to be in many cases of interest
102
Convexity
(Indicator functions of) Convex sets: ∀x, y ∈ K, λ ∈ [0,1]: λx + (1-λ)y ∈ K
Concave functions: f(λx + (1-λ)y) ≥ λf(x) + (1-λ)f(y)
Logconcave functions: f(λx + (1-λ)y) ≥ f(x)^λ f(y)^{1-λ}
Quasiconcave functions: f(λx + (1-λ)y) ≥ min{f(x), f(y)}
Star-shaped sets: ∃x_0 ∈ S s.t. ∀y ∈ S, λ ∈ [0,1]: λx_0 + (1-λ)y ∈ S
103
Annealing
Integration: f_i(x) = f(x)^{a_i}, with a_0 ≈ 0 (nearly uniform) and a_{i+1} = a_i·(1 + 1/√n). Sample X with density proportional to f_i; estimate W_i = f_{i+1}(X)/f_i(X) ~ ∫f_{i+1}/∫f_i; output A = ∫f_0 · ∏_i W_i.
Optimization: the same sequence f_i(x) = f(x)^{a_i}, a_{i+1} = a_i·(1 + 1/√n), continued until the distribution concentrates near the maximum. Sample with density proportional to f_i.
Output X with max f(X).
104
How to sample?
Take a random walk in K. Grid (lattice) walk: consider a lattice (e.g., δ·Z^n) intersected with K. At grid point x, pick a random y among the neighboring grid points; if y is in K, go to y.
105
Ball walk
At x, pick a random point y from the ball of radius δ centered at x; if y is in K, go to y.
106
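A minimal ball-walk implementation against a membership oracle (the toy body, step size δ = 1/√n and step count are illustrative assumptions; no warm start or convergence check):

```python
import numpy as np

rng = np.random.default_rng(7)

def ball_walk(in_K, x, delta, steps):
    """At x, propose y uniform in the delta-ball around x; move if y is in K."""
    n = len(x)
    for _ in range(steps):
        u = rng.standard_normal(n)
        u *= (rng.random() ** (1.0 / n)) / np.linalg.norm(u)  # uniform in unit ball
        y = x + delta * u
        if in_K(y):
            x = y                                  # else stay put (wasted step)
    return x

n = 10
in_cube = lambda y: bool(np.all(np.abs(y) <= 1))   # toy body: [-1,1]^n
print(ball_walk(in_cube, np.zeros(n), delta=1 / np.sqrt(n), steps=2000))
```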
Hit-and-run
[Boneh; Smith] At x: pick a random chord L through x (uniformly random direction); go to a uniformly random point y on L.
107
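A matching hit-and-run sketch; the chord endpoints are found by bisection on the membership oracle (assumes K is contained in a ball of radius t_max around the current point):

```python
import numpy as np

rng = np.random.default_rng(8)

def chord_end(in_K, x, u, t_max=1e3, iters=40):
    """Bisect for the boundary of K along direction u, starting from x in K."""
    lo, hi = 0.0, t_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if in_K(x + mid * u) else (lo, mid)
    return lo

def hit_and_run(in_K, x, steps):
    """At x, pick a uniformly random chord through x, then a uniform point on it."""
    n = len(x)
    for _ in range(steps):
        u = rng.standard_normal(n); u /= np.linalg.norm(u)
        t = rng.uniform(-chord_end(in_K, x, -u), chord_end(in_K, x, u))
        x = x + t * u
    return x

n = 10
in_cube = lambda y: bool(np.all(np.abs(y) <= 1))
print(hit_and_run(in_cube, np.zeros(n), steps=500))
```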
Markov chains
State space K, with a set of measurable subsets that form a σ-algebra, i.e., closed under finite unions and intersections; a next-step distribution P_u associated with each point u in the state space; and a starting point.
108
Convergence
Stationary distribution Q; the ergodic flow of a subset A is defined as Φ(A) = ∫_A P_u(K \ A) dQ(u).
For a stationary distribution Q, we have Φ(A) = Φ(K \ A).
109
Random walks in K
For both walks, the distribution of the current point tends to the uniform distribution over K. The uniform distribution is stationary; in fact, both walks are reversible with respect to it.
Exercise 1. Show that the uniform distribution is stationary for hit-and-run. Question: How many steps are needed?
110
Rate of convergence?
Ergodic flow: Φ(A) = ∫_A P_u(K \ A) dQ(u).
Conductance: φ(A) = Φ(A) / min{Q(A), Q(K \ A)};
φ = inf_A φ(A).
111
Conductance
The mixing rate cannot be faster than 1/φ, since it takes this many steps to even escape from some subsets. Does φ give an upper bound? Yes, for discrete Markov chains:
Thm [Jerrum-Sinclair]. φ²/2 ≤ 1 - λ₂ ≤ 2φ,
where λ₂ is the second eigenvalue of the transition matrix.
Thus, mixing rate = O(1/φ²).
112
Rate of convergence
High conductance => rapid mixing Proof does not go through eigenvalue gap
113
How to bound conductance?
The conductance of the ball walk is not bounded! The local conductance ℓ(u) = vol(K ∩ B(u, δ)) / vol(B(u, δ)) can be arbitrarily small. What can we do? Modify K slightly, or start with a nearly random point in K.
114
Smoothing a convex body
Each point of the original body has a small ball around it. What about new points? No worse than the local conductance of boundary points of a small ball. Choosing the step radius appropriately will ensure that every point has local conductance at least a fixed constant.
115
Conductance
Consider an arbitrary measurable subset S.
We need to show that the escape probability from S is large.
116
Conductance
Need: points that do not cross over are far from each other; if two subsets are far apart, then the rest of the set is large.
117
One-step distributions
If ‖P_u - P_v‖ is large, then the balls around u and v have small intersection, so u and v must be geometrically far apart.
118
Probabilistic distance vs. geometric distance.
Lemma. For the ball walk with δ-steps: if ‖u - v‖ ≤ c·δ/√n, then ‖P_u - P_v‖_TV ≤ 1 - c′, for absolute constants c, c′ > 0.
119
Coupling 1-step distributions
If u and v are close (‖u - v‖ ≤ c·δ/√n), the one-step distributions P_u and P_v overlap in a constant fraction of their measure, so ‖P_u - P_v‖_TV is bounded away from 1.
120
Isoperimetry
Thm [Lovász-Simonovits]. Let a convex body K with diameter D be partitioned into S₁, S₂, S₃ with d(S₁, S₂) = d. Then vol(S₃) ≥ (2d/D)·min{vol(S₁), vol(S₂)}. Extends to logconcave densities.
121
Conductance
Thm. The conductance of the ball walk with step size δ is at least c·δ/(√n·D).
We can use δ ~ 1/√n (after smoothing).
So φ = Ω(1/(nD)), and the mixing rate is O(1/φ²) = O(n²D²).
122
Conductance
Thm. The conductance of the ball walk with step size δ is at least c·δ/(√n·D).
Pf. Consider a partition K = S₁ ∪ S₂. Let S₁′ = {x ∈ S₁ : P_x(S₂) < 1/4} and S₂′ = {x ∈ S₂ : P_x(S₁) < 1/4} be the points unlikely to cross over. If vol(S₁′) ≤ vol(S₁)/2, then the ergodic flow out of S₁ is already at least vol(S₁)/8, and we are done; so assume not.
123
Conductance
Thm. The conductance of the ball walk with step size δ is at least c·δ/(√n·D).
Pf (contd.). For any u ∈ S₁′ and v ∈ S₂′, ‖P_u - P_v‖_TV ≥ 1 - P_u(S₂) - P_v(S₁) > 1/2, so by the coupling lemma ‖u - v‖ ≥ c·δ/√n; i.e., d(S₁′, S₂′) ≥ c·δ/√n.
124
Conductance
Thm. The conductance of the ball walk with step size δ is at least c·δ/(√n·D).
Pf (contd.). By the isoperimetric inequality, vol(K \ S₁′ \ S₂′) ≥ (2/D)·d(S₁′, S₂′)·min{vol(S₁′), vol(S₂′)}; every such point crosses over with probability at least 1/4, and combining gives conductance Ω(δ/(√n·D)).
125
KLS hyperplane conjecture
A: covariance matrix of the stationary distribution. Conjecture [KLS]. The isoperimetric coefficient of any logconcave density is, up to an absolute constant, achieved by a halfspace cut; equivalently, it is Ω(1/√‖A‖).
126
Thin shell conjecture
Theorem [Bobkov]. The isoperimetric coefficient of a logconcave density is bounded below in terms of the concentration of ‖x‖ (the thin-shell width).
Conj. (Thin shell). For an isotropic logconcave random point x, Var(‖x‖) = O(1); alternatively, ‖x‖ concentrates in a shell of constant width around √n. Current best bound [Guedon-E. Milman]: shell width O(n^{1/3}).
127
KLS-Slicing-Thin-shell
(Table: known vs. conjectured bounds for the thin-shell, slicing and KLS constants.)
Moreover, KLS implies the others [Ball], and thin-shell implies slicing [Eldan-Klartag 10].
128
Convergence
Thm. [LS93, KLS97] If S is convex, then the ball walk with an M-warm start reaches an (independent) nearly random point in poly(n, D, M) steps. Strictly speaking, this is not rapid mixing! How to get the first random point? Better dependence on diameter D?
129
Is rapid mixing possible?
Ball walk can have bad starts, but Hit-and-run escapes from corners
Isoperimetry based on the minimum distance between subsets is too coarse.
130
Average distance isoperimetry
How to average distance?
Theorem [LV04; Dieker-V. 12]. The isoperimetric inequality holds with the minimum distance d(S₁, S₂) replaced by an average distance between points of the two subsets.
131
Average distance Isoperimetry
132
Hit-and-run
Thm [LV04]. Hit-and-run mixes in polynomial time from any starting point inside a convex body. Conductance = Ω(1/(nD)).
This gives an O*(n²D²)-step (O*(n³) for a well-rounded body) sampling algorithm.
133
Multi-point random walks
Maintain m points. For each point X:
pick a random combination of the m points, and use this to update X.
Stationary distribution: m uniform random points!
134
Sampling
Q1. Is starting at a nice point faster? E.g., does ball walk mix rapidly starting at a single point, e.g., the centroid? Q2. How to check convergence to stationarity on the fly? Does it suffice to check that the measures of all halfspaces have converged?
(Note: a poly(n)-size sample can estimate all halfspace measures approximately.)
135
Sampling: current status
Can be sampled efficiently:
Convex bodies
Logconcave distributions
(1/(n-1))-harmonic-concave distributions
Near-logconcave distributions
Star-shaped bodies ??
----------------------
Cannot be sampled efficiently:
Quasiconcave distributions
136
High-dimensional sampling algorithms
Sampling manifolds
Random reflections
Deterministic sampling?
Other applications
137