Metric random matchings with applications

(A preliminary version of this paper appears in https://arxiv.org/abs/1702.03106.)

Ching-Lueh Chang
Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, Taiwan. Email: clchang@saturn.yzu.edu.tw
Supported in part by the Ministry of Science and Technology of Taiwan under grant 105-2221-E-155-047-.
Abstract

Let $(\{1,2,\ldots,n\},d)$ be a metric space. We analyze the expected value and the variance of $\sum_{i=1}^{\lfloor n/2\rfloor}\,d(\boldsymbol{\pi}(2i-1),\boldsymbol{\pi}(2i))$ for a uniformly random permutation $\boldsymbol{\pi}$ of $\{1,2,\ldots,n\}$, leading to the following results:

  • Consider the problem of finding a point in $\{1,2,\ldots,n\}$ with the minimum sum of distances to all points. We show that this problem has a randomized algorithm that (1) always outputs a $(2+\epsilon)$-approximate solution in expected $O(n/\epsilon^{2})$ time and that (2) inherits Indyk's [9, 10] algorithm to output a $(1+\epsilon)$-approximate solution in $O(n/\epsilon^{2})$ time with probability $\Omega(1)$, where $\epsilon\in(0,1)$.

  • The average distance in $(\{1,2,\ldots,n\},d)$ can be approximated in $O(n/\epsilon)$ time to within a multiplicative factor in $[\,1/2-\epsilon,1\,]$ with probability $1/2+\Omega(1)$, where $\epsilon>0$.

  • Assume $d$ to be a graph metric. Then the average distance in $(\{1,2,\ldots,n\},d)$ can be approximated in $O(n)$ time to within a multiplicative factor in $[\,1-\epsilon,1+\epsilon\,]$ with probability $1/2+\Omega(1)$, where $\epsilon=\omega(1/n^{1/4})$.

1 Introduction

A metric space is a nonempty set $M$ endowed with a metric, i.e., a function $d\colon M\times M\to[\,0,\infty\,)$ such that

  • $d(x,y)=0$ if and only if $x=y$ (identity of indiscernibles),

  • $d(x,y)=d(y,x)$ (symmetry), and

  • $d(x,y)+d(y,z)\geq d(x,z)$ (triangle inequality)

for all $x$, $y$, $z\in M$ [14].

For all $n\in\mathbb{Z}^{+}$, define $[n]\equiv\{1,2,\ldots,n\}$. Given $n\in\mathbb{Z}^{+}$ and oracle access to a metric $d\colon[n]\times[n]\to[\,0,\infty\,)$, metric $1$-median asks for $\mathop{\mathrm{argmin}}_{y\in[n]}\,\sum_{x\in[n]}\,d(y,x)$, breaking ties arbitrarily. It generalizes the classical median selection on the real line and has a brute-force $\Theta(n^{2})$-time algorithm. More generally, metric $k$-median asks for $c_{1}$, $c_{2}$, $\ldots$, $c_{k}\in[n]$ minimizing $\sum_{x\in[n]}\,\min_{i=1}^{k}\,d(x,c_{i})$. Because $d(\cdot,\cdot)$ defines $\binom{n}{2}=\Theta(n^{2})$ nonzero distances, only $o(n^{2})$-time algorithms are said to run in sublinear time [9]. For all $\alpha\geq 1$, an $\alpha$-approximate $1$-median is a point $p\in[n]$ satisfying

$$\sum_{x\in[n]}\,d(p,x)\leq\alpha\cdot\min_{y\in[n]}\,\sum_{x\in[n]}\,d(y,x).$$

For all $\epsilon>0$, metric $1$-median has a Monte Carlo $(1+\epsilon)$-approximation $O(n/\epsilon^{2})$-time algorithm [9, 10]. Guha et al. [8] show that metric $k$-median has a Monte Carlo, $O(\exp(O(1/\epsilon)))$-approximation, $O(nk\log n)$-time, $O(n^{\epsilon})$-space and one-pass algorithm for all small $k$ as well as a deterministic, $O(\exp(O(1/\epsilon)))$-approximation, $O(n^{1+\epsilon})$-time, $O(n^{\epsilon})$-space and one-pass algorithm. Given $n$ points in $\mathbb{R}^{D}$ with $D\geq 1$, the Monte Carlo algorithms of Kumar et al. [11] find a $(1+\epsilon)$-approximate $1$-median in $O(D\cdot\exp(1/\epsilon^{O(1)}))$ time and a $(1+\epsilon)$-approximate solution to metric $k$-median in $O(Dn\cdot\exp((k/\epsilon)^{O(1)}))$ time. All randomized $O(1)$-approximation algorithms for metric $k$-median take $\Omega(nk)$ time [12, 8]. Chang [3] shows that metric $1$-median has a deterministic, $(2h)$-approximation, $O(hn^{1+1/h})$-time and nonadaptive algorithm for all constants $h\in\mathbb{Z}^{+}\setminus\{1\}$, generalizing the results of Chang [2] and Wu [16]. On the other hand, he disproves the existence of deterministic $(2h-\epsilon)$-approximation $O(n^{1+1/(h-1)}/h)$-time algorithms for all constants $h\in\mathbb{Z}^{+}\setminus\{1\}$ and $\epsilon>0$ [4, 5].

In social network analysis, the closeness centrality of a point $v$ is the reciprocal of the average distance from $v$ to all points [15]. So metric $1$-median asks for a point with the maximum closeness centrality. Given oracle access to a graph metric, the Monte-Carlo algorithms of Goldreich and Ron [7] and Eppstein and Wang [6] estimate the closeness centrality of a given point and those of all points, respectively.

All known sublinear-time algorithms for metric $1$-median are either deterministic or Monte Carlo, the latter having a positive probability of failure. For example, Indyk's Monte Carlo $(1+\epsilon)$-approximation algorithm outputs with a positive probability a solution without approximation guarantees. In contrast, we show that metric $1$-median has a randomized algorithm that always outputs a $(2+\epsilon)$-approximate solution in expected $O(n/\epsilon^{2})$ time for all $\epsilon\in(0,1)$. So, excluding the known deterministic algorithms (which are Las Vegas only in the degenerate sense), this paper gives the first Las Vegas approximation algorithm for metric $1$-median with an expected sublinear running time. Note that deterministic sublinear-time algorithms for metric $1$-median can be $4$-approximate but not $(4-\epsilon)$-approximate for any constant $\epsilon>0$ [2, 5]. So our approximation ratio of $2+\epsilon$ beats that of any deterministic sublinear-time algorithm. Inheriting Indyk's algorithm, our algorithm outputs a $(1+\epsilon)$-approximate $1$-median in $O(n/\epsilon^{2})$ time with probability $\Omega(1)$ for all $\epsilon\in(0,1)$.

Indyk [9, 10] gives a Monte-Carlo $O(n/\epsilon^{3.5})$-time algorithm that approximates the average distance in any metric space $([n],d)$ to within a multiplicative factor in $[\,1-\epsilon,1+\epsilon\,]$, for all $\epsilon>0$. Barhum, Goldreich and Shraibman [1] improve Indyk's time complexity of $O(n/\epsilon^{3.5})$ to $O(n/\epsilon^{2})$. This paper gives a Monte-Carlo $O(n/\epsilon)$-time algorithm that approximates the average distance in $([n],d)$ to within a multiplicative factor in $[\,1/2-\epsilon,1\,]$, for all $\epsilon>0$. For all $\epsilon=\omega(1/n^{1/4})$, we present a Monte-Carlo $O(n)$-time algorithm approximating the average distance of any graph metric to within a multiplicative factor in $[\,1-\epsilon,1+\epsilon\,]$. But for general metrics, we do not know whether the $O(n/\epsilon^{2})$ running time of Barhum, Goldreich and Shraibman can be improved to $O(n/\epsilon^{2-\Omega(1)})$.

2 Definitions and preliminaries

For a metric space $([n],d)$,

$$\bar{r}\equiv\frac{1}{n^{2}}\cdot\sum_{x,y\in[n]}\,d(x,y), \tag{1}$$

$$p^{*}\equiv\mathop{\mathrm{argmin}}_{p\in[n]}\,\sum_{x\in[n]}\,d(p,x), \tag{2}$$

breaking ties arbitrarily in equation (2). So $\bar{r}$ is the average distance in $([n],d)$, and $p^{*}$ is a $1$-median.
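As a concrete reading of definitions (1) and (2), the following minimal Python sketch computes $\bar{r}$ and a $1$-median by brute force, using $\Theta(n^{2})$ distance queries. The function names, the `Metric` type alias and the 0-based point indexing are assumptions made for illustration; the paper indexes points by $[n]$ and only assumes oracle access to $d$.

```python
from typing import Callable, Tuple

Metric = Callable[[int, int], float]  # hypothetical oracle type: d(x, y)


def average_distance(d: Metric, n: int) -> float:
    """r-bar of equation (1): mean of d(x, y) over all n^2 ordered pairs."""
    return sum(d(x, y) for x in range(n) for y in range(n)) / (n * n)


def one_median(d: Metric, n: int) -> Tuple[int, float]:
    """Brute-force p* of equation (2), together with its sum of distances."""
    sums = [sum(d(p, x) for x in range(n)) for p in range(n)]
    best = min(range(n), key=sums.__getitem__)  # ties go to the smallest index
    return best, sums[best]


def is_alpha_approximate(d: Metric, n: int, p: int, alpha: float) -> bool:
    """Check whether point p is an alpha-approximate 1-median."""
    _, opt = one_median(d, n)
    return sum(d(p, x) for x in range(n)) <= alpha * opt


# Example: five points on the real line with d(x, y) = |a_x - a_y|.
coords = [0.0, 1.0, 2.0, 3.0, 10.0]
line_metric = lambda x, y: abs(coords[x] - coords[y])
```

On this example the middle point (index 2) is the $1$-median, which matches the remark that metric $1$-median generalizes median selection on the real line.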

An algorithm $A$ with oracle access to $d\colon[n]\times[n]\to[\,0,\infty\,)$ is denoted by $A^{d}$ and may query $d$ on any $(x,y)\in[n]\times[n]$ for $d(x,y)$. In this paper, all Landau symbols (such as $O(\cdot)$, $o(\cdot)$, $\Omega(\cdot)$ and $\omega(\cdot)$) are w.r.t. $n$. The following result is due to Indyk.

Fact 1 ([9, 10]).

For all $\epsilon>0$, metric $1$-median has a Monte Carlo $(1+\epsilon)$-approximation $O(n/\epsilon^{2})$-time algorithm with a failure probability of at most $1/e$.

Henceforth, denote Indyk's algorithm in Fact 1 by Indyk median. It is given $n\in\mathbb{Z}^{+}$, $\epsilon>0$ and oracle access to a metric $d\colon[n]\times[n]\to[\,0,\infty\,)$. The following fact on estimating the average distance is due to Barhum, Goldreich and Shraibman.

Fact 2 ([1]).

Given $n\in\mathbb{Z}^{+}$, $\epsilon>0$ and oracle access to a metric $d\colon[n]\times[n]\to[\,0,\infty\,)$, a real number in $\left[\,(1-\epsilon)\bar{r},(1+\epsilon)\bar{r}\,\right]$ can be found in $O(n/\epsilon^{2})$ time with probability at least $1/2+\Omega(1)$.

Chebyshev's inequality ([13]).

Let $X$ be a random variable with a finite expected value and a finite nonzero variance. Then for all $k\geq 1$,

$$\Pr\left[\,\left|\,X-\mathop{\mathrm{E}}[X]\,\right|\geq k\sqrt{\mathop{\mathrm{var}}(X)}\,\right]\leq\frac{1}{k^{2}}.$$

3 Las Vegas approximation for metric $1$-median selection

This section presents a randomized algorithm that always outputs a $(2+\epsilon)$-approximate $1$-median, where $\epsilon\in(0,1)$. Clearly,

$$\sum_{x\in[n]}\,d(p^{*},x)\stackrel{(2)}{=}\min_{p\in[n]}\,\sum_{x\in[n]}\,d(p,x)\leq\frac{1}{n}\cdot\sum_{p\in[n]}\,\sum_{x\in[n]}\,d(p,x)\stackrel{(1)}{=}n\bar{r}. \tag{3}$$

For each permutation $\pi\colon[n]\to[n]$,

$$\sum_{i=1}^{\lfloor n/2\rfloor}\,d(\pi(2i-1),\pi(2i))\leq\sum_{i=1}^{\lfloor n/2\rfloor}\,\left(d(p^{*},\pi(2i-1))+d(p^{*},\pi(2i))\right)\leq\sum_{x\in[n]}\,d(p^{*},x), \tag{4}$$

where the first and the second inequalities follow from the triangle inequality and the injectivity of $\pi$, respectively.
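Inequalities (3) and (4) can be spot-checked numerically. The sketch below assumes random points in the unit square under the Euclidean metric and 0-based indices; `matching_sum` and `check_inequalities` are hypothetical helper names, and the small slack only absorbs floating-point rounding.

```python
import math
import random
from typing import Callable, Sequence

Metric = Callable[[int, int], float]  # hypothetical oracle type: d(x, y)


def matching_sum(d: Metric, perm: Sequence[int]) -> float:
    """Left-hand side of (4): sum of d over consecutive pairs of perm."""
    return sum(d(perm[k], perm[k + 1]) for k in range(0, len(perm) - 1, 2))


def check_inequalities(n: int, seed: int = 0) -> bool:
    """Check matching sum <= sum of distances from p* <= n * r_bar."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    d = lambda x, y: math.dist(pts[x], pts[y])
    opt = min(sum(d(p, x) for x in range(n)) for p in range(n))  # cost of p*
    r_bar = sum(d(x, y) for x in range(n) for y in range(n)) / n**2
    perm = list(range(n))
    rng.shuffle(perm)  # uniformly random permutation
    slack = 1e-9       # tolerance for floating-point rounding only
    return matching_sum(d, perm) <= opt + slack and opt <= n * r_bar + slack
```

A call such as `check_inequalities(25, seed=7)` should return `True` for every seed, since both inequalities hold for all permutations rather than only with high probability.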

Lemma 3.

When line 5 of Las Vegas median in Fig. 1 is run, $z$ is a $(2+\epsilon)$-approximate $1$-median.

Proof.

The condition in line 4 of Las Vegas median implies

$$\sum_{x\in[n]}\,d(z,x)\stackrel{(4)}{\leq}(2+\epsilon)\sum_{x\in[n]}\,d(p^{*},x)\stackrel{(2)}{=}(2+\epsilon)\min_{p\in[n]}\,\sum_{x\in[n]}\,d(p,x).$$

So when line 5 is run, it returns a $(2+\epsilon)$-approximate $1$-median. ∎

1:  while true do
2:     $z\leftarrow\text{\sf Indyk median}^{d}(n,\epsilon/8)$;
3:     Pick independent and uniformly random permutations $\boldsymbol{\pi}_{1}$, $\boldsymbol{\pi}_{2}$, $\ldots$, $\boldsymbol{\pi}_{80\lceil 1/\epsilon\rceil}\colon[n]\to[n]$;
4:     if there exists $j\in[80\lceil 1/\epsilon\rceil]$ satisfying $\sum_{x\in[n]}\,d(z,x)\leq(2+\epsilon)\sum_{i=1}^{\lfloor n/2\rfloor}\,d(\boldsymbol{\pi}_{j}(2i-1),\boldsymbol{\pi}_{j}(2i))$ then
5:        return $z$;
6:     end if
7:  end while
Figure 1: Algorithm Las Vegas median with oracle access to a metric $d\colon[n]\times[n]\to[\,0,\infty\,)$ and with inputs $n\in\mathbb{Z}^{+}$ and $\epsilon\in(0,1)$
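A minimal Python sketch of the loop in Fig. 1 follows. It is an illustration under stated assumptions, not the paper's implementation: `candidate_median` is a hypothetical stand-in for Indyk median (it scores a few random candidates against a random sample and carries none of the guarantees of Fact 1), points are indexed from 0, and the brute-force fallback is triggered by an assumed iteration cap `max_iters`. Whatever produces $z$, the test in line 4 alone certifies the $(2+\epsilon)$ ratio by Lemma 3.

```python
import math
import random
from typing import Callable

Metric = Callable[[int, int], float]  # hypothetical oracle type: d(x, y)


def candidate_median(d: Metric, n: int, eps: float, rng: random.Random) -> int:
    """Hypothetical stand-in for Indyk median; NOT Indyk's algorithm."""
    k = min(n, math.ceil(1 / eps))
    sample = rng.sample(range(n), k)      # points used to score candidates
    candidates = rng.sample(range(n), k)  # points considered as outputs
    return min(candidates, key=lambda p: sum(d(p, x) for x in sample))


def las_vegas_median(d: Metric, n: int, eps: float, seed: int = 0,
                     max_iters: int = 1000) -> int:
    """Return a point passing the line-4 test of Fig. 1 (or an exact median)."""
    rng = random.Random(seed)
    trials = 80 * math.ceil(1 / eps)
    perm = list(range(n))
    for _iteration in range(max_iters):
        z = candidate_median(d, n, eps / 8, rng)             # line 2
        cost_z = sum(d(z, x) for x in range(n))
        for _trial in range(trials):                         # lines 3-4
            rng.shuffle(perm)                                # fresh random permutation
            matching = sum(d(perm[k], perm[k + 1]) for k in range(0, n - 1, 2))
            if cost_z <= (2 + eps) * matching:
                return z                                     # line 5
    # Assumed fallback: exact 1-median by brute force after the iteration cap.
    return min(range(n), key=lambda p: sum(d(p, x) for x in range(n)))
```

The sketch stops checking permutations as soon as one passes the test, which does not change the output condition of line 4.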

Inequalities (3)–(4) yield the following.

Lemma 4.

For each permutation $\pi\colon[n]\to[n]$,

$$\sum_{i=1}^{\lfloor n/2\rfloor}\,d(\pi(2i-1),\pi(2i))\leq n\bar{r}.$$
Lemma 5.

For a uniformly random permutation $\boldsymbol{\pi}\colon[n]\to[n]$,

$$\mathop{\mathrm{E}}_{\boldsymbol{\pi}}\left[\,\sum_{i=1}^{\lfloor n/2\rfloor}\,d(\boldsymbol{\pi}(2i-1),\boldsymbol{\pi}(2i))\,\right]=\left\lfloor\frac{n}{2}\right\rfloor\cdot\frac{n\bar{r}}{n-1}.$$

Proof.

For each $i\in[\lfloor n/2\rfloor]$, $\{\boldsymbol{\pi}(2i-1),\boldsymbol{\pi}(2i)\}$ is a uniformly random size-$2$ subset of $[n]$, implying

\begin{align*}
\mathop{\mathrm{E}}_{\boldsymbol{\pi}}\left[\,d(\boldsymbol{\pi}(2i-1),\boldsymbol{\pi}(2i))\,\right]
&=\frac{1}{n\cdot(n-1)}\cdot\sum_{\text{distinct $x$, $y\in[n]$}}\,d(x,y)\\
&=\frac{1}{n\cdot(n-1)}\cdot\sum_{x,y\in[n]}\,d(x,y)\\
&\stackrel{(1)}{=}\frac{n\bar{r}}{n-1},
\end{align*}

where the second equality follows from the identity of indiscernibles. Finally, use the linearity of expectation. ∎
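For small $n$, the identity of Lemma 5 can be confirmed exactly by averaging over all $n!$ permutations. The sketch below uses exact rational arithmetic; the function names, the 0-based indexing and the choice of a path-graph metric are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def exact_expected_matching_sum(dist, n):
    """Average the matching sum over all n! permutations (small n only)."""
    total = Fraction(0)
    for perm in permutations(range(n)):
        total += sum(dist[perm[k]][perm[k + 1]] for k in range(0, n - 1, 2))
    return total / factorial(n)


def lemma5_value(dist, n):
    """floor(n/2) * n * r_bar / (n - 1), with r_bar as in equation (1)."""
    r_bar = Fraction(sum(map(sum, dist)), n * n)
    return (n // 2) * n * r_bar / (n - 1)


# Example metric: shortest-path distances on the path graph 0-1-2-3-4.
path5 = [[abs(x - y) for y in range(5)] for x in range(5)]
```

For `path5`, the distances sum to $40$, so $\bar{r}=8/5$ and both functions return $2\cdot 5\cdot(8/5)/4=4$.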

Lemma 6.

For all $\epsilon\in(0,1)$ and in each iteration of the while loop of Las Vegas median,

$$\Pr\left[\,\exists j\in\left[80\cdot\left\lceil\frac{1}{\epsilon}\right\rceil\right],\,\sum_{i=1}^{\lfloor n/2\rfloor}\,d(\boldsymbol{\pi}_{j}(2i-1),\boldsymbol{\pi}_{j}(2i))\geq\left(\frac{1}{2}-\frac{\epsilon}{8}\right)n\bar{r}\,\right]\geq 0.9, \tag{5}$$

where the probability is taken over $\boldsymbol{\pi}_{1}$, $\boldsymbol{\pi}_{2}$, $\ldots$, $\boldsymbol{\pi}_{80\lceil 1/\epsilon\rceil}$ in line 3 of Las Vegas median.

Proof.

Let $\boldsymbol{\pi}\colon[n]\to[n]$ be a uniformly random permutation and

$$\alpha=\Pr_{\boldsymbol{\pi}}\left[\,\sum_{i=1}^{\lfloor n/2\rfloor}\,d(\boldsymbol{\pi}(2i-1),\boldsymbol{\pi}(2i))\geq\left(\frac{1}{2}-\frac{\epsilon}{8}\right)n\bar{r}\,\right]. \tag{6}$$

So by Lemma 4,

$$\mathop{\mathrm{E}}_{\boldsymbol{\pi}}\left[\,\sum_{i=1}^{\lfloor n/2\rfloor}\,d(\boldsymbol{\pi}(2i-1),\boldsymbol{\pi}(2i))\,\right]\leq\alpha n\bar{r}+(1-\alpha)\left(\frac{1}{2}-\frac{\epsilon}{8}\right)n\bar{r}.$$

This and Lemma 5 imply $\alpha\geq\epsilon/8$. So the left-hand side of inequality (5) is at least $1-(1-\epsilon/8)^{80\lceil 1/\epsilon\rceil}\geq 0.9$. ∎
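For completeness, the two steps that the proof leaves implicit can be spelled out as follows, assuming $n\geq 2$ and $\bar{r}>0$ (otherwise all distances vanish and the claim is immediate). The only ingredients beyond Lemmas 4 and 5 are $\lfloor n/2\rfloor\geq(n-1)/2$, the independence of the permutations in line 3, and the standard bound $1-x\leq e^{-x}$.

```latex
\begin{align*}
\frac{n\bar{r}}{2}
  &\le \left\lfloor\frac{n}{2}\right\rfloor\cdot\frac{n\bar{r}}{n-1}
   \le \alpha n\bar{r}+(1-\alpha)\left(\frac{1}{2}-\frac{\epsilon}{8}\right)n\bar{r}\\
\Longrightarrow\quad
\alpha\left(\frac{1}{2}+\frac{\epsilon}{8}\right) &\ge \frac{\epsilon}{8}
\quad\Longrightarrow\quad
\alpha \ge \frac{\epsilon}{4+\epsilon} \ge \frac{\epsilon}{8},\\
1-(1-\alpha)^{80\lceil 1/\epsilon\rceil}
  &\ge 1-\left(1-\frac{\epsilon}{8}\right)^{80\lceil 1/\epsilon\rceil}
   \ge 1-\exp\left(-\frac{\epsilon}{8}\cdot\frac{80}{\epsilon}\right)
   = 1-e^{-10} \ge 0.9 .
\end{align*}
```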

Lemma 7.

For all $\epsilon\in(0,1)$ and in each iteration of the while loop of Las Vegas median,

\begin{align*}
&\Pr\Bigg[\,\left(\sum_{x\in[n]}\,d(z,x)\leq\left(1+\frac{\epsilon}{8}\right)\min_{p\in[n]}\,\sum_{x\in[n]}\,d(p,x)\right) \tag{7}\\
&\qquad\land\left(\exists j\in\left[80\cdot\left\lceil\frac{1}{\epsilon}\right\rceil\right],\,\sum_{i=1}^{\lfloor n/2\rfloor}\,d(\boldsymbol{\pi}_{j}(2i-1),\boldsymbol{\pi}_{j}(2i))\geq\left(\frac{1}{2}-\frac{\epsilon}{8}\right)n\bar{r}\right)\\
&\qquad\land\left(\exists j\in\left[80\cdot\left\lceil\frac{1}{\epsilon}\right\rceil\right],\,\sum_{x\in[n]}\,d(z,x)\leq(2+\epsilon)\sum_{i=1}^{\lfloor n/2\rfloor}\,d(\boldsymbol{\pi}_{j}(2i-1),\boldsymbol{\pi}_{j}(2i))\right)\,\Bigg]\\
&=\frac{1}{2}+\Omega(1),
\end{align*}

where the probability is taken over $\boldsymbol{\pi}_{1}$, $\boldsymbol{\pi}_{2}$, $\ldots$, $\boldsymbol{\pi}_{80\lceil 1/\epsilon\rceil}$ and the random coin tosses of Indyk median.

Proof.

By Fact 1 and line 2 of Las Vegas median, the first condition within $\Pr[\cdot]$ in equation (7) holds with probability at least $1-1/e$ over the random coin tosses of Indyk median. By Lemma 6, the second condition holds with probability at least $0.9$ over $\boldsymbol{\pi}_{1}$, $\boldsymbol{\pi}_{2}$, $\ldots$, $\boldsymbol{\pi}_{80\lceil 1/\epsilon\rceil}$. In summary, the first two conditions hold simultaneously with probability at least $(1-1/e)\cdot 0.9=1/2+\Omega(1)$ (note that the random coin tosses of Indyk median are independent of $\boldsymbol{\pi}_{1}$, $\boldsymbol{\pi}_{2}$, $\ldots$, $\boldsymbol{\pi}_{80\lceil 1/\epsilon\rceil}$). Finally, the first two conditions together imply the third by inequality (3) and the easy fact that

$$\left(1+\frac{\epsilon}{8}\right)\leq(2+\epsilon)\left(\frac{1}{2}-\frac{\epsilon}{8}\right).$$

∎
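The easy fact can be verified by expanding its right-hand side, and the chain below records how it combines with inequality (3) and the first two conditions. This is a worked restatement of the argument above, valid for $\epsilon\in(0,1)$, where $j$ denotes an index witnessing the second condition.

```latex
\begin{align*}
(2+\epsilon)\left(\frac{1}{2}-\frac{\epsilon}{8}\right)
  &= 1+\frac{\epsilon}{4}-\frac{\epsilon^{2}}{8}
   \ge 1+\frac{\epsilon}{8}
   \quad\text{since } \epsilon^{2}\le\epsilon,\\
\sum_{x\in[n]} d(z,x)
  &\le \left(1+\frac{\epsilon}{8}\right)\min_{p\in[n]}\sum_{x\in[n]} d(p,x)
   \le \left(1+\frac{\epsilon}{8}\right) n\bar{r}\\
  &\le (2+\epsilon)\left(\frac{1}{2}-\frac{\epsilon}{8}\right) n\bar{r}
   \le (2+\epsilon)\sum_{i=1}^{\lfloor n/2\rfloor}
       d\left(\boldsymbol{\pi}_{j}(2i-1),\boldsymbol{\pi}_{j}(2i)\right).
\end{align*}
```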

Theorem 8.

For all $\epsilon\in(0,1)$, metric $1$-median has a randomized algorithm that (1) always outputs a $(2+\epsilon)$-approximate solution in expected $O(n/\epsilon^{2})$ time and (2) outputs a $(1+\epsilon)$-approximate solution in $O(n/\epsilon^{2})$ time with probability $\Omega(1)$.

Proof.

By Lemma 7, each execution of lines 4–5 of Las Vegas median returns with probability $1/2+\Omega(1)$. So the expected number of iterations is $O(1)$. By Fact 1, line 2 takes $O(n/\epsilon^{2})$ time. Line 3 takes $80\lceil 1/\epsilon\rceil\cdot O(n)$ time by the Knuth shuffle. Clearly, lines 4–5 take $O(n/\epsilon)$ time. In summary, the expected running time of Las Vegas median is $O(1)\cdot O(n/\epsilon^{2})=O(n/\epsilon^{2})$. To prevent Las Vegas median from running forever, find a $1$-median by brute force (which obviously takes $O(n^{2})$ time) after $n^{2}$ steps of computation. By Lemma 3, Las Vegas median is $(2+\epsilon)$-approximate.

By Lemma 7, $z$ is $(1+\epsilon/8)$-approximate and is also returned in line 5 with probability $\Omega(1)$ in the first (in fact, any) iteration. Finally, the previous paragraph has shown each iteration to take $O(n/\epsilon^{2})$ time. ∎

By Fact 1, Indyk median satisfies condition (2) in Theorem 8. But it does not satisfy condition (1).

We now justify the optimality of the ratio of $2+\epsilon$ in Theorem 8. Let $A$ be a randomized algorithm that always outputs a $(2-\epsilon)$-approximate $1$-median. Furthermore, denote by $p\in[n]$ (resp., $Q\subseteq[n]\times[n]$) the output (resp., the set of queries as unordered pairs) of $A^{d_{1}}(n)$, where $d_{1}$ is the discrete metric (i.e., $d_{1}(x,y)=1$ and $d_{1}(x,x)=0$ for all distinct $x$, $y\in[n]$). Without loss of generality, assume $(p,y)\in Q$ for all $y\in[n]\setminus\{p\}$ by adding dummy queries. So the queries in $Q$ witness that

$$\sum_{y\in[n]\setminus\{p\}}\,d_{1}(p,y)=n-1. \tag{8}$$

Assume without loss of generality that $A$ never queries for the distance from a point to itself.

In the sequel, consider the case that $|Q|<\epsilon\cdot(n-1)^{2}/8$. By the averaging argument, there exists a point $\hat{p}\in[n]\setminus\{p\}$ involved in at most $2\cdot|Q|/(n-1)$ queries in $Q$ (note that each query involves two points). Because every function $f\colon[n]\times[n]\to[\,0,\infty\,)$ with

$$\left\{f(x,y)\mid(x,y\in[n])\land(x\neq y)\right\}\subseteq\left\{\frac{1}{2},1\right\}$$

satisfies the triangle inequality, $A$ cannot exclude the possibility that $d_{1}(\hat{p},y)=1/2$ for all $y\in[n]\setminus\{\hat{p}\}$ satisfying $(\hat{p},y)\notin Q$. In summary, $A$ cannot rule out the case that

$$\sum_{y\in[n]}\,d_{1}(\hat{p},y)\leq\frac{2\cdot|Q|}{n-1}\cdot 1+\left(n-1-\frac{2\cdot|Q|}{n-1}\right)\cdot\frac{1}{2}<\left(\frac{1}{2}+\frac{\epsilon}{8}\right)\cdot(n-1). \tag{9}$$

Equations (8)–(9) contradict the guarantee that $p$ is $(2-\epsilon)$-approximate. Consequently, the case that $|Q|<\epsilon\cdot(n-1)^{2}/8$ should never happen. The next theorem summarizes the above.

Theorem 9.

Metric $1$-median has no randomized algorithm that always outputs a $(2-\epsilon)$-approximate solution and that makes fewer than $\epsilon\cdot(n-1)^{2}/8$ queries with a positive probability given oracle access to the discrete metric, for any constant $\epsilon\in(0,1)$.
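The adversary argument behind Theorem 9 can be made concrete on a small instance. In the sketch below, which is an illustration under assumptions rather than part of the paper, queries are encoded as `frozenset({x, y})` pairs, points are indexed from 0, and `adversarial_metric` and `is_metric` are hypothetical helper names. Given the query set and the output $p$, the code builds a metric that agrees with the discrete metric on every queried pair yet gives the least-queried point $\hat{p}$ a much smaller sum of distances.

```python
from fractions import Fraction


def adversarial_metric(n, queried, p):
    """Return (p_hat, f): f agrees with the discrete metric on `queried`,
    and every unqueried distance from p_hat is set to 1/2."""
    degree = {v: 0 for v in range(n) if v != p}
    for pair in queried:
        for v in pair:
            if v in degree:
                degree[v] += 1
    p_hat = min(degree, key=degree.get)  # least-queried point other than p

    def f(x, y):
        if x == y:
            return Fraction(0)
        if p_hat in (x, y) and frozenset((x, y)) not in queried:
            return Fraction(1, 2)  # unqueried distance from p_hat
        return Fraction(1)

    return p_hat, f


def is_metric(f, n):
    """Check the triangle inequality on all triples (f is symmetric by construction)."""
    return all(f(x, y) + f(y, z) >= f(x, z)
               for x in range(n) for y in range(n) for z in range(n))
```

For example, with $n=30$, $\epsilon=1/2$ and only the $29$ queries from $p$ to every other point, $\hat{p}$ has sum of distances $1+28/2=15$ while $p$ has $29>(2-\epsilon)\cdot 15$, so $p$ is not $(2-\epsilon)$-approximate under the constructed metric.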

Lemmas 4 and 6 yield the following estimation of the average distance.

Theorem 10.

Given $n\in\mathbb{Z}^{+}$, $\epsilon>0$ and oracle access to a metric $d\colon[n]\times[n]\to[\,0,\infty\,)$, a real number in $[\,(1/2-\epsilon)\bar{r},\bar{r}\,]$ can be found in $O(n/\epsilon)$ time with probability $1/2+\Omega(1)$.

Proof.

By Lemmas 4 and 6,

$$\frac{1}{n}\cdot\max_{j\in[80\cdot\lceil 1/\epsilon\rceil]}\,\sum_{i=1}^{\lfloor n/2\rfloor}\,d(\boldsymbol{\pi}_{j}(2i-1),\boldsymbol{\pi}_{j}(2i))\in\left[\,\left(\frac{1}{2}-\frac{\epsilon}{8}\right)\bar{r},\bar{r}\,\right] \tag{10}$$

with probability $1/2+\Omega(1)$. The Knuth shuffle picks $\boldsymbol{\pi}_{1}$, $\boldsymbol{\pi}_{2}$, $\ldots$, $\boldsymbol{\pi}_{80\lceil 1/\epsilon\rceil}$ in $80\lceil 1/\epsilon\rceil\cdot O(n)$ time. Then the left-hand side of relation (10) can be calculated in $O(n/\epsilon)$ time. ∎
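The estimator on the left-hand side of relation (10) is short enough to sketch directly. The version below assumes 0-based point indices and a Python callable as the distance oracle; `estimate_average_distance` is a hypothetical name, and `random.Random.shuffle` plays the role of the Knuth shuffle.

```python
import math
import random
from typing import Callable

Metric = Callable[[int, int], float]  # hypothetical oracle type: d(x, y)


def estimate_average_distance(d: Metric, n: int, eps: float, seed: int = 0) -> float:
    """Largest matching sum among 80*ceil(1/eps) random permutations, divided by n."""
    rng = random.Random(seed)
    perm = list(range(n))
    best = 0.0
    for _ in range(80 * math.ceil(1 / eps)):
        rng.shuffle(perm)  # O(n)-time uniformly random permutation
        total = sum(d(perm[k], perm[k + 1]) for k in range(0, n - 1, 2))
        best = max(best, total)
    return best / n
```

By Lemma 4 the returned value never exceeds $\bar{r}$, whichever permutations are drawn; only the lower bound in relation (10) is probabilistic. For the discrete metric on an even number of points, every matching sum equals $n/2$, so the estimate is exactly $1/2$ while $\bar{r}=(n-1)/n$.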

Note that the estimation of the average distance in Theorem 10 has only one-sided error. The time complexity (resp., approximation ratio) in Theorem 10 is better (resp., worse) than that in Fact 2.

4 Estimating the average distance of a graph metric

Throughout this section, take any $\epsilon=\omega(1/n^{1/4})$ less than a small constant, e.g., $\epsilon=10^{-100}$. Define

$$\delta\equiv\frac{\epsilon^{2}}{10^{10}}, \tag{11}$$

$$r\equiv\frac{1}{n}\cdot\sum_{x\in[n]}\,d(p^{*},x), \tag{12}$$

where $p^{*}$ is as in equation (2). As $\epsilon=\omega(1/n^{1/4})$, $\delta=\omega(1/\sqrt{n})$ by equation (11).

Lemma 11.

$\bar{r}\leq 2r$.

Proof.

By equation (1) and the triangle inequality,

\begin{align*}
\bar{r}&\leq\frac{1}{n^{2}}\cdot\sum_{x,y\in[n]}\,\left(d(p^{*},x)+d(p^{*},y)\right) \tag{13}\\
&=\frac{1}{n^{2}}\cdot n\cdot\left(\sum_{x\in[n]}\,d(p^{*},x)+\sum_{y\in[n]}\,d(p^{*},y)\right)\\
&=\frac{2}{n}\cdot\sum_{x\in[n]}\,d(p^{*},x).
\end{align*}

Equations (12)–(13) complete the proof. ∎

1:  Pick a uniformly random permutation $\boldsymbol{\pi}\colon[n]\to[n]$;
2:  return $\sum_{i=1}^{\lfloor n/2\rfloor}\,d(\boldsymbol{\pi}(2i-1),\boldsymbol{\pi}(2i))\cdot 2/n$;
Figure 2: Algorithm average distance with oracle access to a metric $d\colon[n]\times[n]\to[\,0,\infty\,)$ and with inputs $n\in\mathbb{Z}^{+}$ and $\epsilon=\omega(1/n^{1/4})$.
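The two lines of Fig. 2 translate almost verbatim into Python. The sketch below is only illustrative: it assumes an unweighted connected graph given as an adjacency dictionary, and it answers each distance query with a breadth-first search (`bfs_distance`, a hypothetical stand-in for the oracle). The paper instead charges unit cost per oracle query, so this stand-in does not reproduce the $O(n)$ running time.

```python
import random
from collections import deque
from typing import Dict, List


def bfs_distance(adj: Dict[int, List[int]], s: int, t: int) -> int:
    """Shortest-path distance from s to t in an unweighted connected graph."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    raise ValueError("graph is not connected")


def average_distance_estimate(adj: Dict[int, List[int]], seed: int = 0) -> float:
    """One random matching, scaled by 2/n, as in Fig. 2."""
    n = len(adj)
    rng = random.Random(seed)
    perm = list(adj)
    rng.shuffle(perm)                                   # line 1 of Fig. 2
    total = sum(bfs_distance(adj, perm[k], perm[k + 1])
                for k in range(0, n - 1, 2))
    return total * 2 / n                                # line 2 of Fig. 2
```

On the complete graph with $10$ vertices every matched pair is at distance $1$, so the estimate is $5\cdot 2/10=1$ regardless of the permutation, against a true average distance of $\bar{r}=9/10$.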

As in line 1 of average distance in Fig. 2, let $\boldsymbol{\pi}\colon[n]\to[n]$ be a uniformly random permutation. Clearly,

\begin{align*}
&\mathop{\mathrm{E}}_{\boldsymbol{\pi}}\left[\,\left(\sum_{i=1}^{\lfloor n/2\rfloor}\,d(\boldsymbol{\pi}(2i-1),\boldsymbol{\pi}(2i))\right)^{2}\,\right]\\
&=\mathop{\mathrm{E}}_{\boldsymbol{\pi}}\left[\,\sum_{i=1}^{\lfloor n/2\rfloor}\,d(\boldsymbol{\pi}(2i-1),\boldsymbol{\pi}(2i))\cdot\sum_{j=1}^{\lfloor n/2\rfloor}\,d(\boldsymbol{\pi}(2j-1),\boldsymbol{\pi}(2j))\,\right]\\
&=\sum_{\text{distinct $i,j=1$}}^{\lfloor n/2\rfloor}\,\mathop{\mathrm{E}}_{\boldsymbol{\pi}}\left[\,d(\boldsymbol{\pi}(2i-1),\boldsymbol{\pi}(2i))\cdot d(\boldsymbol{\pi}(2j-1),\boldsymbol{\pi}(2j))\,\right]\\
&\quad+\sum_{i=1}^{\lfloor n/2\rfloor}\,\mathop{\mathrm{E}}_{\boldsymbol{\pi}}\left[\,d^{2}(\boldsymbol{\pi}(2i-1),\boldsymbol{\pi}(2i))\,\right], \tag{15}
\end{align*}

where the last equality follows from the linearity of expectation and the separation of pairs $(i,j)$ according to whether $i=j$. The next three lemmas analyze the variance of

$$\sum_{i=1}^{\lfloor n/2\rfloor}\,d(\boldsymbol{\pi}(2i-1),\boldsymbol{\pi}(2i)).$$
Lemma 12.

$$
\sum_{\text{distinct } i,j=1}^{\lfloor n/2\rfloor}\,\mathop{\mathrm{E}}_{\boldsymbol{\pi}}\left[\,d\left(\boldsymbol{\pi}(2i-1),\boldsymbol{\pi}(2i)\right)\cdot d\left(\boldsymbol{\pi}(2j-1),\boldsymbol{\pi}(2j)\right)\,\right]\leq\frac{1}{4}\cdot\left(1+O\left(\frac{1}{n}\right)\right)n^{2}\bar{r}^{2}.
$$
Proof.

Pick any distinct $i$, $j\in[\,\lfloor n/2\rfloor\,]$. Clearly,

$$
\left\{\boldsymbol{\pi}(2i-1),\boldsymbol{\pi}(2i),\boldsymbol{\pi}(2j-1),\boldsymbol{\pi}(2j)\right\}
$$

is a uniformly random size-$4$ subset of $[n]$. So

$$
\begin{aligned}
&\mathop{\mathrm{E}}_{\boldsymbol{\pi}}\left[\,d\left(\boldsymbol{\pi}(2i-1),\boldsymbol{\pi}(2i)\right)\cdot d\left(\boldsymbol{\pi}(2j-1),\boldsymbol{\pi}(2j)\right)\,\right] \\
&= \frac{1}{n\cdot(n-1)\cdot(n-2)\cdot(n-3)}\cdot\sum_{\text{distinct } u,\,v,\,x,\,y\in[n]}\,d(u,v)\cdot d(x,y).
\end{aligned}
$$

Clearly,

$$
\begin{aligned}
\sum_{\text{distinct } u,\,v,\,x,\,y\in[n]}\,d(u,v)\cdot d(x,y)
&\leq \sum_{u,v,x,y\in[n]}\,d(u,v)\cdot d(x,y) \\
&= \sum_{u,v\in[n]}\,d(u,v)\cdot\sum_{x,y\in[n]}\,d(x,y) \\
&= \left(\sum_{x,y\in[n]}\,d(x,y)\right)^{2}.
\end{aligned}
$$

In summary,

$$
\begin{aligned}
&\sum_{\text{distinct } i,j=1}^{\lfloor n/2\rfloor}\,\mathop{\mathrm{E}}_{\boldsymbol{\pi}}\left[\,d\left(\boldsymbol{\pi}(2i-1),\boldsymbol{\pi}(2i)\right)\cdot d\left(\boldsymbol{\pi}(2j-1),\boldsymbol{\pi}(2j)\right)\,\right] \\
&\leq \left\lfloor\frac{n}{2}\right\rfloor\left(\left\lfloor\frac{n}{2}\right\rfloor-1\right)\cdot\frac{1}{n\cdot(n-1)\cdot(n-2)\cdot(n-3)}\cdot\left(\sum_{x,y\in[n]}\,d(x,y)\right)^{2}.
\end{aligned}
$$

This and equation (1) complete the proof. ∎
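As a sanity check that is not part of the paper, the pre-asymptotic bound in the last display of the proof can be verified by brute force on a small instance. The sketch below enumerates every permutation of a 6-point path metric (an illustrative choice) and evaluates the left-hand side of Lemma 12 exactly with rational arithmetic. The names `cross_term_expectation`, `lemma12_bound` and `path_metric` are hypothetical, and points are indexed from 0 rather than 1.

```python
from fractions import Fraction
from itertools import permutations


def cross_term_expectation(n, d):
    """Exact sum over distinct i, j of E[d(pi(2i-1), pi(2i)) * d(pi(2j-1), pi(2j))]."""
    half = n // 2
    total = Fraction(0)
    count = 0
    for pi in permutations(range(n)):
        # Distances of the floor(n/2) consecutive pairs of the permutation.
        pair_dist = [d(pi[2 * i], pi[2 * i + 1]) for i in range(half)]
        s = sum(pair_dist)
        # sum over i != j of a_i * a_j equals (sum a_i)^2 - sum a_i^2.
        total += s * s - sum(a * a for a in pair_dist)
        count += 1
    return total / count


def lemma12_bound(n, d):
    """Right-hand side of the last display in the proof (requires n >= 4)."""
    half = n // 2
    all_pairs = sum(d(x, y) for x in range(n) for y in range(n))
    return Fraction(half * (half - 1), n * (n - 1) * (n - 2) * (n - 3)) * all_pairs ** 2


def path_metric(x, y):
    return abs(x - y)


n = 6
lhs = cross_term_expectation(n, path_metric)
rhs = lemma12_bound(n, path_metric)
print(lhs, rhs, lhs <= rhs)
```

Exhaustive enumeration costs $n!$ evaluations, so this is only practical for very small $n$; it checks the inequality on one instance and does not replace the proof.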

Define

$$
\Delta\equiv\max_{x,y\in[n]}\,d(x,y)
$$

to be the diameter of $([n],d)$.

Lemma 13.

If

$$
\delta nr\geq\Delta \tag{16}
$$

then

$$
\sum_{i=1}^{\lfloor n/2\rfloor}\,\mathop{\mathrm{E}}_{\boldsymbol{\pi}}\left[\,d^{2}\left(\boldsymbol{\pi}(2i-1),\boldsymbol{\pi}(2i)\right)\,\right]\leq\left(\frac{1}{2}+O\left(\frac{1}{n}\right)\right)\left(\delta n^{2}r\bar{r}+\delta^{2}nr^{2}\right). \tag{17}
$$
Proof.

Clearly, $\{\boldsymbol{\pi}(2i-1),\boldsymbol{\pi}(2i)\}$ is a uniformly random size-$2$ subset of $[n]$ for each $i\in[\,\lfloor n/2\rfloor\,]$. Therefore,

$$
\begin{aligned}
\sum_{i=1}^{\lfloor n/2\rfloor}\,\mathop{\mathrm{E}}_{\boldsymbol{\pi}}\left[\,d^{2}\left(\boldsymbol{\pi}(2i-1),\boldsymbol{\pi}(2i)\right)\,\right]
&= \sum_{i=1}^{\lfloor n/2\rfloor}\,\frac{1}{n\cdot(n-1)}\cdot\sum_{\text{distinct } x,\,y\in[n]}\,d^{2}(x,y) \\
&\leq \sum_{i=1}^{\lfloor n/2\rfloor}\,\frac{1}{n\cdot(n-1)}\cdot\sum_{x,y\in[n]}\,d^{2}(x,y) \\
&= \left\lfloor\frac{n}{2}\right\rfloor\cdot\frac{1}{n\cdot(n-1)}\cdot\sum_{x,y\in[n]}\,d^{2}(x,y).
\end{aligned}
$$

By inequality (16),

$$
d(x,y)\leq\delta nr \tag{19}
$$

for all $x$, $y\in[n]$.

By equations (1) and (4)–(19), the left-hand side of inequality (17) cannot exceed the optimal value of the following problem, called max square sum:

Find $d_{x,y}\in\mathbb{R}$ for all $x$, $y\in[n]$ to maximize

$$
\left\lfloor\frac{n}{2}\right\rfloor\cdot\frac{1}{n\cdot(n-1)}\cdot\sum_{x,y\in[n]}\,d_{x,y}^{2} \tag{20}
$$

subject to

$$
\frac{1}{n^{2}}\cdot\sum_{x,y\in[n]}\,d_{x,y}=\bar{r}, \tag{21}
$$

$$
\forall x,y\in[n],\quad 0\leq d_{x,y}\leq\delta nr. \tag{22}
$$

Above, constraint (21) (resp., (22)) mimics equation (1) (resp., inequality (19) and the non-negativeness of distances). Appendix A bounds the optimal value of max square sum from above by

$$
\left\lfloor\frac{n}{2}\right\rfloor\cdot\frac{1}{n\cdot(n-1)}\cdot\left(\left\lfloor\frac{n\bar{r}}{\delta r}\right\rfloor+1\right)\cdot\left(\delta nr\right)^{2}.
$$

This evaluates to be at most

$$
\left(\frac{1}{2}+O\left(\frac{1}{n}\right)\right)\left(\delta n^{2}r\bar{r}+\delta^{2}nr^{2}\right).
$$

∎
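For readers who want the last evaluation spelled out, here is a worked version of that step. It uses only $\lfloor x\rfloor\leq x$ and assumes $r>0$ and $\delta>0$, so that the quotient $n\bar{r}/(\delta r)$ is defined:

$$
\begin{aligned}
\left\lfloor\frac{n}{2}\right\rfloor\cdot\frac{1}{n\cdot(n-1)}\cdot\left(\left\lfloor\frac{n\bar{r}}{\delta r}\right\rfloor+1\right)\cdot\left(\delta nr\right)^{2}
&\leq \frac{1}{2(n-1)}\cdot\left(\frac{n\bar{r}}{\delta r}+1\right)\cdot\delta^{2}n^{2}r^{2} \\
&= \frac{n}{2(n-1)}\cdot\left(\delta n^{2}r\bar{r}+\delta^{2}nr^{2}\right) \\
&= \left(\frac{1}{2}+\frac{1}{2(n-1)}\right)\left(\delta n^{2}r\bar{r}+\delta^{2}nr^{2}\right),
\end{aligned}
$$

and $1/(2(n-1))=O(1/n)$.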

Recall that the variance of any random variable $X$ equals $\mathop{\mathrm{E}}[X^{2}]-(\mathop{\mathrm{E}}[X])^{2}$.

Lemma 14.

If $\delta nr\geq\Delta$, then

$$
\mathop{\mathrm{var}}_{\boldsymbol{\pi}}\left(\sum_{i=1}^{\lfloor n/2\rfloor}\,d\left(\boldsymbol{\pi}(2i-1),\boldsymbol{\pi}(2i)\right)\right)\leq\left(1+o(1)\right)\delta n^{2}r^{2}.
$$
Proof.

By equations (4)–(15) and Lemmas 12–13,

$$
\begin{aligned}
&\mathop{\mathrm{E}}_{\boldsymbol{\pi}}\left[\,\left(\sum_{i=1}^{\lfloor n/2\rfloor}\,d\left(\boldsymbol{\pi}(2i-1),\boldsymbol{\pi}(2i)\right)\right)^{2}\,\right] \\
&\leq \frac{1}{4}\cdot\left(1+O\left(\frac{1}{n}\right)\right)n^{2}\bar{r}^{2}+\left(\frac{1}{2}+O\left(\frac{1}{n}\right)\right)\left(\delta n^{2}r\bar{r}+\delta^{2}nr^{2}\right).
\end{aligned}
$$

This and Lemma 5 imply

$$
\begin{aligned}
&\mathop{\mathrm{var}}_{\boldsymbol{\pi}}\left(\sum_{i=1}^{\lfloor n/2\rfloor}\,d\left(\boldsymbol{\pi}(2i-1),\boldsymbol{\pi}(2i)\right)\right) \\
&\leq O\left(\frac{1}{n}\right)\cdot n^{2}\bar{r}^{2}+\left(\frac{1}{2}+O\left(\frac{1}{n}\right)\right)\left(\delta n^{2}r\bar{r}+\delta^{2}nr^{2}\right).
\end{aligned}
$$

Finally, invoke Lemma 11 and recall that $\delta=\omega(1/\sqrt{n})$. ∎
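One way to carry out this last step is sketched below under two labeled assumptions, since Lemma 11 is not restated here. The first assumption is that Lemma 11 supplies $\bar{r}\leq 2r$; with $r$ taken to be the average distance from the point $p^{*}$ to all points, this also follows from averaging the triangle inequality $d(x,y)\leq d(x,p^{*})+d(p^{*},y)$ over $x$, $y\in[n]$. The second assumption is that $\delta=O(1)$. Each of the three terms is then compared with $\delta n^{2}r^{2}$:

$$
\begin{aligned}
O\left(\frac{1}{n}\right)\cdot n^{2}\bar{r}^{2}
&\leq O(n)\cdot r^{2}
= O\left(\frac{1}{\delta n}\right)\cdot\delta n^{2}r^{2}
= o(1)\cdot\delta n^{2}r^{2}, \\
\left(\frac{1}{2}+O\left(\frac{1}{n}\right)\right)\delta n^{2}r\bar{r}
&\leq \left(1+O\left(\frac{1}{n}\right)\right)\delta n^{2}r^{2}, \\
\left(\frac{1}{2}+O\left(\frac{1}{n}\right)\right)\delta^{2}nr^{2}
&= O\left(\frac{\delta}{n}\right)\cdot\delta n^{2}r^{2}
= o(1)\cdot\delta n^{2}r^{2}.
\end{aligned}
$$

The first line uses $\delta n=\omega(\sqrt{n})$. Summing the three lines gives $(1+o(1))\,\delta n^{2}r^{2}$.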

Lemma 15.

If $\delta nr\geq\Delta$, then

$$
\Pr_{\boldsymbol{\pi}}\left[\,\left|\,\left(\sum_{i=1}^{\lfloor n/2\rfloor}\,d\left(\boldsymbol{\pi}(2i-1),\boldsymbol{\pi}(2i)\right)\right)-\frac{1}{2}\cdot\left(1\pm O\left(\frac{1}{n}\right)\right)n\bar{r}\,\right|\geq k\sqrt{\left(1+o(1)\right)\delta}\,nr\,\right]\leq\frac{1}{k^{2}}
$$

for all $k>1$.

Proof.

Use Chebyshev's inequality and Lemmas 5 and 14. ∎
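The Chebyshev step can be illustrated on a toy instance. The sketch below is not from the paper: it enumerates all permutations of a 7-point cycle metric (an illustrative choice), computes the exact mean and variance of the matching sum, and checks the tail bound $1/k^{2}$ for a few values of $k$. It uses the exact variance rather than the upper bound of Lemma 14, and the names `matching_sum_distribution` and `cycle_metric` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def matching_sum_distribution(n, d):
    """Matching sums over all n! permutations of the points 0, ..., n-1."""
    half = n // 2
    return [sum(d(pi[2 * i], pi[2 * i + 1]) for i in range(half))
            for pi in permutations(range(n))]


def cycle_metric(n):
    """Shortest-path distance on the n-vertex cycle."""
    return lambda x, y: min(abs(x - y), n - abs(x - y))


n = 7
values = matching_sum_distribution(n, cycle_metric(n))
mean = Fraction(sum(values), len(values))
var = Fraction(sum(v * v for v in values), len(values)) - mean ** 2

tails = {}
for k in (Fraction(3, 2), 2, 3):
    # |X - mean| >= k * sd is equivalent to (X - mean)^2 >= k^2 * var,
    # which avoids taking square roots of rationals.
    hits = sum((v - mean) ** 2 >= k * k * var for v in values)
    tails[k] = Fraction(hits, len(values))
    print(k, tails[k], tails[k] <= 1 / Fraction(k * k))
```

Working with squared deviations keeps every quantity rational, so the comparison with $1/k^{2}$ is exact.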

Lemma 16.

If $\delta nr\geq\Delta$, then

$$
\Pr_{\boldsymbol{\pi}}\left[\,\left|\,\left(\sum_{i=1}^{\lfloor n/2\rfloor}\,d\left(\boldsymbol{\pi}(2i-1),\boldsymbol{\pi}(2i)\right)\right)-\frac{1}{2}\cdot\left(1\pm O\left(\frac{1}{n}\right)\right)n\bar{r}\,\right|\geq k\sqrt{\left(1+o(1)\right)\delta}\,n\bar{r}\,\right]\leq\frac{1}{k^{2}}
$$

for all $k>1$.

Proof.

By inequalities (3) and (12),

$$
r\leq\bar{r}.
$$

This and Lemma 15 complete the proof. ∎

We now arrive at an efficient estimation of the average distance on a graph.

Theorem 17.

Given $n\in\mathbb{Z}^{+}$, $\epsilon=\omega(1/n^{1/4})$ and oracle access to a graph metric $d\colon[n]\times[n]\to\mathbb{N}$, a real number in $[\,(1-\epsilon)\bar{r},(1+\epsilon)\bar{r}\,]$ can be found in $O(n)$ time with probability $1/2+\Omega(1)$.

Proof.

Let $G=([n],E)$ be an undirected unweighted graph inducing the distance function $d$. Then pick $x$, $y\in[n]$ with $d(x,y)=\Delta$, i.e., $(x,y)$ is a furthest pair of vertices of $G$. Find a simple shortest $x$-$y$ path, denoted $(v_{0}=x,v_{1},\ldots,v_{\Delta}=y)$, in $G$. By equation (12),

$$
r\geq\frac{1}{n}\cdot\sum_{i=0}^{\Delta}\,d\left(p^{*},v_{i}\right). \tag{23}
$$

Now,

$$
\sum_{i=0}^{\Delta}\,d\left(p^{*},v_{i}\right)
=\frac{1}{2}\cdot\sum_{i=0}^{\Delta}\,\left(d\left(p^{*},v_{i}\right)+d\left(p^{*},v_{\Delta-i}\right)\right)
\geq\frac{1}{2}\cdot\sum_{i=0}^{\Delta}\,d\left(v_{i},v_{\Delta-i}\right)
=\frac{1}{2}\cdot\sum_{i=0}^{\Delta}\,\left|\,\Delta-2i\,\right|
\geq\frac{\Delta^{2}}{4}, \tag{24}
$$

where the first inequality (resp., the second equality) follows from the triangle inequality (resp., $(v_{0},v_{1},\ldots,v_{\Delta})$ being a shortest $v_{0}$-$v_{\Delta}$ path). [Footnote 4: It is easy to verify that $\sum_{i=0}^{\Delta}\,|\,\Delta-2i\,|=(\Delta+2)\Delta/2$ if $\Delta\equiv 0\pmod{2}$ and $\sum_{i=0}^{\Delta}\,|\,\Delta-2i\,|=(\Delta+1)^{2}/2$ otherwise.] By inequalities (23)–(24),

$$
nr\geq\frac{\Delta^{2}}{4}. \tag{25}
$$

Because $d$ is a graph metric, $d(x,y)\geq 1$ for all distinct $x$, $y\in[n]$. So by equation (12),

$$
r\geq\frac{1}{n}\cdot\sum_{x\in[n]\setminus\{p^{*}\}}\,1\geq\frac{1}{2} \tag{26}
$$

for all $n\geq 2$.
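Inequalities (25) and (26) and the closed form in footnote 4 can be checked numerically on small graphs. The sketch below is an illustration only. It assumes that $r$ is the minimum over points $p$ of the average distance from $p$ to all points, which is how equation (12) is used above, and it works with $nr$ as an integer to avoid rounding. The names `median_sum_and_diameter`, `path_graph`, `star_graph` and `footnote4_sum` are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source in a connected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def median_sum_and_diameter(adj):
    """Return (n, n*r, Delta), assuming r = min over p of (1/n) * sum_x d(p, x)."""
    all_dist = {u: bfs_distances(adj, u) for u in adj}
    min_sum = min(sum(all_dist[p].values()) for p in adj)
    diameter = max(max(all_dist[u].values()) for u in adj)
    return len(adj), min_sum, diameter


def path_graph(n):
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def star_graph(n):
    adj = {0: list(range(1, n))}
    adj.update({i: [0] for i in range(1, n)})
    return adj


def footnote4_sum(delta):
    """Left-hand side of the identity in footnote 4."""
    return sum(abs(delta - 2 * i) for i in range(delta + 1))


for name, adj in (("path", path_graph(9)), ("star", star_graph(9))):
    n, min_sum, diam = median_sum_and_diameter(adj)
    # (25): n*r >= Delta^2 / 4, and (26): r >= 1/2, both in integer form.
    print(name, 4 * min_sum >= diam ** 2, 2 * min_sum >= n)
```

The path graph is the instance where (25) is closest to tight, while the star graph is governed by (26).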

By inequalities (25)–(26),

$$
\delta nr\geq\delta\cdot\max\left\{\frac{\Delta^{2}}{4},\frac{n}{2}\right\}.
$$

So

$$
\delta nr\geq\Delta \tag{27}
$$

for all sufficiently large $n$. [Footnote 5: If $\Delta\geq 4/\delta$, then $\delta\Delta^{2}/4\geq\Delta$. Otherwise, $\delta n/2\geq\Delta$ for all $n>8/\delta^{2}$. Finally, recall that $\delta=\omega(1/\sqrt{n})$.] By equation (11),

$$
3\sqrt{\left(1+o(1)\right)\delta}\leq 0.1\,\epsilon \tag{28}
$$

for all sufficiently large $n$. By inequalities (27)–(28), Lemma 16 with $k=3$ and recalling that $\epsilon=\omega(1/n^{1/4})$,

$$
\Pr_{\boldsymbol{\pi}}\left[\,\sum_{i=1}^{\lfloor n/2\rfloor}\,d\left(\boldsymbol{\pi}(2i-1),\boldsymbol{\pi}(2i)\right)\in\left[\,\left(\frac{1}{2}-\frac{\epsilon}{2}\right)n\bar{r},\left(\frac{1}{2}+\frac{\epsilon}{2}\right)n\bar{r}\,\right]\,\right]\geq 1-\frac{1}{9} \tag{29}
$$

for all sufficiently large $n$. Consequently, the output of line 2 of average distance in Fig. 2 is in $[\,(1-\epsilon)\bar{r},(1+\epsilon)\bar{r}\,]$ with probability $1/2+\Omega(1)$. Line 1 takes $O(n)$ time by the Knuth shuffle. Clearly, line 2 also takes $O(n)$ time. ∎
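The two-line procedure analyzed in this proof can be sketched in Python as follows. This is a minimal sketch, not the paper's Fig. 2, which is not reproduced here. It assumes that line 2 outputs the matching sum rescaled by $2/n$, as inequality (29) suggests. The function name `estimate_average_distance`, the 0-based point labels and the `rng` parameter are all hypothetical.

```python
import random


def estimate_average_distance(n, d, rng=random):
    """Random-matching estimate of the average distance among points 0, ..., n-1.

    d is a distance oracle d(x, y); rng is any object with a randint method.
    """
    perm = list(range(n))
    # Knuth (Fisher-Yates) shuffle: a uniformly random permutation in O(n) time.
    for i in range(n - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive at both ends
        perm[i], perm[j] = perm[j], perm[i]
    # One oracle query per matched pair, floor(n/2) queries in total.
    matching_sum = sum(d(perm[2 * i], perm[2 * i + 1]) for i in range(n // 2))
    # Assumed output rule: the sum concentrates near n * r_bar / 2, so rescale by 2/n.
    return 2 * matching_sum / n


if __name__ == "__main__":
    n = 1000
    path = lambda x, y: abs(x - y)  # path-graph metric, an illustrative choice
    r_bar = sum(path(x, y) for x in range(n) for y in range(n)) / n ** 2
    print(r_bar, estimate_average_distance(n, path, random.Random(0)))
```

Computing `r_bar` exactly in the usage example takes $n^{2}$ oracle calls and is there only for comparison; the estimator itself makes $\lfloor n/2\rfloor$ calls.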

The time complexity of $O(n)$ in Theorem 17 is independent of $\epsilon$. But for general metrics, we do not know whether the time complexity of $O(n/\epsilon^{2})$ in Fact 2 can be improved to $O(n/\epsilon^{2-\Omega(1)})$.

Appendix A Analyzing max square sum

Max square sum has an optimal solution, denoted $\{\tilde{d}_{x,y}\in\mathbb{R}\}_{x,y\in[n]}$, because its feasible solutions (i.e., those satisfying constraints (21)–(22)) form a closed and bounded subset of $\mathbb{R}^{(n^{2})}$. (Recall from elementary mathematical analysis that a continuous real-valued function on a closed and bounded subset of $\mathbb{R}^{k}$ has a maximum value, where $k<\infty$.) Note that $\{\tilde{d}_{x,y}\in\mathbb{R}\}_{x,y\in[n]}$ must be feasible to max square sum. Below is a consequence of constraint (21).

Lemma A.1.

$$
\left|\left\{(x,y)\in[n]\times[n]\mid\tilde{d}_{x,y}=\delta nr\right\}\right|\leq\left\lfloor\frac{n\bar{r}}{\delta r}\right\rfloor. \tag{30}
$$
Proof.

Clearly,

$$
n^{2}\bar{r}\stackrel{\text{(21)}}{=}\sum_{x,y\in[n]}\,\tilde{d}_{x,y}\geq\left|\left\{(x,y)\in[n]\times[n]\mid\tilde{d}_{x,y}=\delta nr\right\}\right|\cdot\delta nr.
$$

Furthermore, the left-hand side of inequality (30) is an integer. ∎

Lemma A.2.

$$
\left|\left\{(x,y)\in[n]\times[n]\mid\tilde{d}_{x,y}>0\right\}\right|\leq\left\lfloor\frac{n\bar{r}}{\delta r}\right\rfloor+1.
$$
Proof.

Assume otherwise. Then

$$
\begin{aligned}
&\left|\left\{(x,y)\in[n]\times[n]\mid\left(\tilde{d}_{x,y}>0\right)\land\left(\tilde{d}_{x,y}\neq\delta nr\right)\right\}\right| \\
&\geq \left|\left\{(x,y)\in[n]\times[n]\mid\tilde{d}_{x,y}>0\right\}\right|-\left|\left\{(x,y)\in[n]\times[n]\mid\tilde{d}_{x,y}=\delta nr\right\}\right| \\
&\geq \left\lfloor\frac{n\bar{r}}{\delta r}\right\rfloor+2-\left|\left\{(x,y)\in[n]\times[n]\mid\tilde{d}_{x,y}=\delta nr\right\}\right| \\
&\stackrel{\text{Lemma A.1}}{\geq} 2.
\end{aligned}
$$

So by constraint (22) (and the feasibility of $\{\tilde{d}_{x,y}\}_{x,y\in[n]}$ to max square sum),

$$
\left|\left\{(x,y)\in[n]\times[n]\mid 0<\tilde{d}_{x,y}<\delta nr\right\}\right|\geq 2.
$$

Consequently, there exist distinct $(x',y')$, $(x'',y'')\in[n]\times[n]$ satisfying

$$
0<\tilde{d}_{x',y'},\,\tilde{d}_{x'',y''}<\delta nr. \tag{31}
$$

By symmetry, assume $\tilde{d}_{x',y'}\geq\tilde{d}_{x'',y''}$. By inequality (31), there exists a small real number $\beta>0$ such that increasing $\tilde{d}_{x',y'}$ by $\beta$ and simultaneously decreasing $\tilde{d}_{x'',y''}$ by $\beta$ will preserve constraints (21)–(22). I.e., the solution $\{\hat{d}_{x,y}\in\mathbb{R}\}_{x,y\in[n]}$ defined below is feasible to max square sum:

$$
\hat{d}_{x,y}=\begin{cases}
\tilde{d}_{x',y'}+\beta, & \text{if } (x,y)=(x',y'),\\
\tilde{d}_{x'',y''}-\beta, & \text{if } (x,y)=(x'',y''),\\
\tilde{d}_{x,y}, & \text{otherwise}.
\end{cases} \tag{35}
$$

Clearly, objective (20) w.r.t. $\{\hat{d}_{x,y}\}_{x,y\in[n]}$ exceeds that w.r.t. $\{\tilde{d}_{x,y}\}_{x,y\in[n]}$ by

$$
\begin{aligned}
&\left\lfloor\frac{n}{2}\right\rfloor\cdot\frac{1}{n\cdot(n-1)}\cdot\sum_{x,y\in[n]}\,\left(\hat{d}^{2}_{x,y}-\tilde{d}^{2}_{x,y}\right) \\
&\stackrel{\text{(35)}}{=} \left\lfloor\frac{n}{2}\right\rfloor\cdot\frac{1}{n\cdot(n-1)}\cdot\left(\left(\tilde{d}_{x',y'}+\beta\right)^{2}+\left(\tilde{d}_{x'',y''}-\beta\right)^{2}-\tilde{d}^{2}_{x',y'}-\tilde{d}^{2}_{x'',y''}\right) \\
&= \left\lfloor\frac{n}{2}\right\rfloor\cdot\frac{1}{n\cdot(n-1)}\cdot\left(2\beta\tilde{d}_{x',y'}-2\beta\tilde{d}_{x'',y''}+2\beta^{2}\right) \\
&> 0,
\end{aligned}
$$

where the inequality holds because $\tilde{d}_{x',y'}\geq\tilde{d}_{x'',y''}$ and $\beta>0$.

In summary, $\{\hat{d}_{x,y}\}_{x,y\in[n]}$ is a feasible solution to max square sum achieving a greater objective (20) than the optimal solution $\{\tilde{d}_{x,y}\}_{x,y\in[n]}$ does, a contradiction. ∎
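The exchange argument can be seen on a concrete vector. In the sketch below the numbers are made up for illustration and the common factor $\lfloor n/2\rfloor/(n\cdot(n-1))$ of objective (20) is dropped. Moving $\beta$ from a smaller interior entry to a larger one keeps the total fixed and raises the sum of squares by $2\beta(\tilde{d}_{x',y'}-\tilde{d}_{x'',y''})+2\beta^{2}$. The names `sum_of_squares` and `exchange` are hypothetical, and the example uses the largest admissible shift rather than a small one.

```python
def sum_of_squares(values):
    return sum(v * v for v in values)


def exchange(values, hi, lo, beta):
    """Move beta from values[lo] to values[hi]; the total sum is unchanged."""
    moved = list(values)
    moved[hi] += beta
    moved[lo] -= beta
    return moved


cap = 10                 # plays the role of the upper bound delta * n * r
before = [10, 7, 4, 0]   # two entries lie strictly between 0 and cap
# Largest shift that keeps both touched entries inside [0, cap].
beta = min(cap - before[1], before[2])
after = exchange(before, hi=1, lo=2, beta=beta)
print(after, sum_of_squares(before), sum_of_squares(after))
```

After the exchange at most one entry lies strictly between $0$ and the cap, which is the structure that Lemma A.2 forces on every optimal solution.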

We now bound the optimal value of max square sum.

Theorem A.3.

The optimal value of max square sum is at most

$$
\left\lfloor\frac{n}{2}\right\rfloor\cdot\frac{1}{n\cdot(n-1)}\cdot\left(\left\lfloor\frac{n\bar{r}}{\delta r}\right\rfloor+1\right)\cdot\left(\delta nr\right)^{2}.
$$
Proof.

W.r.t. the optimal (and thus feasible) solution $\{\tilde{d}_{x,y}\}_{x,y\in[n]}$, objective (20) equals

$$
\begin{aligned}
&\left\lfloor\frac{n}{2}\right\rfloor\cdot\frac{1}{n\cdot(n-1)}\cdot\sum_{x,y\in[n]}\,\chi\left[\tilde{d}_{x,y}\neq 0\right]\cdot\tilde{d}^{2}_{x,y} \\
&\stackrel{\text{(22)}}{\leq} \left\lfloor\frac{n}{2}\right\rfloor\cdot\frac{1}{n\cdot(n-1)}\cdot\sum_{x,y\in[n]}\,\chi\left[\tilde{d}_{x,y}>0\right]\cdot\left(\delta nr\right)^{2},
\end{aligned}
$$

where $\chi[P]=1$ if $P$ is true and $\chi[P]=0$ otherwise, for any predicate $P$. Now invoke Lemma A.2. ∎
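A small numeric check of this bound follows, again with the common factor $\lfloor n/2\rfloor/(n\cdot(n-1))$ dropped. The sketch is an illustration rather than part of the proof. It relies on the standard fact that a convex function on the polytope cut out by constraints (21)–(22) is maximized at a vertex, where at most one variable lies strictly between $0$ and the cap. Here `num_vars`, `total` and `cap` stand in for $n^{2}$, $n^{2}\bar{r}$ and $\delta nr$, and all function names are hypothetical.

```python
from itertools import product


def greedy_max_square_sum(num_vars, total, cap):
    """Maximum of sum(v*v) subject to sum(v) == total and 0 <= v <= cap."""
    if total > num_vars * cap:
        raise ValueError("infeasible: total exceeds num_vars * cap")
    full = int(total // cap)        # variables pushed to the cap
    remainder = total - full * cap  # the single leftover variable
    return full * cap ** 2 + remainder ** 2


def appendix_bound(total, cap):
    """(floor(total / cap) + 1) * cap^2, the shape of the bound in Theorem A.3."""
    return (int(total // cap) + 1) * cap ** 2


num_vars, total, cap = 4, 7, 3
# Brute force over the integer grid; it contains the maximizing vertex (3, 3, 1, 0).
brute = max(sum(v * v for v in vec)
            for vec in product(range(cap + 1), repeat=num_vars)
            if sum(vec) == total)
print(brute, greedy_max_square_sum(num_vars, total, cap), appendix_bound(total, cap))
```

The gap between the greedy optimum and the bound comes from rounding the leftover variable up to the cap, which is the slack that the proof above accepts.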

References

  • [1] K. Barhum, O. Goldreich, and A. Shraibman. On approximating the average distance between points. In Proceedings of the 10th International Workshop on Approximation and the 11th International Workshop on Randomization, and Combinatorial Optimization, pages 296–310, 2007.
  • [2] C.-L. Chang. Deterministic sublinear-time approximations for metric 1-median selection. Information Processing Letters, 113(8):288–292, 2013.
  • [3] C.-L. Chang. A deterministic sublinear-time nonadaptive algorithm for metric 1-median selection. Theoretical Computer Science, 602:149–157, 2015.
  • [4] C.-L. Chang. Metric 1-median selection: Query complexity vs. approximation ratio. In Proceedings of the 22nd International Computing and Combinatorics Conference, pages 131–142, Ho Chi Minh City, Vietnam, 2016. Full version at https://arxiv.org/abs/1509.05662.
  • [5] C.-L. Chang. A lower bound for metric 1-median selection. Journal of Computer and System Sciences, 84:44–51, 2017.
  • [6] D. Eppstein and J. Wang. Fast approximation of centrality. Journal of Graph Algorithms and Applications, 8(1):39–45, 2004.
  • [7] O. Goldreich and D. Ron. Approximating average parameters of graphs. Random Structures & Algorithms, 32(4):473–493, 2008.
  • [8] S. Guha, A. Meyerson, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams: Theory and practice. IEEE Transactions on Knowledge and Data Engineering, 15(3):515–528, 2003.
  • [9] P. Indyk. Sublinear time algorithms for metric space problems. In Proceedings of the 31st Annual ACM Symposium on Theory of Computing, pages 428–434, 1999.
  • [10] P. Indyk. High-dimensional computational geometry. PhD thesis, Stanford University, 2000.
  • [11] A. Kumar, Y. Sabharwal, and S. Sen. Linear-time approximation schemes for clustering problems in any dimensions. Journal of the ACM, 57(2):5, 2010.
  • [12] R. R. Mettu and C. G. Plaxton. Optimal time bounds for approximate clustering. Machine Learning, 56(1–3):35–60, 2004.
  • [13] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, Cambridge, UK, 1995.
  • [14] W. Rudin. Principles of Mathematical Analysis. McGraw-Hill, 3rd edition, 1976.
  • [15] S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, 1994.
  • [16] B.-Y. Wu. On approximating metric 1-median in sublinear time. Information Processing Letters, 114(4):163–166, 2014.