1702.03106.

Ching-Lueh Chang (Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, Taiwan. Email: clchang@saturn.yzu.edu.tw). Supported in part by the Ministry of Science and Technology of Taiwan under grant 105-2221-E-155-047-.

Abstract

Let $(\{1,2,\ldots,n\},d)$ be a metric space. We analyze the expected value and the variance of $\sum_{i=1}^{\lfloor n/2\rfloor}\,d(\boldsymbol{\pi}(2i-1),\boldsymbol{\pi}(2i))$ for a uniformly random permutation $\boldsymbol{\pi}$ of $\{1,2,\ldots,n\}$ , leading to the following results:

• Consider the problem of finding a point in $\{1,2,\ldots,n\}$ with the minimum sum of distances to all points. We show that this problem has a randomized algorithm that (1) always outputs a $(2+\epsilon)$-approximate solution in expected $O(n/\epsilon^{2})$ time and that (2) inherits Indyk’s [9, 10] algorithm to output a $(1+\epsilon)$-approximate solution in $O(n/\epsilon^{2})$ time with probability $\Omega(1)$, where $\epsilon\in(0,1)$.
• The average distance in $(\{1,2,\ldots,n\},d)$ can be approximated in $O(n/\epsilon)$ time to within a multiplicative factor in $[\,1/2-\epsilon,1\,]$ with probability $1/2+\Omega(1)$, where $\epsilon>0$.
• Assume $d$ to be a graph metric. Then the average distance in $(\{1,2,\ldots,n\},d)$ can be approximated in $O(n)$ time to within a multiplicative factor in $[\,1-\epsilon,1+\epsilon\,]$ with probability $1/2+\Omega(1)$, where $\epsilon=\omega(1/n^{1/4})$.

1 Introduction

A metric space is a nonempty set $M$ endowed with a metric, i.e., a function $d\colon M\times M\to[\,0,\infty\,)$ such that

• $d(x,y)=0$ if and only if $x=y$ (identity of indiscernibles),
• $d(x,y)=d(y,x)$ (symmetry), and
• $d(x,y)+d(y,z)\geq d(x,z)$ (triangle inequality)

for all $x$ , $y$ , $z\in M$ [14].

For all $n\in\mathbb{Z}^{+}$ , define $[n]\equiv\{1,2,\ldots,n\}$ . Given $n\in\mathbb{Z}^{+}$ and oracle access to a metric $d\colon[n]\times[n]\to[\,0,\infty\,)$ , metric $1$ -median asks for $\mathop{\mathrm{argmin}}_{y\in[n]}\,\sum_{x\in[n]}\,d(y,x)$ , breaking ties arbitrarily. It generalizes the classical median selection on the real line and has a brute-force $\Theta(n^{2})$ -time algorithm. More generally, metric $k$ -median asks for $c_{1}$ , $c_{2}$ , $\ldots$ , $c_{k}\in[n]$ minimizing $\sum_{x\in[n]}\,\min_{i=1}^{k}\,d(x,c_{i})$ . Because $d(\cdot,\cdot)$ defines $\binom{n}{2}=\Theta(n^{2})$ nonzero distances, only $o(n^{2})$ -time algorithms are said to run in sublinear time [9]. For all $\alpha\geq 1$ , an $\alpha$ -approximate $1$ -median is a point $p\in[n]$ satisfying

\sum_{x\in[n]}\,d\left(p,x\right)\leq\alpha\cdot\min_{y\in[n]}\,\sum_{x\in[n]}\,d\left(y,x\right).
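The definitions above can be made concrete with a minimal sketch of the brute-force $\Theta(n^{2})$-time baseline and of a check of the $\alpha$-approximation inequality. The helper names and the toy `line_metric` are assumptions introduced only for illustration; they are not part of the paper.

```python
from typing import Callable

Metric = Callable[[int, int], float]


def distance_sum(d: Metric, n: int, p: int) -> float:
    """Sum of distances from point p to every point of [n] = {1, ..., n}."""
    return sum(d(p, x) for x in range(1, n + 1))


def brute_force_median(d: Metric, n: int) -> int:
    """Exact 1-median using all n^2 distance queries (ties go to the smallest index)."""
    return min(range(1, n + 1), key=lambda p: distance_sum(d, n, p))


def is_alpha_approximate(d: Metric, n: int, p: int, alpha: float) -> bool:
    """Check sum_x d(p, x) <= alpha * min_y sum_x d(y, x) by brute force."""
    optimum = distance_sum(d, n, brute_force_median(d, n))
    return distance_sum(d, n, p) <= alpha * optimum


def line_metric(x: int, y: int) -> float:
    """Hypothetical toy metric: the points 1..n placed on the real line."""
    return abs(x - y)
```

On the toy metric with $n=5$, the middle point is the $1$-median with distance sum $6$, while the endpoint $1$ has distance sum $10$ and is therefore not $1.5$-approximate.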

For all $\epsilon>0$ , metric $1$ -median has a Monte Carlo $(1+\epsilon)$ -approximation $O(n/\epsilon^{2})$ -time algorithm [9, 10]. Guha et al. [8] show that metric $k$ -median has a Monte Carlo, $O(\exp(O(1/\epsilon)))$ -approximation, $O(nk\log n)$ -time, $O(n^{\epsilon})$ -space and one-pass algorithm for all small $k$ as well as a deterministic, $O(\exp(O(1/\epsilon)))$ -approximation, $O(n^{1+\epsilon})$ -time, $O(n^{\epsilon})$ -space and one-pass algorithm. Given $n$ points in $\mathbb{R}^{D}$ with $D\geq 1$ , the Monte Carlo algorithms of Kumar et al. [11] find a $(1+\epsilon)$ -approximate $1$ -median in $O(D\cdot\exp(1/\epsilon^{O(1)}))$ time and a $(1+\epsilon)$ -approximate solution to metric $k$ -median in $O(Dn\cdot\exp((k/\epsilon)^{O(1)}))$ time. All randomized $O(1)$ -approximation algorithms for metric $k$ -median take $\Omega(nk)$ time [12, 8]. Chang [3] shows that metric $1$ -median has a deterministic, $(2h)$ -approximation, $O(hn^{1+1/h})$ -time and nonadaptive algorithm for all constants $h\in\mathbb{Z}^{+}\setminus\{1\}$ , generalizing the results of Chang [2] and Wu [16]. On the other hand, he disproves the existence of deterministic $(2h-\epsilon)$ -approximation $O(n^{1+1/(h-1)}/h)$ -time algorithms for all constants $h\in\mathbb{Z}^{+}\setminus\{1\}$ and $\epsilon>0$ [4, 5].

In social network analysis, the closeness centrality of a point $v$ is the reciprocal of the average distance from $v$ to all points [15]. So metric $1$ -median asks for a point with the maximum closeness centrality. Given oracle access to a graph metric, the Monte-Carlo algorithms of Goldreich and Ron [7] and Eppstein and Wang [6] estimate the closeness centrality of a given point and those of all points, respectively.

All known sublinear-time algorithms for metric $1$ -median are either deterministic or Monte Carlo, the latter having a positive probability of failure. For example, Indyk’s Monte Carlo $(1+\epsilon)$ -approximation algorithm outputs with a positive probability a solution without approximation guarantees. In contrast, we show that metric $1$ -median has a randomized algorithm that always outputs a $(2+\epsilon)$ -approximate solution in expected $O(n/\epsilon^{2})$ time for all $\epsilon\in(0,1)$ . So, excluding the known deterministic algorithms (which are Las Vegas only in the degenerate sense), this paper gives the first Las Vegas approximation algorithm for metric $1$ -median with an expected sublinear running time. Note that deterministic sublinear-time algorithms for metric $1$ -median can be $4$ -approximate but not $(4-\epsilon)$ -approximate for any constant $\epsilon>0$ [2, 5]. So our approximation ratio of $2+\epsilon$ beats that of any deterministic sublinear-time algorithm. Inheriting Indyk’s algorithm, our algorithm outputs a $(1+\epsilon)$ -approximate $1$ -median in $O(n/\epsilon^{2})$ time with probability $\Omega(1)$ for all $\epsilon\in(0,1)$ .

Indyk [9, 10] gives a Monte-Carlo $O(n/\epsilon^{3.5})$ -time algorithm that approximates the average distance in any metric space $([n],d)$ to within a multiplicative factor in $[\,1-\epsilon,1+\epsilon\,]$ , for all $\epsilon>0$ . Barhum, Goldreich and Shraibman [1] improve Indyk’s time complexity of $O(n/\epsilon^{3.5})$ to $O(n/\epsilon^{2})$ . This paper gives a Monte-Carlo $O(n/\epsilon)$ -time algorithm that approximates the average distance in $([n],d)$ to within a multiplicative factor in $[\,1/2-\epsilon,1\,]$ , for all $\epsilon>0$ . For all $\epsilon=\omega(1/n^{1/4})$ , we present a Monte-Carlo $O(n)$ -time algorithm approximating the average distance of any graph metric to within a multiplicative factor in $[\,1-\epsilon,1+\epsilon\,]$ . But for general metrics, we do not know whether the $O(n/\epsilon^{2})$ running time of Barhum, Goldreich and Shraibman can be improved to $O(n/\epsilon^{2-\Omega(1)})$ .

2 Definitions and preliminaries

For a metric space $([n],d)$ ,

\begin{align*}
\bar{r} &\equiv \frac{1}{n^{2}}\cdot\sum_{x,y\in[n]}\,d\left(x,y\right), \tag{1}\\
p^{*} &\equiv \mathop{\mathrm{argmin}}_{p\in[n]}\,\sum_{x\in[n]}\,d\left(p,x\right), \tag{2}
\end{align*}

breaking ties arbitrarily in equation (2). So $\bar{r}$ is the average distance in $([n],d)$ , and $p^{*}$ is a $1$ -median.

An algorithm $A$ with oracle access to $d\colon[n]\times[n]\to[\,0,\infty\,)$ is denoted by $A^{d}$ and may query $d$ on any $(x,y)\in[n]\times[n]$ for $d(x,y)$ . In this paper, all Landau symbols (such as $O(\cdot)$ , $o(\cdot)$ , $\Omega(\cdot)$ and $\omega(\cdot)$ ) are w.r.t. $n$ . The following result is due to Indyk.

Fact 1 ([9, 10]).

For all $\epsilon>0$ , metric $1$ -median has a Monte Carlo $(1+\epsilon)$ -approximation $O(n/\epsilon^{2})$ -time algorithm with a failure probability of at most $1/e$ .

Henceforth, denote Indyk’s algorithm in Fact 1 by Indyk median. It is given $n\in\mathbb{Z}^{+}$ , $\epsilon>0$ and oracle access to a metric $d\colon[n]\times[n]\to[\,0,\infty\,)$ . The following fact on estimating the average distance is due to Barhum, Goldreich and Shraibman.

Fact 2 ([1]).

Given $n\in\mathbb{Z}^{+}$ , $\epsilon>0$ and oracle access to a metric $d\colon[n]\times[n]\to[\,0,\infty\,)$ , a real number in $\left[\,(1-\epsilon)\bar{r},(1+\epsilon)\bar{r}\,\right]$ can be found in $O(n/\epsilon^{2})$ time with probability at least $1/2+\Omega(1)$ .

Chebyshev’s inequality ([13]).

Let $X$ be a random variable with a finite expected value and a finite nonzero variance. Then for all $k\geq 1$ ,

\Pr\left[\,\left|\,X-\mathop{\mathrm{E}}[X]\,\right|\geq k\sqrt{\mathop{\mathrm{var}}(X)}\,\right]\leq\frac{1}{k^{2}}.

3 Las Vegas approximation for metric $1$ -median selection

This section presents a randomized algorithm that always outputs a $(2+\epsilon)$ -approximate $1$ -median, where $\epsilon\in(0,1)$ . Clearly,

\displaystyle\sum_{x\in[n]}\,d\left(p^{*},x\right)\stackrel{\text{(2)}}{=}\min_{p\in[n]}\,\sum_{x\in[n]}\,d\left(p,x\right)\leq\frac{1}{n}\cdot\sum_{p\in[n]}\,\sum_{x\in[n]}\,d\left(p,x\right)\stackrel{\text{(1)}}{=}n\bar{r}.

(3)

For each permutation $\pi\colon[n]\to[n]$ ,

\displaystyle\sum_{i=1}^{\lfloor n/2\rfloor}\,d\left(\pi\left(2i-1\right),\pi\left(2i\right)\right)\leq\sum_{i=1}^{\lfloor n/2\rfloor}\,\left(d\left(p^{*},\pi\left(2i-1\right)\right)+d\left(p^{*},\pi\left(2i\right)\right)\right)\leq\sum_{x\in[n]}\,d\left(p^{*},x\right),

(4)

where the first and the second inequalities follow from the triangle inequality and the injectivity of $\pi$, respectively.
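As a sanity check of inequalities (3)–(4), the sketch below compares the matching sum of random permutations with the optimal sum of distances and with $n\bar{r}$. It assumes points indexed $0,\ldots,n-1$ (rather than $1,\ldots,n$) and uses a hypothetical test metric, the $L_{1}$ distance between distinct integer grid points, so that all arithmetic is exact.

```python
import random


def matching_sum(d, perm):
    """Sum of d(pi(2i-1), pi(2i)) for i = 1..floor(n/2); perm stores pi(1), ..., pi(n)."""
    return sum(d(perm[2 * i], perm[2 * i + 1]) for i in range(len(perm) // 2))


def check_inequalities(points, trials, seed):
    """Return True if (3) and (4) hold for `trials` random permutations."""
    n = len(points)

    def d(x, y):
        # L1 distance between distinct integer points: an exact integer metric.
        return abs(points[x][0] - points[y][0]) + abs(points[x][1] - points[y][1])

    sums = [sum(d(p, x) for x in range(n)) for p in range(n)]
    optimum = min(sums)   # sum of distances from a 1-median
    total = sum(sums)     # equals n^2 * r_bar
    rng = random.Random(seed)
    for _ in range(trials):
        perm = list(range(n))
        rng.shuffle(perm)
        # (4): matching sum <= optimum; (3): optimum <= n * r_bar = total / n.
        if not (matching_sum(d, perm) <= optimum and n * optimum <= total):
            return False
    return True
```

Both inequalities hold for every permutation, so the check never fails on a valid metric, whatever the seed.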

Lemma 3.

When line 5 of Las Vegas median in Fig. 1 is run, $z$ is a $(2+\epsilon)$ -approximate $1$ -median.

Proof.

The condition in line 4 of Las Vegas median implies

\sum_{x\in[n]}\,d\left(z,x\right)\stackrel{\text{(4)}}{\leq}\left(2+\epsilon\right)\sum_{x\in[n]}\,d\left(p^{*},x\right)\stackrel{\text{(2)}}{=}\left(2+\epsilon\right)\min_{p\in[n]}\,\sum_{x\in[n]}\,d\left(p,x\right).

So when line 5 is run, it returns a $(2+\epsilon)$ -approximate $1$ -median. ∎

1: while true do
2:   $z\leftarrow\text{\sf Indyk median}^{d}(n,\epsilon/8)$;
3:   Pick independent and uniformly random permutations $\boldsymbol{\pi}_{1}$, $\boldsymbol{\pi}_{2}$, $\ldots$, $\boldsymbol{\pi}_{80\lceil 1/\epsilon\rceil}\colon[n]\to[n]$;
4:   if there exists $j\in[80\lceil 1/\epsilon\rceil]$ satisfying $\sum_{x\in[n]}\,d(z,x)\leq(2+\epsilon)\sum_{i=1}^{\lfloor n/2\rfloor}\,d(\boldsymbol{\pi}_{j}(2i-1),\boldsymbol{\pi}_{j}(2i))$ then
5:     return $z$;
6:   end if
7: end while

Figure 1: Algorithm Las Vegas median with oracle access to a metric $d\colon[n]\times[n]\to[\,0,\infty\,)$ and with inputs $n\in\mathbb{Z}^{+}$ and $\epsilon\in(0,1)$.
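The control flow of Fig. 1 can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: points are indexed $0,\ldots,n-1$, the permutations are drawn one at a time with an early exit (equivalent to the existence test in line 4), the brute-force fallback is triggered by a hypothetical iteration cap `max_iters` instead of a step counter, and `sampled_candidate` is a hypothetical stand-in for Indyk median that carries none of its guarantees.

```python
import math
import random


def sampled_candidate(d, n, eps, rng):
    """Hypothetical stand-in, NOT Indyk's algorithm: best of a few random points
    measured against a small random sample (eps is accepted but unused)."""
    sample = [rng.randrange(n) for _ in range(min(n, 32))]
    pool = [rng.randrange(n) for _ in range(min(n, 32))]
    return min(pool, key=lambda p: sum(d(p, x) for x in sample))


def las_vegas_median(d, n, eps, candidate, rng, max_iters=1000):
    """Sketch of Fig. 1; `candidate(d, n, eps / 8, rng)` plays the role of line 2."""
    trials = 80 * math.ceil(1 / eps)
    for _ in range(max_iters):                        # line 1
        z = candidate(d, n, eps / 8, rng)             # line 2
        cost_z = sum(d(z, x) for x in range(n))
        for _ in range(trials):                       # lines 3-4
            perm = list(range(n))
            rng.shuffle(perm)                         # uniformly random permutation
            matching = sum(d(perm[2 * i], perm[2 * i + 1]) for i in range(n // 2))
            if cost_z <= (2 + eps) * matching:
                return z                              # line 5
    # Fallback so that the sketch always halts: exact 1-median by brute force.
    return min(range(n), key=lambda p: sum(d(p, x) for x in range(n)))
```

Whatever candidate generator is plugged in, a point returned in line 5 passes the test of line 4 and is therefore $(2+\epsilon)$-approximate by Lemma 3; the quality of the candidate only affects how many iterations are needed.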

Inequalities (3)–(4) yield the following.

Lemma 4.

For each permutation $\pi\colon[n]\to[n]$ ,

\sum_{i=1}^{\lfloor n/2\rfloor}\,d\left(\pi\left(2i-1\right),\pi\left(2i\right)\right)\leq n\bar{r}.

Lemma 5.

For a uniformly random permutation $\boldsymbol{\pi}\colon[n]\to[n]$ ,

\mathop{\mathrm{E}}_{\boldsymbol{\pi}}\left[\,\sum_{i=1}^{\lfloor n/2\rfloor}\,d\left(\boldsymbol{\pi}\left(2i-1\right),\boldsymbol{\pi}\left(2i\right)\right)\right]=\left\lfloor\frac{n}{2}\right\rfloor\cdot\frac{n\bar{r}}{n-1}.

Proof.

For each $i\in[\lfloor n/2\rfloor]$ , $\{\boldsymbol{\pi}(2i-1),\boldsymbol{\pi}(2i)\}$ is a uniformly random size- $2$ subset of $[n]$ , implying

\begin{align*}
\mathop{\mathrm{E}}_{\boldsymbol{\pi}}\left[\,d\left(\boldsymbol{\pi}\left(2i-1\right),\boldsymbol{\pi}\left(2i\right)\right)\right] &= \frac{1}{n\cdot(n-1)}\cdot\sum_{\text{distinct $x$, $y\in[n]$}}\,d\left(x,y\right)\\
&= \frac{1}{n\cdot(n-1)}\cdot\sum_{x,y\in[n]}\,d\left(x,y\right)\\
&\stackrel{\text{(1)}}{=} \frac{n\bar{r}}{n-1},
\end{align*}

where the second equality follows from the identity of indiscernibles. Finally, use the linearity of expectation. ∎
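For small $n$, Lemma 5 can be verified exactly by enumerating all $n!$ permutations. The sketch below does so with rational arithmetic, assuming points indexed $0,\ldots,n-1$ and an integer-valued metric; the function names are illustrative only.

```python
from fractions import Fraction
from itertools import permutations


def exact_expected_matching_sum(d, n):
    """Exact average of the matching sum over all n! permutations of 0..n-1."""
    total, count = Fraction(0), 0
    for perm in permutations(range(n)):
        total += sum(d(perm[2 * i], perm[2 * i + 1]) for i in range(n // 2))
        count += 1
    return total / count


def lemma5_value(d, n):
    """floor(n/2) * n * r_bar / (n - 1), with r_bar as in equation (1)."""
    r_bar = Fraction(sum(d(x, y) for x in range(n) for y in range(n)), n * n)
    return (n // 2) * n * r_bar / (n - 1)
```

For the line metric $d(x,y)=|x-y|$ on five points, $\bar{r}=8/5$ and both functions return $4$.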

Lemma 6.

For all $\epsilon\in(0,1)$ and in each iteration of the while loop of Las Vegas median,

\displaystyle\Pr\left[\,\exists j\in\left[80\cdot\left\lceil\frac{1}{\epsilon}\right\rceil\right],\,\sum_{i=1}^{\lfloor n/2\rfloor}\,d\left(\boldsymbol{\pi}_{j}\left(2i-1\right),\boldsymbol{\pi}_{j}\left(2i\right)\right)\geq\left(\frac{1}{2}-\frac{\epsilon}{8}\right)n\bar{r}\right]\geq 0.9,

(5)

where the probability is taken over $\boldsymbol{\pi}_{1}$ , $\boldsymbol{\pi}_{2}$ , $\ldots$ , $\boldsymbol{\pi}_{80\lceil 1/\epsilon\rceil}$ in line 3 of Las Vegas median.

Proof.

Let $\boldsymbol{\pi}\colon[n]\to[n]$ be a uniformly random permutation and

\displaystyle\alpha=\Pr_{\boldsymbol{\pi}}\left[\,\sum_{i=1}^{\lfloor n/2\rfloor}\,d\left(\boldsymbol{\pi}\left(2i-1\right),\boldsymbol{\pi}\left(2i\right)\right)\geq\left(\frac{1}{2}-\frac{\epsilon}{8}\right)n\bar{r}\right].

(6)

So by Lemma 4,

\mathop{\mathrm{E}}_{\boldsymbol{\pi}}\left[\,\sum_{i=1}^{\lfloor n/2\rfloor}\,d\left(\boldsymbol{\pi}\left(2i-1\right),\boldsymbol{\pi}\left(2i\right)\right)\,\right]\leq\alpha n\bar{r}+\left(1-\alpha\right)\left(\frac{1}{2}-\frac{\epsilon}{8}\right)n\bar{r}.

This and Lemma 5 imply $\alpha\geq\epsilon/8$ . So the left-hand side of inequality (5) is at least $1-(1-\epsilon/8)^{80\lceil 1/\epsilon\rceil}\geq 0.9$ . ∎

Lemma 7.

For all $\epsilon\in(0,1)$ and in each iteration of the while loop of Las Vegas median,

\begin{align*}
&\Pr\left[\,\left(\sum_{x\in[n]}\,d\left(z,x\right)\leq\left(1+\frac{\epsilon}{8}\right)\min_{p\in[n]}\,\sum_{x\in[n]}\,d\left(p,x\right)\right)\right. \tag{7}\\
&\quad\left.\land\left(\exists j\in\left[80\cdot\left\lceil\frac{1}{\epsilon}\right\rceil\right],\,\sum_{i=1}^{\lfloor n/2\rfloor}\,d\left(\boldsymbol{\pi}_{j}\left(2i-1\right),\boldsymbol{\pi}_{j}\left(2i\right)\right)\geq\left(\frac{1}{2}-\frac{\epsilon}{8}\right)n\bar{r}\right)\right.\\
&\quad\left.\land\left(\exists j\in\left[80\cdot\left\lceil\frac{1}{\epsilon}\right\rceil\right],\,\sum_{x\in[n]}\,d\left(z,x\right)\leq\left(2+\epsilon\right)\sum_{i=1}^{\lfloor n/2\rfloor}\,d\left(\boldsymbol{\pi}_{j}\left(2i-1\right),\boldsymbol{\pi}_{j}\left(2i\right)\right)\right)\,\right]\\
={}&\frac{1}{2}+\Omega(1),
\end{align*}

where the probability is taken over $\boldsymbol{\pi}_{1}$ , $\boldsymbol{\pi}_{2}$ , $\ldots$ , $\boldsymbol{\pi}_{80\lceil 1/\epsilon\rceil}$ and the random coin tosses of Indyk median.

Proof.

By Fact 1 and line 2 of Las Vegas median, the first condition within $\Pr[\cdot]$ in equation (7) holds with probability at least $1-1/e$ over the random coin tosses of Indyk median. By Lemma 6, the second condition holds with probability at least $0.9$ over $\boldsymbol{\pi}_{1}$ , $\boldsymbol{\pi}_{2}$ , $\ldots$ , $\boldsymbol{\pi}_{80\lceil 1/\epsilon\rceil}$ . In summary, the first two conditions hold simultaneously with probability at least $(1-1/e)\cdot 0.9=1/2+\Omega(1)$ (note that the random coin tosses of Indyk median are independent of $\boldsymbol{\pi}_{1}$ , $\boldsymbol{\pi}_{2}$ , $\ldots$ , $\boldsymbol{\pi}_{80\lceil 1/\epsilon\rceil}$ ). Finally, the first two conditions together imply the third by inequality (3) and the easy fact that

\left(1+\frac{\epsilon}{8}\right)\leq\left(2+\epsilon\right)\left(\frac{1}{2}-\frac{\epsilon}{8}\right).

∎
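For completeness, the easy fact used at the end of the proof of Lemma 7 can be checked by expanding the product; the derivation below uses only $\epsilon\in(0,1)$:

\begin{align*}
\left(2+\epsilon\right)\left(\frac{1}{2}-\frac{\epsilon}{8}\right)-\left(1+\frac{\epsilon}{8}\right)
&= \left(1+\frac{\epsilon}{4}-\frac{\epsilon^{2}}{8}\right)-\left(1+\frac{\epsilon}{8}\right)\\
&= \frac{\epsilon\left(1-\epsilon\right)}{8}\geq 0.
\end{align*}

Combined with inequality (3), the first two conditions in equation (7) then give, for the index $j$ of the second condition,

\begin{align*}
\sum_{x\in[n]}\,d\left(z,x\right)
\leq\left(1+\frac{\epsilon}{8}\right)n\bar{r}
\leq\left(2+\epsilon\right)\left(\frac{1}{2}-\frac{\epsilon}{8}\right)n\bar{r}
\leq\left(2+\epsilon\right)\sum_{i=1}^{\lfloor n/2\rfloor}\,d\left(\boldsymbol{\pi}_{j}\left(2i-1\right),\boldsymbol{\pi}_{j}\left(2i\right)\right),
\end{align*}

which is the third condition.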

Theorem 8.

For all $\epsilon\in(0,1)$ , metric $1$ -median has a randomized algorithm that (1) always outputs a $(2+\epsilon)$ -approximate solution in expected $O(n/\epsilon^{2})$ time and (2) outputs a $(1+\epsilon)$ -approximate solution in $O(n/\epsilon^{2})$ time with probability $\Omega(1)$ .

Proof.

By Lemma 7, each execution of lines 4–5 of Las Vegas median returns with probability $1/2+\Omega(1)$ . So the expected number of iterations is $O(1)$ . By Fact 1, line 2 takes $O(n/\epsilon^{2})$ time. Line 3 takes $80\lceil 1/\epsilon\rceil\cdot O(n)$ time by the Knuth shuffle. Clearly, lines 4–5 take $O(n/\epsilon)$ time. In summary, the expected running time of Las Vegas median is $O(1)\cdot O(n/\epsilon^{2})=O(n/\epsilon^{2})$ . To prevent Las Vegas median from running forever, find a $1$ -median by brute force (which obviously takes $O(n^{2})$ time) after $n^{2}$ steps of computation. By Lemma 3, Las Vegas median is $(2+\epsilon)$ -approximate.

By Lemma 7, $z$ is $(1+\epsilon/8)$ -approximate and is also returned in line 5 with probability $\Omega(1)$ in the first (in fact, any) iteration. Finally, the previous paragraph has shown each iteration to take $O(n/\epsilon^{2})$ time. ∎

By Fact 1, Indyk median satisfies condition (2) in Theorem 8. But it does not satisfy condition (1).

We now justify the optimality of the ratio of $2+\epsilon$ in Theorem 8. Let $A$ be a randomized algorithm that always outputs a $(2-\epsilon)$ -approximate $1$ -median. Furthermore, denote by $p\in[n]$ (resp., $Q\subseteq[n]\times[n]$ ) the output (resp., the set of queries as unordered pairs) of $A^{d_{1}}(n)$ , where $d_{1}$ is the discrete metric (i.e., $d_{1}(x,y)=1$ and $d_{1}(x,x)=0$ for all distinct $x$ , $y\in[n]$ ). Without loss of generality, assume $(p,y)\in Q$ for all $y\in[n]\setminus\{p\}$ by adding dummy queries. So the queries in $Q$ witness that

\displaystyle\sum_{y\in[n]\setminus\{p\}}\,d_{1}\left(p,y\right)=n-1.

(8)

Assume without loss of generality that $A$ never queries for the distance from a point to itself.

In the sequel, consider the case that $|Q|<\epsilon\cdot(n-1)^{2}/8$ . By the averaging argument, there exists a point $\hat{p}\in[n]\setminus\{p\}$ involved in at most $2\cdot|Q|/(n-1)$ queries in $Q$ (note that each query involves two points). Because every function $f\colon[n]\times[n]\to[\,0,\infty\,)$ with

\left\{f\left(x,y\right)\mid\left(x,y\in[n]\right)\land\left(x\neq y\right)\right\}\subseteq\left\{\frac{1}{2},1\right\}

satisfies the triangle inequality, $A$ cannot exclude the possibility that $d_{1}(\hat{p},y)=1/2$ for all $y\in[n]\setminus\{\hat{p}\}$ satisfying $(\hat{p},y)\notin Q$ . In summary, $A$ cannot rule out the case that

\displaystyle\sum_{y\in[n]}\,d_{1}\left(\hat{p},y\right)\leq\frac{2\cdot|Q|}{n-1}\cdot 1+\left(n-1-\frac{2\cdot|Q|}{n-1}\right)\cdot\frac{1}{2}<\left(\frac{1}{2}+\frac{\epsilon}{8}\right)\cdot(n-1).

(9)

Equations (8)–(9) contradict the guarantee that $p$ is $(2-\epsilon)$ -approximate. Consequently, the case that $|Q|<\epsilon\cdot(n-1)^{2}/8$ should never happen. The next theorem summarizes the above.

Theorem 9.

Metric $1$ -median has no randomized algorithm that always outputs a $(2-\epsilon)$ -approximate solution and that makes fewer than $\epsilon\cdot(n-1)^{2}/8$ queries with a positive probability given oracle access to the discrete metric, for any constant $\epsilon\in(0,1)$ .
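The adversary argument behind Theorem 9 can be illustrated on a small instance. The sketch below, with hypothetical helper names and points indexed $0,\ldots,n-1$, builds the alternative metric that agrees with the discrete metric on every queried pair but assigns distance $1/2$ to the unqueried pairs involving $\hat{p}$, and checks by brute force that it is still a metric.

```python
from fractions import Fraction
from itertools import product


def perturbed_metric(n, queries, p_hat):
    """Discrete metric on 0..n-1, except that unqueried pairs touching p_hat get 1/2."""
    asked = {frozenset(q) for q in queries}

    def d(x, y):
        if x == y:
            return Fraction(0)
        if p_hat in (x, y) and frozenset((x, y)) not in asked:
            return Fraction(1, 2)
        return Fraction(1)

    return d


def is_metric(d, n):
    """Brute-force check of the three metric axioms on 0..n-1."""
    pts = range(n)
    axioms_ok = all((d(x, y) == 0) == (x == y) and d(x, y) == d(y, x)
                    for x, y in product(pts, pts))
    triangle_ok = all(d(x, y) + d(y, z) >= d(x, z)
                      for x, y, z in product(pts, pts, pts))
    return axioms_ok and triangle_ok
```

With $n=8$, output $p=0$, the seven queries $(0,y)$ and $\hat{p}=1$, the sums of distances are $7$ for $p$ and $4$ for $\hat{p}$. The ratio $7/4$ exceeds $2-\epsilon$ whenever $\epsilon>1/4$, so on this instance an algorithm that stops after only those queries cannot certify a $(2-\epsilon)$-approximation for such $\epsilon$.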

Lemmas 4 and 6 yield the following estimation of the average distance.

Theorem 10.

Given $n\in\mathbb{Z}^{+}$ , $\epsilon>0$ and oracle access to a metric $d\colon[n]\times[n]\to[\,0,\infty\,)$ , a real number in $[\,(1/2-\epsilon)\bar{r},\bar{r}\,]$ can be found in $O(n/\epsilon)$ time with probability $1/2+\Omega(1)$ .

Proof.

By Lemmas 4 and 6,

\displaystyle\frac{1}{n}\cdot\max_{j\in[80\cdot\lceil 1/\epsilon\rceil]}\,\sum_{i=1}^{\lfloor n/2\rfloor}\,d\left(\boldsymbol{\pi}_{j}\left(2i-1\right),\boldsymbol{\pi}_{j}\left(2i\right)\right)\in\left[\,\left(\frac{1}{2}-\frac{\epsilon}{8}\right)\bar{r},\bar{r}\,\right]

(10)

with probability $1/2+\Omega(1)$ . The Knuth shuffle picks $\boldsymbol{\pi}_{1}$ , $\boldsymbol{\pi}_{2}$ , $\ldots$ , $\boldsymbol{\pi}_{80\lceil 1/\epsilon\rceil}$ in $80\lceil 1/\epsilon\rceil\cdot O(n)$ time. Then the left-hand side of relation (10) can be calculated in $O(n/\epsilon)$ time. ∎
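The estimator in relation (10) is short enough to sketch directly. The code assumes points indexed $0,\ldots,n-1$ and a caller-supplied random generator; the function name is illustrative.

```python
import math
import random


def estimate_average_distance(d, n, eps, rng):
    """Left-hand side of relation (10): the largest matching sum over
    80 * ceil(1 / eps) random permutations, divided by n."""
    best = 0
    for _ in range(80 * math.ceil(1 / eps)):
        perm = list(range(n))
        rng.shuffle(perm)  # uniformly random permutation
        best = max(best, sum(d(perm[2 * i], perm[2 * i + 1]) for i in range(n // 2)))
    return best / n
```

By Lemma 4 the returned value never exceeds $\bar{r}$, which is the one-sided error; only the lower bound $(1/2-\epsilon/8)\bar{r}$ is probabilistic.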

Note that the estimation of the average distance in Theorem 10 has only one-sided error. The time complexity (resp., approximation ratio) in Theorem 10 is better (resp., worse) than that in Fact 2.

4 Estimating the average distance of a graph metric

Throughout this section, take any $\epsilon=\omega(1/n^{1/4})$ less than a small constant, e.g., $\epsilon=10^{-100}$ . Define

\begin{align*}
\delta &\equiv \frac{\epsilon^{2}}{10^{10}}, \tag{11}\\
r &\equiv \frac{1}{n}\cdot\sum_{x\in[n]}\,d\left(p^{*},x\right), \tag{12}
\end{align*}

where $p^{*}$ is as in equation (2). As $\epsilon=\omega(1/n^{1/4})$ , $\delta=\omega(1/\sqrt{n})$ by equation (11).

Lemma 11.

$\bar{r}\leq 2r$ .

Proof.

By equation (1) and the triangle inequality,

\begin{align*}
\bar{r} &\leq \frac{1}{n^{2}}\cdot\sum_{x,y\in[n]}\,\left(d\left(p^{*},x\right)+d\left(p^{*},y\right)\right) \tag{13}\\
&= \frac{1}{n^{2}}\cdot n\cdot\left(\sum_{x\in[n]}\,d\left(p^{*},x\right)+\sum_{y\in[n]}\,d\left(p^{*},y\right)\right)\\
&= \frac{2}{n}\cdot\sum_{x\in[n]}\,d\left(p^{*},x\right).
\end{align*}

Equations (12)–(13) complete the proof. ∎
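A quick exact check of Lemma 11 on two small examples may help fix the quantities $\bar{r}$ and $r$. The sketch assumes points indexed $0,\ldots,n-1$; `cycle_metric` is a hypothetical test case, not taken from the paper.

```python
from fractions import Fraction


def r_bar_and_r(d, n):
    """Return (r_bar, r): the average distance of equation (1) and the
    average distance from a 1-median of equation (12), as exact fractions."""
    sums = [sum(d(p, x) for x in range(n)) for p in range(n)]
    return Fraction(sum(sums), n * n), Fraction(min(sums), n)


def cycle_metric(n):
    """Shortest-path metric of the n-cycle on vertices 0..n-1."""
    return lambda x, y: min((x - y) % n, (y - x) % n)
```

On the $7$-cycle every vertex is a $1$-median and $\bar{r}=r=12/7$; on five points of the line, $\bar{r}=8/5$ and $r=6/5$. Both satisfy $\bar{r}\leq 2r$.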

1: Pick a uniformly random permutation $\boldsymbol{\pi}\colon[n]\to[n]$;
2: return $\sum_{i=1}^{\lfloor n/2\rfloor}\,d(\boldsymbol{\pi}(2i-1),\boldsymbol{\pi}(2i))\cdot 2/n$;

Figure 2: Algorithm average distance with oracle access to a metric $d\colon[n]\times[n]\to[\,0,\infty\,)$ and with inputs $n\in\mathbb{Z}^{+}$ and $\epsilon=\omega(1/n^{1/4})$.
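Fig. 2 translates almost line by line into code. The sketch below assumes points indexed $0,\ldots,n-1$ and a caller-supplied random generator; `cycle_metric` is a hypothetical graph metric used only as an example input.

```python
import random


def average_distance(d, n, rng):
    """Fig. 2: one uniformly random permutation, then (2 / n) times its matching sum."""
    perm = list(range(n))
    rng.shuffle(perm)                                                     # line 1
    matching = sum(d(perm[2 * i], perm[2 * i + 1]) for i in range(n // 2))
    return matching * 2 / n                                               # line 2


def cycle_metric(n):
    """Shortest-path metric of the n-cycle on vertices 0..n-1 (example input)."""
    return lambda x, y: min((x - y) % n, (y - x) % n)
```

The sketch makes $\lfloor n/2\rfloor$ distance queries regardless of $\epsilon$. By Lemma 4 its output never exceeds $2\bar{r}$; the two-sided $(1\pm\epsilon)$ guarantee is the asymptotic, probabilistic statement analyzed in the rest of this section.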

As in line 1 of average distance in Fig. 2, let $\boldsymbol{\pi}\colon[n]\to[n]$ be a uniformly random permutation. Clearly,

\begin{align*}
&\mathop{\mathrm{E}}_{\boldsymbol{\pi}}\left[\,\left(\sum_{i=1}^{\lfloor n/2\rfloor}\,d\left(\boldsymbol{\pi}\left(2i-1\right),\boldsymbol{\pi}\left(2i\right)\right)\right)^{2}\,\right]\\
={}&\mathop{\mathrm{E}}_{\boldsymbol{\pi}}\left[\,\sum_{i=1}^{\lfloor n/2\rfloor}\,d\left(\boldsymbol{\pi}\left(2i-1\right),\boldsymbol{\pi}\left(2i\right)\right)\cdot\sum_{j=1}^{\lfloor n/2\rfloor}\,d\left(\boldsymbol{\pi}\left(2j-1\right),\boldsymbol{\pi}\left(2j\right)\right)\,\right]\\
={}&\sum_{\text{distinct $i,j=1$}}^{\lfloor n/2\rfloor}\,\mathop{\mathrm{E}}_{\boldsymbol{\pi}}\left[\,d\left(\boldsymbol{\pi}\left(2i-1\right),\boldsymbol{\pi}\left(2i\right)\right)\cdot d\left(\boldsymbol{\pi}\left(2j-1\right),\boldsymbol{\pi}\left(2j\right)\right)\,\right] \tag{14}\\
&+\sum_{i=1}^{\lfloor n/2\rfloor}\,\mathop{\mathrm{E}}_{\boldsymbol{\pi}}\left[\,d^{2}\left(\boldsymbol{\pi}\left(2i-1\right),\boldsymbol{\pi}\left(2i\right)\right)\,\right], \tag{15}
\end{align*}

where the last equality follows from the linearity of expectation and the separation of pairs $(i,j)$ according to whether $i=j$ . The next three lemmas analyze the variance of

\sum_{i=1}^{\lfloor n/2\rfloor}\,d\left(\boldsymbol{\pi}\left(2i-1\right),\boldsymbol{\pi}\left(2i\right)\right).

Lemma 12.

\displaystyle\sum_{\text{\rm distinct $i,j=1$}}^{\lfloor n/2\rfloor}\,\mathop{\mathrm{E}}_{\boldsymbol{\pi}}\left[\,d\left(\boldsymbol{\pi}\left(2i-1\right),\boldsymbol{\pi}\left(2i\right)\right)\cdot d\left(\boldsymbol{\pi}\left(2j-1\right),\boldsymbol{\pi}\left(2j\right)\right)\,\right]\leq\frac{1}{4}\cdot\left(1+O\left(\frac{1}{n}\right)\right)n^{2}\bar{r}^{2}.

Proof.

Pick any distinct $i$ , $j\in[\,\lfloor n/2\rfloor\,]$ . Clearly,

\left\{\boldsymbol{\pi}\left(2i-1\right),\boldsymbol{\pi}\left(2i\right),\boldsymbol{\pi}\left(2j-1\right),\boldsymbol{\pi}\left(2j\right)\right\}

is a uniformly random size- $4$ subset of $[n]$ . So

\begin{align*}
&\mathop{\mathrm{E}}_{\boldsymbol{\pi}}\left[\,d\left(\boldsymbol{\pi}(2i-1),\boldsymbol{\pi}(2i)\right)\cdot d\left(\boldsymbol{\pi}(2j-1),\boldsymbol{\pi}(2j)\right)\,\right]\\
={}&\frac{1}{n\cdot(n-1)\cdot(n-2)\cdot(n-3)}\cdot\sum_{\text{distinct $u$, $v$, $x$, $y\in[n]$}}\,d\left(u,v\right)\cdot d\left(x,y\right).
\end{align*}

Clearly,

\begin{align*}
\sum_{\text{distinct $u$, $v$, $x$, $y\in[n]$}}\,d\left(u,v\right)\cdot d\left(x,y\right) &\leq \sum_{u,v,x,y\in[n]}\,d\left(u,v\right)\cdot d\left(x,y\right)\\
&= \sum_{u,v\in[n]}\,d\left(u,v\right)\cdot\sum_{x,y\in[n]}\,d\left(x,y\right)\\
&= \left(\sum_{x,y\in[n]}\,d\left(x,y\right)\right)^{2}.
\end{align*}

In summary,

\begin{align*}
&\sum_{\text{\rm distinct $i,j=1$}}^{\lfloor n/2\rfloor}\,\mathop{\mathrm{E}}_{\boldsymbol{\pi}}\left[\,d\left(\boldsymbol{\pi}\left(2i-1\right),\boldsymbol{\pi}\left(2i\right)\right)\cdot d\left(\boldsymbol{\pi}\left(2j-1\right),\boldsymbol{\pi}\left(2j\right)\right)\,\right]\\
\leq{}&\left\lfloor\frac{n}{2}\right\rfloor\left(\left\lfloor\frac{n}{2}\right\rfloor-1\right)\cdot\frac{1}{n\cdot(n-1)\cdot(n-2)\cdot(n-3)}\cdot\left(\sum_{x,y\in[n]}\,d\left(x,y\right)\right)^{2}.
\end{align*}

This and equation (1) complete the proof. ∎

Define

\Delta\equiv\max_{x,y\in[n]}\,d(x,y)

to be the diameter of $([n],d)$ .

Lemma 13.

If

\displaystyle\delta nr\geq\Delta

(16)

then

\displaystyle\sum_{i=1}^{\lfloor n/2\rfloor}\,\mathop{\mathrm{E}}_{\boldsymbol{\pi}}\left[\,d^{2}\left(\boldsymbol{\pi}\left(2i-1\right),\boldsymbol{\pi}\left(2i\right)\right)\,\right]\leq\left(\frac{1}{2}+O\left(\frac{1}{n}\right)\right)\left(\delta n^{2}r\bar{r}+\delta^{2}nr^{2}\right).

(17)

Proof.

Clearly, $\{\boldsymbol{\pi}(2i-1),\boldsymbol{\pi}(2i)\}$ is a uniformly random size- $2$ subset of $[n]$ for each $i\in[\,\lfloor n/2\rfloor\,]$ . Therefore,

\begin{align*}
\sum_{i=1}^{\lfloor n/2\rfloor}\,\mathop{\mathrm{E}}_{\boldsymbol{\pi}}\left[\,d^{2}\left(\boldsymbol{\pi}\left(2i-1\right),\boldsymbol{\pi}\left(2i\right)\right)\,\right] &= \sum_{i=1}^{\lfloor n/2\rfloor}\,\frac{1}{n\cdot(n-1)}\cdot\sum_{\text{distinct $x$, $y\in[n]$}}\,d^{2}\left(x,y\right) \tag{18}\\
&\leq \sum_{i=1}^{\lfloor n/2\rfloor}\,\frac{1}{n\cdot(n-1)}\cdot\sum_{x,y\in[n]}\,d^{2}\left(x,y\right)\\
&= \left\lfloor\frac{n}{2}\right\rfloor\cdot\frac{1}{n\cdot(n-1)}\cdot\sum_{x,y\in[n]}\,d^{2}\left(x,y\right).
\end{align*}

By inequality (16),

\displaystyle d\left(x,y\right)\leq\delta nr

(19)

for all $x$, $y\in[n]$.

By equations (1) and (18)–(19), the left-hand side of inequality (17) cannot exceed the optimal value of the following problem, called max square sum:

Find $d_{x,y}\in\mathbb{R}$ for all $x$ , $y\in[n]$ to maximize
$\displaystyle\left\lfloor\frac{n}{2}\right\rfloor\cdot\frac{1}{n\cdot(n-1)}\cdot\sum_{x,y\in[n]}\,d_{x,y}^{2}$ (20)
subject to
$\displaystyle\frac{1}{n^{2}}\cdot\sum_{x,y\in[n]}\,d_{x,y}=\bar{r},$ (21)
$\displaystyle\forall x,y\in[n],\,\,0\leq d_{x,y}\leq\delta nr.$ (22)

Above, constraint (21) (resp., (22)) mimics equation (1) (resp., inequality (19) and the non-negativeness of distances). Appendix A bounds the optimal value of max square sum from above by

\displaystyle\left\lfloor\frac{n}{2}\right\rfloor\frac{1}{n\cdot(n-1)}\cdot\left(\left\lfloor\frac{n\bar{r}}{\delta r}\right\rfloor+1\right)\cdot\left(\delta nr\right)^{2}.

This evaluates to be at most

\left(\frac{1}{2}+O\left(\frac{1}{n}\right)\right)\left(\delta n^{2}r\bar{r}+\delta^{2}nr^{2}\right).

∎

Recall that the variance of any random variable $X$ equals $\mathop{\mathrm{E}}[X^{2}]-(\mathop{\mathrm{E}}[X])^{2}$ .

Lemma 14.

If $\delta nr\geq\Delta$ , then

\mathop{\mathrm{var}}_{\boldsymbol{\pi}}\left(\sum_{i=1}^{\lfloor n/2\rfloor}\,d\left(\boldsymbol{\pi}\left(2i-1\right),\boldsymbol{\pi}\left(2i\right)\right)\right)\leq\left(1+o(1)\right)\delta n^{2}r^{2}.

Proof.

By equations (14)–(15) and Lemmas 12–13,

\begin{align*}
&\mathop{\mathrm{E}}_{\boldsymbol{\pi}}\left[\,\left(\sum_{i=1}^{\lfloor n/2\rfloor}\,d\left(\boldsymbol{\pi}\left(2i-1\right),\boldsymbol{\pi}\left(2i\right)\right)\right)^{2}\,\right]\\
\leq{}&\frac{1}{4}\cdot\left(1+O\left(\frac{1}{n}\right)\right)n^{2}\bar{r}^{2}+\left(\frac{1}{2}+O\left(\frac{1}{n}\right)\right)\left(\delta n^{2}r\bar{r}+\delta^{2}nr^{2}\right).
\end{align*}

This and Lemma 5 imply

\begin{align*}
&\mathop{\mathrm{var}}_{\boldsymbol{\pi}}\left(\sum_{i=1}^{\lfloor n/2\rfloor}\,d\left(\boldsymbol{\pi}\left(2i-1\right),\boldsymbol{\pi}\left(2i\right)\right)\right)\\
\leq{}&O\left(\frac{1}{n}\right)\cdot n^{2}\bar{r}^{2}+\left(\frac{1}{2}+O\left(\frac{1}{n}\right)\right)\left(\delta n^{2}r\bar{r}+\delta^{2}nr^{2}\right).
\end{align*}

Finally, invoke Lemma 11 and recall that $\delta=\omega(1/\sqrt{n})$ . ∎

Lemma 15.

If $\delta nr\geq\Delta$ , then

\displaystyle\Pr_{\boldsymbol{\pi}}\left[\,\left|\,\left(\sum_{i=1}^{\lfloor n/2\rfloor}d\left(\boldsymbol{\pi}\left(2i-1\right),\boldsymbol{\pi}\left(2i\right)\right)\right)-\frac{1}{2}\cdot\left(1\pm O\left(\frac{1}{n}\right)\right)n\bar{r}\,\right|\geq k\sqrt{\left(1+o(1)\right)\delta}\,nr\,\right]\leq\frac{1}{k^{2}}

for all $k>1$ .

Proof.

Use Chebyshev’s inequality and Lemmas 5 and 14. ∎

Lemma 16.

If $\delta nr\geq\Delta$ , then

\Pr_{\boldsymbol{\pi}}\left[\,\left|\,\left(\sum_{i=1}^{\lfloor n/2\rfloor}\,d\left(\boldsymbol{\pi}\left(2i-1\right),\boldsymbol{\pi}\left(2i\right)\right)\right)-\frac{1}{2}\cdot\left(1\pm O\left(\frac{1}{n}\right)\right)n\bar{r}\,\right|\geq k\sqrt{\left(1+o(1)\right)\delta}\,n\bar{r}\,\right]\leq\frac{1}{k^{2}}

for all $k>1$ .

Proof.

By inequalities (3) and (12),

r\leq\bar{r}.

This and Lemma 15 complete the proof. ∎

We now arrive at an efficient estimation of the average distance on a graph.

Theorem 17.

Given $n\in\mathbb{Z}^{+}$ , $\epsilon=\omega(1/n^{1/4})$ and oracle access to a graph metric $d\colon[n]\times[n]\to\mathbb{N}$ , a real number in $[\,(1-\epsilon)\bar{r},(1+\epsilon)\bar{r}\,]$ can be found in $O(n)$ time with probability $1/2+\Omega(1)$ .

Proof.

Let $G=([n],E)$ be an undirected unweighted graph inducing the distance function $d$ . Then pick $x$ , $y\in[n]$ with $d(x,y)=\Delta$ , i.e., $(x,y)$ is a furthest pair of vertices of $G$ . Find a simple shortest $x$ - $y$ path, denoted $(v_{0}=x,v_{1},\ldots,v_{\Delta}=y)$ , in $G$ . By equation (12),

\displaystyle r\geq\frac{1}{n}\cdot\sum_{i=0}^{\Delta}\,d\left(p^{*},v_{i}\right).

(23)

Now,

\displaystyle\sum_{i=0}^{\Delta}\,d\left(p^{*},v_{i}\right)=\frac{1}{2}\cdot\sum_{i=0}^{\Delta}\,\left(d\left(p^{*},v_{i}\right)+d\left(p^{*},v_{\Delta-i}\right)\right)\geq\frac{1}{2}\cdot\sum_{i=0}^{\Delta}\,d\left(v_{i},v_{\Delta-i}\right)=\frac{1}{2}\cdot\sum_{i=0}^{\Delta}\,\left|\,\Delta-2i\,\right|\geq\frac{\Delta^{2}}{4},

(24)

where the first inequality (resp., the second equality) follows from the triangle inequality (resp., $(v_{0},v_{1},\ldots,v_{\Delta})$ being a shortest $v_{0}$-$v_{\Delta}$ path). (Footnote 4: It is easy to verify that $\sum_{i=0}^{\Delta}\,|\,\Delta-2i\,|=(\Delta+2)\Delta/2$ if $\Delta\equiv 0\pmod{2}$ and $\sum_{i=0}^{\Delta}\,|\,\Delta-2i\,|=(\Delta+1)^{2}/2$ otherwise.) By inequalities (23)–(24),

\displaystyle nr\geq\frac{\Delta^{2}}{4}.

(25)
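The closed forms quoted in footnote 4, and the resulting bound $\frac{1}{2}\sum_{i=0}^{\Delta}|\Delta-2i|\geq\Delta^{2}/4$ used in inequality (24), can be checked mechanically for small diameters. The function names below are illustrative.

```python
def path_sum(delta):
    """sum_{i=0}^{delta} |delta - 2i|, the quantity in footnote 4."""
    return sum(abs(delta - 2 * i) for i in range(delta + 1))


def closed_form(delta):
    """(delta + 2) * delta / 2 for even delta, (delta + 1)^2 / 2 for odd delta."""
    if delta % 2 == 0:
        return (delta + 2) * delta // 2
    return (delta + 1) ** 2 // 2
```

Both divisions are exact because the numerator is even in each case, so integer arithmetic suffices.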

Because $d$ is a graph metric, $d(x,y)\geq 1$ for all distinct $x$ , $y\in[n]$ . So by equation (12),

\displaystyle r\geq\frac{1}{n}\cdot\sum_{x\in[n]\setminus\{p^{*}\}}\,1\geq\frac{1}{2}

(26)

for all $n\geq 2$ .

By inequalities (25)–(26),

\delta nr\geq\delta\cdot\max\left\{\frac{\Delta^{2}}{4},\frac{n}{2}\right\}.

So

\displaystyle\delta nr\geq\Delta

(27)

for all sufficiently large $n$. (Footnote 5: If $\Delta\geq 4/\delta$, then $\delta\Delta^{2}/4\geq\Delta$. Otherwise, $\delta n/2\geq\Delta$ for all $n>8/\delta^{2}$. Finally, recall that $\delta=\omega(1/\sqrt{n})$.) By equation (11),

\displaystyle 3\sqrt{\left(1+o(1)\right)\delta}\leq 0.1\,\epsilon

(28)

for all sufficiently large $n$ . By inequalities (27)–(28), Lemma 16 with $k=3$ and recalling that $\epsilon=\omega(1/n^{1/4})$ ,

\displaystyle\Pr_{\boldsymbol{\pi}}\left[\,\sum_{i=1}^{\lfloor n/2\rfloor}\,d\left(\boldsymbol{\pi}(2i-1),\boldsymbol{\pi}(2i)\right)\in\left[\,\left(\frac{1}{2}-\frac{\epsilon}{2}\right)n\bar{r},\left(\frac{1}{2}+\frac{\epsilon}{2}\right)n\bar{r}\,\right]\,\right]\geq 1-\frac{1}{9}

(29)

for all sufficiently large $n$ . Consequently, the output of line 2 of average distance in Fig. 2 is in $[\,(1-\epsilon)\bar{r},(1+\epsilon)\bar{r}\,]$ with probability $1/2+\Omega(1)$ . Line 1 takes $O(n)$ time by the Knuth shuffle. Clearly, line 2 also takes $O(n)$ time. ∎

The time complexity of $O(n)$ in Theorem 17 is independent of $\epsilon$ . But for general metrics, we do not know whether the time complexity of $O(n/\epsilon^{2})$ in Fact 2 can be improved to $O(n/\epsilon^{2-\Omega(1)})$ .

Appendix A Analyzing max square sum

Max square sum has an optimal solution, denoted $\{\tilde{d}_{x,y}\in\mathbb{R}\}_{x,y\in[n]}$, because objective (20) is continuous and the feasible solutions (i.e., those satisfying constraints (21)–(22)) form a closed and bounded subset of $\mathbb{R}^{(n^{2})}$. (Recall from elementary mathematical analysis that a continuous real-valued function on a closed and bounded subset of $\mathbb{R}^{k}$ has a maximum value, where $k<\infty$.) Note that $\{\tilde{d}_{x,y}\in\mathbb{R}\}_{x,y\in[n]}$ must be feasible for max square sum. Below is a consequence of constraint (21).

Lemma A.1.

\displaystyle\left|\left\{\left(x,y\right)\in[n]\times[n]\mid\tilde{d}_{x,y}=\delta nr\right\}\right|\leq\left\lfloor\frac{n\bar{r}}{\delta r}\right\rfloor.

(30)

Proof.

Clearly,

n^{2}\bar{r}\stackrel{\text{(21)}}{=}\sum_{x,y\in[n]}\,\tilde{d}_{x,y}\geq\left|\left\{\left(x,y\right)\in[n]\times[n]\mid\tilde{d}_{x,y}=\delta nr\right\}\right|\cdot\delta nr.

Furthermore, the left-hand side of inequality (30) is an integer. ∎

Lemma A.2.

\left|\left\{\left(x,y\right)\in[n]\times[n]\mid\tilde{d}_{x,y}>0\right\}\right|\leq\left\lfloor\frac{n\bar{r}}{\delta r}\right\rfloor+1.

Proof.

Assume otherwise. Then

\begin{aligned}
&\left|\left\{\left(x,y\right)\in[n]\times[n]\mid\left(\tilde{d}_{x,y}>0\right)\land\left(\tilde{d}_{x,y}\neq\delta nr\right)\right\}\right|\\
&\quad\geq\left|\left\{\left(x,y\right)\in[n]\times[n]\mid\tilde{d}_{x,y}>0\right\}\right|-\left|\left\{\left(x,y\right)\in[n]\times[n]\mid\tilde{d}_{x,y}=\delta nr\right\}\right|\\
&\quad\geq\left\lfloor\frac{n\bar{r}}{\delta r}\right\rfloor+2-\left|\left\{\left(x,y\right)\in[n]\times[n]\mid\tilde{d}_{x,y}=\delta nr\right\}\right|\\
&\quad\stackrel{\text{Lemma A.1}}{\geq}2.
\end{aligned}

So by constraint (22) (and the feasibility of $\{\tilde{d}_{x,y}\}_{x,y\in[n]}$ to max square sum),

\left|\left\{\left(x,y\right)\in[n]\times[n]\mid 0<\tilde{d}_{x,y}<\delta nr\right\}\right|\geq 2.

Consequently, there exist distinct $(x^{\prime},y^{\prime})$, $(x^{\prime\prime},y^{\prime\prime})\in[n]\times[n]$ satisfying

\displaystyle 0<\tilde{d}_{x^{\prime},y^{\prime}},\,\tilde{d}_{x^{\prime\prime},y^{\prime\prime}}<\delta nr.

(31)

By symmetry, assume $\tilde{d}_{x^{\prime},y^{\prime}}\geq\tilde{d}_{x^{\prime\prime},y^{\prime\prime}}$. By inequality (31), there exists a small real number $\beta>0$ such that increasing $\tilde{d}_{x^{\prime},y^{\prime}}$ by $\beta$ and simultaneously decreasing $\tilde{d}_{x^{\prime\prime},y^{\prime\prime}}$ by $\beta$ preserves constraints (21)–(22). That is, the solution $\{\hat{d}_{x,y}\in\mathbb{R}\}_{x,y\in[n]}$ defined below is feasible for max square sum:

\displaystyle\hat{d}_{x,y}=\left\{\begin{array}[]{ll}\tilde{d}_{x^{\prime},y^{\prime}}+\beta,&\text{if $(x,y)=(x^{\prime},y^{\prime})$},\\ \tilde{d}_{x^{\prime\prime},y^{\prime\prime}}-\beta,&\text{if $(x,y)=(x^{\prime\prime},y^{\prime\prime})$},\\ \tilde{d}_{x,y},&\text{otherwise}.\end{array}\right.

(35)

Clearly, objective (20) with respect to $\{\hat{d}_{x,y}\}_{x,y\in[n]}$ exceeds that with respect to $\{\tilde{d}_{x,y}\}_{x,y\in[n]}$ by

\begin{aligned}
&\left\lfloor\frac{n}{2}\right\rfloor\cdot\frac{1}{n\cdot(n-1)}\cdot\sum_{x,y\in[n]}\,\left({\hat{d}}^{2}_{x,y}-{\tilde{d}}^{2}_{x,y}\right)\\
&\quad\stackrel{\text{(35)}}{=}\left\lfloor\frac{n}{2}\right\rfloor\cdot\frac{1}{n\cdot(n-1)}\cdot\left(\left(\tilde{d}_{x^{\prime},y^{\prime}}+\beta\right)^{2}+\left(\tilde{d}_{x^{\prime\prime},y^{\prime\prime}}-\beta\right)^{2}-{\tilde{d}}^{2}_{x^{\prime},y^{\prime}}-{\tilde{d}}^{2}_{x^{\prime\prime},y^{\prime\prime}}\right)\\
&\quad=\left\lfloor\frac{n}{2}\right\rfloor\cdot\frac{1}{n\cdot(n-1)}\cdot\left(2\beta\tilde{d}_{x^{\prime},y^{\prime}}-2\beta\tilde{d}_{x^{\prime\prime},y^{\prime\prime}}+2\beta^{2}\right)\\
&\quad>0,
\end{aligned}

where the inequality holds because $\tilde{d}_{x^{\prime},y^{\prime}}\geq\tilde{d}_{x^{\prime\prime},y^{\prime\prime}}$ and $\beta>0$.

In summary, $\{\hat{d}_{x,y}\}_{x,y\in[n]}$ is a feasible solution to max square sum achieving a greater objective (20) than the optimal solution $\{\tilde{d}_{x,y}\}_{x,y\in[n]}$ does, a contradiction. ∎
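The exchange step in this proof can be illustrated numerically. The Python sketch below uses arbitrary example values and hypothetical names (`shift_mass`, `sum_of_squares`, `cap`); it also omits the positive factor $\lfloor n/2\rfloor/(n\cdot(n-1))$ of objective (20), which does not affect the sign of the gain.

```python
def shift_mass(values, larger, smaller, beta):
    """Move beta from values[smaller] to values[larger]; the total is unchanged."""
    shifted = list(values)
    shifted[larger] += beta
    shifted[smaller] -= beta
    return shifted


def sum_of_squares(values):
    """Sum of squared entries (objective (20) up to a positive constant factor)."""
    return sum(v * v for v in values)


cap = 4.0                   # plays the role of delta * n * r in constraint (22)
before = [3.0, 1.0, 0.0]    # two entries lie strictly between 0 and cap
beta = 0.5                  # small enough to keep both entries inside [0, cap]

after = shift_mass(before, larger=0, smaller=1, beta=beta)
gain = sum_of_squares(after) - sum_of_squares(before)

# Gain predicted by the proof: 2*beta*d' - 2*beta*d'' + 2*beta^2.
predicted = 2 * beta * before[0] - 2 * beta * before[1] + 2 * beta ** 2
```

Here the shifted solution keeps the same sum and stays within $[\,0,\text{cap}\,]$, yet its sum of squares is strictly larger, matching the displayed computation.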

We now bound the optimal value of max square sum.

Theorem A.3.

The optimal value of max square sum is at most

\left\lfloor\frac{n}{2}\right\rfloor\cdot\frac{1}{n\cdot(n-1)}\cdot\left(\left\lfloor\frac{n\bar{r}}{\delta r}\right\rfloor+1\right)\cdot\left(\delta nr\right)^{2}.

Proof.

With respect to the optimal (and thus feasible) solution $\{\tilde{d}_{x,y}\}_{x,y\in[n]}$, objective (20) equals

\begin{aligned}
&\left\lfloor\frac{n}{2}\right\rfloor\cdot\frac{1}{n\cdot(n-1)}\cdot\sum_{x,y\in[n]}\,\chi\left[\tilde{d}_{x,y}\neq 0\right]\cdot{\tilde{d}}^{2}_{x,y}\\
&\quad\stackrel{\text{(22)}}{\leq}\left\lfloor\frac{n}{2}\right\rfloor\cdot\frac{1}{n\cdot(n-1)}\cdot\sum_{x,y\in[n]}\,\chi\left[\tilde{d}_{x,y}>0\right]\cdot\left(\delta nr\right)^{2},
\end{aligned}

where $\chi[P]=1$ if $P$ is true and $\chi[P]=0$ otherwise, for any predicate $P$. The bound now follows from Lemma A.2. ∎
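As a numerical illustration of Theorem A.3, the Python sketch below builds one feasible solution with the structure suggested by Lemmas A.1 and A.2 (as many entries as possible at the cap $\delta nr$, at most one entry strictly between $0$ and the cap, zeros elsewhere) and compares its objective with the bound. The function names and the parameter values are hypothetical and chosen only for illustration; the sketch does not claim to solve max square sum in general.

```python
import math


def theorem_a3_bound(n, r_bar, r, delta):
    """Upper bound of Theorem A.3 on the optimal value of max square sum."""
    cap = delta * n * r
    count = math.floor(n * r_bar / (delta * r)) + 1
    return (n // 2) / (n * (n - 1)) * count * cap ** 2


def extremal_solution(n, r_bar, r, delta):
    """Feasible values: entries at the cap, one remainder, zeros elsewhere."""
    cap = delta * n * r                 # upper limit from constraint (22)
    total = n * n * r_bar               # constraint (21): entries sum to n^2 * r_bar
    full = math.floor(total / cap)      # number of entries set to the cap
    remainder = total - full * cap
    values = [cap] * full + ([remainder] if remainder > 0 else [])
    if len(values) > n * n:
        raise ValueError("constraints (21)-(22) are infeasible for these parameters")
    return values + [0.0] * (n * n - len(values))


def objective(n, values):
    """Objective (20): floor(n/2) / (n(n-1)) times the sum of squared entries."""
    return (n // 2) / (n * (n - 1)) * sum(v * v for v in values)


# Arbitrary example parameters (assumptions, not taken from the paper).
n, r_bar, r, delta = 6, 2.0, 1.25, 0.5
values = extremal_solution(n, r_bar, r, delta)
achieved = objective(n, values)
bound = theorem_a3_bound(n, r_bar, r, delta)
```

With these parameters, $19$ entries sit at the cap $3.75$ and one entry equals $0.75$, so $20=\lfloor n\bar{r}/(\delta r)\rfloor+1$ entries are positive and the achieved objective stays below the bound.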

References

[1] K. Barhum, O. Goldreich, and A. Shraibman. On approximating the average distance between points. In Proceedings of the 10th International Workshop on Approximation and the 11th International Workshop on Randomization, and Combinatorial Optimization, pages 296–310, 2007.
[2] C.-L. Chang. Deterministic sublinear-time approximations for metric $1$ -median selection. Information Processing Letters, 113(8):288–292, 2013.
[3] C.-L. Chang. A deterministic sublinear-time nonadaptive algorithm for metric $1$ -median selection. Theoretical Computer Science, 602:149–157, 2015.
[4] C.-L. Chang. Metric $1$ -median selection: Query complexity vs. approximation ratio. In Proceedings of the 22nd International Computing and Combinatorics Conference, pages 131–142, Ho Chi Minh City, Vietnam, 2016. Full version at https://arxiv.org/abs/1509.05662.
[5] C.-L. Chang. A lower bound for metric $1$ -median selection. Journal of Computer and System Sciences, 84:44–51, 2017.
[6] D. Eppstein and J. Wang. Fast approximation of centrality. Journal of Graph Algorithms and Applications, 8(1):39–45, 2004.
[7] O. Goldreich and D. Ron. Approximating average parameters of graphs. Random Structures & Algorithms, 32(4):473–493, 2008.
[8] S. Guha, A. Meyerson, N. Mishra, R. Motwani, and L. O’Callaghan. Clustering data streams: Theory and practice. IEEE Transactions on Knowledge and Data Engineering, 15(3):515–528, 2003.
[9] P. Indyk. Sublinear time algorithms for metric space problems. In Proceedings of the 31st Annual ACM Symposium on Theory of Computing, pages 428–434, 1999.
[10] P. Indyk. High-dimensional computational geometry. PhD thesis, Stanford University, 2000.
[11] A. Kumar, Y. Sabharwal, and S. Sen. Linear-time approximation schemes for clustering problems in any dimensions. Journal of the ACM, 57(2):5, 2010.
[12] R. R. Mettu and C. G. Plaxton. Optimal time bounds for approximate clustering. Machine Learning, 56(1–3):35–60, 2004.
[13] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, Cambridge, UK, 1995.
[14] W. Rudin. Principles of Mathematical Analysis. McGraw-Hill, 3rd edition, 1976.
[15] S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, 1994.
[16] B.-Y. Wu. On approximating metric $1$ -median in sublinear time. Information Processing Letters, 114(4):163–166, 2014.

\displaystyle\frac{1}{n^{2}}\cdot\sum_{x,y\in[n]}\,d_{x,y}=\bar{r},

(21)

\displaystyle\forall x,y\in[n],\,\,0\leq d_{x,y}\leq\delta nr.

(22)
