A lower bound for metric $1$-median selection [Footnote 1: A preliminary version of this paper appears in Proceedings of the 30th Workshop on Combinatorial Mathematics and Computation Theory, Hualien, Taiwan, April 2013, pp. 65–68.]

Ching-Lueh Chang [Footnote 2: Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, Taiwan. Email: clchang@saturn.yzu.edu.tw] [Footnote 3: Supported in part by the National Science Council of Taiwan under grant 101-2221-E-155-015-MY2.]

Abstract

Consider the problem of finding a point in an $n$-point metric space with the minimum average distance to all points. We show that this problem has no deterministic $o(n^{2})$-query $(4-\Omega(1))$-approximation algorithms.

1 Introduction

Given oracle access to a metric space $(\{1,2,\ldots,n\},d)$, the metric $1$-median problem asks for a point with the minimum average distance to all points. Indyk [8, 9] shows that metric $1$-median has a Monte-Carlo $O(n/\epsilon^{2})$-time $(1+\epsilon)$-approximation algorithm with an $\Omega(1)$ probability of success. The more general metric $k$-median problem asks for $x_{1}$, $x_{2}$, $\ldots$, $x_{k}\in\{1,2,\ldots,n\}$ minimizing $\sum_{x\in\{1,2,\ldots,n\}}\,\min_{i=1}^{k}\,d(x_{i},x)$. Randomized as well as evasive algorithms are well-studied for metric $k$-median and the related $k$-means problem [7, 12, 1, 4, 11, 10], where $k\geq 1$ is part of the input rather than a constant.

This paper focuses on deterministic sublinear-query algorithms for metric $1$-median. Guha et al. [7, Sec. 3.1–3.2] prove that metric $k$-median has a deterministic $O(n^{1+\epsilon})$-time $O(n^{\epsilon})$-space $2^{O(1/\epsilon)}$-approximation algorithm that reads distances in a single pass, where $\epsilon>0$. Chang [3] presents a deterministic nonadaptive $O(n^{1.5})$-time $4$-approximation algorithm for metric $1$-median. Wu [14] generalizes Chang's result by showing an $O(n^{1+1/h})$-time $2h$-approximation algorithm for any integer $h\geq 2$. On the negative side, Chang [2] shows that metric $1$-median has no deterministic $o(n^{2})$-query $(3-\epsilon)$-approximation algorithms for any constant $\epsilon>0$. This paper improves upon his result by showing that metric $1$-median has no deterministic $o(n^{2})$-query $(4-\epsilon)$-approximation algorithms for any constant $\epsilon>0$.

In social network analysis, the importance of an actor in a network may be quantified by several centrality measures, among which the closeness centrality of an actor is defined to be its average distance to other actors [13]. So metric $1$-median can be interpreted as the problem of finding the most important point in a metric space. Goldreich and Ron [6] and Eppstein and Wang [5] present randomized algorithms for approximating the closeness centralities of vertices in undirected graphs.

2 Definitions

For $n\in\mathbb{N}$, denote $[n]\equiv\{1,2,\ldots,n\}$. Trivially, $[0]=\emptyset$. An $n$-point metric space $([n],d)$ is the set $[n]$, called the groundset, endowed with a function $d\colon[n]\times[n]\to\mathbb{R}$ satisfying

  (1) $d(x,y)\geq 0$ (non-negativeness),

  (2) $d(x,y)=0$ if and only if $x=y$ (identity of indiscernibles),

  (3) $d(x,y)=d(y,x)$ (symmetry), and

  (4) $d(x,y)+d(x,z)\geq d(y,z)$ (triangle inequality)

for all $x$, $y$, $z\in[n]$. An equivalent definition requires the triangle inequality only for distinct $x$, $y$, $z\in[n]$, with axioms (1)–(3) unchanged.
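The four axioms can be checked mechanically on a small groundset. The following is a minimal sketch, not part of the paper: the function name `is_metric` and the representation of $d$ as a Python callable are assumptions made for illustration. It uses the remark above that the triangle inequality needs checking only on distinct triples.

```python
from itertools import permutations

def is_metric(d, n):
    """Return True if d satisfies axioms (1)-(4) on the groundset {1, ..., n}.

    `d` is any callable taking two points and returning a number
    (an illustrative representation, not the paper's oracle model).
    """
    points = range(1, n + 1)
    for x in points:
        for y in points:
            if d(x, y) < 0:                   # (1) non-negativeness
                return False
            if (d(x, y) == 0) != (x == y):    # (2) identity of indiscernibles
                return False
            if d(x, y) != d(y, x):            # (3) symmetry
                return False
    # (4) triangle inequality, checked on distinct triples only
    return all(d(x, y) + d(x, z) >= d(y, z)
               for x, y, z in permutations(points, 3))
```

For instance, `is_metric(lambda x, y: abs(x - y), 5)` accepts five points on a line, while a table with $d(1,2)=d(2,3)=1$ and $d(1,3)=5$ is rejected by the triangle check.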

An algorithm with oracle access to a metric space $([n],d)$ is given $n$ and may query $d$ on any $(x,y)\in[n]\times[n]$ to obtain $d(x,y)$. Without loss of generality, we forbid queries for $d(x,x)$, which trivially return $0$, as well as repeated queries, where a query for $d(x,y)$ is considered to repeat that for $d(y,x)$. For convenience, denote an algorithm ALG with oracle access to $([n],d)$ by $\text{ALG}^{d}$.

Given oracle access to a finite metric space $([n],d)$, the metric $1$-median problem asks for a point in $[n]$ with the minimum average distance to all points. An algorithm for this problem is $\alpha$-approximate if it outputs a point $x\in[n]$ satisfying

$$\sum_{y\in[n]}\,d\left(x,y\right)\leq\alpha\,\min_{x^{\prime}\in[n]}\,\sum_{y\in[n]}\,d\left(x^{\prime},y\right),$$

where $\alpha\geq 1$.
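As a point of comparison for the definition, the sketch below computes the exact optimum by brute force and reports the smallest $\alpha$ for which a given point is $\alpha$-approximate. The function names and the line-metric example are assumptions for illustration only. Note that this baseline reads all $\Theta(n^{2})$ distances, which is exactly the query budget the lower bound in this paper is measured against.

```python
def total_distance(d, n, x):
    """Sum of distances from x to every point of {1, ..., n}."""
    return sum(d(x, y) for y in range(1, n + 1))

def approximation_ratio(d, n, x):
    """Smallest alpha for which x is an alpha-approximate 1-median (needs n >= 2)."""
    optimum = min(total_distance(d, n, c) for c in range(1, n + 1))
    return total_distance(d, n, x) / optimum

line = lambda x, y: abs(x - y)          # five points on a line, for illustration
print(approximation_ratio(line, 5, 3))  # the middle point is optimal
print(approximation_ratio(line, 5, 1))  # an endpoint costs 10 against the optimum 6
```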

The following theorem is due to Chang [3] and was generalized by Wu [14].

Theorem 1 ([3, 14]).

Metric $1$-median has a deterministic nonadaptive $O(n^{1.5})$-time $4$-approximation algorithm.

3 Lower bound

Fix arbitrarily a deterministic $o(n^{2})$-query algorithm $A$ for metric $1$-median and a constant $\delta\in(0,0.1)$. By padding queries, we may assume the existence of a function $q\colon\mathbb{Z}^{+}\to\mathbb{Z}^{+}$ such that $A$ makes exactly $q(n)=o(n^{2})$ queries given oracle access to any metric space with groundset $[n]$.

We introduce some notation concerning a function $d\colon[n]\times[n]\to\mathbb{R}$ to be determined later. For $i\in[q(n)]$, denote the $i$th query of $A^{d}$ by $(x_{i},y_{i})\in[n]\times[n]$; in other words, the $i$th query of $A^{d}$ asks for $d(x_{i},y_{i})$. Note that $(x_{i},y_{i})$ depends only on $d(x_{1},y_{1})$, $d(x_{2},y_{2})$, $\ldots$, $d(x_{i-1},y_{i-1})$ because $A$ is deterministic and has been fixed. For $x\in[n]$ and $i\in\{0,1,\ldots,q(n)\}$,

$$N_{i}(x)\stackrel{\text{def.}}{=}\left\{y\in[n]\mid\left\{\left(x,y\right),\left(y,x\right)\right\}\cap\left\{\left(x_{j},y_{j}\right)\mid j\in\left[i\right]\right\}\neq\emptyset\right\},\tag{1}$$
$$\alpha_{i}(x)\stackrel{\text{def.}}{=}\left|\,N_{i}(x)\,\right|,\tag{2}$$

following Chang [2] with a slight change in notation. Equivalently, $\alpha_{i}(x)$ is the degree of $x$ in the undirected graph with vertex set $[n]$ and edge set $\{(x_{j},y_{j})\mid j\in[i]\}$. As $[0]=\emptyset$, $\alpha_{0}(x)=0$ for $x\in[n]$. Note that $\alpha_{i}(\cdot)$ depends only on $(x_{1},y_{1})$, $(x_{2},y_{2})$, $\ldots$, $(x_{i},y_{i})$. Denote the output of $A^{d}$ by $p$. By adding at most $n-1=o(n^{2})$ dummy queries, we may assume without loss of generality that

$$\left(p,y\right)\in\left\{\left(x_{i},y_{i}\right)\mid i\in\left[q(n)\right]\right\}\tag{3}$$

for all $y\in[n]\setminus\{p\}$. Consequently,

$$\alpha_{q(n)}(p)=n-1.\tag{4}$$

Fix any set $S\subseteq[n]$ of size $\lceil\delta n\rceil$, e.g., $S=[\lceil\delta n\rceil]$.

We proceed to construct $d$ by gradually freezing distances. For brevity, freezing the value of $d(x,y)$ implicitly freezes $d(y,x)$ to the same value, where $x$, $y\in[n]$. Inductively, having answered the first $i-1$ queries of $A^{d}$ by freezing $d(x_{1},y_{1})$, $d(x_{2},y_{2})$, $\ldots$, $d(x_{i-1},y_{i-1})$, where $i\in[q(n)]$, answer the $i$th query by

$$d\left(x_{i},y_{i}\right)=\left\{\begin{array}{ll}3,&\text{if $x_{i}$, $y_{i}\in S$;}\\ 3,&\text{if $x_{i}\in S$, $y_{i}\notin S$ and $\alpha_{i-1}(x_{i})\leq\delta n$;}\\ 3,&\text{if $y_{i}\in S$, $x_{i}\notin S$ and $\alpha_{i-1}(y_{i})\leq\delta n$;}\\ 4,&\text{if $x_{i}\in S$, $y_{i}\notin S$ and $\alpha_{i-1}(x_{i})>\delta n$;}\\ 4,&\text{if $y_{i}\in S$, $x_{i}\notin S$ and $\alpha_{i-1}(y_{i})>\delta n$;}\\ 2,&\text{if $x_{i}$, $y_{i}\notin S$ and $\max\{\alpha_{i-1}(x_{i}),\alpha_{i-1}(y_{i})\}\leq\delta n$;}\\ 4,&\text{if $x_{i}$, $y_{i}\notin S$ and $\max\{\alpha_{i-1}(x_{i}),\alpha_{i-1}(y_{i})\}>\delta n$.}\end{array}\right.\tag{12}$$

It is not hard to verify that the seven cases in equation (12) are exhaustive and mutually exclusive. We have now frozen $d(x_{i},y_{i})$ for all $i\in[q(n)]$ and none of the other distances. As repeated queries are forbidden, equation (12) does not freeze one distance twice, preventing inconsistency.
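The answering rule of equation (12) can be sketched as a small online adversary that keeps the current degrees $\alpha_{i-1}(\cdot)$ and the set $S=[\lceil\delta n\rceil]$. This is an illustrative sketch only: the class name `Adversary` and its attribute names are assumptions, and $\delta$ is best passed as a `Fraction` so that the comparison with $\delta n$ is exact.

```python
import math
from fractions import Fraction

class Adversary:
    """Answers distance queries online following the seven cases of equation (12)."""

    def __init__(self, n, delta):
        self.threshold = delta * n                              # the quantity delta * n
        self.S = set(range(1, math.ceil(self.threshold) + 1))   # S = [ceil(delta n)]
        self.degree = dict.fromkeys(range(1, n + 1), 0)         # alpha_{i-1}(x)
        self.frozen = {}                                        # {x, y} -> frozen distance

    def answer(self, x, y):
        pair = frozenset((x, y))
        if x == y or pair in self.frozen:
            raise ValueError("queries for d(x, x) and repeated queries are forbidden")
        if x in self.S and y in self.S:
            value = 3
        elif x in self.S or y in self.S:
            inside = x if x in self.S else y                    # the endpoint lying in S
            value = 3 if self.degree[inside] <= self.threshold else 4
        else:
            busiest = max(self.degree[x], self.degree[y])
            value = 2 if busiest <= self.threshold else 4
        self.frozen[pair] = value
        self.degree[x] += 1                                     # alpha_i = alpha_{i-1} + 1
        self.degree[y] += 1
        return value
```

With `Adversary(40, Fraction(1, 10))`, the threshold is $4$ and $S=\{1,2,3,4\}$: the first five queries involving point $10$ (with partners outside $S$) are answered $2$, and every later one is answered $4$ because the degree of $10$ then exceeds $\delta n$.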

Set

$$B\stackrel{\text{def.}}{=}\left\{x\in[n]\mid\alpha_{q(n)}(x)>\delta n\right\},\tag{13}$$
$$\hat{p}\stackrel{\text{def.}}{=}\mathop{\rm argmin}_{x\in S}\,\alpha_{q(n)}(x),\tag{14}$$

breaking ties arbitrarily. For all distinct $x$, $y\in[n]$ with $(x,y)$, $(y,x)\notin\{(x_{i},y_{i})\mid i\in[q(n)]\}$, let

$$d\left(x,y\right)=\left\{\begin{array}{ll}1,&\text{if $x=\hat{p}$, $y\notin S\cup B$;}\\ 1,&\text{if $y=\hat{p}$, $x\notin S\cup B$;}\\ 3,&\text{if $x$, $y\in S\cup B$;}\\ 4,&\text{if $x\in(S\cup B)\setminus\{\hat{p}\}$ and $y\notin(S\cup B\cup\{\hat{p}\})$;}\\ 4,&\text{if $y\in(S\cup B)\setminus\{\hat{p}\}$ and $x\notin(S\cup B\cup\{\hat{p}\})$;}\\ 2,&\text{otherwise.}\end{array}\right.\tag{21}$$

Clearly, the six cases in equation (21) are exhaustive and mutually exclusive. Furthermore, equation (21) assigns the same value to $d(x,y)$ and $d(y,x)$. Finally, for all $x\in[n]$,

$$d\left(x,x\right)=0.\tag{22}$$

Equations (12), (21) and (22) complete the construction of $d$ by freezing all distances.
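The completion step can also be sketched in code. The sketch below is illustrative, not the paper's procedure verbatim: it assumes the answered queries are handed over as a dictionary `frozen` with one entry per unordered pair, takes $S=[\lceil\delta n\rceil]$, and breaks ties in equation (14) by smallest index. The names `complete_metric` and `triangle_ok` are hypothetical. Because $\hat{p}\in S$ by equation (14), the six cases of equation (21) collapse to a test on how many endpoints lie in $S\cup B$.

```python
import math
from fractions import Fraction
from itertools import combinations

def complete_metric(n, delta, frozen):
    """Extend the query answers in `frozen` to all pairs via equations (21)-(22)."""
    points = range(1, n + 1)
    threshold = delta * n
    S = set(range(1, math.ceil(threshold) + 1))
    d, degree = {}, dict.fromkeys(points, 0)       # degree[x] = alpha_{q(n)}(x)
    for (x, y), value in frozen.items():
        d[x, y] = d[y, x] = value
        degree[x] += 1
        degree[y] += 1
    B = {x for x in points if degree[x] > threshold}   # equation (13)
    p_hat = min(sorted(S), key=degree.get)             # equation (14), smallest index on ties
    SB = S | B
    for x, y in combinations(points, 2):               # equation (21)
        if (x, y) in d:
            continue                                   # already frozen by a query
        inside = [v for v in (x, y) if v in SB]
        if len(inside) == 2:
            value = 3
        elif len(inside) == 1:
            value = 1 if inside[0] == p_hat else 4
        else:
            value = 2
        d[x, y] = d[y, x] = value
    for x in points:                                   # equation (22)
        d[x, x] = 0
    return d, S, B, p_hat

def triangle_ok(d, n):
    """Check that the largest side of every triple is at most the sum of the other two."""
    return all(2 * max(d[x, y], d[x, z], d[y, z]) <= d[x, y] + d[x, z] + d[y, z]
               for x, y, z in combinations(range(1, n + 1), 3))
```

As a toy instance, take $n=60$, $\delta=1/10$ and an algorithm that only queries $(60,y)$ for $y=1,\ldots,59$ in this order and outputs $p=60$. Equation (12) answers $3$ for $y\le 6$, $2$ for $y=7$ and $4$ afterwards, so the output has total distance $228$ while $\hat{p}=1$ has total distance $71$.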

The following lemma is straightforward.

Lemma 2.

For all distinct $x$, $y\in[n]$, $d(x,y)\in\{1,2,3,4\}$.

Below is an immediate consequence of equation (14).

Lemma 3.

$\hat{p}\in S$.

The following lemma is a consequence of equations (1)–(2) and our forbidding repeated queries.

Lemma 4.

For all $x\in[n]$ and $i\in[q(n)]$,

$$\alpha_{i}(x)-\alpha_{i-1}(x)=\left\{\begin{array}{ll}0,&\text{if $x\notin\{x_{i},y_{i}\}$;}\\ 1,&\text{otherwise.}\end{array}\right.$$
Proof.

The case of $x\notin\{x_{i},y_{i}\}$ is immediate from equations (1)–(2). Suppose that $x\in\{x_{i},y_{i}\}$. By symmetry, we may assume $x=x_{i}$. So by equation (1),

$$N_{i}(x)=N_{i-1}(x)\cup\left\{y_{i}\right\}.\tag{24}$$

As $(x,y_{i})=(x_{i},y_{i})$ is the $i$th query and we forbid repeated queries,

$$y_{i}\notin N_{i-1}(x)\tag{25}$$

by equation (1). [Footnote 4: In detail, if $y_{i}\in N_{i-1}(x)$, then $(x_{j},y_{j})\in\{(x,y_{i}),(y_{i},x)\}$ for some $j\in[i-1]$ by equation (1); hence the $i$th query $(x_{i},y_{i})=(x,y_{i})$ repeats the $j$th query, a contradiction.] Equations (2) and (24)–(25) complete the proof. ∎

In short, Lemma 4 says that adding the edge $(x_{i},y_{i})$ to an undirected graph without that edge increases the degree of $x$ by $1$ if and only if $x\in\{x_{i},y_{i}\}$.

Lemma 5.

For all $x\in[n]$ and $i\in[q(n)+1]$, if $\alpha_{i-1}(x)>\delta n$, then $x\in B$.

Proof.

By Lemma 4, $\alpha_{q(n)}(x)\geq\alpha_{i-1}(x)$. Invoking equation (13) then completes the proof. ∎

Lemma 6.

$$\sum_{x\in[n]}\,\alpha_{q(n)}(x)=2\,q(n).$$

Proof.

Recall that the left-hand side is the sum of degrees in the undirected graph with vertex set $[n]$ and edge set $\{(x_{i},y_{i})\mid i\in[q(n)]\}$. As we forbid repeated queries, $\left|\,\{(x_{i},y_{i})\mid i\in[q(n)]\}\,\right|=q(n)$. Finally, it is a basic fact in graph theory that the sum of degrees in an undirected graph equals twice the number of edges. ∎

Lemma 7 (Implicit in [2, Lemma 13]).

$|B|=o(n)$.

Proof.

We have

$$|B|\,\delta n\stackrel{\text{equation (13)}}{\leq}\sum_{x\in B}\alpha_{q(n)}(x)\leq\sum_{x\in[n]}\alpha_{q(n)}(x)\stackrel{\text{Lemma 6}}{=}2\,q(n).$$

This gives $|B|=o(n)$ as $\delta\in(0,0.1)$ is a constant and $q(n)=o(n^{2})$. ∎
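To make the bound concrete, divide the displayed inequality by $\delta n$. The choice $q(n)=n^{1.5}$ below is only an illustrative assumption and is not a quantity fixed by the construction.

```latex
\[
|B| \;\le\; \frac{2\,q(n)}{\delta n},
\qquad\text{e.g.}\qquad
q(n)=n^{1.5}\ \Longrightarrow\ |B|\le\frac{2\sqrt{n}}{\delta}=o(n).
\]
```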

Lemma 8.

For all sufficiently large $n$ and all $i\in[q(n)+1]$,

$$\alpha_{i-1}\left(\hat{p}\right)\leq\delta n.\tag{26}$$

Proof.

By Lemma 7, $|S|=\lceil\delta n\rceil$ and $\delta\in(0,0.1)$ being a constant, $S\setminus B\neq\emptyset$ for all sufficiently large $n$. By equation (13), $S\setminus B\neq\emptyset$ means $\alpha_{q(n)}(x)\leq\delta n$ for some $x\in S$, which together with equation (14) gives $\alpha_{q(n)}(\hat{p})\leq\delta n$. Finally, Lemma 4 and $\alpha_{q(n)}(\hat{p})\leq\delta n$ imply inequality (26) for all $i\in[q(n)+1]$. ∎

Henceforth, assume $n$ to be sufficiently large to satisfy inequality (26) for all $i\in[q(n)+1]$.

Lemma 9.

For all $x$, $y\in[n]$, if $d(x,y)=1$, then one of the following conditions is true:

  • $x=\hat{p}$ and $y\notin S\cup B$;

  • $y=\hat{p}$ and $x\notin S\cup B$.

Proof.

Inspect equation (21), which is the only equation that may set distances to $1$. ∎

Lemma 10.

For all distinct $x$, $y\in[n]\setminus(S\cup B)$, $d\left(x,y\right)=2$.

Proof.

By Lemma 5, $\max\{\alpha_{i-1}(x_{i}),\alpha_{i-1}(y_{i})\}>\delta n$ means $\{x_{i},y_{i}\}\cap B\neq\emptyset$, where $i\in[q(n)+1]$. So only the second-to-last case in equation (12), which sets $d(x_{i},y_{i})=2$, may be consistent with $x_{i}$, $y_{i}\notin S\cup B$.

By Lemma 3, $\hat{p}\in S$. So only the last case in equation (21), which sets $d(x,y)=2$, may be consistent with $x$, $y\notin S\cup B$. ∎

Lemma 11.

For all $x\in[n]\setminus\{\hat{p}\}$, $d(\hat{p},x)\in\{1,3\}$.

Proof.

By Lemma 3 and inequality (26), only the first three cases in equation (12), which set $d(x_{i},y_{i})=3$, may be consistent with $x_{i}=\hat{p}$ or $y_{i}=\hat{p}$.

Again by Lemma 3, only the first three cases in equation (21), which set $d(x,y)\in\{1,3\}$, may be consistent with $x=\hat{p}$ or $y=\hat{p}$. ∎

Lemma 12.

There do not exist distinct $x$, $y$, $z\in[n]$ with $d(x,y)=1$ and $\{d(x,z),d(y,z)\}=\{2,4\}$.

Proof.

By Lemma 9, $d(x,y)=1$ implies $\hat{p}\in\{x,y\}$. By symmetry, assume $x=\hat{p}$. Then $d(x,z)\in\{1,3\}$ by Lemma 11. ∎

Lemma 13.

There do not exist distinct $x$, $y$, $z\in[n]$ with $d(x,y)=d(x,z)=1$ and $d(y,z)\in\{3,4\}$.

Proof.

By Lemma 9, $d(x,y)=d(x,z)=1$ implies $x=\hat{p}$ and $y$, $z\notin S\cup B$. Then $d(y,z)=2$ by Lemma 10. ∎

Lemmas 12–13 forbid all possible violations of the triangle inequality, yielding the following lemma.

Lemma 14.

$([n],d)$ is a metric space.

Proof.

Lemmas 2 and 12–13 establish the triangle inequality for $d$. Furthermore, $d$ is symmetric because (1) freezing $d(x,y)$ automatically freezes $d(y,x)$ to the same value, (2) forbidding repeated queries prevents equation (12) from assigning inconsistent values to one distance and (3) equation (21) is symmetric. All the other axioms for metrics are easy to verify. ∎

Recall that $p$ denotes the output of $A^{d}$. We proceed to compare $\sum_{x\in[n]}\,d(p,x)$ with $\sum_{x\in[n]}\,d(\hat{p},x)$.

Lemma 15.

There exist $k(1)$, $k(2)$, $\ldots$, $k(n-1)\in[q(n)]$ and distinct $z_{k(1)}$, $z_{k(2)}$, $\ldots$, $z_{k(n-1)}\in[n]$ such that

$$\alpha_{k(t)-1}(p)=t-1,\tag{27}$$
$$\alpha_{k(t)}(p)=t,\tag{28}$$
$$\left(p,z_{k(t)}\right)\in\left\{\left(x_{k(t)},y_{k(t)}\right),\left(y_{k(t)},x_{k(t)}\right)\right\}\tag{29}$$

for all $t\in[n-1]$.

Proof.

By Lemma 4, equation (4) and the easy fact that $\alpha_{0}(p)=0$, there exist distinct $k(1)$, $k(2)$, $\ldots$, $k(n-1)\in[q(n)]$ satisfying equations (27)–(28) for all $t\in[n-1]$. [Footnote 5: Observe that $\alpha_{i}(p)$ must go through all of $0$, $1$, $\ldots$, $n-1$ as $i$ increases from $0$ to $q(n)$.] Lemma 4 and equations (27)–(28) imply $p\in\{x_{k(t)},y_{k(t)}\}$, establishing the existence of $z_{k(t)}$ satisfying equation (29). If $z_{k(1)}$, $z_{k(2)}$, $\ldots$, $z_{k(n-1)}$ are not distinct, then there are repeated queries by equation (29), a contradiction. ∎

From now on, let $k(1)$, $k(2)$, $\ldots$, $k(n-1)\in[q(n)]$ and distinct $z_{k(1)}$, $z_{k(2)}$, $\ldots$, $z_{k(n-1)}\in[n]$ satisfy equations (27)–(29) for all $t\in[n-1]$.

Lemma 16.

For each $t\in[n-1]$, if $t\geq\lceil\delta n\rceil+2$ and $z_{k(t)}\notin S$, then $d(p,z_{k(t)})=4$.

Proof.

Assume in equation (29) that $p=x_{k(t)}$ and $z_{k(t)}=y_{k(t)}$; the other case will be symmetric. By equation (27),

$$\alpha_{k(t)-1}\left(x_{k(t)}\right)=t-1>\delta n.\tag{30}$$

  Case 1: $x_{k(t)}\in S$. By equation (12), $x_{k(t)}\in S$ and $y_{k(t)}=z_{k(t)}\notin S$,

$$d\left(x_{k(t)},y_{k(t)}\right)=\left\{\begin{array}{ll}3,&\text{if $\alpha_{k(t)-1}(x_{k(t)})\leq\delta n$;}\\ 4,&\text{if $\alpha_{k(t)-1}(x_{k(t)})>\delta n$.}\end{array}\right.\tag{33}$$

  Case 2: $x_{k(t)}\notin S$. By equation (12), $x_{k(t)}\notin S$ and $y_{k(t)}=z_{k(t)}\notin S$,

$$d\left(x_{k(t)},y_{k(t)}\right)=\left\{\begin{array}{ll}2,&\text{if $\max\{\alpha_{k(t)-1}(x_{k(t)}),\alpha_{k(t)-1}(y_{k(t)})\}\leq\delta n$;}\\ 4,&\text{if $\max\{\alpha_{k(t)-1}(x_{k(t)}),\alpha_{k(t)-1}(y_{k(t)})\}>\delta n$.}\end{array}\right.\tag{36}$$

Equation (30) together with any one of equations (33)–(36) implies $d(x_{k(t)},y_{k(t)})=4$. Hence $d(p,z_{k(t)})=d(x_{k(t)},y_{k(t)})=4$. ∎

We are now able to analyze the quality of p𝑝p as a solution to metric 111-median.

Lemma 17.

$$\sum_{x\in[n]}\,d\left(p,x\right)\geq 4\left(n-2\left\lceil\delta n\right\rceil-2\right).$$

Proof.

By the distinctness of $z_{k(1)}$, $z_{k(2)}$, $\ldots$, $z_{k(n-1)}$ in Lemma 15,

$$\sum_{x\in[n]}\,d\left(p,x\right)\geq\sum_{t\in[n-1]}\,d\left(p,z_{k(t)}\right).\tag{37}$$

Write $A=\{t\in[n-1]\mid z_{k(t)}\in S\}$. As $z_{k(1)}$, $z_{k(2)}$, $\ldots$, $z_{k(n-1)}$ are distinct,

$$|A|\leq|S|.\tag{38}$$

Furthermore,

$$\begin{aligned}\sum_{t\in[n-1]}\,d\left(p,z_{k(t)}\right)&\geq\sum_{t\in[n-1],\,t\geq\lceil\delta n\rceil+2,\,t\notin A}\,d\left(p,z_{k(t)}\right)\\ &\stackrel{\text{Lemma 16}}{=}\sum_{t\in[n-1],\,t\geq\lceil\delta n\rceil+2,\,t\notin A}\,4\\ &\geq 4\left(n-\left\lceil\delta n\right\rceil-2-|A|\right).\end{aligned}\tag{39}$$

Equations (37)–(39) and $|S|=\lceil\delta n\rceil$ complete the proof. ∎

We now analyze the quality of $\hat{p}$ as a solution to metric $1$-median. The following lemma is immediate from equation (21).

Lemma 18.

For all $y\in[n]\setminus(S\cup B)$, if $y\neq\hat{p}$ and $(\hat{p},y)$, $(y,\hat{p})\notin\{(x_{j},y_{j})\mid j\in[q(n)]\}$, then $d(\hat{p},y)=1$.

Lemma 19.

$$\sum_{y\in[n]}\,d\left(\hat{p},y\right)\leq n+3\cdot\left(\left\lceil\delta n\right\rceil+o(n)+\delta n\right).$$

Proof.

By equation (1),

$$N_{q(n)}\left(\hat{p}\right)=\left\{y\in[n]\mid\left\{\left(\hat{p},y\right),\left(y,\hat{p}\right)\right\}\cap\left\{\left(x_{j},y_{j}\right)\mid j\in\left[q(n)\right]\right\}\neq\emptyset\right\}.$$

This and Lemma 18 imply $d(\hat{p},y)=1$ for all $y\in[n]\setminus(S\cup B)$ with $y\neq\hat{p}$ and $y\notin N_{q(n)}(\hat{p})$. Therefore,

$$\sum_{y\in[n]\setminus(S\cup B\cup N_{q(n)}(\hat{p}))}\,d\left(\hat{p},y\right)\leq n-\left|\,S\cup B\cup N_{q(n)}\left(\hat{p}\right)\,\right|.\tag{40}$$

Clearly,

$$\sum_{y\in S\cup B\cup N_{q(n)}(\hat{p})}\,d\left(\hat{p},y\right)\stackrel{\text{Lemma 2}}{\leq}\sum_{y\in S\cup B\cup N_{q(n)}(\hat{p})}\,4=4\cdot\left|\,S\cup B\cup N_{q(n)}\left(\hat{p}\right)\,\right|.\tag{41}$$

Furthermore,

$$\left|\,N_{q(n)}\left(\hat{p}\right)\,\right|\stackrel{\text{equation (2)}}{=}\alpha_{q(n)}\left(\hat{p}\right)\stackrel{\text{inequality (26)}}{\leq}\delta n.$$

This and Lemma 7 imply

$$\left|\,S\cup B\cup N_{q(n)}\left(\hat{p}\right)\,\right|\leq\left\lceil\delta n\right\rceil+o(n)+\delta n\tag{42}$$

as $|S|=\lceil\delta n\rceil$. To complete the proof, sum up inequalities (40)–(41) and then use inequality (42) in the trivial way. ∎
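Spelled out, the final summation reads as follows, where $U$ is shorthand introduced here (not in the original) for $S\cup B\cup N_{q(n)}(\hat{p})$.

```latex
\begin{align*}
\sum_{y\in[n]} d(\hat{p},y)
  &= \sum_{y\in[n]\setminus U} d(\hat{p},y) + \sum_{y\in U} d(\hat{p},y) \\
  &\le \bigl(n-|U|\bigr) + 4\,|U|
     && \text{(inequalities (40) and (41))} \\
  &= n + 3\,|U| \\
  &\le n + 3\,\bigl(\lceil\delta n\rceil + o(n) + \delta n\bigr)
     && \text{(inequality (42)).}
\end{align*}
```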

Combining Lemmas 14, 17 and 19 yields our main theorem, stated below.

Theorem 20.

Metric $1$-median has no deterministic $o(n^{2})$-query $(4-\epsilon)$-approximation algorithm for any constant $\epsilon>0$.

Proof.

Lemma 14 asserts that $([n],d)$ is a metric space. By Lemmas 17 and 19,

$$\sum_{x\in[n]}\,d\left(p,x\right)\geq 4\left(1-8\delta-o(1)\right)\sum_{x\in[n]}\,d\left(\hat{p},x\right).$$

This proves the theorem because the deterministic $o(n^{2})$-query algorithm $A$ and the constant $\delta\in(0,0.1)$ are picked arbitrarily (note that $p$ denotes the output of $A^{d}$). ∎
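One way to fill in the arithmetic behind the displayed inequality is sketched below. It uses $\lceil\delta n\rceil\le\delta n+1$, the elementary bound $1/(1+u)\ge 1-u$ for $u\ge 0$, and $(1-a)(1-b)\ge 1-a-b$ for $a$, $b\ge 0$; these intermediate steps are a reconstruction and are not spelled out in the original.

```latex
\begin{align*}
\sum_{x\in[n]} d(p,x)
  &\ge 4\,(n-2\lceil\delta n\rceil-2)
   \ge 4n\Bigl(1-2\delta-\frac{4}{n}\Bigr)
  && \text{(Lemma 17)},\\
\sum_{x\in[n]} d(\hat{p},x)
  &\le n+3\,(\lceil\delta n\rceil+o(n)+\delta n)
   \le n\,(1+6\delta+o(1))
  && \text{(Lemma 19)},\\
\frac{\sum_{x\in[n]} d(p,x)}{\sum_{x\in[n]} d(\hat{p},x)}
  &\ge \frac{4\,(1-2\delta-o(1))}{1+6\delta+o(1)}
   \ge 4\,(1-2\delta-o(1))\,(1-6\delta-o(1))
   \ge 4\,(1-8\delta-o(1)).
\end{align*}
```

Since $4(1-8\delta)=4-32\delta$ and the optimum is at most $\sum_{x\in[n]}d(\hat{p},x)$, any constant $\delta<\min\{0.1,\epsilon/32\}$ makes the output $p$ worse than a $(4-\epsilon)$-approximation for all sufficiently large $n$.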

Theorem 20 complements Theorem 1.

It is possible to simplify equation (21) at the expense of an additional assumption. Without loss of generality, we may assume that $\alpha_{q(n)}(x)=n-1$ for all $x\in B$; this increases the query complexity by a multiplicative factor of $O(1)$ by equation (13). Therefore, if $x\in B$ or $y\in B$, then $d(x,y)$ will be frozen by equation (12). So the instances of the third to fifth cases in equation (21) with $x\in B$ or $y\in B$ can be omitted.

References

  • [1] V. Arya, N. Garg, R. Khandekar, A. Meyerson, K. Munagala, and V. Pandit. Local search heuristics for $k$-median and facility location problems. SIAM Journal on Computing, 33(3):544–562, 2004.
  • [2] C.-L. Chang. Some results on approximate $1$-median selection in metric spaces. Theoretical Computer Science, 426:1–12, 2012.
  • [3] C.-L. Chang. Deterministic sublinear-time approximations for metric $1$-median selection. Information Processing Letters, 113(8):288–292, 2013.
  • [4] K. Chen. On coresets for $k$-median and $k$-means clustering in metric and Euclidean spaces and their applications. SIAM Journal on Computing, 39(3):923–947, 2009.
  • [5] D. Eppstein and J. Wang. Fast approximation of centrality. Journal of Graph Algorithms and Applications, 8(1):39–45, 2004.
  • [6] O. Goldreich and D. Ron. Approximating average parameters of graphs. Random Structures & Algorithms, 32(4):473–493, 2008.
  • [7] S. Guha, A. Meyerson, N. Mishra, R. Motwani, and L. O’Callaghan. Clustering data streams: Theory and practice. IEEE Transactions on Knowledge and Data Engineering, 15(3):515–528, 2003.
  • [8] P. Indyk. Sublinear time algorithms for metric space problems. In Proceedings of the 31st Annual ACM Symposium on Theory of Computing, pages 428–434, 1999.
  • [9] P. Indyk. High-dimensional computational geometry. PhD thesis, Stanford University, 2000.
  • [10] R. Jaiswal, A. Kumar, and S. Sen. A simple $D^{2}$-sampling based PTAS for $k$-means and other clustering problems. In Proceedings of the 18th Annual International Conference on Computing and Combinatorics, pages 13–24, 2012.
  • [11] A. Kumar, Y. Sabharwal, and S. Sen. Linear-time approximation schemes for clustering problems in any dimensions. Journal of the ACM, 57(2):5, 2010.
  • [12] R. R. Mettu and C. G. Plaxton. Optimal time bounds for approximate clustering. Machine Learning, 56(1–3):35–60, 2004.
  • [13] S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, 1994.
  • [14] B.-Y. Wu. On approximating metric $1$-median in sublinear time. Information Processing Letters, 114(4):163–166, 2014.