A lower bound for metric $1$-median selection

A preliminary version of this paper appears in Proceedings of the 30th Workshop on Combinatorial Mathematics and Computation Theory, Hualien, Taiwan, April 2013, pp. 65–68.

Ching-Lueh Chang

Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, Taiwan. Email: clchang@saturn.yzu.edu.tw

Supported in part by the National Science Council of Taiwan under grant 101-2221-E-155-015-MY2.

Abstract

Consider the problem of finding a point in an $n$ -point metric space with the minimum average distance to all points. We show that this problem has no deterministic $o(n^{2})$ -query $(4-\Omega(1))$ -approximation algorithms.

1 Introduction

Given oracle access to a metric space $(\{1,2,\ldots,n\},d)$ , the metric $1$ -median problem asks for a point with the minimum average distance to all points. Indyk [8, 9] shows that metric $1$ -median has a Monte-Carlo $O(n/\epsilon^{2})$ -time $(1+\epsilon)$ -approximation algorithm with an $\Omega(1)$ probability of success. The more general metric $k$ -median problem asks for $x_{1}$ , $x_{2}$ , $\ldots$ , $x_{k}\in\{1,2,\ldots,n\}$ minimizing $\sum_{x\in\{1,2,\ldots,n\}}\,\min_{i=1}^{k}\,d(x_{i},x)$ . Randomized as well as evasive algorithms are well-studied for metric $k$ -median and the related $k$ -means problem [7, 12, 1, 4, 11, 10], where $k\geq 1$ is part of the input rather than a constant.

This paper focuses on deterministic sublinear-query algorithms for metric $1$-median. Guha et al. [7, Sec. 3.1–3.2] prove that metric $k$-median has a deterministic $O(n^{1+\epsilon})$-time $O(n^{\epsilon})$-space $2^{O(1/\epsilon)}$-approximation algorithm that reads distances in a single pass, where $\epsilon>0$. Chang [3] presents a deterministic nonadaptive $O(n^{1.5})$-time $4$-approximation algorithm for metric $1$-median. Wu [14] generalizes Chang’s result by showing an $O(n^{1+1/h})$-time $2h$-approximation algorithm for any integer $h\geq 2$. On the negative side, Chang [2] shows that metric $1$-median has no deterministic $o(n^{2})$-query $(3-\epsilon)$-approximation algorithms for any constant $\epsilon>0$. This paper improves upon that result by showing that metric $1$-median has no deterministic $o(n^{2})$-query $(4-\epsilon)$-approximation algorithms for any constant $\epsilon>0$.

In social network analysis, the importance of an actor in a network may be quantified by several centrality measures, among which the closeness centrality of an actor is defined to be its average distance to other actors [13]. So metric $1$ -median can be interpreted as the problem of finding the most important point in a metric space. Goldreich and Ron [6] and Eppstein and Wang [5] present randomized algorithms for approximating the closeness centralities of vertices in undirected graphs.

2 Definitions

For $n\in\mathbb{N}$ , denote $[n]\equiv\{1,2,\ldots,n\}$ . Trivially, $[0]=\emptyset$ . An $n$ -point metric space $([n],d)$ is the set $[n]$ , called the groundset, endowed with a function $d\colon[n]\times[n]\to\mathbb{R}$ satisfying

(1) $d(x,y)\geq 0$ (non-negativeness),
(2) $d(x,y)=0$ if and only if $x=y$ (identity of indiscernibles),
(3) $d(x,y)=d(y,x)$ (symmetry), and
(4) $d(x,y)+d(x,z)\geq d(y,z)$ (triangle inequality)

for all $x$ , $y$ , $z\in[n]$ . An equivalent definition requires the triangle inequality only for distinct $x$ , $y$ , $z\in[n]$ , axioms (1)–(3) remaining.
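The four axioms can be checked mechanically on a small distance matrix. The following is a minimal sketch, assuming points are indexed from 0 and the metric is stored as a list of lists; the function name `is_metric` and both sample matrices are illustrative and not part of the paper.

```python
from itertools import permutations

def is_metric(d):
    """Check axioms (1)-(4) for a distance matrix d given as a list of lists."""
    n = len(d)
    for x in range(n):
        for y in range(n):
            if d[x][y] < 0:                     # (1) non-negativeness
                return False
            if (d[x][y] == 0) != (x == y):      # (2) identity of indiscernibles
                return False
            if d[x][y] != d[y][x]:              # (3) symmetry
                return False
    # (4) triangle inequality; distinct triples suffice once (1)-(3) hold
    return all(d[x][y] + d[x][z] >= d[y][z]
               for x, y, z in permutations(range(n), 3))

path = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]     # three points on a line
broken = [[0, 1, 3], [1, 0, 1], [3, 1, 0]]   # violates (4): 1 + 1 < 3

print(is_metric(path), is_metric(broken))
```

Restricting axiom (4) to distinct triples mirrors the equivalent definition stated above.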

An algorithm with oracle access to a metric space $([n],d)$ is given $n$ and may query $d$ on any $(x,y)\in[n]\times[n]$ to obtain $d(x,y)$ . Without loss of generality, we forbid queries for $d(x,x)$ , which trivially return $0$ , as well as repeated queries, where a query for $d(x,y)$ is considered to repeat that for $d(y,x)$ . For convenience, denote an algorithm ALG with oracle access to $([n],d)$ by $\text{ALG}^{d}$ .

Given oracle access to a finite metric space $([n],d)$ , the metric $1$ -median problem asks for a point in $[n]$ with the minimum average distance to all points. An algorithm for this problem is $\alpha$ -approximate if it outputs a point $x\in[n]$ satisfying

$$\sum_{y\in[n]}\,d\left(x,y\right)\leq\alpha\,\min_{x^{\prime}\in[n]}\,\sum_{y\in[n]}\,d\left(x^{\prime},y\right),$$

where $\alpha\geq 1$ .
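To make the definition concrete, here is a brute-force sketch that reads all distances, so it is only a reference point and not a sublinear-query algorithm. It assumes 0-indexed points and a list-of-lists distance matrix; the helper names are hypothetical.

```python
def total_distance(d, x):
    """Sum of distances from x to every point (n times the average distance)."""
    return sum(d[x])

def exact_one_median(d):
    """Exact 1-median by exhaustive search over all points."""
    return min(range(len(d)), key=lambda x: total_distance(d, x))

def is_alpha_approximate(d, x, alpha):
    """Does x satisfy the alpha-approximation inequality for metric 1-median?"""
    opt = total_distance(d, exact_one_median(d))
    return total_distance(d, x) <= alpha * opt

path = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]   # three points on a line
print(exact_one_median(path))               # the middle point has total distance 2
print(is_alpha_approximate(path, 0, 1.5))   # an endpoint has total distance 3
```

On this toy instance an endpoint is a $1.5$-approximate solution but not a $1.4$-approximate one, since its total distance is $3$ against an optimum of $2$.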

The following theorem is due to Chang [3] and generalized by Wu [14].

Theorem 1 ([3, 14]).

Metric $1$ -median has a deterministic nonadaptive $O(n^{1.5})$ -time $4$ -approximation algorithm.

3 Lower bound

Fix arbitrarily a deterministic $o(n^{2})$ -query algorithm $A$ for metric $1$ -median and a constant $\delta\in(0,0.1)$ . By padding queries, we may assume the existence of a function $q\colon\mathbb{Z}^{+}\to\mathbb{Z}^{+}$ such that $A$ makes exactly $q(n)=o(n^{2})$ queries given oracle access to any metric space with groundset $[n]$ .

We introduce some notation concerning a function $d\colon[n]\times[n]\to\mathbb{R}$ to be determined later. For $i\in[q(n)]$, denote the $i$th query of $A^{d}$ by $(x_{i},y_{i})\in[n]\times[n]$; in other words, the $i$th query of $A^{d}$ asks for $d(x_{i},y_{i})$. Note that $(x_{i},y_{i})$ depends only on $d(x_{1},y_{1})$, $d(x_{2},y_{2})$, $\ldots$, $d(x_{i-1},y_{i-1})$ because $A$ is deterministic and has been fixed. For $x\in[n]$ and $i\in\{0,1,\ldots,q(n)\}$,

$$N_{i}(x)\stackrel{\text{def.}}{=}\left\{y\in[n]\mid\left\{\left(x,y\right),\left(y,x\right)\right\}\cap\left\{\left(x_{j},y_{j}\right)\mid j\in\left[i\right]\right\}\neq\emptyset\right\},\tag{1}$$
$$\alpha_{i}(x)\stackrel{\text{def.}}{=}\left|\,N_{i}(x)\,\right|,\tag{2}$$

following Chang [2] with a slight change in notation. Equivalently, $\alpha_{i}(x)$ is the degree of $x$ in the undirected graph with vertex set $[n]$ and edge set $\{(x_{j},y_{j})\mid j\in[i]\}$ . As $[0]=\emptyset$ , $\alpha_{0}(x)=0$ for $x\in[n]$ . Note that $\alpha_{i}(\cdot)$ depends only on $(x_{1},y_{1})$ , $(x_{2},y_{2})$ , $\ldots$ , $(x_{i},y_{i})$ . Denote the output of $A^{d}$ by $p$ . By adding at most $n-1=o(n^{2})$ dummy queries, we may assume without loss of generality that

$$\left(p,y\right)\in\left\{\left(x_{i},y_{i}\right)\mid i\in\left[q(n)\right]\right\}\tag{3}$$

for all $y\in[n]\setminus\{p\}$ . Consequently,

$$\alpha_{q(n)}(p)=n-1.\tag{4}$$
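Equations (1)–(2) amount to bookkeeping on the graph of queries asked so far. A minimal sketch, assuming the queries are available as a list of pairs of points in $[n]$ with no repeats; the function names `neighbours` and `alpha` and the sample query list are illustrative.

```python
def neighbours(queries, i, x):
    """N_i(x): points y such that (x, y) or (y, x) is among the first i queries."""
    result = set()
    for a, b in queries[:i]:
        if a == x:
            result.add(b)
        elif b == x:
            result.add(a)
    return result

def alpha(queries, i, x):
    """alpha_i(x): degree of x in the graph formed by the first i queries."""
    return len(neighbours(queries, i, x))

# Four queries on the groundset {1, 2, 3, 4}; point 1 ends up with degree n - 1 = 3.
queries = [(1, 2), (3, 1), (2, 3), (1, 4)]
print([alpha(queries, i, 1) for i in range(len(queries) + 1)])
```

In this example point $1$ plays the role of an output $p$ satisfying equation (4), because every other point appears in some query with it.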

Fix any set $S\subseteq[n]$ of size $\lceil\delta n\rceil$ , e.g., $S=[\lceil\delta n\rceil]$ .

We proceed to construct $d$ by gradually freezing distances. For brevity, freezing the value of $d(x,y)$ implicitly freezes $d(y,x)$ to the same value, where $x$ , $y\in[n]$ . Inductively, having answered the first $i-1$ queries of $A^{d}$ by freezing $d(x_{1},y_{1})$ , $d(x_{2},y_{2})$ , $\ldots$ , $d(x_{i-1},y_{i-1})$ , where $i\in[q(n)]$ , answer the $i$ th query by

$$d\left(x_{i},y_{i}\right)=\begin{cases}3,&\text{if $x_{i}$, $y_{i}\in S$;}\\ 3,&\text{if $x_{i}\in S$, $y_{i}\notin S$ and $\alpha_{i-1}(x_{i})\leq\delta n$;}\\ 3,&\text{if $y_{i}\in S$, $x_{i}\notin S$ and $\alpha_{i-1}(y_{i})\leq\delta n$;}\\ 4,&\text{if $x_{i}\in S$, $y_{i}\notin S$ and $\alpha_{i-1}(x_{i})>\delta n$;}\\ 4,&\text{if $y_{i}\in S$, $x_{i}\notin S$ and $\alpha_{i-1}(y_{i})>\delta n$;}\\ 2,&\text{if $x_{i}$, $y_{i}\notin S$ and $\max\{\alpha_{i-1}(x_{i}),\alpha_{i-1}(y_{i})\}\leq\delta n$;}\\ 4,&\text{if $x_{i}$, $y_{i}\notin S$ and $\max\{\alpha_{i-1}(x_{i}),\alpha_{i-1}(y_{i})\}>\delta n$.}\end{cases}\tag{12}$$

It is not hard to verify that the seven cases in equation (12) are exhaustive and mutually exclusive. We have now frozen $d(x_{i},y_{i})$ for all $i\in[q(n)]$ and none of the other distances. As repeated queries are forbidden, equation (12) does not freeze one distance twice, preventing inconsistency.

Set

$$B\stackrel{\text{def.}}{=}\left\{x\in[n]\mid\alpha_{q(n)}(x)>\delta n\right\},\tag{13}$$
$$\hat{p}\stackrel{\text{def.}}{=}\mathop{\rm argmin}_{x\in S}\,\alpha_{q(n)}(x),\tag{14}$$

breaking ties arbitrarily. For all distinct $x$ , $y\in[n]$ with $(x,y)$ , $(y,x)\notin\{(x_{i},y_{i})\mid i\in[q(n)]\}$ , let

$$d\left(x,y\right)=\begin{cases}1,&\text{if $x=\hat{p}$, $y\notin S\cup B$;}\\ 1,&\text{if $y=\hat{p}$, $x\notin S\cup B$;}\\ 3,&\text{if $x$, $y\in S\cup B$;}\\ 4,&\text{if $x\in(S\cup B)\setminus\{\hat{p}\}$ and $y\notin(S\cup B\cup\{\hat{p}\})$;}\\ 4,&\text{if $y\in(S\cup B)\setminus\{\hat{p}\}$ and $x\notin(S\cup B\cup\{\hat{p}\})$;}\\ 2,&\text{otherwise.}\end{cases}\tag{21}$$

Clearly, the six cases in equation (21) are exhaustive and mutually exclusive. Furthermore, equation (21) assigns the same value to $d(x,y)$ and $d(y,x)$ . Finally, for all $x\in[n]$ ,

$$d\left(x,x\right)=0.\tag{22}$$

Equations (12), (21) and (22) complete the construction of $d$ by freezing all distances.
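The whole construction can be sketched in code. The sketch below makes one simplifying assumption that the paper does not: the query sequence is fixed in advance, whereas a real algorithm $A$ may choose each query after seeing earlier answers. The function name, the variable `heavy` (standing for $S\cup B$) and the toy query list are all hypothetical.

```python
import math

def build_adversarial_metric(n, delta, queries):
    """Return (d, S, B, p_hat) for a list of distinct queries on points 1..n."""
    threshold = delta * n
    S = set(range(1, math.ceil(delta * n) + 1))
    deg = {x: 0 for x in range(1, n + 1)}   # alpha_{i-1}(x) just before query i
    d = {}
    for x, y in queries:                    # equation (12)
        in_s = [v for v in (x, y) if v in S]
        if len(in_s) == 2:
            value = 3
        elif len(in_s) == 1:
            value = 3 if deg[in_s[0]] <= threshold else 4
        else:
            value = 2 if max(deg[x], deg[y]) <= threshold else 4
        d[(x, y)] = d[(y, x)] = value       # freezing d(x,y) also freezes d(y,x)
        deg[x] += 1
        deg[y] += 1
    B = {x for x in deg if deg[x] > threshold}       # equation (13)
    p_hat = min(sorted(S), key=lambda x: deg[x])     # equation (14)
    heavy = S | B
    for x in range(1, n + 1):
        d[(x, x)] = 0                                # equation (22)
        for y in range(x + 1, n + 1):
            if (x, y) in d:
                continue                             # already frozen by a query
            if p_hat in (x, y):                      # equation (21)
                other = y if x == p_hat else x
                value = 3 if other in heavy else 1
            elif x in heavy and y in heavy:
                value = 3
            elif x in heavy or y in heavy:
                value = 4
            else:
                value = 2
            d[(x, y)] = d[(y, x)] = value
    return d, S, B, p_hat

# Toy run: an "algorithm" that queries every distance from point 10 and outputs 10.
n, delta = 50, 0.09
queries = [(10, y) for y in range(1, n + 1) if y != 10]
d, S, B, p_hat = build_adversarial_metric(n, delta, queries)
cost = lambda x: sum(d[(x, y)] for y in range(1, n + 1))
print(cost(10), cost(p_hat))
```

In this toy run the queried point pays distance $4$ to almost every point, while $\hat{p}$ pays distance $1$ to almost every point. The ratio stays below $4$ here only because $n$ is small and $\delta$ is not close to $0$.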

The following lemma is straightforward.

Lemma 2.

For all distinct $x$ , $y\in[n]$ , $d(x,y)\in\{1,2,3,4\}$ .

Below is an immediate consequence of equation (14).

Lemma 3.

$\hat{p}\in S$ .

The following lemma is a consequence of equations (1)–(2) and our forbidding repeated queries.

Lemma 4.

For all $x\in[n]$ and $i\in[q(n)]$ ,

$$\alpha_{i}(x)-\alpha_{i-1}(x)=\begin{cases}0,&\text{if $x\notin\{x_{i},y_{i}\}$;}\\ 1,&\text{otherwise.}\end{cases}$$

Proof.

The case of $x\notin\{x_{i},y_{i}\}$ is immediate from equations (1)–(2). Suppose that $x\in\{x_{i},y_{i}\}$ . By symmetry, we may assume $x=x_{i}$ . So by equation (1),

$$N_{i}(x)=N_{i-1}(x)\cup\left\{y_{i}\right\}.\tag{24}$$

As $(x,y_{i})=(x_{i},y_{i})$ is the $i$ th query and we forbid repeated queries,

$$y_{i}\notin N_{i-1}(x)\tag{25}$$

by equation (1). (In detail, if $y_{i}\in N_{i-1}(x)$, then $(x_{j},y_{j})\in\{(x,y_{i}),(y_{i},x)\}$ for some $j\in[i-1]$ by equation (1); hence the $i$th query $(x_{i},y_{i})=(x,y_{i})$ repeats the $j$th query, a contradiction.) Equations (2) and (24)–(25) complete the proof. ∎

In short, Lemma 4 says that adding the edge $(x_{i},y_{i})$ to an undirected graph without that edge increases the degree of $x$ by $1$ if and only if $x\in\{x_{i},y_{i}\}$ .

Lemma 5.

For all $x\in[n]$ and $i\in[q(n)+1]$ , if $\alpha_{i-1}(x)>\delta n$ , then $x\in B$ .

Proof.

By Lemma 4, $\alpha_{q(n)}(x)\geq\alpha_{i-1}(x)$ . Invoking equation (13) then completes the proof. ∎

Lemma 6.

$$\sum_{x\in[n]}\,\alpha_{q(n)}(x)=2\,q(n).$$

Proof.

Recall that the left-hand side is the sum of degrees in the undirected graph with vertex set $[n]$ and edge set $\{(x_{i},y_{i})\mid i\in[q(n)]\}$. As we forbid repeated queries, $\left|\,\{(x_{i},y_{i})\mid i\in[q(n)]\}\,\right|=q(n)$. Finally, it is a basic fact in graph theory that the sum of degrees in an undirected graph equals twice the number of edges. ∎

Lemma 7 (Implicit in [2, Lemma 13]).

$|B|=o(n)$ .

Proof.

We have

$$|B|\,\delta n\stackrel{\text{equation (13)}}{\leq}\sum_{x\in B}\alpha_{q(n)}(x)\leq\sum_{x\in[n]}\alpha_{q(n)}(x)\stackrel{\text{Lemma 6}}{=}2\,q(n).$$

This gives $|B|=o(n)$ as $\delta\in(0,0.1)$ is a constant and $q(n)=o(n^{2})$ . ∎
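The last step of this proof can be spelled out as a short worked calculation. The sample query bound $q(n)=n^{1.5}$ in the final comment is an illustrative assumption, not a quantity from the paper.

```latex
\[
|B| \le \frac{2\,q(n)}{\delta n}
\quad\Longrightarrow\quad
\frac{|B|}{n} \le \frac{2}{\delta}\cdot\frac{q(n)}{n^{2}} \longrightarrow 0
\quad (n\to\infty),
\]
% since \delta is a constant and q(n) = o(n^2).
% Example (assumed): q(n) = n^{1.5} gives |B| \le 2\sqrt{n}/\delta = o(n).
```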

Lemma 8.

For all sufficiently large $n$ and all $i\in[q(n)+1]$ ,

$$\alpha_{i-1}\left(\hat{p}\right)\leq\delta n.\tag{26}$$

Proof.

By Lemma 7, $|S|=\lceil\delta n\rceil$ and $\delta\in(0,0.1)$ being a constant, $S\setminus B\neq\emptyset$ for all sufficiently large $n$. By equation (13) and $S\setminus B\neq\emptyset$, we have $\alpha_{q(n)}(x)\leq\delta n$ for some $x\in S$, which together with equation (14) gives $\alpha_{q(n)}(\hat{p})\leq\delta n$. Finally, Lemma 4 and $\alpha_{q(n)}(\hat{p})\leq\delta n$ imply inequality (26) for all $i\in[q(n)+1]$. ∎

Henceforth, assume $n$ to be sufficiently large to satisfy inequality (26) for all $i\in[q(n)+1]$ .

Lemma 9.

For all $x$ , $y\in[n]$ , if $d(x,y)=1$ , then one of the following conditions is true:

• $x=\hat{p}$ and $y\notin S\cup B$;
• $y=\hat{p}$ and $x\notin S\cup B$.

Proof.

Inspect equation (21), which is the only equation that may set distances to $1$ . ∎

Lemma 10.

For all distinct $x$ , $y\in[n]\setminus(S\cup B)$ , $d\left(x,y\right)=2$ .

Proof.

By Lemma 5, $\max\{\alpha_{i-1}(x_{i}),\alpha_{i-1}(y_{i})\}>\delta n$ means $\{x_{i},y_{i}\}\cap B\neq\emptyset$, where $i\in[q(n)]$. So only the second-to-last case in equation (12), which sets $d(x_{i},y_{i})=2$, may be consistent with $x_{i}$, $y_{i}\notin S\cup B$.

By Lemma 3, $\hat{p}\in S$ . So only the last case in equation (21), which sets $d(x,y)=2$ , may be consistent with $x$ , $y\notin S\cup B$ . ∎

Lemma 11.

For all $x\in[n]\setminus\{\hat{p}\}$ , $d(\hat{p},x)\in\{1,3\}$ .

Proof.

By Lemma 3 and inequality (26), only the first three cases in equation (12), which set $d(x_{i},y_{i})=3$ , may be consistent with $x_{i}=\hat{p}$ or $y_{i}=\hat{p}$ .

Again by Lemma 3, only the first three cases in equation (21), which set $d(x,y)\in\{1,3\}$ , may be consistent with $x=\hat{p}$ or $y=\hat{p}$ . ∎

Lemma 12.

There do not exist distinct $x$ , $y$ , $z\in[n]$ with $d(x,y)=1$ and $\{d(x,z),d(y,z)\}=\{2,4\}$ .

Proof.

By Lemma 9, $d(x,y)=1$ implies $\hat{p}\in\{x,y\}$ . By symmetry, assume $x=\hat{p}$ . Then $d(x,z)\in\{1,3\}$ by Lemma 11. ∎

Lemma 13.

There do not exist distinct $x$ , $y$ , $z\in[n]$ with $d(x,y)=d(x,z)=1$ and $d(y,z)\in\{3,4\}$ .

Proof.

By Lemma 9, $d(x,y)=d(x,z)=1$ implies $x=\hat{p}$ and $y$ , $z\notin S\cup B$ . Then $d(y,z)=2$ by Lemma 10. ∎

Lemmas 12–13 forbid all possible violations of the triangle inequality, yielding the following lemma.

Lemma 14.

$([n],d)$ is a metric space.

Proof.

Lemmas 2 and 12–13 establish the triangle inequality for $d$ . Furthermore, $d$ is symmetric because (1) freezing $d(x,y)$ automatically freezes $d(y,x)$ to the same value, (2) forbidding repeated queries prevents equation (12) from assigning inconsistent values to one distance and (3) equation (21) is symmetric. All the other axioms for metrics are easy to verify. ∎
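The claim that Lemmas 12–13 cover every possible violation can be confirmed by exhaustive enumeration. A small sketch, assuming only that the three pairwise distances of a triple lie in $\{1,2,3,4\}$ as in Lemma 2; the variable name `violations` is illustrative.

```python
from itertools import combinations_with_replacement

# Sorted triples a <= b <= c of side lengths; the triangle inequality
# fails exactly when the two smaller sides sum to less than the largest.
violations = [
    (a, b, c)
    for a, b, c in combinations_with_replacement([1, 2, 3, 4], 3)
    if a + b < c
]
print(violations)
```

The triples with two sides of length $1$ and a third of length $3$ or $4$ are excluded by Lemma 13, and the triple with sides $1$, $2$, $4$ is excluded by Lemma 12.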

Recall that $p$ denotes the output of $A^{d}$ . We proceed to compare $\sum_{x\in[n]}\,d(p,x)$ with $\sum_{x\in[n]}\,d(\hat{p},x)$ .

Lemma 15.

There exist $k(1)$ , $k(2)$ , $\ldots$ , $k(n-1)\in[q(n)]$ and distinct $z_{k(1)}$ , $z_{k(2)}$ , $\ldots$ , $z_{k(n-1)}\in[n]$ such that

$$\alpha_{k(t)-1}(p)=t-1,\tag{27}$$
$$\alpha_{k(t)}(p)=t,\tag{28}$$
$$\left(p,z_{k(t)}\right)\in\left\{\left(x_{k(t)},y_{k(t)}\right),\left(y_{k(t)},x_{k(t)}\right)\right\}\tag{29}$$

for all $t\in[n-1]$ .

Proof.

By Lemma 4, equation (4) and the easy fact that $\alpha_{0}(p)=0$, there exist distinct $k(1)$, $k(2)$, $\ldots$, $k(n-1)\in[q(n)]$ satisfying equations (27)–(28) for all $t\in[n-1]$. (Observe that $\alpha_{i}(p)$ must go through all of $0$, $1$, $\ldots$, $n-1$ as $i$ increases from $0$ to $q(n)$.) Lemma 4 and equations (27)–(28) imply $p\in\{x_{k(t)},y_{k(t)}\}$, establishing the existence of $z_{k(t)}$ satisfying equation (29). If $z_{k(1)}$, $z_{k(2)}$, $\ldots$, $z_{k(n-1)}$ are not distinct, then there are repeated queries by equation (29), a contradiction. ∎

From now on, let $k(1)$ , $k(2)$ , $\ldots$ , $k(n-1)\in[q(n)]$ and distinct $z_{k(1)}$ , $z_{k(2)}$ , $\ldots$ , $z_{k(n-1)}\in[n]$ satisfy equations (27)–(29) for all $t\in[n-1]$ .
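The indices $k(t)$ and points $z_{k(t)}$ are simply the queries touching $p$, listed in the order asked. A minimal sketch, assuming the queries are given as a list of pairs with 1-based positions; the function name `incident_queries` and the sample list are hypothetical.

```python
def incident_queries(queries, p):
    """Return [(k(t), z_{k(t)})] for t = 1, 2, ...: the 1-based position of the
    t-th query involving p, paired with the other endpoint of that query."""
    out = []
    for i, (a, b) in enumerate(queries, start=1):
        if p in (a, b):
            out.append((i, b if a == p else a))
    return out

queries = [(1, 2), (3, 1), (2, 3), (1, 4)]
print(incident_queries(queries, 1))
```

Just before the $t$th entry of this list is asked, $p$ has degree $t-1$, which is equation (27). The other endpoints are distinct because repeated queries are forbidden.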

Lemma 16.

For each $t\in[n-1]$ , if $t\geq\lceil\delta n\rceil+2$ and $z_{k(t)}\notin S$ , then $d(p,z_{k(t)})=4$ .

Proof.

Assume in equation (29) that $p=x_{k(t)}$ and $z_{k(t)}=y_{k(t)}$ ; the other case will be symmetric. By equation (27),

$$\alpha_{k(t)-1}\left(x_{k(t)}\right)=t-1>\delta n.\tag{30}$$

Case 1: $x_{k(t)}\in S$. By equation (12), $x_{k(t)}\in S$ and $y_{k(t)}=z_{k(t)}\notin S$,

$$d\left(x_{k(t)},y_{k(t)}\right)=\begin{cases}3,&\text{if $\alpha_{k(t)-1}(x_{k(t)})\leq\delta n$;}\\ 4,&\text{if $\alpha_{k(t)-1}(x_{k(t)})>\delta n$.}\end{cases}\tag{33}$$

Case 2: $x_{k(t)}\notin S$. By equation (12), $x_{k(t)}\notin S$ and $y_{k(t)}=z_{k(t)}\notin S$,

$$d\left(x_{k(t)},y_{k(t)}\right)=\begin{cases}2,&\text{if $\max\{\alpha_{k(t)-1}(x_{k(t)}),\alpha_{k(t)-1}(y_{k(t)})\}\leq\delta n$;}\\ 4,&\text{if $\max\{\alpha_{k(t)-1}(x_{k(t)}),\alpha_{k(t)-1}(y_{k(t)})\}>\delta n$.}\end{cases}\tag{36}$$

Equation (30) together with either of equations (33) and (36) implies $d(x_{k(t)},y_{k(t)})=4$. Hence $d(p,z_{k(t)})=d(x_{k(t)},y_{k(t)})=4$. ∎

We are now able to analyze the quality of $p$ as a solution to metric $1$ -median.

Lemma 17.

$$\sum_{x\in[n]}\,d\left(p,x\right)\geq 4\left(n-2\left\lceil\delta n\right\rceil-2\right).$$

Proof.

By the distinctness of $z_{k(1)}$ , $z_{k(2)}$ , $\ldots$ , $z_{k(n-1)}$ in Lemma 15,

$$\sum_{x\in[n]}\,d\left(p,x\right)\geq\sum_{t\in[n-1]}\,d\left(p,z_{k(t)}\right).\tag{37}$$

Write $A=\{t\in[n-1]\mid z_{k(t)}\in S\}$ . As $z_{k(1)}$ , $z_{k(2)}$ , $\ldots$ , $z_{k(n-1)}$ are distinct,

$$|A|\leq|S|.\tag{38}$$

Furthermore,

$$\begin{aligned}\sum_{t\in[n-1]}\,d\left(p,z_{k(t)}\right)&\geq\sum_{t\in[n-1],\,t\geq\lceil\delta n\rceil+2,\,t\notin A}\,d\left(p,z_{k(t)}\right)\\ &\stackrel{\text{Lemma 16}}{=}\sum_{t\in[n-1],\,t\geq\lceil\delta n\rceil+2,\,t\notin A}\,4\\ &\geq 4\left(n-\left\lceil\delta n\right\rceil-2-|A|\right).\end{aligned}\tag{39}$$

Equations (37)–(39) and $|S|=\lceil\delta n\rceil$ complete the proof. ∎

We now analyze the quality of $\hat{p}$ as a solution to metric $1$ -median. The following lemma is immediate from equation (21).

Lemma 18.

For all $y\in[n]\setminus(S\cup B)$ , if $y\neq\hat{p}$ and $(\hat{p},y)$ , $(y,\hat{p})\notin\{(x_{j},y_{j})\mid j\in[q(n)]\}$ , then $d(\hat{p},y)=1$ .

Lemma 19.

$$\sum_{y\in[n]}\,d\left(\hat{p},y\right)\leq n+3\cdot\left(\left\lceil\delta n\right\rceil+o(n)+\delta n\right).$$

Proof.

By equation (1),

$$N_{q(n)}\left(\hat{p}\right)=\left\{y\in[n]\mid\left\{\left(\hat{p},y\right),\left(y,\hat{p}\right)\right\}\cap\left\{\left(x_{j},y_{j}\right)\mid j\in\left[q(n)\right]\right\}\neq\emptyset\right\}.$$

This and Lemma 18 imply $d(\hat{p},y)=1$ for all $y\in[n]\setminus(S\cup B)$ with $y\neq\hat{p}$ and $y\notin N_{q(n)}(\hat{p})$ . Therefore,

$$\sum_{y\in[n]\setminus(S\cup B\cup N_{q(n)}(\hat{p}))}\,d\left(\hat{p},y\right)\leq n-\left|\,S\cup B\cup N_{q(n)}\left(\hat{p}\right)\,\right|.\tag{40}$$

Clearly,

$$\sum_{y\in S\cup B\cup N_{q(n)}(\hat{p})}\,d\left(\hat{p},y\right)\stackrel{\text{Lemma 2}}{\leq}\sum_{y\in S\cup B\cup N_{q(n)}(\hat{p})}\,4=4\cdot\left|\,S\cup B\cup N_{q(n)}\left(\hat{p}\right)\,\right|.\tag{41}$$

Furthermore,

$$\left|\,N_{q(n)}\left(\hat{p}\right)\,\right|\stackrel{\text{equation (2)}}{=}\alpha_{q(n)}\left(\hat{p}\right)\stackrel{\text{inequality (26)}}{\leq}\delta n.$$

This and Lemma 7 imply

$$\left|\,S\cup B\cup N_{q(n)}\left(\hat{p}\right)\,\right|\leq\left\lceil\delta n\right\rceil+o(n)+\delta n\tag{42}$$

as $|S|=\lceil\delta n\rceil$ . To complete the proof, sum up inequalities (40)–(41) and then use inequality (42) in the trivial way. ∎

Combining Lemmas 14, 17 and 19 yields our main theorem, stated below.

Theorem 20.

Metric $1$ -median has no deterministic $o(n^{2})$ -query $(4-\epsilon)$ -approximation algorithm for any constant $\epsilon>0$ .

Proof.

Lemma 14 asserts that $([n],d)$ is a metric space. By Lemmas 17 and 19,

$$\sum_{x\in[n]}\,d\left(p,x\right)\geq 4\left(1-8\delta-o(1)\right)\sum_{x\in[n]}\,d\left(\hat{p},x\right).$$

This proves the theorem because the deterministic $o(n^{2})$ -query algorithm $A$ and the constant $\delta\in(0,0.1)$ are picked arbitrarily (note that $p$ denotes the output of $A^{d}$ ). ∎
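For readers who want the arithmetic behind the factor $4(1-8\delta-o(1))$, one way to combine Lemmas 17 and 19 is sketched below. The intermediate bounds are a reconstruction rather than the author's own derivation; they use $\lceil\delta n\rceil\leq\delta n+1$, the inequality $1/(1+u)\geq 1-u$ for $u\geq 0$, and $(1-2\delta)(1-6\delta)=1-8\delta+12\delta^{2}\geq 1-8\delta$.

```latex
\begin{align*}
\sum_{x\in[n]} d(p,x)
  &\ge 4\,(n-2\lceil\delta n\rceil-2)
   \ge 4n\,(1-2\delta-o(1)),\\
\sum_{x\in[n]} d(\hat{p},x)
  &\le n+3\,(\lceil\delta n\rceil+o(n)+\delta n)
   \le n\,(1+6\delta+o(1)),\\
\frac{\sum_{x\in[n]} d(p,x)}{\sum_{x\in[n]} d(\hat{p},x)}
  &\ge 4\cdot\frac{1-2\delta-o(1)}{1+6\delta+o(1)}
   \ge 4\,(1-2\delta)(1-6\delta)-o(1)
   \ge 4\,(1-8\delta-o(1)).
\end{align*}
```

Given a constant $\epsilon>0$, choosing $\delta<\epsilon/32$ makes the right-hand side exceed $4-\epsilon$ for all sufficiently large $n$, so $p$ is not a $(4-\epsilon)$-approximate solution.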

Theorem 20 complements Theorem 1.

It is possible to simplify equation (21) at the expense of an additional assumption. Without loss of generality, we may assume that $\alpha_{q(n)}(x)=n-1$ for all $x\in B$; by equation (13), this increases the query complexity by a multiplicative factor of $O(1)$. Therefore, if $x\in B$ or $y\in B$, then $d(x,y)$ will be frozen by equation (12). So the third to fifth cases in equation (21), which satisfy $x\in B$ or $y\in B$, can be omitted.

References

[1] V. Arya, N. Garg, R. Khandekar, A. Meyerson, K. Munagala, and V. Pandit. Local search heuristics for $k$ -median and facility location problems. SIAM Journal on Computing, 33(3):544–562, 2004.
[2] C.-L. Chang. Some results on approximate $1$ -median selection in metric spaces. Theoretical Computer Science, 426:1–12, 2012.
[3] C.-L. Chang. Deterministic sublinear-time approximations for metric $1$ -median selection. Information Processing Letters, 113(8):288–292, 2013.
[4] K. Chen. On coresets for $k$ -median and $k$ -means clustering in metric and Euclidean spaces and their applications. SIAM Journal on Computing, 39(3):923–947, 2009.
[5] D. Eppstein and J. Wang. Fast approximation of centrality. Journal of Graph Algorithms and Applications, 8(1):39–45, 2004.
[6] O. Goldreich and D. Ron. Approximating average parameters of graphs. Random Structures & Algorithms, 32(4):473–493, 2008.
[7] S. Guha, A. Meyerson, N. Mishra, R. Motwani, and L. O’Callaghan. Clustering data streams: Theory and practice. IEEE Transactions on Knowledge and Data Engineering, 15(3):515–528, 2003.
[8] P. Indyk. Sublinear time algorithms for metric space problems. In Proceedings of the 31st Annual ACM Symposium on Theory of Computing, pages 428–434, 1999.
[9] P. Indyk. High-dimensional computational geometry. PhD thesis, Stanford University, 2000.
[10] R. Jaiswal, A. Kumar, and S. Sen. A simple $D^{2}$ -sampling based PTAS for $k$ -means and other clustering problems. In Proceedings of the 18th Annual International Conference on Computing and Combinatorics, pages 13–24, 2012.
[11] A. Kumar, Y. Sabharwal, and S. Sen. Linear-time approximation schemes for clustering problems in any dimensions. Journal of the ACM, 57(2):5, 2010.
[12] R. R. Mettu and C. G. Plaxton. Optimal time bounds for approximate clustering. Machine Learning, 56(1–3):35–60, 2004.
[13] S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, 1994.
[14] B.-Y. Wu. On approximating metric $1$ -median in sublinear time. Information Processing Letters, 114(4):163–166, 2014.
