1 Introduction
Given oracle access to a metric space , the metric 1-median problem asks for a point with the minimum average distance to all points. Indyk [8, 9] shows that metric 1-median has a Monte-Carlo -time -approximation algorithm with an probability of success. The more general metric k-median problem asks for , , , minimizing . Randomized as well as evasive algorithms are well-studied for metric k-median and the related k-means problem [7, 12, 1, 4, 11, 10], where k is part of the input rather than a constant.
This paper focuses on deterministic sublinear-query algorithms for metric 1-median. Guha et al. [7, Sec. 3.1–3.2] prove that metric k-median has a deterministic -time -space -approximation algorithm that reads distances in a single pass, where . Chang [3] presents a deterministic nonadaptive -time -approximation algorithm for metric 1-median. Wu [14] generalizes Chang's result by showing an -time -approximation algorithm for any integer . On the negative side, Chang [2] shows that metric 1-median has no deterministic -query -approximation algorithms for any constant . This paper improves upon his result by showing that metric 1-median has no deterministic -query -approximation algorithms for any constant .
In social network analysis, the importance of an actor in a network may be quantified by several centrality measures, among which the closeness centrality of an actor is defined to be its average distance to other actors [13]. So metric 1-median can be interpreted as the problem of finding the most important point in a metric space. Goldreich and Ron [6] and Eppstein and Wang [5] present randomized algorithms for approximating the closeness centralities of vertices in undirected graphs.
2 Definitions
For , denote . Trivially, . An -point metric space is the set , called the groundset, endowed with a function satisfying
- (1) $d(x,y)\ge 0$ (non-negativeness),
- (2) $d(x,y)=0$ if and only if $x=y$ (identity of indiscernibles),
- (3) $d(x,y)=d(y,x)$ (symmetry), and
- (4) $d(x,y)+d(y,z)\ge d(x,z)$ (triangle inequality)
for all $x$, $y$, $z$ in the groundset. An equivalent definition requires the triangle inequality only for distinct $x$, $y$, $z$, axioms (1)–(3) remaining.
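The four axioms can be checked mechanically on a small distance table. The following is a minimal sketch, not part of the paper: the function name `is_metric`, the list-of-lists table `d`, and the groundset `range(n)` are assumptions made for illustration. It checks the triangle inequality only on distinct triples, which suffices because a triple with a repeated point satisfies the inequality automatically once axioms (1)–(3) hold.

```python
from itertools import permutations


def is_metric(d, n):
    """Return True if the n x n table d satisfies axioms (1)-(4).

    d is a hypothetical list-of-lists distance table over range(n).
    """
    for x in range(n):
        for y in range(n):
            if d[x][y] < 0:                  # (1) non-negativeness
                return False
            if (d[x][y] == 0) != (x == y):   # (2) identity of indiscernibles
                return False
            if d[x][y] != d[y][x]:           # (3) symmetry
                return False
    # (4) triangle inequality, checked only on distinct x, y, z
    for x, y, z in permutations(range(n), 3):
        if d[x][y] + d[y][z] < d[x][z]:
            return False
    return True
```

For example, `is_metric([[0, 1, 1], [1, 0, 2], [1, 2, 0]], 3)` holds, whereas raising the distance between points 1 and 2 above 2 would violate the triangle inequality.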
An algorithm with oracle access to a metric space is given and may query on any to obtain . Without loss of generality, we forbid queries for , which trivially return , as well as repeated queries, where a query for is considered to repeat that for . For convenience, denote an algorithm ALG with oracle access to by .
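The query model above can be illustrated by a small wrapper that counts queries and enforces the two conventions just stated. This is only a sketch under assumptions: the class name `CountingOracle`, the table-based storage, and the choice to raise `ValueError` on a forbidden query are all hypothetical and do not come from the paper.

```python
class CountingOracle:
    """Oracle for a fixed distance table that counts the queries made.

    Hypothetical illustration: queries for d(x, x) and repeated queries
    are rejected, and a query for (y, x) counts as repeating (x, y).
    """

    def __init__(self, table):
        self._table = table
        self._asked = set()

    def query(self, x, y):
        if x == y:
            raise ValueError("queries for d(x, x) are forbidden")
        key = frozenset((x, y))   # unordered pair, so (x, y) == (y, x)
        if key in self._asked:
            raise ValueError("repeated query")
        self._asked.add(key)
        return self._table[x][y]

    @property
    def num_queries(self):
        return len(self._asked)
```

An algorithm run against such a wrapper has its query complexity read off from `num_queries` after it halts.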
Given oracle access to a finite metric space , the metric 1-median problem asks for a point in with the minimum average distance to all points. An algorithm for this problem is -approximate if it outputs a point satisfying
| | |
where .
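To make the approximation notion concrete, the sketch below computes an exact 1-median by brute force and the ratio achieved by a candidate point. It is an illustration only: the function names and the table representation are assumptions, and the brute-force search reads every distance, so it says nothing about sublinear-query algorithms. The table is assumed to describe a metric space with at least two points, so the optimal average distance is positive.

```python
def average_distance(d, n, p):
    """Average distance from point p to all n points (d is a distance table)."""
    return sum(d[p][x] for x in range(n)) / n


def exact_one_median(d, n):
    """Brute-force 1-median: a point of minimum average distance."""
    return min(range(n), key=lambda p: average_distance(d, n, p))


def approximation_ratio(d, n, p):
    """Ratio of p's average distance to the optimum (assumes n >= 2)."""
    opt = average_distance(d, n, exact_one_median(d, n))
    return average_distance(d, n, p) / opt
```

Under this sketch, an algorithm that outputs `p` is c-approximate on the given table exactly when `approximation_ratio(d, n, p) <= c`, where `c` is a hypothetical name for the approximation factor.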
The following theorem is due to Chang [3] and generalized by Wu [14].
Theorem 1 ([3, 14]).
Metric 1-median has a deterministic nonadaptive -time -approximation algorithm.
3 Lower bound
Fix arbitrarily a deterministic -query algorithm for metric 1-median and a constant . By padding queries, we may assume the existence of a function such that makes exactly queries given oracle access to any metric space with groundset .
We introduce some notations concerning a function to be determined later. For , denote the th query of by ; in other words, the th query of asks for . Note that depends only on , , , because is deterministic and has been fixed. For and ,
| | | | | (1) |
| | | | | (2) |
following Chang [2] with a slight change in notation. Equivalently, is the degree of in the undirected graph with vertex set and edge set . As , for . Note that depends only on , , , . Denote the output of by . By adding at most dummy queries, we may assume without loss of generality that
| | | (3) |
for all . Consequently,
| | | (4) |
Fix any set of size , e.g., .
We proceed to construct by gradually freezing distances. For brevity, freezing the value of implicitly freezes to the same value, where , . Inductively, having answered the first queries of by freezing , , , , where , answer the th query by
| | | | | (12) |
It is not hard to verify that the seven cases in equation (12) are exhaustive and mutually exclusive. We have now frozen for all and none of the other distances. As repeated queries are forbidden, equation (12) does not freeze one distance twice, preventing inconsistency.
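The general mechanism of answering queries by freezing distances can be sketched as follows. This is a generic adversary skeleton, not the construction of equation (12): the class name `FreezingAdversary` and the callback `rule` are hypothetical, and `rule` stands in for whatever case analysis decides the answer. The sketch only shows the bookkeeping, namely that each unordered pair is frozen at most once and that freezing one direction freezes the other.

```python
class FreezingAdversary:
    """Answers distance queries on the fly and remembers frozen values.

    `rule` is a hypothetical callback (x, y, frozen) -> distance that
    decides each answer; it is a placeholder, not equation (12).
    """

    def __init__(self, rule):
        self._rule = rule
        self.frozen = {}   # frozenset({x, y}) -> frozen distance

    def query(self, x, y):
        key = frozenset((x, y))
        if x == y or key in self.frozen:
            raise ValueError("trivial or repeated query")
        # Freezing the unordered pair fixes d(x, y) and d(y, x) together.
        self.frozen[key] = self._rule(x, y, self.frozen)
        return self.frozen[key]

    def distance(self, x, y):
        """Frozen value of d(x, y), or None if it is still undetermined."""
        return 0 if x == y else self.frozen.get(frozenset((x, y)))
```

After the algorithm halts, the distances still reported as `None` are the ones the adversary remains free to set, which is the role played by the later equations in the text.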
Set
| | | | | (13) |
| | | | | (14) |
breaking ties arbitrarily. For all distinct , with , , let
| | | (21) |
Clearly, the six cases in equation (21) are exhaustive and mutually exclusive. Furthermore, equation (21) assigns the same value to and . Finally, for all ,
| | | (22) |
Equations (12), (21) and (22) complete the construction of by freezing all distances.
The following lemma is straightforward.
Lemma 2.
For all distinct , , .
Below is an immediate consequence of equation (14).
Lemma 3.
.
The following lemma is a consequence of equations (1)–(2) and our forbidding repeated queries.
Lemma 4.
For all and ,
| | |
Proof.
The case of is immediate from equations (1)–(2). Suppose that . By symmetry, we may assume . So by equation (1),
| | | (24) |
As is the th query and we forbid repeated queries,
| | | (25) |
by equation (1). Equations (2) and (24)–(25) complete the proof. ∎
In short, Lemma 4 says that adding the edge to an undirected graph without that edge increases the degree of by if and only if .
Lemma 5.
For all and , if , then .
Proof.
By Lemma 4, . Invoking equation (13) then completes the proof. ∎
Lemma 6.
| | |
Proof.
Recall that the left-hand side is the sum of degrees in the undirected graph with vertex set and edge set . As we forbid repeated queries, Finally, it is a basic fact in graph theory that the sum of degrees in an undirected graph equals twice the number of edges. ∎
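The degree bookkeeping behind this argument can be checked on a toy query sequence. The sketch below is illustrative only: the function name `degrees_after` and the sample list `queries` are assumptions, with each query modeled as an unordered pair of distinct points and no pair repeated. It confirms that each new query raises the degrees of exactly its two endpoints by one, so the degree sum after i queries is 2i.

```python
from collections import Counter


def degrees_after(queries, i):
    """Degrees in the graph whose edges are the first i queries."""
    deg = Counter()
    for x, y in queries[:i]:
        deg[x] += 1   # a query touches exactly its two endpoints
        deg[y] += 1
    return deg


# Hypothetical sample: four distinct, non-trivial queries on points 0..3.
queries = [(0, 1), (1, 2), (0, 3), (2, 3)]
for i in range(len(queries) + 1):
    # Sum of degrees equals twice the number of edges.
    assert sum(degrees_after(queries, i).values()) == 2 * i
```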
Lemma 7 (Implicit in [2, Lemma 13]).
.
Proof.
We have
| | |
This gives as is a constant and . ∎
Lemma 8.
For all sufficiently large and all ,
| | | | | (26) |
Proof.
By Lemma 7, and being a constant, for all sufficiently large . By equation (13), for some , which together with equation (14) gives . Finally, Lemma 4 and imply inequality (26) for all . ∎
Henceforth, assume to be sufficiently large to satisfy inequality (26) for all .
Lemma 9.
For all , , if , then one of the following conditions is true:
- and ;
- and .
Proof.
Inspect equation (21), which is the only equation that may set distances to . ∎
Lemma 10.
For all distinct , , .
Proof.
By Lemma 5, means , where . So only the second-to-last case in equation (12), which sets , may be consistent with , .
By Lemma 3, . So only the last case in equation (21), which sets , may be consistent with , . ∎
Lemma 11.
For all , .
Proof.
By Lemma 3 and inequality (26), only the first three cases in equation (12), which set , may be consistent with or .
Again by Lemma 3, only the first three cases in equation (21), which set , may be consistent with or . ∎
Lemma 12.
There do not exist distinct , , with and .
Proof.
By Lemma 9, implies . By symmetry, assume . Then by Lemma 11. ∎
Lemma 13.
There do not exist distinct , , with and .
Proof.
By Lemma 9, implies and , . Then by Lemma 10. ∎
Lemmas 12–13 forbid all possible violations of the triangle inequality, yielding the following lemma.
Lemma 14.
is a metric space.
Proof.
Lemmas 2 and 12–13 establish the triangle inequality for . Furthermore, is symmetric because (1) freezing automatically freezes to the same value, (2) forbidding repeated queries prevents equation (12) from assigning inconsistent values to one distance and (3) equation (21) is symmetric. All the other axioms for metrics are easy to verify. ∎
Recall that denotes the output of . We proceed to compare with .
Lemma 15.
There exist , , , and distinct , , , such that
| | | | | (27) |
| | | | | (28) |
| | | | | (29) |
for all .
Proof.
By Lemma 4, equation (4) and the easy fact that , there exist distinct , , , satisfying equations (27)–(28) for all . Lemma 4 and equations (27)–(28) imply , establishing the existence of satisfying equation (29). If , , , are not distinct, then there are repeated queries by equation (29), a contradiction. ∎
From now on, let , , , and distinct , , , satisfy equations (27)–(29) for all .
Lemma 16.
For each , if and , then .
Proof.
Assume in equation (29) that and ; the other case will be symmetric. By equation (27),
| | | (30) |
- Case 1:
. By equation (12), and ,
| | | (33) |
- Case 2:
. By equation (12), and ,
| | | (36) |
Equation (30) together with any one of equations (33)–(36) implies . Hence . ∎
We are now able to analyze the quality of as a solution to metric 1-median.
Lemma 17.
| | |
Proof.
By the distinctness of , , , in Lemma 15,
| | | (37) |
Write . As , , , are distinct,
| | | (38) |
Furthermore,
| | | | | (39) |
| | | | |
| | | | |
| | | | |
Equations (37)–(39) and complete the proof. ∎
We now analyze the quality of as a solution to metric 1-median. The following lemma is immediate from equation (21).
Lemma 18.
For all , if and , , then .
Lemma 19.
| | |
Proof.
By equation (1),
| | |
This and Lemma 18 imply for all with and . Therefore,
| | | (40) |
Clearly,
| | | (41) |
Furthermore,
| | |
This and Lemma 7 imply
| | | (42) |
as . To complete the proof, sum up inequalities (40)–(41) and then use inequality (42) in the trivial way. ∎
Combining Lemmas 14, 17 and 19 yields our main theorem, stated below.
Theorem 20.
Metric 1-median has no deterministic -query -approximation algorithm for any constant .
Proof.
Lemma 14 asserts that is a metric space. By Lemmas 17 and 19,
| | |
This proves the theorem because the deterministic -query algorithm and the constant are picked arbitrarily (note that denotes the output of ). ∎
Theorem 20 complements Theorem 1.
It is possible to simplify equation (21) at the expense of an additional assumption. Without loss of generality, we may assume that for all ; this increases the query complexity by a multiplicative factor of by equation (13). Therefore, if or , then will be frozen by equation (12). So the third to fifth cases in equation (21), which satisfy or , can be omitted.