## Abstract

In the last two decades, a steady stream of research has been devoted to studying various computational aspects of the ordered *k*-median problem, which subsumes traditional facility location problems (such as median, center, *p*-centrum, etc.) through a unified modeling approach. Given a finite metric space, the objective is to locate *k* facilities in order to minimize the ordered median cost function. In its general form, this function penalizes the coverage distance of each vertex by a multiplicative weight, depending on its ranking (or percentile) in the ordered list of all coverage distances. While antecedent literature has focused on mathematical properties of ordered median functions, integer programming methods, various heuristics, and special cases, this problem was not studied thus far through the lens of approximation algorithms. In particular, even on simple network topologies, such as trees or line graphs, obtaining non-trivial approximation guarantees is an open question. The main contribution of this paper is to devise the first provably-good approximation algorithms for the ordered *k*-median problem. We develop a novel approach that relies primarily on a surrogate model, where the ordered median function is replaced by a simplified ranking-invariant functional form, via efficient enumeration. Surprisingly, while this surrogate model is \(\varOmega ( n^{ \varOmega (1) } )\)-hard to approximate on general metrics, we obtain an \(O(\log n)\)-approximation for our original problem by employing local search methods on a smooth variant of the surrogate function. In addition, an improved guarantee of \(2+\epsilon \) is obtained on tree metrics by optimally solving the surrogate model through dynamic programming. Finally, we show that the latter optimality gap is tight up to an \(O(\epsilon )\) term.

## References

1. Abraham, I., Bartal, Y., Neiman, O.: Nearly tight low stretch spanning trees. In: Proceedings of the 49th IEEE Annual Symposium on Foundations of Computer Science, pp. 781–790 (2008)
2. Alon, N., Karp, R.M., Peleg, D., West, D.: A graph-theoretic game and its application to the \(k\)-server problem. SIAM J. Comput. **24**(1), 78–100 (1995)
3. Arya, V., Garg, N., Khandekar, R., Meyerson, A., Munagala, K., Pandit, V.: Local search heuristics for \(k\)-median and facility location problems. SIAM J. Comput. **33**(3), 544–562 (2004)
4. Bartal, Y.: Probabilistic approximations of metric spaces and its algorithmic applications. In: Proceedings of the 37th IEEE Annual Symposium on Foundations of Computer Science, pp. 184–193 (1996)
5. Bartal, Y.: On approximating arbitrary metrics by tree metrics. In: Proceedings of the 30th Annual ACM Symposium on the Theory of Computing, pp. 161–168 (1998)
6. Bertsimas, D., Mazumder, R.: Least quantile regression via modern optimization. Ann. Stat. **42**(6), 2494–2525 (2014)
7. Bertsimas, D., Sim, M.: Robust discrete optimization and network flows. Math. Program. **98**(1–3), 49–71 (2003)
8. Bertsimas, D., Weismantel, R.: Optimization Over Integers. Dynamic Ideas, Belmont (2005)
9. Blanco, V., Puerto, J., Salmerón, R.: A general framework for locating hyperplanes to fitting set of points. arXiv preprint arXiv:1505.03451 (2015)
10. Boland, N., Domínguez-Marín, P., Nickel, S., Puerto, J.: Exact procedures for solving the discrete ordered median problem. Comput. Oper. Res. **33**(11), 3270–3300 (2006)
11. Bradley, P.S., Fayyad, U.M., Mangasarian, O.L.: Mathematical programming for data mining: formulations and challenges. INFORMS J. Comput. **11**(3), 217–238 (1999)
12. Byrka, J., Pensyl, T., Rybicki, B., Srinivasan, A., Trinh, K.: An improved approximation for \(k\)-median and positive correlation in budgeted optimization. ACM Trans. Algorithms **13**(2), 23:1–23:31 (2017)
13. Byrka, J., Sornat, K., Spoerhase, J.: Constant-factor approximation for ordered \(k\)-median. In: Proceedings of the 50th Annual ACM Symposium on the Theory of Computing (to appear). Available as arXiv preprint arXiv:1711.01972 (2017)
14. Chakrabarty, D., Swamy, C.: Interpolating between \(k\)-median and \(k\)-center: Approximation algorithms for ordered \(k\)-median. arXiv preprint arXiv:1711.08715 (2017)
15. Charikar, M., Guha, S.: Improved combinatorial algorithms for facility location problems. SIAM J. Comput. **34**(4), 803–824 (2005)
16. Charikar, M., Guha, S., Tardos, É., Shmoys, D.B.: A constant-factor approximation algorithm for the \(k\)-median problem. In: Proceedings of the 31st Annual ACM Symposium on Theory of Computing, pp. 1–10 (1999)
17. Domínguez-Marín, P., Nickel, S., Hansen, P., Mladenović, N.: Heuristic procedures for solving the discrete ordered median problem. Ann. Oper. Res. **136**(1), 145–173 (2005)
18. Drezner, Z., Hamacher, H.W.: Facility Location: Applications and Theory. Springer Science & Business Media, Berlin (2004)
19. Drezner, Z., Nickel, S.: Constructing a DC decomposition for ordered median problems. J. Global Optim. **45**(2), 187–201 (2009)
20. Drezner, Z., Nickel, S.: Solving the ordered one-median problem in the plane. Eur. J. Oper. Res. **195**(1), 46–61 (2009)
21. Elkin, M., Emek, Y., Spielman, D.A., Teng, S.-H.: Lower-stretch spanning trees. SIAM J. Comput. **38**(2), 608–628 (2008)
22. Espejo, I., Rodríguez-Chía, A.M., Valero, C.: Convex ordered median problem with \(\ell _p\)-norms. Comput. Oper. Res. **36**(7), 2250–2262 (2009)
23. Fakcharoenphol, J., Rao, S., Talwar, K.: A tight bound on approximating arbitrary metrics by tree metrics. J. Comput. Syst. Sci. **69**(3), 485–497 (2004)
24. Feige, U.: A threshold of \(\ln n\) for approximating set cover. J. ACM **45**(4), 634–652 (1998)
25. Fernández, E., Pozo, M.A., Puerto, J., Scozzari, A.: Ordered weighted average optimization in multiobjective spanning tree problem. Eur. J. Oper. Res. **260**(3), 886–903 (2017)
26. Gassner, E.: An inverse approach to convex ordered median problems in trees. J. Combin. Optim. **23**(2), 261–273 (2012)
27. Gupta, A., Tangwongsan, K.: Simpler analyses of local search algorithms for facility location. arXiv preprint arXiv:0809.2554 (2008)
28. Hochbaum, D.S., Shmoys, D.B.: A best possible heuristic for the \(k\)-center problem. Math. Oper. Res. **10**(2), 180–184 (1985)
29. Hsu, W.-L., Nemhauser, G.L.: Easy and hard bottleneck location problems. Discrete Appl. Math. **1**(3), 209–215 (1979)
30. Jain, K., Mahdian, M., Saberi, A.: A new greedy approach for facility location problems. In: Proceedings of the 34th Annual ACM Symposium on Theory of Computing, pp. 731–740 (2002)
31. Jain, K., Vazirani, V.V.: Approximation algorithms for metric facility location and \(k\)-median problems using the primal-dual schema and lagrangian relaxation. J. ACM **48**(2), 274–296 (2001)
32. Kalcsics, J., Nickel, S., Puerto, J.: Multifacility ordered median problems on networks: a further analysis. Networks **41**(1), 1–12 (2003)
33. Kalcsics, J., Nickel, S., Puerto, J., Tamir, A.: Algorithmic results for ordered median problems. Oper. Res. Lett. **30**(3), 149–158 (2002)
34. Korupolu, M.R., Plaxton, C.G., Rajaraman, R.: Analysis of a local search heuristic for facility location problems. J. Algorithms **37**(1), 146–188 (2000)
35. Laporte, G., Nickel, S., da Gama, F.S.: Location Science. Springer, Berlin (2015)
36. Lei, T.L., Church, R.L.: Vector assignment ordered median problem: a unified median problem. Int. Region. Sci. Rev. **37**(2), 194–224 (2014)
37. Li, S., Svensson, O.: Approximating \(k\)-median via pseudo-approximation. SIAM J. Comput. **45**(2), 530–547 (2016)
38. Liberty, E., Sviridenko, M.: Greedy minimization of weakly supermodular set functions. In: Proceedings of the 20th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems, pp. 19:1–19:11 (2017)
39. Lin, J.-H., Vitter, J.S.: Approximation algorithms for geometric median problems. Inf. Process. Lett. **44**(5), 245–249 (1992)
40. Lin, J.-H., Vitter, J.S.: \(\epsilon \)-approximations with minimum packing constraint violation. In: Proceedings of the 24th Annual ACM Symposium on Theory of Computing, pp. 771–782 (1992)
41. Lozano, A.J., Plastria, F.: The ordered median Euclidean straight-line location problem. Stud. Locat. Anal. **17**, 29–43 (2009)
42. Mendel, M., Naor, A.: Maximum gradient embeddings and monotone clustering. Combinatorica **30**(5), 581–615 (2010)
43. Mirchandani, P.B., Francis, R.L.: Discrete Location Theory. Wiley, New York (1990)
44. Nickel, S.: Discrete ordered weber problems. In: Operations Research Proceedings: Selected Papers of the Symposium on Operations Research (OR 2000), pp. 71–76 (2001)
45. Nickel, S., Puerto, J.: A unified approach to network location problems. Networks **34**(4), 283–290 (1999)
46. Nickel, S., Puerto, J.: Location Theory: A Unified Approach. Springer Science & Business Media, Berlin (2005)
47. Plesník, J.: On the computational complexity of centers locating in a graph. Aplikace matematiky **25**(6), 445–452 (1980)
48. Puerto, J., Fernández, F.R.: Geometrical properties of the symmetrical single facility location problem. J. Nonlinear Convex Anal. **1**(3), 321–342 (2000)
49. Puerto, J., Pérez-Brito, D., García-González, C.G.: A modified variable neighborhood search for the discrete ordered median problem. Eur. J. Oper. Res. **234**(1), 61–76 (2014)
50. Puerto, J., Ramos, A.B., Rodríguez-Chía, A.M.: Single-allocation ordered median hub location problems. Comput. Oper. Res. **38**(2), 559–570 (2011)
51. Puerto, J., Ramos, A.B., Rodríguez-Chía, A.M.: A specialized branch & bound & cut for single-allocation ordered median hub location problems. Discrete Appl. Math. **161**(16), 2624–2646 (2013)
52. Puerto, J., Rodríguez-Chía, A.M.: On the exponential cardinality of FDS for the ordered \(p\)-median problem. Oper. Res. Lett. **33**(6), 641–651 (2005)
53. Puerto, J., Rodríguez-Chía, A.M., Tamir, A.: Minimax regret single-facility ordered median location problems on networks. INFORMS J. Comput. **21**(1), 77–87 (2009)
54. Puerto, J., Rodríguez-Chía, A.M., Tamir, A.: Revisiting \(k\)-sum optimization. Math. Program. **165**(2), 579–604 (2017)
55. Puerto, J., Tamir, A.: Locating tree-shaped facilities using the ordered median objective. Math. Program. **102**(2), 313–338 (2005)
56. Rodríguez-Chía, A.M., Puerto, J., Pérez-Brito, D., Moreno, J.A.: The \(p\)-facility ordered median problem on networks. Top **13**(1), 105–126 (2005)
57. Stanimirović, Z., Kratica, J., Dugošija, D.: Genetic algorithms for solving the discrete ordered median problem. Eur. J. Oper. Res. **182**(3), 983–1001 (2007)
58. Tamir, A.: The \(k\)-centrum multi-facility location problem. Discrete Appl. Math. **109**(3), 293–307 (2001)
59. Tamir, A., Pérez-Brito, D., Moreno-Pérez, J.A.: A polynomial algorithm for the \(p\)-centdian problem on a tree. Networks **32**(4), 255–262 (1998)

## Acknowledgements

We would like to thank Mathematical Programming’s review team for a host of technical and editorial comments that have helped us improve the presentation of our results. In particular, we are grateful to an anonymous reviewer, who proposed a direct and elegant way to derive Theorem 3, replacing a more involved proof that appeared in an early version of this paper.

## Appendices

### Inapproximability of the surrogate problem

In what follows, we prove that the surrogate model is strongly inapproximable. Here, we are given a finite metric space (*V*, *d*) on *n* vertices, a non-decreasing left-continuous step-function \(\lambda : [0,\infty ) \rightarrow \mathbb {R}^+\), and an integer parameter *k*. For any set of facilities \({{\mathcal {F}}} \subseteq V\), the distance \(d(v,{{\mathcal {F}}})\) of each vertex \(v \in V\) to its nearest facility is penalized by a multiplicative weight of \(\lambda (d(v, {{\mathcal {F}}}))\). The objective is to compute a set \({{\mathcal {F}}} \subseteq V\) of at most *k* facilities that minimizes the surrogate function \(\xi ({{\mathcal {F}}}) = \sum _{v \in V} \lambda (d(v,{{\mathcal {F}}})) \cdot d(v,{{\mathcal {F}}})\).
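As an illustration of this definition, the following minimal sketch evaluates the surrogate function for a given facility set. The function name `surrogate_cost`, the 3-point path metric, and the step penalty used in the example are assumptions made purely for illustration; they do not appear in the paper.

```python
from typing import Callable, Iterable, Sequence


def surrogate_cost(dist: Sequence[Sequence[float]],
                   facilities: Iterable[int],
                   lam: Callable[[float], float]) -> float:
    """Evaluate xi(F) = sum over v of lam(d(v, F)) * d(v, F)."""
    fac = list(facilities)
    total = 0.0
    for v in range(len(dist)):
        d_v = min(dist[v][f] for f in fac)  # distance to the nearest facility
        total += lam(d_v) * d_v
    return total


# Hypothetical 3-point path metric 0 - 1 - 2 with unit-length edges.
dist = [[0, 1, 2],
        [1, 0, 1],
        [2, 1, 0]]

# Hypothetical non-decreasing step penalty: weight 1 below 1.5, weight 10 otherwise.
lam = lambda d: 1.0 if d < 1.5 else 10.0

cost_middle = surrogate_cost(dist, [1], lam)  # both endpoints are at distance 1
cost_end = surrogate_cost(dist, [0], lam)     # vertex 2 is at distance 2
```

In this toy instance, opening the middle vertex is far cheaper than opening an endpoint, because the single vertex at distance 2 falls into the heavily penalized range of the step function.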

### Theorem 5

Unless \(\mathrm {NP} \subseteq \mathrm {TIME}(n^{O(\log \log n)})\), there are constants \({\delta \in (0,1)}\) and \(C_{\mathrm {sg}} > 0\) such that the surrogate model cannot be approximated in polynomial time within factor \(C_{\mathrm {sg}} n^{\delta } \ln n\).

### Proof

To establish the claim, we describe a gap-preserving reduction from the dominating set problem. Given an undirected graph \(G = (V,E)\), a subset of vertices \(D \subseteq V\) is a dominating set if every vertex not in *D* is adjacent to at least one member of *D*. The objective is to compute a minimum-cardinality dominating set. We utilize a well-known inapproximability result of Feige [24], stating that the set cover problem cannot be efficiently approximated within factor \((1-\epsilon )\cdot \ln n\) for any \(\epsilon \in (0,1)\), unless \(\mathrm {NP} \subseteq \mathrm {TIME}(n^{O(\log \log n)})\). When this result is translated to dominating set terms, including the precise parameters involved in Feige’s construction, one can infer that there exists a constant \(\delta \in (0,1)\) such that it is hard to distinguish between graphs with \(\gamma (G) \le n^{\delta }\) and those with \(\gamma (G) \ge n^{\delta }\cdot (1-\epsilon )\cdot \ln n\) under the same complexity assumption, where \(\gamma (G)\) stands for the minimum cardinality of a dominating set in *G*.

Now, given an instance of the dominating set problem, consisting of a graph \(G = (V,E)\) on *n* vertices, we define a corresponding instance of the surrogate model as follows:

- The underlying metric (*V*, *d*) on the same set of vertices is obtained by defining the distance function:

  $$\begin{aligned} d(u,v) = \left\{ \begin{array}{ll} 1, &{}\quad \text {if } (u,v)\in E\\ 2, &{}\quad \text {if } (u,v) \notin E. \end{array} \right. \end{aligned}$$

- The penalty function \(\lambda :[0,\infty ) \rightarrow \mathbb {R}^+\) is given by

  $$\begin{aligned} \lambda (d) = \left\{ \begin{array}{ll} 1,&{} \quad \text {if } d\in [0,1.5)\\ n,&{} \quad \text {if } d\in [1.5,\infty ). \end{array} \right. \end{aligned}$$

- The number of facilities to be located is at most \(k = n^{\delta }\).
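The construction above can be sketched in a few lines of code. This is only an illustrative sketch: the helper names `reduction_instance` and `xi`, and the 5-vertex star graph used as input, are assumptions that are not part of the paper. The example checks that a dominating set \({{\mathcal {F}}}\) has surrogate cost \(n - |{{\mathcal {F}}}|\), whereas a non-dominating set pays the penalty *n* for every vertex left at distance 2.

```python
from typing import Iterable, List, Set, Tuple


def reduction_instance(n: int, edges: Set[Tuple[int, int]]):
    """Build the metric and penalty function of the reduction for a graph on 0..n-1."""
    adj = set(edges) | {(v, u) for u, v in edges}  # treat edges as undirected
    dist: List[List[int]] = [
        [0 if u == v else (1 if (u, v) in adj else 2) for v in range(n)]
        for u in range(n)
    ]
    lam = lambda d: 1 if d < 1.5 else n  # step penalty of the reduction
    return dist, lam


def xi(dist, lam, facilities: Iterable[int]) -> float:
    """Surrogate cost: sum of lam(d(v, F)) * d(v, F) over all vertices."""
    fac = list(facilities)
    return sum(lam(d) * d for d in (min(row[f] for f in fac) for row in dist))


# Hypothetical input: a star graph on 5 vertices with center 0.
n = 5
edges = {(0, i) for i in range(1, n)}
dist, lam = reduction_instance(n, edges)

cost_dominating = xi(dist, lam, [0])  # {0} is a dominating set
cost_leaf = xi(dist, lam, [1])        # leaves 2, 3, 4 are at distance 2 from leaf 1
```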

We first argue that \(\gamma (G) \le n^{\delta }\) implies the existence of a feasible facility set \({{\mathcal {F}}}\) with \(\xi ( {{\mathcal {F}}} ) \le n\). Indeed, by picking \({{\mathcal {F}}}\) as a minimum-cardinality dominating set, every remaining vertex is within distance 1 of its nearest facility in \({{\mathcal {F}}}\), and therefore, \(\xi ({{\mathcal {F}}}) = n - |{{\mathcal {F}}}| \le n\). Conversely, when \(\gamma (G) \ge n^{\delta }\cdot (1-\epsilon )\cdot \ln n\), any set of at most \(n^{\delta }\) facilities leaves at least \(n^{\delta } \cdot (1-\epsilon )\cdot \ln n - n^{\delta } - 1\) vertices whose distance to their nearest facility is 2. Otherwise, by adding these vertices to the chosen set of facilities, we would have obtained a dominating set of size smaller than \(n^{\delta } \cdot (1-\epsilon )\cdot \ln n\). As a result, for the optimal set of facilities \({{\mathcal {F}}}^*\), we must have \(\xi ({{\mathcal {F}}}^*) \ge (n^{\delta }\cdot (1-\epsilon )\cdot \ln n - n^{\delta } - 1)\cdot n\).

To summarize, it follows that unless \(\mathrm {NP} \subseteq \mathrm {TIME}(n^{O(\log \log n)})\), the surrogate model cannot be approximated in polynomial time within factor \(n^{\delta }\cdot (1-\epsilon )\cdot \ln n - n^{\delta } - 1 \ge \frac{1-\epsilon }{2}\cdot n^{\delta } \ln n\), for sufficiently large *n*. \(\square \)

It is worth mentioning that our reduction creates instances of the surrogate model where the ratio between the maximum and minimum values of \(\lambda _{\mathrm {sg}}\) is *O*(*n*), similar to the instances created by the algorithm we present in Sect. 3.1.

### Tight example for tree networks

In what follows, we show that the analysis conducted for proving Theorem 3 is essentially tight when the underlying metric is induced by a tree, and the objective is to minimize the surrogate cost \(\xi \). Specifically, given an accuracy parameter \(\epsilon \in (0,1/4)\), we construct a family of instances, indexed by an integer parameter \(n \ge 25\), that matches our approximation bound up to lower-order terms, i.e., \(\lim _{n\rightarrow \infty }\psi ( {{\mathcal {F}}}_{ \mathrm {sg}}(n) )/\psi ( {{\mathcal {F}}}^*(n) ) = 2 - O(\epsilon )\). Here, \({{\mathcal {F}}}^*(n)\) is an optimal set of facilities for the ordered *k*-median problem, and \({{\mathcal {F}}}_{ \mathrm {sg}}(n)\) is an optimal surrogate solution.

We begin by constructing in Sect. B.1 a family of worst-possible trees for the original cost function \(\psi \). Next, we show in Sect. B.2 that \(\psi ( {{\mathcal {F}}}^*(n) )= (1 + o(1))\cdot n^2\), and prove in Sect. B.3 that \(\psi ( {{\mathcal {F}}}_{ \mathrm {sg}}(n) ) = (2-2\epsilon + o(1))\cdot n^2\).

### Instance construction

*Graph description* The tree \({{\mathcal {T}}}(n)\) consists of three components, shown in Fig. 3:

- *Core vertex.* We first introduce the core vertex *C*, which is connected to an auxiliary vertex *A* by an edge with distance \(d(C,A) = 1 - \epsilon \).
- *Stars.* The core vertex *C* is connected to *n* distinct stars, indexed by \(i \in [n]\). Each star is formed by a center \(c_i\), connected to *n* vertices \(m^i_1, \ldots , m^i_n\), to which we refer as nearby neighbors. Here, \(d(c_i,C) = \epsilon \), and \(d(c_i,m^i_j) = 1-\epsilon \) for every neighbor index \(j \in [n]\).
- *Remote vertices.* Finally, each center \(c_i\) is connected to a (distinct) remote vertex \(R_i\) by an edge with \(d(c_i, R_i) = (1-\epsilon )\cdot n\).

It is easy to verify that the tree \({{\mathcal {T}}}(n)\) consists of \(n^2 + 2n + 2\) vertices.

*Instance parameters* To finalize the construction, we fix the allowed number of facilities to \(k = n\). In addition, the penalty weights are picked such that the top \(n^2 + 1\) values are \(\lambda _1 = \cdots =\lambda _{n^2 + 1} = 1\), and the \(2n+ 1\) remaining weights are \(\lambda _{n^2+2} = \cdots =\lambda _{n^2 + 2n + 2} = \epsilon /n^2\).
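To make the construction concrete, here is a small sketch that builds \({{\mathcal {T}}}(n)\) as a weighted edge list together with the penalty weights. The vertex labels (`"C"`, `"A"`, `"c1"`, `"R1"`, `"m1_1"`, and so on), the function names, and the sample values \(n = 25\), \(\epsilon = 0.1\) are illustrative assumptions only.

```python
def build_tree(n: int, eps: float):
    """Return (vertices, edges) of T(n); edges maps a vertex pair to its length."""
    vertices = ["C", "A"]
    edges = {("C", "A"): 1 - eps}            # core vertex to auxiliary vertex
    for i in range(1, n + 1):
        c, r = f"c{i}", f"R{i}"
        vertices += [c, r]
        edges[("C", c)] = eps                # core vertex to star center
        edges[(c, r)] = (1 - eps) * n        # star center to its remote vertex
        for j in range(1, n + 1):
            m = f"m{i}_{j}"
            vertices.append(m)
            edges[(c, m)] = 1 - eps          # star center to a nearby neighbor
    return vertices, edges


def penalty_weights(n: int, eps: float):
    """Top n^2 + 1 weights equal 1; the remaining 2n + 1 weights equal eps / n^2."""
    return [1.0] * (n * n + 1) + [eps / n**2] * (2 * n + 1)


n, eps = 25, 0.1
vertices, edges = build_tree(n, eps)
weights = penalty_weights(n, eps)
```

The vertex count, the tree property (one edge fewer than vertices), and the fact that there is exactly one penalty weight per vertex can all be checked directly on the returned objects.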

### Upper bounding \(\psi ({{\mathcal {F}}}^*(n))\)

We begin by characterizing the set of optimal solutions to the instance constructed above.

### Lemma 6

Any optimal set of facilities for the ordered *k*-median problem on \({{\mathcal {T}}}(n)\) is comprised of the core vertex *C* and \(n-1\) remote vertices.

### Proof

To arrive at a contradiction, suppose that \({{\mathcal {F}}}^*\) is an optimal solution that opens at most \(n-2\) facilities at the remote vertices \(R_1, \ldots , R_n\). Since all pairwise distances are positive, \({{\mathcal {F}}}^*\) necessarily consists of *n* facilities. As a result, \({{\mathcal {F}}}^*\) contains at least two facilities at non-remote vertices. We now create a modified solution \(\tilde{{{\mathcal {F}}}}\), where one of these non-remote facilities is relocated at the core vertex *C*, and another non-remote facility is relocated to a free remote vertex, that was not holding a facility in \({{\mathcal {F}}}^*\). In the next claim, whose proof is given in Appendix C.3, we show that the objective value of \(\tilde{{\mathcal {F}}}\) is strictly smaller than that of \({{\mathcal {F}}}^*\), contradicting the optimality of \({{\mathcal {F}}}^*\).

### Claim 2

\(\psi (\tilde{{\mathcal {F}}}) < \psi ({{\mathcal {F}}}^*)\).

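Consequently, \({{\mathcal {F}}}^*\) opens at least \(n-1\) facilities at the remote vertices \(R_1,\ldots ,R_n\), and it remains to show that one facility is necessarily located at the core vertex *C*. For this purpose, we first observe that the sequence of ordered distances for solutions of the latter type is given by

$$\begin{aligned} \underbrace{(1-\epsilon )\cdot n + \epsilon }_{\text {free remote vertex}}, \quad \underbrace{1, \ldots , 1}_{n^2 \text { nearby neighbors}}, \quad \underbrace{1-\epsilon }_{\text {auxiliary vertex}}, \quad \underbrace{\epsilon , \ldots , \epsilon }_{n \text { centers}}, \quad \underbrace{0, \ldots , 0}_{n \text { facilities}} \end{aligned}\tag{20}$$

meaning that their cost is

$$\begin{aligned} \varphi = (1-\epsilon )\cdot n + \epsilon + n^2 + \frac{\epsilon }{n^2}\cdot \left( (1-\epsilon ) + \epsilon \cdot n \right) . \end{aligned}$$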

To complete the proof, this quantity should be compared to the cost of other candidate solutions, in which \(n-1\) facilities are located at remote vertices and the remaining facility is located at either: (1) The auxiliary vertex *A*; (2) One of the center vertices \(c_i\); (3) One of the nearby neighbors \(m^i_j\); or (4) The remaining remote vertex. It is easy to compute the objective value of such solutions and to show that each is strictly greater than \(\varphi \) when \(n \ge 25\) and \(\epsilon \in (0,1/4)\). \(\square \)

It follows that the cost of the optimal solution is \(\psi ({{\mathcal {F}}}^*(n)) = \varphi = (1+ o(1))\cdot n^2\).

### Lower bounding \(\psi ({{\mathcal {F}}}_\mathrm {sg}(n))\)

We now describe the surrogate cost function \(\xi \), arising from the construction described in Sects. 3.1 and 3.2. Given the structure of optimal solutions as stated in Lemma 6, any such solution forms the sequence of ordered distances described by (20). Since \(\lambda _1 = \cdots =\lambda _{n^2 + 1} = 1\) and \(\lambda _{n^2+2} = \cdots =\lambda _{n^2 + 2n + 2} = \epsilon /n^2\), we have \({\lambda }_{\mathrm {sg}}((1-\epsilon )\cdot n + \epsilon ) = 1\), \({\lambda }_{\mathrm {sg}}(1) = 1\), \({\lambda }_{\mathrm {sg}}(1-\epsilon ) = \epsilon /n^2\), \({\lambda }_{\mathrm {sg}}(\epsilon ) = \epsilon /n^2\), and \({\lambda }_{\mathrm {sg}}(d) = \epsilon /n^2\) for any \(d \le \epsilon /n\). Indeed, the preprocessing step of Sect. 3.1 does not modify the original penalty weights (as \(i_{\min } = n^2 + n + 3\)), since both the ratios between extremal penalty weights (\(n^2/\epsilon \)) and between extremal positive distances (\(((1-\epsilon )\cdot n + \epsilon )/\epsilon \)) are smaller than \(|V({{\mathcal {T}}}(n))|/\epsilon = (n^2 + 2n + 2)/\epsilon \). We can now proceed by characterizing the optimal surrogate solution.

### Lemma 7

The surrogate problem on \({{\mathcal {T}}}(n)\) has a unique optimal solution, consisting of the *n* star centers, i.e., \({{\mathcal {F}}}_{\mathrm {sg}}(n) = \{c_1,\ldots , c_n\}\).

### Proof

To arrive at a contradiction, suppose that \({{\mathcal {F}}}\) is an optimal set of facilities for the surrogate problem that opens at most \(n-1\) facilities at the centers \(c_1,\ldots ,c_n\). We construct a new solution \(\tilde{{{\mathcal {F}}}}\) by picking one facility \(f \in {{\mathcal {F}}}\), chosen among those in \({{\mathcal {F}}} {\setminus }\{c_i: i \in [n]\}\) as explained below, and relocating it to a free star center, i.e., one chosen out of \( \{c_i: i \in [n]\} {\setminus } {{\mathcal {F}}}\). The proof proceeds by considering three cases.

*Case 1:* \({{\mathcal {F}}}\cap \{R_1,\ldots ,R_n\} \ne \emptyset \). In this case, at least one star does not contain any facility in \({{\mathcal {F}}}\). Since the construction of \({{\mathcal {T}}}(n)\) is symmetric, we assume without loss of generality that the corresponding star has index 1. Consequently, *f* is arbitrarily picked as one of the remote vertices \({{\mathcal {F}}}\cap \{R_1,\ldots ,R_n\}\), and relocated to the free center \(c_1\). The next claim, which compares the surrogate costs of \(\tilde{{{\mathcal {F}}}}\) and \({{\mathcal {F}}}\), is proven in Appendix C.4.

### Claim 3

\(\xi (\tilde{{\mathcal {F}}}) \le \xi ({{{\mathcal {F}}}}) - \epsilon n + 3\epsilon \).

*Case 2:* \({{\mathcal {F}}}\cap \{R_1,\ldots ,R_n\} = \emptyset \) *and there exist* \(i_1\) *and* *j* *such that* \(m^{i_1}_j \in {{\mathcal {F}}}\). In this case, there necessarily exists a star index \(i_2\) (potentially equal to \(i_1\)) containing at most one facility, whose center \(c_{i_2}\) is free. Here, we create \(\tilde{{\mathcal {F}}}\) by picking \(f = m^{i_1}_j\) and relocating it to \(c_{i_2}\). The next claim is proven in Appendix C.5.

### Claim 4

\(\xi (\tilde{{\mathcal {F}}}) \le \xi ({{{\mathcal {F}}}}) -(n-1)\cdot (1 - \frac{\epsilon ^2}{n}) + 1+\epsilon \).

*Case 3:* \({{\mathcal {F}}} \subseteq \{C,A,c_1,\ldots ,c_n\}\). In this case, every star either holds a single facility at its center, or does not contain any facility. Given that facilities are either placed at the center vertices or at two other locations, namely the core vertex *C* and the auxiliary vertex *A*, and the optimal solution makes use of *n* facilities, there are at least \(n-2\) stars holding a facility at their center. We now pick *f* as a vertex in \({{\mathcal {F}}}\) that does not correspond to the center of a star (i.e., either the core vertex *C* or the auxiliary vertex *A*, since \({{\mathcal {F}}}\) has at most \(n-1\) facilities in the centers \(\{c_1,\ldots ,c_n\}\)), and relocate it to a free center \(c_{i_1}\). The next claim is proven in Appendix C.6.

### Claim 5

\(\xi (\tilde{{\mathcal {F}}}) \le \xi ({{{\mathcal {F}}}}) -\frac{n}{2} + 3\).

To summarize, it is easy to verify in each of the above cases that \(\xi (\tilde{{\mathcal {F}}}) < \xi ({{{\mathcal {F}}}})\) when \(n \ge 25\) and \(\epsilon \in (0,1/4)\), contradicting the optimality of \({{\mathcal {F}}}\). \(\square \)
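The final verification step can also be checked numerically. The sketch below evaluates the right-hand-side slack of Claims 3, 4, and 5 (that is, the bound on \(\xi (\tilde{{\mathcal {F}}}) - \xi ({{\mathcal {F}}})\)) over a sample grid. The function names and the particular grid of values are arbitrary choices made for illustration, and a grid check is of course no substitute for the analytic argument.

```python
def claim_bounds(n: int, eps: float):
    """Upper bounds on xi(F~) - xi(F) stated in Claims 3, 4 and 5."""
    return (
        -eps * n + 3 * eps,                      # Claim 3
        -(n - 1) * (1 - eps**2 / n) + 1 + eps,   # Claim 4
        -n / 2 + 3,                              # Claim 5
    )


def all_negative(n_values, eps_values) -> bool:
    """True if every bound is strictly negative on the given grid."""
    return all(
        bound < 0
        for n in n_values
        for eps in eps_values
        for bound in claim_bounds(n, eps)
    )


# Sample grid inside the regime n >= 25 and eps in (0, 1/4).
grid_ok = all_negative(range(25, 500), [0.001, 0.05, 0.1, 0.2, 0.249])
```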

Based on Lemma 7, we can now compute the distances associated with \({{\mathcal {F}}}_{\mathrm {sg}}(n)\):

- *Auxiliary and core vertices.* The core vertex *C* is at distance \(\epsilon \) from each of the facilities \(c_1,\ldots ,c_n\), while the auxiliary vertex *A* is at distance 1.
- *Stars.* In each star, the nearby neighbor vertices are at distance \(1-\epsilon \) from their nearest facility, located at the center, while the centers are at distance 0.
- *Remote vertices.* Each remote vertex is connected to the center of a star, and thus its nearest facility is at distance \((1-\epsilon )\cdot n\).

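By arranging these distances in non-increasing order and multiplying by the penalty weights, we obtain:

$$\begin{aligned} \psi ( {{\mathcal {F}}}_{\mathrm {sg}}(n) )= & {} (1-\epsilon )\cdot n^2 + 1 + (1-\epsilon )\cdot (n^2 - n) + \frac{\epsilon }{n^2}\cdot \left( (1-\epsilon )\cdot n + \epsilon \right) \\= & {} (2-2\epsilon + o(1))\cdot n^2. \end{aligned}$$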
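As a sanity check on the resulting ratio, the following sketch evaluates the ordered median cost directly from the multisets of coverage distances, without building the tree. The distance lists are derived from the construction under the solution structures of Lemmas 6 and 7; the function names and the sample values of *n* and \(\epsilon \) are illustrative assumptions.

```python
def ordered_median_cost(distances, weights):
    """Pair the largest distance with weights[0], the next with weights[1], etc."""
    ordered = sorted(distances, reverse=True)
    return sum(w * d for w, d in zip(weights, ordered))


def tight_example_ratio(n: int, eps: float) -> float:
    """Ratio psi(F_sg(n)) / psi(F*(n)) on the tree T(n)."""
    weights = [1.0] * (n * n + 1) + [eps / n**2] * (2 * n + 1)
    # Facilities at the core vertex C and at n - 1 remote vertices (Lemma 6).
    d_opt = ([(1 - eps) * n + eps]   # the single free remote vertex
             + [1.0] * (n * n)       # nearby neighbors, served through C
             + [1 - eps]             # auxiliary vertex A
             + [eps] * n             # star centers
             + [0.0] * n)            # the facilities themselves
    # Facilities at the n star centers (Lemma 7).
    d_sg = ([(1 - eps) * n] * n      # remote vertices
            + [1.0]                  # auxiliary vertex A
            + [1 - eps] * (n * n)    # nearby neighbors
            + [eps]                  # core vertex C
            + [0.0] * n)             # the facilities themselves
    return ordered_median_cost(d_sg, weights) / ordered_median_cost(d_opt, weights)


ratio = tight_example_ratio(200, 0.1)
```

For moderately large *n* the computed ratio already sits close to \(2 - 2\epsilon \), in line with the asymptotic statement.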

### Additional proofs

### Proof of Lemma 1

Since only the penalty weights of rankings \(i \ge i_{\min }\) are modified, we have

where the second inequality holds by observing that for any ranking \(i \ge i_{\min }\) we have \({\tilde{\lambda }}_i \le \frac{\epsilon \cdot \lambda _1}{n}\) or \(\varDelta _{{{\mathcal {F}}}^*}(i) \le \frac{\epsilon \cdot \varDelta _{{{\mathcal {F}}}^*}(1)}{n}\), while \(\lambda _i \le \lambda _1\) and \(\varDelta _{{{\mathcal {F}}}^*}(i) \le \varDelta _{{{\mathcal {F}}}^*}(1)\).

### Proof of Lemma 4

Let \(p \ge p'\) be the unique integers for which \(d \in {{\mathcal {D}}}^{\alpha }_p\) and \(d' \in {{\mathcal {D}}}^{\alpha }_{p'}\). We consider three cases, depending on the relation between *p* and \(p'\).

\(\varvec{p'\le p-2}\). In this case,

$$\begin{aligned} \frac{{\tilde{\lambda }}_{\mathrm {sg},\alpha }(d')}{{\tilde{\lambda }}_{\mathrm {sg},\alpha }(d)} \le \frac{\lambda _1}{\lambda _n} \le n \le n^{ 2 (d'/d-1)}, \end{aligned}$$

where the last inequality holds since \(d'/d \ge (\min {{\mathcal {D}}}^{\alpha }_{p'})/(\max {{\mathcal {D}}}^{\alpha }_{p}) = 2^{p-p'-1} \ge 2\).

\(\varvec{p' = p-1}\). In this case, letting \(d = \gamma \cdot \delta _{p}/2\) and \(d' = \gamma ' \cdot \delta _{p-1}/2\), we obtain

$$\begin{aligned} \frac{{\tilde{\lambda }}_{\mathrm {sg},\alpha }(d')}{{\tilde{\lambda }}_{\mathrm {sg},\alpha }(d)}= & {} \eta _p^{2-\gamma } \cdot \eta _{p-1}^{\gamma ' - 1} \\\le & {} n^{(\gamma ' -1 + \gamma \cdot (2/\gamma -1) )} \\\le & {} n^{ 2 (\gamma ' -1 + 2/\gamma -1) } \\\le & {} n^{ 2(2\gamma '/\gamma -1) } \\= & {} n^{2 (d'/d - 1)}. \end{aligned}$$

Here, the first inequality is due to \(\max \{\eta _p,\eta _{p-1}\} \le n\), and the second inequality holds since \(\gamma \le 2\). The third inequality proceeds from observing that \(\gamma ' \ge 1\) and \(2/\gamma \ge 1\), thus \(\gamma '\cdot (2/\gamma ) -1 \ge (\gamma ' - 1) + (2/\gamma - 1)\). Finally, the last equality holds since \(d'/d = (\gamma '\cdot \delta _{p-1})/(\gamma \cdot \delta _p) = 2\gamma '/\gamma \).

\(\varvec{p'=p}\). Letting \(d = \gamma \cdot \delta _p/2\) and \(d' = \gamma ' \cdot \delta _p/2\), we have

$$\begin{aligned} \frac{{\tilde{\lambda }}_{\mathrm {sg},\alpha }(d')}{{\tilde{\lambda }}_{\mathrm {sg},\alpha }(d)} = \eta _p^{\gamma ' - \gamma } = \eta _p^{\gamma \cdot (d'/d -1)} \le n^{2(d'/d - 1)}, \end{aligned}$$

where the last inequality holds since \(\eta _p \le n\) and \(\gamma \le 2\).

### Proof of Claim 2

To analyze the effects of this transformation, we bound the variation of the marginal cost due to each vertex \(v\in V\), given by \(\lambda _{i_2}\cdot d(v,\tilde{{\mathcal {F}}}) - \lambda _{i_1}\cdot d(v,{{\mathcal {F}}}^*)\) when *v* occupies the rankings \(i_1\) in \({{\mathcal {F}}}^*\) and \(i_2\) in \(\tilde{{\mathcal {F}}}\). Specifically, the cost terms are broken down according to the three components of the tree \({{\mathcal {T}}}(n)\):

- *Core vertex and auxiliary vertex.* Since the core vertex *C* is now holding a facility in \(\tilde{{\mathcal {F}}}\), and the auxiliary vertex *A* is at distance \(1 -\epsilon \), the variation of the cost due to these vertices is clearly upper-bounded by \(\lambda _1 \cdot (1-\epsilon ) = 1-\epsilon \).
- *Remote vertices.* Note that the free remote vertices in \({{\mathcal {F}}}^*\) have distances at least \((1-\epsilon )\cdot n > 2\) to their nearest facility, and therefore necessarily occupy rankings within \(1,\ldots ,n\), since all other non-remote vertices are within distance 2 of any non-remote facility in \({{\mathcal {F}}}^*\). As a result, their corresponding penalty weights are 1. Hence, an upper bound on the cost variation due to all remote vertices is given by \( 2\epsilon - (1-\epsilon )\cdot n\). Indeed, our relocation procedure increases the distance of at most two remote vertices, each by at most \(\epsilon \), and reduces to 0 the distance of at least one remote vertex (holding a new facility), incurring a cost variation of \(-(1-\epsilon )\cdot n\).
- *Stars.* Our transformation relocates at most two facilities, and in addition, \(\tilde{{{\mathcal {F}}}}\) holds a facility at the core vertex *C*, which is nearer to any star than any vertex contained in another star. Consequently, there are at most two distinct stars where the distance between a vertex and its nearest facility may increase. Within each such star, since the core vertex *C* holds a facility in \({\tilde{{\mathcal {F}}}}\), the distance of the two (non-remote) vertices made vacant by our transformation would increase by at most 1, while the distance of all other vertices increases by at most \(\epsilon \). In addition, there are at most 3 vertices in the stars that could have a larger penalty weight in \({\tilde{{\mathcal {F}}}}\) than in \({{\mathcal {F}}}^*\), since the only vertices outside of the star graphs with a potentially improved ranking are: the core vertex *C*, the auxiliary vertex *A*, and the remote vertex chosen for the relocation. As a result, the variation of the cost due to the stars is upper bounded by \( 2\epsilon n + 5\).

Overall, we obtain that \(\psi (\tilde{{\mathcal {F}}}) - \psi ({{\mathcal {F}}}^*) \le -(1-3\epsilon ) n + 6 + \epsilon \), which is clearly negative for \(n \ge 25\) and \(\epsilon \in (0,1/4)\).
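For completeness, the arithmetic behind this bound can be spelled out as a worked derivation. It simply adds the three upper bounds stated above for the core and auxiliary vertices, the remote vertices, and the stars, and then uses \(\epsilon < 1/4\) and \(n \ge 25\):

```latex
\begin{aligned}
\psi(\tilde{\mathcal{F}}) - \psi(\mathcal{F}^*)
  &\le \underbrace{(1-\epsilon)}_{\text{core and auxiliary}}
     + \underbrace{2\epsilon - (1-\epsilon)\cdot n}_{\text{remote vertices}}
     + \underbrace{2\epsilon n + 5}_{\text{stars}} \\
  &= -(1-3\epsilon)\cdot n + 6 + \epsilon \\
  &< -\tfrac{1}{4}\cdot n + \tfrac{25}{4}
   \;\le\; 0 .
\end{aligned}
```

The strict inequality uses \(1 - 3\epsilon > 1/4\) and \(6 + \epsilon < 25/4\), both of which follow from \(\epsilon < 1/4\); the final inequality uses \(n \ge 25\).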

### Proof of Claim 3

To analyze the effects of this transformation, we distinguish between remote and non-remote vertices:

- *Remote vertices.* Note that the surrogate cost terms due to remote vertices increase by at most \( (1-\epsilon )\cdot n + 2\epsilon \). Indeed, the distance of *f* to its nearest facility in \(\tilde{{\mathcal {F}}}\) is at most \(d(f,c_1) = (1-\epsilon )\cdot n + 2\epsilon \), whereas the distance of any other remote vertex can only decrease.
- *Non-remote vertices.* The distance of any non-remote vertex to its nearest facility can only decrease following this relocation procedure, since \(c_1\) is closer than the remote vertex *f* to any non-remote vertex. In particular, the distance of all nearby neighbors \(m^1_1,\ldots , m^1_n\) of \(c_1\), which was previously at least 1 since this star did not contain any facility in \({{\mathcal {F}}}\), is now \(1-\epsilon \). Since \(\lambda _{\mathrm {sg}}(1-\epsilon ) = \epsilon /n^2\) and \(\lambda _{\mathrm {sg}}(1) = 1\), the surrogate cost terms due to non-remote vertices decrease by at least \(n\cdot (1 -\epsilon (1-\epsilon )/n^2)\).

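Overall, the surrogate cost variation is bounded by

$$\begin{aligned} \xi (\tilde{{\mathcal {F}}}) - \xi ({{\mathcal {F}}})\le & {} (1-\epsilon )\cdot n + 2\epsilon - n\cdot \left( 1 - \frac{\epsilon (1-\epsilon )}{n^2} \right) \\= & {} -\epsilon n + 2\epsilon + \frac{\epsilon (1-\epsilon )}{n} \\\le & {} -\epsilon n + 3\epsilon . \end{aligned}$$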

### Proof of Claim 4

Note that, since \(\epsilon \in (0,1/4)\), this relocation may only decrease the distance of all vertices to their nearest facility, except for \(m^{i_1}_j\), as their distance to \(c_{i_2}\) is smaller than that to \(m^{i_1}_j\). In particular, since the star \(i_2\) originally has at most one facility, the distance of at least \(n-1\) neighbors of \(c_{i_2}\) decreases from 1 to \(1-\epsilon \). On the other hand, the only increase in distance can be for \(m^{i_1}_j\); however, we have \(d(m^{i_1}_j,c_{i_2}) \le d(m^{i_1}_j,c_{i_1}) + d(c_{i_1},c_{i_2}) \le 1+\epsilon \). Hence, the surrogate cost variation is bounded by

$$\begin{aligned} \xi (\tilde{{\mathcal {F}}}) - \xi ({{\mathcal {F}}}) \le -(n-1)\cdot \left( 1 - \frac{\epsilon ^2}{n} \right) + 1+\epsilon . \end{aligned}$$

### Proof of Claim 5

Note that the surrogate cost terms due to the star \(i_1\) decrease by at least

On the other hand, the surrogate cost terms due to the auxiliary vertex *A* and the core vertex *C* may increase by at most 2. In addition, there is at most one additional star \(i_2\), not holding any facility, that incurs a surrogate cost variation of at most \(\epsilon (n+1)\). Indeed, the distance of its center \(c_{i_2}\) could increase by at most \(\epsilon \), since \(d(c_{i_2},f) \ge \epsilon \) and \(d(c_{i_2},c_{i_1}) = 2\epsilon \), and the distance of its neighbors \(m^{i_2}_1,\ldots ,m^{i_2}_n\) increases by at most \(\epsilon \) as well (by the case hypothesis, the neighbor vertices do not hold a facility in \({{\mathcal {F}}}\)). Finally, since \({{\mathcal {F}}}\) contains at least \(n-2\) star centers, there are at most two remote vertices that can be affected by this transformation, leading to a surrogate cost variation of at most \(2\epsilon \). Overall,

$$\begin{aligned} \xi (\tilde{{\mathcal {F}}}) - \xi ({{\mathcal {F}}}) \le -\frac{n}{2} + 3. \end{aligned}$$

## About this article

### Cite this article

Aouad, A., Segev, D. The ordered *k*-median problem: surrogate models and approximation algorithms. *Math. Program.* **177**, 55–83 (2019). https://doi.org/10.1007/s10107-018-1259-3

### Keywords

- Location theory
- Approximation algorithms
- Surrogate model
- Local search

### Mathematics Subject Classification

- 68W25 Approximation Algorithms
- 90B80 Discrete location and assignment