 max planck institut
informatik # NetworkPrioritizer - Documentation

## Definitions

This section lists the general definitions of centrality measures, ranking distances, and rank aggregation algorithms that are implemented in NetworkPrioritizer.
Hint: The formulas on this page are written in LaTeX and rendered by MathJax. Please make sure Javascript is enabled in your browser and allow scripts from rackcdn.com to be executed. In case the formulas are not displayed correctly, you should be able to copy the displayed LaTeX code.

### Centrality Measures

Given a network $$G=(V,E)$$ consisting of a node set $$V$$ and an edge set $$E$$, the centrality of node $$v \in V$$ can be determined by the following measures:

The degree centrality ($$DC(v)$$) quantifies the direct connectivity of $$v$$. In unweighted networks, it is defined as the number of neighbors. In weighted networks $$DC(v)$$ is the sum of the weights of the edges between $$v$$ and its neighbours.

The shortest path betweenness of $$v$$ ($$SPB(v)$$) is the fraction of shortest paths between all pairs of nodes $$s, t \in V$$ that pass through $$v$$. $$SPB(v)$$ measures the influence $$v$$ has on the directed information flow in the network. Formally, $SPB(v) = \frac{2}{(|V|-1)(|V|-2)}\sum_{s,t\in V,s\neq t}\frac{\sigma_{st}(v)}{\sigma_{st}}$ where $$\sigma_{st}$$ is the number of shortest paths from $$s$$ to $$t$$ and $$\sigma_{st}(v)$$ is the number of shortest paths from $$s$$ to $$t$$ that pass through $$v$$.

The shortest path closeness of $$v$$ ($$SPC(v)$$) is defined as the inverted average distance of $$v$$ to all other nodes. This measure quantifies how fast information can spread from $$v$$ through the rest of the network in a directed manner. Formally, $SPC(v) = \frac{|V|-1}{\sum\limits_{u\neq v\in V}d_{SP}(v,u)}$ with distance function $$d_{SP}$$.

The random walk betweenness of $$v$$ ($$RWB(v)$$) quantifies how often $$v$$ is expected to be visited on a random walk between two node $$s,t \in V$$, averaged over all pairs of nodes in the network. This determines how much influence $$v$$ has on information randomly spreading through the network and be calculated as $RWB(v) =\frac{\sum\limits_{(s,t)\in V\times V}VF_{st}(v)}{|V|(|V|-1)}$ where $$VF_{st}(v)$$ is the expected visit frequency of $$v$$ on a random walk from $$s$$ to $$t$$.

The random walk transmitter closeness of $$v$$ ($$RWTC(v)$$) is a measure of how fast $$v$$ can spread randomly traveling information in the network. $$RWTC(v)$$ can be calculated as $RWTC(v) = \frac{\sum\limits_{t\neq v\in V}HT_v(t)}{|V|-1}$ where $$HT_v(t)$$ is the expected number of time steps until $$t$$ is reached for the first time by a random walk starting at $$v$$.

The random walk receiver closeness of $$v$$ ($$RWRC(v)$$) measures how fast $$v$$ can pick up information that travels randomly through the network. Formally, $RWRC(v) = \frac{\sum\limits_{s\neq v\in V}HT_s(v)}{|V|-1}$.

### Ranking Distances

Ranking distances are used to compare rankings and evaluate aggregations. In the general information retrieval setting, an aggregation is considered good if it minimizes some measure of distance between itself and the primary rankings that are aggregated. NetworkPrioritizer offers ranking distances also as a potential guideline for how to aggregate primary rankings. For example, consider the case that all primary rankings except the degree centrality ranking are similar. If there is no biological reason for one to be interested in proteins that have many direct interactions, it would then be advisable to put less weight on the degree centrality ranking or even exclude it completely from the aggregation.
NetworkPrioritizer provides two prominent measures for the distance between two rankings. Both of them are generally applicable and thus not restricted to rankings of genes or proteins. In the following, consider two rankings $$\phi$$ and $$\tau$$, both of which rank all $$v \in V_c$$, where $$V_c$$ is the set of nodes that are ranked (i.e. all nodes in the graph but the seed nodes). Let $$\phi(v)$$ and $$\tau(v)$$ denote the rank of $$v$$ in $$\phi$$ and $$\tau$$, respectively.

The Spearman footrule distance is the sum of absolute rank differences over all ranked nodes. It is calculated as $SFD(\phi,\tau)=\sum_{v \in V_c}|\phi(v)-\tau(v)|$

The Kendall tau distance counts the number of pairwise disagreements between two rankings. Formally, $KTD(\phi,\tau)=|\{(u,v) \in V_c\times V_c\,|\,\phi(u)<\phi(v) \textrm{ but } \tau(u)>\tau(v)\}|$

In NetworkPrioritizer, both distances are normalized by their maximal value so that $$SFD(v) \in [0,1]$$ and $$KTD(v) \in [0,1]$$.

### Rank Aggregation Algorithms

A rank aggregation algorithm is used to merge a set of primary rankings $$\Phi$$ of $$V_c$$ (the nodes to rank, i.e. the candidate disease genes) into one final ranking $$\tau$$. NetworkPrioritizer features three different rank aggregation algorithms. In case two or more nodes are tied for the same rank in the aggregated ranking, NetworkPrioritizer either assigns the same rank to all of them or ranks them in arbitrary order. This depends on the user's choice in the NetworkPrioritizer preferences that new rankings are non-injective or injective, respectively.

The Weighted Borda Fuse algorithm (WBF) performs an aggregation considering the weighted mean rank of a node in all primary rankings. This means that, if a node is ranked high in many primary rankings, it will also be ranked high in the aggregated ranking. This algorithm does not consider the actual scores of the nodes that underlie the primary rankings. It can thus be used to aggregate rankings that are based on scores that are not normalized to be on the same scale. Formally, given weights $$w(\phi), \forall \phi\in\Phi$$, WBF calculates for each $$v \in V_c$$ a new score $$sc(v)=\sum_{\phi\in\Phi}|V_c|+1-w(\phi)\phi(v)$$ and then ranks the nodes in decreasing order of $$sc$$. The lower $$sc$$, the higher the rank.

The Weighted AddScore Fuse algorithm (WASF) ranks nodes in the aggregation according to the weighted sum of their scores in the primary rankings. If the scores underlying the primary rankings are on comparable scales, this allows for a more accurate aggregation than the Weighted Borda Fuse algorithm. Formally, given weights $$w(\phi),\forall\phi\in\Phi$$, WASF calculates for each $$v \in V_c$$ a new score $$sc(v)=\sum_{\phi\in\Phi}w(\phi)\hat{\phi}(v)$$, where $$\hat{\phi}(v)$$ is the score of $$v$$ in $$\phi$$, and then ranks the nodes in increasing order of $$sc$$.

The MaxRank Fuse algorithm (MRF) assigns each node the highest rank it has achieved in any primary ranking. This means that, if $$v$$ achieved a high rank in any primary ranking, it will also be on a high rank in the aggregated ranking. MRF is an useful algorithm if one is interested in nodes that are very important with regard to some centrality measure but do not have to attain high values with regard to all measures.