Skip to content

Rank-turbulence divergence

The rank-sensitive comparison at the centre of keyflux.

keyflux.divergence.rtd.rtd(list1, list2, *, alpha=1.0 / 3.0, normalize=True)

Compute the rank-turbulence divergence between two ranked lists.

Parameters:

Name Type Description Default
list1 RankedList

The first ranked list.

required
list2 RankedList

The second ranked list.

required
alpha float

Tuning parameter (>= 0). Small values surface rare, low-rank churn; large values surface common-word shifts. alpha == 0 uses the logarithmic limit. The default 1/3 is the Dodds et al. recommendation for text.

1.0 / 3.0
normalize bool

If True, return divergence scaled to [0, 1]; the raw field always holds the un-normalised sum.

True

Returns:

Name Type Description
An RTDResult

class:RTDResult with the scalar divergence and the sorted per-type

RTDResult

contributions, each tagged with its shift direction.

Raises:

Type Description
ValueError

If alpha is negative, or either list is empty.

Contract
  • rtd(x, x).divergence == 0 (a list never diverges from itself).
  • 0 <= divergence <= 1 for every input and every alpha.
  • Symmetric: rtd(a, b).divergence == rtd(b, a).divergence.
  • The per-type contributions sum to divergence.
  • Exclusives (present in one list only) are placed at a tied-last rank.

Examples:

>>> from keyflux.ranking.rankedlist import RankedList
>>> r1 = RankedList.from_counts({"a": 20, "e": 14, "c": 8, "b": 7,
...                              "f": 4, "g": 2, "d": 1})
>>> r2 = RankedList.from_counts({"b": 24, "a": 16, "e": 5, "d": 4,
...                              "c": 3, "f": 2, "g": 1})
>>> round(rtd(r1, r2, alpha=1.0).divergence, 6)
0.459248
>>> rtd(r1, r1, alpha=1.0).divergence
0.0
Source code in keyflux/divergence/rtd.py
def rtd(
    list1: RankedList,
    list2: RankedList,
    *,
    alpha: float = 1.0 / 3.0,
    normalize: bool = True,
) -> RTDResult:
    """Compute the rank-turbulence divergence between two ranked lists.

    Args:
        list1: The first ranked list.
        list2: The second ranked list.
        alpha: Tuning parameter (``>= 0``). Small values surface rare, low-rank
            churn; large values surface common-word shifts. ``alpha == 0`` uses
            the logarithmic limit. The default ``1/3`` is the Dodds et al.
            recommendation for text.
        normalize: If True, return ``divergence`` scaled to [0, 1]; the ``raw``
            field always holds the un-normalised sum.

    Returns:
        An :class:`RTDResult` with the scalar divergence and the sorted per-type
        contributions, each tagged with its shift direction.

    Raises:
        ValueError: If ``alpha`` is negative, or either list is empty.

    Contract:
        - ``rtd(x, x).divergence == 0`` (a list never diverges from itself).
        - ``0 <= divergence <= 1`` for every input and every ``alpha``.
        - Symmetric: ``rtd(a, b).divergence == rtd(b, a).divergence``.
        - The per-type contributions sum to ``divergence``.
        - Exclusives (present in one list only) are placed at a tied-last rank.

    Examples:
        >>> from keyflux.ranking.rankedlist import RankedList
        >>> r1 = RankedList.from_counts({"a": 20, "e": 14, "c": 8, "b": 7,
        ...                              "f": 4, "g": 2, "d": 1})
        >>> r2 = RankedList.from_counts({"b": 24, "a": 16, "e": 5, "d": 4,
        ...                              "c": 3, "f": 2, "g": 1})
        >>> round(rtd(r1, r2, alpha=1.0).divergence, 6)
        0.459248
        >>> rtd(r1, r1, alpha=1.0).divergence
        0.0
    """
    if alpha < 0:
        msg = f"alpha must be non-negative, got {alpha}."
        raise ValueError(msg)
    if not len(list1) or not len(list2):
        msg = "Both ranked lists must be non-empty."
        raise ValueError(msg)

    types, ranks1, ranks2 = list1.aligned(list2)
    n1 = len(list1)
    n2 = len(list2)

    if alpha < _ALPHA_ZERO_TOL:
        deltas, norm = _elements_log_limit(ranks1, ranks2, n1, n2)
        prefactor = 1.0
    else:
        deltas, norm = _elements_general(ranks1, ranks2, n1, n2, alpha)
        prefactor = (alpha + 1.0) / alpha

    raw = prefactor * math.fsum(deltas)
    normalizer = prefactor * norm
    divergence = raw / normalizer if normalize and normalizer > 0 else raw
    if normalize and normalizer <= 0:
        divergence = 0.0

    share_denom = normalizer if normalize and normalizer > 0 else 1.0
    contributions = tuple(
        sorted(
            (
                Contribution(
                    type=t,
                    delta=d,
                    contribution=prefactor * d / share_denom,
                    rank1=r1,
                    rank2=r2,
                    direction=_direction(r1, r2),
                )
                for t, d, r1, r2 in zip(types, deltas, ranks1, ranks2, strict=True)
            ),
            key=lambda c: c.contribution,
            reverse=True,
        )
    )
    return RTDResult(
        divergence=divergence,
        raw=raw,
        alpha=alpha,
        contributions=contributions,
        labels=(list1.label, list2.label),
    )

keyflux.divergence.rtd.RTDResult dataclass

The result of a rank-turbulence divergence computation.

Attributes:

Name Type Description
divergence float

The normalised divergence in [0, 1].

raw float

The un-normalised weighted sum (matches the jkbren reference at alpha=1.0 when normalize=False).

alpha float

The tuning parameter used.

contributions tuple[Contribution, ...]

Per-type contributions, sorted by contribution descending.

labels tuple[str, str]

The two list labels.

Source code in keyflux/divergence/rtd.py
@dataclass(frozen=True, slots=True)
class RTDResult:
    """The result of a rank-turbulence divergence computation.

    Attributes:
        divergence: The normalised divergence in [0, 1].
        raw: The un-normalised weighted sum (matches the jkbren reference at
            ``alpha=1.0`` when ``normalize=False``).
        alpha: The tuning parameter used.
        contributions: Per-type contributions, sorted by contribution descending.
        labels: The two list labels.
    """

    divergence: float
    raw: float
    alpha: float
    contributions: tuple[Contribution, ...]
    labels: tuple[str, str]

keyflux.divergence.rtd.Contribution dataclass

One type's contribution to the total divergence.

Attributes:

Name Type Description
type str

The type.

delta float

The raw per-type divergence term (before normalisation).

contribution float

This type's additive share of divergence (the shares sum to the total divergence).

rank1 float

The type's rank in the first list (tied-last if absent).

rank2 float

The type's rank in the second list (tied-last if absent).

direction ShiftDirection

Which list the type leans toward (lower rank wins).

Source code in keyflux/divergence/rtd.py
@dataclass(frozen=True, slots=True)
class Contribution:
    """One type's contribution to the total divergence.

    Attributes:
        type: The type.
        delta: The raw per-type divergence term (before normalisation).
        contribution: This type's additive share of ``divergence`` (the shares
            sum to the total divergence).
        rank1: The type's rank in the first list (tied-last if absent).
        rank2: The type's rank in the second list (tied-last if absent).
        direction: Which list the type leans toward (lower rank wins).
    """

    type: str
    delta: float
    contribution: float
    rank1: float
    rank2: float
    direction: ShiftDirection