Rank-turbulence divergence¶

The rank-sensitive comparison at the centre of keyflux.

`keyflux.divergence.rtd.rtd(list1, list2, *, alpha=1.0 / 3.0, normalize=True)` ¶

Compute the rank-turbulence divergence between two ranked lists.

Parameters:

Name	Type	Description	Default
`list1`	`RankedList`	The first ranked list.	required
`list2`	`RankedList`	The second ranked list.	required
`alpha`	`float`	Tuning parameter (`>= 0`). Small values surface rare, low-rank churn; large values surface common-word shifts. `alpha == 0` uses the logarithmic limit. The default `1/3` is the Dodds et al. recommendation for text.	`1.0 / 3.0`
`normalize`	`bool`	If True, return `divergence` scaled to [0, 1]; the `raw` field always holds the un-normalised sum.	`True`

Returns:

Name	Type	Description
`An`	`RTDResult`	class:`RTDResult` with the scalar divergence and the sorted per-type
	`RTDResult`	contributions, each tagged with its shift direction.

Raises:

Type	Description
`ValueError`	If `alpha` is negative, or either list is empty.

Contract

rtd(x, x).divergence == 0 (a list never diverges from itself).
0 <= divergence <= 1 for every input and every alpha.
Symmetric: rtd(a, b).divergence == rtd(b, a).divergence.
The per-type contributions sum to divergence.
Exclusives (present in one list only) are placed at a tied-last rank.

Examples:

>>> from keyflux.ranking.rankedlist import RankedList
>>> r1 = RankedList.from_counts({"a": 20, "e": 14, "c": 8, "b": 7,
...                              "f": 4, "g": 2, "d": 1})
>>> r2 = RankedList.from_counts({"b": 24, "a": 16, "e": 5, "d": 4,
...                              "c": 3, "f": 2, "g": 1})
>>> round(rtd(r1, r2, alpha=1.0).divergence, 6)
0.459248
>>> rtd(r1, r1, alpha=1.0).divergence
0.0

Source code in keyflux/divergence/rtd.py

def rtd(
    list1: RankedList,
    list2: RankedList,
    *,
    alpha: float = 1.0 / 3.0,
    normalize: bool = True,
) -> RTDResult:
    """Compute the rank-turbulence divergence between two ranked lists.

    Args:
        list1: The first ranked list.
        list2: The second ranked list.
        alpha: Tuning parameter (``>= 0``). Small values surface rare, low-rank
            churn; large values surface common-word shifts. ``alpha == 0`` uses
            the logarithmic limit. The default ``1/3`` is the Dodds et al.
            recommendation for text.
        normalize: If True, return ``divergence`` scaled to [0, 1]; the ``raw``
            field always holds the un-normalised sum.

    Returns:
        An :class:`RTDResult` with the scalar divergence and the sorted per-type
        contributions, each tagged with its shift direction.

    Raises:
        ValueError: If ``alpha`` is negative, or either list is empty.

    Contract:
        - ``rtd(x, x).divergence == 0`` (a list never diverges from itself).
        - ``0 <= divergence <= 1`` for every input and every ``alpha``.
        - Symmetric: ``rtd(a, b).divergence == rtd(b, a).divergence``.
        - The per-type contributions sum to ``divergence``.
        - Exclusives (present in one list only) are placed at a tied-last rank.

    Examples:
        >>> from keyflux.ranking.rankedlist import RankedList
        >>> r1 = RankedList.from_counts({"a": 20, "e": 14, "c": 8, "b": 7,
        ...                              "f": 4, "g": 2, "d": 1})
        >>> r2 = RankedList.from_counts({"b": 24, "a": 16, "e": 5, "d": 4,
        ...                              "c": 3, "f": 2, "g": 1})
        >>> round(rtd(r1, r2, alpha=1.0).divergence, 6)
        0.459248
        >>> rtd(r1, r1, alpha=1.0).divergence
        0.0
    """
    if alpha < 0:
        msg = f"alpha must be non-negative, got {alpha}."
        raise ValueError(msg)
    if not len(list1) or not len(list2):
        msg = "Both ranked lists must be non-empty."
        raise ValueError(msg)

    types, ranks1, ranks2 = list1.aligned(list2)
    n1 = len(list1)
    n2 = len(list2)

    if alpha < _ALPHA_ZERO_TOL:
        deltas, norm = _elements_log_limit(ranks1, ranks2, n1, n2)
        prefactor = 1.0
    else:
        deltas, norm = _elements_general(ranks1, ranks2, n1, n2, alpha)
        prefactor = (alpha + 1.0) / alpha

    raw = prefactor * math.fsum(deltas)
    normalizer = prefactor * norm
    divergence = raw / normalizer if normalize and normalizer > 0 else raw
    if normalize and normalizer <= 0:
        divergence = 0.0

    share_denom = normalizer if normalize and normalizer > 0 else 1.0
    contributions = tuple(
        sorted(
            (
                Contribution(
                    type=t,
                    delta=d,
                    contribution=prefactor * d / share_denom,
                    rank1=r1,
                    rank2=r2,
                    direction=_direction(r1, r2),
                )
                for t, d, r1, r2 in zip(types, deltas, ranks1, ranks2, strict=True)
            ),
            key=lambda c: c.contribution,
            reverse=True,
        )
    )
    return RTDResult(
        divergence=divergence,
        raw=raw,
        alpha=alpha,
        contributions=contributions,
        labels=(list1.label, list2.label),
    )

`keyflux.divergence.rtd.RTDResult` `dataclass` ¶

The result of a rank-turbulence divergence computation.

Attributes:

Name	Type	Description
`divergence`	`float`	The normalised divergence in [0, 1].
`raw`	`float`	The un-normalised weighted sum (matches the jkbren reference at `alpha=1.0` when `normalize=False`).
`alpha`	`float`	The tuning parameter used.
`contributions`	`tuple[Contribution, ...]`	Per-type contributions, sorted by contribution descending.
`labels`	`tuple[str, str]`	The two list labels.

Source code in keyflux/divergence/rtd.py

@dataclass(frozen=True, slots=True)
class RTDResult:
    """The result of a rank-turbulence divergence computation.

    Attributes:
        divergence: The normalised divergence in [0, 1].
        raw: The un-normalised weighted sum (matches the jkbren reference at
            ``alpha=1.0`` when ``normalize=False``).
        alpha: The tuning parameter used.
        contributions: Per-type contributions, sorted by contribution descending.
        labels: The two list labels.
    """

    divergence: float
    raw: float
    alpha: float
    contributions: tuple[Contribution, ...]
    labels: tuple[str, str]

`keyflux.divergence.rtd.Contribution` `dataclass` ¶

One type's contribution to the total divergence.

Attributes:

Name	Type	Description
`type`	`str`	The type.
`delta`	`float`	The raw per-type divergence term (before normalisation).
`contribution`	`float`	This type's additive share of `divergence` (the shares sum to the total divergence).
`rank1`	`float`	The type's rank in the first list (tied-last if absent).
`rank2`	`float`	The type's rank in the second list (tied-last if absent).
`direction`	`ShiftDirection`	Which list the type leans toward (lower rank wins).

Source code in keyflux/divergence/rtd.py

@dataclass(frozen=True, slots=True)
class Contribution:
    """One type's contribution to the total divergence.

    Attributes:
        type: The type.
        delta: The raw per-type divergence term (before normalisation).
        contribution: This type's additive share of ``divergence`` (the shares
            sum to the total divergence).
        rank1: The type's rank in the first list (tied-last if absent).
        rank2: The type's rank in the second list (tied-last if absent).
        direction: Which list the type leans toward (lower rank wins).
    """

    type: str
    delta: float
    contribution: float
    rank1: float
    rank2: float
    direction: ShiftDirection

Rank-turbulence divergence¶

keyflux.divergence.rtd.rtd(list1, list2, *, alpha=1.0 / 3.0, normalize=True) ¶

keyflux.divergence.rtd.RTDResult dataclass ¶

keyflux.divergence.rtd.Contribution dataclass ¶

`keyflux.divergence.rtd.rtd(list1, list2, *, alpha=1.0 / 3.0, normalize=True)` ¶

`keyflux.divergence.rtd.RTDResult` `dataclass` ¶

`keyflux.divergence.rtd.Contribution` `dataclass` ¶