Datasets

Tiny bundled corpora and fixtures for docs and tests.

keyflux.datasets

The data lives inline as Python dicts, so it is always importable in doctests without package-data configuration or `importlib.resources` machinery.

load_demo_pair()

Return the bundled (focus, reference) demo corpus pair.

A tiny climate-discourse focus corpus paired with a finance-discourse reference corpus. The two share common function words plus a couple of lockword-like overlaps, that is, words with similar relative frequencies in both corpora.

Returns:

| Type | Description |
| --- | --- |
| `tuple[Counter[str], Counter[str]]` | `(focus, reference)` frequency Counters. |

Examples:

>>> focus, reference = load_demo_pair()
>>> focus["climate"], reference["market"]
(42, 40)
Source code in keyflux/datasets/__init__.py
def load_demo_pair() -> tuple[Counter[str], Counter[str]]:
    """Return the bundled (focus, reference) demo corpus pair.

    A tiny climate-discourse focus corpus versus a finance-discourse reference
    corpus, with shared function words and a couple of lockword-like overlaps.

    Returns:
        ``(focus, reference)`` frequency Counters.

    Examples:
        >>> focus, reference = load_demo_pair()
        >>> focus["climate"], reference["market"]
        (42, 40)
    """
    return Counter(_DEMO_FOCUS), Counter(_DEMO_REFERENCE)
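
The lockword-like overlaps in a `(focus, reference)` pair can be spotted by comparing relative frequencies. This is a minimal sketch assuming the common corpus-linguistics reading of a lockword (a word whose relative frequency is similar in both corpora); the function `lockword_candidates`, its `max_ratio` threshold, and the toy counts are hypothetical and not part of keyflux.

```python
from collections import Counter


def lockword_candidates(
    focus: Counter[str],
    reference: Counter[str],
    max_ratio: float = 1.5,
) -> list[str]:
    """Shared words whose relative frequencies differ by at most max_ratio-fold."""
    # Normalize by corpus size: raw counts are not comparable across corpora.
    focus_total = sum(focus.values())
    reference_total = sum(reference.values())
    candidates = []
    # Only words attested in both corpora can be lockwords.
    for word in sorted(focus.keys() & reference.keys()):
        if focus[word] <= 0 or reference[word] <= 0:
            continue  # Counters may hold zero or negative entries; skip them
        ratio = (focus[word] / focus_total) / (reference[word] / reference_total)
        if 1 / max_ratio <= ratio <= max_ratio:
            candidates.append(word)
    return candidates


# Toy counts for illustration only (not the bundled demo data).
focus = Counter({"the": 30, "of": 12, "carbon": 9, "risk": 6})
reference = Counter({"the": 28, "of": 11, "equity": 8, "risk": 1})
print(lockword_candidates(focus, reference))  # → ['of', 'the']
```

With the bundled data, the two Counters returned by `load_demo_pair()` would be passed in place of the toy counts; "risk" is excluded above because it is about five times more frequent, relatively, in the toy focus corpus.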

load_jkbren_example()

Return the jkbren rank-turbulence-divergence regression pair.

Two ranked lists over the same seven elements. Their rank-turbulence divergence at `alpha=1.0` is `0.45924793111057804`, the regression anchor taken from the reference implementation.
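
For orientation, the rank-turbulence divergence of Dodds et al. is usually written as below, where $r_{\tau,i}$ is the rank of element $\tau$ in list $i$ and $\mathcal{N}_{1,2;\alpha}$ is a normalization constant chosen so that identical rankings give 0 and fully disjoint ones give 1. This is our reading of the published definition; keyflux's `rtd` may differ in details such as tie handling or how the normalization is computed.

```latex
D^{R}_{\alpha}(R_1 \parallel R_2)
  = \frac{1}{\mathcal{N}_{1,2;\alpha}} \, \frac{\alpha + 1}{\alpha}
    \sum_{\tau} \left| \frac{1}{r_{\tau,1}^{\alpha}} - \frac{1}{r_{\tau,2}^{\alpha}} \right|^{1/(\alpha + 1)}
```

At $\alpha = 1$ the prefactor is 2 and each element contributes $\sqrt{\lvert 1/r_{\tau,1} - 1/r_{\tau,2} \rvert}$, so rank changes near the top of either list dominate the sum.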

Returns:

| Type | Description |
| --- | --- |
| `tuple[RankedList, RankedList]` | `(list1, list2)` as `keyflux.ranking.rankedlist.RankedList` instances. |

Examples:

>>> from keyflux.divergence import rtd
>>> r1, r2 = load_jkbren_example()
>>> round(rtd(r1, r2, alpha=1.0).divergence, 6)
0.459248
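
To see which elements drive a divergence, the per-element terms of the rank-turbulence sum can be computed directly: each element contributes (α+1)/α · |r₁^−α − r₂^−α|^(1/(α+1)). The helper `rtd_contributions` below is hypothetical and a sketch under stated assumptions: it omits the normalization factor of the full divergence (so its summed terms will not in general equal a normalized value such as 0.459248), and it takes plain `{element: rank}` dicts rather than `RankedList` objects.

```python
def rtd_contributions(
    ranks1: dict[str, float],
    ranks2: dict[str, float],
    alpha: float = 1.0,
) -> dict[str, float]:
    """Per-element, unnormalized rank-turbulence terms (hypothetical helper).

    Both maps must cover the same elements; elements missing from one list
    are out of scope for this sketch.
    """
    if alpha <= 0:
        # The alpha -> 0 limit needs separate treatment; not handled here.
        raise ValueError("alpha must be positive")
    if ranks1.keys() != ranks2.keys():
        raise ValueError("both rank maps must cover the same elements")
    prefactor = (alpha + 1) / alpha
    return {
        element: prefactor
        * abs(ranks1[element] ** -alpha - ranks2[element] ** -alpha) ** (1 / (alpha + 1))
        for element in ranks1
    }


terms = rtd_contributions({"a": 1, "b": 2, "c": 3}, {"a": 2, "b": 1, "c": 3})
# 'a' and 'b' swap the top two ranks and contribute equally; 'c' contributes 0.0.
print(sorted(terms.items(), key=lambda kv: -kv[1]))
```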
Source code in keyflux/datasets/__init__.py
def load_jkbren_example() -> tuple[RankedList, RankedList]:
    """Return the jkbren rank-turbulence-divergence regression pair.

    Two ranked lists over the same seven elements. Their rank-turbulence
    divergence at ``alpha=1.0`` is ``0.45924793111057804`` — the regression
    anchor from the reference implementation.

    Returns:
        ``(list1, list2)`` as :class:`keyflux.ranking.rankedlist.RankedList`.

    Examples:
        >>> from keyflux.divergence import rtd
        >>> r1, r2 = load_jkbren_example()
        >>> round(rtd(r1, r2, alpha=1.0).divergence, 6)
        0.459248
    """
    from keyflux.ranking.rankedlist import RankedList

    return (
        RankedList.from_counts(_JKBREN_FOCUS, label="system 1"),
        RankedList.from_counts(_JKBREN_REFERENCE, label="system 2"),
    )
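
`RankedList.from_counts` builds each list from raw counts, and this page does not say how it ranks tied counts. Rank-turbulence divergence is commonly computed with tied elements sharing their average (fractional) rank; the hypothetical helper `average_ranks` below is a minimal sketch of that convention, not keyflux's implementation.

```python
from collections import Counter
from itertools import groupby


def average_ranks(counts: Counter[str]) -> dict[str, float]:
    """Map each element to its 1-based rank, giving tied counts their mean rank."""
    # Highest count first; alphabetical order within a tie is only for determinism.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    ranks: dict[str, float] = {}
    position = 0  # number of elements already ranked
    for _, group in groupby(ordered, key=lambda item: item[1]):
        members = [word for word, _ in group]
        first, last = position + 1, position + len(members)
        shared_rank = (first + last) / 2  # mean of the positions the tie occupies
        for word in members:
            ranks[word] = shared_rank
        position = last
    return ranks


print(average_ranks(Counter({"a": 5, "b": 3, "c": 3, "d": 1})))
# → {'a': 1.0, 'b': 2.5, 'c': 2.5, 'd': 4.0}
```

Averaging keeps the rank total equal to 1 + 2 + … + n, so a tie neither inflates nor deflates the overall ranking.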