Datasets
Tiny bundled corpora and fixtures for docs and tests.
keyflux.datasets
The data lives inline as Python dicts so it is always importable in doctests
with no package-data or importlib.resources machinery.
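As a rough illustration of the inline-data pattern described above, the sketch below keeps word counts in module-level dict literals and wraps them in a loader. The names `_FOCUS_COUNTS`, `_REFERENCE_COUNTS`, and `load_pair_sketch`, along with every count, are invented for this sketch and are not the actual contents of keyflux.datasets.

```python
from collections import Counter

# Hypothetical inline data: plain dict literals, so importing the module
# needs no package-data files and no importlib.resources lookups.
_FOCUS_COUNTS = {"the": 12, "climate": 7, "carbon": 5, "risk": 2}
_REFERENCE_COUNTS = {"the": 14, "market": 6, "bond": 4, "risk": 2}


def load_pair_sketch() -> tuple[Counter[str], Counter[str]]:
    """Return fresh (focus, reference) counters built from the inline dicts."""
    # Building new Counters on each call keeps callers from mutating the
    # module-level data by accident.
    return Counter(_FOCUS_COUNTS), Counter(_REFERENCE_COUNTS)


focus, reference = load_pair_sketch()
print(focus.most_common(2))
```

Because the data is ordinary Python, a doctest can call such a loader directly without any setup step.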
load_demo_pair()
Return the bundled (focus, reference) demo corpus pair.
A tiny climate-discourse focus corpus versus a finance-discourse reference corpus, with shared function words and a couple of lockword-like overlaps.
Returns:
| Type | Description |
|---|---|
| `tuple[Counter[str], Counter[str]]` | The bundled (focus, reference) demo corpus pair. |
Examples:
Source code in keyflux/datasets/__init__.py
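To make the shape of such a (focus, reference) pair concrete, here is a minimal sketch that finds shared words and flags the ones with nearly equal relative frequency. The counts are hypothetical stand-ins, not the bundled data. Reading "lockword-like" as "shared words whose relative frequencies nearly match" is this sketch's interpretation, and the 10% threshold is arbitrary.

```python
from collections import Counter

# Hypothetical stand-in counts; not the bundled keyflux data.
focus = Counter({"the": 12, "of": 3, "climate": 7, "carbon": 5, "risk": 2})
reference = Counter({"the": 12, "of": 6, "market": 6, "bond": 4, "risk": 2})


def relative_freq(counts: Counter) -> dict:
    """Map each word to its share of the corpus total."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


focus_rel = relative_freq(focus)
reference_rel = relative_freq(reference)

# Words present in both corpora.
shared = sorted(set(focus) & set(reference))

# Flag a shared word as lockword-like when its two relative frequencies
# differ by at most 10% of the larger one (threshold chosen for this sketch).
lockword_like = [
    w for w in shared
    if abs(focus_rel[w] - reference_rel[w])
    <= 0.10 * max(focus_rel[w], reference_rel[w])
]

print(shared)         # ['of', 'risk', 'the']
print(lockword_like)  # ['risk', 'the']
```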
load_jkbren_example()
Return the jkbren rank-turbulence-divergence regression pair.
Two ranked lists over the same seven elements. Their rank-turbulence
divergence at alpha=1.0 is 0.45924793111057804, which serves as the
regression anchor taken from the reference implementation.
Returns:
| Type | Description |
|---|---|
| `tuple[RankedList, RankedList]` | The two ranked lists that form the regression pair. |
Examples:
>>> from keyflux.datasets import load_jkbren_example
>>> from keyflux.divergence import rtd
>>> r1, r2 = load_jkbren_example()
>>> round(rtd(r1, r2, alpha=1.0).divergence, 6)
0.459248
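For readers who want a feel for what is being computed, the following is a minimal sketch of rank-turbulence divergence as it is usually defined, restricted to two rankings of the same elements. It is not keyflux's `rtd` and has not been checked against the anchor value above. The function name `rtd_sketch`, the dict-based input format, and the lack of tie handling are all assumptions of this sketch.

```python
def rtd_sketch(ranks1: dict, ranks2: dict, alpha: float = 1.0) -> float:
    """Rank-turbulence divergence for two rankings of the same elements.

    ranks1 / ranks2 map element -> rank (1 = top). Assumes alpha > 0,
    identical key sets, and no special handling of ties.
    """
    if set(ranks1) != set(ranks2):
        raise ValueError("both rankings must cover the same elements")
    if alpha <= 0:
        raise ValueError("this sketch only handles alpha > 0")

    n1, n2 = len(ranks1), len(ranks2)
    prefactor = (alpha + 1) / alpha
    power = 1 / (alpha + 1)

    # Per-element contributions: |r1^-alpha - r2^-alpha|^(1/(alpha+1)).
    total = sum(
        abs(ranks1[e] ** -alpha - ranks2[e] ** -alpha) ** power
        for e in ranks1
    )

    # Normalization: the value the sum would take if the two systems shared
    # no elements, with absent elements placed at a tied rank past the end.
    absent_in_2 = (n1 + 0.5 * n2) ** -alpha
    absent_in_1 = (n2 + 0.5 * n1) ** -alpha
    norm = sum(abs(r ** -alpha - absent_in_2) ** power for r in ranks1.values())
    norm += sum(abs(absent_in_1 - r ** -alpha) ** power for r in ranks2.values())

    # The prefactor appears in both numerator and denominator; it is kept
    # explicit to mirror the written form of the definition.
    return (prefactor * total) / (prefactor * norm)


# Hypothetical rankings, unrelated to the bundled seven-element pair.
a = {"x": 1, "y": 2, "z": 3}
b = {"x": 3, "y": 2, "z": 1}
print(rtd_sketch(a, a))  # identical rankings give 0.0
print(rtd_sketch(a, b))
```

Identical rankings contribute zero to every term, so the divergence is exactly 0.0. Any disagreement in ranks produces a positive value.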