Skip to content

Two views of one comparison

A keyword/lockword table and an allotaxonograph are two views of the same thing. This is the unifying idea behind keyflux:

  • Lockwords are the diagonal. Types at comparable rank in both corpora sit on the equality line of the rank-rank histogram — the stable centre.
  • Keywords are the off-diagonal high-divergence contributors. Types that moved sit away from the diagonal and dominate the rank-turbulence-divergence contribution list.

The categorical (corpus-linguistics) view and the continuous (RTD) view describe one comparison from two angles.

Set up one comparison

from keyflux import Keyness, RankedList, rtd, allotaxonograph
from keyflux.datasets import load_demo_pair

focus, reference = load_demo_pair()

# Categorical view: keywords and lockwords
k = Keyness(focus, reference, min_focus_freq=1, min_reference_freq=1)
keywords = {r.type for r in k.keywords()}
lockwords = {r.type for r in k.lockwords()}

# Continuous view: rank-turbulence divergence over the same counts
r1 = RankedList.from_counts(focus, label="climate")
r2 = RankedList.from_counts(reference, label="finance")
result = rtd(r1, r2, alpha=1/3)

Lockwords sit near the diagonal

Lockwords have near-equal ranks in both lists, so their rank-rank points cluster on the centre line and they contribute almost nothing to the divergence:

contribution = {c.type: c.contribution for c in result.contributions}
sorted(lockwords, key=lambda t: contribution.get(t, 0.0))[:3]
# the lowest-contribution types are lockwords — they barely moved

Keywords are the off-diagonal contributors

The top divergence contributors are exactly the keywords — the types whose ranks diverge between the two corpora:

top_contributors = {c.type for c in result.contributions[:8]}
top_contributors & keywords        # substantial overlap
top_contributors & lockwords       # empty — lockwords don't drive the shift

See both at once

The allotaxonograph shows both views in one figure: the diagonal of the left panel is the lockword zone, and the bars of the right panel are the keyword contributions.

fig = allotaxonograph(r1, r2, alpha=1/3, labels=("climate", "finance"))
fig.savefig("two_views.png")

Reading the picture: function words such as the, of, and and sit at the top centre (shallow depth, on the diagonal) — the lockwords. climate, carbon, and market, stock fan out to the left and right edges — the positive and negative keywords — and they are the bars that dominate the contribution panel.