Skip to content

Troubleshooting

ValueError: Both corpora must be non-empty to compute keyness

Keyness needs both Counters to have at least one token. Check that your tokenisation produced counts, and that the focus and reference corpora are not swapped with an empty placeholder.

My keyword list is empty

The most common cause is the minimum-frequency cutoff. On small corpora the defaults (min_focus_freq=5, min_reference_freq=5) can exclude everything. Lower them:

Keyness(focus, reference, min_focus_freq=1, min_reference_freq=1)

A type enters the scored table only when it clears the minimum in at least one corpus, so raising the cutoffs is also how you exclude under-evidenced absent words.

Too many keywords

On large corpora the log-likelihood flags far too many types as significant — this is expected (Brezina 2018). Rank by effect size and take the top N:

k.keywords(top=50)        # significance-filtered, sorted by |log ratio|

My lockword list is empty

Lockwords must be frequent in both corpora (min_freq_both, default 5) and near parity (max_abs_log_ratio, default 0.5). On small corpora, lower min_freq_both:

k.lockwords(min_freq_both=2)

Simple Maths does not match a textbook value

keyflux computes Simple Maths from true relative frequencies per million words (formula 3.6 in Brezina). Some worked examples plug in absolute frequencies directly, which is only valid when both corpora are about a million tokens. For the war example, keyflux returns 1.94 (correct per-million) where a textbook may print 1.96 (absolute-frequency shortcut).

rtd returns 0 for two different corpora

rtd(x, x) is 0 by definition. If two genuinely different lists return 0, check that you did not build both RankedLists from the same Counter.

The allotaxonograph window does not appear

allotaxonograph never calls show() — it returns a Figure. In a script, save it (fig.savefig(...)); in Jupyter, make it the last expression in a cell so the inline backend renders it.

Figures fail to render on a headless server

keyflux builds figures with matplotlib's Figure API and does not require an interactive backend, so it works headless out of the box. If you also import matplotlib.pyplot elsewhere, set a non-interactive backend first:

import matplotlib
matplotlib.use("Agg")

Real linguistic tokenisation

keyflux ships only a simple word-character tokeniser, deliberately — it is about keyness and rank comparison, not tokenisation. For lemmatisation, multi-word handling, or non-English text, pre-tokenise (for example with kenon.Tokenizer) and pass the resulting Counter to Keyness and RankedList.from_counts.