Classify
Keyword and lockword categorisation helpers.
keyflux.keyness.classify
Keyword and lockword categorisation (Baker 2011; Brezina Ch. 3).
Each compared type falls into one of three categories; non-significant types too far from parity to count as lockwords are bucketed as "other":
- Positive keyword (+): significantly more frequent in the focus corpus.
- Negative keyword (-): significantly more frequent in the reference corpus.
- Lockword (0): comparable relative frequency in both corpora.
This module owns the categorisation boundary (direction and band thresholds);
the numeric zero-cell flooring lives in `keyflux.keyness.measures`.
Category = Literal['keyword+', 'keyword-', 'lockword', 'other']
module-attribute
The bucket a type falls into under `classify_row`.
classify_direction(focus_rf, reference_rf)
Decide keyness polarity from relative frequencies.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `focus_rf` | `float` | Relative frequency of the type in the focus corpus. | required |
| `reference_rf` | `float` | Relative frequency of the type in the reference corpus. | required |
Returns:
| Type | Description |
|---|---|
| `Direction` | One of `"positive"`, `"negative"`, or `"neutral"`; `"negative"` means the type is more frequent in the reference corpus. |
Contract
- Swapping the two arguments swaps `"positive"` and `"negative"` and leaves `"neutral"` unchanged.
Examples:
>>> classify_direction(0.003, 0.001)
'positive'
>>> classify_direction(0.001, 0.003)
'negative'
>>> classify_direction(0.002, 0.002)
'neutral'
Source code in keyflux/keyness/classify.py
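The swap contract can be checked with a small stand-in. This is a minimal sketch, not the library's code: `direction_sketch` and `SWAP` are hypothetical names, and the sketch assumes a plain exact comparison of the two relative frequencies, whereas the real function may apply a tolerance.

```python
from typing import Literal

Direction = Literal["positive", "negative", "neutral"]


def direction_sketch(focus_rf: float, reference_rf: float) -> Direction:
    """Hypothetical stand-in for classify_direction (exact comparison assumed)."""
    if focus_rf > reference_rf:
        return "positive"
    if focus_rf < reference_rf:
        return "negative"
    return "neutral"


# Polarity expected after the two arguments are swapped.
SWAP = {"positive": "negative", "negative": "positive", "neutral": "neutral"}

pairs = [(0.003, 0.001), (0.001, 0.003), (0.002, 0.002)]
for focus_rf, reference_rf in pairs:
    original = direction_sketch(focus_rf, reference_rf)
    swapped = direction_sketch(reference_rf, focus_rf)
    # Swapping mirrors "positive"/"negative" and leaves "neutral" alone.
    assert swapped == SWAP[original]
```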
classify_row(row, *, min_significance='p05', lockword_max_abs_log_ratio=0.5)
Bucket a keyness row into keyword(+/-), lockword, or other.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `row` | `KeynessRow` | A scored keyness row. | required |
| `min_significance` | `Significance` | The weakest band that counts as a keyword. | `'p05'` |
| `lockword_max_abs_log_ratio` | `float` | A non-significant type is a lockword only if its absolute log ratio is at or below this (relative frequencies near parity). | `0.5` |
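To get a feel for the `lockword_max_abs_log_ratio` ceiling, the sketch below recomputes the log ratio for the two example rows on this page. It assumes the log ratio is the base-2 logarithm of the ratio of the two relative frequencies, which is consistent with the example values (1.2 and 0.01) but is not stated by the source; `log_ratio` is a hypothetical helper, and it does no zero-cell flooring.

```python
import math


def log_ratio(focus_rf: float, reference_rf: float) -> float:
    """Hypothetical helper: base-2 log of the relative-frequency ratio.

    Assumes both inputs are positive; zero-cell handling is out of scope here.
    """
    return math.log2(focus_rf / reference_rf)


# Relative frequencies taken from the example rows ("war" and "the").
war = log_ratio(609.1, 265.0)
the = log_ratio(58848.8, 58519.2)
print(round(war, 2), round(the, 2))  # prints: 1.2 0.01

# Under the base-2 assumption, a ceiling of 0.5 admits frequency ratios
# between 2 ** -0.5 (about 0.71) and 2 ** 0.5 (about 1.41).
ceiling = 0.5
print(abs(the) <= ceiling, abs(war) <= ceiling)  # prints: True False
```

Under that assumption, "the" sits well inside the parity band while "war" is far outside it.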
Returns:
| Type | Description |
|---|---|
| `Category` | One of `"keyword+"`, `"keyword-"`, `"lockword"`, or `"other"` (a non-significant type whose frequencies are too far apart to be stable). |
Contract
- Significant rows are keywords; their sign follows `row.direction`.
- A non-significant row is a lockword when its effect size is small, otherwise `"other"`.
- Frequency cutoffs (minimum evidence in both corpora) are applied by `keyflux.keyness.keyness.Keyness.lockwords`, not here.
Examples:
>>> from keyflux.keyness.keyness import KeynessRow
>>> kw = KeynessRow("war", 620, 267, 609.1, 265.0, 140.87, 1.2,
...                 "p0001", 140.87, "positive")
>>> classify_row(kw)
'keyword+'
>>> lock = KeynessRow("the", 59901, 58960, 58848.8, 58519.2, 1.5, 0.01,
...                   "ns", 1.5, "positive")
>>> classify_row(lock)
'lockword'
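The decision rule in the contract can be sketched without the library. Everything here is a stand-in under stated assumptions: `RowSketch` is a hypothetical row carrying only three fields, the field names `significance` and `log_ratio` are guesses (only `direction` is named by the source), and significant rows are assumed never to be `"neutral"`.

```python
from dataclasses import dataclass

# Band ordering from weakest to strongest, as listed for is_significant.
BANDS = ("ns", "p05", "p01", "p001", "p0001")


@dataclass(frozen=True)
class RowSketch:
    """Hypothetical stand-in holding only the fields the rule needs."""
    significance: str
    log_ratio: float
    direction: str


def classify_row_sketch(row, *, min_significance="p05",
                        lockword_max_abs_log_ratio=0.5):
    # Step 1: a row is significant if its band is at least min_significance.
    if BANDS.index(row.significance) >= BANDS.index(min_significance):
        # Assumption: a significant row is either positive or negative.
        return "keyword+" if row.direction == "positive" else "keyword-"
    # Step 2: non-significant rows near parity are lockwords.
    if abs(row.log_ratio) <= lockword_max_abs_log_ratio:
        return "lockword"
    # Step 3: everything else is the leftover bucket.
    return "other"


print(classify_row_sketch(RowSketch("p0001", 1.2, "positive")))  # keyword+
print(classify_row_sketch(RowSketch("ns", 0.01, "positive")))    # lockword
print(classify_row_sketch(RowSketch("ns", 2.0, "negative")))     # other
```

Note that raising `min_significance` can move a row from a keyword bucket to `"lockword"` or `"other"`, because the effect-size test only runs once the significance test has failed.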
is_significant(significance, min_significance='p05')
Whether a significance band reaches at least min_significance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `significance` | `Significance` | The band to test. | required |
| `min_significance` | `Significance` | The weakest band that still counts as significant. | `'p05'` |
Returns:
| Type | Description |
|---|---|
| `bool` | True if `significance` reaches at least `min_significance`. |
Contract
- `"ns"` is never significant for any `min_significance` above it.
- Monotone in the band ordering ns < p05 < p01 < p001 < p0001.
Examples:
>>> is_significant("p001")
True
>>> is_significant("ns")
False
>>> is_significant("p05", min_significance="p01")
False
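One way to realise the band ordering is a rank lookup. The following is a minimal sketch under that assumption; `BAND_ORDER`, `RANK`, and `is_significant_sketch` are hypothetical names, and the library may implement the comparison differently.

```python
# Bands from weakest to strongest, matching ns < p05 < p01 < p001 < p0001.
BAND_ORDER = ("ns", "p05", "p01", "p001", "p0001")
RANK = {band: position for position, band in enumerate(BAND_ORDER)}


def is_significant_sketch(significance: str, min_significance: str = "p05") -> bool:
    """Hypothetical stand-in: compare bands by their position in the ordering."""
    return RANK[significance] >= RANK[min_significance]


# Monotonicity: for a fixed threshold, once a band passes, every stronger
# band passes too, so the results never go from True back to False.
for threshold in BAND_ORDER:
    results = [is_significant_sketch(band, threshold) for band in BAND_ORDER]
    assert results == sorted(results)
```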
partition(rows, *, min_significance='p05', lockword_max_abs_log_ratio=0.5)
Group rows by `classify_row` category.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `rows` | `Sequence[KeynessRow]` | Scored keyness rows. | required |
| `min_significance` | `Significance` | The weakest band that counts as a keyword. | `'p05'` |
| `lockword_max_abs_log_ratio` | `float` | Lockword effect-size ceiling. | `0.5` |
Returns:
| Type | Description |
|---|---|
| `dict[Category, list[KeynessRow]]` | A dict mapping each category to its rows. Every input row appears in exactly one bucket; absent categories map to an empty list. |
Contract
- The buckets partition the input: disjoint and exhaustive.
- All four category keys are always present.
Examples:
>>> from keyflux.keyness.keyness import KeynessRow
>>> rows = [
...     KeynessRow("war", 620, 267, 609.1, 265.0, 140.87, 1.2,
...                "p0001", 140.87, "positive"),
...     KeynessRow("the", 59901, 58960, 58848.8, 58519.2, 1.5, 0.01,
...                "ns", 1.5, "positive"),
... ]
>>> buckets = partition(rows)
>>> len(buckets["keyword+"]), len(buckets["lockword"])
(1, 1)
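The two contract points (disjoint and exhaustive buckets, all four keys always present) follow naturally from pre-seeding the dict. This is a sketch of that pattern, not the library's implementation: `partition_sketch` is a hypothetical name, it takes the classifier as an argument to stay self-contained, and the toy rows are plain tuples rather than `KeynessRow` objects.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
CATEGORIES = ("keyword+", "keyword-", "lockword", "other")


def partition_sketch(rows: Iterable[T],
                     classify: Callable[[T], str]) -> dict[str, list[T]]:
    """Hypothetical stand-in: bucket rows by whatever classify returns."""
    # Seed every category so absent ones map to an empty list.
    buckets: dict[str, list[T]] = {category: [] for category in CATEGORIES}
    for row in rows:
        # Each row is appended exactly once, so buckets cannot overlap.
        buckets[classify(row)].append(row)
    return buckets


# Toy rows: (type, precomputed category) pairs, purely for illustration.
toy_rows = [("war", "keyword+"), ("the", "lockword"), ("of", "lockword")]
toy_buckets = partition_sketch(toy_rows, classify=lambda row: row[1])

print(sorted(toy_buckets))  # ['keyword+', 'keyword-', 'lockword', 'other']
print(len(toy_buckets["lockword"]), toy_buckets["keyword-"])  # 2 []
```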