Keyness quickstart
Derive keywords and lockwords from two raw texts and print the reproducibility record.
What it shows
- Counting two texts with counts_from_text
- Positive and negative keywords, ranked by effect size
- Lockwords — the stable shared vocabulary
- The reproducibility record for the run
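As a rough illustration of the counting step, the sketch below tokenizes the two example texts with the standard library only. The tokenization rule (lowercase, runs of ASCII letters) and the helper name count_tokens are assumptions made for illustration, not keyflux's documented behavior. The rule does reproduce the focus_total and reference_total values in the reproducibility record, and the only words that occur at least twice in both texts are the two listed as lockwords. keyflux may apply further criteria when selecting lockwords.

```python
import re
from collections import Counter

CLIMATE_TEXT = """
The climate report warns that carbon emissions keep rising and global warming
accelerates. Climate policy must cut emissions; renewable energy and carbon
pricing are the tools. The energy transition is a climate and policy question.
"""

FINANCE_TEXT = """
The market rallied as the stock index climbed and trade volumes rose. Investors
booked profit on energy shares; the market expects more trade. Stock policy and
the global market drive profit and shares.
"""


def count_tokens(text: str) -> Counter:
    """Hypothetical stand-in for a counting helper: lowercase, letters only."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


focus = count_tokens(CLIMATE_TEXT)
reference = count_tokens(FINANCE_TEXT)

# Token totals for each text.
print(sum(focus.values()), sum(reference.values()))  # 35 33

# Words that occur at least twice in both texts (a frequency filter only).
shared = sorted(w for w in focus if focus[w] >= 2 and reference[w] >= 2)
print(shared)  # ['and', 'the']
```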
Run it
uv run python examples/keyness_quickstart.py
Positive keywords (more typical of the climate text):
 climate      log-ratio=+2.50 p05

Negative keywords (more typical of the finance text):
 market       log-ratio=-2.67 p05

Lockwords (stable across both texts):
 the
 and

Reproducibility record:
 reference_id: finance-text
 measure: log_likelihood
 min_focus_freq: 1
 min_reference_freq: 1
 focus_total: 35
 reference_total: 33
 top_n: 10
 floor: 0.5
 smp_k: 100.0
 keyflux_version: 0.1.0
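The log-ratio values in the output can be reproduced by hand under one plausible reading of the record: the effect size is the binary logarithm of the ratio of relative frequencies, with a zero count replaced by the floor value of 0.5. That reading, and the function name log_ratio, are assumptions rather than a statement of how keyflux computes the figure. The counts are taken from the example texts: "climate" occurs 3 times in the 35-token climate text and never in the 33-token finance text, and "market" is the reverse.

```python
import math


def log_ratio(
    focus_freq: float,
    focus_total: int,
    reference_freq: float,
    reference_total: int,
    floor: float = 0.5,
) -> float:
    """Binary log of the relative-frequency ratio (assumed formula).

    A zero count is replaced by `floor` so the ratio stays finite.
    """
    f = focus_freq if focus_freq > 0 else floor
    r = reference_freq if reference_freq > 0 else floor
    return math.log2((f / focus_total) / (r / reference_total))


climate = log_ratio(3, 35, 0, 33)  # 3 hits in the focus text, none in the reference
market = log_ratio(0, 35, 3, 33)   # none in the focus text, 3 hits in the reference

print(f"climate log-ratio={climate:+.2f}")  # climate log-ratio=+2.50
print(f"market log-ratio={market:+.2f}")    # market log-ratio=-2.67
```

A positive value means the word is relatively more frequent in the focus text, a negative value that it is relatively more frequent in the reference text.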
Source
examples/keyness_quickstart.py
"""Keyness quickstart: keywords and lockwords from two raw texts.

Run with: uv run python examples/keyness_quickstart.py
"""

from keyflux import Keyness, counts_from_text

CLIMATE_TEXT = """
The climate report warns that carbon emissions keep rising and global warming
accelerates. Climate policy must cut emissions; renewable energy and carbon
pricing are the tools. The energy transition is a climate and policy question.
"""

FINANCE_TEXT = """
The market rallied as the stock index climbed and trade volumes rose. Investors
booked profit on energy shares; the market expects more trade. Stock policy and
the global market drive profit and shares.
"""


def main() -> None:
    """Build a Keyness comparison and print keywords and lockwords."""
    focus = counts_from_text(CLIMATE_TEXT)
    reference = counts_from_text(FINANCE_TEXT)

    keyness = Keyness(
        focus,
        reference,
        measure="log_likelihood",
        min_focus_freq=1,
        min_reference_freq=1,
        reference_id="finance-text",
    )
    keywords = keyness.keywords(top=10)

    print("Positive keywords (more typical of the climate text):")
    for row in keywords.positive(5):
        print(f" {row.type:<12} log-ratio={row.effect_size:+.2f} {row.significance}")

    print("\nNegative keywords (more typical of the finance text):")
    for row in keywords.negative(5):
        print(f" {row.type:<12} log-ratio={row.effect_size:+.2f} {row.significance}")

    print("\nLockwords (stable across both texts):")
    for row in keyness.lockwords(min_freq_both=2):
        print(f" {row.type}")

    print("\nReproducibility record:")
    for key, value in keywords.repro.to_dict().items():
        print(f" {key}: {value}")


if __name__ == "__main__":
    main()
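The p05 labels in the output are consistent with a log-likelihood (G2) test at the 0.05 level. The sketch below uses the common two-term form of G2 and the chi-square critical value 3.84 for one degree of freedom. Both choices, and the reading of p05 as p < 0.05, are assumptions: keyflux may use the full four-cell statistic or different cut-offs. Under these assumptions, "climate" and "market" (3 occurrences against 0) pass the threshold, while a word such as "carbon" (2 against 0) does not, which matches the single keyword shown on each side.

```python
import math


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Two-term G2 (assumed form).

    a, b: word count in the focus / reference text
    c, d: total tokens in the focus / reference text
    """
    e1 = c * (a + b) / (c + d)  # expected count in the focus text
    e2 = d * (a + b) / (c + d)  # expected count in the reference text
    g2 = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


CRITICAL_P05 = 3.84  # chi-square critical value, 1 degree of freedom, p < 0.05

for word, a, b in [("climate", 3, 0), ("carbon", 2, 0), ("market", 0, 3)]:
    g2 = log_likelihood(a, b, 35, 33)
    print(f"{word:<8} G2={g2:.2f} significant={g2 > CRITICAL_P05}")
# climate  G2=3.98 significant=True
# carbon   G2=2.66 significant=False
# market   G2=4.34 significant=True
```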