Ancient Thrace in Numbers

Ancient History

Pliny the Elder, Naturalis Historia: Lemma Frequency Visualization

Within the DigiThrace project, we developed an information extraction algorithm for Pliny’s “Natural History”. It is built on the Python NLP library spaCy and the Latin language models from LatinCy. The algorithm accepts a single lemma or a list of lemmas as input and produces a CSV dataset containing citations, context, and lemma variants.
This supports efficient linguistic analysis of Pliny’s work, with an initial focus on Moesia and Thrace. We curated datasets on ethnonyms, places, mountains, and waterways. A Streamlit interface with Matplotlib charts lets researchers interact with and visualize the results, helping them explore ancient Thrace in Pliny’s writings.
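
The input and output described above can be illustrated with a minimal sketch. It is not the project's code: the function names, the CSV column names, the context window, and the toy lemma table are all assumptions made for illustration, and the sample passages are invented strings rather than quotations from Pliny. In the pipeline described here the lemmas would come from a LatinCy model through spaCy (each token's lemma_ attribute); the sketch swaps in a small lookup table so that it runs with the standard library alone.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical toy lemma table standing in for a real Latin lemmatizer.
# In a spaCy/LatinCy pipeline, token.lemma_ would supply these values.
TOY_LEMMAS = {
    "thraciae": "thracia",
    "thraciam": "thracia",
    "thracia": "thracia",
    "moesiae": "moesia",
    "moesia": "moesia",
    "haemi": "haemus",
    "haemus": "haemus",
}

# Invented sample passages (NOT real quotations), keyed by a placeholder citation.
SAMPLE_PASSAGES = [
    ("sample-1", "Thraciae gentes et Haemi montes"),
    ("sample-2", "in Moesia et Thraciam"),
]

FIELDS = ["citation", "lemma", "variant", "context"]


def toy_lemmatize(form):
    """Return the lemma for a word form, falling back to the lowercased form."""
    return TOY_LEMMAS.get(form.lower(), form.lower())


def extract_lemmas(passages, targets, lemmatize=toy_lemmatize, window=3):
    """Collect one row per occurrence of any target lemma.

    passages: iterable of (citation, text) pairs.
    targets: a single lemma (str) or a list of lemmas.
    window: number of tokens of context kept on each side of the match.
    """
    if isinstance(targets, str):
        targets = [targets]
    wanted = {t.lower() for t in targets}
    rows = []
    for citation, text in passages:
        tokens = re.findall(r"\w+", text)
        for i, form in enumerate(tokens):
            lemma = lemmatize(form)
            if lemma in wanted:
                context = " ".join(tokens[max(0, i - window): i + window + 1])
                rows.append(
                    {"citation": citation, "lemma": lemma,
                     "variant": form, "context": context}
                )
    return rows


def rows_to_csv(rows):
    """Serialize the extracted rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def lemma_frequencies(rows):
    """Count occurrences per lemma, the quantity a frequency chart would plot."""
    return Counter(row["lemma"] for row in rows)


if __name__ == "__main__":
    found = extract_lemmas(SAMPLE_PASSAGES, ["thracia", "haemus"])
    print(rows_to_csv(found))
    print(lemma_frequencies(found))
```

Passing the lemmatizer in as a parameter keeps the extraction logic independent of the NLP backend, so a model-based lemmatizer could replace the toy table without changing the rest of the sketch. The per-lemma counts are the kind of data a bar chart in Matplotlib could display.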

The visualization can be accessed at the following link:

https://huggingface.co/spaces/bestroi/PliniusNatHist

The CSV dataset is available on FigShare:

https://doi.org/10.6084/m9.figshare.27044578.v1

Author: Kristiyan Simeonov,
Researcher (R1),
Department of Classics,
Sofia University “St. Kliment Ohridski”