This page showcases research tools I've built for digital philology, Slavic text processing, and corpus linguistics.
An interactive visualization tool for exploring frequency patterns in the Russian religious corpora hosted at corpora.fisun.org. Built for quick exploratory analysis of lexical data across corpora, subcorpora, and time, with results normalized as IPM so that differently sized datasets can be compared directly.
The tool has two main modes. The diachronic chart mode plots yearly frequency lines for one or more query terms, letting you track change over time and compare trends across corpora or search types (lemma vs. wordform), with optional smoothing and raw data point overlays. The frequency comparison mode produces bar charts across corpora or subcorpora, with three layouts: multiple words in a single corpus (A), one or more words across all subcorpora of a corpus (B), or a custom multi-corpus matrix (C).
Both modes support comparison with the Russian National Corpus (RNC). You can overlay RNC frequency data from three subcorpora, MAIN (main corpus), PAPER (national press), and REGIONAL (regional press), as an additional series alongside your corpus results. Because the RNC calculates corpus size without punctuation while corpora.fisun.org includes it, an optional IPM correction factor (× 0.832) is available for accurate cross-corpus comparison.
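As a rough illustration of the IPM normalization and the optional correction factor, the following Python sketch computes a per-million frequency from a raw hit count and rescales a value by 0.832. The function names are hypothetical, and applying the factor to the RNC values (whose corpus size excludes punctuation) rather than to the local values is an assumption; the tool's actual code is not reproduced here.

```python
# Factor quoted on this page; which series it is applied to is an assumption.
RNC_CORRECTION = 0.832


def ipm(hits: int, corpus_size: int) -> float:
    """Return the frequency per million tokens for a raw hit count."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return hits / corpus_size * 1_000_000


def correct_rnc_ipm(rnc_ipm: float, factor: float = RNC_CORRECTION) -> float:
    """Rescale an RNC value (size counted without punctuation) so that it is
    comparable with values from a corpus whose size includes punctuation."""
    return rnc_ipm * factor


if __name__ == "__main__":
    local = ipm(50, 2_000_000)     # 50 hits in a 2-million-token corpus
    rnc = correct_rnc_ipm(100.0)   # an RNC value of 100 IPM, rescaled
    print(round(local, 3), round(rnc, 3))
```

Because both values end up on the same per-million scale, they can be plotted as parallel series regardless of how large each corpus is.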
Additional features include downloadable CSV export, a black-and-white print mode for publication figures, and interactive Plotly charts exportable as PNG.
Technical background: Built with Flask on top of the Manatee/NoSkE corpus engine. Queries are executed server-side via command-line corpus utilities; IPM values are computed and passed to the front-end, where charts are rendered with Plotly.js.
A search interface for syntactically annotated corpora from corpora.fisun.org. Unlike standard corpus interfaces, it allows querying syntactic dependency structures directly: for any word form or lemma, the interface retrieves all its occurrences and shows their syntactic heads and dependents, grouped by dependency relation type. This makes it possible to ask questions such as: what words typically govern a given lemma, or what words depend on it and in what syntactic capacity.
SyntSearch also includes a syntactic portrait mode. For a given lemma, it generates a profile of its syntactic behaviour across the corpus: which relations it typically enters as a head and as a dependent, and what the most frequent words in each relation slot are. Frequency statistics are available for individual dependency pairs and relation types.
Access Note: SyntSearch is not publicly open. To request access, please email me at corpora [at] fisun.org stating your name, affiliation, and research purpose.
Built with Flask; queries are executed via the Manatee API and the results are processed server-side to extract syntactic dependency information from the corpus annotation.
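The grouping of heads and dependents by relation type can be sketched in plain Python. This is a minimal illustration, not SyntSearch's implementation: the token format (id, lemma, head id, relation), the relation labels, and the function name are assumptions, and the real tool reads this information from the corpus annotation via Manatee.

```python
from collections import Counter, defaultdict

# Hypothetical token format: (id, lemma, head_id, relation); head_id 0 = root.
SENTENCE = [
    (1, "он", 2, "nsubj"),
    (2, "читать", 0, "root"),
    (3, "старый", 4, "amod"),
    (4, "книга", 2, "obj"),
]


def syntactic_portrait(sentences, lemma):
    """Count, per relation, the dependents and the heads of `lemma`."""
    as_head = defaultdict(Counter)       # relation -> dependent lemmas
    as_dependent = defaultdict(Counter)  # relation -> head lemmas
    for sent in sentences:
        by_id = {tok[0]: tok for tok in sent}
        for _tok_id, tok_lemma, head_id, relation in sent:
            head = by_id.get(head_id)
            if head is None:  # root token: no head inside the sentence
                continue
            if tok_lemma == lemma:
                as_dependent[relation][head[1]] += 1
            if head[1] == lemma:
                as_head[relation][tok_lemma] += 1
    return as_head, as_dependent


if __name__ == "__main__":
    heads, deps = syntactic_portrait([SENTENCE], "книга")
    print(dict(heads))  # words that depend on the lemma, by relation
    print(dict(deps))   # words that govern the lemma, by relation
```

Calling `most_common()` on any of the counters gives the most frequent words in that relation slot.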
A full-text search interface for the works of Fyodor Dostoevsky, supporting exact word forms, lemmatized queries, and multi-word phrases. Results include numbered occurrences, expandable context windows, and detailed statistical analysis (IPM, CV, Gini coefficient, collocations, and word forms), along with navigation across structural text divisions. The interface also includes a single-text search mode, allowing users to open a full work and navigate directly between occurrences within the text.
Built with Flask and pymorphy2 for lemmatization and morphological analysis of Russian word forms. The system automatically detects text structure based on markup.
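Two of the dispersion statistics mentioned above, the coefficient of variation (CV) and the Gini coefficient, can be computed from a list of per-text frequencies as in the sketch below. The choice of input (one frequency value per text or text division) and the use of the population standard deviation are assumptions; the interface may define these measures differently.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0.0 if undefined)."""
    if not values:
        return 0.0
    m = mean(values)
    if m == 0:
        return 0.0
    return pstdev(values) / m


def gini(values):
    """Gini coefficient of non-negative values: 0 means perfectly even."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted formula over the sorted values.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    even = [5, 5, 5, 5]      # word spread evenly over four texts
    skewed = [0, 0, 0, 10]   # word concentrated in a single text
    print(coefficient_of_variation(even), gini(even))
    print(coefficient_of_variation(skewed), gini(skewed))
```

Both measures are 0 for a word distributed evenly across texts and grow as its occurrences concentrate in fewer texts.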
Reference edition: Достоевский, Ф. М. (1989–1996). Собрание сочинений в 15 томах. Наука, Ленинградское отделение.
A specialized phonemic Cyrillic keyboard layout designed for German Slavic Studies. It allows Cyrillic characters to be typed from the standard German Latin (QWERTZ) layout. The layout supports both modern Slavic languages and historical scripts, with specific support for Old Church Slavonic and Church Slavonic philological work.
A desktop tool to split large Word documents into separate files based on heading styles. It is designed for managing thesis chapters, corpus segments, or long linguistic manuscripts.
The utility preserves all original formatting, tables, and images, making it a reliable solution for granular text analysis and modular editing of complex academic works.
A companion tool for merging multiple .docx files into a single master document. It automates the assembly of fragmented research papers or edited chapters into a unified monograph.
It ensures consistent section breaks and page numbering across the entire batch, providing a structurally sound final document for publication or submission.
A browser-based tool for scholarly transliteration of Cyrillic text into Latin script. The service follows the scientific transliteration tradition commonly used in Slavic linguistics and widely recognized in German-speaking academic contexts.
Supported languages include Russian, Ukrainian, Belarusian, Bulgarian, Serbian, Macedonian, and Old Church Slavonic. The interface transliterates only the relevant Cyrillic characters and leaves other scripts unchanged.
The editor preserves basic formatting such as bold, italics, line breaks, and letter case, which makes it useful for philological work, teaching materials, and quick preparation of publication-ready text.
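The character-level behaviour described above (only Cyrillic is converted, other scripts pass through, letter case is kept) can be sketched as follows for Russian. The table follows the widely used scientific transliteration with "ch" for х, as is customary in the German tradition; the tool's own tables, its handling of the other languages, and the function name are not shown on this page and are assumptions here.

```python
# Russian-only table; the other supported languages would need their own.
RU_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "ë",
    "ж": "ž", "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "ch", "ц": "c", "ч": "č", "ш": "š", "щ": "šč",
    "ъ": "\u02ba", "ы": "y", "ь": "\u02b9", "э": "ė", "ю": "ju", "я": "ja",
}


def transliterate(text: str) -> str:
    """Transliterate Russian Cyrillic; leave all other characters as they are."""
    out = []
    for ch in text:
        mapped = RU_TO_LATIN.get(ch.lower())
        if mapped is None:
            out.append(ch)                   # non-Cyrillic: unchanged
        elif ch.isupper():
            out.append(mapped.capitalize())  # keep initial capital
        else:
            out.append(mapped)
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("Достоевский"))   # Dostoevskij
    print(transliterate("Пушкин, 1837"))  # Puškin, 1837
```

This simple version capitalizes only the first letter of a multi-letter equivalent, so fully upper-case words containing such letters (for example "ЩУКА") would need additional handling.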
A Microsoft Word–based tool for transliterating Russian and Ukrainian text directly inside Word documents according to the scholarly transliteration convention widely used in German-language Slavic studies.
Distributed as a Word template with VBA macros, the tool is designed to preserve ordinary document formatting during transliteration, including character styling and paragraph layout.
This makes it suitable for revising existing documents, preparing handouts and teaching materials, and converting formatted text for academic use without moving it into a separate editor.
roman [at] fisun.org