A Python package to transliterate various Cyrillic alphabets to and from the Latin alphabet using the American Library Association-Library of Congress Romanization tables. Domovyk addresses some limitations in existing transliteration packages, providing multilingual functionality, support for composite Unicode characters, and support for languages not addressed in other packages, such as Church Slavonic and Carpatho-Rusyn. This package aims to increase the accessibility of transliteration technologies for users working in these languages, focusing on use cases that require thorough and accurate transliteration.
Rozha is a Python package to simplify and streamline a number of natural language processing processes and methods for a wide variety of languages, empowering users to use NLP on both non-English and English texts.
WordGoblin is a simple, lightweight word finder package that returns words containing letters specified by the user. English is supported within the package, and custom dictionaries or lists of words can easily be loaded to work with other languages.
Seshata is a streamlined, console-based journal/database program written in Python. It stores images as well as text in its SQL database, and can print stored images in the console.
Non-English Natural Language Processing - View on GitHub
A suite of GitHub repositories dedicated to performing natural language processing (NLP) tasks on Russian and French texts. The tutorial portion of the suite, linked above, contains Jupyter notebooks with walkthroughs and sample code for performing analysis on non-English texts using various NLP methods, including Python scripts to clean, analyze, and visualize text. The notebooks were designed to accompany a workshop offered by the University of Texas Libraries.
A Python wrapper for the National Library of France's digital library platform, Gallica. My wrapper is featured on the official Gallica site, and can also be found on my GitHub account at the link above.
An easy-to-use GUI web archiving tool in Python. Pages can be downloaded as files (HTML, CSS, etc.) or as WARCs. Images from the pages can also be downloaded.
A variety of Python scripts to assist with searching and downloading full text records via the Europeana APIs, including newspaper records. These scripts allow you to search records in Europeana, parse the JSON returned to obtain various metadata, and download full text if it's available. The code was featured on the Europeana site, and is hosted on my GitHub account at the link above.
A package to assess the complexity of texts using a variety of readability formulas, written in Rust. The crate includes implementations of the Lix, Rix, Flesch, Flesch-Kincaid, Coleman-Liau, and Automated Readability Index methods.
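As a rough illustration of what such formulas compute, here is a minimal Python sketch of three of the listed methods that need no syllable counting: Lix, Rix, and the Automated Readability Index. It is not the crate's code or API. The function names, the regex-based sentence and word splitting, and the assumption of non-empty input are all choices made for this example.

```python
import re


def _split(text):
    """Return (sentences, words) using naive regex splitting (an assumption)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[^\W\d_]+", text)  # runs of Unicode letters
    return sentences, words


def lix(text):
    """Lix: words per sentence plus the percentage of words longer than 6 letters."""
    sentences, words = _split(text)
    long_words = [w for w in words if len(w) > 6]
    return len(words) / len(sentences) + 100 * len(long_words) / len(words)


def rix(text):
    """Rix: number of words longer than 6 letters per sentence."""
    sentences, words = _split(text)
    long_words = [w for w in words if len(w) > 6]
    return len(long_words) / len(sentences)


def automated_readability_index(text):
    """ARI: 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43."""
    sentences, words = _split(text)
    characters = sum(len(w) for w in words)
    return 4.71 * characters / len(words) + 0.5 * len(words) / len(sentences) - 21.43


sample = "The cat sat. The enormous elephant wandered away."
print(lix(sample), rix(sample), automated_readability_index(sample))
```

The naive splitting treats every run of `.`, `!`, or `?` as a sentence boundary and breaks words at apostrophes, so scores for real texts may differ from those of a more careful tokenizer.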
A package to perform stylometry operations on given texts. The crate includes implementations of Mendenhall's graphing of word lengths, the Kilgarriff chi-squared algorithm, and an algorithm to find words that occur in only one text out of two being compared (hapax legomena).
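The word-length approach can be sketched briefly in Python: count how often each word length occurs in a text and express the counts as proportions, which gives the data behind a Mendenhall-style curve. This is an illustrative sketch, not the crate's implementation. The function name and the letter-run tokenization are assumptions.

```python
import re
from collections import Counter


def word_length_distribution(text):
    """Return {word length: share of all words}, ordered by length."""
    words = re.findall(r"[^\W\d_]+", text)  # runs of Unicode letters
    counts = Counter(len(w) for w in words)
    total = sum(counts.values())
    return {length: counts[length] / total for length in sorted(counts)}


print(word_length_distribution("a bb cc ddd"))
```

Plotting the resulting proportions for two texts on the same axes gives a quick visual comparison of their word-length profiles.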
A lemmatizing package written in Rust. The crate supports English lemmatization out-of-the-box, but can easily be used with other languages if provided an appropriate lemma list.
A web app for performing OCR on images within your browser using Tesseract.js and client-side JavaScript. Upload one or more images, select the language of the document(s), and then view and edit the OCR-ed text in your browser (or save it as a .txt file).
Music generated by moves on a chessboard in pure JavaScript. Play your own moves or paste a PGN into the text box to generate a downloadable MIDI representation of the game. Each file of the board is associated with a major key, with each square in the file corresponding to one note in the scale.
A Flask site that allows users to create, update, and delete posts in a database, as well as perform basic NLP tasks on the posts. The app allows for PDF uploads, and will perform OCR on the PDFs and add the text to the database. NLP tasks include sentiment analysis (on individual posts or all posts combined as one text), returning word counts and average word lengths for posts, and generating a word cloud from the posts.
Minimal Computing Sites for European Studies
A suite of minimal computing sites highlighting varied aspects of history related to European Studies:
Discovering Sukhareva: This website is about Grunya Efimovna Sukhareva, a pioneering researcher into autism. As the earliest known person to document autism in children, her work has been instrumental to the understanding and study of autism within scientific literature. The site includes Sukhareva's early research articles in Russian and German, bibliographies of additional resources on her life and work, neurodiversity, and the history of autism, and a timeline of key moments in early autism research.
Bataille's Journals: This site aims to highlight Bataille's editorial work, making the full runs of Acéphale and Documents available as easily browseable and downloadable PDFs. The .txt files from the OCRed text are provided in the hope that they may be of use to anyone interested in performing textual analysis on the journals.
The Négritude Movement: This website collects resources and information on the Négritude movement, an anti-colonial cultural and political movement founded in Paris in the 1930s. The movement, developed mainly by francophone intellectuals, writers, and politicians of the African diaspora, aimed at cultivating and promoting Black art, culture, and consciousness as a form of resistance against colonialism and racism.
The Ukrainian Anarchist Movement: This site aims to serve as a hub for the scholarly and academic study of anarchism in Ukraine, highlighting the history of the movement from its origins in the 19th century through its expression in the Makhnovist movement of the early 20th century. It includes an interactive timeline and bibliographies of online and print resources.
Nadezhda Krupskaya Online: An online home for English-language resources relating to Nadezhda Krupskaya, Deputy Education Commissar of the Soviet Union from 1929 to 1939, who played a vital role in the development and administration of Soviet librarianship and libraries.
A Raku module to lemmatize strings and lists, with built-in support for 24 languages. The package uses CSV files containing predefined lemmas in a two-column format, with the lemma on the left and its derivatives on the right. Any similarly formatted CSV file can be used to run the code, allowing for easy use of custom lemma lists and additional languages.
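To make the two-column format concrete, here is a minimal Python sketch of a lookup built from such a file. It is not the Raku module's code. The function names, the sample rows, and the assumption of one derived form per row are hypothetical and chosen only for illustration.

```python
import csv
import io


def load_lemmas(csv_file):
    """Build a {derived form: lemma} lookup from rows of (lemma, derivative)."""
    lookup = {}
    for row in csv.reader(csv_file):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        lemma, derivative = row[0].strip(), row[1].strip()
        lookup[derivative.lower()] = lemma
    return lookup


def lemmatize(text, lookup):
    """Replace each whitespace-separated token with its lemma when one is known."""
    return [lookup.get(token.lower(), token) for token in text.split()]


# Hypothetical sample data: lemma on the left, one derived form on the right.
SAMPLE = "be,is\nbe,was\ncat,cats\n"
lookup = load_lemmas(io.StringIO(SAMPLE))
result = lemmatize("The cats was here", lookup)
print(result)
```

Tokens with no entry in the lookup are returned unchanged, so a lemma list for another language can be swapped in without changing the code.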
A script to detect a text's language using stopwords in Raku. Simply pass a string to the check_string function to receive an output of detected languages and the language(s) with the highest percentage of detected words.
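The general idea of stopword-based detection can be sketched in Python as follows. This is not the Raku script itself. The function name `detect_languages`, the tiny stopword sets, and the whitespace tokenization are assumptions made for this example; a real list would contain many more words per language.

```python
# Tiny illustrative stopword sets (assumed for this sketch).
STOPWORDS = {
    "english": {"the", "and", "is", "of", "to", "in"},
    "french": {"le", "la", "et", "est", "de", "les"},
}


def detect_languages(text, stopwords=STOPWORDS):
    """Return (percentage of tokens matched per language, top language(s))."""
    tokens = text.lower().split()
    if not tokens:
        return {}, []
    scores = {
        language: 100 * sum(token in words for token in tokens) / len(tokens)
        for language, words in stopwords.items()
    }
    best = max(scores.values())
    # Report every language tied for the highest non-zero percentage.
    top = [language for language, score in scores.items() if score == best and best > 0]
    return scores, top


scores, top = detect_languages("le chat est sur la table")
print(scores, top)
```

Because the score is a share of matched tokens, ties are possible for short or mixed-language strings, which is why the sketch returns a list of top languages.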
Socialist Pamphlets at UT Austin Exhibit - View Here
I supervised the digitization of rare and unique pamphlets held in the UT Austin collections, then created and wrote the text for the digital exhibit linked above. The exhibit comprises three sections, each focusing on pamphlets related to socialism and communism in, respectively, the USSR, France, and the United Kingdom.
I direct a large-scale, ongoing digital scholarship project for archival materials housed in the LBJ Presidential Library. Primary materials related to the Prague Spring and the broader Cold War are being digitized, described with metadata, and made accessible via the UT Libraries’ open-access institutional repository. I created, designed, implemented, and publicized an online portal for the materials in Scalar.
I digitized a collection of rare and unique Soviet educational materials held by the University of Texas at Austin Libraries, created metadata for them, and built an Omeka site to host them, ensuring the preservation of and open access to the materials while making them more discoverable.
Digital Humanities Center at the Center for Russian, East European, and Eurasian Studies
I contributed technical, information science, and librarianship expertise to the development and implementation of the digital humanities space at UT Austin’s Center for Russian, East European, and Eurasian Studies. I taught workshops at the space, assisted in cooperation between the space and the UT Libraries, and provided technical and instructional expertise and support.