Automated Text Recognition for Ottoman Turkish
Our Handwritten Text Recognition (HTR) project applies HTR, an artificial intelligence-driven automatic transcription system, to Ottoman Turkish.
This initiative, currently focusing on HTR with Transkribus, pursues two overarching objectives:
first, to make Ottoman Turkish historical archives more accessible to researchers and the general public; second, to contribute to building a digital research infrastructure for Ottoman Turkish.
Our ongoing work with Transkribus involves the creation of a generalized text recognition model for 19th-century Ottoman Turkish periodicals. As of June 2023, the Character Error Rate (CER) of the most recent HTR model stands at 7.20%, and we are working to reduce it. We have recently made this model publicly available on Transkribus, in keeping with our commitment to open scholarship and to the digital study of Ottoman Turkish.
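For readers unfamiliar with the metric, the sketch below illustrates how a Character Error Rate is conventionally computed: the Levenshtein (edit) distance between a reference transcription and the model's output, divided by the length of the reference. This is an illustration under that assumption, not a description of how Transkribus calculates the figure reported above. The function names and the sample strings are hypothetical and are not drawn from the project's data.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn ref into hyp."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalised by reference length (conventional CER)."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ref, hyp) / len(ref)


# Hypothetical example strings, chosen only to illustrate the arithmetic.
reference = "kitap okudum"
hypothesis = "kitab okudun"
print(f"CER: {character_error_rate(reference, hypothesis):.2%}")
```

Under this definition, a CER of 7.20% corresponds to roughly seven character-level edits per hundred characters of reference text.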
Because this is the first publicly available HTR model designed explicitly for Ottoman Turkish, we envision the project as foundational infrastructure for digital studies of Ottoman Turkish text collections. The model's significance lies in its capacity to improve access to, and analysis of, the extensive body of Ottoman Turkish materials through automated transcription and digital research methods.