Slavic Studies Portal



(Slav)CorpHub

Text corpus of the Church Slavonic prints

The corpus consists of several components that contain overviews and searchable widgets.

KSL-OldPrints: The database of Church Slavonic prints at the SBB-PK provides an overview of the prints (currently 144 records) and of the approx. 1000 pages transcribed with Transkribus and verified to Ground Truth quality. The pages with verified full texts are available in the detailed views, which can also be displayed in a parallel "image-text" view.
Search in GT texts of KSL-OldPrints-digital: The full-text search covers approx. 1000 pages of the SBB-PK's "Kirchenslavica Digital", which were processed in Transkribus with various (HTR+) models and then verified manually, line by line, to ground truth quality. The search runs through a widget that queries the SOLR-indexed text data. The displayed results are enriched with grammatical information by a "UDPipe" PoS wrapper. Regular expressions (RegEx) can be used to search the index.
Search in RAW texts of KSL-OldPrints-digital: The full-text search covers approx. 90,000 pages of the SBB-PK's "Kirchenslavica Digital", which were processed in Transkribus with various (HTR+) models and then cleaned up automatically by script, without manual checking, into the so-called RAW format. The search runs through a widget that queries the SOLR-indexed text data. The displayed results are enriched with grammatical information by a "Mystem" PoS wrapper. Search terms can be truncated, or the search can be run in N-gram mode.
Kirchenslavica Digital: The SBB-PK's digitised Church Slavonic prints can be viewed and downloaded in high resolution via the SBB Digital Library.
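
To illustrate the three search modes mentioned above (regular expressions, truncation, N-grams), here is a minimal sketch of how such queries could be phrased against a Solr index. It is not the portal's actual implementation: the endpoint URL, the core name `ksl_gt`, the field name `text`, and all function names are hypothetical, and the assumption that N-gram mode works on character N-grams is ours. Only the query syntax itself (`field:/regex/` and `field:stem*`) is standard Lucene/Solr syntax.

```python
from urllib.parse import urlencode

# Hypothetical values: the portal does not document its Solr endpoint or schema.
SOLR_SELECT_URL = "https://example.org/solr/ksl_gt/select"
TEXT_FIELD = "text"


def regex_query(pattern, field=TEXT_FIELD):
    """Build a Lucene/Solr regular-expression term query: field:/pattern/."""
    # "/" delimits the regex in the query syntax, so a literal "/" is escaped.
    escaped = pattern.replace("/", "\\/")
    return f"{field}:/{escaped}/"


def prefix_query(stem, field=TEXT_FIELD):
    """Build a right-truncated (wildcard) query: field:stem*."""
    return f"{field}:{stem}*"


def char_ngrams(token, n=3):
    """Split a token into overlapping character N-grams (assumed N-gram mode)."""
    return [token[i:i + n] for i in range(len(token) - n + 1)]


def select_url(query, rows=10):
    """Assemble a URL for Solr's standard /select handler with JSON output."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(select_url(regex_query("бог(а|у)")))
    print(select_url(prefix_query("слов")))
    print(char_ngrams("слово"))
```

Note that Lucene regular expressions are matched against whole indexed terms, so a pattern such as `бог` alone would only match that exact token; a trailing `.*` is needed to match longer forms.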

Slavic corpora on the net

List of Slavic corpora that can be found online.

CorpSlavColl: A collection of Slavic corpora on the net (an excerpt from the Slavic Studies Guide).