EVA 2006 - Cruscle search engine

This paper presents the advanced search techniques implemented for the Accademia della Crusca online vocabularies. The five vocabularies document the origin and evolution of the Italian language over the past centuries. They have been fully transcribed and annotated in a standard XML/TEI format. The transcription has been indexed with a complex structure that supports advanced queries, including full-text queries, context and micro-context queries, case-sensitive search, accented characters, search by word root, and search for punctuation marks. The search algorithms are based on Apache Lucene, the open-source indexing and search engine, with heavy modifications.
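As a rough illustration of how one index can serve both case- and accent-sensitive and insensitive queries, here is a minimal in-memory sketch. It is not Lucene and not the authors' implementation. It assumes each token is stored twice, once in its exact form and once in a folded form (lowercased, accents stripped), with token positions kept so that context (proximity) queries are possible. The names `ToyIndex`, `fold`, `search`, `prefix` and `near` are hypothetical. Root search is approximated here by simple prefix matching on folded forms, which is an assumption and much cruder than real morphological analysis.

```python
import re
import unicodedata
from collections import defaultdict


def fold(token: str) -> str:
    """Lowercase and strip combining accents, e.g. 'Città' -> 'citta'."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


class ToyIndex:
    """Hypothetical two-field positional inverted index (exact + folded forms)."""

    def __init__(self) -> None:
        self.exact = defaultdict(list)   # exact token -> [(doc_id, position)]
        self.folded = defaultdict(list)  # folded token -> [(doc_id, position)]

    def add(self, doc_id: int, text: str) -> None:
        # \w+ matches accented letters too, since Python 3 str regexes are Unicode-aware
        for pos, tok in enumerate(re.findall(r"\w+", text)):
            self.exact[tok].append((doc_id, pos))
            self.folded[fold(tok)].append((doc_id, pos))

    def _postings(self, term: str, sensitive: bool):
        # .get avoids inserting empty keys into the defaultdicts on lookup
        if sensitive:
            return self.exact.get(term, [])
        return self.folded.get(fold(term), [])

    def search(self, term: str, sensitive: bool = False) -> list[int]:
        """Documents containing `term`; exact match only when `sensitive` is True."""
        return sorted({doc for doc, _ in self._postings(term, sensitive)})

    def prefix(self, stem: str) -> list[int]:
        """Crude stand-in for root search: prefix match on folded forms."""
        key = fold(stem)
        return sorted({doc for tok, posts in self.folded.items()
                       if tok.startswith(key) for doc, _ in posts})

    def near(self, a: str, b: str, window: int, sensitive: bool = False) -> list[int]:
        """Documents where `a` and `b` occur within `window` token positions."""
        pa = self._postings(a, sensitive)
        pb = self._postings(b, sensitive)
        return sorted({da for da, xa in pa for db, xb in pb
                       if da == db and abs(xa - xb) <= window})


idx = ToyIndex()
idx.add(1, "La Città di Firenze")
idx.add(2, "la citta vecchia")
idx.add(3, "Firenze è una città")
print(idx.search("città"))                  # [1, 2, 3]
print(idx.search("Città", sensitive=True))  # [1]
print(idx.prefix("firen"))                  # [1, 3]
print(idx.near("firenze", "città", 2))      # [1]
```

Keeping the folded field alongside the exact one trades index size for query flexibility: the same postings answer both sensitive and insensitive queries without re-scanning the text.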
