turkish: Remove proper noun suffixes
In modern Turkish orthography, an apostrophe is used to separate proper
names from any suffixes, so before we do anything else we now truncate
at the first apostrophe.

Fixes #188
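
For illustration only, the truncation rule described above can be sketched in Python. This is not part of the commit: the function name simply mirrors the Snowball routine, and the sketch assumes that only the ASCII apostrophe (') acts as the separator, since that is the only character the rule in this diff looks for.

```python
def remove_proper_noun_suffix(word: str) -> str:
    """Drop everything from the first apostrophe onwards.

    Hypothetical Python rendering of the Snowball rule
    `do (goto '{'}' [ tolimit ] delete)`. Words without an
    apostrophe are returned unchanged.
    """
    # partition() splits at the first occurrence of the separator;
    # when the separator is absent, `head` is the whole word.
    head, _separator, _suffix = word.partition("'")
    return head


if __name__ == "__main__":
    # Example taken from the comment in the diff: "Türkiye'dir".
    print(remove_proper_noun_suffix("Türkiye'dir"))
```

In the actual stemmer this step runs first, so the remaining suffix-stripping routines only see the part of the word before the apostrophe.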
ojwb committed Oct 12, 2024
1 parent 9a81187 commit 94880d9
Showing 1 changed file with 12 additions and 1 deletion.
13 changes: 12 additions & 1 deletion algorithms/turkish.sbl
@@ -1,7 +1,6 @@
/* Stemmer for Turkish
* author: Evren (Kapusuz) Çilden
* email: evren.kapusuz at gmail.com
* version: 1.0 (15.01.2007)
*
* stems nominal verb suffixes
* stems nominal inflections
@@ -69,6 +68,8 @@ routines (
post_process_last_consonants
postlude

remove_proper_noun_suffix

stem_nominal_verb_suffixes
stem_noun_suffixes
stem_suffix_chain_before_ki
@@ -439,6 +440,14 @@ backwardmode (
)
)

define remove_proper_noun_suffix as (
    // https://en.wikipedia.org/wiki/Turkish_language says "In modern
    // Turkish orthography, an apostrophe is used to separate proper names
    // from any suffixes" with the example "Türkiye'dir ("it is Turkey")".
    // Therefore we truncate at the first apostrophe.
    do (goto '{'}' [ tolimit ] delete)
)

// Test if there is more than one syllable.
// In Turkish each vowel indicates a distinct syllable.
define more_than_one_syllable_word as (
@@ -454,6 +463,8 @@ define postlude as (
)

define stem as (
    do remove_proper_noun_suffix

    more_than_one_syllable_word

backwards (