Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change the default values of :func:~.reduce_repeated_substring parameters for general use #20

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

mohamadmansourX
Copy link

Since this Repo is meant to be focusing on Arabic language, I'd like to suggest to change the minimum repeated character filter from 3 to 4 since in Arabic language 3 consecutive occurrences of the same character do exists.

(Note even with the use of Shadda, there will still exist three consecutive occurrences)

Check the below two examples:

Since this repo is meant to be focusing on Arabic language, I'd suggest the change the minimum repeated character from 3 to 4 since in Arabic 3 consecutive occurrences of the same character do exists.

Check the below two examples:
- تشتتت  https://www.almaany.com/ar/dict/ar-ar/%D8%AA%D8%B4%D8%AA%D8%AA%D8%AA/
- تتتابع     https://www.maajim.com/dictionary/%D8%AA%D8%AA%D8%AA%D8%A7%D8%A8%D8%B9
@mohammad-albarham
Copy link
Collaborator

mohammad-albarham commented Sep 20, 2021

@mohamadmansourX

Thanks for contributing to Maha.

  1. You are right, this library focuses mainly on the Arabic language and we can have words like these (تتتابع,تشتتت, تَشَتَّتَت).

I support that to keep like these words without changing.

  1. You have failed tests on your pull request, please follow the guidelines for contribution as stated in the documentation Contributing.

Thanks again for your contribution!

@TRoboto TRoboto added the enhancement General improvements and additions label Sep 21, 2021
@TRoboto
Copy link
Owner

TRoboto commented Sep 21, 2021

Thank you for pointing this out. I am in favor of changing the default value of min_repeated and reduce_to to 4 and 3 respectively as it will allow for general use. However, this should not take place without notifying the community that the default behavior has changed. For now, I would suggest to leave this PR open for a while so that more from the community can give their opinion on this change.

In the meanwhile, please fix the tests.

@TRoboto TRoboto changed the title Update remove_fn.py: Arabic special case Change the default values of :func:~.reduce_repeated_substring parameters for general use Sep 26, 2021
@TRoboto TRoboto force-pushed the main branch 23 times, most recently from 0db1663 to ca5be90 Compare November 9, 2021 07:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement General improvements and additions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants