Project Restructuring and Modularity Improvements #3
Overview
This pull request implements a significant restructuring of the CATT (Character-based Arabic Tashkeel Transformer) project to improve modularity, maintainability, and ease of use. The changes focus on reorganizing the codebase, introducing a new API module, and standardizing the project structure.
Key Changes
1. Project Structure Reorganization
- `catt/` package
- `api/` directory for API-related functionality
- `configs/`, `dataset/`, `docs/`, `models/`, `scripts/`, and `tests/` directories for better organization
2. API Implementation
- `api/` module with FastAPI integration
- `main.py`, `models.py`, and `catt_service.py` for API functionality
- `/tashkeel` endpoint for diacritization requests
3. Modularity Improvements
- `model_types.py` for better model type management
- `utils/` submodule
4. Configuration Management
- `configs/` directory with separate configuration files for Encoder-Decoder and Encoder-Only models
- `Sample_config.yaml` for easier customization
5. Dependency Management
- `pyproject.toml` and `poetry.lock` for better dependency management using Poetry
6. Documentation and Testing
- `docs/` directory for future documentation
- `tests/` directory for unit tests (to be implemented)
7. Simplified Prediction and Training Scripts
- `predict_ed.py` and `predict_eo.py` merged into a single `predict_catt.py`
- `train_catt.py` for improved clarity and consistency
Before and After Structure Comparison
Before
After
Benefits
These changes lay the groundwork for easier collaboration, maintenance, and future improvements to the CATT project.
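As a rough illustration of the script consolidation described in item 7, the sketch below shows one way a single entry point could dispatch between the Encoder-Decoder and Encoder-Only paths. Everything in it is an assumption made for illustration: the `ModelType` enum, the `--model-type` flag, and the placeholder predictor functions are hypothetical and are not taken from the actual `model_types.py` or `predict_catt.py`.

```python
import argparse
from enum import Enum


class ModelType(Enum):
    """Hypothetical enum; the project's model_types.py may differ."""
    ENCODER_DECODER = "ed"
    ENCODER_ONLY = "eo"


def predict_encoder_decoder(text: str) -> str:
    # Placeholder for the logic that previously lived in predict_ed.py.
    return f"[ed] {text}"


def predict_encoder_only(text: str) -> str:
    # Placeholder for the logic that previously lived in predict_eo.py.
    return f"[eo] {text}"


# One dispatch table replaces two near-duplicate scripts.
PREDICTORS = {
    ModelType.ENCODER_DECODER: predict_encoder_decoder,
    ModelType.ENCODER_ONLY: predict_encoder_only,
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Unified prediction sketch")
    parser.add_argument(
        "--model-type",
        choices=[m.value for m in ModelType],
        default=ModelType.ENCODER_ONLY.value,
        help="hypothetical flag selecting the model variant",
    )
    parser.add_argument("text", help="undiacritized input text")
    return parser


def main(argv=None) -> str:
    args = build_parser().parse_args(argv)
    model_type = ModelType(args.model_type)  # convert the string to the enum
    return PREDICTORS[model_type](args.text)
```

Calling `main(["--model-type", "ed", "some text"])` routes the input to the Encoder-Decoder placeholder, while omitting the flag falls back to the Encoder-Only one. Keeping the variant choice in an enum means adding a new model type would touch the enum and the dispatch table rather than requiring another script.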