This project uses Flask, Connexion, and Swagger to create a REST API for obtaining summaries of news articles from the BBC website. The API endpoints are defined following the OpenAPI specification and rely on a Language Model (LLM) running in the backend to generate the summaries. The full documentation is available through Swagger UI.
These instructions will help you set up the project on your local machine for development and testing purposes.
Make sure you have the following installed on your system:
- Python 3.x
- Pip (Python package installer)
- Clone the repository:
  git clone https://github.com/giovanni-gatti/news-summarizer-api.git
- Navigate to the project directory:
  cd news-summarizer-api
- (Optional, recommended) Create and activate a virtual environment:
  python -m venv venv
  source venv/bin/activate
- Install the required dependencies (using pip):
  make install
Start the API server with the following command (the entrypoint to the application is located in flaskr/app.py):
make run
The API endpoints are documented using Swagger UI. Once the server is running, you can access the interactive documentation at http://127.0.0.1:8000/api/ui/ and try out the endpoints directly from the browser.
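The endpoints can also be called programmatically. The sketch below uses only Python's standard library; the endpoint path and query parameter are hypothetical examples, so check the Swagger UI for the actual routes:

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000/api"

def get_summary(topic: str) -> dict:
    """Fetch a summary from a hypothetical /summary endpoint (assumed name)
    and decode the JSON response body."""
    url = f"{BASE_URL}/summary?topic={topic}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```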
The backend of this application is designed to run on both CPUs and GPUs, leveraging the Hugging Face and LangChain libraries. It is specifically built to support Transformer Encoder-Decoder (Seq2Seq) models. A wide range of such models is available on the Hugging Face Hub, including small models already fine-tuned on news data, such as:
When choosing a model, consider your device specifications, including hardware accelerators and available RAM. To speed up inference on CPUs, the application also supports models in ONNX format and can run inference with the accelerated ONNX Runtime and graph optimizations. To convert and optimize a Hugging Face model to ONNX format, run the following command from a terminal:
optimum-cli export onnx --model model_name --optimize O2 --framework pt --task text2text-generation-with-past local_model_folder
To customize your model selection when running the application, navigate to the .env file and specify the fields appropriately:
- `model_path`: either the path to a local directory or the identifier of a pre-trained model on the Hugging Face Hub, to be loaded from cache
- `model_onnx`: set to `True` if the selected model is in ONNX format
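For example, a .env file pointing at a locally exported ONNX model might look like this (the values are illustrative, not the project's defaults):

```
model_path=local_model_folder
model_onnx=True
```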