LabAsim/scrape_tpp_gui

ThePressProject Scraper GUI

Description

This is my first project, written to learn Python. I built it to read the news faster and in a more aggregated form than browsing the site itself. It scrapes the news categories of the thepressproject.gr site.

Any contributions or ideas are welcome!

It has been tested with Python 3.10 on Windows 10.

Important Note: Unfortunately, the updater.exe bundled inside the one-file executable scraper_tpp_gui.exe is flagged as a threat by Avira antivirus. This is a false positive; it does not contain any malware.

Table of Contents

Dependencies

Usage

The usage is pretty straightforward.

The GUI automatically loads all the news titles and their dates. The user can refresh the titles through menu>renew titles.

If no news was loaded, try renewing the titles via menu>renew titles(bypass). This option requires Chrome and Chromedriver in order to bypass Cloudflare's bot protection. A Chrome window is launched off-screen to access the news, because headless mode gets detected by Cloudflare.
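
As a rough illustration of the off-screen window idea, the sketch below builds Chrome command-line switches that position a normal (non-headless) window outside the visible desktop. It is not taken from this project's code: the function names and the coordinates are hypothetical, and the Selenium-based launcher assumes a Selenium-style driver, which may differ from what the scraper actually uses.

```python
def offscreen_chrome_args(x: int = -2000, y: int = 0,
                          width: int = 1024, height: int = 768) -> list[str]:
    """Return Chrome switches that place a regular window off-screen.

    The default coordinates are an assumption; any position outside the
    visible desktop area would serve the same purpose.
    """
    return [
        f"--window-position={x},{y}",
        f"--window-size={width},{height}",
    ]


def launch_offscreen_chrome():
    """Start Chrome through Selenium with the switches above.

    Hypothetical helper: it needs the selenium package, Chrome and a
    matching Chromedriver to be installed.
    """
    # Imported lazily so offscreen_chrome_args() works without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in offscreen_chrome_args():
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

Keeping the switch list in its own function makes it easy to check or adjust the window position without starting a browser.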

There are 8 themes.

The default theme is Azure dark. Clicking the Azure theme again switches it to Azure light, and vice versa.
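
The toggle behaviour described above could be modelled roughly as follows. This is a minimal sketch, not the project's implementation: the function name and the theme strings are hypothetical, and the choice to start from the dark variant when switching in from another theme is an assumption.

```python
def next_theme(current: str, clicked: str) -> str:
    """Return the theme to apply after the user clicks a theme entry.

    Theme names ("azure", "azure dark", "azure light") are illustrative.
    """
    if clicked == "azure":
        if current == "azure dark":
            # Clicking Azure again flips dark to light...
            return "azure light"
        if current == "azure light":
            # ...and light back to dark.
            return "azure dark"
        # Assumption: coming from another theme, start with the dark variant.
        return "azure dark"
    # Any other theme is applied as-is.
    return clicked
```

For example, `next_theme("azure dark", "azure")` yields `"azure light"`, and a second click returns to `"azure dark"`.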

The GUI:

[Screenshot of the GUI]

Convert to executable

The script can be converted to an .exe by running in your terminal:

cd {path/to/scrape_tpp_gui_folder}
py scrape_tpp_gui_pyinstaller.py 
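
For readers unfamiliar with PyInstaller, a build script of this kind typically calls PyInstaller from Python. The sketch below shows one way that could look; it is not the content of scrape_tpp_gui_pyinstaller.py. The entry-point name scrape_tpp_gui.py and the chosen options are assumptions, while the output name follows the scraper_tpp_gui.exe mentioned above.

```python
def build_pyinstaller_args(script: str, name: str) -> list[str]:
    """Assemble PyInstaller options for a one-file, windowed executable."""
    return [
        script,
        "--onefile",     # bundle everything into a single executable
        "--windowed",    # GUI app: do not open a console window
        "--name", name,  # name of the resulting executable
    ]


def build_executable(script: str = "scrape_tpp_gui.py",
                     name: str = "scraper_tpp_gui") -> None:
    """Run PyInstaller programmatically (requires the pyinstaller package).

    The default script name is a hypothetical entry point.
    """
    # Imported lazily so build_pyinstaller_args() works without PyInstaller.
    import PyInstaller.__main__

    PyInstaller.__main__.run(build_pyinstaller_args(script, name))
```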

You should also convert updater.py to updater.exe in order to use the "check for updates" command in the menu. Currently, auto-updating does not work when the program is run as a .py script.

cd {path/to/updater_folder}
py pyinstaller_updater.py
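
Since updating only works from the built executable, a program can check at runtime whether it is running from a PyInstaller bundle. PyInstaller sets the `sys.frozen` attribute in bundled applications. The sketch below uses that attribute; the function names are hypothetical and this is not necessarily how the project decides whether to offer updates.

```python
import sys


def running_as_executable() -> bool:
    """Return True when running from a PyInstaller-built bundle.

    PyInstaller sets sys.frozen in bundled apps; a plain `py script.py`
    run does not have that attribute.
    """
    return bool(getattr(sys, "frozen", False))


def can_check_for_updates() -> bool:
    """Hypothetical guard: only offer updates from the built executable."""
    return running_as_executable()
```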

TODO

  • SQLite Database

    • Save to db option in menu

    • Periodically autosave to db

    • Let the user choose the frequency of autosaving

    • Let the user choose whether to autosave or not

    • Create a toplevel window containing all the news from the database

      • Add search option based on date (Greek format DD-MM-YY)

      • Add advanced search (author, category, date)

  • Code refactoring

    • Move the files from ./classes to ./source/classes dir & consequently, fix the paths for the rest of the code
    • Move images to source
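
As a starting point for the SQLite items in the list above, here is a minimal sketch using the standard-library sqlite3 module. Nothing here exists in the project yet: the table layout, column names and function names are assumptions. Dates are entered in the Greek DD-MM-YY format and stored as ISO YYYY-MM-DD text so that they compare and sort correctly.

```python
import sqlite3
from datetime import datetime


def greek_to_iso(date_text: str) -> str:
    """Convert 'DD-MM-YY' to ISO 'YYYY-MM-DD'."""
    return datetime.strptime(date_text, "%d-%m-%y").strftime("%Y-%m-%d")


def create_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) news table."""
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS news ("
        "title TEXT NOT NULL, author TEXT, category TEXT, date TEXT NOT NULL)"
    )
    return con


def save_news(con: sqlite3.Connection, title: str, author: str,
              category: str, date_text: str) -> None:
    """Insert one news item; date_text is in DD-MM-YY format."""
    con.execute(
        "INSERT INTO news (title, author, category, date) VALUES (?, ?, ?, ?)",
        (title, author, category, greek_to_iso(date_text)),
    )
    con.commit()


def search_by_date(con: sqlite3.Connection, date_text: str) -> list[str]:
    """Return the titles published on the given DD-MM-YY date."""
    rows = con.execute(
        "SELECT title FROM news WHERE date = ? ORDER BY title",
        (greek_to_iso(date_text),),
    )
    return [row[0] for row in rows]
```

An advanced search by author or category would follow the same pattern with additional `WHERE` conditions and parameters.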

Credits

Thanks to the maintainers of all the third-party packages and to the StackOverflow users.

Donate

Do not forget to donate monthly to the ThePressProject team. Recurring monthly donations are the only way for truly independent journalism to exist.

License

The ThePressProject trademark, name and all of its content belong to the ThePressProject team. The third-party packages have their own licenses. All the code written by me is released under the MIT license.