
IntelliMeet

Repository for CS 5704 Software Engineering project.

IntelliMeet: AI-Enabled Decentralized Video Conferencing Application.

Introduction:

‘IntelliMeet’ is a decentralized, peer-to-peer video conferencing application that combines federated learning [3] with machine learning techniques to automatically generate minutes of meetings and analyze participant attentiveness. IntelliMeet enhances user privacy through peer-to-peer federated learning. Meeting attendees often multitask during remote or virtual meetings, which can hurt teamwork and collaboration. IntelliMeet addresses this issue by using machine learning and pose-aware computer vision techniques to identify whether participants are paying attention, thereby encouraging more active participation and reducing multitasking. Furthermore, the application uses a log-mel transformer to convert meeting speech to text in real time, and GPT-based natural language processing to analyze the transcribed notes and deliver minutes of meetings to participants via automated emails. This reduces participant workload and increases productivity. These features, together with the application's efficient architecture and automation capabilities, provide a secure and private video conferencing experience that enhances teamwork and collaboration. With the rise of remote work, IntelliMeet can play a crucial role in helping organizations and individuals achieve their productivity goals.

DEVELOPED by THE FORE-MEMBERS

  • Premith Kumar Chilukuri (VTID: cpremithkumar) (GitHub ID: chpk)
  • Nikhil Narra (VTID: nikhilnarra) (GitHub ID: niknarra)
  • Krishna vamsi Dhulipalla (VTID: kdhulipalla13) (GitHub ID: krishna-creator)
  • Siva sagar Kolachina (VTID: sivasagar) (GitHub ID: siva-sagar)

APPLICATION PIPELINE:

(Application pipeline diagram)

DESCRIPTION of ML FEATURE 1

ML feature 1 uses facial detection and pose estimation algorithms based on the RetinaFace architecture. RetinaFace [1] is a multi-stage convolutional neural network trained with a multi-task loss function; its context module helps detect and localize faces and facial features within images or videos. The multi-task loss combines four different loss terms to ensure efficient face detection. The face detection algorithm in IntelliMeet locates each face in the input by its x and y coordinates, while the pose estimation algorithm uses the video frames and the localized face coordinates to estimate the pose angles of the detected faces in real time. By analyzing the facial coordinates and pose angles of participants in the video frame, ML feature 1 estimates their attention levels throughout the meeting. After the scheduled meeting concludes, participants automatically receive an attention analysis report via email.
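To make the attention computation concrete, the following is a minimal sketch of how per-frame pose angles could be thresholded into an attention level. The angle thresholds and helper names are illustrative assumptions, not values taken from the IntelliMeet code.

# Illustrative sketch only: the thresholds (25°/20°) and helper names are assumptions,
# not the actual values used by IntelliMeet's ML feature 1.
from typing import List, Tuple

def is_attentive(yaw: float, pitch: float,
                 yaw_limit: float = 25.0, pitch_limit: float = 20.0) -> bool:
    """Treat a face as attentive when it roughly points toward the camera."""
    return abs(yaw) <= yaw_limit and abs(pitch) <= pitch_limit

def attention_level(pose_angles: List[Tuple[float, float]]) -> float:
    """Fraction of frames in which the participant was attentive (0.0 to 1.0)."""
    if not pose_angles:
        return 0.0
    attentive = sum(1 for yaw, pitch in pose_angles if is_attentive(yaw, pitch))
    return attentive / len(pose_angles)

# Example: (yaw, pitch) estimates for five consecutive frames.
print(attention_level([(5, 3), (40, 2), (10, -8), (2, 1), (60, 30)]))  # 0.6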

DESCRIPTION of ML FEATURE 2

IntelliMeet uses two modules for speech-to-text transcription and text analysis. The speech-to-text transcription module is modeled after OpenAI's Whisper [2] and adopts a transformer-based CNN-RNN architecture with a CTC loss function. The module converts audio sequences into log-mel spectrograms and uses a beam search algorithm for real-time transcription. The speech transcription model used in IntelliMeet has ~78M parameters and occupies about 470 MB of memory. After transcription, an NLTK-based text pre-processing algorithm removes noise and corrects grammatical errors. The processed text is then passed to the text analysis module, which generates the minutes of the meeting (MoM), including meeting notes, task reminders, and alerts. Once the MoM is created, it is automatically emailed to each participant.
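As a reference point for the transcription step, the openai-whisper package (installed in the setup section below) can be used as shown here. The model size and audio file name are placeholders, and this is a simplified offline example rather than IntelliMeet's real-time pipeline.

# Simplified offline example using the openai-whisper package; IntelliMeet's own
# module streams audio in real time, so this only illustrates the API.
import whisper

model = whisper.load_model("small")              # model size chosen for illustration
result = model.transcribe("meeting_audio.wav")   # placeholder file name
print(result["text"])                            # raw transcript before NLTK cleanup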

USE-CASES IMPLEMENTED

Use Case 1:

The camera streams of all participants are directed to the ML Feature 1 pipeline for facial analysis and pose estimation. The facial analysis algorithm identifies each participant, while the pose estimation algorithm analyzes each individual's attention to create a customized attention report keyed to meeting timestamps. This report lists the timestamps at which the participant was not attentive and summarizes the meeting context they may have missed. The personalized attention reports are then emailed to each participant.
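The sketch below shows one way inattentive time spans could be turned into a per-participant report and emailed. The report format, SMTP server, and credentials are placeholders, not the repository's actual configuration.

# Sketch only: the report format, SMTP host, and credentials are placeholders.
import smtplib
from email.mime.text import MIMEText

def build_report(name, inattentive_spans):
    lines = [f"Attention report for {name}:"]
    for start, end in inattentive_spans:
        lines.append(f"  - Not attentive from {start} to {end}")
    return "\n".join(lines)

def send_report(sender, password, recipient, body):
    msg = MIMEText(body)
    msg["Subject"] = "IntelliMeet attention report"
    msg["From"] = sender
    msg["To"] = recipient
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:  # placeholder SMTP server
        server.login(sender, password)
        server.send_message(msg)

report = build_report("Participant 1", [("00:05:10", "00:07:45")])
# send_report("sender@example.com", "app-password", "participant1@example.com", report)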

Use Case 2 (PARTIALLY IMPLEMENTED):

The audio streams of all participants are directed to the ML Feature 2 pipeline for speech-to-text transcription and subsequent text analysis. The speech-to-text feature in ML Feature 2 transcribes each participant's audio stream to text keyed by their participant ID. These transcripts are then processed by the text analysis module to generate a meeting summary. Additionally, text scraping is carried out on the transcripts to generate an email that consolidates all the issues, dependencies, tasks, or events discussed during the meeting.
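One simple way to perform the text scraping described above is keyword matching over each participant's transcript. The keyword list below is an illustrative assumption; the actual text analysis module may rely on GPT-based NLP instead.

# Illustrative keyword-based scraping; the real text analysis module may use
# GPT-based NLP rather than simple keyword matching.
ACTION_KEYWORDS = ("task", "issue", "dependency", "deadline", "event", "todo")

def scrape_action_items(transcripts):
    """transcripts: dict mapping participant ID -> transcribed text."""
    items = []
    for participant_id, text in transcripts.items():
        for sentence in text.split("."):
            if any(keyword in sentence.lower() for keyword in ACTION_KEYWORDS):
                items.append(f"[{participant_id}] {sentence.strip()}")
    return items

example = {
    "participant_1": "We agreed on the design. The next task is to fix the audio bug.",
    "participant_2": "There is a dependency on the weights download.",
}
print("\n".join(scrape_action_items(example)))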

NOTE:

Building and running the IntelliMeet project requires substantial computational power. Also, because the application is not deployed, it is a bit tricky to run: the server and client code must be executed simultaneously for data transmission to work properly. The requirements to run IntelliMeet are listed below (a minimal GPU/CUDA environment check is sketched after the list):

  • 20+ dependencies
  • Client-server architecture support
  • CUDA
  • NVIDIA GTX/RTX GPU (at least 6 GB of VRAM)
  • Intel i7 9th-gen or newer CPU (or equivalent)
  • At least 8 GB of RAM
  • A stable internet connection
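The following is a minimal sketch (not part of the IntelliMeet code base) for checking that CUDA and GPU memory are available before attempting to run the application. It assumes PyTorch is already installed per the setup instructions below.

# Minimal environment check; the 6 GB threshold mirrors the GPU requirement above.
import torch

def check_environment():
    if not torch.cuda.is_available():
        print("CUDA is not available; the ML features will be very slow on CPU.")
        return
    props = torch.cuda.get_device_properties(torch.cuda.current_device())
    vram_gb = props.total_memory / (1024 ** 3)
    print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB")
    if vram_gb < 6:
        print("Warning: less than 6 GB of VRAM may not be enough for the models.")

if __name__ == "__main__":
    check_environment()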

ENVIRONMENT SETUP:

DEPENDENCIES INSTALLATION (Python Packages):

Install the following Python libraries required to run the application (the application is implemented in Python 3). A consolidated one-line install for the third-party packages follows the individual commands at the end of this section.

# [IMPORTANT NOTE:] Install the PyTorch library (the installation process differs by OS and GPU support) by following the documentation at https://pytorch.org/get-started/locally/

# json, argparse, socket, struct, time, wave, smtplib, and email (including email.mime)
# are part of the Python 3 standard library and do not need to be installed with pip.

# install numpy library
pip install numpy # if this command doesn't work, try the one below
pip3 install numpy

# install OpenCV library
pip install opencv-python # if this command doesn't work, try the one below
pip3 install opencv-python

# install imutils library
pip install imutils # if this command doesn't work, try the one below
pip3 install imutils

# Manually install the PortAudio system library (required by PyAudio) before the next step

# install pyaudio library
pip install PyAudio # if this command doesn't work, try the one below
pip3 install PyAudio

# install Pillow library
pip install --upgrade Pillow # if this command doesn't work, try the one below
pip3 install --upgrade Pillow

# install whisper library
pip install -U openai-whisper # if this command doesn't work, try the one below
pip3 install -U openai-whisper
pip install setuptools-rust # if this command doesn't work, try the one below
pip3 install setuptools-rust

# install transformers library
pip install transformers # if this command doesn't work, try the one below
pip3 install transformers

# install matplotlib library
pip install matplotlib # if this command doesn't work, try the one below
pip3 install matplotlib

# install flask library for UI
pip install Flask # if this command doesn't work, try the one below
pip3 install Flask

# tkinter for the UI ships with most Python installations; on some Linux distributions
# it must be installed as a system package (e.g. python3-tk) rather than with pip.

# install dotenv library for UI
pip install python-dotenv # if this command doesn't work, try the one below
pip3 install python-dotenv
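For convenience, the third-party packages above can also be installed in a single step. This one-liner only consolidates the packages already listed; PyTorch and PortAudio still require the separate steps described earlier.

# consolidated install of the third-party packages listed above
pip install numpy opencv-python imutils PyAudio Pillow openai-whisper setuptools-rust transformers matplotlib Flask python-dotenv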

STEPS TO RUN THE APPLICATION

How to send meeting invites using the meeting registration page

  • Clone the repository: https://github.com/niknarra/SE-Project---IntelliMeet.git
  • Navigate to the ./front-end/ directory.
  • Run app.py with the python app.py command (if that doesn't work, try python3 app.py).
  • Open a browser and go to http://localhost:5000/
  • Enter the meeting details (add a total of 2 participants only).
  • Click the "send invite" button; an email containing "form_data.json" will be mailed to you.
  • Place "form_data.json" in the SE-Project---IntelliMeet repository (a hypothetical sketch of this file's structure is shown after these steps).
  • Further details regarding the meeting registration page can be found at https://github.com/niknarra/SE-Project---IntelliMeet/tree/main/front-end
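The exact contents of form_data.json are defined by the registration form in ./front-end/ and are not documented here, so the following is only a hypothetical illustration of the kind of data such a file might carry; the field names are invented for illustration.

# Hypothetical illustration only: the real form_data.json is produced by the
# registration page in ./front-end/, and its actual field names may differ.
import json

example_form_data = {
    "meeting_title": "Weekly sync",            # invented field name
    "meeting_time": "2023-05-01 10:00",        # invented field name
    "participants": [                          # the form accepts 2 participants
        {"name": "Participant 1", "email": "participant1@example.com"},
        {"name": "Participant 2", "email": "participant2@example.com"},
    ],
}

with open("form_data.json", "w") as f:
    json.dump(example_form_data, f, indent=2)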

Steps to run the application on the server side (i.e., the computational hub; note that this device should have high computational power):

  • Clone the repository: https://github.com/niknarra/SE-Project---IntelliMeet.git (ignore if already done)
  • Navigate to the repository using cd ./SE-Project---IntelliMeet/
  • Download the latest "form_data.json" meeting invite from your email
  • Make sure you have the latest "form_data.json" file locally in the SE-Project---IntelliMeet repository
  • Download the ML feature 1 (facial analysis) model weights by following the instructions at https://github.com/niknarra/SE-Project---IntelliMeet/tree/main/weights
  • Place the downloaded weight files in the ./weights/ directory.
  • Make sure the "audio_results.json", "error.jpg", "face_detection_results.json", "form_data.json", "graph_plotting.py", "meeting_notes.txt", "mic_video_buttons.jpg", and "video_face_detect.py" files are present locally in the ./SE-Project---IntelliMeet/ directory.
  • Run "Modified_UI_computational_video_audio_server.py" with the python Modified_UI_computational_video_audio_server.py command (if that doesn't work, try python3 Modified_UI_computational_video_audio_server.py). A generic sketch of the socket streaming pattern such server/client scripts rely on is shown after these steps.
  • Wait until the meeting UI pops up, then start your meeting with the other participant (i.e., the person on the client side; steps for running the application on the client side are provided below).
  • Close the UI by clicking the cross button (top-right), then stop the code execution in the terminal to completely stop IntelliMeet.
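Because the server and client must run simultaneously for data transmission to work, the following sketch shows one common pattern for that exchange (length-prefixed pickled frames over TCP). It is a generic illustration, not the exact protocol implemented in Modified_UI_computational_video_audio_server.py; the host and port are placeholders.

# Generic server-side sketch of length-prefixed frame streaming over TCP.
import pickle
import socket
import struct

HOST, PORT = "0.0.0.0", 9999   # illustrative address/port, not taken from the repo

def recv_exact(conn, n):
    """Read exactly n bytes from the connection."""
    data = b""
    while len(data) < n:
        chunk = conn.recv(n - len(data))
        if not chunk:
            raise ConnectionError("client disconnected")
        data += chunk
    return data

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.bind((HOST, PORT))
    server.listen(1)
    conn, addr = server.accept()
    with conn:
        while True:
            # Each message is a 4-byte big-endian length followed by a pickled frame.
            header = recv_exact(conn, 4)
            (length,) = struct.unpack("!I", header)
            frame = pickle.loads(recv_exact(conn, length))
            # ... pass `frame` to the ML feature 1 / feature 2 pipelines here ...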

Steps to run the application on the client side (i.e., a meeting participant's device):

  • Clone the repository: https://github.com/niknarra/SE-Project---IntelliMeet.git (ignore if already done)
  • Navigate to the repository using cd ./SE-Project---IntelliMeet/
  • Download the latest "form_data.json" meeting invite from your email
  • Make sure you have the latest "form_data.json" file locally in the SE-Project---IntelliMeet repository
  • Run "UI_Video_Audio_client_transmisson.py" with the python UI_Video_Audio_client_transmisson.py command (if that doesn't work, try python3 UI_Video_Audio_client_transmisson.py). A client-side counterpart of the streaming sketch appears after these steps.
  • Wait until the meeting UI pops up, then start your meeting with the other participant (i.e., the person on the server side; steps for running the application on the server side are provided above).
  • Close the UI by clicking the cross button (top-right), then stop the code execution in the terminal to completely stop IntelliMeet.
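The client-side counterpart of the server sketch above captures webcam frames with OpenCV and sends them as length-prefixed pickled messages. Again, this is a generic illustration rather than the exact code in UI_Video_Audio_client_transmisson.py; the server address and port are placeholders.

# Generic client-side sketch of length-prefixed frame streaming over TCP.
import pickle
import socket
import struct

import cv2

SERVER_HOST, SERVER_PORT = "127.0.0.1", 9999   # illustrative address/port

capture = cv2.VideoCapture(0)                   # default webcam
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
    client.connect((SERVER_HOST, SERVER_PORT))
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        payload = pickle.dumps(frame)
        # 4-byte big-endian length prefix, then the pickled frame.
        client.sendall(struct.pack("!I", len(payload)) + payload)
capture.release()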

APPLICATION TESTING DETAILS

Unit Testing

Black-Box testing

FOR INFORMATION ON FACIAL ANALYSIS MODULE

FOR INFORMATION ON SPEECH-TO-TEXT TRANSCRIPTION MODULE

REFERENCES

  • [1] J. Deng, J. Guo, E. Ververas, I. Kotsia, and S. Zafeiriou. 2020. RetinaFace: Single-Shot Multi-Level Face Localisation in the Wild. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 5202–5211.
  • [2] Alec Radford, Jong Wook Kim, et al. 2022. Robust Speech Recognition via Large-Scale Weak Supervision. arXiv. https://doi.org/10.48550/arXiv.2212.04356
  • [3] Qiang Yang, Yang Liu, et al. 2019. Federated Machine Learning: Concept and Applications. ACM Transactions on Intelligent Systems and Technology 10, 2 (2019), 1–19. https://doi.org/10.1145/3298981

Issue Tracking and Project Planning

Link for Project DEMO