Engineering Manager with 10+ years of experience in software development. Passionate about AI and data, and fascinated by all things tech.
Built my first (terrible) website at 12 and quickly realized front-end development wasn’t for me. After a brief hiatus from web design, I explored hardware and robotics programming during college, but ultimately discovered my true passion in data and AI—and I haven’t looked back since.
- 🌍 I'm based in São Paulo - BR
- ✉️ You can contact me at https://www.linkedin.com/in/arthur-marcal/
- 🎓 MSc in Artificial Intelligence applied to NLP - University of São Paulo (USP)
- 🚀 I'm currently working on Gabriel Money
- 🧠 I'm currently diving into 🦀 Rust Programming to push the boundaries of Serverless technologies. Some might say that I have a crush on functions.
- 🤖 Also, I’ve been exploring multi-agent systems to create automations that simulate collaborative tasks. I enjoy experimenting with frameworks like CrewAI to build workflows where agents interact dynamically to achieve complex objectives.
- 📱 Banking Mobile App (2024)
Led the development of a financial mobile app for an Atlanta (US)-based startup, launching it from scratch in just 6 months. The app achieved thousands of downloads and users within its first year.
- 🔧 Auto-Parts Catalog Data Pipeline (2023)
Led the first successful project at a San Francisco (US)-based startup by designing and building a data pipeline to normalize sparse data from hundreds of auto-parts manufacturer catalogs. This solution significantly reduced e-commerce customer return rates, improving overall product accuracy and user satisfaction.
- 🏦 Fintech Data Strategy (2022)
Managed the data strategy and roadmap at a fintech and real estate startup, driving the company’s growth from early stages to Series A, scaling from tens to hundreds of employees. Led key initiatives in Data Science, Data Engineering, BI, and RPA, designing scalable data pipelines and integrating new technologies to support rapid expansion.
Here are a few GitHub projects that highlight my skills and experience:
- Fully Automated Infrastructure and Deploy for ETL Serverless Application
A data engineering project originally built in February 2022 to demonstrate my ability to create scalable data pipelines using AWS services. The project integrates AWS Lambda, AWS RDS, and AWS Wrangler for data ingestion and processing. It also uses Terraform and a Makefile to automate infrastructure and dependency management. Recently, I revisited the project to enhance the deployment process and improve the code organization based on new insights and best practices I've gained.
- ReadMeGenie CrewAI
This project leverages the CrewAI framework to build a multi-agent system that reads a GitHub repository, interprets the content of its files, and generates a detailed README.md file automatically.
- PDF Quality Classifier
A PDF classification tool that provides a simple GUI for reviewing and labeling PDFs as "Good" or "Bad." Users can navigate through documents, classify them, and export the results to a CSV file. Beyond manual classification, the tool serves as a valuable resource for building labeled datasets, which can be used for training machine learning models. Additionally, it can be distributed as a standalone Windows executable for easy sharing with non-technical users. The source code can also be easily modified to support new labels or categories beyond "Good" or "Bad," allowing users to adapt it for various classification needs.
- PDF Pre-Processing Before OCR
This project demonstrates how to convert PDF files to images and apply preprocessing techniques to optimize them for OCR. Using OpenCV, the process includes grayscale conversion, noise removal (via dilation and erosion), Gaussian blurring, and binarization. These steps improve accuracy for OCR engines like Tesseract, which makes preprocessing a crucial step in text recognition workflows.
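
To give a feel for what the OCR preprocessing steps do, here is a minimal, dependency-free Python sketch. It is not the project's code: the project uses OpenCV, where these steps roughly correspond to `cv2.cvtColor`, `cv2.dilate`, `cv2.erode`, `cv2.GaussianBlur`, and `cv2.threshold`. The function names, the 3x3 window, and the threshold of 127 are all assumptions chosen for illustration, and images are plain nested lists rather than arrays.

```python
# Illustrative only: a pure-Python stand-in for the OpenCV pipeline described above.
# RGB images are lists of rows of (r, g, b) tuples; grayscale images hold ints 0-255.

def to_grayscale(rgb):
    """Convert RGB pixels to luminance using the ITU-R BT.601 weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]

def _neighborhood(img, y, x):
    """Return the 3x3 window around (y, x), clamping coordinates at the borders."""
    h, w = len(img), len(img[0])
    return [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def dilate(img):
    """Grayscale dilation: each pixel becomes the maximum of its 3x3 window."""
    return [[max(_neighborhood(img, y, x)) for x in range(len(img[0]))]
            for y in range(len(img))]

def erode(img):
    """Grayscale erosion: each pixel becomes the minimum of its 3x3 window."""
    return [[min(_neighborhood(img, y, x)) for x in range(len(img[0]))]
            for y in range(len(img))]

def gaussian_blur(img):
    """Smooth with the 3x3 binomial kernel [[1,2,1],[2,4,2],[1,2,1]] / 16."""
    kernel = [1, 2, 1, 2, 4, 2, 1, 2, 1]
    return [[sum(k * v for k, v in zip(kernel, _neighborhood(img, y, x))) // 16
             for x in range(len(img[0]))]
            for y in range(len(img))]

def binarize(img, threshold=127):
    """Map pixels above the threshold to white (255) and the rest to black (0)."""
    return [[255 if v > threshold else 0 for v in row] for row in img]

def preprocess(rgb):
    """Grayscale -> dilation -> erosion -> Gaussian blur -> binarization."""
    gray = to_grayscale(rgb)
    denoised = erode(dilate(gray))  # dilation wipes small dark specks, erosion restores strokes
    return binarize(gaussian_blur(denoised))

# Usage: a white 5x5 page with a single dark speck of noise in the middle.
page = [[(255, 255, 255)] * 5 for _ in range(5)]
page[2][2] = (0, 0, 0)
clean = preprocess(page)
print(all(v == 255 for row in clean for v in row))  # True: the speck is removed
```

On light backgrounds with dark text, dilating first removes isolated dark specks, and the erosion that follows brings thicker strokes back to roughly their original width, so real characters survive while pixel-level noise does not.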