Junior Machine Learning Engineer | Junior MLOps Engineer
Young AI engineering graduate with a strong passion for machine learning, computer vision and data science. Motivated to work on innovative projects and solve complex problems using advanced AI techniques.
Young graduate.
I am a young and skilled data scientist with a background in training and optimizing NLP, LLM, and CNN models. My expertise lies in fine-tuning hyperparameters to achieve superior model performance. With a fresh perspective and a commitment to innovation, I am poised to make a significant impact in the field of data science.
As a recent graduate in data science, I gained practical experience in applying data analysis and statistical modeling techniques to solve real problems in a professional context.
Freelance
2024/03 - present
• Context: Led a software development project for a small company in the construction supplies sector, tasked with extracting product data from supplier PDFs to generate a comprehensive Excel dataset.
• Technical Expertise: Use of Retrieval-Augmented Generation (RAG) with LLMs such as ChatGPT and Mistral to extract data from various PDF formats and transform it accurately into structured Excel sheets.
• GPT-4 | 3.5-turbo - Pricing - OpenAI
• Mistral - Pricing
• Software solution: Development of "PDFToExcel", a tailor-made application that streamlines data conversion and dataset creation to improve the company's website.
• Agile Methodology: Collaborated closely with the Manuquip team using Agile practices, with rapid development cycles, continuous feedback and iterative improvements to meet evolving project requirements.
• Results: Successfully delivered a high-quality, user-friendly software solution that significantly improved the efficiency of data extraction and processing, enhancing the company's operational performance and customer service.
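To illustrate the retrieval step of a RAG workflow like the one described above, here is a minimal sketch. It is not the PDFToExcel code: the sample chunks, the function names and the bag-of-words cosine similarity are all assumptions chosen so the example runs with the standard library only. A real pipeline would more likely use embeddings and then send the prompt to an LLM.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k text chunks most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        chunks,
        key=lambda c: cosine(q, Counter(tokenize(c))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, chunks, k=2):
    """Assemble the prompt that would be sent to the LLM (call not shown)."""
    context = "\n".join(retrieve(query, chunks, k))
    return f"Context:\n{context}\n\nTask: {query}"


# Hypothetical text chunks, as they might come out of a supplier PDF.
chunks = [
    "Steel hammer 500 g, reference H-500, price 12.90 EUR",
    "Delivery terms: orders ship within 5 working days",
    "Cordless drill 18 V, reference D-18, price 89.00 EUR",
]

best = retrieve("price of the cordless drill", chunks, k=1)
prompt = build_prompt("price of the cordless drill", chunks, k=1)
```

Only the retrieved chunk ends up in the prompt, which keeps the LLM input short and focused on the product being extracted.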
Project
2023 - present
• Trained a Transformer encoder-decoder model from scratch on the "XSum" dataset for up to 200 epochs. Notebook
• Fine-tuned the Facebook/Llama-2-7b model with Ludwig (deployment failed because of multiple package errors between Transformers and Ludwig). Notebook
• Fine-tuned the Google/mT5 model with Hugging Face packages. Notebook | HuggingFace
• Use of a cloud service: https://vast.ai (NVIDIA RTX 4090 GPU)
Summary: " machine learning is a branch of artificial intelligence that focuses on the development of computer programs that can access data and use it to learn for themselves. ... "
Project
2023
• Trained a Transformer decoder model for up to 255k epochs on a French corpus (10 GB). Notebook
• Optimization of hyper-parameters, corpus cleaning...
• Use of a cloud service: https://vast.ai (NVIDIA GPU)
• Implementation of several models: one with a character-level tokenizer (10M parameters) and two with the tiktoken tokenizer (50M and 119M parameters)
Text generated: " Sous un soleil, les magasins de fer français débordent d'une multitude de fleurs mais sans encombre, "la France nourrit une résistance passive à plusieurs tirs d'explosifs". ... "
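As an illustration of the character-level tokenization mentioned above, here is a minimal sketch. It is not the notebook's code: the class name `CharTokenizer` and the tiny sample corpus are assumptions. Each distinct character gets one integer id, which keeps the vocabulary very small but makes sequences longer than with a subword tokenizer such as tiktoken.

```python
class CharTokenizer:
    """Map each distinct character of a corpus to an integer id."""

    def __init__(self, corpus):
        # Sorted so that the ids are deterministic for a given corpus.
        vocab = sorted(set(corpus))
        self.stoi = {ch: i for i, ch in enumerate(vocab)}
        self.itos = {i: ch for ch, i in self.stoi.items()}

    @property
    def vocab_size(self):
        return len(self.stoi)

    def encode(self, text):
        """Turn a string into a list of ids (KeyError on unseen characters)."""
        return [self.stoi[ch] for ch in text]

    def decode(self, ids):
        """Turn a list of ids back into a string."""
        return "".join(self.itos[i] for i in ids)


# Hypothetical mini-corpus; the real one was about 10 GB of French text.
corpus = "la France nourrit une résistance passive"
tok = CharTokenizer(corpus)
ids = tok.encode("la France")
roundtrip = tok.decode(ids)
```

Encoding then decoding returns the original string, and the sequence length equals the number of characters in the input.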
End-of-studies internship
July 2022 - December 2022
6 months
inagua.ch
• Development of an educational chatbot web application (Angular Ng).
• Generation of MCQs from any topic on Wikipedia (Wikidata).
• Topic modeling to highlight the most relevant topics in a text.
• Extractive and abstractive text summarization.
• Use of spaCy, Hugging Face Transformers, and BERT models.
• Deployment on Heroku, then GCP. Use of Kubernetes, Docker.
• ML pipeline prototyping with Kubeflow.
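To show what extractive summarization means in practice, here is a minimal sketch. It is not the internship code, which relied on spaCy and Hugging Face models. This version is an assumption-laden simplification using only the standard library: it scores each sentence by the average frequency of its words and keeps the top ones in their original order. The function name and the "longer than 3 letters" stop-word filter are illustrative choices.

```python
import re
from collections import Counter


def extractive_summary(text, n_sentences=1):
    """Return the n highest-scoring sentences, in their original order."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Word frequencies over the whole text; short words are ignored
    # as a crude stand-in for a stop-word list (assumption).
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if len(w) > 3)

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in chosen)


sample = (
    "Machine learning uses data. "
    "Machine learning models learn from data. "
    "The weather is nice today."
)
summary = extractive_summary(sample, n_sentences=1)
```

An extractive summary only reuses sentences from the source, whereas an abstractive one (for example with a fine-tuned mT5 model) generates new sentences.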
Graduation project
2021 - 2022
• Development of a CNN neural network (ResNet, Xception...) to recognize the user's identity through palm veins.
• Deployment of the model with Flask, Docker and Keras.
What I learned:
• Design an ML production system end-to-end: project scoping, data needs, modeling strategies, and deployment requirements
• Establish a model baseline, address concept drift, and prototype how to develop, deploy, and continuously improve a productionized ML application
• Build data pipelines by gathering, cleaning, and validating datasets
• Implement feature engineering, transformation, and selection with TensorFlow Extended
• Establish data lifecycle by leveraging data lineage and provenance metadata tools and follow data evolution with enterprise data schemas
• Apply techniques to manage modeling resources and best serve offline/online inference requests
• Use analytics to address model fairness, explainability issues, and mitigate bottlenecks
• Deliver deployment pipelines for model serving that require different infrastructures
• Apply best practices and progressive delivery techniques to maintain a continuously operating production system
coursera.org | DeepLearning.AI & Stanford online
2021 | 6 months
What I learned:
• Build and train deep neural networks, identify key architecture parameters, implement vectorized neural networks, and apply deep learning to applications
• Set up train and test sets, analyze variance for DL applications, use standard techniques and optimization algorithms, and build neural networks in TensorFlow
• Build a CNN and apply it to detection and recognition tasks, use neural style transfer to generate art, and apply algorithms to image and video data
• Build and train RNNs, work with NLP and word embeddings, and use Hugging Face tokenizers and transformer models to perform NER and question answering
Artificial intelligence major
Graduated in 2023
I followed a five-year educational path, including three years of general engineering to acquire a solid foundation, followed by two years of specialization in artificial intelligence where I deepened my knowledge in this constantly evolving field. My background has enabled me to develop solid technical expertise as well as an in-depth understanding of the key concepts of artificial intelligence.
September 2020 - April 2021
Programming, Control and Instrumentation.
2017
I've worked on exciting artificial intelligence projects ranging from advanced text generation to vision-based identity recognition, using techniques such as natural language processing and computer vision to solve complex problems and provide innovative solutions.
As a recently graduated junior data scientist specializing in deep learning and MLOps, I can contribute by developing high-performance deep learning models and setting up MLOps pipelines to ensure the efficient production and maintenance of these models in an operational environment.
As a recent graduate specializing in machine learning, I am able to actively participate in case studies using machine learning techniques to solve complex problems, proposing models and solutions tailored to specific business needs. I'm keen to put my skills into practice and contribute to real-world projects using machine learning.
Thanks to my NLP skills, I'm able to develop advanced natural language processing models for tasks such as text classification, feature extraction and text generation, to extract valuable information from text data and provide effective solutions in various fields. I'm passionate about harnessing the power of NLP to solve real-world problems and improve human-computer interactions.
Thanks to my skills in convolutional neural networks (CNNs), I'm able to develop high-performance model architectures for computer vision tasks such as object detection, image classification and semantic segmentation, to extract meaningful visual information and provide accurate solutions in image analysis and visual recognition. I'm passionate about using CNNs to solve complex problems and push the boundaries of computer vision.
Thanks to my (basic) skills in MLOps, I'm able to set up machine learning pipelines, automate model deployment, monitor performance and help keep production systems stable, enabling the smooth integration of machine learning into operational workflows. I'm passionate about continuously improving model deployment processes and creating scalable, reliable environments to support the lifecycle of machine learning projects.
Thanks to my in-depth skills in Python, I'm able to develop robust and efficient solutions, using the most popular libraries and frameworks, to solve complex problems and automate tasks related to data analysis, machine learning and application development. I'm passionate about the simplicity, readability and power of Python, which enables me to deliver high-quality solutions in a wide range of fields.
I'd love to hear from you and discuss how my skills and expertise in data science can contribute to your projects and drive meaningful insights and solutions.
Paris
Île-de-France
75018 - FR
kenan@gonnot.net
Phone: (+33) 6 ** ** ** **