Ahmed Omar

About Me

Full Name: Ahmed Omar Salim Adnan
Phone: + 203 640 3669
Email: aos.adnan98@gmail.com
Address: 89 Woolsey St, West Haven, CT 06513

Hello There!

I am a passionate Data Scientist and Analyst, currently pursuing an M.S. in Data Science at the University of New Haven. I have hands-on experience in predictive modeling, data visualization, and influencer marketing analytics. In my previous role as a Data Scientist, I optimized data-driven marketing strategies. I am always eager to learn, collaborate, and contribute to impactful projects.

My Resume

Work Experience
Graduate Research Assistant
University of New Haven - Aug 2024 - Present
I am working as a Graduate Research Assistant under Dr. Shivanjali Khare as part of my Provost’s Assistantship. The program requires me to:
- Create the official website for the lab.
- Create a video game for senior citizens that raises awareness of cyber scams, and integrate data analysis into it.
- Assist with research projects by performing experiments, collecting data, and analyzing results.
- Review and summarize existing research literature to support the research project.
- Prepare reports, research papers, and presentations based on the findings.
Senior Coding Instructor
Codingal - Sep 2023 - Aug 2024
Codingal is an online coding class for kids and teens with students from over 70 countries. The position required me to:
- Conduct trial classes for prospective students to encourage them to enroll.
- Conduct group classes (3-4 students) as well as one-on-one classes.
- Assist students with class activities and check their class projects.
- Provide valuable feedback after each class and after checking each project.
Data Scientist
Startell - Jul 2023 - Oct 2023
Startell is a startup that aims to harness the power of data to maximize influencer impact. Its primary goal is to create impactful connections through data-driven insights. The position required me to:
- Conduct in-depth data analysis, employing advanced statistical techniques and data mining methodologies.
- Collaborate with cross-functional teams to understand business objectives and define data requirements for research projects.
- Develop and implement cutting-edge models, leveraging machine learning and predictive analytics to uncover hidden patterns and make accurate forecasts.
- Evaluate and refine existing models, utilizing performance metrics and feedback to enhance their efficiency and effectiveness.
- Work closely with stakeholders to translate data findings into clear, concise, and actionable recommendations.
Web Developer
Canvas Developers - May 2023 - Jul 2023
Canvas Devs provides a range of IT services, from white-label products to custom-built software and web applications. The position required me to:
- Conduct meetings with clients abroad to understand their requirements and suggest plans accordingly.
- Work on both the front end and back end of projects while collaborating with cross-functional teams.
- Build the company's new official website using current web technologies.
Undergraduate Teaching Assistant
North South University - Jan 2022 - Jan 2023
During the final year of my BSc program in Computer Science and Engineering, I served as a Teaching Assistant in the ECE department. The position required me to:
- Conduct one-on-one teaching sessions with students during office hours.
- Review and grade quizzes and assignments to provide constructive feedback.
- Assist students in understanding project guidelines and guide them throughout the course project duration.
- Invigilate midterms and finals, ensuring a fair and secure testing environment while upholding academic integrity.

Education
M.S. in Data Science
University of New Haven - Aug 2024 - Present
CGPA: 4.00/4.00

BSc. in Computer Science and Engineering
North South University - Oct 2018 - Sep 2022
CGPA: 3.69/4.00

Publications
Predicting Audience Interests from Social Media Captions Using a Semi-Supervised Approach (1st Author)
IEEE Xplore - Nov 2024

The aim of this paper is to predict the interests of social media influencers’ audiences. This paper presents a novel approach to the multi-label classification of interests in six categories: sports and fitness, travel, fashion, electronics, photography, and food. Of the five classifiers, the Multilayer Perceptron (MLP) achieves the highest accuracy, 98.22%, on the single-labeled dataset. However, evaluation based solely on single labels only partially captures the complexities of multi-label classification. The Random Forest Classifier appears to be the most accurate among glass-box models in labeling sentences belonging to one category, and it also shows promise in categorizing captions belonging to multiple categories. The SVM and KNN classifiers, on the other hand, struggle with multi-label classification and often mislabel or fail to capture the labels in complex sentences. Despite lower accuracy on the single-labeled dataset, the Gradient Boosting Classifier demonstrates the most promising performance in labeling complex sentences. The MLP, although it achieves the highest single-label accuracy, faces challenges with complex sentences containing multiple categories, indicating the need for a significantly larger dataset to improve its performance.
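To illustrate why single-label accuracy only partially reflects multi-label performance, here is a minimal sketch that scores the same predictions two ways. It is not code from the paper: the function names, the three example captions, and their labels are all hypothetical, and only the six category names come from the abstract above.

```python
def exact_match_accuracy(true_sets, pred_sets):
    """Share of captions whose predicted label set equals the true set exactly."""
    hits = sum(1 for t, p in zip(true_sets, pred_sets) if t == p)
    return hits / len(true_sets)


def hamming_loss(true_sets, pred_sets, labels):
    """Share of individual (caption, label) decisions that are wrong."""
    # The symmetric difference counts labels that were missed or wrongly added.
    wrong = sum(len(t ^ p) for t, p in zip(true_sets, pred_sets))
    return wrong / (len(true_sets) * len(labels))


LABELS = ["sports and fitness", "travel", "fashion",
          "electronics", "photography", "food"]

# Hypothetical true and predicted label sets for three captions.
y_true = [{"travel", "food"}, {"fashion"}, {"electronics", "photography"}]
y_pred = [{"travel"}, {"fashion"}, {"electronics", "photography"}]

print("exact match:", exact_match_accuracy(y_true, y_pred))
print("hamming loss:", hamming_loss(y_true, y_pred, LABELS))
```

Exact match penalizes the first caption fully for one missed label, while Hamming loss counts only that single wrong decision, so the two metrics can rank classifiers differently on captions with several categories.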
Exploring the use of Machine-learning and Non-Radial DEA for Transit efficiency score prediction (3rd Author)
IEEE Xplore - Nov 2024

A data-driven bus efficiency prediction model is a useful tool for transport planners to optimize current routes and plan efficient new ones. This study proposes a novel approach for predicting efficiency scores by leveraging a non-radial DEA model and machine learning (ML) techniques. A labeled dataset is developed using a non-radial DEA method that considers interrelationships between operational and service efficiency and the selected features. Two machine learning models, Linear Regression (LR) and Support Vector Regression (SVR), are trained on the labeled dataset. The trained models can be used to predict the efficiency scores of new bus routes based on decision-makers’ preferences on input parameters, without requiring a full DEA analysis. The methodology is tested on CyRide, a real-world dataset provided by the Ames Transit Agency. The effectiveness of both models in predicting efficiency is also evaluated using R², MSE, and residual plots, with a detailed discussion of the exploratory analysis of the selected features and the overall efficiency score. The proposed methodology can be generalized to any bus route dataset and used by transportation authorities for improved decision-making.
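The evaluation step described above (fit a regression on DEA-labeled scores, then report MSE and R²) can be sketched in a few lines. This is an illustrative example only, not the paper's code: it uses a single-feature least-squares fit in place of the LR and SVR models, and the feature values and efficiency scores are made up.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted scores."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical data: one input feature per route and its DEA-style efficiency score.
feature = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [0.30, 0.42, 0.48, 0.61, 0.69]

slope, intercept = fit_line(feature, score)
predicted = [slope * x + intercept for x in feature]
print(f"MSE={mse(score, predicted):.5f}, R2={r2(score, predicted):.3f}")
```

In practice the metrics would be computed on held-out routes rather than on the training data, as done here for brevity.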
Premature Birth Prediction Using Machine Learning Techniques (3rd Author)
Springer - Jul 2022

Premature babies are the most vulnerable to neonatal mortality, and their birth process is emotionally and physically painful for the mother and the family. Caring for a premature baby can also be financially burdensome. This paper looks into the maternal factors associated with prematurity. The surveys took place in public maternity hospitals in the Western Brazilian Amazon. This research aims to predict whether a baby will be born prematurely using several distinct models. To this end, various machine learning classification algorithms (Decision Tree, Naive Bayes, Random Forest, Extreme Gradient Boosting, and K-Nearest Neighbors) were applied to the preprocessed data. The paper proposes models capable of predicting premature birth with an accuracy of about 80%. This research aids in developing a usable model that can detect premature births at an early stage, which will allow early treatment to prevent premature birth, substantially reducing child mortality and reducing the economic stress on families bearing a premature child.

My Services

Data Science

Expert in Python, R, and SQL. Skilled in data preprocessing, analysis, and model development to uncover actionable insights.

Data Analytics

Experienced in analyzing and visualizing datasets using Tableau, Power BI, Matplotlib, and Seaborn, while effectively presenting findings to clients and stakeholders.

Statistical Analysis

Proficient in applying statistical methods for hypothesis testing, time-series analysis, and data-driven decision-making, supporting clients in strategic planning.

API Integration

Adept at integrating APIs for dynamic and real-time data processing, ensuring seamless communication and fulfilling client needs.

ETL Processes

Specialist in building and automating ETL pipelines with Azure Data Factory and Databricks, designed to align with client workflows.

Consultation

Skilled in collaborating with clients to understand their goals, gathering requirements, and delivering tailored data-driven solutions that address their challenges.

Skills

Python: 95%

R: 90%

SQL: 85%

JavaScript: 80%

Java: 75%

C++: 70%

Projects

Credit Card Transactions Fraud Detection: A machine learning-based system that detects fraudulent credit card transactions using data analysis and real-time prediction.

- Derived new features from the dataset and preprocessed the data as required.
- Performed time-based, demographic-based, and context-based analysis.
- Visualized the data using a variety of charts.
- Created a web app using Streamlit to deploy the dashboard.
- Trained a Gradient Boosting model that achieved 70% prediction accuracy.
- Integrated the model with the Streamlit app for live prediction.


Pathchola: A SQL-based database for managing and exploring Dhaka's bus services, including routes, companies, and fare details.

- Designed and deployed a scalable database schema using Azure SQL Database to manage data on bus companies, routes, stops, and fares.
- Automated data ingestion and preprocessing using Azure Data Factory to keep the dataset accurate and up to date.
- Visualized key insights on bus routes, fare structures, and service trends using interactive dashboards created with Power BI.
- Developed and deployed a web application using Azure App Service to enable users to query and explore Dhaka's bus services.
- Secured database operations with Azure Key Vault and improved performance and scalability through elastic pools.


Bangla PoS Tagging with no Bangla Training Data: A Bangla Part-of-Speech (PoS) tagging system that uses transfer learning and multilingual resources without requiring Bangla-specific training data.

- Designed a pipeline leveraging pre-trained multilingual language models to handle Bangla text without requiring Bangla-specific annotated datasets.
- Preprocessed Bangla text data, including tokenization, stemming, and lemmatization, using multilingual NLP libraries.
- Fine-tuned the multilingual model with cross-lingual embeddings to predict PoS tags for Bangla text.
- Evaluated the system using cross-lingual benchmarks, achieving about 60% tagging accuracy.

