I am a passionate Data Scientist and Analyst. Currently pursuing an M.S. in Data Science at the University of New Haven, I have hands-on experience in predictive modeling, data visualization, and influencer marketing analytics. I previously worked as a Data Scientist, where I optimized data-driven marketing strategies. I am always eager to learn, collaborate, and contribute to impactful projects.
I am working as a Graduate Research Assistant under Dr. Shivanjali Khare as part of my Provost’s Assistantship. The program requires me to:
Codingal is an online coding class for kids and teens with students from over 70 countries. The position required me to:
Startell is a startup that aims to harness the power of data to maximize influencer impact. Their primary goal is to create impactful connections through data-driven insights. The position required me to:
Canvas Devs provides a range of IT services, from white-label products to custom-built software and web applications. The position required me to:
During the final year of my BSc program in Computer Science and Engineering, I served as a Teaching Assistant in the ECE department. The program required me to:
CGPA: 4.00/4.00
CGPA: 3.69/4.00
This paper aims to predict the interests of social media influencers’ audiences. It presents a novel approach to the multi-label classification of interests in six categories: sports and fitness, travel, fashion, electronics, photography, and food. Of the five classifiers evaluated, the Multilayer Perceptron (MLP) achieves the highest accuracy, 98.22%, on the single-labeled dataset. However, evaluation based solely on single labels only partially captures the complexities of multi-label classification. The Random Forest Classifier appears to be the most accurate of the glass-box models at labeling sentences that belong to one category, and it also shows promise in categorizing captions that belong to multiple categories. The SVM and KNN classifiers, on the other hand, struggle with multi-label classification and often mislabel or miss labels in complex sentences. Despite its lower accuracy on the single-labeled dataset, the Gradient Boosting Classifier demonstrates the most promising performance in labeling complex sentences. The MLP, although it achieves the highest single-label accuracy, has difficulty with complex sentences containing multiple categories, which indicates the need for a significantly larger dataset to improve its performance.
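The gap between single-label accuracy and multi-label performance described above can be illustrated with a small sketch. The abstract does not say which multi-label metrics the paper used, so the two below (exact-match accuracy and per-label accuracy) are assumptions chosen for illustration, and the captions' label sets are made up rather than taken from the paper's data.

```python
# Illustrative only: the metrics and the label sets below are assumptions,
# not the paper's evaluation protocol or data.

LABELS = ["sports and fitness", "travel", "fashion",
          "electronics", "photography", "food"]


def exact_match_accuracy(true_sets, pred_sets):
    """Fraction of captions whose predicted label set equals the true set."""
    matches = sum(t == p for t, p in zip(true_sets, pred_sets))
    return matches / len(true_sets)


def per_label_accuracy(true_sets, pred_sets, labels):
    """Fraction of (caption, label) decisions that are correct."""
    correct = 0
    for t, p in zip(true_sets, pred_sets):
        for label in labels:
            correct += (label in t) == (label in p)
    return correct / (len(true_sets) * len(labels))


# Hypothetical captions: two with one label, two with two labels.
true_sets = [{"travel"}, {"food"},
             {"travel", "photography"}, {"fashion", "sports and fitness"}]
# A hypothetical model that only ever emits one label per caption.
pred_sets = [{"travel"}, {"food"}, {"travel"}, {"fashion"}]

exact = exact_match_accuracy(true_sets, pred_sets)
per_label = per_label_accuracy(true_sets, pred_sets, LABELS)
print(f"exact match: {exact:.2f}, per label: {per_label:.2f}")
```

In this toy case the model is right on every single-label caption yet misses a label on each multi-label one, so the two scores diverge.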
A data-driven bus efficiency prediction model is a useful tool for transport planners to optimize current routes and plan efficient new ones. This study proposes a novel approach for predicting efficiency scores by combining a non-radial Data Envelopment Analysis (DEA) model with machine learning (ML) techniques. A labeled dataset is developed using a non-radial DEA method that considers the interrelationships between operational and service efficiency and the selected features. Two machine learning models, Linear Regression (LR) and Support Vector Regression (SVR), are trained on the labeled dataset. The trained models can be used to predict the efficiency scores of new bus routes based on decision-makers’ preferences for the input parameters, without requiring a full DEA analysis. The methodology is tested on CyRide, a real-world dataset provided by the Ames Transit Agency. The effectiveness of both models in predicting efficiency is evaluated using R², MSE, and residual plots, with a detailed discussion of the exploratory analysis of the selected features and the overall efficiency score. The proposed methodology can be generalized to any bus route dataset and used by transportation authorities for improved decision-making.
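As a minimal sketch of the two numeric evaluation metrics named above, the snippet below computes MSE and R² from their standard definitions. The function names and the efficiency scores are hypothetical, chosen only for illustration; they are not values from the CyRide dataset or the study's code.

```python
# Standard definitions of MSE and R^2; the scores below are made-up examples,
# not results from the study.

def mse(y_true, y_pred):
    """Mean squared error between actual and predicted scores."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


# Hypothetical DEA efficiency scores for four routes and a model's predictions.
actual = [0.80, 0.60, 0.90, 0.70]
predicted = [0.75, 0.65, 0.85, 0.70]

print(f"MSE = {mse(actual, predicted):.6f}")
print(f"R^2 = {r_squared(actual, predicted):.4f}")
```

A lower MSE and an R² closer to 1 both indicate predictions that track the DEA-derived scores more closely.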
Premature babies are the most vulnerable to neonatal mortality, and their birth process is emotionally and physically painful for the mother and the family. Caring for a premature baby can also be financially burdensome. This paper looks into the maternal factors associated with prematurity. The surveys took place in public maternity hospitals in the Western Brazilian Amazon. This research aims to predict whether a baby will be born prematurely using several distinct models. To this end, various machine learning classification algorithms (Decision Tree, Naive Bayes, Random Forest, Extreme Gradient Boosting, and K-Nearest Neighbors) were applied to the preprocessed data. The paper proposes models capable of predicting premature birth with an accuracy of about 80%. This research aids in developing a usable model that can detect premature births at an early stage, which will allow early treatment to prevent premature birth, substantially reducing child mortality and easing the economic stress on families with a premature child.
Expert in Python, R, and SQL. Skilled in data preprocessing, analysis, and model development to uncover actionable insights.
Experienced in analyzing and visualizing datasets using Tableau, Power BI, Matplotlib, and Seaborn, while effectively presenting findings to clients and stakeholders.
Proficient in applying statistical methods for hypothesis testing, time-series analysis, and data-driven decision-making, supporting clients in strategic planning.
Adept at integrating APIs for dynamic and real-time data processing, ensuring seamless communication and fulfilling client needs.
Specialist in building and automating ETL pipelines with Azure Data Factory and Databricks, designed to align with client workflows.
Skilled in collaborating with clients to understand their goals, gathering requirements, and delivering tailored data-driven solutions that address their challenges.