About Me

Hello, I'm Truc!
I'm a Computer Science student with a strong passion for data and AI. My experience spans
data analytics and machine learning through academic projects and internships. I've worked
with key AI domains including Natural Language Processing (NLP), chatbot development, and
computer vision.
Proficient in SQL, Python, Google Sheets, Microsoft Excel, statistics, and Power BI, I
specialize in transforming raw data into meaningful insights through end-to-end analysis
and interactive visualizations. I'm eager to contribute to impactful, data-driven solutions
in dynamic and collaborative environments.
Technical Skills
Programming Languages
- SQL
- Python
Data Analytics Tools
- Microsoft Excel
- Google Sheets
- Google Colab
Data Visualization Tools
- Power BI
Data Analytics Methods
- EDA
- Segmentation / Clustering
- Cohort Analysis
- Linear Regression
- Logistic Regression
- Statistics
- A/B Testing
- ANOVA
- (Post-Hoc) T-Test
Education & Experience Background
EDUCATION

Bachelor of Computer Science
GPA: 8.34/10.00
Ho Chi Minh City, Vietnam
Sep 2022 - Sep 2025 (Expected)
Data Analyst
Ho Chi Minh City, Vietnam
Aug 2023 - May 2024
EXPERIENCE

Data Analyst Intern
Remote
Feb 2025 - May 2025
Projects
Data Science

1 USER RETENTION & LOYALTY PREDICTION IN E-COMMERCE
This project focuses on identifying and analyzing customer behavior patterns to improve
retention and loyalty strategies within an e-commerce platform. The key customer segment
for the platform consists of married middle-aged women (31-45 years old), who exhibit
consistent purchasing power and a unique loyalty pattern: they remain loyal to product
categories rather than specific items, creating an ideal opportunity for cross-selling.
This purchasing behavior is strongly driven by major campaigns like 11/11, during which
even previously churned customers return. Additionally, shopping activity peaks at the end
of the month and on weekdays.
However, the analysis also identified a critical bottleneck in the purchase journey: while
users frequently click to explore, the conversion rate from "add to cart" to actual
purchases remains low. This suggests that the platform needs to optimize the shopping
experience and build trust to encourage customers to complete their transactions.
- Tools: Python (Kaggle), SQLite
- Analysis: Exploratory Data Analysis (EDA), Customer Segmentation, RFM, and Time Series.
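As a rough illustration of the RFM step, the sketch below computes recency, frequency, and
monetary values per customer using only the Python standard library. The function name
rfm_table, the tuple layout, and the sample transactions are assumptions made for this
example and are not taken from the project's dataset.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, snapshot):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    `transactions` is an iterable of (customer_id, purchase_date, amount);
    this layout is an assumption made for the sketch.
    """
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, purchased_on, amount in transactions:
        # Track the most recent purchase date per customer.
        if customer_id not in last_purchase or purchased_on > last_purchase[customer_id]:
            last_purchase[customer_id] = purchased_on
        frequency[customer_id] += 1
        monetary[customer_id] += amount
    return {
        cid: ((snapshot - last_purchase[cid]).days, frequency[cid], monetary[cid])
        for cid in last_purchase
    }


# Hypothetical sample data, not drawn from the real platform.
sample = [
    ("u1", date(2024, 11, 11), 40.0),
    ("u1", date(2024, 11, 28), 25.0),
    ("u2", date(2024, 9, 30), 60.0),
]
rfm = rfm_table(sample, snapshot=date(2024, 12, 1))
print(rfm["u1"])  # (3, 2, 65.0)
```

Customers with low recency and high frequency would typically be treated as loyal, while
a high recency value flags potential churn.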

2 DATA SCIENCE SALARY PREDICTION
The primary objective of this project is to analyze the 2024 data science job market. The
analysis reveals an average salary range of $74,700 to $128,200, with the Data Scientist
role commanding the highest compensation. Foundational skills such as Python and SQL are
essential, while AWS and Spark are in high demand for technical positions. Job
opportunities are primarily concentrated in high-cost-of-living states like California,
Massachusetts, and New York. Experience level is a key determinant of salary, with Senior
and Principal roles earning the most.
Beyond this market overview, a secondary objective was to train machine learning models
(including XGBoost, Random Forest, SVR, and Linear Regression) to predict salaries. This
provides a valuable tool for both companies and candidates to benchmark compensation. The
XGBoost model performed best, with an R² score of 82.86%, enabling salary predictions
based on selected factors like location, skills, and experience.
- Tools: Python (Kaggle), Excel, Power BI
- Techniques: Data Preprocessing, EDA, Feature Engineering, Machine Learning
Modeling, Market Analysis.
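For readers unfamiliar with the R² metric quoted above, the sketch below shows how it is
usually computed: one minus the ratio of residual error to total variance. The salary
figures are made-up values in thousands of dollars, and r2_score here is a hand-written
helper for illustration, not the project's evaluation code.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical salaries (in $1,000s) and model predictions.
actual = [90.0, 110.0, 130.0, 150.0]
predicted = [95.0, 105.0, 135.0, 145.0]
score = r2_score(actual, predicted)
print(round(score, 4))  # 0.95
```

An R² of 0.8286 would mean the model explains roughly 83% of the variance in salaries on
the evaluation data.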

3 LUNG CANCER PREDICTION
This project began with an in-depth analysis of lung cancer incidence, comparing high- and
low-pollution groups to identify key risk factors from hospital survey data. Through
Exploratory Data Analysis (EDA), 12 of the 20 factors examined were identified as
significant contributors. To further understand their impact, patients were categorized
into low-, medium-, and high-risk groups based on factor severity. This segmentation
enabled evaluation of the most critical drivers, such as air pollution, smoking, genetic
risk, and chronic lung disease.
Leveraging these insights, multiple machine learning models were developed and tested to
predict lung cancer risk, including Logistic Regression, Decision Tree, KNN, and Random
Forest. The Random Forest model emerged as the top performer, achieving an accuracy of
99%. Based on these findings, actionable recommendations were provided to promote
respiratory health, including minimizing pollution exposure, improving indoor air quality,
and advocating for early screening in high-risk individuals.
- Tools: Python
- Techniques: Risk Segmentation, EDA, Feature Engineering, Machine Learning
Modeling.
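One simple way to picture the risk segmentation step is to average a patient's factor
severities and bucket the result. The sketch below does this with cutoffs of 3 and 6;
these cutoffs, the factor names, and the severity values are all hypothetical, since the
project description does not state the actual scale or thresholds.

```python
def risk_group(severities, low_cutoff=3.0, high_cutoff=6.0):
    """Bucket a patient by mean factor severity.

    The cutoffs are illustrative assumptions, not values from the project.
    """
    mean_severity = sum(severities.values()) / len(severities)
    if mean_severity < low_cutoff:
        return "low"
    if mean_severity < high_cutoff:
        return "medium"
    return "high"


# Hypothetical patients with made-up severity scores.
patient_a = {"air_pollution": 2, "smoking": 1, "genetic_risk": 3, "chronic_lung_disease": 2}
patient_b = {"air_pollution": 4, "smoking": 5, "genetic_risk": 3, "chronic_lung_disease": 4}
patient_c = {"air_pollution": 7, "smoking": 8, "genetic_risk": 6, "chronic_lung_disease": 7}

groups = [risk_group(p) for p in (patient_a, patient_b, patient_c)]
print(groups)  # ['low', 'medium', 'high']
```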

4 CUSTOMER CHURN ANALYSIS
The goal of this analysis is to investigate the root causes of customer churn in the
telecommunications sector and develop a strategic framework to reduce churn and enhance
customer retention.
The analysis identifies the highest-risk customer segments: over 55% of users on
month-to-month contracts are likely to churn, alongside senior citizens. Regarding
services, the report highlights a critical weakness, with over 44% of Fiber Optic internet
users showing a tendency to leave, while high-spending customers (>$70) also record a
significant churn rate. The analysis also identifies a high-potential yet volatile segment
of new customers (0-20 months tenure), who constitute 43% of the customer base and have a
9.8% churn rate.
The core significance of this project extends beyond problem diagnosis; it provides a
clear, data-driven action plan. The proposed recommendations, such as enhancing incentives
for long-term contracts to target the 55% high-risk group or improving service value to
retain high-spending (>$70) customers, are designed to directly address these quantified
issues.
- Tools: Power BI
- Techniques: Customer Segmentation, EDA, Retention Strategy Development.
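The segment-level churn rates quoted above come down to a grouped ratio: churned customers
divided by total customers within each segment. The project itself was built in Power BI;
the Python sketch below only illustrates the arithmetic, and the field names and records
are hypothetical.

```python
from collections import defaultdict


def churn_rate_by(records, key):
    """Return {segment: churned / total} for the given grouping key."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for record in records:
        segment = record[key]
        totals[segment] += 1
        if record["churned"]:
            churned[segment] += 1
    return {segment: churned[segment] / totals[segment] for segment in totals}


# Hypothetical customer records, not taken from the real dataset.
customers = [
    {"contract": "month-to-month", "churned": True},
    {"contract": "month-to-month", "churned": True},
    {"contract": "month-to-month", "churned": False},
    {"contract": "month-to-month", "churned": False},
    {"contract": "two-year", "churned": False},
    {"contract": "two-year", "churned": False},
]
rates = churn_rate_by(customers, "contract")
print(rates)  # {'month-to-month': 0.5, 'two-year': 0.0}
```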

5 PIZZA SALES ANALYSIS
This analysis provides a comprehensive roadmap for Plato's Pizza to optimize revenue and
operational efficiency based on sales data. Temporally, revenue peaks in the summer months
(June-August), with July recording the highest sales at $72,557, reflecting a surge in
demand during the peak tourist season. The analysis also pinpoints peak hours during
lunchtime (12-2 PM) and on weekends, calling for more effective staff allocation. In terms
of product mix, Traditional Pizza is the top seller (42 units/day), driven by its
affordability, while sizes L and XL dominate sales volume. Conversely, the XXL size sold a
mere 28 units annually, indicating an inefficient product that should be discontinued to
reduce inventory costs.
Based on these insights, strategic recommendations include focusing resources on peak
seasons and hours, eliminating underperforming products, optimizing the supply chain by
concentrating on core ingredients like garlic and tomatoes, and adjusting pricing strategies
to boost sales for high-potential yet underexploited items.
- Tools: Power BI, SQL
- Techniques: Time Series Analysis, Product Performance Review, Sales Forecasting.
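Peak months and peak hours like those described above can be found by bucketing order
timestamps and summing revenue per bucket. The sketch below shows that idea in plain
Python; the project used Power BI and SQL, and the orders, amounts, and the revenue_by
helper are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime


def revenue_by(orders, bucket):
    """Sum order amounts per bucket, where `bucket` maps a timestamp to a key."""
    totals = Counter()
    for ordered_at, amount in orders:
        totals[bucket(ordered_at)] += amount
    return totals


# Hypothetical orders: (timestamp, amount in dollars).
orders = [
    (datetime(2015, 7, 4, 12, 30), 20.5),
    (datetime(2015, 7, 4, 13, 10), 16.0),
    (datetime(2015, 7, 18, 19, 0), 12.0),
    (datetime(2015, 2, 9, 12, 45), 18.0),
]
by_month = revenue_by(orders, lambda ts: ts.month)
by_hour = revenue_by(orders, lambda ts: ts.hour)
peak_month = max(by_month, key=by_month.get)
peak_hour = max(by_hour, key=by_hour.get)
print(peak_month, peak_hour)  # 7 12
```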
Computer Science

1 Chatbot Music Recommender System Based on User Emotion
Brief Description:
This project involves the development of an intelligent chatbot designed to provide
personalized music recommendations by discerning the user's emotional state. By leveraging
Natural Language Processing (NLP) techniques, the system analyzes the user's text input in a
conversational context to identify their current mood and curate a tailored playlist that
matches or complements their feelings.
Key Features:
- Emotion Detection: Utilizes NLP models to analyze text input in real-time and classify the
user's emotion (e.g., happiness, sadness).
- Personalized Music Recommendation: Generates and suggests music playlists from a
pre-tagged database or via a music streaming API, tailored specifically to the detected
emotion.
- Interactive Conversational Interface: Provides a natural and engaging chatbot interface,
allowing for a seamless and intuitive user experience.
Technology Stack:
- Core Logic & NLP: Python, with libraries such as NLTK, spaCy, or Transformers (Hugging
Face) for sentiment/emotion analysis.
- Chatbot Framework: Custom implementation.
- Recommendation Engine: Rule-based matching logic or content-based filtering.
- Data Source/API: Integrated with a music API (e.g., Spotify API).
- Database: SQLite for storing user interaction history or music metadata.
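A minimal sketch of the rule-based matching idea follows. It swaps the NLP emotion model
for a simple keyword lookup so that it runs without external libraries; the keyword lists,
track titles, and function names are all hypothetical stand-ins, and a real system would
likely use a trained classifier and a music API instead.

```python
import re

# Hypothetical keyword lists standing in for a trained NLP emotion classifier.
EMOTION_KEYWORDS = {
    "happiness": {"happy", "glad", "excited", "great"},
    "sadness": {"sad", "lonely", "down", "upset"},
}

# Hypothetical pre-tagged catalog; titles are invented for this example.
CATALOG = [
    {"title": "Sunny Morning", "emotion": "happiness"},
    {"title": "Weekend Drive", "emotion": "happiness"},
    {"title": "Rainy Window", "emotion": "sadness"},
]


def detect_emotion(text):
    """Pick the emotion whose keywords overlap most with the message."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    scores = {emotion: len(words & keywords) for emotion, keywords in EMOTION_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "neutral"


def recommend(text, catalog=CATALOG):
    """Return the detected emotion and the tracks tagged with it."""
    emotion = detect_emotion(text)
    playlist = [track["title"] for track in catalog if track["emotion"] == emotion]
    return emotion, playlist


print(recommend("I feel so sad and lonely today."))  # ('sadness', ['Rainy Window'])
```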

2 Face Recognition Attendance System
Brief Description:
This project entails the development of a fully automated attendance system leveraging
facial recognition technology. The primary objective is to streamline and secure the process
of tracking attendance for environments such as educational institutions, corporate offices,
or events, eliminating the need for manual check-ins and reducing administrative overhead.
Key Features:
- User Enrollment: Allows for the registration of new individuals by capturing their facial
data and associating it with a unique ID and name.
- Real-Time Face Recognition: Utilizes a webcam to detect and identify registered
individuals in real time as they enter the designated area.
- Automated Attendance Logging: Automatically records the check-in time of identified
individuals into a database, creating an accurate and tamper-proof attendance log.
- Administrative Dashboard: A comprehensive dashboard for administrators to view attendance
records, generate reports, filter data by date, and visualize statistics.
- User Management: Provides functionalities to view, search, and delete registered user
profiles from the system.
- Secure Access Control: Features a robust login system to ensure that only authorized
administrators can access the management dashboard and sensitive data.
Technology Stack:
- Core Logic: Python
- Computer Vision & Face Recognition: OpenCV, Dlib, face_recognition library.
- Database: SQLite.
- User Interface/Dashboard: HTML, CSS, JavaScript for the web-based dashboard.
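The attendance logging step could look roughly like the sketch below, which records one
check-in per person per day in SQLite. It assumes the face recognition stage has already
produced a user ID; the table name, column names, and helper functions are hypothetical
and are not taken from the project's actual schema.

```python
import sqlite3
from datetime import datetime


def open_log(path=":memory:"):
    """Open the database and create the (hypothetical) attendance table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS attendance (
               user_id  TEXT NOT NULL,
               day      TEXT NOT NULL,
               check_in TEXT NOT NULL,
               UNIQUE (user_id, day)
           )"""
    )
    return conn


def log_check_in(conn, user_id, seen_at):
    """Record the first sighting of a user each day; return True if a row was added."""
    cursor = conn.execute(
        "INSERT OR IGNORE INTO attendance (user_id, day, check_in) VALUES (?, ?, ?)",
        (user_id, seen_at.date().isoformat(), seen_at.isoformat(timespec="seconds")),
    )
    conn.commit()
    return cursor.rowcount == 1


conn = open_log()
first = log_check_in(conn, "S001", datetime(2025, 3, 3, 8, 55))
repeat = log_check_in(conn, "S001", datetime(2025, 3, 3, 9, 10))
print(first, repeat)  # True False
```

The UNIQUE constraint keeps repeated detections of the same face on the same day from
creating duplicate rows.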

3 Library Management System
Brief Description:
This project aims to develop a comprehensive Library Management System designed to enhance the
quality of library services and streamline administrative workflows for librarians. The software
provides a robust, user-friendly solution for managing daily operations with greater ease and
efficiency.
Key Features:
- Patron Management: Manage reader profiles, registration, and history.
- Book Inventory Management: Catalog and track all book titles, copies, and their status.
- Circulation Management: Process book check-outs (lending) and check-ins (returns), and handle associated issues such as fines for overdue items.
- Statistical Reporting: Generate reports and statistics on borrowing/returning trends to support decision-making.
Technology Stack:
- Architectural Pattern: 3-Layer Architecture (Presentation, Business Logic, Data Access) implemented in a single-tier desktop application.
- Core Technology: Developed using C# on the .NET Framework 4.7.2 with Windows Forms.
- User Interface: Siticone UI/UX Framework for a modern and intuitive user experience.
- Database: Built and managed using MS SQL Server, with data access handled by Entity Framework 6.4.4 and administration via SQL Server Management Studio (SSMS).

4 Expert System for the Diagnosis and Treatment of Chronic Obstructive Pulmonary Disease (COPD)
Brief Description:
The objective of this project is to develop an expert system that assesses a user's health
condition and provides a corresponding COPD diagnosis and treatment recommendations. The
system operates by guiding the user through a series of structured diagnostic
questionnaires and analyzing their responses.
Technology Stack:
- Inference Engine: Built using the Experta library (a rule-based expert system shell for Python).
- User Interface (UI): Developed with PyQt6.
- Database: SQLite used for information storage and management.
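To give a sense of how a rule-based inference engine turns questionnaire answers into
conclusions, here is a small forward-chaining sketch in plain Python. It deliberately does
not use the Experta API, and the rules and fact names are simplified placeholders invented
for this example; they are not the project's knowledge base and not clinical guidance.

```python
# Each rule: (set of required facts, fact to conclude). Placeholder rules only.
RULES = [
    ({"chronic_cough", "shortness_of_breath"}, "respiratory_symptoms"),
    ({"respiratory_symptoms", "smoking_history"}, "suspected_copd"),
    ({"suspected_copd"}, "recommend_spirometry"),
]


def forward_chain(answers, rules=RULES):
    """Apply rules repeatedly until no new fact can be derived."""
    facts = set(answers)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts


derived = forward_chain({"chronic_cough", "shortness_of_breath", "smoking_history"})
print("recommend_spirometry" in derived)  # True
```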