2024-06-19 07:54:08
DESCRIPTION
This dataset contains 100,000 rows of data capturing key aspects of employee performance, productivity, and demographics in a corporate environment. It includes details related to the employee’s job, work habits, education, performance, and satisfaction. The dataset is designed for various purposes such as HR analytics, employee churn prediction, productivity analysis, and performance evaluation. I started with a MySQL analysis and then predicted employee churn with Python.
TABLES
Column | Description |
---|---|
Employee_ID | Unique identifier for each employee. |
Department | The department in which the employee works (e.g., Sales, HR, IT). |
Gender | Gender of the employee (Male, Female, Other). |
Age | Employee’s age (between 22 and 60). |
Job_Title | The role held by the employee (e.g., Manager, Analyst, Developer). |
Hire_Date | The date the employee was hired. |
Years_At_Company | The number of years the employee has been working for the company. |
Education_Level | Highest educational qualification (High School, Bachelor, Master, PhD). |
Performance_Score | Employee’s performance rating (1 to 5 scale). |
Monthly_Salary | The employee’s monthly salary in USD, correlated with job title and performance score. |
Work_Hours_Per_Week | Number of hours worked per week. |
Projects_Handled | Total number of projects handled by the employee. |
Overtime_Hours | Total overtime hours worked in the last year. |
Sick_Days | Number of sick days taken by the employee. |
Remote_Work_Frequency | Percentage of time worked remotely (0%, 25%, 50%, 75%, 100%). |
Team_Size | Number of people in the employee’s team. |
Training_Hours | Number of hours spent in training. |
Promotions | Number of promotions received during their tenure. |
Employee_Satisfaction_Score | Employee satisfaction rating (1.0 to 5.0 scale). |
Resigned | Boolean value indicating if the employee has resigned. |
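The column table can also be written as a table definition. Doing so makes the types and value ranges explicit before any querying. The sketch below is a hypothetical schema, not the one used in the original MySQL analysis. It uses SQLite through Python's built-in `sqlite3` module and an assumed table name `employees`. The column types and `CHECK` constraints are inferred from the descriptions above; for example, `Remote_Work_Frequency` is stored as an integer percentage and `Resigned` as 0/1.

```python
import sqlite3

# Hypothetical DDL mirroring the documented columns. SQLite types and CHECK
# constraints are inferred from the column descriptions, not from the source.
EMPLOYEES_DDL = """
CREATE TABLE employees (
    Employee_ID                 INTEGER PRIMARY KEY,
    Department                  TEXT,
    Gender                      TEXT,
    Age                         INTEGER CHECK (Age BETWEEN 22 AND 60),
    Job_Title                   TEXT,
    Hire_Date                   TEXT,     -- ISO-8601 date string
    Years_At_Company            INTEGER,
    Education_Level             TEXT,
    Performance_Score           INTEGER CHECK (Performance_Score BETWEEN 1 AND 5),
    Monthly_Salary              REAL,     -- USD
    Work_Hours_Per_Week         INTEGER,
    Projects_Handled            INTEGER,
    Overtime_Hours              INTEGER,  -- total for the last year
    Sick_Days                   INTEGER,
    Remote_Work_Frequency       INTEGER CHECK (Remote_Work_Frequency IN (0, 25, 50, 75, 100)),
    Team_Size                   INTEGER,
    Training_Hours              INTEGER,
    Promotions                  INTEGER,
    Employee_Satisfaction_Score REAL CHECK (Employee_Satisfaction_Score BETWEEN 1.0 AND 5.0),
    Resigned                    INTEGER   -- 0/1 boolean
)
"""


def create_schema(conn: sqlite3.Connection) -> list[str]:
    """Create the employees table and return its column names in order."""
    conn.execute(EMPLOYEES_DDL)
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    return [row[1] for row in conn.execute("PRAGMA table_info(employees)")]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(len(create_schema(conn)))
```

SQLite enforces `CHECK` constraints at insert time. Loading the data into a table like this therefore also acts as a basic range check on the values.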
Approach to Dataset Analysis
Understanding the Dataset
I began by thoroughly reviewing the dataset’s schema, which has 20 well-defined columns. I documented what each column represents: demographics (Age, Gender), job information (Department, Job_Title, Hire_Date), performance and productivity metrics (Performance_Score, Projects_Handled, Training_Hours), and outcome indicators (Resigned, Employee_Satisfaction_Score). This gave me a clear understanding of the available variables and their relevance to business questions.
Defining the Business Use Cases
I aligned the analysis with four key HR and business use cases:
- Churn Prediction: to identify risk factors leading to employee resignation.
- Productivity Analysis: to uncover drivers of employee output.
- Performance Evaluation: to assess how employees’ performance relates to various factors.
- HR Analytics: to gain a demographic and behavioral view of the workforce for strategic planning.
These use cases served as the guiding questions for what to measure and analyze, keeping the focus on the business problem definition.
Designing Insightful Queries
I crafted SQL queries, grouped by topic and designed to extract actionable insights from the dataset. These queries focused on:
- Key metrics and distributions (e.g., resignation rate by department or age group, average tenure of resigned vs. retained employees).
- Correlations and patterns (e.g., performance vs. salary, training hours vs. productivity, remote work frequency vs. projects handled).
- Identifying extremes and top performers (e.g., employees with highest projects handled, highest satisfaction & performance).
This approach ensures that the insights are not only descriptive but also diagnostic and, where possible, predictive.
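Queries from the first and third categories above can be sketched as plain SQL. The statements below avoid MySQL-specific syntax, so they should run in MySQL as well. Here they run against an in-memory SQLite database through Python's `sqlite3` so that they can be tested. The table name `employees` and the six toy rows are hypothetical. They are made up for illustration and are not taken from the dataset.

```python
import sqlite3

# Resignation rate (%) per department. Multiplying by 100.0 forces
# floating-point division: SQLite truncates integer/integer, MySQL does not.
RESIGNATION_RATE_BY_DEPT = """
SELECT Department,
       ROUND(100.0 * SUM(Resigned) / COUNT(*), 2) AS resignation_rate_pct
FROM employees
GROUP BY Department
ORDER BY resignation_rate_pct DESC
"""

# Average tenure of retained (0) vs. resigned (1) employees.
AVG_TENURE_BY_OUTCOME = """
SELECT Resigned, AVG(Years_At_Company) AS avg_tenure_years
FROM employees
GROUP BY Resigned
ORDER BY Resigned
"""

# Top three employees by number of projects handled.
TOP_PROJECT_HANDLERS = """
SELECT Employee_ID, Projects_Handled
FROM employees
ORDER BY Projects_Handled DESC
LIMIT 3
"""


def load_toy_rows(conn):
    """Create a reduced employees table holding made-up rows for testing."""
    conn.execute(
        "CREATE TABLE employees (Employee_ID INTEGER, Department TEXT, "
        "Years_At_Company INTEGER, Resigned INTEGER, Projects_Handled INTEGER)"
    )
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?, ?, ?)",
        [
            (1, "Sales", 2, 1, 10),
            (2, "Sales", 5, 0, 25),
            (3, "IT", 8, 0, 40),
            (4, "IT", 1, 1, 15),
            (5, "IT", 6, 0, 30),
            (6, "HR", 3, 0, 12),
        ],
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_toy_rows(conn)
    for query in (RESIGNATION_RATE_BY_DEPT, AVG_TENURE_BY_OUTCOME, TOP_PROJECT_HANDLERS):
        print(conn.execute(query).fetchall())
```

Swapping `Department` for an age band or `Education_Level` in the first query would give the other distributions mentioned in the list.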
Documenting and Presenting Results
To make the work reproducible and readable:
- I documented the dataset columns and their definitions in a Markdown table, creating clear metadata for stakeholders.
- I grouped the SQL queries under meaningful categories, making it easier for others to navigate and use them.
- I prepared the queries to be directly executable in MySQL, avoiding unnecessary complexity and ensuring efficiency.
Innovative Touches
I also considered:
- Advanced queries that check potential correlations (e.g., salary vs. satisfaction).
- Segmenting results (e.g., by age group, education level, or department) to allow more granular insights.
- Preparing the foundation for dashboards or reports by aligning queries to metrics that could feed into visual tools like Tableau or Power BI.
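A dependency-free sketch of the correlation checks and segmentation listed above follows. `pearson_r` computes the Pearson correlation coefficient, for example between `Monthly_Salary` and `Employee_Satisfaction_Score` values pulled from a query. `segment_means` averages a metric per age band. The band boundaries in `age_group` are a hypothetical choice that covers the documented age range of 22 to 60.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def age_group(age):
    """Map an age (22-60 in this dataset) to a hypothetical age band."""
    if age < 30:
        return "22-29"
    if age < 40:
        return "30-39"
    if age < 50:
        return "40-49"
    return "50-60"


def segment_means(rows):
    """Average a metric per age band; rows are (age, value) pairs."""
    totals = defaultdict(lambda: [0.0, 0])  # band -> [running sum, count]
    for age, value in rows:
        acc = totals[age_group(age)]
        acc[0] += value
        acc[1] += 1
    return {band: s / c for band, (s, c) in sorted(totals.items())}


if __name__ == "__main__":
    # Made-up values, only to show the call shape.
    salaries = [3000, 4200, 5100, 6500]
    satisfaction = [2.8, 3.1, 3.0, 4.2]
    print(round(pearson_r(salaries, satisfaction), 3))
    print(segment_means([(25, 3.0), (28, 5.0), (45, 2.0)]))
```

Pearson's r captures only linear association. For ordinal columns such as `Performance_Score` (1 to 5), a rank-based measure like Spearman's rho may fit better. In SQL, the same bands could be produced with a `CASE` expression used in `GROUP BY`.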
Outcome
With this structured, business-driven approach, I ensured that HR managers and leadership can:
- Detect early warning signs of churn.
- Optimize workforce productivity.
- Reward and recognize high performers.
- Build strategic workforce plans grounded in data.
Structure of the approach:
Use Case | Description |
---|---|
Churn Prediction | Identifying patterns that lead to employee resignation. |
Productivity Analysis | Understanding the factors that drive productivity, such as remote work, overtime, and training. |
Performance Evaluation | Analyzing how performance scores correlate with salary, team size, and education level. |
HR Analytics | Providing insights into workforce demographics and behavior for strategic decision-making. |
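The churn prediction use case is usually framed as binary classification on `Resigned`. The sketch below is a hedged illustration only, not the model used in the original Python analysis, which is not shown here. It fits a logistic regression with batch gradient descent in plain Python. The feature choice, learning rate, epoch count and toy rows are all hypothetical.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standardize(rows):
    """Per-feature (mean, population std) from training rows; std 0 becomes 1."""
    n = len(rows)
    stats = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0
        stats.append((mean, std))
    return stats


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias, stats)."""
    stats = standardize(rows)
    X = [[(v - m) / s for v, (m, s) in zip(r, stats)] for r in rows]
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(X, labels):
            # Prediction error for this row: sigmoid(w.x + b) - label.
            err = _sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b, stats


def predict_proba(model, row):
    """Estimated probability that the employee described by `row` resigns."""
    w, b, stats = model
    x = [(v - m) / s for v, (m, s) in zip(row, stats)]
    return _sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


if __name__ == "__main__":
    # Made-up rows: [Employee_Satisfaction_Score]; label 1 = Resigned.
    rows = [[1.5], [2.0], [2.2], [4.0], [4.5], [3.8]]
    labels = [1, 1, 1, 0, 0, 0]
    model = fit_logistic(rows, labels)
    print(round(predict_proba(model, [1.0]), 3), round(predict_proba(model, [5.0]), 3))
```

In practice, a library implementation such as scikit-learn's `LogisticRegression`, evaluated on a held-out split, would be the more usual choice. The from-scratch version only makes the mechanics visible. It standardizes the features, applies the sigmoid to a linear score, and steps the weights against the gradient of the mean log-loss.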
Insights
- first
Recommendations
- first
DISCLAIMER
- To the best of my knowledge, this data is fabricated and it does not correspond to real people. Any similarity to existing people is purely coincidental.
LICENSE
- This work is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.