Salary-Prediction-with-Machine-Learning
1. Business Problem
Can a machine learning model estimate the salaries of baseball players, given their salary information and their 1986 and career statistics?
2. Dataset Story
- This dataset was originally taken from the StatLib library at Carnegie Mellon University.
- The dataset is part of the data used in the 1988 ASA Graphics Section Poster Session.
- The salary data were originally taken from Sports Illustrated, April 20, 1987.
- The 1986 and career statistics were obtained from The 1987 Baseball Encyclopedia Update, published by Collier Books, Macmillan Publishing Company, New York.
3. Variables
- AtBat: Number of times at bat in the 1986 season
- Hits: Number of hits in the 1986 season
- HmRun: Number of home runs in the 1986 season
- Runs: Number of runs scored in the 1986 season
- RBI: Number of runs batted in during the 1986 season
- Walks: Number of walks in the 1986 season
- Years: Number of years the player has played in the major leagues
- CAtBat: Number of times at bat during the player's career
- CHits: Number of hits during the player's career
- CHmRun: Number of home runs during the player's career
- CRuns: Number of runs scored during the player's career
- CRBI: Number of runs batted in during the player's career
- CWalks: Number of walks during the player's career
- League: A factor with levels A and N, indicating the player's league at the end of the 1986 season
- Division: A factor with levels E and W, indicating the player's division at the end of the 1986 season
- PutOuts: Number of put outs in the 1986 season
- Assists: Number of assists in the 1986 season
- Errors: Number of errors in the 1986 season
- Salary: The player's 1987 annual salary on opening day, in thousands of dollars
- NewLeague: A factor with levels A and N, indicating the player's league at the beginning of the 1987 season
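As a rough illustration of how the columns above might be prepared for modelling, the sketch below encodes the three two-level factors as 0/1 flags and keeps only records with a known salary. The two records are made-up placeholder values, not rows from the dataset, and the names `encode_player` and `FACTOR_LEVELS` are hypothetical helpers introduced here. It uses only the Python standard library.

```python
# Two made-up records using a subset of the column names listed above.
# The values are placeholders for illustration, not real dataset rows.
players = [
    {"AtBat": 400, "Hits": 100, "HmRun": 10, "Years": 5, "CAtBat": 2000,
     "CHits": 500, "League": "A", "Division": "E", "NewLeague": "A",
     "Salary": 500.0},
    {"AtBat": 300, "Hits": 60, "HmRun": 4, "Years": 2, "CAtBat": 500,
     "CHits": 110, "League": "N", "Division": "W", "NewLeague": "A",
     "Salary": None},  # salary not available for this player
]

# Assumption: which level maps to 1 is an arbitrary choice made for this sketch.
FACTOR_LEVELS = {"League": "N", "Division": "W", "NewLeague": "N"}


def encode_player(row):
    """Return a copy of row with each two-level factor replaced by a 0/1 flag."""
    encoded = dict(row)  # copy so the input record is left untouched
    for column, positive_level in FACTOR_LEVELS.items():
        flag_name = f"{column}_{positive_level}"
        encoded[flag_name] = int(encoded.pop(column) == positive_level)
    return encoded


# Salary is the target, so records without it cannot be used for training.
labelled = [encode_player(p) for p in players if p["Salary"] is not None]
print(len(labelled), labelled[0]["League_N"], labelled[0]["Division_W"])
```

Encoding each factor as a single flag is enough here because every factor has exactly two levels; a factor with more levels would need one flag per level.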
4. Task
Develop a salary prediction model using data preprocessing and feature engineering techniques.
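One possible shape for this task is sketched below: derive a feature from the raw columns, fit a model on training records, and score it on held-out records. This is a minimal, dependency-free baseline and not the method used in this project. The engineered feature `hits_per_year`, the single-feature least-squares fit, and all record values are assumptions made for illustration.

```python
import math

# Made-up training and hold-out records (placeholder values, not dataset rows).
train = [
    {"CHits": 250, "Years": 5, "Salary": 200.0},
    {"CHits": 800, "Years": 8, "Salary": 400.0},
    {"CHits": 450, "Years": 3, "Salary": 500.0},
    {"CHits": 2000, "Years": 10, "Salary": 900.0},
]
holdout = [{"CHits": 600, "Years": 5, "Salary": 450.0}]


def hits_per_year(row):
    """Engineered feature: career hits averaged over years in the major leagues."""
    return row["CHits"] / row["Years"]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def rmse(actual, predicted):
    """Root mean squared error, in the units of Salary (thousands of dollars)."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


slope, intercept = fit_line([hits_per_year(r) for r in train],
                            [r["Salary"] for r in train])
predictions = [slope * hits_per_year(r) + intercept for r in holdout]
error = rmse([r["Salary"] for r in holdout], predictions)
print(f"slope={slope:.2f} intercept={intercept:.2f} holdout RMSE={error:.1f}")
```

On the full dataset the same three steps apply, though a multi-feature model from a library such as scikit-learn would usually take the place of `fit_line`, and the hold-out set would be a random split rather than a single record.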