
Predicting the 2020 NBA Playoffs



Introduction


Image source: https://shesasportsfan.wordpress.com/2012/04/23/nba-standings-101/

For our final project, we set out to predict which teams would have made the playoffs in 2020. We based our predictions on historical game statistics going back to the 1980 playoffs.

The NBA's 30 teams are divided into two conferences, Western and Eastern, of 15 teams each. Each team plays 82 regular-season games: 41 at home and 41 away. At the end of the regular season, the eight teams with the most wins in each conference qualify for the playoffs.
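The qualification rule can be sketched in a few lines. The helper `playoff_teams` is hypothetical and only illustrates the rule. It assumes standings arrive as `(team, conference, wins)` tuples, ranks teams within each conference by wins, and keeps the top eight. It deliberately ignores the NBA's tie-breaking procedures, so it is not the official seeding logic.

```python
from collections import defaultdict


def playoff_teams(standings, spots=8):
    """Return {conference: [team, ...]} with the `spots` teams that have the most wins.

    `standings` is an iterable of (team, conference, wins) tuples. Ties keep
    their input order; the NBA's real tie-breakers are NOT modeled here.
    """
    by_conf = defaultdict(list)
    for team, conf, wins in standings:
        by_conf[conf].append((team, wins))

    qualified = {}
    for conf, teams in by_conf.items():
        # Python's sort is stable even with reverse=True, so ties stay in input order
        ranked = sorted(teams, key=lambda tw: tw[1], reverse=True)
        qualified[conf] = [team for team, _ in ranked[:spots]]
    return qualified
```

In the models, the same idea defines the target label: a team-season is labeled 1 if the team was among its conference's eight playoff teams, and 0 otherwise.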

We compared the results of three different models to see which would be the most accurate.

Link: Project Repository

NBA History

Why we chose 1980: a major change in gameplay

Starting with the 1979-80 NBA season, the 3-point line was introduced. Its purpose was to give shorter players more chances to score and to spread the defense away from the basket.

NBA rules timeline

Source Data

To build our models, we wanted as much data as possible to train on. After reviewing various NBA data sites, we chose https://www.basketball-reference.com/.

Reasoning:

To scrape the data, we used Python's BeautifulSoup to parse the HTML of each season's summary page (example: https://www.basketball-reference.com/leagues/NBA_2020.html). We worked year by year (i.e., page by page) through our chosen date range. From each page we extracted the data we needed from two separate tables and built a JSON table from them.
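Below is a minimal sketch of the per-page parsing step, assuming BeautifulSoup (`bs4`) is installed. The helper name `parse_table` and the table ids in the example are hypothetical and do not come from our repository. We assume each table has `<thead>` and `<tbody>` sections. Columns are keyed by the `data-stat` attribute when it is present, falling back to the header text. Some basketball-reference tables may be delivered inside HTML comments, so the sketch also searches comment nodes.

```python
from bs4 import BeautifulSoup, Comment


def parse_table(html, table_id):
    """Return the body rows of <table id=table_id> as a list of {column: text} dicts."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        # Fallback: the table may be hidden inside an HTML comment
        for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
            if table_id in comment:
                table = BeautifulSoup(str(comment), "html.parser").find("table", id=table_id)
                if table is not None:
                    break
    if table is None:
        return []

    # Column names come from the last header row (assumes <thead>/<tbody> exist)
    header_cells = table.thead.find_all("tr")[-1].find_all(["th", "td"])
    columns = [cell.get("data-stat") or cell.get_text(strip=True) for cell in header_cells]

    rows = []
    for tr in table.tbody.find_all("tr"):
        cells = tr.find_all(["th", "td"])
        if len(cells) != len(columns):
            continue  # skip spacer or malformed rows
        rows.append({col: cell.get_text(strip=True) for col, cell in zip(columns, cells)})
    return rows
```

In practice, the HTML for each season would be fetched from a URL of the form shown above (https://www.basketball-reference.com/leagues/NBA_2020.html), changing the year for each page in the date range.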

The data was then written to a CSV file for machine learning analysis.
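For the CSV step, a sketch using only the standard library's `csv.DictWriter` might look like the following. The function name `write_rows_csv` and the `{year: [row dicts]}` input shape are assumptions made for illustration. A leading Year column is added, and any column missing for a given row is written as an empty cell.

```python
import csv
import io


def write_rows_csv(rows_by_year, fh):
    """Write {year: [row dict, ...]} to the open text file `fh` as CSV.

    A leading Year column is added; keys missing from a row become empty cells.
    """
    fieldnames = ["Year"]
    for rows in rows_by_year.values():
        for row in rows:
            for key in row:
                if key not in fieldnames:
                    fieldnames.append(key)

    writer = csv.DictWriter(fh, fieldnames=fieldnames)
    writer.writeheader()
    for year in sorted(rows_by_year):
        for row in rows_by_year[year]:
            writer.writerow({"Year": year, **row})


if __name__ == "__main__":
    # Toy data only; for a real file, open it with newline="" as the csv docs recommend
    buffer = io.StringIO()
    write_rows_csv({2020: [{"Team": "A", "W": "50"}], 2019: [{"Team": "B", "W": "40"}]}, buffer)
    print(buffer.getvalue())
```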

Link: Scrape, Data

Building the Models

We developed three types of models: Logistic Regression, Random Forest, and Support Vector Machine (SVM).
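The three model families map directly onto scikit-learn estimators. The sketch below assumes scikit-learn was used. The hyperparameters (`max_iter`, `n_estimators`, the linear SVM kernel) are our assumptions and are not settings confirmed by the project. `probability=True` is what lets `SVC` produce the probabilities shown in the prediction tables. A linear kernel would be one way to obtain the signed SVM feature weights listed under Feature Importance.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC


def make_models(random_state=0):
    """Return one untrained estimator per model family (hyperparameters are illustrative)."""
    return {
        "Logistic": LogisticRegression(max_iter=1000),
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=random_state),
        # probability=True enables predict_proba; the linear kernel exposes signed coef_ weights
        "SVM": SVC(kernel="linear", probability=True, random_state=random_state),
    }
```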

Data Preprocessing

Within each year, every regular-season performance statistic (all columns except RK, Team, Year, Games, and Conference) was scaled with a quantile transformer and normalized. This put each statistic on a comparable scale across seasons.
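Here is a minimal sketch of per-season quantile scaling with scikit-learn's `QuantileTransformer`. It fits a separate transformer for each year, so each statistic is expressed relative to that season only. The function name is hypothetical. The text does not say exactly what "normalized" refers to, so the output distribution is left as a parameter. `"uniform"` maps each season's values onto [0, 1], while `"normal"` would map them onto a standard normal shape.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer


def quantile_scale_by_year(stats, years, output_distribution="uniform"):
    """Quantile-transform each column of `stats` separately within each year.

    stats: (n_rows, n_features) array of per-team season statistics
    years: length-n_rows array giving each row's season
    """
    stats = np.asarray(stats, dtype=float)
    years = np.asarray(years)
    scaled = np.empty_like(stats)
    for year in np.unique(years):
        mask = years == year
        # One quantile per team keeps n_quantiles <= n_samples for that season
        qt = QuantileTransformer(n_quantiles=int(mask.sum()),
                                 output_distribution=output_distribution)
        scaled[mask] = qt.fit_transform(stats[mask])
    return scaled
```

Because each season is fitted separately, a team that led the league in steals maps to the top of the scale in its own year, regardless of how steal totals drifted over the decades.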

We tried two representations of game outcomes: counts of wins and losses (WL) versus win percentage (W%).
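The two representations can be illustrated with a small hypothetical helper. Here W% is simply wins divided by games played (wins plus losses). For example, a 60-22 record gives W = 60 and L = 22 in the WL variant, and W% = 60 / 82 ≈ 0.732 in the W% variant.

```python
def win_features(wins, losses, variant="WL"):
    """Return the game-outcome features for one team-season.

    variant="WL" keeps the raw counts; variant="W%" uses wins / games played.
    """
    if variant == "WL":
        return {"W": wins, "L": losses}
    if variant == "W%":
        games = wins + losses
        return {"W%": wins / games if games else 0.0}
    raise ValueError(f"unknown variant: {variant!r}")
```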

Training and Testing

We split the regular-season data by conference (Eastern and Western). Within each conference, the 1980 to 2018 seasons were used to train the models, and the trained models were tested against the 2019 season. Finally, we predicted the 2020 playoff teams from the statistics of the incomplete season.
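The split can be sketched as follows. This assumes each team-season row is a dict carrying the `Conf` and `Year` columns from the glossary. The helper name and its default arguments are illustrative.

```python
def split_by_season(rows, conference, train_years=(1980, 2018), test_year=2019, predict_year=2020):
    """Split one conference's team-season rows into train, test, and prediction sets.

    Each row is a dict with at least "Conf" and "Year" keys (Year = year of playoffs).
    """
    conf_rows = [r for r in rows if r["Conf"] == conference]
    first, last = train_years
    train = [r for r in conf_rows if first <= r["Year"] <= last]
    test = [r for r in conf_rows if r["Year"] == test_year]
    predict = [r for r in conf_rows if r["Year"] == predict_year]
    return train, test, predict
```

Running it once per conference keeps the Eastern and Western models fully separate, which matches training and scoring each conference on its own.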

Link: Preprocessing (WL), Preprocessing (W%), Train/Test Models (WL), Train/Test Models (W%), Saved Models

Model Scores

Each model was run with all performance statistics but with a different representation of game wins. Using the scaled and normalized data, each model was trained twice: once with the counts of wins and losses (WL) and once with win percentage (W%).

We found that the scaled and normalized counts of wins and losses scored as well as or better than win percentage for every model, in both conferences.

Western Conference
          Logistic        Random Forest   SVM
Score     WL      W%      WL      W%      WL      W%
Train     0.927   0.910   1.000   1.000   0.933   0.927
Test      0.933   0.933   1.000   1.000   0.933   0.867
Eastern Conference
          Logistic        Random Forest   SVM
Score     WL      W%      WL      W%      WL      W%
Train     0.910   0.888   1.000   1.000   0.926   0.913
Test      0.933   0.867   1.000   1.000   1.000   1.000
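The write-up does not name the metric, so assume the scores are plain classification accuracy. Under that assumption, each Test value is the fraction of a conference's 15 teams in 2019 whose playoff/no-playoff label was predicted correctly, so it should be a multiple of 1/15. The reported values are consistent with that: 1.000 = 15/15, 0.933 ≈ 14/15, and 0.867 ≈ 13/15. The Train scores pool many seasons, so they do not follow this pattern. A minimal check, with a hypothetical `accuracy` helper and toy labels:

```python
from fractions import Fraction


def accuracy(y_true, y_pred):
    """Exact fraction of teams whose made-playoffs label (1/0) was predicted correctly."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return Fraction(correct, len(y_true))


# Toy labels: 15 teams, 8 actual playoff teams, one playoff team predicted to miss
actual = [1] * 8 + [0] * 7
predicted = [1] * 7 + [0] + [0] * 7
print(accuracy(actual, predicted), round(float(accuracy(actual, predicted)), 3))  # 14/15 0.933
```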

Feature Importance

Game outcomes (wins, losses, or W%) are the most important features in all of the models.
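The sketch below shows one way to produce rankings like these with NumPy, assuming the models are scikit-learn estimators. Linear models (Logistic Regression and a linear-kernel SVM) are ranked by the absolute value of `coef_` and keep the coefficient's sign, which is presumably what the (+) and (-) markers denote. A Random Forest exposes unsigned `feature_importances_`, which would explain why its columns carry no sign. The helper `top_features` is hypothetical.

```python
import numpy as np


def top_features(model, feature_names, k=5):
    """Return the k most important features as (name, sign) pairs.

    Linear models are ranked by |coef_| and report the coefficient sign;
    tree ensembles only have unsigned feature_importances_, so sign is "".
    """
    if hasattr(model, "coef_"):
        weights = np.ravel(model.coef_)
        order = np.argsort(-np.abs(weights))[:k]
        return [(feature_names[i], "+" if weights[i] > 0 else "-") for i in order]
    importances = np.asarray(model.feature_importances_)
    order = np.argsort(-importances)[:k]
    return [(feature_names[i], "") for i in order]
```

Comparing coefficient magnitudes is only meaningful because the statistics were put on a common scale first; on raw, unscaled columns the ranking would mostly reflect units.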

Western Conference
       Logistic            Random Forest       SVM
Order  WL        W%        WL        W%        WL        W%
1      L (-)     W% (+)    W         W%        L (-)     W% (+)
2      W (+)     TOV (-)   L         2P%       W (+)     STL (+)
3      STL (+)   STL (+)   2P%       FG%       STL (+)   DRB (+)
4      TOV (-)   DRB (+)   FG%       DRB       TOV (-)   TOV (-)
5      DRB (+)   FG% (+)   DRB       AST       FTA (+)   FTA (+)
Eastern Conference
       Logistic            Random Forest       SVM
Order  WL        W%        WL        W%        WL        W%
1      L (-)     W% (+)    L         W%        L (-)     W% (+)
2      W (+)     DRB (+)   W         2P%       W (+)     2PA (-)
3      DRB (+)   2PA (-)   2P%       FG%       2PA (-)   DRB (+)
4      2PA (-)   TOV (-)   FG%       DRB       DRB (+)   STL (+)
5      TOV (-)   STL (+)   DRB       2PA       ORB (+)   ORB (+)

2P - 2 Pointers Made

2PA - 2 Pointers Attempted

2P% - 2 Pointers Percentage

3P - 3 Pointers Made

3PA - 3 Pointers Attempted

3P% - 3 Pointers Percentage

AST - Assists

BLK - Blocks

Conf - Conference/Region (East or West)

DRB - Defensive Rebounds

FG - Field Goals Made

FGA - Field Goals Attempted

FG% - Field Goals Percentage

FT - Free Throws Made

FTA - Free Throws Attempted

FT% - Free Throws Percentage

G - Number of Games

L - Losses

MP - Minutes Played

ORB - Offensive Rebounds

PF - Personal Fouls

PTS - Points

RK - End of Regular Season Ranking

STL - Steals

TOV - Turnovers

W - Wins

W% - Win Percentage

Year - Year of Playoffs

2019 Predictions

For the 2019 season, the Eastern Conference models picked the playoff teams more precisely than the Western Conference models did. The teams that actually made the playoffs are listed in rank order under the "2019 Actual" column.

Teams with a probability higher than .5 are listed below in descending order. If fewer than 8 teams met this criterion, the teams with the next highest probabilities were listed until 8 teams were selected.
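The listing rule can be written as a short function. This is a sketch with a hypothetical name. It takes a `{team: probability}` mapping for one model and one conference. When more than eight teams clear the threshold, every one of them is listed, which is how some columns in the Western Conference table end up with ninth and tenth entries.

```python
def select_teams(probs, spots=8, threshold=0.5):
    """List every team above `threshold`, highest first; if fewer than `spots`
    qualify, top up with the next most likely teams until `spots` are listed.

    probs: {team: predicted playoff probability}
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    above = [(team, p) for team, p in ranked if p > threshold]
    if len(above) >= spots:
        return above
    # Fewer than `spots` above the threshold: the top `spots` overall include all of them
    return ranked[:spots]
```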

Red indicates false predictions: NOP and SAC in the Western Conference and CHA in the Eastern Conference.

Western Conference
                   Logistic              Random Forest         SVM
Rank  2019 Actual  WL         W%         WL         W%         WL         W%
1     GSW          DEN (.98)  DEN (.95)  DEN (.99)  GSW (.99)  DEN (.99)  DEN (.99)
2     DEN          GSW (.95)  GSW (.95)  GSW (.98)  UTA (.93)  POR (.99)  POR (.99)
3     POR          UTA (.95)  UTA (.94)  UTA (.95)  DEN (.90)  UTA (.99)  UTA (.99)
4     HOU          POR (.95)  POR (.91)  HOU (.94)  OKC (.86)  GSW (.99)  GSW (.99)
5     UTA          OKC (.81)  OKC (.79)  POR (.89)  POR (.85)  OKC (.93)  OKC (.95)
6     OKC          LAC (.74)  LAC (.72)  OKC (.89)  HOU (.84)  HOU (.82)  LAC (.80)
7     SAS          SAS (.71)  SAS (.72)  SAS (.86)  LAC (.80)  LAC (.78)  HOU (.77)
8     LAC          HOU (.71)  HOU (.66)  LAC (.85)  SAS (.80)  SAS (.69)  SAS (.74)
9
NOP (.53)
NOP (.62)
SAC (.52)
SAC (.51)
10
NOP (.50)
Eastern Conference
                   Logistic              Random Forest         SVM
Rank  2019 Actual  WL         W%         WL         W%         WL         W%
1     MIL          TOR (.99)  MIL (.99)  MIL (1.0)  MIL (1.0)  TOR (1.0)  TOR (1.0)
2     TOR          MIL (.99)  TOR (.99)  TOR (.99)  PHI (.96)  MIL (1.0)  MIL (1.0)
3     PHI          PHI (.98)  PHI (.97)  PHI (.98)  TOR (.95)  PHI (.99)  PHI (.99)
4     BOS          BOS (.96)  BOS (.94)  BOS (.96)  BOS (.93)  BOS (.99)  BOS (.99)
5     IND          IND (.75)  IND (.74)  IND (.93)  IND (.93)  IND (.85)  IND (.84)
6     BKN          ORL (.68)  ORL (.65)  ORL (.90)  BKN (.84)  BKN (.85)  ORL (.75)
7     ORL          BKN (.65)  BKN (.60)  BKN (.88)  ORL (.84)  ORL (.80)  BKN (.74)
8     DET          CHA (.49)  CHA (.52)  DET (.80)  DET (.82)  DET (.67)  DET (.50)

Western Conference

DAL - Dallas Mavericks

DEN - Denver Nuggets

GSW - Golden State Warriors

HOU - Houston Rockets

LAC - Los Angeles Clippers

LAL - Los Angeles Lakers

MEM - Memphis Grizzlies

MIN - Minnesota Timberwolves

NOP - New Orleans Pelicans

OKC - Oklahoma City Thunder

PHX - Phoenix Suns

POR - Portland Trail Blazers

SAC - Sacramento Kings

SAS - San Antonio Spurs

UTA - Utah Jazz

Eastern Conference

ATL - Atlanta Hawks

BKN - Brooklyn Nets

BOS - Boston Celtics

CHA - Charlotte Hornets

CHI - Chicago Bulls

CLE - Cleveland Cavaliers

DET - Detroit Pistons

IND - Indiana Pacers

MIA - Miami Heat

MIL - Milwaukee Bucks

NYK - New York Knicks

ORL - Orlando Magic

PHI - Philadelphia 76ers

TOR - Toronto Raptors

WAS - Washington Wizards

2020 Predictions

Because only four teams had clinched a playoff spot before the season was suspended, there is no ground truth for this set of predictions. Still, we thought it would be interesting to see what could have been.

The teams with higher than .5 probability are listed below in descending order.

* indicates teams that had already clinched a playoff spot before the season was suspended.

Western Conference
                    Logistic              Random Forest         SVM
Rank  2020 Partial  WL         W%         WL         W%         WL         W%
1     LAL*          LAL (.98)  LAL (.97)  LAL (.99)  LAL (.97)  LAL (.99)  LAL (.99)
2     LAC           DEN (.98)  DEN (.95)  DEN (.98)  LAC (.95)  DEN (.99)  DEN (.99)
3     DEN           LAC (.92)  LAC (.89)  LAC (.97)  DEN (.94)  LAC (.98)  LAC (.98)
4     UTA           OKC (.87)  SAS (.87)  OKC (.97)  DAL (.92)  OKC (.98)  OKC (.96)
5     OKC           SAS (.83)  OKC (.83)  UTA (.96)  MEM (.91)  SAS (.87)  SAS (.93)
6     HOU           DAL (.82)  DAL (.81)  DAL (.90)  UTA (.89)  DAL (.85)  DAL (.84)
7     DAL           HOU (.71)  HOU (.69)  MEM (.90)  OKC (.89)  MEM (.80)  HOU (.79)
8     MEM           MEM (.70)  MEM (.67)  HOU (.89)  HOU (.86)  HOU (.79)  MEM (.77)
9 POR PHX (.59) PHX (.62)
10 NOP UTA (.50)
Eastern Conference
                    Logistic              Random Forest         SVM
Rank  2020 Partial  WL         W%         WL         W%         WL         W%
1     MIL*          MIL (.99)  MIL (.99)  PHI (1.0)  MIL (.98)  TOR (1.0)  TOR (1.0)
2     BOS*          TOR (.99)  TOR (.99)  TOR (.99)  MIA (.95)  BOS (1.0)  MIL (1.0)
3     TOR*          BOS (.99)  BOS (.99)  MIL (.98)  TOR (.93)  MIL (1.0)  BOS (1.0)
4     MIA           PHI (.95)  PHI (.94)  MIA (.98)  PHI (.92)  PHI (.99)  PHI (.99)
5     IND           MIA (.65)  MIA (.89)  BOS (.97)  BOS (.89)  MIA (.99)  MIA (.98)
6     PHI           ORL (.64)  ORL (.67)  IND (.94)  IND (.88)  BKN (.84)  BKN (.78)
7     BKN           BKN (.64)  BKN (.63)  ORL (.92)  ORL (.86)  IND (.74)  IND (.70)
8     ORL           IND (.64)  IND (.60)  BKN (.86)  BKN (.80)  ORL (.70)  ORL (.68)

2020 Final Predictions

We averaged the predicted probabilities of all six models to determine which teams we think would have made the 2020 playoffs. All four teams that had already clinched playoff spots (marked with an asterisk) were at or near the top of the predicted lists, though not necessarily in the same order.
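Averaging can be sketched as a per-team mean over the six models' probabilities. The helper name and the `{model: {team: probability}}` input shape are assumptions. A team missing from a model's output is simply skipped for that model. Take the Lakers' six probabilities from the 2020 Western Conference table (.98, .97, .99, .97, .99, .99): their mean is 5.89 / 6 ≈ 0.98, matching the value below.

```python
from collections import defaultdict


def average_probabilities(model_probs):
    """Average each team's playoff probability across models, highest first.

    model_probs: {model_name: {team: probability}}. A team absent from a
    model's output is skipped for that model rather than counted as 0.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for probs in model_probs.values():
        for team, p in probs.items():
            totals[team] += p
            counts[team] += 1
    averaged = {team: totals[team] / counts[team] for team in totals}
    return sorted(averaged.items(), key=lambda kv: kv[1], reverse=True)
```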

Western Conference
Rank  2020 Partial  Predicted
1     LAL*          LAL (.98)
2     LAC           DEN (.97)
3     DEN           LAC (.95)
4     UTA           OKC (.92)
5     OKC           SAS (.88)
6     HOU           DAL (.86)
7     DAL           MEM (.79)
8     MEM           HOU (.79)
Eastern Conference
Rank  2020 Partial  Predicted
1     MIL*          MIL (.99)
2     BOS*          TOR (.99)
3     TOR*          PHI (.99)
4     MIA           BOS (.95)
5     IND           IND (.84)
6     PHI           BKN (.78)
7     BKN           ORL (.77)
8     ORL           DET (.62)

Conclusion

Using Logistic Regression, Random Forest, and Support Vector Machine models, we found that all three provided similar predictions, even though they did not always weight the independent variables the same way. As the Feature Importance section shows, the strongest features are wins (W), losses (L), or win percentage (W%). This is to be expected when predicting a team's likelihood of earning a place in the playoffs.

When we removed W/L and W% from the models, defensive statistics played a much larger role. Steals (STL), defensive rebounds (DRB), and turnovers (TOV) all increased in importance, while the models gave similar predictions and accuracy.

We ran numerous tests on previous seasons and saw similar results, with a high rate of accuracy. There was some inconsistency in the lockout-shortened 1998-99 and 2011-12 seasons. Given the amount of data we currently collect and model, we believe we have built three models that can predict with strong accuracy which teams from each conference will take part in the NBA Playoffs.

Next Steps

Improving the Model

If we had more time with these models, there are several more things we’d like to try.

Continued Testing of Accuracy

We could apply these models year over year to mid-season statistics to further test their accuracy.

Future Models

The natural next step would be to make a model that uses regular season and playoff game data to predict not only who makes it to the playoffs, but who wins the whole thing.