Web Scraping Popular YouTube Tech Channels with Selenium
Data Mining, Data Wrangling, and Exploratory Data Analysis
About the Data
Web scraping was performed on the top 10 tech channels on YouTube using Selenium, an automated browser driver controlled from Python that is often used for web scraping and web testing. The scraped YouTube channels were chosen from a Top 10 Tech YouTubers list on blog.bit.ai.
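As a rough illustration of the scraping step, the sketch below loads a channel's videos page and collects the visible video titles. It is not the project's actual scraper: the function names, the `a#video-title` selector, and the placeholder URL are assumptions made for this example. YouTube's markup changes often and the page loads videos lazily, so a real run would also need scrolling and explicit waits.

```python
# Value of selenium.webdriver.common.by.By.CSS_SELECTOR, written out so the
# helper below can be exercised without Selenium installed.
CSS_SELECTOR = "css selector"

# Assumed selector for video title links (hypothetical; verify against the
# live page before relying on it).
TITLE_SELECTOR = "a#video-title"


def scrape_video_titles(driver, videos_url, selector=TITLE_SELECTOR):
    """Open a channel's videos page and return the non-empty video titles."""
    driver.get(videos_url)
    elements = driver.find_elements(CSS_SELECTOR, selector)
    titles = [element.text.strip() for element in elements]
    return [title for title in titles if title]


def make_chrome_driver():
    """Start a Chrome session; needs Selenium and a local Chrome install."""
    from selenium import webdriver

    return webdriver.Chrome()


# Example usage (not run here, since it opens a real browser):
#
#   driver = make_chrome_driver()
#   try:
#       titles = scrape_video_titles(driver, "<channel videos URL>")
#   finally:
#       driver.quit()
```

Passing the driver in as an argument keeps the parsing logic separate from browser start-up, which makes the helper easy to check with a stand-in driver object.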
All data was saved to multiple CSV files to aid further analysis in a Google Colab notebook. Please see my for more details.
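A minimal sketch of how scraped records could be written to and read back from CSV with Python's standard `csv` module follows. The column names and the sample row are placeholders invented for this example, not the project's actual schema or data.

```python
import csv
import io

# Hypothetical column names; the project's real CSV layout may differ.
FIELDS = ["channel", "title", "views", "comments"]


def write_videos_csv(rows, fileobj):
    """Write a list of dicts (one per video) to an open text file object."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def read_videos_csv(fileobj):
    """Read the rows back as dicts; every value comes back as a string."""
    return list(csv.DictReader(fileobj))


# Round trip through an in-memory buffer with one placeholder row.
buffer = io.StringIO()
write_videos_csv(
    [{"channel": "Example Channel", "title": "Example title", "views": 1000, "comments": 10}],
    buffer,
)
buffer.seek(0)
loaded = read_videos_csv(buffer)
```

When writing to a real file, open it with `newline=""` as the `csv` documentation recommends, and convert numeric columns back from strings after loading.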
Sample of Data Collected
The average number of videos per channel was around 200. In total, data from 2000 videos was scraped.
Word Cloud of Word Frequency in Video Titles
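The word frequencies behind a word cloud like this can be computed with a simple counter. The sketch below is one possible approach, not necessarily the one used in the project: the stop-word list, the tokenizing pattern, and the function name are assumptions made for illustration.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real analysis would likely use a
# fuller list from an NLP library.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}


def title_word_counts(titles):
    """Count how often each non-stop word appears across video titles."""
    counts = Counter()
    for title in titles:
        # Lowercase, then keep runs of letters, digits, and apostrophes.
        words = re.findall(r"[a-z0-9']+", title.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts
```

Calling `title_word_counts(titles).most_common(20)` gives the twenty most frequent words, which could then be passed to a word cloud library.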
Takeaways
- Video comment counts have very little correlation with any of the other data obtained in this project.
- The following seem to be highly correlated:
  - Channel Views and Subscribers
  - Interactions and Video Views
- Video titles fall into 5 topic groups. K-means and PCA were used to create clusters for the video titles:
  - iPhone (kmeans 0)
  - Samsung (kmeans 1)
  - Reviews (kmeans 2)
  - Unboxing (kmeans 3)
  - How-to (kmeans 4)
- 70% of the most viewed videos are about phones.
- Join Date (the date a YouTube channel was created) does not seem to have any relationship to number of subscribers or overall cha