Hello, your idea is really cool and I am glad everything worked for you, but when I looked at your code I was extremely put off by a lot of things.
I spent like an hour trying to describe everything that was wrong in a non-toxic, non-passive-aggressive manner, but kind of failed to do so. Then I just spent half a day fixing your code so that it looks and works nice and squeaky-clean.
I am now submitting this as a PR and will describe below what I did in each commit. I do not expect you to merge it, but at least check out the huge async optimization (the important paragraph, as well as point 9 below, e405644).
If you don't want to read all of this, or find it in any way derogatory that I did all of this, please at least check out how the final version looks: it became much smaller, simpler, faster, more readable and just plain correct. Thank you.
IMPORTANT: one huge thing is that I rewrote the part where you get the list of viewers to be asynchronous, meaning it makes all 100 requests simultaneously, which is a crazy speedup. I was very tired of sitting there waiting for 100 streams to be checked one by one, heh.
Also I did some other micro-optimizations but they aren't a huge deal or anything.
-
95d2e31 - remove output images:
When I cloned your repository, I had to download ~200MB, which is huge for the amount of code you have. This is because you commit images here, and when you make a commit that replaces an image with another one, the old one does not get deleted: it is still there in the git history. That is just how git works, without going into details.
So unless an image is a completely static part of the README or something, committing it is considered bad practice; I'd store them somewhere else.
In this commit I just removed the current images (just so my clone of the repo is cleaner), but everything still stays in the history. If you want to shrink the repo back down, you'd either make a fresh repository or have fun with hard resets and force pushes.
-
ace3f30 - temp better credentials blah-blah-blah
Made the code just run for myself. 'Store credentials or anything in a separate source file and then import it' is a very bad practice; as you can see, I just load them from a JSON file instead, simple.
This commit was a quick patch so it would at least work for me; I touch credentials again in a later commit.
-
c90ee59 - remove stdout.flush
I am guessing you had some weird Windows-related UTF-8 console issues because you printed streamer names, which might have contained Japanese characters or whatever. I removed all of that and later switched to using streamer logins instead of their display names.
-
ca3f92c - format everything, the huge one
So, naming conventions:
A lot of people who don't program daily (and I assume you're a great data scientist, but a bad Python programmer) don't get why it is such a big deal and why they can't just name things however they like (you seem to like PascalCase, which is ugly by itself, but let me continue).
But the thing is, in Python there is a convention of naming and calling methods in snake_case; everyone does that and sees it every day, and then it becomes an enormous pain in the eyes to look at code like yours, where you did whatever you liked.
Other than that, some other standard python formatting was applied, here is how it looked when I opened your project in an IDE:
Yeah, I recommend you use something like PyCharm or IntelliJ with the Python plugin instead of the notepad you were using. It is a great learning tool, if only because it tells you why this and that is underlined with squiggly warning lines.
-
6a2ba36 - replace simple concats with string interpolation
A really small one, but nobody has written 'some text ' + str(some_number) in, like, forever, huh.
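For reference, the modern equivalent is an f-string; the variable name here is just an example:

```python
some_number = 42

# Old style: manual concatenation with an explicit str() call.
old = 'some text ' + str(some_number)

# Idiomatic since Python 3.6: f-string interpolation.
new = f'some text {some_number}'

assert old == new  # same result, much nicer to read and extend
```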
-
b244a76 - remove intermediate collections
Another small one; this is one of those micro-optimizations I was talking about. You converted the same lists (of viewers, which are huge, btw) to sets multiple times. I think this made it noticeably faster.
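The pattern looks roughly like this; the viewer lists are made up for illustration, but the point stands: convert each big list to a set once and reuse it, instead of calling set(...) on the same list at every comparison.

```python
# Hypothetical viewer lists for two streams.
viewers_a = ["alice", "bob", "carol", "bob"]
viewers_b = ["bob", "dave", "carol"]

# Convert once (O(n) each), then every set operation afterwards is cheap.
set_a = set(viewers_a)
set_b = set(viewers_b)

# Viewers watching both streams; no re-conversion per comparison.
shared = set_a & set_b
```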
-
702aef7 - no intermediate json in twitch api
Yeah, that was kind of bad API design on your part: you return raw JSON from one method and then use it in another method. I can't even describe why this code is smelly, it just is. Anyway, it was actually a very simple fix: move the streamer-list extraction into the method where you receive that JSON :shrug:
-
6d64e65 - move credential reading out of the api
This is what I mentioned in point 2: the API is now not dependent on files, or on where or how you store the credentials; you just hand them to the API.
-
e405644 - this is the huge one, the async rewrite
So I used aiohttp instead of requests and rewrote everything HTTP-request-related in an asynchronous manner, so that you can make a ton of requests at the same time instead of waiting for each of them individually to complete. This made it really fast, heh.
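The actual commit uses aiohttp, but the core pattern is just asyncio.gather; to keep this sketch dependency-free, a stub coroutine stands in for the real HTTP call, and all names here are illustrative, not the ones in the PR:

```python
import asyncio

async def fetch_viewers(streamer: str) -> list:
    # Stand-in for an aiohttp GET; the real code awaits the Twitch API here.
    await asyncio.sleep(0)  # simulates yielding to the event loop
    return [f"viewer_of_{streamer}"]

async def fetch_all(streamers: list) -> dict:
    # All 100 requests are started at once and awaited together,
    # instead of one-by-one as in the synchronous requests version.
    results = await asyncio.gather(*(fetch_viewers(s) for s in streamers))
    return dict(zip(streamers, results))

viewer_map = asyncio.run(fetch_all([f"streamer{i}" for i in range(100)]))
```

In a real version you might wrap the fetch in an asyncio.Semaphore to cap concurrent connections, so 100-plus simultaneous requests don't hit rate limits.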
-
0debcb4 - fix nan checks
Somewhere earlier I rewrote your omega-weird str(x) == 'nan' checks to use the proper thing, but never tested it, and it turns out these were pandas' special NaNs, which require pandas' special pd.isna check. This is just a fix for that.
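A small sketch of why the string check is fragile; it happens to work for a plain float NaN but not for pandas' own missing-value sentinels, while pd.isna covers both:

```python
import math
import pandas as pd

x = float("nan")

# The old check compared string representations:
assert str(x) == "nan"        # happens to work for a plain float NaN...
assert str(pd.NA) != "nan"    # ...but breaks for pandas missing values

# pd.isna handles floats and pandas sentinels (pd.NA, pd.NaT) alike:
assert pd.isna(x)
assert pd.isna(pd.NA)

# For plain floats only, the stdlib alternative is math.isnan:
assert math.isnan(x)
```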
-
03e206c - another microoptimization
So a membership check on a set (x in some_set) is much faster than on a list (x in some_list), at the expense of other things. So if you need an 'already did that' check, you always use sets; here you had a list, so your optimization was kind of meh.
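The standard shape of that bookkeeping, with made-up data: set membership is an O(1) hash lookup on average, while list membership scans the whole list (O(n)), which adds up fast inside a loop.

```python
seen = set()   # a list here would make each check a linear scan
processed = []

for login in ["foo", "bar", "foo", "baz", "bar"]:
    if login in seen:       # O(1) on average with a set
        continue            # already did that one, skip
    seen.add(login)
    processed.append(login)
```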
-
25a8261 - change the folder structure
Yeah, your folder structure was completely random, so I changed that, and I also renamed the files here, because in Python file names are not PascalCase either.
Also I merged two files that used the same duplicated remove_nans function, and did a few other things here.
I also removed your CSV files here, because I was unsure where to put them, and for the same reason as removing the images, idk. Again, not expecting you to actually merge this.
-
2e93585 - refactor the analysis module
Well, here I progressively made that merged file better and better by giving meaningful yet short names to methods and variables, and so on and so forth.
Also I completely removed pandas and all the NaN-related weirdness you had, since you seemed to use pandas (a huge math/data library) only to store data in weirdly formatted CSV files.
Instead, I just store and load the {streamer -> [viewers]} mapping to and from JSON. No weirdness, it's smaller and better, idk.
You could also use Python's pickle library to store it in a binary format, to be even smaller/faster.
And even the JSON could use simple gzip compression to noticeably decrease the space taken up, which is not that big to begin with.
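Both of these are one-liners with the stdlib; here is the gzip variant with a made-up mapping of the same {streamer -> [viewers]} shape:

```python
import gzip
import json

# Hypothetical mapping, same shape as the real {streamer -> [viewers]} store.
data = {f"streamer{i}": [f"viewer{j}" for j in range(50)] for i in range(20)}

raw = json.dumps(data).encode("utf-8")
compressed = gzip.compress(raw)        # repetitive JSON compresses very well

# Round-trips exactly:
restored = json.loads(gzip.decompress(compressed))
```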
-
244ba6f - more refactoring, rewrite comments as docstrings, had fun with loggers
Yeah, loggers were overkill; for small script projects like this one, prints are just fine. I just had fun, I guess.
-
d254e9f - rewrite the dict merging blah-blah
I was already looking at it on GitHub and finally realized that I had only optimized your interestingly-written combine_dicts, while it is a common task with a common, optimal (and very small) solution, so now everything is even smaller.
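I don't know the exact semantics of the original combine_dicts, but assuming it merges {streamer -> [viewers]} dicts by concatenating the lists per key, the usual tiny solution is a defaultdict:

```python
from collections import defaultdict

def combine_dicts(*dicts):
    # Merge any number of {key -> list} mappings, concatenating lists per key.
    merged = defaultdict(list)
    for d in dicts:
        for key, values in d.items():
            merged[key].extend(values)
    return dict(merged)
```

(If the values were sets instead of lists, the same shape works with defaultdict(set) and .update.)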
Also added a check for the token, because I kept forgetting to set it when testing stuff.