Graphing communities on Twitch.tv in a visually intuitive way

Overview

VisualizingTwitchCommunities

This project maps communities of streamers on Twitch.tv based on shared viewership. The data is collected from the Twitch API and visualized in Gephi.

Results

Twitch-Communities-High-Res.png

I wrote an article on TowardsDataScience here

If you're curious about how to read this graph and why I made it, check out the article.

How can I mess with the graph?

I made the graph in a free data visualization tool called Gephi. Download it here. The data set is in Visulization/GephiData. In Gephi, go to the Data Laboratory and import the edge file as an edge list. Then import the label file as a node list. From there, go to Overview and run a modularity analysis on the nodes to detect communities.
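
For orientation, here is a minimal sketch of how an edge file and a label file of this kind could be produced from a {streamer: viewers} mapping. It is not the project's actual code: the function names, the `min_shared` threshold, and the sample data are assumptions made for illustration, and the files in Visulization/GephiData may be laid out differently. The `Source`/`Target`/`Weight` and `Id`/`Label` headers are the column names Gephi's spreadsheet import looks for.

```python
import csv
import io
from itertools import combinations


def build_edges(viewers_by_streamer, min_shared=1):
    """Return (source, target, weight) rows, one per pair of streamers.

    The weight is the number of viewers the two streamers have in common.
    Pairs sharing fewer than `min_shared` viewers are dropped.
    """
    viewer_sets = {name: set(v) for name, v in viewers_by_streamer.items()}
    edges = []
    for a, b in combinations(sorted(viewer_sets), 2):
        shared = len(viewer_sets[a] & viewer_sets[b])
        if shared >= min_shared:
            edges.append((a, b, shared))
    return edges


def write_edge_csv(edges, handle):
    """Write the edge rows with the headers Gephi expects for an edges table."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)


def write_node_csv(viewers_by_streamer, handle):
    """Write one node per streamer, using the streamer name as both Id and Label."""
    writer = csv.writer(handle)
    writer.writerow(["Id", "Label"])
    for name in sorted(viewers_by_streamer):
        writer.writerow([name, name])


# Hypothetical sample data: two streamers overlap, one shares nobody.
sample = {
    "streamer_a": ["v1", "v2", "v3"],
    "streamer_b": ["v2", "v3", "v4"],
    "streamer_c": ["v9"],
}
edges = build_edges(sample)

buffer = io.StringIO()
write_edge_csv(edges, buffer)
print(buffer.getvalue())
```

A streamer who shares no viewers with anyone still appears in the node file but gets no row in the edge file.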

How can I collect more data?

The DataCollection folder has a script called main that can be run to collect the top 100 streams and all their viewers and save them to a CSV file. You can use the Windows Task Scheduler to run this script at any interval you like and build up data over long periods of time.
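
As a rough illustration of building up data over repeated runs, the sketch below appends each run's result to one CSV file with a timestamp. The function name, the column names, and the file name are assumptions, not what the main script actually writes, and the viewer data is made up rather than fetched from Twitch.

```python
import csv
import os
import tempfile
from datetime import datetime, timezone


def append_snapshot(path, viewers_by_streamer, when=None):
    """Append one row per (streamer, viewer) pair, tagged with a UTC timestamp."""
    when = when or datetime.now(timezone.utc)
    # Only write the header when the file is created or still empty.
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(["collected_at", "streamer", "viewer"])
        for streamer, viewers in viewers_by_streamer.items():
            for viewer in viewers:
                writer.writerow([when.isoformat(), streamer, viewer])


# Two runs, as a scheduled task would produce them (hypothetical data).
path = os.path.join(tempfile.mkdtemp(), "snapshots.csv")
append_snapshot(path, {"streamer_a": ["v1", "v2"]})
append_snapshot(path, {"streamer_a": ["v2", "v3"], "streamer_b": ["v1"]})

with open(path, newline="", encoding="utf-8") as handle:
    rows = list(csv.DictReader(handle))
print(len(rows), "rows collected so far")
```

Keeping the timestamp on every row makes it possible to filter or weight the data by collection time later.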

Comments
  • Huge refactor

    Hello, your idea is really cool and I am glad everything worked for you, but when I looked at your code I was extremely disgusted by a lot of factors.

    I spent like an hour trying to describe everything that was wrong in like a non-toxic or passive-aggressive manner, but kind of failed to do so. Then I just spent half of a day fixing your code so that it looks and works nice and squeaky-clean.

    I am now doing this as a PR and below will describe what I did in each commit, I do not expect you to merge it, but at least check out the huge async optimization (the important paragraph, as well as point 9 below, e405644)

    If you don't want to read all of this or find it derogatory in any way that I did all of this, please at least check out how it looks in the final version, it became much smaller, simpler, faster, more readable and just correct, thank you.

    IMPORTANT: one huge thing was that I rewrote the part where you get a list of viewers to be asynchronous, meaning it makes all 100 requests simultaneously, which is a crazy speedup; I was very tired of sitting there waiting for 100 streams to be checked one-by-one heh. Also I did some other micro-optimizations but they aren't a huge deal or anything.
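
The idea behind the async rewrite can be sketched with the standard library alone. This is not the code from the PR: `fetch_viewers` is a stand-in that sleeps instead of making an HTTP request, and the channel names are invented. With aiohttp, the body of `fetch_viewers` would instead await a request made through a shared client session.

```python
import asyncio
import time


async def fetch_viewers(channel):
    """Stand-in for one HTTP request; sleeps instead of calling the network."""
    await asyncio.sleep(0.1)
    return channel, [f"{channel}_viewer_{i}" for i in range(3)]


async def fetch_all(channels):
    # Start every request at once and wait for all of them together,
    # instead of awaiting them one after another.
    results = await asyncio.gather(*(fetch_viewers(c) for c in channels))
    return dict(results)


channels = [f"channel_{i}" for i in range(100)]
start = time.perf_counter()
viewers_by_streamer = asyncio.run(fetch_all(channels))
elapsed = time.perf_counter() - start
print(f"{len(viewers_by_streamer)} channels in {elapsed:.2f}s")
```

Run one by one, 100 simulated requests of 0.1 s each would take about 10 s; gathered, they overlap and the total stays close to the duration of a single request.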

    1. 95d2e31 - remove output images: When I cloned your repository, I had to download ~200MB which is very very huge for the amount of code you have - this is because you upload images here, and when you make a commit which replaces an image with another one - old one does not get deleted (it is still there, in the git history), this is just how git works without going into details. So unless the image is like a completely static part of the readme or something, it is considered a bad practice, I'd store them somewhere else. As in this commit I just removed the current images (just for my clone of the repo to be cleaner) and everything still stays in the history, you'd either make another repository or have fun with hard-resets and force-pushes if you want to decrease the size back.

    2. ace3f30 - temp better credentials blah-blah-blah Made the code just run for myself, the 'store credentials or anything in a separate source file and then import it' is a very bad practice, as you can see, I just load them from a json file there, simple. This commit was like a quick patch for it to at least work for me, later I will touch them again

    3. c90ee59 - remove stdout.flush I am guessing you had some weird Windows-related utf8 console issues because you printed streamer names which might've contained like Japanese symbols in them or whatever, removed all of that and then later used streamer logins instead of their names.

    4. ca3f92c - format everything, the huge one So, naming conventions: A lot of people who don't do programming daily (and I assume you're a great data scientist, but a bad python programmer) don't get why it is such a big deal and why they can't just name things however they like (you seem to like PascalCase which is ugly just by itself, but let me continue). But the thing is, there is a convention of writing and calling methods (in python) in snake_case, and everyone does that and sees that everyday, and then it becomes an enormous pain in the eyes when you look at something like your code where you did whatever you liked. Other than that, some other standard python formatting was applied, here is how it looked when I opened your project in an IDE: [image] Yeah I recommend you use something like PyCharm or IntelliJ with a python plugin or whatever instead of the notepad you were using, it is even a great learning tool because it tells you why particular things are underlined with warning squiggly lines

    5. 6a2ba36 - replace simple concats with string interpolation A really small one, but nobody ever does 'some text ' + str(some_number) since like forever, huh

    6. b244a76 - remove intermediate collections Another small one, this is those micro-optimizations I was talking about, you converted the same lists (of viewers, which are like huge btw) to sets multiple times, I think it made it noticeably faster

    7. 702aef7 - no intermediate json in twitch api Yeah that was kind of a bad api design on your part, you return raw json from one method and then use it in another method, can't even describe how smelly this code is, it just is. Anyway, it was actually very simply fixed by moving the streamer list extraction to the method where you receive that json :shrug:

    8. 6d64e65 - move credential reading out of the api This is what I mentioned in point 2, the api is now not dependent on files or on where or how you store the credentials, you just give them to the api

    9. e405644 - this is the huge one, the async rewrite So I used aiohttp instead of requests and wrote everything HTTP-request related in asynchronous manner, so that you can make a ton of requests at the same time, instead of waiting for each of them individually to complete, this made it real fast heh.

    10. 0debcb4 - fix nan checks Somewhere before I rewrote your omega-weird str(x) == 'nan' checks to use proper thing, but never tested it and turns out it was pandas special nans which required pandas special 'pd.isna' check, just a fix for that.

    11. 03e206c - another microoptimization So a membership check on a set (x in some_set) is much faster than the same check on a list, at the expense of other things, so if you need to have an 'already did that' check you always use sets, here you had a list so your optimization was kind of meh

    12. 25a8261 - change the folder structure Yeah your folder structure was like completely random, I changed that, and also I renamed the files here too because in python files are not PascalCase either. Also I've merged two files that use the same duplicated remove_nans function and did other things here. I removed your csv files here too because I was unsure where to put them and also similarly to removing images idk, again, not expecting you to actually merge this.

    13. 2e93585 - refactor the analysis module Well here I progressively made that joined file better and better by giving meaningful yet short names to methods and variables and so on and so forth. Also I completely removed pandas and all the nan-related weirdness you had as you seemed to use pandas (a huge fken math library) only to store data to weirdly formatted csv files. Instead, I am just storing and loading the {streamer->[viewers]} mapping to and from json, no weirdness, it's smaller and better idk. Could also use the python pickle library to store it in a binary format to be even smaller/faster. And also even the json could use like a simple gzip compression to noticeably decrease the taken up size, which is not that big to begin with.

    14. 244ba6f - more refactor, rewrite commends as docstrings, had fun with loggers Yeah loggers were overkill, for small script projects like this one prints are just fine, I just had fun I guess.

    15. d254e9f - rewrite the dict merging blah-blah Was looking at it on GitHub already and finally understood that I only optimized your interestingly-written combine_dicts, while it is a common thing that has a common optimal (and very small) solution, so now everything is even smaller. Also added a check for the token because I kept forgetting to set it to test stuff.
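
Points 11, 13 and 15 (sets for membership checks, the {streamer->[viewers]} mapping stored as JSON with optional gzip, and a compact dict merge) can be illustrated together. This is a sketch under assumptions, not the code from the PR: `merge_viewers`, `save_mapping`, `load_mapping`, and the file name are hypothetical.

```python
import gzip
import json
import os
import tempfile
from collections import defaultdict


def merge_viewers(*mappings):
    """Union the viewer lists of several {streamer: viewers} mappings."""
    merged = defaultdict(set)
    for mapping in mappings:
        for streamer, viewers in mapping.items():
            # Sets deduplicate viewers and keep later membership checks fast.
            merged[streamer].update(viewers)
    return dict(merged)


def save_mapping(path, mapping):
    """Write the mapping as gzip-compressed JSON."""
    # JSON has no set type, so viewers are written as sorted lists.
    serializable = {s: sorted(v) for s, v in mapping.items()}
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        json.dump(serializable, handle)


def load_mapping(path):
    """Read the mapping back, restoring the viewer lists to sets."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return {s: set(v) for s, v in json.load(handle).items()}


# Two hypothetical collection runs merged into one mapping.
merged = merge_viewers(
    {"streamer_a": ["v1", "v2"]},
    {"streamer_a": ["v2", "v3"], "streamer_b": ["v1"]},
)
path = os.path.join(tempfile.mkdtemp(), "viewers.json.gz")
save_mapping(path, merged)
restored = load_mapping(path)
```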

    opened by necauqua 4
  • Typo in README.md

    Hi, while reading your README I found a typo in line 51: "Gephi has a bunch of layour". That should be "layout", I think. Sorry if this is not the usual way to submit this. I'm new to issues. If so, just delete this one :D.

    opened by fschoenitz 1
  • KeyError: 'data' in GetDictOfStreamersAndViewers

    Hey, I've been playing around with the DataCollection in your legacy branch, and I've encountered a KeyError issue whenever I run the main.py. Do you know why data might not be referencing any key here? Could it be an authorisation issue with the credentials I'm using or something else?

    Getting a list of top live streams...
    Creating dictionary of streamers and viewers...
    Traceback (most recent call last):
      File "C:/Users/alano/PycharmProjects/TestTwitchData/main.py", line 15, in <module>
        main()
      File "C:/Users/alano/PycharmProjects/TestTwitchData/main.py", line 10, in main
        newerDict = GetTwitchData.GetDictOfStreamersAndViewers(json)
      File "C:\Users\alano\PycharmProjects\TestTwitchData\GetTwitchData.py", line 33, in GetDictOfStreamersAndViewers
        streamers = [element['user_name'] for element in j['data']]
    KeyError: 'data'
    
    Process finished with exit code 1
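
A KeyError on 'data' means the parsed response has no 'data' key, which is what one would see if the API answered with an error object instead of a list of streams. Whether that is an authorization problem here cannot be told from the traceback alone. The sketch below shows one way to surface the underlying message; `extract_streamers` is a hypothetical helper, and the `status` and `message` fields as well as the sample error payload are assumptions about what a failed request might contain.

```python
def extract_streamers(payload):
    """Return the streamer names from a streams response, or fail with a clear message."""
    if "data" not in payload:
        # Assumption: failed requests carry "status" and "message" fields.
        status = payload.get("status", "unknown status")
        message = payload.get("message", "no message")
        raise RuntimeError(f"Twitch API request failed ({status}): {message}")
    return [element["user_name"] for element in payload["data"]]


# Illustrative payloads, not captured from the real API.
ok = {"data": [{"user_name": "streamer_a"}, {"user_name": "streamer_b"}]}
bad = {"error": "Unauthorized", "status": 401, "message": "Invalid OAuth token"}

streamers = extract_streamers(ok)
try:
    extract_streamers(bad)
except RuntimeError as exc:
    failure = str(exc)
    print(failure)
```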
    
    opened by ghost 0
  • Outliers on graph

    I just visualised the graph of the twitch streamers on Gephi (using your legacy branch) and I noticed that there are many nodes who seemingly have no edges connecting them. Just wanted to ask if you knew why these existed?

    P.S. Really loved this project!

    Screen Shot 2021-09-07 at 10 54 42 pm

    opened by JustinPoon366 0
  • Date for map 1 should be 2020

    https://github.com/KiranGershenfeld/VisualizingTwitchCommunities/blob/13e732d339a19382e68b2e4ba11195121a996285/README.md#L18

    (Sorry for the small issue)

    opened by glitchroy 1
  • GetTwitchData.py Streamer name doesn't match Twitch url

    GetTwitchData.py has an issue with some user names and will run into an error.

    Getting viewers for 악어...
    Traceback (most recent call last):
      File "C:/Scripts/VisualizingTwitchCommunities/DataCollection/main.py", line 17, in <module>
        main()
      File "C:/Scripts/VisualizingTwitchCommunities/DataCollection/main.py", line 11, in main
        dict = GetTwitchData.GetDictOfStreamersAndViewers(json) #Create a dictionary of {streamer:[viewers]} from those 100 streams
      File "C:\Scripts\VisualizingTwitchCommunities\DataCollection\GetTwitchData.py", line 44, in GetDictOfStreamersAndViewers
        viewers = getCurrentViewersForChannel(streamer.lower()) #Get viewers for a particular streamer
      File "C:\Scripts\VisualizingTwitchCommunities\DataCollection\GetTwitchData.py", line 29, in getCurrentViewersForChannel
        r = requests.get('http://tmi.twitch.tv/group/user/'+ channel.lower() +'/chatters').json()
      File "C:\Users\%User%\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\models.py", line 898, in json
        return complexjson.loads(self.text, **kwargs)
    

    I was able to fix it by just changing line 41 in GetTwitchData.py from user_name to user_login. user_login uses the streamer's URL name in cases where the streamer's stream name and URL are different.

        streamers = [element['user_login'] for element in j['data']] #Get just the list of streamers
    
    opened by APoorDev 0
  • Searching in Image

    Can you create images in a format where text in the image is searchable (pdf or eps)? That would make it a lot easier to search for a streamer someone is looking for.

    opened by utkarshmall13 1
  • Gephi Layout

    Hi there, I'm having some issues with the visualization of results and I'm not sure how to solve them. When running Force Atlas on Gephi, the nodes basically just converge on the center of the graph. I've tried with different parameters but the result is pretty much the same. Do you have any ideas why?

    opened by giambaJ 3
  • "import Credentials as cr" gives problems when running code

    Hello. I'm trying to run your code to try and generate the CSV file with the necessary data for later on. I've been trying to resolve the issues with one of the modules that are required in both main.py and GetTwitchData.py : import Credentials as cr.

    I've tried many methods to get it to work. I tried reinstalling and removing older python versions to prevent any problems with them. I installed the other necessary modules such as requests & pandas, and pip doesn't seem to have much trouble installing credentials either. I've tried running it through the command prompt as well as through PyCharm and I still get an error.

    File "FILE_LOCATION\GetTwitchData.py", line 5, in <module>
      import Credentials as cr
    ModuleNotFoundError: No module named 'Credentials'

    I thought there wasn't supposed to be a capital for Credentials, so I tried import credentials as cr, which changed the type of problem. Instead I get errors like this:

    File "FILE_LOCATION\GetTwitchData.py", line 16, in GetTopStreams
      Headers = {'Client-ID': cr.clientID, 'Authorization': "Bearer " + cr.clientSecret}
    AttributeError: module 'credentials' has no attribute 'clientID'

    At this point, I'm sure that it is Credentials and not all lowercase. In this case, I'm not sure what to do now. Pip doesn't have that package and PyCharm still highlights it as nonexistent.

    Could you shed some light on this? This might be something on my end though I'm not so sure.
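
The two tracebacks suggest that `Credentials` is meant to be a local file next to GetTwitchData.py that defines `clientID` and `clientSecret`, rather than a package installed with pip. That reading is an assumption, as is the guess that the file is absent from the repository because it holds secrets. The sketch below writes a hypothetical Credentials.py with placeholder values into a temporary folder and imports it the same way the script does.

```python
import importlib
import os
import sys
import tempfile

# Contents of a hypothetical Credentials.py; the values are placeholders.
CREDENTIALS_SOURCE = '''\
clientID = "your-client-id"
clientSecret = "your-token"
'''

# Written to a temp folder here; in practice the file would sit next to GetTwitchData.py.
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "Credentials.py"), "w", encoding="utf-8") as handle:
    handle.write(CREDENTIALS_SOURCE)

# Make the folder importable, then load the module under the alias the script uses.
sys.path.insert(0, folder)
cr = importlib.import_module("Credentials")

# Same header construction as the line shown in the traceback.
headers = {"Client-ID": cr.clientID, "Authorization": "Bearer " + cr.clientSecret}
print(sorted(headers))
```

The pip package named credentials is unrelated, which would explain why the lowercase import succeeds but has no `clientID` attribute.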

    opened by TheJollyDuck 2
Owner
Kiran Gershenfeld
Making Lives Easier With Computers