Hello, your idea is really cool and I am glad everything worked for you, but when I looked at your code I was extremely put off by a lot of things.
I spent like an hour trying to describe everything that was wrong in a non-toxic, non-passive-aggressive manner, but kind of failed to do so. Then I just spent half a day fixing your code so that it looks and works nice and squeaky-clean.
I am now submitting this as a PR and will describe below what I did in each commit. I do not expect you to merge it, but at least check out the huge async optimization (the important paragraph, as well as point 9 below, e405644).
If you don't want to read all of this, or find it in any way derogatory that I did all of this, please at least check out how the final version looks: it became much smaller, simpler, faster, more readable and just plain correct. Thank you.
IMPORTANT: one huge thing is that I rewrote the part where you get the list of viewers to be asynchronous, meaning it makes all 100 requests simultaneously, which is a crazy speedup. I was very tired of sitting there waiting for 100 streams to be checked one by one, heh.
Also I did some other micro-optimizations but they aren't a huge deal or anything.
-
95d2e31 - remove output images:
When I cloned your repository, I had to download ~200MB, which is huge for the amount of code you have. This is because you commit images here, and when you make a commit that replaces an image with another one, the old one does not get deleted: it is still there in the git history. That is just how git works, without going into details.
So unless an image is a completely static part of the README or something, committing it is considered bad practice; I'd store them somewhere else.
In this commit I just removed the current images (just so my clone of the repo is cleaner), but everything still stays in the history. If you want to shrink the repo back down, you'd either make a fresh repository or have fun with hard resets and force pushes.
-
ace3f30 - temp better credentials blah-blah-blah
Made the code just run for myself. 'Store credentials or anything in a separate source file and then import it' is a very bad practice; as you can see, I just load them from a JSON file instead, simple.
This commit was a quick patch so it would at least work for me; I touch credentials again in a later commit.
-
c90ee59 - remove stdout.flush
I am guessing you had some weird Windows-related UTF-8 console issues because you printed streamer names, which might have contained Japanese characters or whatever. I removed all of that and later switched to using streamer logins instead of their display names.
-
ca3f92c - format everything, the huge one
So, naming conventions:
A lot of people who don't program daily (and I assume you're a great data scientist, but a bad Python programmer) don't get why it is such a big deal and why they can't just name things however they like (you seem to like PascalCase, which is ugly by itself, but let me continue).
But the thing is, in Python there is a convention of naming and calling methods in snake_case; everyone does that and sees it every day, and then it becomes an enormous pain in the eyes to look at code like yours, where you did whatever you liked.
Other than that, some other standard python formatting was applied, here is how it looked when I opened your project in an IDE:
Yeah, I recommend you use something like PyCharm or IntelliJ with the Python plugin instead of the notepad you were using. It is a great learning tool, if only because it tells you why this and that is underlined with squiggly warning lines.
-
6a2ba36 - replace simple concats with string interpolation
A really small one, but nobody has written 'some text ' + str(some_number) in, like, forever, huh.
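For reference, the modern equivalent is an f-string; the variable name here is just an example:

```python
some_number = 42

# Old style: manual concatenation with an explicit str() call.
old = 'some text ' + str(some_number)

# Idiomatic since Python 3.6: f-string interpolation.
new = f'some text {some_number}'

assert old == new  # same result, much nicer to read and extend
```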
-
b244a76 - remove intermediate collections
Another small one; this is one of those micro-optimizations I was talking about. You converted the same lists (of viewers, which are huge, btw) to sets multiple times. I think this made it noticeably faster.
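The pattern looks roughly like this; the viewer lists are made up for illustration, but the point stands: convert each big list to a set once and reuse it, instead of calling set(...) on the same list at every comparison.

```python
# Hypothetical viewer lists for two streams.
viewers_a = ["alice", "bob", "carol", "bob"]
viewers_b = ["bob", "dave", "carol"]

# Convert once (O(n) each), then every set operation afterwards is cheap.
set_a = set(viewers_a)
set_b = set(viewers_b)

# Viewers watching both streams; no re-conversion per comparison.
shared = set_a & set_b
```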
-
702aef7 - no intermediate json in twitch api
Yeah, that was kind of bad API design on your part: you return raw JSON from one method and then use it in another method. I can't even describe why this code is smelly, it just is. Anyway, it was actually a very simple fix: move the streamer-list extraction into the method where you receive that JSON :shrug:
-
6d64e65 - move credential reading out of the api
This is what I mentioned in point 2: the API is now not dependent on files, or on where or how you store the credentials; you just hand them to the API.
-
e405644 - this is the huge one, the async rewrite
So I used aiohttp instead of requests and rewrote everything HTTP-request-related in an asynchronous manner, so that you can make a ton of requests at the same time instead of waiting for each of them individually to complete. This made it really fast, heh.
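The actual commit uses aiohttp, but the core pattern is just asyncio.gather; to keep this sketch dependency-free, a stub coroutine stands in for the real HTTP call, and all names here are illustrative, not the ones in the PR:

```python
import asyncio

async def fetch_viewers(streamer: str) -> list:
    # Stand-in for an aiohttp GET; the real code awaits the Twitch API here.
    await asyncio.sleep(0)  # simulates yielding to the event loop
    return [f"viewer_of_{streamer}"]

async def fetch_all(streamers: list) -> dict:
    # All 100 requests are started at once and awaited together,
    # instead of one-by-one as in the synchronous requests version.
    results = await asyncio.gather(*(fetch_viewers(s) for s in streamers))
    return dict(zip(streamers, results))

viewer_map = asyncio.run(fetch_all([f"streamer{i}" for i in range(100)]))
```

In a real version you might wrap the fetch in an asyncio.Semaphore to cap concurrent connections, so 100-plus simultaneous requests don't hit rate limits.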
-
0debcb4 - fix nan checks
Somewhere earlier I rewrote your omega-weird str(x) == 'nan' checks to use the proper thing, but never tested it, and it turns out these were pandas' special NaNs, which require pandas' special pd.isna check. This is just a fix for that.
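A small sketch of why the string check is fragile; it happens to work for a plain float NaN but not for pandas' own missing-value sentinels, while pd.isna covers both:

```python
import math
import pandas as pd

x = float("nan")

# The old check compared string representations:
assert str(x) == "nan"        # happens to work for a plain float NaN...
assert str(pd.NA) != "nan"    # ...but breaks for pandas missing values

# pd.isna handles floats and pandas sentinels (pd.NA, pd.NaT) alike:
assert pd.isna(x)
assert pd.isna(pd.NA)

# For plain floats only, the stdlib alternative is math.isnan:
assert math.isnan(x)
```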
-
03e206c - another microoptimization
So a membership check on a set (x in some_set) is much faster than on a list (x in some_list), at the expense of other things. So if you need an 'already did that' check, you always use sets; here you had a list, so your optimization was kind of meh.
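The standard shape of that bookkeeping, with made-up data: set membership is an O(1) hash lookup on average, while list membership scans the whole list (O(n)), which adds up fast inside a loop.

```python
seen = set()   # a list here would make each check a linear scan
processed = []

for login in ["foo", "bar", "foo", "baz", "bar"]:
    if login in seen:       # O(1) on average with a set
        continue            # already did that one, skip
    seen.add(login)
    processed.append(login)
```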
-
25a8261 - change the folder structure
Yeah, your folder structure was completely random, so I changed that, and I also renamed the files here, because in Python file names are not PascalCase either.
Also I merged two files that used the same duplicated remove_nans function, and did a few other things here.
I also removed your CSV files here, because I was unsure where to put them, and for the same reason as removing the images, idk. Again, not expecting you to actually merge this.
-
2e93585 - refactor the analysis module
Well, here I progressively made that merged file better and better by giving meaningful yet short names to methods and variables, and so on and so forth.
Also I completely removed pandas and all the NaN-related weirdness you had, since you seemed to use pandas (a huge math/data library) only to store data in weirdly formatted CSV files.
Instead, I just store and load the {streamer -> [viewers]} mapping to and from JSON. No weirdness, it's smaller and better, idk.
You could also use Python's pickle library to store it in a binary format, to be even smaller/faster.
And even the JSON could use simple gzip compression to noticeably decrease the space taken up, which is not that big to begin with.
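Both of these are one-liners with the stdlib; here is the gzip variant with a made-up mapping of the same {streamer -> [viewers]} shape:

```python
import gzip
import json

# Hypothetical mapping, same shape as the real {streamer -> [viewers]} store.
data = {f"streamer{i}": [f"viewer{j}" for j in range(50)] for i in range(20)}

raw = json.dumps(data).encode("utf-8")
compressed = gzip.compress(raw)        # repetitive JSON compresses very well

# Round-trips exactly:
restored = json.loads(gzip.decompress(compressed))
```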
-
244ba6f - more refactoring, rewrite comments as docstrings, had fun with loggers
Yeah, loggers were overkill; for small script projects like this one, prints are just fine. I just had fun, I guess.
-
d254e9f - rewrite the dict merging blah-blah
I was already looking at it on GitHub and finally realized that I had only optimized your interestingly-written combine_dicts, while it is a common task with a common, optimal (and very small) solution, so now everything is even smaller.
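I don't know the exact semantics of the original combine_dicts, but assuming it merges {streamer -> [viewers]} dicts by concatenating the lists per key, the usual tiny solution is a defaultdict:

```python
from collections import defaultdict

def combine_dicts(*dicts):
    # Merge any number of {key -> list} mappings, concatenating lists per key.
    merged = defaultdict(list)
    for d in dicts:
        for key, values in d.items():
            merged[key].extend(values)
    return dict(merged)
```

(If the values were sets instead of lists, the same shape works with defaultdict(set) and .update.)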
Also added a check for the token, because I kept forgetting to set it when testing stuff.