(Feature Request): Dislike prediction through view and like ratio #99

Open
1 of 2 tasks
NurRaaa opened this issue Nov 30, 2021 · 11 comments
Labels
enhancement New feature or request

Comments

@NurRaaa

NurRaaa commented Nov 30, 2021

Extension or Userscript?

Extension

Request or suggest a new feature!

This extension can stay as it is for now, but it shouldn't rely on only one source forever.
Even after dislike data disappears from the YouTube API, we still have other data, namely the view count and like count, that could be used to predict the dislike count.

Ways to implement this!

Use the ratio between view count and like count to predict dislikes.

Can you work on this?

  • Yes
  • No
@NurRaaa NurRaaa added the enhancement New feature or request label Nov 30, 2021
@aryavsaigal
Collaborator

aryavsaigal commented Nov 30, 2021

The prediction will be really inaccurate

@d0gkiller87

d0gkiller87 commented Nov 30, 2021

That's the same technique used in this extension. Check the implementation:
https://github.com/Anarios/return-youtube-dislike/blob/129328fc64/Extensions/chrome/return-youtube-dislike.background.js#L96

@aryavsaigal
Collaborator

> That's the same technique used in this extension. Check the implementation: https://github.com/Anarios/return-youtube-dislike/blob/129328fc64/Extensions/chrome/return-youtube-dislike.background.js#L96

That's using something called averageRating, not just the like and view count.
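
For context, here is a minimal sketch of how a dislike count can be recovered from an average rating. It assumes `averageRating` is a mean on a 1 to 5 scale where each like counts as 5 and each dislike as 1. The function name is hypothetical, and this is not copied from the extension's source.

```python
def dislikes_from_average_rating(likes: int, average_rating: float) -> int:
    """Estimate dislikes from a like count and a 1-5 average rating.

    Assumption: average = (5 * likes + 1 * dislikes) / (likes + dislikes).
    Solving for dislikes gives likes * (5 - average) / (average - 1).
    """
    if not 1.0 < average_rating <= 5.0:
        raise ValueError("average_rating must be in (1, 5]")
    return round(likes * (5.0 - average_rating) / (average_rating - 1.0))
```

Unlike a like/view ratio, this recovers the count algebraically, which is why it stops working once the average rating is no longer available.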

@d0gkiller87

> That's using something called averageRating, not just the like and view count.

Fair enough. In that case, I agree that it'll be inaccurate.

@tvelk

tvelk commented Dec 1, 2021

Please see my comment in another thread, where I explore the possibility of using different video metrics to estimate dislikes, as opposed to a one-size-fits-all equation, which may or may not be accurate. It's possible that even with a dynamic model the estimate will still be inaccurate, but it might be the best shot, and it might get close enough to warn end users that a video is suspected of having a high number of dislikes.

#114 (comment)

@RyannDaGreat

I've been gathering a dataset of over 1 million YouTube videos' dislike/like ratios. If anybody would like to use it to predict dislikes based on this benchmark, let me know and I'll send it!

@RyannDaGreat

My data shows that predicting the dislike ratio from the like/view ratio isn't very accurate, but it's still a decent metric. The best correlation I could get between the two is about 0.45 (when using logs and so on), and videos with a like/(like+dislike) ratio below 70% could usually be detected from the like/view ratio.
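
A rough sketch of the kind of check described here: a Pearson correlation between the log of the like/view ratio and the like/(like+dislike) ratio. The exact transform used is not stated in the thread, so taking the log of like/view is an assumption, and the rows below are made up for illustration rather than taken from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def log_ratio_correlation(videos):
    """videos: iterable of (views, likes, dislikes) tuples."""
    videos = list(videos)
    # Predictor: log of the like/view ratio (assumed transform).
    xs = [math.log(likes / views) for views, likes, _ in videos]
    # Target: share of ratings that are likes.
    ys = [likes / (likes + dislikes) for _, likes, dislikes in videos]
    return pearson(xs, ys)


# Made-up rows for illustration only, NOT from the dataset in this thread.
toy_videos = [
    (100_000, 5_000, 100),
    (200_000, 4_000, 400),
    (50_000, 500, 300),
    (80_000, 4_000, 50),
]
r = log_ratio_correlation(toy_videos)
```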

@ChristophGeske

> My data shows that predicting the dislike ratio from the like/view ratio isn't very accurate, but it's still a decent metric. The best correlation I could get between the two is about 0.45 (when using logs and so on), and videos with a like/(like+dislike) ratio below 70% could usually be detected from the like/view ratio.

It would be nice to know if the result gets better when the ratio is applied only to videos with a certain number of views. I am thinking of using the ratio only on videos with, let's say, 30,000 or more views, since at some point the law of large numbers comes into play and makes the like count more reliable.
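
To illustrate the law-of-large-numbers point, here is a small sketch of how the uncertainty of an observed like rate shrinks as views grow. It assumes a simple binomial model in which every view is an independent chance to like the video, which is a simplification. The 4% like rate and the view counts are arbitrary illustration values.

```python
import math


def like_rate_standard_error(like_rate: float, views: int) -> float:
    """Standard error of an observed like rate under a binomial model."""
    if views <= 0:
        raise ValueError("views must be positive")
    return math.sqrt(like_rate * (1.0 - like_rate) / views)


# Arbitrary example values: a 4% like rate at increasing view counts.
for views in (300, 3_000, 30_000, 300_000):
    se = like_rate_standard_error(0.04, views)
    print(f"{views:>7} views: like rate 4% +/- {se:.4%}")
```

The standard error falls with the square root of the view count, so a hundredfold increase in views narrows it by a factor of ten.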

Also, the result might get better when taking the comments into account. (I guess one doesn't need to save the comments, since they are still there; one only needs to save the date when the likes/dislikes were saved, and then you know whether a comment was posted before or after.) But this might also make the result worse, because comments can mean approval or disapproval, and putting bad data into the prediction would make it worse.

A more sophisticated solution could use machine learning to draw on as many data points as possible. The ML program could look for keywords like "love it", "awesome", and "interesting" in the comments, and might find other data points that affect the like/dislike ratio, such as video length or keywords in the title. For example, "Corona" videos seem to get a lot of dislikes on YouTube lately.
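
As a sketch of the simplest possible version of this idea, the snippet below counts keyword occurrences in comments so they could be fed to a model as features. The keyword list comes from the paragraph above; the function name is hypothetical, and no model is trained here.

```python
import re

# Keywords taken from the comment above.
POSITIVE_KEYWORDS = ("love it", "awesome", "interesting")


def keyword_counts(comments, keywords=POSITIVE_KEYWORDS):
    """Count whole-word, case-insensitive keyword occurrences across comments."""
    counts = {keyword: 0 for keyword in keywords}
    for comment in comments:
        text = comment.lower()
        for keyword in keywords:
            pattern = r"\b" + re.escape(keyword) + r"\b"
            counts[keyword] += len(re.findall(pattern, text))
    return counts


sample = ["Love it, awesome video", "Not interesting", "AWESOME awesome"]
features = keyword_counts(sample)
```

Note that "Not interesting" still counts toward "interesting", which is an instance of the approval-versus-disapproval ambiguity raised earlier.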

@tvelk

tvelk commented Dec 4, 2021

> I've been gathering a dataset of over 1 million YouTube videos' dislike/like ratios. If anybody would like to use it to predict dislikes based on this benchmark, let me know and I'll send it!

@RyannDaGreat Your initial look is promising: it suggests that at least something can be done reliably! I'd be interested in having a look at this dataset. If you could share it, I'd greatly appreciate it!

> It would be nice to know if the ratio gets better when only applying it to videos with a certain number of views.

@ChristophGeske I was thinking along these lines as well, that there may be ranges of views which have better correlation than others to use ratios, and some view ranges where it's just not possible, and another method would need to be found.
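
One way to explore this would be to group videos by view range first and then measure the correlation within each group. A minimal sketch of the grouping step follows; the bucket edges are hypothetical, since the thread does not propose specific ranges.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical bucket edges; not proposed anywhere in this thread.
VIEW_EDGES = (1_000, 10_000, 100_000, 1_000_000)


def bucket_by_views(videos, edges=VIEW_EDGES):
    """Group (views, likes, dislikes) rows by the view range they fall in.

    Bucket 0 holds rows with views below edges[0]; bucket len(edges) holds
    rows with views at or above the last edge.
    """
    buckets = defaultdict(list)
    for row in videos:
        buckets[bisect_right(edges, row[0])].append(row)
    return dict(buckets)
```

Each bucket can then be analyzed separately to see which view ranges, if any, give a usable correlation.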

> Also the result might get better when taking into account the comments

I'm personally hesitant to spend too much time on this route. Many forum commenters seem to like the idea of someone commenting "dislike" and having people "like" that comment to show dislikes. I would imagine these comments could be deleted, and their use might not be consistent enough. It also means that there may be a huge shift in how users comment on the site, so any model trained now would not hold up well over time as commenting habits change. Additionally, as there would not be an easy way to monitor the deletion of negative comments, a "like estimator" based on comments could be manipulated by those who control the deletion of negative comments. That reason alone makes me nervous about making a model reliant on comments at all.

@RyannDaGreat

@tvelk Sure! I'll get a Google Drive link posted soon, or perhaps a GitHub repo. I'll put it in this thread once I do.

And yes, your hypothesis is right. I actually only included videos with over 1000 likes+dislikes in the analysis, because anything less than that had a really weird distribution (it looks like a really strange shape in a scatterplot). I'll post my results soon.

@NurRaaa
Author

NurRaaa commented Dec 11, 2021

But in some ways, I think it could be misleading.
Here are two different kinds of videos:

"The Matrix Awakens: An Unreal Engine 5 Experience" by Unreal Engine - https://www.youtube.com/watch?v=WU0gvPcc3jQ
"Cheap USB Microphones." by DankPods - https://www.youtube.com/watch?v=V2Yn_pyCllI
Both were posted on Dec 10, 2021,
and they have almost the same number of likes (36k and 34k, a 2k difference),
but their view counts are very different (1M+ and 400k+).

Maybe it needs a little tuning, using not just the like and view counts but other data as well, such as video category, tags, and where the video came from.
By "where the video came from" I mean that videos made by corporations and videos made by independent creators engage their audiences differently, and audiences have different relationships with the two. I can't describe it precisely, but the point is that videos from independent channels get more likes than videos from corporate channels.
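
The gap between the two videos can be made concrete with a quick calculation. The figures below are the rounded numbers quoted above (the exact counts are not given), and pairing 36k likes with 1M+ views follows the order in which the videos are listed, which is an assumption.

```python
# Rounded figures from the comment above; exact counts are not given.
# Assumption: likes and views are paired in the order the videos are listed.
videos = {
    "The Matrix Awakens (Unreal Engine)": {"likes": 36_000, "views": 1_000_000},
    "Cheap USB Microphones (DankPods)": {"likes": 34_000, "views": 400_000},
}

like_view_ratio = {
    title: stats["likes"] / stats["views"] for title, stats in videos.items()
}

for title, ratio in like_view_ratio.items():
    print(f"{title}: {ratio:.1%} of views left a like")
```

With these rounded inputs the ratios are roughly 3.6% and 8.5%, so a single like/view threshold would treat the two videos very differently even though their like counts are nearly equal.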


6 participants