[mixch] mixch extractor is broken #9536

Closed
11 tasks done
nipotan opened this issue Mar 26, 2024 · 1 comment · Fixed by #9608
Labels
patch-available There is patch available that should fix this issue. Someone needs to make a PR with it site-bug Issue with a specific website

Comments


nipotan commented Mar 26, 2024

DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE

  • I understand that I will be blocked if I intentionally remove or skip any mandatory* field

Checklist

Region

No response

Provide a description that is worded well enough to be understood

Since around noon Japan time today, MixChannel's server has changed the format and content of the information it returns, and yt-dlp (stable@2024.03.10) can no longer extract video information.
Previously, the information was extracted from the JSON embedded in a script tag on the page. After this update, the page no longer appears to be server-side rendered; instead, the information is fetched from a separate URL assembled from the page path.
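The URL assembly described above can be sketched as a plain string transformation. This is a minimal sketch that assumes, as the patch in this issue does, that the user ID is the path segment after /u/ (e.g. https://mixch.tv/u/17461187/live) and that the JSON endpoint is https://mixch.tv/api-web/users/{id}/live. The regex and the function name api_url_for are hypothetical illustrations, not yt-dlp's actual _VALID_URL.

```python
import re

# Hypothetical pattern for a MixChannel live page; yt-dlp's real _VALID_URL may differ.
LIVE_PAGE_RE = re.compile(r'https?://(?:www\.)?mixch\.tv/u/(?P<id>[^/?#]+)/live')


def api_url_for(page_url: str) -> str:
    """Map a live-page URL to the JSON endpoint used in the patch (hypothetical helper)."""
    match = LIVE_PAGE_RE.match(page_url)
    if match is None:
        raise ValueError(f'not a mixch live URL: {page_url}')
    user_id = match.group('id')
    # Old request target: https://mixch.tv/u/{user_id}/live (HTML with embedded JSON)
    # New request target: https://mixch.tv/api-web/users/{user_id}/live (raw JSON)
    return f'https://mixch.tv/api-web/users/{user_id}/live'


print(api_url_for('https://mixch.tv/u/17461187/live'))
# → https://mixch.tv/api-web/users/17461187/live
```

Only the path changes; the user ID is carried over unchanged, which is why the patch can reuse video_id from _match_id.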

However, I am not very familiar with Python, so I am not confident about writing a proper fix and proper test code, and I do not know the conventions for submitting a pull request.

For that reason, here is a patch of the changes I made:

--- yt_dlp/extractor/mixch.py	2020-02-02 09:00:00
+++ yt_dlp/extractor/mixch.py	2024-03-26 17:15:25
@@ -25,10 +25,9 @@
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
-        webpage = self._download_webpage(f'https://mixch.tv/u/{video_id}/live', video_id)
+        webpage = self._download_webpage(f'https://mixch.tv/api-web/users/{video_id}/live', video_id)
 
-        initial_js_state = self._parse_json(self._search_regex(
-            r'(?m)^\s*window\.__INITIAL_JS_STATE__\s*=\s*(\{.+?\});\s*$', webpage, 'initial JS state'), video_id)
+        initial_js_state = self._parse_json(webpage, video_id)
         if not initial_js_state.get('liveInfo'):
             raise UserNotLive(video_id=video_id)
 

I would appreciate it if you could review it and, if it looks OK, incorporate it through a pull request prepared in the appropriate way.
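The patched logic (parse the response body directly as JSON, then treat a missing or empty liveInfo as "user not live") can be sketched with the standard library alone. UserNotLiveSketch and parse_live_response are hypothetical names standing in for yt-dlp's UserNotLive and the extractor code, and the sample payload is made up; only the liveInfo key comes from the patch.

```python
import json
from typing import Any


class UserNotLiveSketch(Exception):
    """Hypothetical stand-in for yt-dlp's UserNotLive error."""


def parse_live_response(body: str, video_id: str) -> dict[str, Any]:
    # The API endpoint returns JSON directly, so no regex over HTML is needed.
    state = json.loads(body)
    # Same check as the patch: a missing or empty liveInfo means the user is offline.
    if not state.get('liveInfo'):
        raise UserNotLiveSketch(f'{video_id}: user is not live')
    return state


print(parse_live_response('{"liveInfo": {"title": "example"}}', '17461187'))
# → {'liveInfo': {'title': 'example'}}
```

Because the whole response is JSON, the fragile step in the old code (locating the state inside the HTML) disappears entirely.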

Provide verbose output that clearly demonstrates the problem

  • Run your yt-dlp command with -vU flag added (yt-dlp -vU <your command line>)
  • If using API, add 'verbose': True to YoutubeDL params instead
  • Copy the WHOLE output (starting with [debug] Command-line config) and insert it below

Complete Verbose Output

[debug] Command-line config: ['-vU', 'https://mixch.tv/u/17461187/live']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version stable@2024.03.10 from yt-dlp/yt-dlp [615a84447] (pip)
[debug] Python 3.12.2 (CPython arm64 64bit) - macOS-14.4-arm64-arm-64bit (OpenSSL 3.2.1 30 Jan 2024)
[debug] exe versions: ffmpeg 6.1.1 (setts), ffprobe 6.1.1, rtmpdump 2.4
[debug] Optional libraries: Cryptodome-3.20.0, brotli-1.1.0, certifi-2024.02.02, mutagen-1.47.0, requests-2.31.0, sqlite3-3.45.2, urllib3-2.2.1, websockets-12.0
[debug] Proxy map: {}
[debug] Request Handlers: urllib, requests, websockets
[debug] Loaded 1803 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
Latest version: stable@2024.03.10 from yt-dlp/yt-dlp
yt-dlp is up to date (stable@2024.03.10 from yt-dlp/yt-dlp)
[mixch] Extracting URL: https://mixch.tv/u/17461187/live
[mixch] 17461187: Downloading webpage
ERROR: [mixch] 17461187: Unable to extract initial JS state; please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
  File "/opt/homebrew/Cellar/yt-dlp/2024.03.10/libexec/lib/python3.12/site-packages/yt_dlp/extractor/common.py", line 732, in extract
    ie_result = self._real_extract(url)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/yt-dlp/2024.03.10/libexec/lib/python3.12/site-packages/yt_dlp/extractor/mixch.py", line 30, in _real_extract
    initial_js_state = self._parse_json(self._search_regex(
                                        ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/yt-dlp/2024.03.10/libexec/lib/python3.12/site-packages/yt_dlp/extractor/common.py", line 1280, in _search_regex
    raise RegexNotFoundError('Unable to extract %s' % _name)
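The traceback shows the old code path failing inside _search_regex: the regex from the removed line of the patch looks for an inline window.__INITIAL_JS_STATE__ assignment, which a page that is no longer server-side rendered does not contain. A stdlib sketch of that failure mode, using the same regex against two made-up page bodies (extract_initial_state is a hypothetical name; returning None stands in for yt-dlp raising RegexNotFoundError):

```python
import json
import re
from typing import Optional

# Regex copied from the removed line of the patch (old mixch extractor).
INITIAL_STATE_RE = re.compile(
    r'(?m)^\s*window\.__INITIAL_JS_STATE__\s*=\s*(\{.+?\});\s*$')


def extract_initial_state(webpage: str) -> Optional[dict]:
    """Return the embedded state, or None where yt-dlp would raise RegexNotFoundError."""
    match = INITIAL_STATE_RE.search(webpage)
    if match is None:
        return None
    return json.loads(match.group(1))


# Hypothetical server-rendered page (old behaviour): the state is inlined in a script tag.
old_page = '<script>\nwindow.__INITIAL_JS_STATE__ = {"liveInfo": {"title": "x"}};\n</script>'
# Hypothetical client-rendered shell (new behaviour): no inline state at all.
new_page = '<div id="root"></div><script src="/app.js"></script>'

print(extract_initial_state(old_page))  # {'liveInfo': {'title': 'x'}}
print(extract_initial_state(new_page))  # None
```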

nipotan added the site-bug (Issue with a specific website) and triage (Untriaged issue) labels Mar 26, 2024
pukkandan added the patch-available label (There is patch available that should fix this issue. Someone needs to make a PR with it) Mar 26, 2024
bashonly (Member) commented

Use self._download_json instead of self._download_webpage + self._parse_json.
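In yt-dlp, _download_json fetches a URL and parses the body as JSON in one call, which the patch does in two steps. A rough stdlib analogue of that combination follows; it is not yt-dlp's implementation, and download_json, _fetch_bytes and the injectable fetch parameter are hypothetical, added so the parsing can be exercised without network access.

```python
import json
import urllib.request
from typing import Any, Callable


def _fetch_bytes(url: str) -> bytes:
    # Plain urllib download; yt-dlp itself uses its own networking layer.
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def download_json(url: str, video_id: str,
                  fetch: Callable[[str], bytes] = _fetch_bytes) -> Any:
    """Download url and parse it as JSON in one step (hypothetical helper)."""
    raw = fetch(url)
    try:
        return json.loads(raw.decode('utf-8'))
    except ValueError as err:
        # json.JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
        raise ValueError(f'{video_id}: failed to parse JSON from {url}') from err
```

Usage against the endpoint from the patch would be download_json('https://mixch.tv/api-web/users/17461187/live', '17461187'). Folding download and parse into one helper means a non-JSON response (for example an HTML error page) surfaces as a single, clearly attributed parse error.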

bashonly removed the triage (Untriaged issue) label Mar 26, 2024
bashonly mentioned this issue Apr 3, 2024
bashonly added a commit that referenced this issue Apr 3, 2024
Closes #9536
Authored by: bashonly, nipotan
aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this issue Apr 21, 2024
Closes yt-dlp#9536
Authored by: bashonly, nipotan