Generic site detection: support sites based on "sheeta" #9541
Comments
For devs: see these extractors for how to implement an embed-only extractor.
These are too few markers to justify starting a JS download on a generic page, IMO. Are there any other clues we can use to identify these sites?
Hi, @pukkandan!
Well, maybe these:
One of these should be sufficient to avoid false positives.
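The concern above is that a single marker is too weak a signal to justify downloading site JavaScript from an arbitrary page. Below is a minimal sketch of a threshold-based pre-check, assuming marker patterns chosen for illustration only: both come from the Google search tip later in this issue, because the clues actually proposed in the thread were not preserved. `EXAMPLE_MARKERS`, `count_marker_hits` and `looks_like_sheeta_site` are hypothetical names, not yt-dlp APIs.

```python
import re
from typing import Iterable

# Placeholder markers (an assumption): both are taken from the search tip in
# this issue, not from the list of clues discussed in the thread.
EXAMPLE_MARKERS: tuple[re.Pattern[str], ...] = (
    re.compile(r'\(C\)\s*DWANGO Co\., Ltd\.'),
    re.compile('登録'),  # "register": far too common to rely on by itself
)


def count_marker_hits(webpage: str, markers: Iterable[re.Pattern[str]]) -> int:
    """Count how many distinct markers occur somewhere in the page."""
    return sum(1 for marker in markers if marker.search(webpage))


def looks_like_sheeta_site(webpage: str,
                           markers: Iterable[re.Pattern[str]] = EXAMPLE_MARKERS,
                           min_hits: int = 2) -> bool:
    """Cheap pre-check: report a match only when several independent markers
    agree, so a generic page is not treated as a sheeta site (and no site
    JavaScript is fetched) on the strength of a single weak clue."""
    return count_marker_hits(webpage, markers) >= min_hits
```

With `min_hits=2`, a page containing only 登録 is rejected, while a page that also carries the DWANGO copyright line passes. In a real extractor, a check like this would sit in front of any expensive step such as fetching the site's JavaScript.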
Unfortunately, I'm unable to download the webpage because of a fatal error raised in yt-dlp/yt_dlp/extractor/generic.py, lines 2392 to 2395 (commit e5d4f11).
Is there a way to make it
Would something like this work?

```python
try:
    full_response = self._request_webpage(url, video_id, headers=filter_dict({
        'Accept-Encoding': 'identity',
        'Referer': smuggled_data.get('referer'),
    }))
except ExtractorError as e:
    # Some sites serve their normal HTML page with a 404 status, so inspect the
    # body instead of failing immediately
    if isinstance(e.cause, HTTPError) and e.cause.status == 404:
        full_response = e.cause.response
        first_bytes = full_response.read(512)
        if not is_html(first_bytes):
            raise  # a genuine "not found": keep the original error
        self._downloader.write_debug('Got HTTP Error 404, looking for embeds in response body')
        webpage = self._webpage_read_content(
            full_response, url, video_id, prefix=first_bytes)
        embeds = list(self._extract_embeds(original_url, webpage, urlh=full_response))
        if len(embeds) == 1:
            return embeds[0]
        elif embeds:
            return self.playlist_result(embeds)
    # Not a 404, or no embeds were found: propagate the original error
    raise
```

Obviously there's a lot of code duplication happening in the … Or maybe there's a better way of doing it altogether.
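The control flow of the 404 fallback proposed in this comment can be tried outside yt-dlp with only the standard library. This is a sketch under stated assumptions: it uses `urllib.error.HTTPError` instead of yt-dlp's networking layer, `looks_like_html` is a simplified stand-in for yt-dlp's `is_html`, and `fetch_page_tolerating_404` and `demo_fetch` are hypothetical names.

```python
import email.message
import io
import urllib.error
from typing import Callable


def looks_like_html(first_bytes: bytes) -> bool:
    """Simplified stand-in for yt-dlp's is_html(): skip a UTF-8 BOM and
    leading whitespace, then expect the body to start with a tag."""
    data = first_bytes.removeprefix(b'\xef\xbb\xbf').lstrip()
    return data.startswith(b'<')


def fetch_page_tolerating_404(fetch: Callable[[str], bytes], url: str) -> str:
    """Return the page text even if the server answered with HTTP 404.

    `fetch` is any callable that returns the response body or raises
    urllib.error.HTTPError (e.g. a thin wrapper around urlopen)."""
    try:
        return fetch(url).decode('utf-8', 'replace')
    except urllib.error.HTTPError as e:
        if e.code != 404:
            raise  # only 404 is tolerated in this sketch
        first_bytes = e.read(512)  # sniff the start of the error body
        if not looks_like_html(first_bytes):
            raise  # a real "not found", not a disguised HTML page
        # Join the sniffed prefix with the rest before decoding, so a
        # multi-byte character split at byte 512 is not mangled
        return (first_bytes + e.read()).decode('utf-8', 'replace')


def demo_fetch(url: str) -> bytes:
    """Hypothetical client simulating a site that serves HTML with a 404."""
    raise urllib.error.HTTPError(
        url, 404, 'Not Found', email.message.Message(),
        io.BytesIO(b'<!DOCTYPE html><html><body>player</body></html>'))


if __name__ == '__main__':
    print(fetch_page_tolerating_404(demo_fetch, 'https://example.com/video/1'))
```

Restricting the fallback to 404 plus an HTML sniff keeps ordinary failures (5xx responses, non-HTML error bodies) fatal, which is the same trade-off the proposal makes.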
Or might we want to be able to carry on despite any HTTP error response, not just 404? But any general solution to that (say, a
This is such absurd behavior! While we decide on a proper solution, you can temporarily add
I like this idea. We can set a default behavior of
This can't be handled by any extractor-specific hack, since the error is raised before the generic extractor hands the request over to an IE.
DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE
Checklist
Region
any
Example URLs
Provide a description that is worded well enough to be understood
Summary
Official website: sheeta | A next-generation fan club system where fans gather and grow (in Japanese)
To find more sites, search Google for: "登録" "(C) DWANGO Co., Ltd." (登録 means "register").
Characteristics
Webpage:
JavaScript:
The CSS is nothing special from my point of view.
Provide verbose output that clearly demonstrates the problem
yt-dlp -vU <your command line>
(If using the API, add 'verbose': True to the YoutubeDL params instead.)
Copy the whole output (starting with [debug] Command-line config) and insert it below.
Complete Verbose Output