[WDRPage] No suitable extractor (WDR) found for URL / wdrmaus.de #9918

Open
10 of 11 tasks
joshinils opened this issue May 13, 2024 · 4 comments
Labels
site-bug Issue with a specific website

Comments


joshinils commented May 13, 2024

DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE

  • I understand that I will be blocked if I intentionally remove or skip any mandatory* field

Checklist

Region

No response

Provide a description that is worded well enough to be understood

Running yt-dlp https://www.wdrmaus.de/aktuelle-sendung/ worked for a long time, but it now fails.
The last episode my cron job downloaded is from Sunday, 21 April 2024.

The current episode (Aktuelle Sendung), from 12 May 2024, is also available from the ARD Mediathek: https://www.ardmediathek.de/video/die-sendung-mit-der-maus/die-sendung-vom-12-05-2024/das-erste/Y3JpZDovL3dkci5kZS9CZWl0cmFnLXNvcGhvcmEtMTkyYjZlZmUtZjdlYS00OTdmLWE4MmQtYmYwOTdjZWVkNzcx?isChildContent

Provide verbose output that clearly demonstrates the problem

  • Run your yt-dlp command with -vU flag added (yt-dlp -vU <your command line>)
  • If using API, add 'verbose': True to YoutubeDL params instead
  • Copy the WHOLE output (starting with [debug] Command-line config) and insert it below

Complete Verbose Output

[debug] Command-line config: ['-vU', 'https://www.wdrmaus.de/aktuelle-sendung/']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version stable@2024.04.09 from yt-dlp/yt-dlp [ff0779267] (pip)
[debug] Python 3.11.9 (CPython x86_64 64bit) - Linux-6.5.0-5-amd64-x86_64-with-glibc2.38 (OpenSSL 3.2.2-dev , glibc 2.38)
[debug] exe versions: ffmpeg 6.1.1-4 (setts), ffprobe 6.1.1-4
[debug] Optional libraries: Cryptodome-3.19.0, brotli-1.1.0, certifi-2023.11.17, mutagen-1.46.0, requests-2.31.0, sqlite3-3.45.3, urllib3-1.26.18, websockets-12.0
[debug] Proxy map: {}
[debug] Request Handlers: urllib, requests, websockets
[debug] Loaded 1810 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
Latest version: stable@2024.04.09 from yt-dlp/yt-dlp
yt-dlp is up to date (stable@2024.04.09 from yt-dlp/yt-dlp)
[debug] Using fake IP 53.206.151.65 (DE) as X-Forwarded-For
[WDRPage] Extracting URL: https://www.wdrmaus.de/aktuelle-sendung/
[WDRPage] aktuelle-sendung: Downloading webpage
[WDRPage] aktuelle-sendung: Downloading JSON metadata
[download] Downloading playlist: aktuelle-sendung
[WDRPage] Playlist aktuelle-sendung: Downloading 1 items of 1
[download] Downloading item 1 of 1
ERROR: No suitable extractor (WDR) found for URL https://deviceids-medp.wdr.de/ondemand/ora-192b6efe-f7ea-497f-a82d-bf097cee/ora-192b6efe-f7ea-497f-a82d-bf097ceed771.js
  File "/home/jl/.local/bin/yt-dlp", line 8, in <module>
    sys.exit(main())
  File "/home/jl/.local/lib/python3.11/site-packages/yt_dlp/__init__.py", line 1072, in main
    _exit(*variadic(_real_main(argv)))
  File "/home/jl/.local/lib/python3.11/site-packages/yt_dlp/__init__.py", line 1062, in _real_main
    return ydl.download(all_urls)
  File "/home/jl/.local/lib/python3.11/site-packages/yt_dlp/YoutubeDL.py", line 3572, in download
    self.__download_wrapper(self.extract_info)(
  File "/home/jl/.local/lib/python3.11/site-packages/yt_dlp/YoutubeDL.py", line 3547, in wrapper
    res = func(*args, **kwargs)
  File "/home/jl/.local/lib/python3.11/site-packages/yt_dlp/YoutubeDL.py", line 1595, in extract_info
    return self.__extract_info(url, self.get_info_extractor(key), download, extra_info, process)
  File "/home/jl/.local/lib/python3.11/site-packages/yt_dlp/YoutubeDL.py", line 1606, in wrapper
    return func(self, *args, **kwargs)
  File "/home/jl/.local/lib/python3.11/site-packages/yt_dlp/YoutubeDL.py", line 1762, in __extract_info
    return self.process_ie_result(ie_result, download, extra_info)
  File "/home/jl/.local/lib/python3.11/site-packages/yt_dlp/YoutubeDL.py", line 1891, in process_ie_result
    return self.__process_playlist(ie_result, download)
  File "/home/jl/.local/lib/python3.11/site-packages/yt_dlp/YoutubeDL.py", line 2035, in __process_playlist
    entry_result = self.__process_iterable_entry(entry, download, collections.ChainMap({
  File "/home/jl/.local/lib/python3.11/site-packages/yt_dlp/YoutubeDL.py", line 1606, in wrapper
    return func(self, *args, **kwargs)
  File "/home/jl/.local/lib/python3.11/site-packages/yt_dlp/YoutubeDL.py", line 2067, in __process_iterable_entry
    return self.process_ie_result(
  File "/home/jl/.local/lib/python3.11/site-packages/yt_dlp/YoutubeDL.py", line 1841, in process_ie_result
    return self.extract_info(
  File "/home/jl/.local/lib/python3.11/site-packages/yt_dlp/YoutubeDL.py", line 1598, in extract_info
    self.report_error(f'No suitable extractor{format_field(ie_key, None, " (%s)")} found for URL {url}',
  File "/home/jl/.local/lib/python3.11/site-packages/yt_dlp/YoutubeDL.py", line 1073, in report_error
    self.trouble(f'{self._format_err("ERROR:", self.Styles.ERROR)} {message}', *args, **kwargs)
  File "/home/jl/.local/lib/python3.11/site-packages/yt_dlp/YoutubeDL.py", line 1001, in trouble
    tb_data = traceback.format_list(traceback.extract_stack())

[download] Finished downloading playlist: aktuelle-sendung
@joshinils joshinils added site-bug Issue with a specific website triage Untriaged issue labels May 13, 2024
@joshinils joshinils changed the title wdrmaus https://www.wdrmaus.de/aktuelle-sendung/ wdrmaus https://www.wdrmaus.de/aktuelle-sendung/ no longer works May 13, 2024
@joshinils
Author

I know that WDR as such is supported. I don't know whether the specific URL I used is, but I see no reason why it shouldn't be (any more?).

@bashonly bashonly changed the title wdrmaus https://www.wdrmaus.de/aktuelle-sendung/ no longer works [WDRPage] No suitable extractor (WDR) found for URL / wdrmaus.de May 13, 2024
@bashonly bashonly removed the triage Untriaged issue label May 13, 2024
@joshinils
Author

I've also now noticed that ardmediathek.de provides a thumbnail, whereas wdrmaus.de never did.

@dirkf
Contributor

dirkf commented May 15, 2024

The Maus pages now carry the URL of an "assetjsonp" file as the value of the data-extension-ard attribute of the player element (class videoButton/audioButton); the -ard suffix appeared last year.

The assetjsonp file has the structure of a JS function call whose one argument is a JSON object. The JSON may have two main branches: under .trackerData is useful metadata including a trackerClipId; under .mediaResource there may be HLS and mp4 media links as well as subtitle links.
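The structure described above can be unwrapped with a few lines of Python. This is only a minimal sketch: the callback name `callback`, the keys inside `mediaResource`, and the sample URL are hypothetical placeholders; only `trackerData`, `trackerClipId` and `mediaResource` come from the description in this comment.

```python
import json
import re


def parse_assetjsonp(text):
    """Return the JSON object passed as the single argument of a JSONP-style call."""
    # Match "<name>( {...} )" with an optional trailing semicolon.
    match = re.search(r'^\s*[\w$.]+\s*\(\s*(\{.*\})\s*\)\s*;?\s*$', text, re.DOTALL)
    if not match:
        raise ValueError('not a function call with one JSON object argument')
    return json.loads(match.group(1))


# Hypothetical sample; a real assetjsonp file will contain more fields.
sample = (
    'callback({"trackerData": {"trackerClipId": "mdb-3104116"}, '
    '"mediaResource": {"dflt": {"videoURL": "//example.invalid/master.m3u8"}}});'
)

data = parse_assetjsonp(sample)
clip_id = data['trackerData']['trackerClipId']
has_media = 'mediaResource' in data  # the comment says this branch "may" be present
```

Because `.mediaResource` is optional, a caller should check for it before looking up media or subtitle links.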

Until the episode now linked at sendung_4.php5 (a link that will roll over each week), the trackerClipId looked like mdb-3104116, where the numeric part was a valid ID for the WDRIE extractor. The Maus extractor pulled that value and redirected to the main extractor, which fetched a similar but different assetjsonp resource (e.g., http://deviceids-medp-id1.wdr.de/ondemand/310/3104116.js) and extracted the media links from it.

With the new Maus pages, the trackerClipId is like sophora-192b6efe-f7ea-497f-a82d-bf097ceed771. sophora was also seen with NDR, but we don't have an extractor for that ID (I think?).
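The failing URL in the verbose log is consistent with the old ID handling being applied to the new ID format. The sketch below is an inference from the URLs quoted in this thread, not a copy of the extractor's code: it assumes that the first four characters of the trackerClipId are dropped (the length of "mdb-") and that the folder is the remaining ID without its last four characters. The helper name `asset_url` is hypothetical, and the host is the one shown in the log.

```python
def asset_url(tracker_clip_id):
    """Build an asset URL the way the log output suggests (assumed rule)."""
    wdr_id = tracker_clip_id[4:]   # assumed: strip a 4-character prefix such as "mdb-"
    folder = wdr_id[:-4]           # assumed: folder is the ID minus its last 4 characters
    return f'https://deviceids-medp.wdr.de/ondemand/{folder}/{wdr_id}.js'


old_style = asset_url('mdb-3104116')
new_style = asset_url('sophora-192b6efe-f7ea-497f-a82d-bf097ceed771')
```

Under these assumptions, an mdb- ID yields the familiar .../ondemand/310/3104116.js path, while a sophora- ID loses only "soph" and produces the ora-192b6efe-... URL reported in the error message.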

For the Maus pages, there are two options:

  • currently the media links in the assetjsonp embed the WDR ID like .../310/3104116/..., so we can extract that and continue as before;
  • otherwise, we can make the assetjsonp extraction a method of WDRBase and call it directly.
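The first option could look roughly like the sketch below. It assumes that a media link contains two consecutive numeric path segments where the first is a prefix of the second, as in .../310/3104116/...; the function name and the sample URLs are hypothetical.

```python
import re


def wdr_id_from_media_url(url):
    """Return the WDR ID embedded as .../<folder>/<id>/... in a media URL, or None."""
    # The lookahead captures the following segment without consuming it,
    # so every pair of adjacent numeric segments is examined.
    for match in re.finditer(r'/(\d+)(?=/(\d+)/)', url):
        folder, candidate = match.groups()
        if len(candidate) > len(folder) and candidate.startswith(folder):
            return candidate
    return None


found = wdr_id_from_media_url('https://media.example.invalid/x/310/3104116/3104116_1.mp4')
missing = wdr_id_from_media_url('https://media.example.invalid/video.mp4')
```

This relies on the current link layout, so it would stop working if the media links themselves switch to sophora-style IDs; the second option avoids that dependency.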

Additional metadata, including thumbnail and upload_date, is available in the page's ld+json, where the thumbnail is supplied as a 1-item list. That list form should be extracted here: it is a valid representation of the VideoObject schema according to the schema.org validator, but it wouldn't currently be extracted upstream. The assetjsonp data also contains a thumbnail template, but with no guide as to the value that needs to be substituted for %%FORMAT%% (e.g., 904).
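Handling the list-valued thumbnail could be sketched as follows. The property names `thumbnailUrl` and `uploadDate` are standard schema.org VideoObject properties, but that the Maus pages use exactly these keys is an assumption, as are the function name and the sample values.

```python
import json


def thumbnail_and_upload_date(ld_json_text):
    """Extract a thumbnail URL and a YYYYMMDD upload date from VideoObject ld+json."""
    data = json.loads(ld_json_text)
    thumbnail = data.get('thumbnailUrl')      # assumed key; may be a string or a list
    if isinstance(thumbnail, list):
        thumbnail = thumbnail[0] if thumbnail else None
    uploaded = data.get('uploadDate')         # assumed key; ISO 8601 date or datetime
    upload_date = uploaded[:10].replace('-', '') if uploaded else None
    return thumbnail, upload_date


# Hypothetical sample mirroring the 1-item list described above.
sample = json.dumps({
    '@type': 'VideoObject',
    'thumbnailUrl': ['https://example.invalid/thumb.jpg'],
    'uploadDate': '2024-05-12T09:00:00+02:00',
})

thumbnail, upload_date = thumbnail_and_upload_date(sample)
```

The same function accepts a plain string for the thumbnail, so it would keep working if the page reverted to the single-value form.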
