Post by RS
The resultant .mp4 file can be played in VLC,
but MediaInfo shows no metadata.
Hello Richard :-)
If you ended up, for whatever reason,
with an untagged file, you can always (re-)tag it
after download with the --tag-only switch:
get_iplayer --type=tv --pid=b00gmlrx --tag-only --tag-podcast-tv --tag-only-filename="path\to\Suspicion.mp4"
(I assume you renamed the "Suspicion.partial.mp4" to just "Suspicion.mp4")
Post by RS
The programme is the editorial version of
the 1941 Hitchcock film Suspicion, b00gmlrx
pid=b00gmlrx => vpid=b09c79wx (needed later...)
Post by RS
Does anyone have any idea what causes a parser error?
Answered by Colin; some further analysis below...
Post by RS
I'm glad I asked because I hadn't realised
that was where subtitles came from.
I had assumed there was a ready-made .srt
file to download.
Online media portals (like iPlayer) rarely use the .srt
(SubRip text) format, because it's usually incompatible
with their embedded player (whether Flash-based or HTML5).
I'm certainly not an expert on this subject, but Flash-based
players usually require an XML caption file
(also referred to as DFXP), while HTML5 ones
may use the WebVTT (.vtt) format.
DFXP (which stands for "Distribution Format Exchange Profile")
is a timed-text format developed by the W3C; it is
currently referred to as TTML. Read more at:
https://en.wikipedia.org/wiki/Timed_Text_Markup_Language
GiP uses mediaselector URLs (which contain the vpid string)
to retrieve the URIs pointing to the iPlayer TTML files;
the PC, iptv-all and apple-ipad-hls mediasets are tried. The URI
you included in your original post can be found, e.g., in
http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/iptv-all/vpid/b09c79wx
(geo-filtered)
in the <media expires="2017-11-21T14:05:00Z" kind="captions"
XML element; this URI is in the legacy format: it is not geo-blocked,
has supplier="sis" and never expires.
You'll also notice two other URIs for the same subtitles file; these
are the Video Factory flavours. They are served from Akamai/Limelight,
are UK-only and tokenised, and have limited lifespans,
but ALL 3 URIs point to the same file!
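
To make the lookup a bit more concrete, here is a minimal Python sketch (not GiP's own Perl code) that pulls the captions URIs out of a mediaselector-style response. The sample XML is made up for illustration: the namespace, the example.invalid hosts and the exact attribute layout are assumptions, and only the kind="captions", expires and supplier values come from the response discussed above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed response; real ones carry many more
# <media> elements and attributes.
SAMPLE_RESPONSE = """<mediaSelection xmlns="http://bbc.co.uk/2008/mp/mediaselection">
  <media kind="video" bitrate="1500">
    <connection href="http://cdn-a.example.invalid/video.m3u8" supplier="mf_akamai_uk_plain"/>
  </media>
  <media kind="captions" expires="2017-11-21T14:05:00Z">
    <connection href="http://cdn-a.example.invalid/subs/demo.xml?token=abc" supplier="mf_akamai_uk_plain"/>
    <connection href="http://cdn-b.example.invalid/subs/demo.xml?token=def" supplier="mf_limelight_uk_plain"/>
    <connection href="http://www.example.invalid/subs/demo.xml" supplier="sis"/>
  </media>
</mediaSelection>"""


def local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def caption_connections(xml_text):
    """Return one dict per <connection> found inside a captions <media>."""
    root = ET.fromstring(xml_text)
    found = []
    for media in root.iter():
        if local(media.tag) != "media" or media.get("kind") != "captions":
            continue
        for conn in media:
            if local(conn.tag) == "connection" and conn.get("href"):
                found.append({
                    "supplier": conn.get("supplier", ""),
                    "href": conn.get("href"),
                    "expires": media.get("expires"),
                })
    return found


if __name__ == "__main__":
    for item in caption_connections(SAMPLE_RESPONSE):
        print(item["supplier"], item["href"])
```

Matching on the local tag name rather than a fixed namespace keeps the sketch working even if the namespace assumption is wrong.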
GiP fetches the XML subs file (which is referred to as "raw"
in GiP terminology) and then, through a dedicated Perl subroutine
("ttml_to_srt", line 6588 of the 3.05 script), converts it to .srt;
the --subsraw flag lets you also keep the original file...
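
For anyone curious what such a conversion involves, below is a rough Python sketch. It is NOT a port of GiP's ttml_to_srt; the sample TTML, the function names and the assumption that timestamps look like HH:MM:SS.fraction are all mine. Each <p begin=... end=...> element becomes one numbered .srt cue, and all colour/styling is dropped (i.e. "monochrome" output).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, tiny TTML sample; real iPlayer files are far larger.
SAMPLE_TTML = """<tt xmlns="http://www.w3.org/2006/10/ttaf1">
  <body><div>
    <p begin="00:00:01.00" end="00:00:03.50">Hello <span>there</span><br/>Johnnie.</p>
    <p begin="00:00:04.00" end="00:00:06.00">Monkey-face!</p>
  </div></body>
</tt>"""


def local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def to_srt_time(value):
    """Convert 'HH:MM:SS(.fraction)' into the SRT form 'HH:MM:SS,mmm'."""
    m = re.fullmatch(r"(\d+):(\d\d):(\d\d)(?:\.(\d+))?", value)
    if not m:
        raise ValueError(f"unsupported timestamp: {value!r}")
    hours, minutes, seconds, frac = m.groups()
    millis = int(((frac or "0") + "000")[:3])  # pad/truncate to milliseconds
    return f"{int(hours):02d}:{minutes}:{seconds},{millis:03d}"


def cue_text(elem):
    """Flatten a <p>: keep the text, turn <br/> into newlines, ignore styling."""
    parts = [elem.text or ""]
    for child in elem:
        parts.append("\n" if local(child.tag) == "br" else cue_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


def ttml_to_srt_sketch(ttml):
    """Return SRT text for every timed <p> in the TTML string."""
    root = ET.fromstring(ttml)  # raises ET.ParseError on malformed XML
    blocks = []
    for elem in root.iter():
        if local(elem.tag) != "p" or "begin" not in elem.attrib or "end" not in elem.attrib:
            continue
        start = to_srt_time(elem.get("begin"))
        end = to_srt_time(elem.get("end"))
        text = cue_text(elem).strip()
        blocks.append(f"{len(blocks) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    print(ttml_to_srt_sketch(SAMPLE_TTML))
```

Note that a strict XML parser like this one gives up on the first malformed byte, which is exactly the failure mode discussed in this thread.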
Post by RS
I see from --info there are three subtitle modes.
I used GiP 3.05 and the following command:
perl get_iplayer-305w.pl --type=tv --pid=b00gmlrx -i --streaminfo > Streams.txt 2>&1
and yes, 3 captions modes are identified,
but, alas, there is clearly a bug in the
detection scheme somewhere: there is no sign of the
legacy format, and there is duplication, as
subtitles3 = subtitles1:
==================================
stream: subtitles1
bitrate:
expires: 2017-11-21T14:05:00Z
ext: srt
priority: 20
size: 118212
streamer: http
streamurl: http://vod-sub-uk-live.bbcfmt.hs.llnwd.net/iplayer/subtitles/ng/modav/bUnknown-591e0c64-779b-4f16-9582-bd3bc6c441bd_b09c79wx_1508034118207.xml?s=1508878211&e=1508921411&h=c1d8bb45cd85f418d83103af0ef1979a
type: (captions) http stream (CDN: mf_limelight_uk_plain/20)
stream: subtitles2
bitrate:
expires: 2017-11-21T14:05:00Z
ext: srt
priority: 10
size: 118212
streamer: http
streamurl: http://vod-sub-uk-live.akamaized.net/iplayer/subtitles/ng/modav/bUnknown-591e0c64-779b-4f16-9582-bd3bc6c441bd_b09c79wx_1508034118207.xml?__gda__=1508921411_8042e3b62cef7eb303c0b44d69225c99
type: (captions) http stream (CDN: mf_akamai_uk_plain/10)
stream: subtitles3
bitrate:
expires: 2017-11-21T14:05:00Z
ext: srt
priority: 20
size: 118212
streamer: http
streamurl: http://vod-sub-uk-live.bbcfmt.hs.llnwd.net/iplayer/subtitles/ng/modav/bUnknown-591e0c64-779b-4f16-9582-bd3bc6c441bd_b09c79wx_1508034118207.xml?s=1508878211&e=1508921411&h=c1d8bb45cd85f418d83103af0ef1979a
type: (captions) http stream (CDN: mf_limelight_uk_plain/20)
==================================
but all three point to the same file!
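
A quick way to double-check both claims (same file everywhere, and subtitles3 being a straight duplicate of subtitles1) is to compare the URLs with host and query string stripped. The Python sketch below does just that; the stream URLs are the ones from the listing above with the console line-wraps rejoined, plus the legacy URL, and the helper names are made up. The tokens have long since expired, so nothing is fetched.

```python
from collections import defaultdict
from urllib.parse import urlsplit

FILE_PATH = ("/iplayer/subtitles/ng/modav/"
             "bUnknown-591e0c64-779b-4f16-9582-bd3bc6c441bd_b09c79wx_1508034118207.xml")

LIMELIGHT = ("http://vod-sub-uk-live.bbcfmt.hs.llnwd.net" + FILE_PATH +
             "?s=1508878211&e=1508921411&h=c1d8bb45cd85f418d83103af0ef1979a")
AKAMAI = ("http://vod-sub-uk-live.akamaized.net" + FILE_PATH +
          "?__gda__=1508921411_8042e3b62cef7eb303c0b44d69225c99")

STREAMS = {
    "subtitles1": LIMELIGHT,
    "subtitles2": AKAMAI,
    "subtitles3": LIMELIGHT,
    "legacy": "http://www.bbc.co.uk" + FILE_PATH,
}


def group_by_path(streams):
    """Group stream names by URL path, ignoring host and token query."""
    groups = defaultdict(list)
    for name, url in streams.items():
        groups[urlsplit(url).path].append(name)
    return dict(groups)


def exact_duplicates(streams):
    """Return (name, earlier_name) pairs whose full URLs are identical."""
    seen = {}
    duplicates = []
    for name, url in streams.items():
        if url in seen:
            duplicates.append((name, seen[url]))
        else:
            seen[url] = name
    return duplicates


if __name__ == "__main__":
    print(group_by_path(STREAMS))
    print(exact_duplicates(STREAMS))
```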
Now, if you load the legacy URL
http://www.bbc.co.uk/iplayer/subtitles/ng/modav/bUnknown-591e0c64-779b-4f16-9582-bd3bc6c441bd_b09c79wx_1508034118207.xml
you get the same error you reported:
Post by RS
XML Parsing Error: not well-formed
Location: (The URI)
SUSPICION
Right-click -> View Page Source
and you'll be able to view the file contents
and actually visualise the corruption.
With the aid of Firefox's Page Source and
a Text Editor, I managed to reconstitute
a proper TTML file, then used SubtitleEdit
to convert to (monochrome) .srt.
If you're in need of it, contact me off-list...
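
If you'd rather not eyeball the page source, a strict XML parser can point at the damage for you. Here is a minimal Python sketch, assuming you saved the raw file locally (e.g. via --subsraw); the broken sample is invented for the example and is not the actual corruption in the Suspicion file.

```python
import xml.etree.ElementTree as ET

# Invented example of damage: an attribute value cut off mid-way.
BROKEN_SAMPLE = """<tt>
  <body>
    <p begin="00:00:01.00" end="00:00:03.00">SUSPICION</p>
    <p begin="00:0</body>
</tt>"""

GOOD_SAMPLE = """<tt>
  <body>
    <p begin="00:00:01.00" end="00:00:03.00">SUSPICION</p>
  </body>
</tt>"""


def check_well_formed(xml_text):
    """Return (ok, line, column, message); line/column are None when ok."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # line is 1-based, column is 0-based
        return False, line, column, str(err)
    return True, None, None, ""


def offending_line(xml_text, line):
    """Return the text of the reported line, for a quick look at the damage."""
    lines = xml_text.splitlines()
    return lines[line - 1] if 1 <= line <= len(lines) else ""


if __name__ == "__main__":
    ok, line, column, message = check_well_formed(BROKEN_SAMPLE)
    if not ok:
        print(message)
        print(offending_line(BROKEN_SAMPLE, line))
```

The reported position is where the parser gave up, which is usually at or shortly after the corrupt spot rather than exactly on it.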
Post by RS
If I try --subtitles-only --tvmode=subtitles2
it tells me No media streams found.
I don't think user selection of the subtitle mode is supported;
legacy GiP code assumed only one captions mode,
so this would have to be requested as a new feature. I see no
reason for it, though: all modes point to the same file, and
speed differences between CDNs are negligible for such
small files (about 115 KB here)...
Post by RS
get_iplayer ought when unable to download subtitles
successfully to continue to call AtomicParsley to add metadata.
While in this case it's not the actual downloading that failed,
but rather the conversion to .srt (e.g. you can fetch the raw
corrupt TTML with --subsraw), I too agree with that.
After another series of tests, it is worth noting
that every GiP version from 3.00 onwards fails to
convert this corrupted subtitles file but, lo and behold,
v2.99 does so successfully:
=======================================
get_iplayer v2.99, Copyright (C) 2008-2010 Phil Lewis
This program comes with ABSOLUTELY NO WARRANTY; for details
use --warranty.
This is free software, and you are welcome to redistribute it under certain
conditions; use --conditions for details.
NOTE: A UK TV licence is required to legally access BBC iPlayer TV content
INFO Trying to download PID using type tv
INFO: pid found in cache
Matches:
5276: Suspicion - -, BBC Two, b00gmlrx
WARNING: Could not download programme metadata from http://www.bbc.co.uk/programmes/b00gmlrx.xml
INFO: Downloading Subtitles to 'D:\Vangelis\iPlayer Recordings/Suspicion_-__b00gmlrx_editorial.srt'
=======================================
Actually, this is not a fluke; prior to 3.00,
GiP would produce monochrome .srt files, so,
without examining the code itself, I suspect
the older TTML parsing code was more forgiving...
Post by RS
Further, if subtitles1 fails it ought to
try subtitles2 and subtitles3
Again, it isn't the actual download that failed,
but the conversion; since all 3 (2 by my tests)
modes point to the same file, conversion of the other
two should fail as well. We still don't know at which
stage the corruption took place; I'm presuming
during file generation, not during upload to the CDNs (?).
Now, if the actual download had failed, then I'd see
your point as a valid one... I won't pretend I fully
understand the actual Perl code, but Perl wizards
could enlighten us as to the actual content of the GiP
subroutines "subtitles_available" & "download_subtitles";
my hunch is GiP already does what you suggest,
as far as downloading is concerned...
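
To spell out the distinction I'm drawing, here is a small Python sketch of the behaviour I have in mind. It is purely illustrative and says nothing about how "subtitles_available" or "download_subtitles" are actually written; every name in it (fetch_subtitles, fake_download, fake_convert, the URLs) is hypothetical. A failed download moves on to the next mode, whereas a failed conversion is not retried, since every mode serves the same file, and the caller is then free to carry on with tagging.

```python
def fetch_subtitles(streams, download, convert):
    """Try each (name, url) until one downloads, then convert it once.

    Returns (srt_text_or_None, log_lines).
    """
    log = []
    raw = None
    for name, url in streams:
        try:
            raw = download(url)
        except OSError as err:
            log.append(f"{name}: download failed ({err}), trying next mode")
            continue
        log.append(f"{name}: downloaded")
        break
    if raw is None:
        return None, log
    try:
        return convert(raw), log
    except ValueError as err:
        # Same file on every CDN, so retrying other modes would not help.
        log.append(f"conversion failed ({err}), continuing without subtitles")
        return None, log


def fake_download(url):
    """Stand-in downloader: the first CDN times out, the second answers."""
    if "cdn-a" in url:
        raise OSError("timed out")
    return "<tt><p>corrupt"


def fake_convert(raw):
    """Stand-in converter: rejects anything that does not end in </tt>."""
    if not raw.endswith("</tt>"):
        raise ValueError("not well-formed")
    return "1\n00:00:01,000 --> 00:00:02,000\nSUSPICION\n"


if __name__ == "__main__":
    modes = [("subtitles1", "http://cdn-a.example.invalid/subs.xml"),
             ("subtitles2", "http://cdn-b.example.invalid/subs.xml")]
    srt, log = fetch_subtitles(modes, fake_download, fake_convert)
    print("\n".join(log))
    print("tagging would still go ahead; subtitles written:", srt is not None)
```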
Apologies for the length of this post and thanks
to those that stayed to read the end of it...
Kindest regards,
Vangelis.