Discussion:
BBC Collections
Mike Casswell
2017-10-09 07:21:19 UTC
I am unable to download the programmes in bulk, using pid-recursive, at
BBC Collections locations such as
http://www.bbc.co.uk/iplayer/group/p056n6px

This method has worked in the past for apparently similar collections -
but note the use of 'group' in the url which I do not recall from
previous examples. There is no problem using individual pids.

I get 'failed to download json pid info' 'could not determine pid type'
'0 matching programmes' 'could not download programme metadata from..'.

I know I'm being idle but is there a way I might get pid-recursive to
work with these locations? There are a great number of very interesting
programmes available.
--
Mike Casswell
Alan Milewczyk
2017-10-09 10:54:56 UTC
My understanding is that you can't these days; there was a change a
while back that made this method obsolete (you'd have to go through the
release notes of previous versions to determine when the change took
place). Using the individual PIDs is the only way AFAIK. I agree,
though: some superb programming here.


Alan


Vangelis forthnet
2017-10-09 18:25:21 UTC
at BBC Collections locations such as
http://www.bbc.co.uk/iplayer/group/p056n6px
This method has worked in the past for apparently similar collections -
but note the use of 'group' in the url
which I do not recall from previous examples.
Hi Mike... I don't think "group PIDs" were ever
supported in GiP, only "brand" and "series" ones...
so it might have been a brand/series page that has
worked for you in the past...
Since we can't test things now :-( , it's mostly my
word against your memory, hahah...; but I found a Forum
entry from March 2015 that supports my claim:

https://squarepenguin.co.uk/forums/thread-377-post-2112.html#pid2112
You can't download from the PID for a group, like the one you posted.
there was a change a while back that made this method obsolete
(you'd have to go through the release notes of previous versions
to determine when the change took place)
That change was the removal of the various XML feeds
by the beeb at the end of April (2017). GiP 3.00 was the
first version that tried to alleviate it as much as possible
https://github.com/get-iplayer/get_iplayer/wiki/release300#1-restored-functionality-broken-by-the-bbc
but do note the following limitation:
https://github.com/get-iplayer/get_iplayer/wiki/release300#recursive-downloads
Using the individual PIDs is the only way AFAIK
Currently yes; but I'm sure there are a number of
dextrous list members who could conjure up a
(simple?) script that would parse (web scrape)
"group" pages to harvest the individual PIDs;
just examining the page source of
http://www.bbc.co.uk/iplayer/group/p056n6px
I'm seeing href="*" URIs with "#group=p056n6px"
appended to them... Those are the ones containing the PIDs.

Wouldn't hurt to request such a feature in
the Forum, either (but, TBH, it's very unlikely to be
even considered by the dev...; it would also be,
like all web scraping, very fragile, even to the
slightest beeb change...).

Best regards all,
Vangelis.
Ralph Corderoy
2017-10-09 18:37:06 UTC
Hi Vangelis,
Post by Vangelis forthnet
just examining page source of
http://www.bbc.co.uk/iplayer/group/p056n6px
I'm seeing href="*" URIs with "#group=p056n6px"
appended to them... That's the ones containing the PIDs.
Here in Unix-land, renowned for its text processing...

$ g=p056n6px
$ curl -sS 'http://www.bbc.co.uk/iplayer/group/'$g |
> grep -o 'http://www\.bbc\.co\.uk/[^ ]*#group='$g
http://www.bbc.co.uk/iplayer/episode/p055t73r/the-colony#group=p056n6px
http://www.bbc.co.uk/iplayer/episode/p055vzj1/tuesday-documentary-the-block#group=p056n6px
http://www.bbc.co.uk/iplayer/episode/p055sys5/man-alive-gale-is-dead#group=p056n6px
http://www.bbc.co.uk/iplayer/episode/p053r2q1/waiting-for-work#group=p056n6px
http://www.bbc.co.uk/iplayer/episode/p00gxvjj/borrowed-pasture#group=p056n6px
http://www.bbc.co.uk/iplayer/episode/b0074tkn/40-minutes-heart-of-the-angel#group=p056n6px
$
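Taking that pipeline one step further, here is a minimal sketch that reduces the page source (or the URL list above) to bare episode PIDs. The `group_pids` function is hypothetical, not part of get_iplayer, and it assumes, based only on the URLs shown above, that each episode link has the form `/iplayer/episode/<pid>/<slug>#group=<group pid>`. Like any web scraping, it will break as soon as the BBC changes its markup.

```shell
#!/bin/sh
# Hypothetical helper (not part of get_iplayer): group_pids GROUP_PID
# Reads HTML (or a list of URLs) on stdin and prints the unique episode PIDs
# whose links end in "#group=GROUP_PID", one per line, in first-seen order.
# Assumption: episode links look like /iplayer/episode/<pid>[/<slug>]#group=<group>.
# Stages: grep keeps only this group's episode links, sed cuts each one down
# to the PID path segment, awk drops repeats (a page may link an episode twice).
group_pids() {
    grep -o "/iplayer/episode/[a-z0-9]*[^\"# ]*#group=$1" |
        sed 's|^/iplayer/episode/\([a-z0-9]*\).*|\1|' |
        awk '!seen[$0]++'
}
```

Fed the six URLs above, it prints p055t73r, p055vzj1, p055sys5, p053r2q1, p00gxvjj and b0074tkn, one per line. Piping Ralph's `curl` output straight into `group_pids "$g"` should do the same, provided the page still has the structure it had in 2017.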
--
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy
Mike Casswell
2017-10-10 07:25:35 UTC
Thanks to both for the advice. What I was expecting, to be honest, but
thought I'd check.

The numbers don't justify, for me, scripts for scraping. I'm manually
'scraping' the pids and downloading them in batches.
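For the manual route, a small loop saves retyping one command per PID. This is a minimal sketch, assuming a plain text file with one PID per line. The `download_batch` name, the file format and the `GIP_CMD` override are all hypothetical; the only get_iplayer options it uses are `--type=tv` and `--pid`, as in the commands quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of get_iplayer): download_batch PID_FILE
# Runs get_iplayer once per PID listed in PID_FILE (one per line; blank lines
# and lines starting with '#' are skipped). GIP_CMD overrides the command,
# e.g. GIP_CMD=echo for a dry run. Returns non-zero if any download failed.
download_batch() {
    cmd=${GIP_CMD:-get_iplayer}
    failed=0
    # "|| [ -n ... ]" also processes a last line that lacks a trailing newline.
    while read -r pid || [ -n "$pid" ]; do
        case $pid in
            '' | '#'*) continue ;;
        esac
        # stdin comes from /dev/null so the child process cannot consume
        # the rest of PID_FILE, which is this loop's stdin.
        if ! $cmd --type=tv --pid="$pid" </dev/null; then
            echo "download failed for $pid" >&2
            failed=$((failed + 1))
        fi
    done <"$1"
    [ "$failed" -eq 0 ]
}
```

For example, `download_batch pids.txt` fetches each listed programme in turn; exporting `GIP_CMD=echo` first turns it into a dry run that only prints the arguments. The `</dev/null` redirect matters because get_iplayer, or any helper it launches, would otherwise read from the PID file and silently swallow the remaining entries.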

Until that magic day when the BBC releases everything they have...
--
Mike Casswell
Vangelis forthnet
2017-10-11 01:22:47 UTC
Post by Vangelis forthnet
at BBC Collections locations such as
http://www.bbc.co.uk/iplayer/group/p056n6px
This method has worked in the past for
apparently similar collections -
but note the use of 'group' in the url
which I do not recall from previous examples.
Hi Mike... I don't think "group PIDs" were ever
supported in GiP, only "brand" and "series" ones...
so it might have been a brand/series page that has
worked for you in the past...
... Well, it turns out that you were right and,
still, I am right, too!
Just came across the following "group" listing:

http://www.bbc.co.uk/iplayer/group/b096k7q7

Unlike your previous example (group being
an assortment of different programmes), this
"group PID" does work with --pid-recursive
because it's actually a series PID (you can tell
because all members of the group are
individual episodes of the same series):

get_iplayer --type=tv --pid=b096k7q7 --pid-recursive -i =>

INFO: Series or Brand PID detected
INFO: BBC Four - The Vietnam War - Available now
INFO: Page 1 of 1
INFO: Series 1, Things Fall Apart (January 1968-June 1968) (b097ts0d)
INFO: Series 1, This Is What We Do (July 1967-December 1967) (b097ts0b)
INFO: Series 1, Doubt (January 1966-June 1967) (b096v3f2)
INFO: Series 1, Hell Come to Earth (January 1964-December 1965) (b096v3dw)
INFO: Series 1, Riding the Tiger (1961-1963) (b096k948)
INFO: Series 1, Deja Vu (1858-1961) (b096k8wz)
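The check above (is the "group" really a series?) can also be left to get_iplayer itself. Here is a minimal sketch of a hypothetical `pid_kind` helper that reads the `-i` output and says which route to take. It assumes that the two messages quoted in this thread, "Series or Brand PID detected" and "could not determine pid type", are stable enough to match on. Matching is case-insensitive because the exact wording may differ between versions.

```shell
#!/bin/sh
# Hypothetical helper (not part of get_iplayer): pid_kind
# Reads the output of "get_iplayer --pid=<pid> --pid-recursive -i" on stdin
# and prints one word:
#   series  - series/brand PID detected, so --pid-recursive should work
#   no-type - get_iplayer could not determine the PID type (as with the
#             mixed "group" PID above), so fall back to individual PIDs
#   unknown - neither message was seen
pid_kind() {
    out=$(cat)
    if printf '%s\n' "$out" | grep -qi 'series or brand pid detected'; then
        echo series
    elif printf '%s\n' "$out" | grep -qi 'could not determine pid type'; then
        echo no-type
    else
        echo unknown
    fi
}
```

For example, `get_iplayer --type=tv --pid=b096k7q7 --pid-recursive -i 2>&1 | pid_kind` should print `series` for the listing above. The `2>&1` is there on the assumption that some of these messages go to stderr.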

I hope this makes it clearer now...

Cheers,
Vangelis.
