Discussion:
Unable to refresh
RS
2018-03-22 17:57:47 UTC
Am I doing something stupid, or is there a problem refreshing the cache?

I have just refreshed with
get_iplayer --refresh --type all
and I get told the number of tv and radio programmes added is 0.

get_iplayer .* --since 70
shows 67 programmes

get_iplayer .* --since 60
shows 0 matching programmes
Geoff Smith
2018-03-22 18:15:56 UTC
https://forums.squarepenguin.co.uk/showthread.php?tid=1707
Post by RS
Am I doing something stupid, or is there a problem refreshing the cache?
I have just refreshed with
get_iplayer --refresh --type all
and I get told the number of tv and radio programmes added is 0.
get_player .* --since 70
shows 67 programmes
get_iplayer .* --since 60
show 0 matching programmes
RS
2018-03-22 18:57:32 UTC
Post by Geoff Smith
https://forums.squarepenguin.co.uk/showthread.php?tid=1707
Post by RS
Am I doing something stupid, or is there a problem refreshing the cache?
I have just refreshed with
get_iplayer --refresh --type all
and I get told the number of tv and radio programmes added is 0.
get_player .* --since 70
shows 67 programmes
get_iplayer .* --since 60
show 0 matching programmes
Thanks for pointing me to that. I had a gap in my cache as I had been
away, and I wondered if that was causing a problem. Thanks for saving
me an unnecessary investigation.

Best wishes
Richard
Ralph Corderoy
2018-03-22 18:24:06 UTC
Hi Richard,
Post by RS
Am I doing something stupid, or is there a problem refreshing the cache?
...
Post by RS
get_player .* --since 70
shows 67 programmes
If you are typing `get_player .* --since 70' into a Linux shell then it
will glob the `.*' and replace it with the expansion, e.g. `. ..',
unless it's quoted. I don't see how that will be affecting your lack of
additions in the last 60 hours, but it clouds the problem. If you are
quoting it, then paste what you run to stop wasting our time. :-)

This will show when PIDs were added to your cache, in order.

sort -t\| -k16,16n -k1,1 ~/.get_iplayer/tv.cache |
gawk -F\| '
!/^#/ {
print strftime("%Y-%m-%d %T %z %a", $16, 1) " " $4 " " $3
}
'
--
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy
RS
2018-03-22 19:25:14 UTC
Post by Ralph Corderoy
Hi Richard,
Post by RS
Am I doing something stupid, or is there a problem refreshing the cache?
...
Post by RS
get_player .* --since 70
shows 67 programmes
If you are typing `get_player .* --since 70' into a Linux shell then it
will glob the `.*' and replace it with the expansion, e.g. `. ..',
unless it's quoted. I don't see how that will be affecting your lack of
additions in the last 60 hours, but it clouds the problem. If you are
quoting it, then paste what you run to stop wasting our time. :-)
This will show when PIDs were added to your cache, in order.
sort -t\| -k16,16n -k1,1 ~/.get_iplayer/tv.cache |
gawk -F\| '
!/^#/ {
print strftime("%Y-%m-%d %T %z %a", $16, 1) " " $4 " " $3
}
'
Hi Ralph

Sorry if I've got the command wrong.

get_iplayer .* --since 70
get_iplayer '.*' --since 70
get_iplayer ".*" --since 70

all give 67 programmes

I don't know how to insert your left back quote ` in a Linux Terminal.

I need to go back to the documentation to check what the correct format
is to use with --since

Anyway Geoff Smith has confirmed there is a problem because the layout
of the schedule pages has changed.

Best wishes
Richard
RS
2018-03-22 19:49:17 UTC
Post by RS
Post by Ralph Corderoy
If you are typing `get_player .* --since 70' into a Linux shell then it
will glob the `.*' and replace it with the expansion, e.g. `. ..',
unless it's quoted.
Sorry if I've got the command wrong.
get_iplayer .* --since 70
get_iplayer '.*' --since 70
get_iplayer ".*" --since 70
all give 67 programmes
I don't know how to insert your left back quote ` in a Linux Terminal.
I need to go back to the documentation to check what the correct format
is to use with --since
The 3.09 release notes say,
"get_iplayer no longer lists all programmes when invoked without a
search argument. If you wish to list all programmes, you must now
explicitly specify a wildcard search: get_iplayer ".*" - note the
quotes. The Web PVR Manager does that by default. Also remember to use
--refresh to force ad hoc cache updates if you don't supply a search
argument."

"note the quotes" is in bold, but in this case it seems to give the same
results without.

Best wishes
Richard
Ralph Corderoy
2018-03-22 23:40:17 UTC
Hi Richard,
Post by Ralph Corderoy
If you are typing `get_player .* --since 70' into a Linux shell
then it will glob the `.*' and replace it with the expansion, e.g.
`. ..', unless it's quoted.
The 3.09 release notes say, "get_iplayer no longer lists all
programmes when invoked without a search argument. If you wish to list
get_iplayer ".*" - note the quotes.
"note the quotes" is in bold, but in this case it seems to give the
same results without.
The instructions here say to add two and two, in bold, but I find
multiplying them works just as well, and raising one to the power of the
other. :-)

«.*» is glob'd by the shell to «.» and «..», and perhaps other things, which
then gives get_iplayer two or more regexps, and it tries to match any of them.
That's not your intent. Convention is to use the strongest quotes
possible to ease the interpretation by the readers. For regexps, that's
single quotes, so «get_iplayer '.*'». But «get_iplayer ^» will give the
same results and needs no quotes.
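
A minimal sketch of the expansion described above, assuming bash or a POSIX sh with default options. The show_args helper and the .hidden file are made up for illustration. Whether «.» and «..» appear in the unquoted case depends on the shell and its version (recent bash releases skip them by default), so no output is claimed for it here.

```shell
# Print each argument on its own line, so argument boundaries are visible.
show_args() { printf '%s\n' "$@"; }

# Run in a subshell so the cd does not leak into the calling shell.
(
    dir=$(mktemp -d) && cd "$dir" || exit 1
    touch .hidden            # hypothetical dot file, for illustration only

    echo "unquoted:"; show_args .*     # the shell expands the glob first
    echo "quoted:";   show_args '.*'   # the program receives .* itself

    cd / && rm -rf "$dir"
)
```

In the quoted case the program always receives the single argument «.*», whatever the current directory contains.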
--
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy
RS
2018-03-23 12:48:18 UTC
Post by Ralph Corderoy
Hi Richard,
Post by Ralph Corderoy
If you are typing `get_player .* --since 70' into a Linux shell
then it will glob the `.*' and replace it with the expansion, e.g.
`. ..', unless it's quoted.
The 3.09 release notes say, "get_iplayer no longer lists all
programmes when invoked without a search argument. If you wish to list
get_iplayer ".*" - note the quotes.
"note the quotes" is in bold, but in this case it seems to give the
same results without.
The instructions here say to add two and two, in bold, but I find
multiplying them works just as well, and raising one to the power of the
other. :-)
«.*» glob'd by the shell to «.» and «..», and perhaps other things, then
gives get_iplayer two or more regexps and it tries to match any of them.
It's not your intent. Convention is to use the strongest quotes
possible to ease the interpretation by the readers. For regexps, that's
single quotes, so «get_iplayer '.*'». But «get_iplayer ^» will give the
same results and needs no quotes.
Hi Ralph

It can be easy to over-complicate problems. I am as guilty of that as
anyone.

The primary symptom was that when I refreshed the cache I was told 0
programmes had been added. I was aware that the algorithm for scraping
the schedules had been described as fragile.

When I made my original post I was hoping someone else would try it to
confirm whether there was a general problem, or one that only affected me.
If I had looked at the forum I would have seen that had already been done.

My use of --since was by way of confirmation. As it happened I
mis-remembered the command and was too lazy to check the documentation.
Even so I found that with a --since argument less than 60, no programmes
were found, and with an argument over 70, programmes were found. That
was consistent with the dates in some news programmes found which were 3
days earlier.

I got exactly the same results when I inserted the missing quotes. Why
I got the same results is beyond my level of knowledge. If I had
remembered that the search string was a regex I agree I could have used ^

I have come across this in relation to bash
"The characters *, ? and [ are called glob characters or wild card
characters. If an unquoted argument contains one or more glob
characters, the shell processes the argument for file name generation.
The glob characters are part of glob patterns which represent file and
directory names. These patterns are similar to regular expressions, but
differ in syntax, since they are intended to match file names and words
(not arbitrary strings). The special constructions that may appear in
glob patterns are: ... "

What that seems to mean is that without quotes around .* a file name or
word can be matched by bash, and with quotes an arbitrary string can be
matched as a regex. It is not clear to me why that matters. One thing
that does not appear to have happened is infinite recursion, or even
matching of additional programmes.

As for using a single quote ' as the strongest quote, I suspect the
documentation has used double quotes " for compatibility with Windows.

Best wishes
Richard
Ralph Corderoy
2018-03-24 09:43:44 UTC
Hi Richard,
Post by Ralph Corderoy
If you are typing `get_player .* --since 70' into a Linux shell
I have come across this in relation to bash "The characters *, ? and [
are called glob characters or wild card characters. If an unquoted
argument contains one or more glob characters, the shell processes the
argument for file name generation. The glob characters are part of
glob patterns which represent file and directory names. These patterns
are similar to regular expressions, but differ in syntax, since they
are intended to match file names and words (not arbitrary strings).
The special constructions that may appear in glob patterns are: ... "
What that seems to mean is that without quotes around *. a file name
or word can be matched by bash and with quotes an arbitrary string can
be matched as a regex. It is not clear to me why that matters.
The shell is expanding globs before invoking get_iplayer, thus they're
not seen by get_iplayer if they match anything. If they don't match
anything then they normally remain and are passed to get_iplayer anyway.
For the arguments get_iplayer does see, it decides to interpret some of
them as regexps.

«get_iplayer Railway» has no glob metacharacters to expand, so one
argument is passed to get_iplayer. It uses it as a regexp; with no
regexp metacharacters, it is effectively a substring search of the
titles.

«get_iplayer R.*way» has a glob metacharacter, the «*», the shell looks
at the current directory for entries starting «R.» and ending «way».
There are none. The glob remains, unexpanded. get_iplayer has one
argument, «R.*way», that it uses as a regexp. There are two regexp
metacharacters, «.*», meaning zero or more of any character, used in the
search.

«get_iplayer R.*way» is run again, and again has a glob, the «*». This
time, the current directory has «R.steinway» in it. The argument with
the glob is expanded into that and get_iplayer has one argument,
«R.steinway», that's used as a regexp. It's unlikely to match any
titles; it would need one such as «Resteinway».

To avoid glob expansion, quote the glob metacharacters, «get_iplayer
'R.*way'», and get_iplayer sees the regexp «R.*way».
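
The three cases in the worked example can be reproduced in a scratch directory. This is only a sketch, assuming bash or a POSIX sh with default options (no nullglob or failglob); show_args is a made-up helper standing in for get_iplayer, so that only the arguments it would receive are printed.

```shell
# Print each argument on its own line.
show_args() { printf '%s\n' "$@"; }

# Run in a subshell so the cd does not leak into the calling shell.
(
    dir=$(mktemp -d) && cd "$dir" || exit 1

    show_args R.*way        # no match in an empty directory: prints R.*way
    touch R.steinway        # the file from the worked example
    show_args R.*way        # now expanded by the shell: prints R.steinway
    show_args 'R.*way'      # quoted, never expanded: prints R.*way

    cd / && rm -rf "$dir"
)
```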
One thing that does not appear to have happened is infinite recursion,
or even matching of additional programmes.
Your unquoted «.*» on Linux would often expand to «. ..», and perhaps
more if you've other `dot' files present. These are two regexps
interpreted by get_iplayer. It prints titles matching either. Since
anything matching the second is also matched by the first, you are
seeing any title at least one character long. That's almost like «.*»
and «^» except that a zero-length title won't be matched.
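
The difference can be seen with grep standing in for get_iplayer's Perl matching. That substitution is an assumption made for this sketch, as are the three sample titles, the last of which is empty.

```shell
# Three hypothetical titles; the last one is empty.
titles() { printf '%s\n' 'Railway' 'A' ''; }

# What an unquoted .* often becomes: two patterns, "." and "..".
titles | grep -c -E -e '.' -e '..'    # prints 2: the empty title is missed

# The intended single pattern.
titles | grep -c -E '.*'              # prints 3

# The alternative that needs no quotes.
titles | grep -c '^'                  # prints 3
```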
As for using a single quote ' as the strongest quote, I suspect the
documentation has used double quotes " for compatibility with Windows.
Yes, I expect you're right. Fortunately, I've only had a little
exposure to that in the days of DOS. :-) If Windows doesn't treat «^»
specially then that could be used instead with no quoting needed.
--
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy
Jeremy Nicoll - ml gip
2018-03-24 09:56:29 UTC
Post by Ralph Corderoy
Yes, I expect you're right. Fortunately, I've only had a little
exposure to that in the days of DOS. :-) If Windows doesn't treat
«^» specially then that could be used instead with no quoting needed.
The ^ character is Windows' escape character for command-line use.
--
Jeremy Nicoll - my opinions are my own
RS
2018-03-24 11:29:07 UTC
Post by Jeremy Nicoll - ml gip
Yes, I expect you're right.  Fortunately, I've only had a little
exposure to that in the days of DOS.  :-)  If Windows doesn't treat
«^» specially then that could be used instead with no quoting needed.
The ^ character is Windows' escape character for command-line use.
If you look at
https://github.com/get-iplayer/get_iplayer/wiki/custcmd
and
https://github.com/get-iplayer/get_iplayer/wiki/aactomp3
there are examples of the different quoting needed for Unix/macOS and
Windows.

Best wishes
Richard
CJB
2018-03-24 11:44:26 UTC
Permalink
So can we expect to see a new version of GiP if the Beeb has rejigged
its schedule pages? Thanks - CJB
Ralph Corderoy
2018-03-24 11:56:08 UTC
Post by CJB
So can we expect to see a new version of GiP if the Beeb has rejigged
its schedule pages?
Yes.

RS
2018-03-24 11:17:55 UTC
Post by Ralph Corderoy
Post by Ralph Corderoy
If you are typing `get_player .* --since 70' into a Linux shell
The shell is expanding globs before invoking get_iplayer, thus they're
not seen by get_iplayer if they match anything. If they don't match
anything then they normally remain and are passed to get_iplayer anyway.
For the arguments get_iplayer does see, it decides to interpret some of
them as regexps.
«get_iplayer Railway» has no glob metacharacters to expand so one
argument is passed to get_iplayer, it uses it as a regexp, it has no
regexp metacharacters so effectively is a substring search of the
titles.
«get_iplayer R.*way» has a glob metacharacter, the «*», the shell looks
at the current directory for entries starting «R.» and ending «way».
There are none. The glob remains, unexpanded. get_iplayer has one
argument, «R.*way» that it uses as a regexp. There's two regexp
metacharacters, «.*», meaning zero or more of any character, used in the
search.
«get_iplayer R.*way» is run again, and again has a glob, the «*». This
time, the current directory has «R.steinway» in it. The argument with
the glob is expanded into that and get_iplayer has one argument,
«R.steinway», that's used as a regexp. It's unlikely to match any
titles, e.g. «Resteinway».
To avoid glob expansion, quote the glob metacharacters, «get_iplayer
'R.*way'», and get_iplayer sees the regexp «R.*way».
One thing that does not appear to have happened is infinite recursion,
or even matching of additional programmes.
Your unquoted «.*» on Linux would often expand to «. ..», and perhaps
more if you've other `dot' files present. These are two regexps
interpreted by get_iplayer. It prints titles matching either. Since
anything matching the second is also matched by the first, you are
seeing any title at least one character long. That's almost like «.*»
and «^» except that a zero-length title won't be matched.
Hi Ralph

I can see that in some special cases leaving out the quotes will give
wrong results. I only wanted a crude indication of how long it had been
since refreshing the cache had been working, but that is no excuse for
getting it wrong.

I clearly still need to think it through further. I would have expected
get_iplayer * --since 110
to match 67 programmes, but it matches 0.

I would have expected
get_iplayer '*' --since 110
to match 0 programmes but it matches 67.

If I put in an invalid regex

get_iplayer *. --since 110
get_iplayer '*.' --since 110

both seem to reach Perl as a regex even though the first has a bash
wildcard.

Quantifier follows nothing in regex; marked by <-- HERE in m/* <-- HERE
./ at /usr/bin/get_iplayer line 1245.

Best wishes
Richard
Mark Carroll
2018-03-24 11:26:28 UTC
Post by RS
I can see that in some special cases leaving out the quotes will give
wrong results. I only wanted a crude indication of how long it had been
since refreshing the cache had been working, but that is no excuse for
getting it wrong.
I clearly still need to think it through further. I would have expected
get_iplayer * --since 110
to match 67 programmes, but it matches 0.
(snip)

You can type 'echo' first to see what get_iplayer is actually getting:

$ ls
another thing
$ echo get_iplayer * --since 110
get_iplayer another thing --since 110

bash won't expand unmatched wildcards:

$ echo a* b*
another b*
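
One caveat with echo is that it joins its arguments with spaces, so a file name containing a space looks like two arguments. A possible variant, sketched here with a made-up show_args helper and a hypothetical file name, prints each argument in angle brackets so the boundaries are visible.

```shell
# Print each argument on its own line, wrapped in angle brackets.
show_args() { printf '<%s>\n' "$@"; }

# Run in a subshell so the cd does not leak into the calling shell.
(
    dir=$(mktemp -d) && cd "$dir" || exit 1
    touch 'another thing'    # one hypothetical file whose name contains a space

    echo get_iplayer * --since 110        # looks like two separate words
    show_args get_iplayer * --since 110   # shows it is a single argument

    cd / && rm -rf "$dir"
)
```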

-- Mark
Ralph Corderoy
2018-03-24 11:54:34 UTC
Hi Richard,
Post by RS
If they don't match anything then they normally remain and are
passed to get_iplayer anyway.
...
Post by RS
The argument with the glob is expanded into that and get_iplayer has
one argument, «R.steinway», that's used as a regexp. It's unlikely
to match any titles, e.g. «Resteinway».
...
Post by RS
I clearly still need to think it through further.
Or, study my worked example above until it's understood. :-)
Post by RS
I would have expected
get_iplayer * --since 110
to match 67 programmes, but it matches 0.
«*» is probably being expanded by the shell based on entries in the
current directory.
Post by RS
I would have expected
get_iplayer '*' --since 110
to match 0 programmes but it matches 67.
get_iplayer, for some unknown reason, has

my @search_args = map { $_ eq "*" ? ".*" : $_ } @ARGV;

to break its consistency over argument handling. It treats these two
the same. A bad idea IMO.

get_iplayer '*'
get_iplayer '.*'
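
For readers who don't know Perl, the effect of that line can be re-expressed in shell. This is only an illustrative sketch, not get_iplayer's code; map_search_args is a made-up name. An argument that is exactly «*» becomes «.*», and anything else, including «*.», passes through unchanged, which is why «*.» still reaches Perl as an invalid regexp.

```shell
# Shell re-expression of the quoted Perl line: an argument that is exactly
# "*" becomes ".*"; every other argument passes through unchanged.
map_search_args() {
    for arg in "$@"; do
        case $arg in
            '*') printf '%s\n' '.*' ;;
            *)   printf '%s\n' "$arg" ;;
        esac
    done
}

map_search_args '*' '*.' 'Railway'
```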
Post by RS
If I put in an invalid regex
get_iplayer *. --since 110
get_iplayer '*.' --since 110
both seem to reach Perl as a regex even though the first has a bash
wildcard.
Again, the worked example predicts this and covers it. Create «a-dot-.»
in the current directory and try again.
--
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy
J K.Eason
2018-03-22 19:09:00 UTC
*Date:* Thu, 22 Mar 2018 17:57:47 +0000
Am I doing something stupid, or is there a problem refreshing the
cache?
I have just refreshed with
get_iplayer --refresh --type all
and I get told the number of tv and radio programmes added is 0.
get_player .* --since 70
shows 67 programmes
get_iplayer .* --since 60
show 0 matching programmes
The BBC have broken GIP by changing the page layout according to a post
by Dinky a short while ago at
https://forums.squarepenguin.co.uk/thread-1707.html

"The layout of BBC schedule pages has just changed, so get_iplayer cache
updates are broken at the moment. Download programmes via --pid or --url
until things are sorted."
--
Regards
John
Mike Ralphson
2018-03-22 19:29:05 UTC
Just in case there is an extended period until get_iplayer can refresh its cache again, daily radio and TV cache files are still available from

https://schedules.github.io/ess/
 
Mike