Ralph Corderoy
2018-03-04 13:39:36 UTC
Hi,
I noticed get_iplayer showing
Rothaà Móra an tSaoil: Series 1
and wondered if it was a bug, but the BBC's JSON has
$ curl -sS https://www.bbc.co.uk/programmes/b09w6dhm.json |
grep -o '"Roth[^"]*"'
"Rotha\u00c3\u00ad M\u00c3\u00b3ra an tSaoil"
$
and get_iplayer is correctly showing U+c3 and U+ad after `Rotha'.
The problem is the BBC have taken the UTF-8 encoding of the intended rune,
treated each byte as a rune in its own right, and encoded those as UTF-8
again.
$ iconv -f utf-8 -t ucs-2be <<<$'\xc3\xad \xc3\xb3' |
od --endian=big -tx2
0000000 00ed 0020 00f3 000a
0000010
$
Thus the title is meant to be
$ printf 'Rotha\u00ed M\u00f3ra an tSaoil\n'
Rothaí Móra an tSaoil
$
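Until it is fixed at source, the damage can be undone on the receiving end. What follows is only a minimal sketch, not anything get_iplayer does: the function name `fix_double_utf8` is made up for illustration, and it assumes an iconv that knows ISO-8859-1. It relies on the same observation as the iconv run above: each mis-encoded rune is below U+0100, so converting to ISO-8859-1 yields the original UTF-8 bytes.

```shell
# Hypothetical helper: undo one round of double UTF-8 encoding.
# Converting UTF-8 to ISO-8859-1 maps U+00C3 U+00AD back to the bytes
# c3 ad, which are the UTF-8 encoding of the intended U+00ED.
# iconv fails on input holding a rune above U+00FF, since that
# cannot have come from a single re-encoded byte.
fix_double_utf8() {
    iconv -f UTF-8 -t ISO-8859-1
}

# The title as served, spelt in octal so this script stays ASCII.
printf 'Rotha\303\203\302\255 M\303\203\302\263ra an tSaoil\n' |
fix_double_utf8
```

On a UTF-8 terminal that should print `Rothaí Móra an tSaoil`. It must only be applied to strings known to be double-encoded: a correctly encoded title containing, say, U+00F3 would come out as a lone f3 byte, which is not valid UTF-8.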
Can a BBC lurker please see if they can stop it happening?  Thanks.
--
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy