Perl grabber
This is a Perl scripting project.
The client wants to clean up an XML feed because it is not compliant.
After the Perl script has cleaned it, the document should be readable
by any RSS 2.0 newsreader.
The script should fetch the RSS news feed (an XML document) from the
following URL, using LWP or a similar Perl module. We are using Perl 5.8.3,
and you can use any CPAN module that you need.
http://hrw.org/doc/?t=news_rss
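
A minimal sketch of the fetch step, assuming LWP::UserAgent is installed. The sub name fetch_feed and the 30-second timeout are illustrative choices, not part of the requirements.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: download a feed and return its raw content.
# Dies with the HTTP status line if the request fails.
sub fetch_feed {
    my ($feed_url) = @_;

    # The timeout value is an assumption, not a requirement from the posting.
    my $ua = LWP::UserAgent->new(timeout => 30);

    my $response = $ua->get($feed_url);
    die "Could not fetch $feed_url: " . $response->status_line . "\n"
        unless $response->is_success;

    # content() returns the body as received (undecoded bytes).
    return $response->content;
}
```

Calling fetch_feed('http://hrw.org/doc/?t=news_rss') would return the feed as a byte string. The bytes would likely need decoding (for example with the core Encode module) before any character-level cleanup.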
1. Delete all extra blank lines.
2. Search for special characters and replace them with their ASCII
equivalents. The following characters are non-ASCII (they come from
Microsoft Word), and the client wants the resulting XML to contain only
ASCII characters. This list may not be complete, but we should be able to
add to it (there may also be an existing module that performs this function).
Quotation marks (both opening and closing quotes)
Apostrophes ('')
Em dash
é
3. Remove any HTML markup (for example)
4. Remove all HTML comments
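
The four steps above could be sketched as follows. This is only one possible reading of the requirements, and it makes several assumptions: the input is an already-decoded character string, "extra blank lines" means every empty or whitespace-only line, and characters missing from the table become numeric character references (so é becomes &#233;). All sub names are hypothetical.

```perl
use strict;
use warnings;

# Assumed replacements; the posting says its list may be incomplete.
my %ascii_for = (
    "\x{2018}" => "'",    # left single quotation mark
    "\x{2019}" => "'",    # right single quotation mark / apostrophe
    "\x{201C}" => '"',    # left double quotation mark
    "\x{201D}" => '"',    # right double quotation mark
    "\x{2014}" => '--',   # em dash
);

# Step 2: map the known characters, then turn anything else outside
# ASCII (for example e-acute) into a numeric character reference.
sub to_ascii {
    my ($text) = @_;
    $text =~ s/([\x{2018}\x{2019}\x{201C}\x{201D}\x{2014}])/$ascii_for{$1}/g;
    $text =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord $1)/ge;
    return $text;
}

# Step 4: drop comments, including ones that span several lines.
sub strip_comments {
    my ($text) = @_;
    $text =~ s/<!--.*?-->//gs;
    return $text;
}

# Step 3: remove tags from a text fragment, such as the body of one
# description element. Not meant for the whole feed, because that would
# remove the RSS elements as well.
sub strip_tags {
    my ($fragment) = @_;
    $fragment =~ s/<[^>]*>//g;
    return $fragment;
}

# Step 1: remove lines that are empty or contain only spaces and tabs.
sub drop_blank_lines {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;        # normalize line endings first
    $text =~ s/^[ \t]*\n//mg;
    return $text;
}

# Whole-document pass: comments, then characters, then blank lines.
sub clean_feed {
    my ($text) = @_;
    return drop_blank_lines(to_ascii(strip_comments($text)));
}
```

Comments are removed before blank lines so that a line holding only a comment does not leave an empty line behind. A regex-based strip_tags is a rough tool; a CPAN parser such as HTML::Parser may be a better fit if the embedded markup is messy.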
Perl only! Do not send private messages; put everything in your bid.