Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. updated the batch shell file with the following utf8 related feeds:
   http://www.people.com.cn/rss/world.xml
   http://www.kidsguide.gr/index.php?option=com_rss&feed=RSS2.0&no_html=1
2. executed the file
3.

Actual results:
Error messages got displayed:

1.
# /usr/share/www-app-blogs-beachead/www-app-blogs-beachead.sh
# java.net.UnknownHostException: www.edhatmagazine.com
#     at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:177)
#     at java.net.Socket.connect(Socket.java:519)
#     at java.net.Socket.connect(Socket.java:469)
#     at sun.net.NetworkClient.doConnect(NetworkClient.java:157)
#     at sun.net.www.http.HttpClient.openServer(HttpClient.java:382)
#     at sun.net.www.http.HttpClient.openServer(HttpClient.java:509)
#     at sun.net.www.http.HttpClient.<init>(HttpClient.java:231)
#     at sun.net.www.http.HttpClient.New(HttpClient.java:304)
#     at sun.net.www.http.HttpClient.New(HttpClient.java:316)
#     at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:817)
#     at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:769)
#     at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:694)
#     at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:938)
#     at com.sun.syndication.io.XmlReader.<init>(XmlReader.java:174)
#     at com.sun.syndication.io.XmlReader.<init>(XmlReader.java:151)
#     at com.redhat.www.blogs.beachhead.RSSAggregator.readFeed(RSSAggregator.java:47)
#     at com.redhat.www.blogs.beachhead.RSSAggregator.readFeed(RSSAggregator.java:72)
#     at com.redhat.www.blogs.beachhead.Main.main(Main.java:114)
# /var/www/www.redhat.com/dhtml/beachead/today-tmp.html:1: HTML parser error : Document is empty

2.
# [root@batch1 ~]# /usr/share/www-app-blogs-beachead/www-app-blogs-beachead.sh
# /var/www/www.redhat.com/dhtml/beachead/today-tmp.html:11: HTML parser error : htmlParseEntityRef: expecting ';'
# <li><a href="http://www.kidsguide.gr/index.php?option=com_content&task=view&
# ^
# /var/www/www.redhat.com/dhtml/beachead/today-tmp.html:11: HTML parser error : htmlParseEntityRef: expecting ';'
# <li><a href="http://www.kidsguide.gr/index.php?option=com_content&task=view&id
# ^
# /var/www/www.redhat.com/dhtml/beachead/today-tmp.html:11: HTML parser error : htmlParseEntityRef: expecting ';'
# ef="http://www.kidsguide.gr/index.php?option=com_content&task=view&id=137&Itemid
# ^
# /var/www/www.redhat.com/dhtml/beachead/today-tmp.html:12: HTML parser error : htmlParseEntityRef: expecting ';'
# <li><a href="http://www.kidsguide.gr/index.php?option=com_content&task=view&
# ^
# /var/www/www.redhat.com/dhtml/beachead/today-tmp.html:12: HTML parser error : htmlParseEntityRef: expecting ';'
# <li><a href="http://www.kidsguide.gr/index.php?option=com_content&task=view&id
# ^
# /var/www/www.redhat.com/dhtml/beachead/today-tmp.html:12: HTML parser error : htmlParseEntityRef: expecting ';'
# ef="http://www.kidsguide.gr/index.php?option=com_content&task=view&id=113&Itemid
# ^
# /var/www/www.redhat.com/dhtml/beachead/today-tmp.html:13: HTML parser error : htmlParseEntityRef: expecting ';'
# <li><a href="http://www.kidsguide.gr/index.php?option=com_content&task=view&
# ^
# /var/www/www.redhat.com/dhtml/beachead/today-tmp.html:13: HTML parser error : htmlParseEntityRef: expecting ';'
# <li><a href="http://www.kidsguide.gr/index.php?option=com_content&task=view&id
# ^
# /var/www/www.redhat.com/dhtml/beachead/today-tmp.html:13: HTML parser error : htmlParseEntityRef: expecting ';'
# ef="http://www.kidsguide.gr/index.php?option=com_content&task=view&id=121&Itemid
# ^
# /var/www/www.redhat.com/dhtml/beachead/today-tmp.html:14: HTML parser error : htmlParseEntityRef: expecting ';'
# <li><a href="http://www.kidsguide.gr/index.php?option=com_content&task=view&
# ^
# /var/www/www.redhat.com/dhtml/beachead/today-tmp.html:14: HTML parser error : htmlParseEntityRef: expecting ';'
# <li><a href="http://www.kidsguide.gr/index.php?option=com_content&task=view&id
# ^
# /var/www/www.redhat.com/dhtml/beachead/today-tmp.html:14: HTML parser error : htmlParseEntityRef: expecting ';'
# ef="http://www.kidsguide.gr/index.php?option=com_content&task=view&id=108&Itemid
# ^
# /var/www/www.redhat.com/dhtml/beachead/today-tmp.html:14: HTML parser error : htmlParseEntityRef: no name
# =com_content&task=view&id=108&Itemid=62" target="_parent">Αθλητισμός &
# ^
# /var/www/www.redhat.com/dhtml/beachead/today-tmp.html:15: HTML parser error : htmlParseEntityRef: expecting ';'
# <li><a href="http://www.kidsguide.gr/index.php?option=com_content&task=view&
# ^
# /var/www/www.redhat.com/dhtml/beachead/today-tmp.html:15: HTML parser error : htmlParseEntityRef: expecting ';'
# <li><a href="http://www.kidsguide.gr/index.php?option=com_content&task=view&id
# ^
# /var/www/www.redhat.com/dhtml/beachead/today-tmp.html:15: HTML parser error : htmlParseEntityRef: expecting ';'
# ref="http://www.kidsguide.gr/index.php?option=com_content&task=view&id=98&Itemid
# ^
# blog beachhead mv operation failed
# [batch1.webqa-colo

Expected results:

Additional info:
Changing $SUBJECT; the issue isn't UTF-8. The issue is that we're not escaping the ampersand ('&') character in URLs and link text when we aggregate the various blog links, so the HTML parser treats each bare '&' as the start of an entity reference.
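
For reference, a minimal sketch of the kind of escaping that avoids the htmlParseEntityRef errors. This is not the code committed to the repository; the class name HrefEscaper, its method names, and the sample link title are hypothetical and only illustrate the idea of escaping '&' (and the other markup-significant characters) before a URL or title is written into the generated list item.

```java
public final class HrefEscaper {
    private HrefEscaper() {}

    /**
     * Escapes the characters that are significant in HTML text and in
     * double-quoted attribute values. Hypothetical helper, not the
     * aggregator's actual API.
     */
    public static String escapeHtml(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    /** Builds one list item in the shape seen in today-tmp.html. */
    public static String listItem(String url, String title) {
        return "<li><a href=\"" + escapeHtml(url) + "\" target=\"_parent\">"
                + escapeHtml(title) + "</a></li>";
    }

    public static void main(String[] args) {
        // URL taken from the error output above; the title is a made-up example.
        String url = "http://www.kidsguide.gr/index.php?option=com_content&task=view&id=108&Itemid=62";
        System.out.println(listItem(url, "A & B"));
    }
}
```

Note that this escapes unconditionally. If a feed already delivers entity-escaped text, the input would need to be unescaped first, otherwise '&amp;' would be escaped a second time.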
OK, this should be fixed in Subversion revision 43. Dan will need to build new RPMs. Please note that the http://www.people.com.cn/rss/world.xml feed is not encoded as UTF-8; it uses a Chinese character set rather than UTF-8. I'm quite happy w/ it exploding messily in that case until we get clearer requirements.
Changing product to "Red Hat Collaboration Applications".
ACTION: Closing because it is fixed. QA/WEB