Description of problem:
Recently two (empty) posts appeared, both from blog-ciah.rhcloud.com and, incidentally, both from Perl SIG, that get duplicated every time news is fetched. This isn't happening on the Planet web page, nor do I see it in the Atom/RSS data, so I guess it's a bug in Claws.

Version-Release number of selected component (if applicable):
claws-mail-plugins-rssyl-3.10.1-1.fc20.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Add Fedora Planet to the feed list.
2. Refresh.
3. Refresh again.

Actual results:
There are as many Perl SIG posts as there are refreshes.

Expected results:
Every post appears once.

Additional info:
I successfully reproduced it with atom and rss 2 feeds, but didn't try the rest. The first post is called "Perl SIG: Announcing Tangerine", the other "Perl SIG: glibc issue while build in EPEL". The glibc one is from 14/09/07 (Sun), the Tangerine one from 14/09/08 (Mon). I'm not sure how long Fedora Planet keeps data for old posts, so this might appear "fixed" once these two posts disappear from the feed...
> I successfully reproduced it with atom and rss 2 feeds,

Mixed feelings from my side.

You could have provided a direct link to the feed. Plus, saving snapshots of the downloaded file (e.g. with wget/curl) while reproducing the issue might have been helpful. Without that, it takes considerable effort to try to reproduce the problem and to rule out that it's caused by a feed update.

http://www.thewildbeast.co.uk/claws-mail/bugzilla/buglist.cgi?query_format=advanced&order=Importance&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&component=Plugins%2FRSSyl&product=Claws%20Mail
(In reply to Michael Schwendt from comment #1)
> > I successfully reproduced it with atom and rss 2 feeds,
>
> Mixed feelings from my side.
>
> You could have provided a direct link to the feed. Plus, saving snapshots of
> the downloaded file (e.g. with wget/curl) while reproducing the issue might
> have been helpful.
>

Ah, yes, good point. The feeds I tried are:
http://planet.fedoraproject.org/atom.xml
http://planet.fedoraproject.org/rss20.xml

The latest post this happens with links to
http://blog-ciah.rhcloud.com/#i3c35343046323238302e39303530343030407265646861742e636f6d3e
and is empty. It appears in the Planet once, but I already have dozens of copies of it in Claws. The feed itself does not need to be updated; just using Claws' "refresh feed" is enough to make it multiply (one copy is added with every refresh).

In the RSS 2.0 feed it looks like this (I suppose you don't need the whole feed):

<item>
  <title>Perl SIG: Perl 5.20 rebuild finished</title>
  <guid isPermaLink="false">
    http://blog-ciah.rhcloud.com/<540F2280.9050400>
  </guid>
  <link>
    http://blog-ciah.rhcloud.com/#i3c35343046323238302e39303530343030407265646861742e636f6d3e
  </link>
  <pubDate>Tue, 09 Sep 2014 15:53:36 +0000</pubDate>
</item>

In Atom 1.0:

<entry>
  <title type="html">Perl 5.20 rebuild finished</title>
  <link href="http://blog-ciah.rhcloud.com/#i3c35343046323238302e39303530343030407265646861742e636f6d3e"/>
  <id>
    http://blog-ciah.rhcloud.com/<540F2280.9050400>
  </id>
  <updated>2014-09-09T15:53:36+00:00</updated>
  <content type="html">
    <img src="http://planet.fedoraproject.org/images/heads/default.png" alt="" style="float: right;">
  </content>
  <author>
    <name>Jitka Plesnikova</name>
    <email>jplesnik</email>
    <uri>http://blog-ciah.rhcloud.com/</uri>
  </author>
  <source>
    <title type="html">Camel in a Hat</title>
    <subtitle type="html">Generated from <https:></https:></subtitle>
    <link rel="self" href="http://blog-ciah.rhcloud.com/rss.xml"/>
    <id>http://blog-ciah.rhcloud.com/</id>
  </source>
</entry>

If you still need more info, I'll try to provide it. It looks similar to http://www.thewildbeast.co.uk/claws-mail/bugzilla/show_bug.cgi?id=2197 (the screenshot with the patch applied), but it is probably a different issue, as the title doesn't change.
Created attachment 936725: Just the offending post in a feed

The attached feed contains *only* the offending post, and Claws just duplicates it on refresh, i.e. the post is not multiplied beyond two copies. I guess the behaviour is slightly more complex than I thought: I need to change the feed update time (<updated>) in order for claws to make the post triplicate, quadruplicate, ...
(In reply to Martin Sourada from comment #3)
> update time (<updated>) in order for claws to make the post triplicate,
> quadruplicate, ...

I just noticed the post itself has an <updated> element; the change is needed in the feed's <updated>, not the post's.
I've filed: http://www.thewildbeast.co.uk/claws-mail/bugzilla/show_bug.cgi?id=3282

Claws Mail saves the feed item to ~/.claws-mail/RSSyl/Fedora Planet/ and loads and parses it upon refreshing the feed, but doesn't recognize it as seen before.
I examined this further. rssyl_cb_feed_compare() doesn't recognize the post as seen before, because a different "id" string is generated for it somewhere:

The compare function searches for this id (a):
http://blog-ciah.rhcloud.com/

Claws Mail saved this id (b):
http://blog-ciah.rhcloud.com/<540F2280.9050400>

rssyl_parse_folder_item_file() strips the angle-bracketed part off the Message-ID, and the Claws Mail debug output says:
RSSyl: got id 'http://blog-ciah.rhcloud.com/'
A fix has been committed upstream: http://git.claws-mail.org/?p=claws.git;a=commitdiff;h=9e4d6e44b4b9324109d5d1aa045a94332109cc96
I have no idea how the RSS plug-in interacts with the e-mail client, but the RSS specification states that the guid element must be treated as an opaque string without any meaning. By contrast, the commit parses something like an e-mail address. I hope upstream knows what they are doing. (Yes, I'm the editor of the blog.)
Should be fixed in 3.11.1