Bug 1139749 - Claws-mail-rssyl multiplicates posts coming from blog-ciah.rhcloud.com via fedora planet
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: claws-mail
Version: 20
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: ---
Assignee: Andreas Bierfert
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-09-09 14:41 UTC by Martin Sourada
Modified: 2015-01-16 10:34 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2015-01-16 10:34:00 UTC
Type: Bug
Embargoed:


Attachments:
Just the offending post in a feed (1.29 KB, text/plain)
2014-09-12 04:15 UTC, Martin Sourada

Description Martin Sourada 2014-09-09 14:41:10 UTC
Description of problem:
Recently two (empty) posts appeared, both from blog-ciah.rhcloud.com and, incidentally, both from Perl SIG, that get duplicated every time news is fetched. This isn't happening on the planet web page, nor do I see it in the atom/rss data, so I guess it's some bug in claws.

Version-Release number of selected component (if applicable):
claws-mail-plugins-rssyl-3.10.1-1.fc20.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Add Fedora Planet to feed list
2. Refresh
3. Refresh again

Actual results:
There are as many Perl SIG posts as there are refreshes.

Expected results:
Every post appears once.

Additional info:
I successfully reproduced it with atom and rss 2 feeds, didn't try the rest. The first post is called "Perl SIG: Announcing Tangerine", the other is "Perl SIG: glibc issue while build in EPEL". The glibc one is from 14/09/07 (Sun), the tangerine one from 14/09/08 (Mon). I'm not sure for how long fedora planet keeps data for old posts, this might appear "fixed" when these two posts disappear from the feed...

Comment 1 Michael Schwendt 2014-09-11 19:06:41 UTC
> I successfully reproduced it with atom and rss 2 feeds,

Mixed feelings from my side.

You could have provided a direct link to the feed. Plus, saving snapshots of the downloaded file (e.g. with wget/curl) while reproducing the issue might have been helpful.

Without that work being done it takes considerable effort to try to reproduce the problem and rule out that it's caused by a feed update.

http://www.thewildbeast.co.uk/claws-mail/bugzilla/buglist.cgi?query_format=advanced&order=Importance&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&component=Plugins%2FRSSyl&product=Claws%20Mail
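
The snapshot suggestion above could be scripted roughly as follows. This is only a sketch, not something used in this report: the helper names (snapshot_path, fetch, save_snapshot) and the file naming scheme are made up for illustration, and the feed URL and target directory are supplied by the caller.

```python
import datetime
import pathlib
import urllib.request


def snapshot_path(directory, url, when):
    """Build a timestamped file name for one download of `url`."""
    name = url.rstrip("/").rsplit("/", 1)[-1] or "feed"
    stamp = when.strftime("%Y%m%dT%H%M%SZ")
    return pathlib.Path(directory) / f"{stamp}-{name}"


def fetch(url):
    """Download the raw feed bytes (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def save_snapshot(directory, url, fetcher=fetch, when=None):
    """Save one copy of the feed so consecutive refreshes can be diffed."""
    when = when or datetime.datetime.now(datetime.timezone.utc)
    path = snapshot_path(directory, url, when)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(fetcher(url))
    return path
```

Calling save_snapshot() right before each refresh in Claws Mail leaves a series of files that can be compared with diff, which helps rule out a change in the feed itself.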

Comment 2 Martin Sourada 2014-09-12 04:05:31 UTC
(In reply to Michael Schwendt from comment #1)
> > I successfully reproduced it with atom and rss 2 feeds,
> 
> Mixed feelings from my side.
> 
> You could have provided a direct link to the feed. Plus, saving snapshots of
> the downloaded file (e.g. with wget/curl) while reproducing the issue might
> have been helpful.
> 
Ah, yes, good point.

The feeds I tried are:
http://planet.fedoraproject.org/atom.xml
http://planet.fedoraproject.org/rss20.xml

The latest post this happens with points to
http://blog-ciah.rhcloud.com/#i3c35343046323238302e39303530343030407265646861742e636f6d3e

and is empty. It appears on the planet once, but I already have tens of copies of it in claws. The feed itself does not need to be updated; a claws "refresh feed" is enough to make it multiplicate (one copy is added with every refresh).

In the RSS 2.0 feed it looks like (I suppose you don't need the whole feed):
<item>
<title>Perl SIG: Perl 5.20 rebuild finished</title>
<guid isPermaLink="false">
http://blog-ciah.rhcloud.com/<540F2280.9050400>
</guid>
<link>
http://blog-ciah.rhcloud.com/#i3c35343046323238302e39303530343030407265646861742e636f6d3e
</link>
<pubDate>Tue, 09 Sep 2014 15:53:36 +0000</pubDate>
</item>

In Atom 1.0:
<entry>
<title type="html">Perl 5.20 rebuild finished</title>
<link href="http://blog-ciah.rhcloud.com/#i3c35343046323238302e39303530343030407265646861742e636f6d3e"/>
<id>
http://blog-ciah.rhcloud.com/<540F2280.9050400>
</id>
<updated>2014-09-09T15:53:36+00:00</updated>
<content type="html">
<img src="http://planet.fedoraproject.org/images/heads/default.png" alt="" style="float: right;">
</content>
<author>
<name>Jitka Plesnikova</name>
<email>jplesnik</email>
<uri>http://blog-ciah.rhcloud.com/</uri>
</author>
<source>
<title type="html">Camel in a Hat</title>
<subtitle type="html">Generated from &lt;https:&gt;&lt;/https:&gt;</subtitle>
<link rel="self" href="http://blog-ciah.rhcloud.com/rss.xml"/>
<id>http://blog-ciah.rhcloud.com/</id>
</source>
</entry>

If you still need more info, I'll try to provide it. 

It looks similar to
http://www.thewildbeast.co.uk/claws-mail/bugzilla/show_bug.cgi?id=2197

(the screenshot with patch applied), but is probably different, as the title doesn't change.

Comment 3 Martin Sourada 2014-09-12 04:15:52 UTC
Created attachment 936725
Just the offending post in a feed

The attached feed contains *only* the offending post, and claws just duplicates it on refresh, i.e. does not multiplicate more than once. I guess the behaviour is slightly more complex than I thought. I need to update feed update time (<updated>) in order for claws to make the post triplicate, quadruplicate, ...

Comment 4 Martin Sourada 2014-09-12 04:22:53 UTC
(In reply to Martin Sourada from comment #3)
> update time (<updated>) in order for claws to make the post triplicate,
> quadruplicate, ...
I just noticed the post itself has an <updated> element; the update is needed for the feed's <updated>, not the post's.
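
A reproducer along these lines could be generated with a small script. This is a sketch, not the attached file: the function name build_feed and the feed title are invented, while the entry id, title and timestamp are copied from comment 2. Only the feed-level <updated> changes between calls.

```python
import datetime
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def build_feed(feed_updated):
    """Return Atom XML with one fixed entry and a variable feed-level <updated>."""
    ET.register_namespace("", ATOM_NS)
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    # Hypothetical title, not taken from the real feed.
    ET.SubElement(feed, f"{{{ATOM_NS}}}title").text = "Reproducer feed"
    ET.SubElement(feed, f"{{{ATOM_NS}}}id").text = "http://blog-ciah.rhcloud.com/"
    ET.SubElement(feed, f"{{{ATOM_NS}}}updated").text = feed_updated.isoformat()

    entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = "Perl 5.20 rebuild finished"
    # The id contains angle brackets; ElementTree escapes them on output.
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = (
        "http://blog-ciah.rhcloud.com/<540F2280.9050400>"
    )
    # The entry's own timestamp stays the same on every run.
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = "2014-09-09T15:53:36+00:00"
    return ET.tostring(feed, encoding="unicode")
```

Writing the result of build_feed(datetime.datetime.now(datetime.timezone.utc)) to a local file before each refresh bumps only the feed timestamp, which is the condition described above for getting a third and fourth copy.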

Comment 5 Michael Schwendt 2014-09-12 12:09:31 UTC
I've filed:
http://www.thewildbeast.co.uk/claws-mail/bugzilla/show_bug.cgi?id=3282

Claws Mail saves the feed item to ~/.claws-mail/RSSyl/Fedora Planet/ and loads and parses it upon refreshing the feed, but doesn't recognize it as seen before.

Comment 6 Michael Schwendt 2014-09-13 21:33:16 UTC
Examined this further..

rssyl_cb_feed_compare() doesn't recognize the post as seen before, because somewhere a different "id" string is generated for it:

  The compare function searches for this id (a):
    http://blog-ciah.rhcloud.com/

  Claws Mail saved this id (b):
    http://blog-ciah.rhcloud.com/<540F2280.9050400>

rssyl_parse_folder_item_file() strips off the angle-bracketed part (<540F2280.9050400>) from the Message-ID, and the Claws Mail debug output shows:

  RSSyl: got id 'http://blog-ciah.rhcloud.com/'

Comment 7 Michael Schwendt 2014-09-14 08:59:23 UTC
A fix has been committed upstream:

http://git.claws-mail.org/?p=claws.git;a=commitdiff;h=9e4d6e44b4b9324109d5d1aa045a94332109cc96

Comment 8 Petr Pisar 2014-10-21 14:59:36 UTC
I have no idea how the RSS plug-in interacts with the e-mail client, but the RSS specification states that the RSS guid element must be considered an opaque string without any meaning. By contrast, the commit parses something like an e-mail address. I hope upstream knows what it is doing.

(Yes, I'm the editor of the blog.)

Comment 9 Andreas Bierfert 2015-01-16 10:34:00 UTC
Should be fixed in 3.11.1

