Bug 458513

Summary: Concurrent connections fail randomly in Konqueror, Kmail, Akregator, etc.
Product: [Fedora] Fedora Reporter: Sterling Winter <sterling.winter>
Component: kdelibs Assignee: Than Ngo <than>
Status: CLOSED UPSTREAM QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: medium    
Version: 9 CC: kevin, ltinkl, rdieter, tuxbrewr
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-09-06 17:49:33 UTC Type: ---

Description Sterling Winter 2008-08-09 04:16:33 UTC
Description of problem:
Multiple concurrent DNS requests cause network-based KDE 4.1 applications to fail randomly. I've experienced this problem in Konqueror, Kmail, and Akregator. Upstream reports that KTorrent, KGet, and probably other applications are also affected.

Version-Release number of selected component (if applicable):
kdelibs-4.1.0-4.fc9.x86_64 (updates-testing)
kdebase-4.1.0-1.fc9.1.x86_64 (updates-testing)
kdepim-4.1.0-2.fc9.x86_64 (kde-unstable)
kdepim-libs-4.1.0-2.fc9.x86_64 (kde-unstable)
kdepimlibs-4.1.0-2.fc9.x86_64 (updates)

How reproducible:
Random, but very frequent

Steps to Reproduce:
Konqueror: Visit a page with many images, e.g. http://cnn.com

Kmail: Use the "Check Mail" button to check 2 or more accounts simultaneously (I tested with 8 POP accounts, mostly on the same POP3 server)

Akregator: Use the "Fetch All Feeds" button to check 2 or more feeds simultaneously (I tested with 5 active feeds)
  
Actual results:
Konqueror: Extremely slow loads on pages with many objects; sometimes sub-requests time out resulting in missing images, CSS, JavaScript, etc.

Kmail: Mail check fails randomly for roughly 50% of accounts and displays "Could not connect to host pop.foo.net: Unknown error." for each failure

Akregator: Feed update fails randomly for roughly 25% of the active feeds, resulting in red "X" icons next to each failed feed

Expected results:
Smooth operation of concurrent requests with no failures.

Additional info:
The problem arises when multiple DNS requests for the same domain are sent at once through a router with deficient DNS caching. Fixed upstream:
http://bugs.kde.org/show_bug.cgi?id=162600

The fix was committed for inclusion in KDE 4.2, but not (yet) backported to 4.1:
http://websvn.kde.org/?view=rev&revision=830140

This patch really should be backported to Fedora's kdelibs before 4.1 goes into stable.
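
For anyone who wants to check whether their router shows the same behaviour outside of KDE, here is a minimal sketch in Python that fires many lookups for the same hostname at once and counts failures. It uses the system resolver (`socket.getaddrinfo`), not the KDE resolver, so it only exercises the router side of the problem. The function name `concurrent_lookups` and its parameters are made up for illustration, and a local DNS cache (e.g. nscd) may mask the problem.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def concurrent_lookups(hostname, count=20, resolver=socket.getaddrinfo):
    """Resolve `hostname` `count` times in parallel; return (succeeded, failed).

    `resolver` is injectable so the function can be exercised without a network.
    """

    def one_lookup(_):
        try:
            resolver(hostname, 80)
            return True
        except OSError:  # socket.gaierror is a subclass of OSError
            return False

    # One worker per lookup so all requests are in flight at the same time.
    with ThreadPoolExecutor(max_workers=count) as pool:
        results = list(pool.map(one_lookup, range(count)))

    succeeded = sum(results)
    return succeeded, count - succeeded
```

Calling `concurrent_lookups("cnn.com")` on an affected network would be expected to report a non-zero failure count on at least some runs; on a healthy resolver all lookups should succeed.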

Comment 1 Kevin Kofler 2008-08-09 04:47:17 UTC
Is that with kdepim 4? Because the official kdepim in F9 is 3.5.9 and KMail and Akregator don't use kdelibs 4 at all.

Comment 2 Sterling Winter 2008-08-09 05:06:15 UTC
(In reply to comment #1)
> Is that with kdepim 4? Because the official kdepim in F9 is 3.5.9 and KMail and
> Akregator don't use kdelibs 4 at all.

Yes indeed, see above under "Version-Release number of selected component (if applicable)". However, Konqueror and KGet from updates-testing are also affected (I've now tested KGet, though I still haven't tested KTorrent). Many F9 users of KDE 4.1 will get bitten by this regardless of whether they're using unofficial kdepim or not.

I should mention one suggested workaround for this is to bypass your router's DNS caching and configure your machines to use your ISP's DNS servers directly. For me this meant changing "Method" from "DHCP" to "Manual" and entering a static IP address, subnet mask, gateway, and my ISP's DNS server IPs in the "IPv4 Settings" tab of my connection config dialog in NetworkManager. This seems to work perfectly for me so far, but at the cost of DHCP flexibility.
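
Before switching to manual settings, it may help to compare how the router and the ISP's DNS server answer the same query. The following is a rough sketch, assuming plain DNS over UDP on port 53. The helper names `build_query` and `query_server` are hypothetical, and the server addresses you pass in are whatever your own router and ISP use. It sends a single A-record query, so it is a diagnostic aid only, not a reproduction of the concurrent case.

```python
import socket
import struct


def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet asking for the A record of `hostname`."""
    # Header: id, flags (recursion desired), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def query_server(server_ip, hostname, timeout=3.0):
    """Send one A query over UDP.

    Returns the DNS response code (0 means no error), or None on timeout
    or if the reply is too short to contain a DNS header.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(hostname), (server_ip, 53))
        try:
            reply, _ = sock.recvfrom(512)
        except socket.timeout:
            return None
    if len(reply) < 12:
        return None
    flags = struct.unpack("!H", reply[2:4])[0]
    return flags & 0x000F  # low 4 bits of the flags word hold the RCODE
```

For example, `query_server("192.168.1.1", "pop.foo.net")` versus the same call with the ISP's server address (192.168.1.1 is only a placeholder for a typical router address). If the router times out or returns a non-zero code where the ISP's server does not, that points at the router's DNS cache.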

Comment 3 Kevin Kofler 2008-08-09 05:14:58 UTC
Looks like we need both these changesets:
http://websvn.kde.org/?view=rev&revision=830140
http://websvn.kde.org/?view=rev&revision=832072
with the second fixing a regression in the first: http://bugs.kde.org/show_bug.cgi?id=167166

Note that the real problem is that your router is broken, not KDE, the patches only work around that.

Comment 4 Rex Dieter 2008-08-09 05:27:49 UTC
AFAICT, the upstream bug is not completely resolved, and the fix has not been officially backported to the 4.1 branch (correct me if I'm wrong), so I'd be hesitant about patching anything here yet. Just my $0.02.

Comment 5 Kevin Kofler 2008-08-09 05:40:06 UTC
kde#167166 is the regression from the first changeset, fixed by the second.

Comment 6 Sterling Winter 2008-08-09 05:41:24 UTC
(In reply to comment #3)
> Looks like we need both these changesets:
> http://websvn.kde.org/?view=rev&revision=830140
> http://websvn.kde.org/?view=rev&revision=832072
> with the second fixing a regression in the first:
> http://bugs.kde.org/show_bug.cgi?id=167166

Oops, I forgot to follow up on that reported SSL regression. Need more caffeine.
 
> Note that the real problem is that your router is broken, not KDE, the patches
> only work around that.

Well, so far this is known to affect a popular Netgear model, a commercial 2Wire gateway (what I use currently, but won't be after Monday!), and the popular Fritz!Box.

According to Thiago Macieira [1], the reason this started happening now is that kdelibs4 might be doing something different with DNS requests than was done before. So while crappy routers do seem to be the main culprit, it's possible kdelibs4 is making things worse.

[1] http://bugs.kde.org/show_bug.cgi?id=162600#c18

Comment 7 Kevin Kofler 2008-08-09 12:34:54 UTC
Oh, and was this working in 4.0? Or have you only tried 4.1? According to the upstream bug, this looks like it was broken in 4.0 too, which would make this definitely not a 4.1 update blocker. The patch is also reported to be incomplete and not to fix the problem for everyone affected. So I think we want to wait for a proper fix there and push a separate update once that's out.

Comment 8 Sterling Winter 2008-08-09 18:37:47 UTC
The only time I've used KDE 4.0 was on the LiveCD/DVD spins, but with those I did experience the concurrent request slowdowns in Konqueror (I didn't try any other affected apps). I agree this issue isn't necessarily a blocker, especially since there's a simple and effective workaround, but I do feel it's a strong "should fix" because there will be more complaints (and dupes). ;)

Comment 9 Rex Dieter 2008-09-06 17:49:33 UTC
Looks like we'll just have to wait until upstream sorts this all out properly.  We'll continue to monitor this there.