Bug 806029

Summary: Ktorrent crashes during startup.
Product: [Fedora] Fedora
Reporter: Leonid Zhaldybin <lzhaldyb>
Component: ktorrent
Assignee: Roland Wolters <roland.wolters>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Docs Contact:
Priority: unspecified
Version: 17
CC: alekcejk, esammons, kevin, law, mcrha, mkrizek, rdieter, roland.wolters, smparrish
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 806070
Environment:
Last Closed: 2012-03-29 18:51:36 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 806070    
Bug Blocks:    
Attachments:
Description                                  Flags
Ktorrent backtrace.                          none
A few backtraces against the newer glibc.    none

Description Leonid Zhaldybin 2012-03-22 17:38:41 UTC
Created attachment 572048
Ktorrent backtrace.

Description of problem:
After I added a few torrents to Ktorrent, starting it up became quite a problem. It crashes every time I try to start it, and I have to hit the "restart application" button in the KDE Crash Handler a few times to finally get it running.

Version-Release number of selected component (if applicable):
ktorrent-4.2.0-1.fc17.x86_64

How reproducible:
always

Steps to Reproduce:
1. Add a few torrents to Ktorrent.
2. Close it and try to start it up again.
3. Ktorrent will crash.
  
Actual results:
Ktorrent crashes on startup.

Expected results:
Ktorrent starts up successfully.

Additional info:

Comment 1 Leonid Zhaldybin 2012-03-22 18:45:35 UTC
I've discovered that if nscd is running, then ktorrent starts up just fine. So, to work around this problem, do the following:

yum install nscd
service nscd start

Comment 2 nucleo 2012-03-22 18:54:28 UTC
I can't reproduce this, so please report this bug upstream:
https://bugs.kde.org/enter_bug.cgi?product=ktorrent

Comment 3 Leonid Zhaldybin 2012-03-22 20:13:39 UTC
(In reply to comment #2)
> I can't reproduce this, so please report this bug upstream:
> https://bugs.kde.org/enter_bug.cgi?product=ktorrent

I've already reported this to upstream (see https://bugs.kde.org/show_bug.cgi?id=296560). Turns out, this is probably a bug in glibc. I'm creating a clone.

Comment 4 Jeff Law 2012-03-23 04:27:33 UTC
Can you verify what version of nscd is installed?  Can you also let me know if this system was updated from glibc-2.14-whatever to glibc-2.15-whatever at some point?  

Jeff

Comment 5 Leonid Zhaldybin 2012-03-23 17:55:14 UTC
(In reply to comment #4)
> Can you verify what version of nscd is installed?  Can you also let me know if
> this system was updated from glibc-2.14-whatever to glibc-2.15-whatever at some
> point?  
> 
> Jeff

The version of nscd package is nscd-2.15-28.fc17.x86_64. Sorry I didn't provide this information right away.
As for the glibc version, I installed Fedora 17 from the Fedora-17-Alpha-x86_64-Live-KDE CD, which has glibc-2.15-11.fc17.x86_64 on it. The version I currently have installed is glibc-2.15-28.fc17.x86_64. So, the answer to the second question is: no, there was no update from glibc-2.14 to glibc-2.15.

Leonid.

Comment 6 Jeff Law 2012-03-27 03:18:27 UTC
Thanks Leonid.  One theory bites the dust.

I've got another theory; specifically nscd_get_nl_timestamp appears to call nscd_get_mapping w/o locking the hst_map_handle data structure.

I see a potential data race on hst_map_handle_mapped (*mappedp in nscd_get_mapping).  A data race would easily account for the unpredictable behaviour.

I don't have an environment handy to reproduce this locally and test that theory.  If I built some test rpms could you test them?
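
For readers following the theory above, here is a minimal sketch of the general pattern being described: a lazily created shared mapping whose pointer is only checked and used while a lock is held, with the lock released on every path. This is not glibc's code. The names `struct mapping`, `mapped`, `map_lock`, `get_mapping_locked` and `get_timestamp` are hypothetical stand-ins for illustration, and the value 42 is an arbitrary placeholder.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a lazily created shared mapping (not glibc's type). */
struct mapping {
    int refcount;
    long timestamp;
};

static struct mapping the_mapping;   /* backing storage for this sketch */
static struct mapping *mapped;       /* shared pointer, NULL until first use */
static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;

/* Must be called with map_lock held: creates the mapping on first use
 * and takes a reference on it. */
static struct mapping *get_mapping_locked(void)
{
    if (mapped == NULL) {
        the_mapping.refcount = 0;
        the_mapping.timestamp = 42;  /* arbitrary placeholder value */
        mapped = &the_mapping;
    }
    mapped->refcount++;
    return mapped;
}

/* Reads the timestamp while holding the lock, so no other thread can see
 * or modify the shared pointer between the NULL check and the use. */
long get_timestamp(void)
{
    long ts;

    pthread_mutex_lock(&map_lock);
    struct mapping *m = get_mapping_locked();
    ts = m->timestamp;
    m->refcount--;                   /* drop the reference taken above */
    assert(m->refcount >= 0);
    pthread_mutex_unlock(&map_lock); /* release before returning */
    return ts;
}
```

Without the lock, two threads could both observe `mapped == NULL`, or one could use the pointer while another is still setting it up, which is the kind of unpredictable behaviour a data race produces.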

Comment 7 Jeff Law 2012-03-27 04:38:08 UTC
http://koji.fedoraproject.org/koji/taskinfo?taskID=3935345

Contains a test fix for the above-mentioned issue.  If you could give it a spin and see if it fixes the faults in ktorrent, I'd be grateful.

It's a scratch build, so I have no idea how long it'll be available.

Comment 8 Jeff Law 2012-03-27 06:14:55 UTC
Ignore the build in c#7. I suspect using it will lock up your system, as it fails to release the acquired lock. This should be better:

http://koji.fedoraproject.org/koji/taskinfo?taskID=3935519

Comment 9 Leonid Zhaldybin 2012-03-27 18:30:31 UTC
Jeff,
I installed your build from c8:
# rpm -qa | egrep  "glibc|nscd" | sort
glibc-2.15-30.fc17.x86_64
glibc-common-2.15-30.fc17.x86_64
glibc-debuginfo-2.15-30.fc17.x86_64
glibc-debuginfo-common-2.15-30.fc17.x86_64
glibc-devel-2.15-30.fc17.x86_64
glibc-headers-2.15-30.fc17.x86_64
nscd-2.15-30.fc17.x86_64

Unfortunately, that didn't do the trick. The ktorrent application still fails to start approx. 8 times out of 10 - same as before. And, same as before, it happens only if nscd is not running.
I'm going to attach a couple of backtraces, just in case they are different from the original one.
If I can help in any way, just let me know.

Comment 10 Leonid Zhaldybin 2012-03-27 18:31:34 UTC
Created attachment 573149
A few backtraces against the newer glibc.

Comment 11 Jeff Law 2012-03-27 18:45:38 UTC
Thanks.  Those backtraces are significantly different; which makes me wonder if I goof'd the locking...  Interestingly enough they do show us going through the nscd_get_nl_timestamp routine like I suspected (via check_pf).   So, well, hopefully I just typo'd or something in the locking code I added.

Expect new rpms reasonably soon.

Comment 12 Jeff Law 2012-03-27 19:53:35 UTC
I may have prematurely considered this the same failure as all the faults in nscd_get_mapping because the original kcrash files indicated you were at the same location in nscd_get_mapping when your crash occurred.

However, in your crash, according to kcrash it's thread #1 that is faulting inside pthread_cond_wait; that's a significant difference from the other reports, which are faulting in nscd_get_mapping.

Access to the raw core file and rpm -q -a list of the RPMs on your box might be helpful in diagnosing this further.  It'd be nice to rule out kcrash simply reporting the wrong active thread; furthermore, assuming kcrash has the right thread, examining the register state would be useful.

Comment 13 Jeff Law 2012-03-28 17:55:21 UTC
As I mentioned in c#12, I'm not 100% sure your problem is the same as the reported crashes in nscd_get_mapping, but I'm treating them as related until I can prove it otherwise.

Another gent testing this stuff noticed an oversight in the last patch which this set of RPMS fixes.  If you could test them it'd be greatly appreciated.

http://koji.fedoraproject.org/koji/taskinfo?taskID=3940526

Given I haven't bumped the NVR, you'll probably have to use the --force option to get RPM to install the newer version.

Comment 14 Leonid Zhaldybin 2012-03-28 19:50:04 UTC
It's good news this time!
After installing the build from c13, I ran a simple script to test it. And ktorrent started up 200 times without a single crash. It seems that the issue is fixed, at least on my platform.
I have only an x86_64 installation at hand. I'll try to test this build on i686 in a virtual machine.
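
The script used for the 200 start-ups is not included in this report. As a rough sketch of the same idea, the helper below runs a command repeatedly and counts the runs that did not finish cleanly. The function name `count_failures` is hypothetical, and the command that starts the application, waits briefly and shuts it down again is an assumption left to the caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Run `cmd` through the shell `runs` times and return how many runs
 * did not exit with status 0. */
int count_failures(const char *cmd, int runs)
{
    assert(cmd != NULL && runs >= 0);

    int failures = 0;
    for (int i = 0; i < runs; i++) {
        int status = system(cmd);
        /* A crash shows up either as termination by a signal or,
         * when reported by the shell, as a non-zero exit code. */
        if (status == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failures++;
    }
    return failures;
}
```

Calling `count_failures(cmd, 200)` with a suitable start-and-stop command and getting 0 back would correspond to the result reported above.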

Comment 15 Jeff Law 2012-03-29 16:27:34 UTC
*** Bug 806070 has been marked as a duplicate of this bug. ***

Comment 16 Jeff Law 2012-03-29 18:51:36 UTC
Patch installed into rawhide & f17.

Comment 17 Milan Crha 2012-03-30 12:24:43 UTC
(In reply to comment #16)
> Patch installed into rawhide & f17.

fixed in version...?

Comment 18 Martin Bříza 2012-07-18 08:29:12 UTC
*** Bug 797783 has been marked as a duplicate of this bug. ***