Bug 711329 - During cds sync goferd on all cds nodes crashes very often
Summary: During cds sync goferd on all cds nodes crashes very often
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Update Infrastructure for Cloud Providers
Classification: Red Hat
Component: Upstream
Version: 2.0
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Jay Dobies
QA Contact: wes hayutin
URL:
Whiteboard:
Duplicates: 711389 (view as bug list)
Depends On:
Blocks:
 
Reported: 2011-06-07 08:15 UTC by Sachin Ghai
Modified: 2013-02-14 14:35 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-05-31 12:56:42 UTC
Target Upstream Version:
Embargoed:


Attachments
Multithreaded script that crashes NSS (2.63 KB, text/x-python)
2011-06-22 13:15 UTC, John Matthews

Description Sachin Ghai 2011-06-07 08:15:39 UTC
Description of problem:
When I tried to restart the pulp-cds service, I got the following:

[root@dhcp193-65 ~]# service pulp-cds restart
Stopping httpd:                                            [  OK  ]
Stopping goferd                                            [  OK  ]
goferd (7103) already running.
Starting httpd:                                            [  OK  ]
[root@dhcp193-65 ~]# service pulp-cds restart
Stopping httpd:                                            [  OK  ]
Stopping goferd                                            [FAILED]
Starting goferd                                            [  OK  ]
Starting httpd:                                            [  OK  ]
[root@dhcp193-65 ~]# 


Version-Release number of selected component (if applicable):
pulp 0.186
rhui-tools 2.0.26

How reproducible:
Very frequently. I found this when I unregistered the CDS from rhui-manager and tried to restart pulp-cds on the CDS node.

Steps to Reproduce:
1. Unregister the CDS from rhui-manager.
2. On the CDS node, run service pulp-cds restart.
3. Note that stopping goferd fails, or goferd is reported as already running.
  
Actual results:

[root@dhcp193-65 ~]# service pulp-cds restart
Stopping httpd:                                            [  OK  ]
Stopping goferd                                            [  OK  ]
goferd (7103) already running.
Starting httpd:                                            [  OK  ]
[root@dhcp193-65 ~]# service pulp-cds restart
Stopping httpd:                                            [  OK  ]
Stopping goferd                                            [FAILED]
Starting goferd                                            [  OK  ]
Starting httpd:                                            [  OK  ]
[root@dhcp193-65 ~]# 


Expected results:
The pulp-cds service should restart goferd cleanly.

Additional info:

Comment 1 Jay Dobies 2011-06-07 12:50:45 UTC
Yesterday I was running an environment and gofer randomly stopped on me in the middle of a sync. I thought it was a fluke, but between this bug and 711389, I'm thinking gofer is crashing.

Sachin - Can you post the gofer version you're using?

Jeff - Can you take a look at this?

Comment 2 Jay Dobies 2011-06-07 12:52:34 UTC
*** Bug 711389 has been marked as a duplicate of this bug. ***

Comment 3 Sachin Ghai 2011-06-07 12:57:01 UTC
Gofer on RHUA/PULP server:
======================

[root@dhcp193-79 pulp]# rpm -qa | grep gofer
python-gofer-0.38-1.el6.noarch
gofer-0.38-1.el6.noarch
[root@dhcp193-79 pulp]# 


Gofer on CDS node:
==================
[root@dhcp193-65 ~]# rpm -qa |grep gofer
gofer-0.38-1.el6.noarch
python-gofer-0.38-1.el6.noarch
[root@dhcp193-65 ~]#

Comment 4 Jeff Ortel 2011-06-07 13:55:48 UTC
(In reply to comment #1)
> Yesterday I was running an environment and gofer randomly stopped on me in the
> middle of a sync. I thought it was a fluke, but between this bug and 711389,
> I'm thinking gofer is crashing.
> 
> Sachin - Can you post the gofer version you're using?
> 
> Jeff - Can you take a look at this?

Sure.  According to bug 711389, there is nothing in the gofer log.  Is this true in all cases? Can you enable core files, or some other means of detecting the Python interpreter core dumping?  Do you think this happens when the node is not performing a CDS sync?
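For example, since goferd is a Python daemon, a minimal sketch of raising the core limit from inside the process (assuming the daemon runs as root; the usual alternative is ulimit -c unlimited in the shell or init script that starts it) would be:

import resource

# Allow unlimited-size core files for this process and any children it
# forks; equivalent to running `ulimit -c unlimited` in the shell that
# starts the daemon. Raising the hard limit needs sufficient privileges.
resource.setrlimit(resource.RLIMIT_CORE,
                   (resource.RLIM_INFINITY, resource.RLIM_INFINITY))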

Comment 5 Jeff Ortel 2011-06-07 14:38:03 UTC
John Matthews has seen the underlying C libs used by grinder core-dump the Python interpreter on F12, which I /think/ may line up with what's in EL6.  Have we seen this on the other platforms?

Comment 6 Kedar Bidarkar 2011-06-08 05:42:08 UTC
Not sure, but this happens mainly when we try to sync numerous repos
simultaneously.

Something like RHEL 5, RHEL 6, and RHUI 1.2 content.



------------------------------------------------------------------------------
             -= Red Hat Update Infrastructure Management Tool =-


-= Content Delivery Server (CDS) Management =-

   l   list all CDS instances registered to the RHUI
   a   register (add) a new CDS instance
   d   unregister (delete) a CDS instance from the RHUI
   r   manage repositories hosted on a CDS instance

                                  Connected: dhcp201-149.englab.pnq.redhat.com
------------------------------------------------------------------------------
rhui (cds) => r

Select the CDS instance to manage repositories: 
  1  - cds1-196
  2  - cds2-101
Enter value (1-2) or 'b' to abort: 1

------------------------------------------------------------------------------
             -= Red Hat Update Infrastructure Management Tool =-


-= CDS Repositories Management - cds1-196 =-

   l   list repositories deployed on this CDS
   a   deploy (add) a repository to this CDS
   d   undeploy (delete) a repository from this CDS

                                  Connected: dhcp201-149.englab.pnq.redhat.com
------------------------------------------------------------------------------
rhui (cds1-196) => l

-= CDS Repositories =-

Custom Repositories
  Qpid

Red Hat Repositories
  Red Hat Update Infrastructure 1.2 (SRPMS) (5Server-x86_64)
  Red Hat Update Infrastructure 1.2 (RPMs) (5Server-x86_64)
  Red Hat Enterprise Linux Server 6 Releases (RPMs) (6Server-x86_64)


------------------------------------------------------------------------------
rhui (cds1-196) => <

------------------------------------------------------------------------------
             -= Red Hat Update Infrastructure Management Tool =-


-= Content Delivery Server (CDS) Management =-

   l   list all CDS instances registered to the RHUI
   a   register (add) a new CDS instance
   d   unregister (delete) a CDS instance from the RHUI
   r   manage repositories hosted on a CDS instance

                                  Connected: dhcp201-149.englab.pnq.redhat.com
------------------------------------------------------------------------------
rhui (cds) => r

Select the CDS instance to manage repositories: 
  1  - cds1-196
  2  - cds2-101
Enter value (1-2) or 'b' to abort: 2

------------------------------------------------------------------------------
             -= Red Hat Update Infrastructure Management Tool =-


-= CDS Repositories Management - cds2-101 =-

   l   list repositories deployed on this CDS
   a   deploy (add) a repository to this CDS
   d   undeploy (delete) a repository from this CDS

                                  Connected: dhcp201-149.englab.pnq.redhat.com
------------------------------------------------------------------------------
rhui (cds2-101) => l

-= CDS Repositories =-

Custom Repositories
  Qpid

Red Hat Repositories
  Red Hat Enterprise Linux Server (RPMs) (5Server-x86_64)
  Red Hat Update Infrastructure 1.2 (SRPMS) (5Server-x86_64)
  Red Hat Update Infrastructure 1.2 (RPMs) (5Server-x86_64)
  Red Hat Enterprise Linux Server 6 Releases (RPMs) (6Server-x86_64)
  Red Hat Enterprise Linux Server 6 Releases (RPMs) (6Server-i386)


------------------------------------------------------------------------------
rhui (cds2-101) =>

Comment 7 Kedar Bidarkar 2011-06-08 05:47:13 UTC
------------------------------------------------------------------------------
             -= Red Hat Update Infrastructure Management Tool =-


-= CDS Synchronization Status =-

Last Refreshed: 15:19:51
(updated every 5 seconds, ctrl+c to exit)


cds1-196 .................................................... [  UP  ]
cds2-101 .................................................... [ DOWN ]


Next Sync                    Last Sync                    Last Result         
------------------------------------------------------------------------------
cds1-196
06-08-2011 20:27             06-08-2011 08:29             scheduled  

cds2-101
06-08-2011 17:46             06-08-2011 04:28             scheduled  


                                  Connected: dhcp201-149.englab.pnq.redhat.com



[root@dhcp201-101 ~]# ps -ef | grep -i goferd
root      4562  4129  0 15:21 pts/0    00:00:00 grep -i goferd

Comment 8 Jeff Ortel 2011-06-08 13:17:10 UTC
It would be very helpful if you could enable core dumps and re-run your tests.  Can you do that?  Once we have a core file, we can determine what is crashing goferd.

Comment 9 Sachin Ghai 2011-06-08 14:12:17 UTC
I enabled core dumps on the CDS node (10.65.193.65) and started the CDS sync.

So far I've found the following crashes. I'll update you with more shortly.

[root@dhcp193-65 ~]# abrt-cli  -l
0.
	UID        : 0
	UUID       : 8dcec7d7
	Package    : grinder-0.0.100-1.el6
	Executable : /usr/bin/grinder
	Crash Time : Fri 03 Jun 2011 06:08:36 PM IST
	Crash Count: 1
	Hostname   : dhcp193-65.pnq.redhat.com
1.
	UID        : 0
	UUID       : 2ab0393d0b5cdfecd3f45b732ae269d26fc30ecd
	Package    : curl-7.19.7-16.el6
	Executable : /usr/bin/curl
	Crash Time : Fri 03 Jun 2011 08:14:00 PM IST
	Crash Count: 1
	Hostname   : dhcp193-65.pnq.redhat.com
2.
	UID        : 0
	UUID       : 1d0528c75b9fe2e1fae6b6f88df2a7b9c07e05c9
	Package    : gofer-0.38-1.el6
	Executable : /usr/bin/python
	Crash Time : Mon 06 Jun 2011 03:02:24 PM IST
	Crash Count: 1
	Hostname   : dhcp193-65.pnq.redhat.com
3.
	UID        : 0
	UUID       : 16ab3b244fa26af4de9672377f1c13a975566f3b
	Package    : gofer-0.38-1.el6
	Executable : /usr/bin/python
	Crash Time : Tue 07 Jun 2011 06:53:40 PM IST
	Crash Count: 3
	Hostname   : dhcp193-65.pnq.redhat.com


[root@dhcp193-65 ~]# abrt-cli -i 1d0528c75b9fe2e1fae6b6f88df2a7b9c07e05c9
>> Generating backtrace
Crash ID:           0:1d0528c75b9fe2e1fae6b6f88df2a7b9c07e05c9
Last crash:         Mon 06 Jun 2011 03:02:24 PM IST
Analyzer:           CCpp
Component:          gofer
Package:            gofer-0.38-1.el6
Command:            python /usr/bin/goferd
Executable:         /usr/bin/python
System:             Red Hat Enterprise Linux Server release 6.0 (Santiago), kernel 2.6.32-71.el6.x86_64
Reason:             Process /usr/bin/python was killed by signal 11 (SIGSEGV)
Coredump file:      /var/spool/abrt/ccpp-1307352744-6105/coredump
Rating:             0
Hostname:           dhcp193-65.pnq.redhat.com


[root@dhcp193-65 ~]# abrt-cli -i 16ab3b244fa26af4de9672377f1c13a975566f3b
>> Generating backtrace
Crash ID:           0:16ab3b244fa26af4de9672377f1c13a975566f3b
Last crash:         Mon 06 Jun 2011 05:05:01 PM IST
Analyzer:           CCpp
Component:          gofer
Package:            gofer-0.38-1.el6
Command:            python /usr/bin/goferd
Executable:         /usr/bin/python
System:             Red Hat Enterprise Linux Server release 6.0 (Santiago), kernel 2.6.32-71.el6.x86_64
Reason:             Process /usr/bin/python was killed by signal 6 (SIGABRT)
Coredump file:      /var/spool/abrt/ccpp-1307360101-6731/coredump
Rating:             0
Crash function:     __libc_message
Hostname:           dhcp193-65.pnq.redhat.com
[root@dhcp193-65 ~]#

Comment 10 Sachin Ghai 2011-06-09 08:06:16 UTC
Hi Jeff, the core file has been generated; it is the latest core file, from when the new crash was detected.

[root@dhcp193-65 ccpp-1307604130-21051]# abrt-cli  -i 32e6668bd54c0f4fc387e243cca4af5b19983b29
Crash ID:           0:32e6668bd54c0f4fc387e243cca4af5b19983b29
Last crash:         Thu 09 Jun 2011 12:52:10 PM IST
Analyzer:           CCpp
Component:          gofer
Package:            gofer-0.38-1.el6
Command:            python /usr/bin/goferd
Executable:         /usr/bin/python
System:             Red Hat Enterprise Linux Server release 6.0 (Santiago), kernel 2.6.32-71.el6.x86_64
Reason:             Process /usr/bin/python was killed by signal 6 (SIGABRT)
Coredump file:      /var/spool/abrt/ccpp-1307604130-21051/coredump
Rating:             0
Crash function:     __libc_message
Hostname:           dhcp193-65.pnq.redhat.com
[root@dhcp193-65 ccpp-1307604130-21051]# 


The core file is around 223 MB, so please check it on my test node 10.65.193.65 under /var/spool/abrt/ccpp-1307604130-21051/coredump

[root@dhcp193-65 ccpp-1307604130-21051]# pwd
/var/spool/abrt/ccpp-1307604130-21051
[root@dhcp193-65 ccpp-1307604130-21051]# ll -h coredump 
-rw-r--r--. 1 root root 223M Jun  9 12:52 coredump
[root@dhcp193-65 ccpp-1307604130-21051]#

Comment 11 Jeff Ortel 2011-06-09 14:25:19 UTC
The stack trace for the offending thread shows that __libc_message(), invoked from cURL (pycurl), seems to be the culprit.

trace:
Thread 1 (Thread 0x7f1900dfa710 (LWP 21144)):
#0  0x00000031956329a5 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x0000003195634185 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x000000319566fd5b in __libc_message () from /lib64/libc.so.6
No symbol table info available.
#3  0x0000003195675676 in malloc_printerr () from /lib64/libc.so.6
No symbol table info available.
#4  0x00007f19185a1e72 in ?? () from /usr/lib64/libnsspem.so
No symbol table info available.
#5  0x00007f19185934f4 in ?? () from /usr/lib64/libnsspem.so
No symbol table info available.
#6  0x00007f191859364d in ?? () from /usr/lib64/libnsspem.so
No symbol table info available.
#7  0x00007f19185a0e0f in ?? () from /usr/lib64/libnsspem.so
No symbol table info available.
#8  0x00007f191859cdf1 in ?? () from /usr/lib64/libnsspem.so
No symbol table info available.
#9  0x0000003ad2046b71 in PK11_Sign () from /usr/lib64/libnss3.so
No symbol table info available.
#10 0x0000003ad340e620 in ?? () from /usr/lib64/libssl3.so
No symbol table info available.
#11 0x0000003ad340f159 in ?? () from /usr/lib64/libssl3.so
No symbol table info available.
#12 0x0000003ad3412860 in ?? () from /usr/lib64/libssl3.so
No symbol table info available.
#13 0x0000003ad3413e30 in ?? () from /usr/lib64/libssl3.so
No symbol table info available.
#14 0x0000003ad34148cc in ?? () from /usr/lib64/libssl3.so
No symbol table info available.
#15 0x0000003ad3414aaa in ?? () from /usr/lib64/libssl3.so
No symbol table info available.
#16 0x0000003ad341e262 in ?? () from /usr/lib64/libssl3.so
No symbol table info available.
#17 0x0000003ad3421fa2 in ?? () from /usr/lib64/libssl3.so
No symbol table info available.
#18 0x0000003ad403ef5d in Curl_nss_recv () from /usr/lib64/libcurl.so.4
No symbol table info available.
#19 0x0000003ad4037de3 in Curl_ssl_recv () from /usr/lib64/libcurl.so.4
No symbol table info available.
#20 0x0000003ad401744c in Curl_read () from /usr/lib64/libcurl.so.4
No symbol table info available.
#21 0x0000003ad4029f72 in Curl_readwrite () from /usr/lib64/libcurl.so.4
No symbol table info available.
#22 0x0000003ad402bd18 in Curl_perform () from /usr/lib64/libcurl.so.4
No symbol table info available.
#23 0x00007f191f16f73b in ?? () from /usr/lib64/python2.6/site-packages/pycurl.so
No symbol table info available.
#24 0x0000003196ede81e in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
No symbol table info available.
#25 0x0000003196ee05a4 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.6.so.1.0
No symbol table info available.
#26 0x0000003196ede61e in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
No symbol table info available.
#27 0x0000003196edf52d in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
No symbol table info available.
#28 0x0000003196edf52d in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
No symbol table info available.
#29 0x0000003196edf52d in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
No symbol table info available.
#30 0x0000003196ee05a4 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.6.so.1.0
No symbol table info available.
#31 0x0000003196e6e9f0 in ?? () from /usr/lib64/libpython2.6.so.1.0
No symbol table info available.
#32 0x0000003196e43e13 in PyObject_Call () from /usr/lib64/libpython2.6.so.1.0
No symbol table info available.
#33 0x0000003196e592ef in ?? () from /usr/lib64/libpython2.6.so.1.0
No symbol table info available.
#34 0x0000003196e43e13 in PyObject_Call () from /usr/lib64/libpython2.6.so.1.0
No symbol table info available.
#35 0x0000003196ed8ac3 in PyEval_CallObjectWithKeywords () from /usr/lib64/libpython2.6.so.1.0
No symbol table info available.
#36 0x0000003196f0b47a in ?? () from /usr/lib64/libpython2.6.so.1.0
No symbol table info available.
#37 0x0000003195a077e1 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#38 0x00000031956e153d in clone () from /lib64/libc.so.6

Comment 12 Jay Dobies 2011-06-16 13:42:34 UTC
Refiled upstream for Pulp's sprint 25: bug 713755

Comment 13 Sachin Ghai 2011-06-17 08:53:55 UTC
Just to update: goferd crashes with the new build as well (pulp 0.190 and rhui-tools 2.0.30).

------------------------------------------------------------------------------
             -= Red Hat Update Infrastructure Management Tool =-


-= CDS Synchronization Status =-

Last Refreshed: 14:17:48
(updated every 5 seconds, ctrl+c to exit)


cds0021 ..................................................... [  UP  ]
cdss00115 ................................................... [ DOWN ]


Next Sync                    Last Sync                    Last Result         
------------------------------------------------------------------------------
cds0021
06-17-2011 14:50             06-17-2011 13:50             finished   

cdss00115
06-17-2011 14:14             06-17-2011 13:41             running    


                                         Connected: dhcp193-163.pnq.redhat.com
------------------------------------------------------------------------------
^Crhui (sync) => 



[root@dhcp193-115 ~]# service pulp-cds restart
Stopping httpd:                                            [  OK  ]
Stopping goferd                                            [FAILED]
Starting goferd                                            [  OK  ]
Starting httpd:                                            [  OK  ]
[root@dhcp193-115 ~]#

Comment 16 John Matthews 2011-06-20 12:15:42 UTC
Below is an example of a crash I started to see in grinder once we moved to using a single PEM cert rather than the broken-out cert/key pair.

Note: this is on Fedora 12.  I see this crash occasionally when syncing large repos, and only on F12.  I still have not seen the crash on my RHEL 6/F14 setups.

Guess: Bug in nss/libcurl
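For reference, the single-PEM setup is roughly the following at the pycurl level (a sketch; the file name is illustrative). With the NSS backend, the embedded private key is handled by libnsspem, which is where the backtraces point:

import pycurl

c = pycurl.Curl()
c.setopt(pycurl.SSLCERTTYPE, "PEM")
c.setopt(pycurl.SSLCERT, "./entitlement.pem")  # certificate and key in one file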

Fedora-12 RPMs (my setup that shows crash occasionally)
nss-3.12.6-12.fc12.x86_64 
libcurl-7.19.7-2.fc12.x86_64
python-pycurl-7.19.0-4.fc12.x86_64

(Referring to QE's setups)
CDS1: RHEL6
nss-3.12.9-9.el6.x86_64
libcurl-7.19.7-26.el6.x86_64
python-pycurl-7.19.0-5.el6.x86_64

CDS2: RHEL6
nss-3.12.9-9.el6.x86_64
libcurl-7.19.7-26.el6.x86_64
python-pycurl-7.19.0-5.el6.x86_64



grinder.BaseFetch: INFO     Fetching 61804 bytes: perl-Digest-SHA-5.47-115.el6.x86_64.rpm from https://sat-perf-03.idm.lab.bos.redhat.com/pulp/repos/content/dist/rhel/rhui/server-6/releases/6Server/x86_64/os/Packages/perl-Digest-SHA-5.47-115.el6.x86_64.rpm
grinder.ParallelFetch: INFO     25 threads are active
grinder.BaseFetch: INFO     Fetching 51844 bytes: mcelog-1.0pre3-0.2.el6.x86_64.rpm from https://sat-perf-03.idm.lab.bos.redhat.com/pulp/repos/content/dist/rhel/rhui/server-6/releases/6Server/x86_64/os/Packages/mcelog-1.0pre3-0.2.el6.x86_64.rpm
*** glibc detected *** /usr/bin/python: double free or corruption (fasttop): 0x00007f2e3c00b780 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3127074576]
/usr/lib64/libnsspem.so(+0x18db2)[0x7f2e67ef7db2]
/usr/lib64/libnsspem.so(+0xa787)[0x7f2e67ee9787]
/usr/lib64/libnsspem.so(+0x9e1e)[0x7f2e67ee8e1e]
/usr/lib64/libnsspem.so(+0xa335)[0x7f2e67ee9335]
/usr/lib64/libnsspem.so(+0x181fe)[0x7f2e67ef71fe]
/usr/lib64/libnsspem.so(+0xdeb9)[0x7f2e67eeceb9]
/usr/lib64/libnsspem.so(+0x12c5c)[0x7f2e67ef1c5c]
/usr/lib64/libnss3.so(PK11_Sign+0xe2)[0x3139c464f2]
/usr/lib64/libssl3.so[0x313a80e5f0]
/usr/lib64/libssl3.so[0x313a80f0f9]
/usr/lib64/libssl3.so[0x313a8127d0]
/usr/lib64/libssl3.so[0x313a813c60]
/usr/lib64/libssl3.so[0x313a81487b]
/usr/lib64/libssl3.so[0x313a8149da]
/usr/lib64/libssl3.so[0x313a81e0f2]
/usr/lib64/libssl3.so[0x313a821e02]
/usr/lib64/libcurl.so.4(Curl_nss_recv+0x6f)[0x313c43c9df]
/usr/lib64/libcurl.so.4(Curl_ssl_recv+0x13)[0x313c435903]
/usr/lib64/libcurl.so.4(Curl_read+0x274)[0x313c4169e4]
/usr/lib64/libcurl.so.4(Curl_readwrite+0x192)[0x313c429372]
/usr/lib64/libcurl.so.4(Curl_perform+0x320)[0x313c42b0d0]
/usr/lib64/python2.6/site-packages/pycurl.so(+0x873b)[0x7f2e6be0173b]
/usr/lib64/libpython2.6.so.1.0(PyEval_EvalFrameEx+0x4e52)[0x31334dc652]
/usr/lib64/libpython2.6.so.1.0(PyEval_EvalCodeEx+0x875)[0x31334de4e5]
/usr/lib64/libpython2.6.so.1.0(PyEval_EvalFrameEx+0x53af)[0x31334dcbaf]
/usr/lib64/libpython2.6.so.1.0(PyEval_EvalFrameEx+0x5877)[0x31334dd077]

Comment 17 John Matthews 2011-06-21 17:29:40 UTC
Attempted to force grinder to split the single PEM cert out into separate cert/key files and use those. That didn't seem to fix the issue; still seeing the crash.
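"Split" here means passing the certificate and key as separate files at the pycurl level, roughly as follows (a sketch; file names are illustrative):

import pycurl

c = pycurl.Curl()
c.setopt(pycurl.SSLCERT, "./ssl_cert.crt")  # certificate only
c.setopt(pycurl.SSLKEY, "./ssl_key.key")    # private key in its own file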

The crash below is from Fedora 14.
nss-3.12.10-1.fc14.i686

Loaded symbols for /usr/lib64/libnsspem.so
Core was generated by `/usr/bin/python /usr/bin/grinder yum --label rhel6_test --cacert ./candlepin-ca'.
Program terminated with signal 6, Aborted.
#0  0x00000033510330c5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00000033510330c5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x0000003351034a76 in abort () at abort.c:92
#2  0x000000335106fcfb in __libc_message (do_abort=2, fmt=0x335115ea98 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:186
#3  0x0000003351075526 in malloc_printerr (action=3, str=0x335115ca55 "corrupted double-linked list", ptr=<value optimized out>) at malloc.c:6283
#4  0x00000033510770b1 in _int_free (av=0x7f44a4000020, p=0x7f44a403fef0, have_lock=0) at malloc.c:4964
#5  0x00007f45095114a2 in nss_ZFreeIf (pointer=0x7f44a404ada0) at arena.c:975
#6  0x00007f4509502a8e in pem_PopulateModulusExponent (io=0x28f6550) at prsa.c:302
#7  0x00007f450950115c in pem_FetchPrivKeyAttribute (io=0x28f6550, type=<value optimized out>) at pobject.c:321
#8  0x00007f4509502365 in pem_mdCryptoOperationRSA_GetFinalLength (mdOperation=<value optimized out>, fwOperation=<value optimized out>, mdSession=<value optimized out>, fwSession=<value optimized out>, mdToken=<value optimized out>, 
    fwToken=<value optimized out>, mdInstance=0x7f450972e660, fwInstance=0x28d2d70, pError=0x7f44e0ff7b78) at prsa.c:420
#9  0x00007f45095106ee in nssCKFWCryptoOperation_GetFinalLength (fwOperation=<value optimized out>, pError=<value optimized out>) at crypto.c:178
#10 0x00007f45095072c1 in nssCKFWSession_UpdateFinal (fwSession=0x7f44ac04ead0, type=<value optimized out>, state=NSSCKFWCryptoOperationState_SignVerify, inBuf=<value optimized out>, inBufLen=36, outBuf=0x7f44b80277a0 "\310\001", outBufLen=
    0x7f44e0ff7c30) at session.c:2218
#11 0x00007f450950d15c in NSSCKFWC_Sign (fwInstance=<value optimized out>, hSession=<value optimized out>, pData=0x7f44e0ff7e50 "N\004\201\202\250\237\021\273gƛu.5L?m\210\002笚70\210s簴\321-v\027\060p\243", ulDataLen=36, pSignature=
    0x7f44b80277a0 "\310\001", pulSignatureLen=0x7f44e0ff7c30) at wrap.c:3822
#12 0x00007f450da9b7d4 in PK11_Sign (key=0x7f44b800fd80, sig=0x7f44e0ff7dd0, hash=0x7f44e0ff7c90) at pk11obj.c:768
#13 0x00007f450dfc8db0 in ssl3_SignHashes (hash=<value optimized out>, key=0x7f44b800fd80, buf=0x7f44e0ff7dd0, isTLS=1) at ssl3con.c:887
#14 0x00007f450dfcef20 in ssl3_SendCertificateVerify (ss=0x7f44b801e520, b=<value optimized out>, length=<value optimized out>) at ssl3con.c:4824
#15 ssl3_HandleServerHelloDone (ss=0x7f44b801e520, b=<value optimized out>, length=<value optimized out>) at ssl3con.c:5734
#16 ssl3_HandleHandshakeMessage (ss=0x7f44b801e520, b=<value optimized out>, length=<value optimized out>) at ssl3con.c:8630
#17 0x00007f450dfd0a2c in ssl3_HandleHandshake (ss=0x7f44b801e520, cText=<value optimized out>, databuf=0x7f44b801e888) at ssl3con.c:8725
#18 ssl3_HandleRecord (ss=0x7f44b801e520, cText=<value optimized out>, databuf=0x7f44b801e888) at ssl3con.c:9064
#19 0x00007f450dfd1ae6 in ssl3_GatherCompleteHandshake (ss=0x7f44b801e520, flags=0) at ssl3gthr.c:209
#20 0x00007f450dfd1cea in ssl3_GatherAppDataRecord (ss=0x7f44b801e520, flags=0) at ssl3gthr.c:251
#21 0x00007f450dfdb4d2 in DoRecv (ss=0x7f44b801e520, buf=0x7f44b8001740 "", len=16384, flags=0) at sslsecur.c:552
#22 ssl_SecureRecv (ss=0x7f44b801e520, buf=0x7f44b8001740 "", len=16384, flags=0) at sslsecur.c:1160
#23 0x00007f450dfdf292 in ssl_Recv (fd=<value optimized out>, buf=0x7f44b8001740, len=16384, flags=0, timeout=4294967295) at sslsock.c:1593
#24 0x000000353ec44c07 in nss_recv (conn=0x7f44b800b080, num=<value optimized out>, buf=<value optimized out>, buffersize=<value optimized out>, curlcode=0x7f44e0ff810c) at nss.c:1451
#25 0x000000353ec19e22 in Curl_read (conn=0x7f44b800b080, sockfd=<value optimized out>, buf=0x7f44b8001740 "", sizerequested=16384, n=0x7f44e0ff8190) at sendf.c:579
#26 0x000000353ec2c14d in readwrite_data (conn=0x7f44b800b080, done=0x7f44e0ff824e) at transfer.c:396
#27 Curl_readwrite (conn=0x7f44b800b080, done=0x7f44e0ff824e) at transfer.c:1004
#28 0x000000353ec2dd47 in Transfer (data=0x7f44b8000f10) at transfer.c:1367
#29 Curl_do_perform (data=0x7f44b8000f10) at transfer.c:2053
#30 0x00007f450e42c843 in do_curl_perform (self=0x7f44b802a1d0) at src/pycurl.c:1024
#31 0x00000035390e912b in call_function (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7/Python/ceval.c:4055
#32 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7/Python/ceval.c:2721
#33 0x00000035390eb04d in PyEval_EvalCodeEx (co=0x26e4cb0, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=4, kws=0x7f44c4002398, kwcount=4, defs=0x26f0208, defcount=6, closure=0x0)
    at /usr/src/debug/Python-2.7/Python/ceval.c:3311
#34 0x00000035390e963a in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7/Python/ceval.c:4167
---Type <return> to continue, or q <return> to quit---
#35 call_function (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7/Python/ceval.c:4092
#36 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7/Python/ceval.c:2721
#37 0x00000035390ea71d in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7/Python/ceval.c:4157
#38 call_function (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7/Python/ceval.c:4092
#39 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7/Python/ceval.c:2721
#40 0x00000035390ea71d in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7/Python/ceval.c:4157
#41 call_function (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7/Python/ceval.c:4092
#42 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7/Python/ceval.c:2721
#43 0x00000035390ea71d in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7/Python/ceval.c:4157
#44 call_function (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7/Python/ceval.c:4092
#45 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at /usr/src/debug/Python-2.7/Python/ceval.c:2721
#46 0x00000035390eb04d in PyEval_EvalCodeEx (co=0x216c230, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=1, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at /usr/src/debug/Python-2.7/Python/ceval.c:3311
#47 0x0000003539071c62 in function_call (func=<function at remote 0x2174d70>, arg=
    (<WorkerThread(_Thread__ident=139933809350400, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x284f110>, acquire=<built-in method acquire of thread.lock object at remote 0x284f110>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x284f110>) at remote 0x284e710>, _Thread__name='Thread-17', _Thread__daemonic=False, _stop=<_Event(_Verbose__verbose=False, _Event__flag=False, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x284f130>, acquire=<built-in method acquire of thread.lock object at remote 0x284f130>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x284f130>) at remote 0x284e790>) at remote 0x284e750>, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x284f0f0>, acquire=<built-in method acquire of thread.lock object at remote 0x...(truncated), kw=0x0) at /usr/src/debug/Python-2.7/Objects/funcobject.c:526
#48 0x0000003539048fc3 in PyObject_Call (func=<function at remote 0x2174d70>, arg=<value optimized out>, kw=<value optimized out>) at /usr/src/debug/Python-2.7/Objects/abstract.c:2522
#49 0x000000353905a65f in instancemethod_call (func=<function at remote 0x2174d70>, arg=
    (<WorkerThread(_Thread__ident=139933809350400, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x284f110>, acquire=<built-in method acquire of thread.lock object at remote 0x284f110>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x284f110>) at remote 0x284e710>, _Thread__name='Thread-17', _Thread__daemonic=False, _stop=<_Event(_Verbose__verbose=False, _Event__flag=False, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x284f130>, acquire=<built-in method acquire of thread.lock object at remote 0x284f130>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x284f130>) at remote 0x284e790>) at remote 0x284e750>, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x284f0f0>, acquire=<built-in method acquire of thread.lock object at remote 0x...(truncated), kw=0x0) at /usr/src/debug/Python-2.7/Objects/classobject.c:2578
#50 0x0000003539048fc3 in PyObject_Call (func=<instancemethod at remote 0x3e48d20>, arg=<value optimized out>, kw=<value optimized out>) at /usr/src/debug/Python-2.7/Objects/abstract.c:2522
#51 0x00000035390e3a87 in PyEval_CallObjectWithKeywords (func=<instancemethod at remote 0x3e48d20>, arg=(), kw=<value optimized out>) at /usr/src/debug/Python-2.7/Python/ceval.c:3940
#52 0x000000353911bc32 in t_bootstrap (boot_raw=0x3b8acf0) at /usr/src/debug/Python-2.7/Modules/threadmodule.c:446
#53 0x0000003351806ccb in start_thread (arg=0x7f44e0ff9700) at pthread_create.c:301
#54 0x00000033510e0c2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb)

Comment 18 John Matthews 2011-06-21 20:59:47 UTC
Another crash on Fedora-14

#0  0x00000033510330c5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00000033510330c5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x0000003351034a76 in abort () at abort.c:92
#2  0x000000335106fcfb in __libc_message (do_abort=2, fmt=0x335115ea98 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:186
#3  0x0000003351076d63 in malloc_printerr (av=<value optimized out>, p=0x7f5c40022a40, have_lock=0) at malloc.c:6283
#4  _int_free (av=<value optimized out>, p=0x7f5c40022a40, have_lock=0) at malloc.c:4795
#5  0x00007f5cb67db4a2 in nss_ZFreeIf (pointer=0x7f5c40022a60) at arena.c:975
#6  0x00007f5cb67cc574 in pem_getPrivateKey (arena=0x7f5c5403c9c0, rawkey=<value optimized out>, pError=0x7f5c5effbb78, modulus=0x1d06db8) at prsa.c:202
#7  0x00007f5cb67cc6cd in pem_mdCryptoOperationRSAPriv_Create (proto=0x7f5cb69f41a0, mdMechanism=0x7f5cb69f4200, pError=0x7f5c5effbb78, mdKey=<value optimized out>) at prsa.c:361
#8  0x00007f5cb67d997f in nssCKFWMechanism_SignInit (fwMechanism=0x7f5c5404b150, pMechanism=0x7f5c5effbc10, fwSession=0x7f5c40021620, fwObject=0x1ce1e18) at mechanism.c:659
#9  0x00007f5cb67d6ff1 in NSSCKFWC_SignInit (fwInstance=<value optimized out>, hSession=<value optimized out>, pMechanism=0x7f5c5effbc10, hKey=<value optimized out>) at wrap.c:3752
#10 0x00007f5cbad657a3 in PK11_Sign (key=0x7f5c54027960, sig=0x7f5c5effbdd0, hash=0x7f5c5effbc90) at pk11obj.c:760
#11 0x00007f5cbb292db0 in ssl3_SignHashes (hash=<value optimized out>, key=0x7f5c54027960, buf=0x7f5c5effbdd0, isTLS=1) at ssl3con.c:887
#12 0x00007f5cbb298f20 in ssl3_SendCertificateVerify (ss=0x7f5c540582e0, b=<value optimized out>, length=<value optimized out>) at ssl3con.c:4824
#13 ssl3_HandleServerHelloDone (ss=0x7f5c540582e0, b=<value optimized out>, length=<value optimized out>) at ssl3con.c:5734
#14 ssl3_HandleHandshakeMessage (ss=0x7f5c540582e0, b=<value optimized out>, length=<value optimized out>) at ssl3con.c:8630
#15 0x00007f5cbb29aa2c in ssl3_HandleHandshake (ss=0x7f5c540582e0, cText=<value optimized out>, databuf=0x7f5c54058648) at ssl3con.c:8725
#16 ssl3_HandleRecord (ss=0x7f5c540582e0, cText=<value optimized out>, databuf=0x7f5c54058648) at ssl3con.c:9064
#17 0x00007f5cbb29bae6 in ssl3_GatherCompleteHandshake (ss=0x7f5c540582e0, flags=0) at ssl3gthr.c:209
#18 0x00007f5cbb29bcea in ssl3_GatherAppDataRecord (ss=0x7f5c540582e0, flags=0) at ssl3gthr.c:251
#19 0x00007f5cbb2a54d2 in DoRecv (ss=0x7f5c540582e0, buf=0x7f5c54010020 "", len=16384, flags=0) at sslsecur.c:552
#20 ssl_SecureRecv (ss=0x7f5c540582e0, buf=0x7f5c54010020 "", len=16384, flags=0) at sslsecur.c:1160
#21 0x00007f5cbb2a9292 in ssl_Recv (fd=<value optimized out>, buf=0x7f5c54010020, len=16384, flags=0, timeout=4294967295) at sslsock.c:1593
#22 0x000000353ec44c07 in nss_recv (conn=0x7f5c5404bb30, num=<value optimized out>, buf=<value optimized out>, buffersize=<value optimized out>, curlcode=0x7f5c5effc10c) at nss.c:1451
#23 0x000000353ec19e22 in Curl_read (conn=0x7f5c5404bb30, sockfd=<value optimized out>, buf=0x7f5c54010020 "", sizerequested=16384, n=0x7f5c5effc190) at sendf.c:579
#24 0x000000353ec2c14d in readwrite_data (conn=0x7f5c5404bb30, done=0x7f5c5effc24e) at transfer.c:396
#25 Curl_readwrite (conn=0x7f5c5404bb30, done=0x7f5c5effc24e) at transfer.c:1004
#26 0x000000353ec2dd47 in Transfer (data=0x7f5c5400f7f0) at transfer.c:1367
#27 Curl_do_perform (data=0x7f5c5400f7f0) at transfer.c:2053
#28 0x00007f5cbb6f6843 in do_curl_perform (self=0x7f5c54020640) at src/pycurl.c:1024

Comment 19 John Matthews 2011-06-21 21:14:59 UTC
From backtraces in Comment #17 and Comment #18 we are seeing crashes from a double free in:

 - pem_getPrivateKey
 if (modulus) {
        nss_ZFreeIf(modulus->data);  <-- Line 202 Crash Comment #18
        modulus->data = (void *) nss_ZAlloc(NULL, lpk->u.rsa.modulus.len);
        modulus->size = lpk->u.rsa.modulus.len;
        nsslibc_memcpy(modulus->data, lpk->u.rsa.modulus.data,
                       lpk->u.rsa.modulus.len);
    }

 - pem_PopulateModulusExponent
   nss_ZFreeIf(io->u.key.key.coefficient.data); <-- Line 302 Crash Comment #17
    io->u.key.key.coefficient.data =
        (void *) nss_ZAlloc(NULL, lpk->u.rsa.coefficient.len);

Bug 701678 comment #7 has a similar backtrace:
https://bugzilla.redhat.com/show_bug.cgi?id=701678#c7
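Since the corruption shows up only when several threads drive NSS's PEM private-key code at once, one possible application-level stopgap (a sketch only, not the fix that eventually shipped) is to serialize the TLS transfers:

import threading

# Hypothetical process-wide guard: libnsspem's private-key handling is not
# thread-safe here, so allowing only one transfer at a time avoids the
# double free, at the cost of all download parallelism.
_nss_lock = threading.Lock()

def locked_perform(curl):
    with _nss_lock:
        curl.perform()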

Comment 20 John Matthews 2011-06-22 13:15:07 UTC
Created attachment 506000 [details]
Multithreaded script that crashes NSS
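The attachment itself is not reproduced here, but the idea is along these lines (a sketch, assuming any HTTPS URL that requires the client PEM cert; the real script differs):

import threading
import cStringIO
import pycurl

URL = "https://localhost/pulp/repos/some/file.rpm"  # illustrative
CACERT = "./candlepin-ca.crt"
CERT = "./ssl_cert.pem"  # certificate and key in one PEM file

def fetch(iterations=100):
    # Each thread repeatedly performs an HTTPS GET with the client cert;
    # concurrent NSS PEM private-key operations are what trigger the crash.
    for _ in range(iterations):
        buf = cStringIO.StringIO()
        c = pycurl.Curl()
        c.setopt(pycurl.URL, URL)
        c.setopt(pycurl.CAINFO, CACERT)
        c.setopt(pycurl.SSLCERT, CERT)
        c.setopt(pycurl.WRITEFUNCTION, buf.write)
        c.perform()
        c.close()

threads = [threading.Thread(target=fetch) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()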

Comment 21 John Matthews 2011-06-22 15:00:34 UTC
RHEL-6
nss-3.12.9-9.el6.x86_64
libcurl-7.19.7-26.el6.x86_64
python-pycurl-7.19.0-5.el6.x86_64
curl-7.19.7-26.el6.x86_64

Ran the attachment from comment 20 and saw the backtrace below.
100 threads fetching a file through curl with certs: it crashes in under a minute.



Core was generated by `/usr/bin/python -tt ./crash-nss.py --cacert ./candlepin-ca.crt --cert ./ssl_cer'.
Program terminated with signal 6, Aborted.
#0  0x0000003b414329a5 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install python-2.6.5-3.el6_0.2.x86_64
(gdb) bt
#0  0x0000003b414329a5 in raise () from /lib64/libc.so.6
#1  0x0000003b41434185 in abort () from /lib64/libc.so.6
#2  0x0000003b4146fd5b in __libc_message () from /lib64/libc.so.6
#3  0x0000003b41475676 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007fc9981dbe72 in nss_ZFreeIf (pointer=0x7fc9a8291380) at arena.c:975
#5  0x00007fc9981cd92a in pem_PopulateModulusExponent (io=0x7fc9b011bf60) at prsa.c:295
#6  0x00007fc9981cce9e in pem_FetchPrivKeyAttribute (io=0x7fc9b011bf60, type=<value optimized out>) at pobject.c:321
#7  0x00007fc9981cd3b5 in pem_mdCryptoOperationRSA_GetFinalLength (mdOperation=<value optimized out>, fwOperation=<value optimized out>, mdSession=<value optimized out>, fwSession=<value optimized out>, mdToken=<value optimized out>, 
    fwToken=<value optimized out>, mdInstance=0x7fc9983f8960, fwInstance=0x7fc9b00f0a00, pError=0x7fc975fcdc78) at prsa.c:421
#8  0x00007fc9981db2be in nssCKFWCryptoOperation_GetFinalLength (fwOperation=<value optimized out>, pError=<value optimized out>) at crypto.c:178
#9  0x00007fc9981d0f49 in nssCKFWSession_UpdateFinal (fwSession=0x7fc9a025b650, type=<value optimized out>, state=NSSCKFWCryptoOperationState_SignVerify, inBuf=<value optimized out>, inBufLen=36, outBuf=0x7fc9a0293200 "\310\001", 
    outBufLen=0x7fc975fcdd30) at session.c:2219
#10 0x00007fc9981d5cec in NSSCKFWC_Sign (fwInstance=<value optimized out>, hSession=<value optimized out>, pData=0x7fc975fcde60 "\304\325\067+\271\032X\373*\354\372Q\204\221\v\200\332\303~U\314mӴbuM\374B\312\001Bㇻ4", ulDataLen=36, 
    pSignature=0x7fc9a0293200 "\310\001", pulSignatureLen=0x7fc975fcdd30) at wrap.c:3822
#11 0x0000003e2fe46ba2 in PK11_Sign (key=0x7fc9a027c2f0, sig=0x7fc975fcde40, hash=0x7fc975fcdd90) at pk11obj.c:768
#12 0x0000003e30e0e620 in ssl3_SignHashes (hash=<value optimized out>, key=0x7fc9a027c2f0, buf=0x7fc975fcde40, isTLS=1) at ssl3con.c:887
#13 0x0000003e30e0f159 in ssl3_SendCertificateVerify (ss=0x7fc9a0119600) at ssl3con.c:4824
#14 ssl3_HandleServerHelloDone (ss=0x7fc9a0119600) at ssl3con.c:5736
#15 0x0000003e30e12860 in ssl3_HandleHandshakeMessage (ss=0x7fc9a0119600, b=0x7fc9a0120ecf "\272\376c3\254\230\266\322,F'\254\263UMn\036\365\016\305", '\f' <repeats 13 times>, ">\017$!\216\263", length=<value optimized out>) at ssl3con.c:8632
#16 0x0000003e30e13e30 in ssl3_HandleHandshake (ss=0x7fc9a0119600, cText=<value optimized out>, databuf=0x7fc9a0119968) at ssl3con.c:8727
#17 ssl3_HandleRecord (ss=0x7fc9a0119600, cText=<value optimized out>, databuf=0x7fc9a0119968) at ssl3con.c:9066
#18 0x0000003e30e148cc in ssl3_GatherCompleteHandshake (ss=0x7fc9a0119600, flags=0) at ssl3gthr.c:209
#19 0x0000003e30e14aaa in ssl3_GatherAppDataRecord (ss=0x7fc9a0119600, flags=0) at ssl3gthr.c:251
#20 0x0000003e30e1e262 in DoRecv (ss=0x7fc9a0119600, buf=0x7fc9a004f158 "", len=16384, flags=0) at sslsecur.c:552
#21 ssl_SecureRecv (ss=0x7fc9a0119600, buf=0x7fc9a004f158 "", len=16384, flags=0) at sslsecur.c:1151
#22 0x0000003e30e21fa2 in ssl_Recv (fd=<value optimized out>, buf=0x7fc9a004f158, len=16384, flags=0, timeout=300000) at sslsock.c:1591
#23 0x0000003e3063ef5d in Curl_nss_recv () from /usr/lib64/libcurl.so.4
#24 0x0000003e30637de3 in Curl_ssl_recv () from /usr/lib64/libcurl.so.4
#25 0x0000003e3061744c in Curl_read () from /usr/lib64/libcurl.so.4
#26 0x0000003e30629f72 in Curl_readwrite () from /usr/lib64/libcurl.so.4
#27 0x0000003e3062bd18 in Curl_perform () from /usr/lib64/libcurl.so.4
#28 0x00007fc9bf2cf73b in ?? () from /usr/lib64/python2.6/site-packages/pycurl.so
#29 0x0000003d502de81e in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#30 0x0000003d502e05a4 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.6.so.1.0
#31 0x0000003d502de61e in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#32 0x0000003d502df52d in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#33 0x0000003d502df52d in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#34 0x0000003d502e05a4 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.6.so.1.0
#35 0x0000003d5026e9f0 in ?? () from /usr/lib64/libpython2.6.so.1.0
#36 0x0000003d50243e13 in PyObject_Call () from /usr/lib64/libpython2.6.so.1.0
#37 0x0000003d502592ef in ?? () from /usr/lib64/libpython2.6.so.1.0
#38 0x0000003d50243e13 in PyObject_Call () from /usr/lib64/libpython2.6.so.1.0
#39 0x0000003d502d8ac3 in PyEval_CallObjectWithKeywords () from /usr/lib64/libpython2.6.so.1.0
#40 0x0000003d5030b47a in ?? () from /usr/lib64/libpython2.6.so.1.0
#41 0x0000003b418077e1 in start_thread () from /lib64/libpthread.so.0
#42 0x0000003b414e153d in clone () from /lib64/libc.so.6

Comment 22 Kedar Bidarkar 2011-06-27 09:37:51 UTC
I was trying to sync the CDS multiple times by restarting the goferd service.

At times the goferd process goes defunct (zombie). This prevents restarting
the goferd service, and kill <process> fails, since a defunct process is
already dead and only disappears once its parent reaps it. It may take a
reboot to get the goferd process running again.

[root@ip-10-86-250-248 subsys]# ps -ef | grep -i 5598
root      5598     1  0 Jun24 ?        00:08:33 [python] <defunct>
root     17319 16842  0 05:10 pts/0    00:00:00 grep -i 5598
[root@ip-10-86-250-248 subsys]# cd /var/run
[root@ip-10-86-250-248 run]# ls
abrt        autofs.fifo-misc  avahi-daemon  cron.reboot        hald            lvm             net-snmp  plymouth      rpcbind.sock   sepermit       sudo         xinetd.pid
abrtd.pid   autofs.fifo-net   console       dbus               haldaemon.pid   mdadm           nscd      pm-utils      rpc.statd.pid  setrans        syslogd.pid
atd.pid     autofs.pid        ConsoleKit    dhclient-eth0.pid  httpd           messagebus.pid  nslcd     rpcbind.lock  saslauthd      sm-notify.pid  utmp
auditd.pid  autofs-running    crond.pid     goferd.pid         irqbalance.pid  netreport       pluto     rpcbind.pid   screen         sshd.pid       vpnc
[root@ip-10-86-250-248 run]# cat goferd.pid 
5598
[root@ip-10-86-250-248 gofer]# ps -ef | grep -i goferd
root     17296 16842  0 05:05 pts/0    00:00:00 grep -i goferd

Comment 23 Jay Dobies 2011-07-26 16:01:37 UTC
This has been ON_QA for a while; I forgot to update the status.

Comment 24 Sachin Ghai 2011-07-28 12:19:30 UTC
This issue has been fixed with the new builds. I verified this with pulp 0.214.

I didn't see the goferd crash. I synced two large RHEL 6 repos and the CDS sync was successful with no goferd crash.

------------------------------------------------------------------------------
             -= Red Hat Update Infrastructure Management Tool =-


-= CDS Synchronization Status =-

Last Refreshed: 17:39:15
(updated every 5 seconds, ctrl+c to exit)


cds0149 ..................................................... [  UP  ]
cds162 ...................................................... [  UP  ]


Next Sync                    Last Sync                    Last Result         
------------------------------------------------------------------------------
cds0149
07-28-2011 18:59             07-28-2011 17:32             Success    

cds162
07-28-2011 20:39             07-28-2011 17:34             Success    


                                  Connected: dhcp201-136.englab.pnq.redhat.com
------------------------------------------------------------------------------

Comment 25 wes hayutin 2011-08-01 21:40:24 UTC
moving to release pending

Comment 26 wes hayutin 2012-05-31 12:56:42 UTC
closing out, product released

