Bug 843449 - rare coredump when running python script using threading module
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: python
Version: 5.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Assignee: Dave Malcolm
QA Contact: BaseOS QE - Apps
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-07-26 11:13 UTC by Jan Hutař
Modified: 2013-03-21 05:07 UTC (History)
1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-03-21 05:07:04 UTC
Target Upstream Version:
Embargoed:



Description Jan Hutař 2012-07-26 11:13:00 UTC
Description of problem:
I am getting a rare coredump when running a Python script that uses the threading module.


Version-Release number of selected component (if applicable):
python-2.4.3-46.el5_8.2
glibc-2.5-81.el5_8.4


How reproducible:
about one in 20 attempts


Steps to Reproduce:
1. Use threading module repeatedly
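The report does not include the original script; a minimal reproducer sketch in the same spirit (modern Python syntax, since the affected build was python-2.4; the worker and its parameters are illustrative, not from the report). It hammers the OpenSSL-backed hash functions from several threads, the code path implicated later in this bug; on an affected build this kind of workload intermittently segfaulted, while on a fixed build it completes normally:

```python
import hashlib
import threading

def worker(results, idx, data=b"payload", rounds=1000):
    # Repeatedly feed OpenSSL-backed SHA-1 from this thread.
    h = b""
    for _ in range(rounds):
        h = hashlib.sha1(data + h).digest()
    results[idx] = h

def run(nthreads=8):
    results = [None] * nthreads
    threads = [threading.Thread(target=worker, args=(results, i))
               for i in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    out = run()
    # Every thread performs the identical computation, so all digests match.
    assert len(set(out)) == 1
```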


Actual results:
11861 Segmentation fault      (core dumped)


Expected results:
The script should run to completion without segfaulting.

Comment 3 Jan Hutař 2012-07-26 11:27:57 UTC
Looking at this more, the issue is not as rare as noted in the initial comment.

From the log of our results, this issue first appeared on 2012-07-19. Erratum RHSA-2012:0745, which contains python-2.4.3-46.el5_8.2, was released 2012-06-18, and erratum RHSA-2012:1097, which contains glibc-2.5-81.el5_8.4, was released 2012-07-18 => Adding the "Regression" keyword and switching the component to glibc.

Comment 4 Jeff Law 2012-07-26 18:38:05 UTC
This looks more like an openssl or python problem to me.

(gdb) x/i $pc
0x38d347ae4b <memcpy+347>:      rep movsq %ds:(%rsi),%es:(%rdi)
(gdb) p/x $rcx
$12 = 0x711


So we're in the middle of a copy, with some bytes remaining.

(gdb) p/x $rsi
$13 = 0x38d6b51000
(gdb) x/x $rsi
0x38d6b51000:   Cannot access memory at address 0x38d6b51000


The source address points to an unmapped page and in fact sits right on a page boundary, indicating we were probably iterating up through addresses until we ran past the end of the mapped region and then immediately faulted.


(gdb) up
#1  0x00000038d68dea8e in __memcpy_ichk (c=0x2aaaac0113a0, data_=0x38d6b51000, 
    len=18446744073709551611) at /usr/include/bits/string3.h:51
51        return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
(gdb) up
#2  SHA1_Update (c=0x2aaaac0113a0, data_=0x38d6b51000, 
    len=18446744073709551611) at ../md32_common.h:316
316                             memcpy (p+n,data,len);
(gdb) p data
$14 = (const unsigned char *) 0x38d6b50884 ""

And the address corresponds reasonably well to what we find in SHA1_Update from libcrypto.


(gdb) p len
$19 = 18446744073709551611
(gdb) p/x len
$25 = 0xfffffffffffffffb
Wow...  OK, now that seems interesting.

(gdb) up
#3  0x00000038d68db6d9 in ssleay_rand_add (buf=0x38d690be32, num=20, add=0)
    at md_rand.c:269
269                             MD_Update(&m,&(state[st_idx]),j-k);


(gdb) p j
$28 = <value optimized out>
(gdb) p k
$29 = 25

Damn, no value for "j".  However, we might be able to recover it:


259             for (i=0; i<num; i+=MD_DIGEST_LENGTH)
260                     {
261                     j=(num-i);
262                     j=(j > MD_DIGEST_LENGTH)?MD_DIGEST_LENGTH:j;
263
264                     MD_Init(&m);
(gdb) 
265                     MD_Update(&m,local_md,MD_DIGEST_LENGTH);
266                     k=(st_idx+j)-STATE_SIZE;
267                     if (k > 0)
268                             {
269                             MD_Update(&m,&(state[st_idx]),j-k);
270                             MD_Update(&m,&(state[0]),k);
271                             }



"j" is set at line #261.

(gdb) p num
$33 = 20
(gdb) p i
$34 = 0


So the value of "j" is 20, and j - k = 20 - 25 = -5, which reinterpreted as an unsigned 64-bit value is

0xfffffffffffffffb

That matches the len parameter in SHA1_Update, which is passed as the length to memcpy.
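The two's-complement reinterpretation described above can be checked directly; a minimal sketch using Python's ctypes, with the values taken from the debugger session (c_uint64 is used to match the 64-bit core dump):

```python
import ctypes

j, k = 20, 25   # values recovered in the gdb session
diff = j - k    # -5 as an arbitrary-precision Python int

# Reinterpret -5 as an unsigned 64-bit integer, as C does when the
# negative int result of j-k is converted to memcpy's size_t parameter.
wrapped = ctypes.c_uint64(diff).value
print(wrapped)       # 18446744073709551611
print(hex(wrapped))  # 0xfffffffffffffffb
```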

This is clearly not a glibc problem, but a problem higher up in the chain, most likely openssl.  Reassigned to openssl.

Comment 5 Tomas Mraz 2012-07-26 19:29:21 UTC
I suppose it might be caused by improper locking (e.g. no locking callbacks initialized in OpenSSL).

Comment 8 Tomas Mraz 2013-02-25 11:52:17 UTC
I suppose python is not setting up the thread locks properly when it uses OpenSSL.

Comment 11 RHEL Program Management 2013-03-21 05:07:04 UTC
Development Management has reviewed and declined this request.
You may appeal this decision by reopening this request.

