Bug 843449 - rare coredump when running python script using threading module
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: python
Priority: unspecified
Severity: medium
Assigned To: Dave Malcolm
QA Contact: BaseOS QE - Apps
Keywords: Regression
Reported: 2012-07-26 07:13 EDT by Jan Hutař
Modified: 2013-03-21 01:07 EDT (History)

Doc Type: Bug Fix
Last Closed: 2013-03-21 01:07:04 EDT
Type: Bug
Description Jan Hutař 2012-07-26 07:13:00 EDT
Description of problem:
I'm getting a rare coredump when running a Python script that uses the threading module.

Version-Release number of selected component (if applicable):

How reproducible:
About one in 20 attempts

Steps to Reproduce:
1. Use threading module repeatedly

Actual results:
11861 Segmentation fault      (core dumped)

Expected results:
Should work
Comment 3 Jan Hutař 2012-07-26 07:27:57 EDT
Looking at this more, it seems the issue is not as rare as noted in the initial comment.

According to our results log, this issue first appeared on 2012-07-19. Erratum RHSA-2012:0745, which contains python-2.4.3-46.el5_8.2, was released on 2012-06-18, and erratum RHSA-2012:1097, which contains glibc-2.5-81.el5_8.4, was released on 2012-07-18. Since the glibc update landed the day before the first failure, I'm adding the "Regression" keyword and switching the component to glibc.
Comment 4 Jeff Law 2012-07-26 14:38:05 EDT
This looks more like an openssl or python problem to me.

(gdb) x/i $pc
0x38d347ae4b <memcpy+347>:      rep movsq %ds:(%rsi),%es:(%rdi)
(gdb) p/x $rcx
$12 = 0x711

So we're in the middle of a copy, with some bytes remaining.

(gdb) p/x $rsi
$13 = 0x38d6b51000
(gdb) x/x $rsi
0x38d6b51000:   Cannot access memory at address 0x38d6b51000

The source address points to an unmapped page and, in fact, sits right on a page boundary, indicating that we were probably iterating up through addresses until we hit the end of the mapped region and then immediately faulted.
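The page-boundary observation can be checked with a little arithmetic. This is a minimal sketch, assuming the 4 KiB page size that is usual on x86_64 (the gdb output does not state the page size); page_offset is a hypothetical helper name. It also converts the remaining rep movsq count in rcx from quadwords to bytes.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, not stated in the gdb output


def page_offset(addr, page_size=PAGE_SIZE):
    """Return the offset of addr within its page (0 means page-aligned)."""
    return addr & (page_size - 1)


rsi = 0x38d6b51000  # faulting source address from "p/x $rsi"
rcx = 0x711         # remaining repeat count from "p/x $rcx"

# rep movsq copies 8 bytes per iteration, so rcx counts quadwords.
remaining_bytes = rcx * 8

print(page_offset(rsi))   # 0: the address is the first byte of a page
print(remaining_bytes)    # 14472 bytes still pending in this rep movsq
```

An offset of 0 is what one would expect if the copy walked off the end of the last mapped page and faulted on the first byte of the next one.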

(gdb) up
#1  0x00000038d68dea8e in __memcpy_ichk (c=0x2aaaac0113a0, data_=0x38d6b51000, 
    len=18446744073709551611) at /usr/include/bits/string3.h:51
51        return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
(gdb) up
#2  SHA1_Update (c=0x2aaaac0113a0, data_=0x38d6b51000, 
    len=18446744073709551611) at ../md32_common.h:316
316                             memcpy (p+n,data,len);
(gdb) p data
$14 = (const unsigned char *) 0x38d6b50884 ""

And the address corresponds reasonably well to what we find in SHA1_Update from libcrypto.

(gdb) p len
$19 = 18446744073709551611
(gdb) p/x len
$25 = 0xfffffffffffffffb
Wow...  OK, now that seems interesting.

(gdb) up
#3  0x00000038d68db6d9 in ssleay_rand_add (buf=0x38d690be32, num=20, add=0)
    at md_rand.c:269
269                             MD_Update(&m,&(state[st_idx]),j-k);

(gdb) p j
$28 = <value optimized out>
(gdb) p k
$29 = 25

Damn, no value for "j".  However, we might be able to recover it:

259             for (i=0; i<num; i+=MD_DIGEST_LENGTH)
260                     {
261                     j=(num-i);
262                     j=(j > MD_DIGEST_LENGTH)?MD_DIGEST_LENGTH:j;
264                     MD_Init(&m);
265                     MD_Update(&m,local_md,MD_DIGEST_LENGTH);
266                     k=(st_idx+j)-STATE_SIZE;
267                     if (k > 0)
268                             {
269                             MD_Update(&m,&(state[st_idx]),j-k);
270                             MD_Update(&m,&(state[0]),k);
271                             }

"j" is computed from "num" and "i" at line #261.

(gdb) p num
$33 = 20
(gdb) p i
$34 = 0

So the value of "j" is 20.  20 - 25 is -5, which is 0xfffffffffffffffb (18446744073709551611) when interpreted as an unsigned 64-bit value.

That corresponds to the len parameter in SHA1_Update, which is passed as the length to memcpy.
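The recovery of "j" and the wrap-around of the negative length can be reproduced with a short sketch. It assumes MD_DIGEST_LENGTH is 20 (the SHA-1 digest size, inferred from the SHA1_Update frame rather than shown in the listing) and a 64-bit size_t; as_size_t_64 is a hypothetical helper name.

```python
MD_DIGEST_LENGTH = 20  # assumption: SHA-1 digest size, inferred from SHA1_Update


def as_size_t_64(value):
    """Reinterpret a signed integer as an unsigned 64-bit value."""
    return value % (1 << 64)


num, i = 20, 0  # values printed by gdb in frame 3
k = 25          # value printed by gdb in frame 3

# Lines 261-262 of the listing: j = min(num - i, MD_DIGEST_LENGTH)
j = min(num - i, MD_DIGEST_LENGTH)

# Line 269 passes j - k as the length, which is converted to an unsigned type.
length = as_size_t_64(j - k)

# Line 266: k = (st_idx + j) - STATE_SIZE, so st_idx - STATE_SIZE = k - j.
overshoot = k - j

print(j)            # 20
print(hex(length))  # 0xfffffffffffffffb
print(length)       # 18446744073709551611
print(overshoot)    # 5
```

By line 266 alone, k can only exceed j if st_idx is already larger than STATE_SIZE, here by 5. The listing does not show how st_idx got that value.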

This is clearly not a glibc problem, but a problem higher up in the chain, most likely openssl.  Reassigned to openssl.
Comment 5 Tomas Mraz 2012-07-26 15:29:21 EDT
I suppose it might be caused by improper locking (e.g. no locking callbacks initialized in OpenSSL).
Comment 8 Tomas Mraz 2013-02-25 06:52:17 EST
I suppose python is not setting up the thread locks properly when it uses OpenSSL.
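To illustrate why missing locks could produce an out-of-range index like the one seen in the backtrace, here is a simplified model in Python. It is not OpenSSL's code: the Pool class, the STATE_SIZE value of 1023, and the worker function are all assumptions made for illustration. The index update is a read-modify-write sequence, and the lock is what keeps two threads from interleaving inside it.

```python
import threading

STATE_SIZE = 1023  # illustrative pool size (assumption, not from the report)


class Pool:
    """Toy model of a shared circular pool with a running index."""

    def __init__(self):
        self.index = 0
        self.lock = threading.Lock()

    def add(self, num):
        # The read, add, and wrap must happen as one unit; without the
        # lock, another thread could update self.index in between.
        with self.lock:
            start = self.index
            new_index = start + num
            if new_index >= STATE_SIZE:
                new_index %= STATE_SIZE
            self.index = new_index
        return start


violations = []  # start indexes that fell outside the pool


def worker(pool, rounds, num):
    for _ in range(rounds):
        start = pool.add(num)
        if not 0 <= start < STATE_SIZE:
            violations.append(start)


pool = Pool()
threads = [threading.Thread(target=worker, args=(pool, 1000, 20))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(pool.index)       # equals (4 * 1000 * 20) % STATE_SIZE when locked
print(len(violations))  # 0 when every update holds the lock
```

With the lock held, every thread sees an index inside the pool. In C, an unlocked version of the same update could let one thread read a value that another thread has incremented but not yet wrapped.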
Comment 11 RHEL Product and Program Management 2013-03-21 01:07:04 EDT
Development Management has reviewed and declined this request.
You may appeal this decision by reopening this request.
