Bug 612853

Summary: SIGSEGV inside malloc_consolidate in python running yum (RHEL6.0-20100707.4 x86_64)
Product: Red Hat Enterprise Linux 6
Reporter: Pavel Holica <pholica>
Component: rpm
Assignee: Panu Matilainen <pmatilai>
Status: CLOSED DUPLICATE
QA Contact: BaseOS QE Security Team <qe-baseos-security>
Severity: medium
Priority: low
Version: 6.0
CC: chkr, dcantrell, dgregor, dmalcolm, dwalsh, notting, pmatilai
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard: abrt_hash:ad6904f01b248fe592896df818a3b0a772bdfb92
Doc Type: Bug Fix
Story Points: ---
Last Closed: 2010-07-26 15:40:59 UTC
Attachments:
- File: backtrace
- Backtrace with debuginfo from coredump ("core.4651")
- DSOs mapped in the process

Description Pavel Holica 2010-07-09 08:55:27 UTC
abrt version: 1.1.8
architecture: x86_64
Attached file: backtrace
cmdline: /usr/bin/python /usr/bin/yum install emacs cnetworkmanager
component: yum
crash_function: malloc_consolidate
executable: /usr/bin/python
kernel: 2.6.32-44.el6.x86_64
package: yum-3.2.27-12.el6
rating: 3
reason: Process /usr/bin/python was killed by signal 11 (SIGSEGV)
release: Red Hat Enterprise Linux Server release 6.0 Beta (Santiago)
time: 1278664630
uid: 0

comment
-----
Yum crashes when run for the first time. I tried to install emacs, and yum
crashed while the first package, libXt-1.0.7-1.el6.x86_64, was being installed.

How to reproduce
-----
1. Install RHEL6.0-20100707.4 x86_64 server from DVD with default packages
2. reboot
3. add yum http repository of RHEL6.0-20100707.4 x86_64 server
4. configure network using ifup and dhclient
5. run yum install emacs

Comment 1 Pavel Holica 2010-07-09 08:55:30 UTC
Created attachment 430573 [details]
File: backtrace

Comment 2 Pavel Holica 2010-07-09 09:02:15 UTC
I've tried yum-complete-transaction and yum crashed while installing emacs-common too.

Comment 4 Dave Malcolm 2010-07-09 17:51:44 UTC
Thanks for filing this bug report.

What is the output of "rpm -qa"?

How reproducible is this?  Are you able to generate a coredump of the crashing process?  A coredump would be most helpful (run "ulimit -c unlimited" in the shell before starting yum).

Comment 5 Dave Malcolm 2010-07-09 20:20:28 UTC
Alternatively, are you able to run yum under "valgrind"?  Does this indicate problem areas?
# valgrind /usr/bin/yum

Comment 7 Pavel Holica 2010-07-12 10:08:41 UTC
I've been able to reproduce this bug in KVM on a Fedora 13 machine with a 2.6.33.5-124.fc13.x86_64 kernel, with /sys/module/kvm_intel/parameters/ept set to "N".

I've also tried this on x86_64 and ppc64 bare-metal machines, and it was OK there.

Comment 8 Dave Malcolm 2010-07-15 23:11:30 UTC
Created attachment 432249 [details]
Backtrace with debuginfo from coredump ("core.4651")

The backtrace shows a free() error, "double free or corruption (out)", within libselinux, deep inside an rpm transaction invoked from the python bindings by yum.
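
For readers unfamiliar with that error, here is a minimal, hypothetical C illustration (not taken from the yum/rpm/libselinux code) of the kind of misuse that makes glibc's malloc abort with a "double free or corruption" message; the exact suffix ("(out)", "(fasttop)", ...) depends on which internal consistency check fires, and the same checks can also fire later, e.g. inside malloc_consolidate, if the heap metadata was trampled by unrelated code.

  /* demo_double_free.c - freeing the same chunk twice.  glibc notices the
   * inconsistent heap metadata and aborts rather than corrupt the heap
   * further.  Build with: gcc demo_double_free.c -o demo_double_free */
  #include <stdlib.h>
  #include <string.h>

  int main(void)
  {
      char *p = strdup("example");
      free(p);        /* first free is fine */
      free(p);        /* second free of the same pointer: glibc aborts */
      return 0;
  }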

Comment 9 Dave Malcolm 2010-07-15 23:16:33 UTC
The code in question is line 266 below:

  261             }
  262
  263             if (prev_t2r_trans && strcmp(prev_t2r_trans, trans) == 0) {
  264                     *rawp = strdup(prev_t2r_raw);
  265             } else {
> 266                     free(prev_t2r_trans);
  267                     prev_t2r_trans = NULL;
  268                     free(prev_t2r_raw);
  269                     prev_t2r_raw = NULL;
  270                     if (trans_to_raw_context(trans, rawp))

"prev_t2r_trans" appears to be unavailable directly via coredump, but disassembly suggests it's in %r12, which is:
(gdb) p $r12
$1 = 64886592
(gdb) p (char*)64886592
$2 = 0x3de1740 ""

Defined at /usr/src/debug/libselinux-2.0.94/src/setrans_client.c:30
   30 static __thread security_context_t prev_t2r_trans = NULL;

It is only ever written to via a call to strdup().
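
For orientation, here is a reduced sketch of the thread-local caching pattern that the free() at line 266 belongs to. Variable names follow the excerpt above, but this is a simplification with an assumed structure, not the exact upstream code; see libselinux-2.0.94/src/setrans_client.c for the real thing (security_context_t is effectively a char *, and trans_to_raw_context's signature is assumed from the call at line 270).

  #include <stdlib.h>
  #include <string.h>

  /* helper in setrans_client.c that performs the actual translation */
  extern int trans_to_raw_context(const char *trans, char **rawp);

  /* each thread caches the last translated/raw context pair */
  static __thread char *prev_t2r_trans = NULL;  /* last translated input */
  static __thread char *prev_t2r_raw   = NULL;  /* last raw result */

  static int cached_trans_to_raw(const char *trans, char **rawp)
  {
      if (prev_t2r_trans && strcmp(prev_t2r_trans, trans) == 0) {
          *rawp = strdup(prev_t2r_raw);          /* cache hit */
      } else {
          free(prev_t2r_trans);                  /* the free() at line 266 */
          prev_t2r_trans = NULL;
          free(prev_t2r_raw);
          prev_t2r_raw = NULL;
          if (trans_to_raw_context(trans, rawp)) /* cache miss: retranslate */
              return -1;
          prev_t2r_trans = strdup(trans);        /* refill the cache */
          prev_t2r_raw   = strdup(*rawp);
      }
      return *rawp ? 0 : -1;
  }

A double free on prev_t2r_trans would mean either that it was freed somewhere without being reset to NULL (the excerpt suggests it is always reset right after being freed), or that the heap had already been corrupted by unrelated code before line 266 ran, which would also fit the crash later landing in malloc_consolidate.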

Comment 10 Dave Malcolm 2010-07-15 23:35:17 UTC
There are similarities with bug 615102, but I'm not sure whether they're duplicates at this stage.

Comment 11 Dave Malcolm 2010-07-15 23:53:08 UTC
Created attachment 432259 [details]
DSOs mapped in the process

Output from running:
gdb -c core.4651 --eval-command="python print '\n'.join([dso.filename for dso in gdb.objfiles()])" --batch|grep "^/" |sort > dsos.txt

Comment 12 Dave Malcolm 2010-07-19 21:10:40 UTC
(In reply to comment #7)
> I've been able to reproduce this bug in KVM on Fedora 13 machine with
> 2.6.33.5-124.fc13.x86_64 kernel with /sys/module/kvm_intel/parameters/ept set
> to "N".
> 
> I've tried this on x86_64 and ppc64 bare metal machines and it was ok.    

Pavel: how reproducible is this?  Are you able to reproduce it reliably on KVM guests?  Are you able to reproduce it at all on bare-metal?

CCing pmatilai and dwalsh: see attachment 432249 [details].  The backtrace from the coredump shows heap corruption detected during a call to selinux_trans_to_raw_context, deep inside librpm as called by yum through the rpm-python bindings.  Any thoughts?

There could be a number of causes for this:
- I'm not familiar with the "__thread" annotation on variables (see comment #9); perhaps it isn't working properly?  It would presumably need to be zeroed for every thread.  (Or perhaps it did work and I'm misreading the coredump.)  See the sketch after this list for how __thread is supposed to behave.
- There could be some kind of heap misuse going on elsewhere within the process.  How much testing under valgrind have, say, the most recent rpm-python bindings had?
- It might relate to bug 615102.
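
On the __thread point, here is a minimal, self-contained C sketch (unrelated to rpm/yum; build with gcc -pthread) showing the semantics the libselinux code relies on: every thread gets its own copy of a __thread variable, and each copy starts from the static initializer (here 0), so the per-thread zeroing is done by the toolchain/runtime rather than by the library itself.

  #include <pthread.h>
  #include <stdio.h>

  static __thread int counter = 0;   /* one independent, zero-initialized copy per thread */

  static void *worker(void *arg)
  {
      (void)arg;
      counter++;                                      /* touches only this thread's copy */
      printf("worker sees counter = %d\n", counter);  /* prints 1 in each worker */
      return NULL;
  }

  int main(void)
  {
      pthread_t t1, t2;
      counter = 100;                   /* modifies the main thread's copy only */
      pthread_create(&t1, NULL, worker, NULL);
      pthread_create(&t2, NULL, worker, NULL);
      pthread_join(t1, NULL);
      pthread_join(t2, NULL);
      printf("main still sees counter = %d\n", counter);  /* prints 100 */
      return 0;
  }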

Comment 13 Pavel Holica 2010-07-20 08:01:38 UTC
It happens quite often, but randomly and not every time.
I've now tried to reproduce it twice with no success.
From what I've observed, once this bug occurs it is very likely to happen again, even after a fresh installation with a new disk image. But when it doesn't occur, it's likely that everything will be fine the next time too.

As mentioned in comment 7, I was only able to reproduce it on a KVM guest.

Comment 14 Dave Malcolm 2010-07-20 15:32:38 UTC
Thanks for the info.

Does it happen just on one specific host?  Could we be seeing a hardware problem?

> From what I observed, when this bug occurs, it is very probable, that it will
> happen again even after new installation with new disk image. But when it
> doesn't occur, it's probable that everything will be ok next time too.
Could this relate to the state of the hypervisor?   Is the clumping of success/failure related to, say, reboots of the hypervisor?

Comment 15 Pavel Holica 2010-07-22 11:50:35 UTC
I'm not able to reproduce this bug any more, so I can't tell. The last time I was able to reproduce it was in comment 7. Since then, I've performed a yum update on the system (both the kernel and libvirt were updated).
I've tried to reproduce this bug many times since, with no success.

Comment 16 Dave Malcolm 2010-07-23 14:50:21 UTC
This may be a duplicate of bug 608710

Comment 17 Dave Malcolm 2010-07-26 15:40:59 UTC
Given that this only happened under KVM, it looks like it was a duplicate of bug 607650.

*** This bug has been marked as a duplicate of bug 607650 ***