Bug 614029

Summary: [RHEL6][Kernel] Segfaults seen while running tests in i386 KVM guest
Product: Red Hat Enterprise Linux 6 Reporter: Jeff Burke <jburke>
Component: kernel Assignee: Jes Sorensen <Jes.Sorensen>
Status: CLOSED CURRENTRELEASE QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: low    
Version: 6.0 CC: amit.shah, arozansk, bburns, lihuang, llim, mjenner, notting, pbunyan, syeghiay, tburke
Target Milestone: rc   
Target Release: 6.1   
Hardware: All   
OS: Linux   
URL: http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=15306436
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-11-10 21:03:28 UTC Type: ---
Bug Blocks: 580953    

Description Jeff Burke 2010-07-13 14:27:56 UTC
Description of problem:
 While testing the latest kernel, a 32-bit KVM guest hit a series of segfaults while running the connectathon test.

Version-Release number of selected component (if applicable):
 kernel 2.6.32-47.el6
 RHEL6.0-Beta2-5.0-Server

How reproducible:
 Unknown

Actual results:
Running as user "root" and group "root". This could be dangerous.
Capturing on eth0
===== Starting 'nfsvers=2_tcp' test 'general' =====
rup: rhel4-nfs: RPC: Unable to receive; errno = No route to host

----- Server load  -----
----- start: Tue Jul 13 00:45:29 EDT 2010 -----
./server -g -F nfs -onfsvers=2,tcp -p /export/home rhel4-nfs
Start tests on path /mnt/rhel4-nfs/dhcp70-241.test [y/n]? 
sh ./runtests  -g -t /mnt/rhel4-nfs/dhcp70-241.test

GENERAL TESTS: directory /mnt/rhel4-nfs/dhcp70-241.test
make[1]: Entering directory `/mnt/tests/kernel/filesystems/nfs/connectathon/cthon04/general'
if test ! -x runtests; then chmod a+x runtests; fi
cd /mnt/rhel4-nfs/dhcp70-241.test; rm -f Makefile runtests runtests.wrk *.sh *.c mkdummy rmdummy nroff.in makefile.tst
cp Makefile runtests runtests.wrk *.sh *.c mkdummy rmdummy nroff.in makefile.tst /mnt/rhel4-nfs/dhcp70-241.test
make[1]: Leaving directory `/mnt/tests/kernel/filesystems/nfs/connectathon/cthon04/general'

Small Compile
	0.1 (0.0) real	0.0 (0.0) user	0.0 (0.0) sys

Tbl
	0.0 (0.0) real	0.0 (0.0) user	0.0 (0.0) sys

Nroff
	0.0 (0.0) real	0.0 (0.0) user	0.0 (0.0) sys

Large Compile
	0.1 (0.0) real	0.0 (0.0) user	0.0 (0.0) sys

Four simultaneous large compiles
./server: line 149: 12631 Segmentation fault      sh ./runtests $passarg $TEST $TESTARG $NFSTESTDIR
Tests failed, leaving /mnt/rhel4-nfs mounted
----- end: Tue Jul 13 00:45:33 EDT 2010 -----
rup: rhel4-nfs: RPC: Unable to receive; errno = No route to host

----- Server load  -----
----- return code: 1 -----
1609 packets captured


Console log:

device eth0 entered promiscuous mode
SELinux: initialized (dev 0:14, type nfs), uses genfs_contexts
cc1[12914]: segfault at 101000d ip 0091c3da sp bfa0f924 error 4 in libc-2.12.so[8a5000+185000]
cc[12909]: segfault at 72727272 ip 72727272 sp bffa8c1c error 14 in locale-archive[b74fa000+200000]
cc1[12913]: segfault at 7171717d ip 009173dd sp bfc6b064 error 4 in libc-2.12.so[8a5000+185000]
cc1[12912]: segfault at 63636363 ip 63636363 sp bfbf41dc error 14
cc[12911]: segfault at 0 ip (null) sp bfa7bf2c error 14
sh[12633]: segfault at 4 ip 0806ca71 sp bfe5b110 error 6 in bash[8047000+d1000]
sh[12631]: segfault at 79797985 ip 08092d5e sp bf861d50 error 4 in bash[8047000+d1000]
device eth0 left promiscuous mode
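
For anyone triaging the console log above: on x86 the "error" field in the kernel's segfault message is the page-fault error code, printed in hex. The following is only a minimal sketch for decoding those lines, not part of the test harness. The names parse_segfault, decode_error and FAULT_BITS are made up for illustration, and the regular expression assumes the exact message layout shown in this log.

```python
import re

# x86 page-fault error-code bits. The kernel prints the code in hex,
# so "error 14" means 0x14, not decimal 14.
# Each entry: (mask, meaning when set, meaning when clear or None).
FAULT_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write", "read"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]

# Hypothetical pattern matching the message layout seen in this report.
LINE_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>\(null\)|[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+)(?: in (?P<obj>\S+?)\[)?"
)


def decode_error(code):
    """Return a human-readable description of a page-fault error code."""
    parts = []
    for mask, if_set, if_clear in FAULT_BITS:
        if code & mask:
            parts.append(if_set)
        elif if_clear is not None:
            parts.append(if_clear)
    return ", ".join(parts)


def parse_segfault(line):
    """Parse one console line; return a dict, or None if it is not a segfault line."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    ip = m.group("ip")
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "addr": int(m.group("addr"), 16),
        "ip": 0 if ip == "(null)" else int(ip, 16),
        "error": int(m.group("err"), 16),
        "object": m.group("obj"),  # None when the kernel found no mapping
    }


if __name__ == "__main__":
    sample = "cc[12909]: segfault at 72727272 ip 72727272 sp bffa8c1c error 14 in locale-archive[b74fa000+200000]"
    info = parse_segfault(sample)
    print(info["comm"], hex(info["addr"]), decode_error(info["error"]))
```

Read this way, the "error 14" entries are user-mode instruction fetches from an unmapped address (the faulting address equals ip), while "error 4" and "error 6" are user-mode reads and writes of unmapped pages.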

Expected results:
This test should pass

Additional info:
This was seen on one of the two test runs, and only on i386 so far.

Comment 5 Jes Sorensen 2010-07-29 09:20:47 UTC
Jeff,

Please try with qemu-kvm-0.12.1.2-2.104.el6.x86_64, -90 is old.

Jes

Comment 12 Amit Shah 2010-08-04 08:30:40 UTC
Jeff, the logs mention NFS timeouts. Is this the same machine where you see installs fail because of NFS timeouts?

The logs here also mention that eth0 left promisc mode, which has been a cause of some of the bugs.

Does putting the host's network interface back in promisc mode make the tests run fine?

Can you tell us what type the host network interface is?

(This could turn out to be the same as bug 619151.)

Comment 13 Jeff Burke 2010-08-04 13:08:19 UTC
Amit,
Q: is this the same machine where you see installs fail because of NFS timeouts?
A: No, this is a different host.

Q: Does putting the host's network interface back in promisc mode make the tests
run fine?
A: Unknown, for two reasons. One, I am not sure it has been reproduced. Two, the test is written to do this on purpose. When we run the connectathon test we use ethereal to capture the packets. That way, when the test fails, the NFS folks have a place to look. The test has always done this, so I don't think that ethereal putting the interface in promisc mode caused the segfaults.

This BZ is not for networking going down or the network not working. This BZ is for "Segfaults seen while running tests in i386 KVM guest".

Comment 14 Jeff Burke 2010-08-04 13:25:36 UTC
According to inventory the network interface is using tg3.

Comment 16 Martin Jenner 2010-08-06 22:50:13 UTC
Ran job 

  https://beaker.engineering.redhat.com/jobs/11055

Did not hit the segfault issue. Need to ping jburke to see if the failures are known issues.

Comment 19 Lawrence Lim 2010-08-12 03:59:35 UTC
Since we are not able to reproduce, we will need help from the reporter.

Comment 20 Jeff Burke 2010-08-13 13:12:57 UTC
Lawrence,
  What is it that you need from me?

Comment 21 Jes Sorensen 2010-08-22 07:47:34 UTC
Since we don't seem to be able to reproduce this bug, should we move
it to 6.1?

Jes

Comment 23 Jes Sorensen 2010-11-10 17:32:47 UTC
Jeff,

Are you able to reproduce this bug with current RHEL6 bits, or can we
close it?

Thanks,
Jes

Comment 24 Jes Sorensen 2010-11-10 17:34:17 UTC
CONDNACK for now, until we are able to reproduce it again. If not, we should
close it.

Jes

Comment 25 Jeff Burke 2010-11-10 18:47:51 UTC
Jes,
 I would recommend closing it as CURRENTRELEASE. If we see it again we can reopen or open a new BZ. Thanks for checking.

Jeff

Comment 26 Jes Sorensen 2010-11-10 21:03:28 UTC
Closing per Jeff's last comment.

Please reopen if you see this again.

Jes