Bug 521321 - processes left in 'X' state after being killed on guest
Summary: processes left in 'X' state after being killed on guest
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kvm
Version: 5.4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
: ---
Assignee: Markus Armbruster
QA Contact: Lawrence Lim
URL:
Whiteboard:
Depends On:
Blocks: Rhel5KvmTier2
 
Reported: 2009-09-04 18:32 UTC by Don Zickus
Modified: 2014-03-26 01:01 UTC (History)
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-12-02 12:25:48 UTC
Target Upstream Version:
Embargoed:


Attachments
xml reproducer for RHTS (9.75 KB, text/plain)
2009-12-17 14:48 UTC, Jeff Burke
left behind in /tmp by make run (13.67 KB, application/octet-stream)
2009-12-17 17:51 UTC, Markus Armbruster

Description Don Zickus 2009-09-04 18:32:19 UTC
Description of problem:
I was running my normal kernel sanity tests on KVM guests and noticed some
tests were exceeding their expected time allotment.  Upon investigation, I
noticed that in one particular test the processes were in the 'X' state.
This shouldn't be noticeable, because I think the 'X' state passes so
quickly that it is hard to capture on one process, let alone the 10 I see in
my test.

A side effect of the RHTS infrastructure is that if a test takes too long, a
local watchdog comes in and kills the test, performs an alt-sysrq-T (for
task info) and an alt-sysrq-w (for cpu stack info).  This allows us to see
what is going on that may help explain why the test timed out.
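
The watchdog behaviour described above can be approximated in a few lines of
shell.  This is only a sketch, not the RHTS implementation: the function
names run_with_deadline and dump_task_info are made up for illustration, and
the SysRq keys follow the description above ('t' for task info, 'w' for cpu
stack info).  Key meanings differ between kernel versions, and writing to
/proc/sysrq-trigger needs root.

```shell
#!/bin/sh
# Hypothetical watchdog sketch; not the RHTS implementation.

# Run a command, and send it SIGTERM if it is still alive after
# $1 seconds.  Returns the command's exit status (143 if it was
# terminated by the watchdog).
run_with_deadline() {
    deadline=$1; shift
    "$@" &
    child=$!
    # Background timer: kills the child once the deadline passes.
    ( sleep "$deadline"; kill "$child" 2>/dev/null ) &
    watchdog=$!
    wait "$child"
    status=$?
    # The command finished (or was killed); cancel the timer.
    kill "$watchdog" 2>/dev/null
    return "$status"
}

# Ask the kernel to log task info ('t') and cpu stack info ('w').
# $1 is the trigger file; it defaults to /proc/sysrq-trigger, and
# can be pointed at a scratch file for a dry run.
dump_task_info() {
    trigger=${1:-/proc/sysrq-trigger}
    for key in t w; do
        echo "$key" > "$trigger" || return 1
    done
}

# Usage (as root), with the 20 minute limit used in this report:
#   run_with_deadline 1200 make run || dump_task_info
```

The dump then shows up in dmesg, which is where the task states would be
inspected.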

For this particular problem,
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=9861967
will show that output.  Notice all the 'pttest's in the 'X' state.

So there are a couple of issues here.  One, we didn't expect the test to take
this long, and two, why are these processes stuck in the 'X' state?
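
For anyone checking a live guest rather than the sysrq output, a small filter
over ps output can list the affected processes.  This is a sketch under the
assumption that ps supports the pid, stat and comm output columns (procps
does); the function name filter_state_x is made up.  The ps man page
describes 'X' as "dead (should never be seen)", which is why lingering
entries are surprising.

```shell
#!/bin/sh
# Keep only lines whose second column (the process state) starts
# with X.  Expects "PID STAT COMMAND" lines on stdin; the header
# line is dropped automatically because "STAT" does not match.
filter_state_x() {
    awk '$2 ~ /^X/'
}

# Usage on a live system:
#   ps -eo pid,stat,comm | filter_state_x
```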

Version-Release number of selected component (if applicable):
RHEL-5.4 the 08192009 tree (which is what I think RHEL-5.4 GA is).

How reproducible:
I am 3 for 3

Steps to Reproduce:
1.  install RHEL-5.4 GA
2.  install the pttest
(http://rhts.redhat.com/rpms/development/noarch/noarch/rh-tests-kernel-standards-pttest-1.2-11.noarch.rpm)
(it may need other rpms from that host)
3.  cd /mnt/tests/kernel/standards/pttest
4.  make run
5.  after 20 minutes kill it (killall pttest??)

Actual results:
processes hang

Expected results:
processes to be cleaned up

Additional info:

Comment 1 Markus Armbruster 2009-12-16 13:43:12 UTC
Re "(it may need other rpms from that host)": got a handy .repo for that?
Chasing deps by hand gets old pretty quick.

Comment 2 Don Zickus 2009-12-16 14:55:49 UTC
If you use the RHTS reservation mechanism, the machine automagically has a repo installed for you, and running 'yum install rh-tests-kernel-standards-pttest' gets the test and all of its dependencies.

Otherwise, something like the below might work.  I normally don't work outside of RHTS and never see the repo files, but ..

[rhts]
name=Red Hat Test Suite - $basearch - Base
baseurl=http://qafiler.boston.redhat.com/rhts/prod
enabled=0
gpgcheck=0


[rhts-testing]
name=Red Hat Test Suite - $basearch - Testing
baseurl=http://qafiler.boston.redhat.com/rhts/devel
enabled=0
gpgcheck=0

[rhts-tests]
name=Red Hat Test Suite - $basearch - Testing
baseurl=http://rhts.redhat.com/rpms/development/noarch/noarch
enabled=1
gpgcheck=0

Comment 3 Markus Armbruster 2009-12-16 16:00:27 UTC
Installs a bunch with s/\.boston\./.bos./g in baseurl.  Thanks!
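
The substitution mentioned here can be applied to a saved copy of the repo
file from comment 2 with sed.  A minimal sketch: the function name
fix_rhts_baseurl and the file names in the usage comment are assumptions, and
only baseurl lines are touched.

```shell
#!/bin/sh
# Print the repo file $1 with ".boston." replaced by ".bos." on
# every baseurl line; all other lines pass through unchanged.
fix_rhts_baseurl() {
    sed '/^baseurl=/ s/\.boston\./.bos./g' "$1"
}

# Usage (file names are hypothetical):
#   fix_rhts_baseurl rhts.repo.orig > /etc/yum.repos.d/rhts.repo
#   yum install rh-tests-kernel-standards-pttest
```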

Comment 4 Markus Armbruster 2009-12-16 17:37:57 UTC
I tried following the steps to reproduce, but for me "make run" executes in less than 7s.

# make run
chmod a+x ./runtest.sh
./runtest.sh
***** Start of runtest.sh *****
***** running test with 498 MB *****
./runtest.sh: line 42: [: missing `]'       <--- suspicious
***** Running for 1000 *****
[...]
***** End of ulimit settings *****
***** End of runtest.sh *****
/kernel/standards/pttest result: PASS
   metric: 2
   Log: /tmp/tmp.Z15224

It leaves some droppings in /tmp.  Want me to attach them?

Comment 5 Don Zickus 2009-12-16 18:32:08 UTC
(In reply to comment #4)
> I tried following the steps to reproduce, but for me "make run" executes in
> less than 7s.
> 
> # make run
> chmod a+x ./runtest.sh
> ./runtest.sh
> ***** Start of runtest.sh *****
> ***** running test with 498 MB *****
> ./runtest.sh: line 42: [: missing `]'       <--- suspicious
> ***** Running for 1000 *****

That is suspicious, but that piece of code only checks if the test should run 10,000 times or 1,000.  Yours ran 1,000 instead of 10,000.  You could probably hand edit line 44 for that if need be.  Regardless..

> [...]
> ***** End of ulimit settings *****
> ***** End of runtest.sh *****
> /kernel/standards/pttest result: PASS
>    metric: 2
>    Log: /tmp/tmp.Z15224
> 
> It leaves some droppings in /tmp.  Want me to attach them?  

No, just noise.  The fact that your make run completed means you didn't have the same results I did.  Are you using a 5.4 distro?  I think 5.5 has changes to userspace that probably make this problem go away.
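
On the "[: missing `]'" message quoted above: the actual line 42 of
runtest.sh is not shown in this report, so the following is only a guess at
the usual cause, a missing space before the closing bracket.  The function
names and the 1024 MB threshold are hypothetical.  When the test command
fails this way it returns a non-zero status, so the if statement silently
takes the else branch.

```shell
#!/bin/sh
# Hypothetical reconstruction; the real runtest.sh may differ.

# Correct form: spaces around both brackets.
pick_iterations() {
    mem_mb=$1
    if [ "$mem_mb" -gt 1024 ]; then echo 10000; else echo 1000; fi
}

# Broken form: "1024]" is parsed as one word, so [ never sees its
# closing "]", prints a "missing ]" error, and returns non-zero.
# The else branch then runs regardless of the memory size.
pick_iterations_broken() {
    mem_mb=$1
    if [ "$mem_mb" -gt 1024]; then echo 10000; else echo 1000; fi
}

# Usage:
#   pick_iterations 2048          # prints 10000
#   pick_iterations_broken 2048   # error on stderr, prints 1000
```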

Comment 6 Markus Armbruster 2009-12-17 08:26:43 UTC
I use repo http://porkchop.redhat.com/released/RHEL-5-Server/U4/x86_64/os/Server

Hmm, looks like I accidentally used a newer qemu.  With the one from the repo, I get

***** End of ulimit settings *****
***** End of runtest.sh *****
/kernel/standards/pttest result: PASS
   metric: 2
   Log: /tmp/tmp.js2216
   DMesg: /tmp/dmesg.log
/kernel/standards/pttest/dmesg result: FAIL
   Log: /tmp/tmp.rhts-db-submit-result.y19Bpe

Comment 7 Don Zickus 2009-12-17 14:36:17 UTC
That failure is probably not the same thing, though if you attach the dmesg log I can double check.

Adding Jeff B.  Maybe the way we have the guests set up causes this problem.

Comment 8 Jeff Burke 2009-12-17 14:48:09 UTC
Created attachment 379012
xml reproducer for RHTS

Attached is the XML reproducer we used for the job listed in the description (comment #0).

Comment 9 Markus Armbruster 2009-12-17 17:51:25 UTC
Created attachment 379063
left behind in /tmp by make run

Comment 10 Don Zickus 2009-12-17 18:09:52 UTC
Lovely, a broken hard drive.

Jeff, do we have another machine?

