Description of problem:
I was running my normal kernel sanity tests on KVM guests and noticed that some tests were exceeding their expected time allotment. Upon investigation, I noticed that in one particular test the processes were in the 'X' state. This shouldn't be noticeable, because I think the 'X' state passes so quickly that it is hard to capture on one process, let alone the 10 I see in my test.

A side effect of the RHTS infrastructure is that if a test takes too long, a local watchdog comes in and kills the test, then performs an alt-sysrq-T (for task info) and an alt-sysrq-w (for cpu stack info). This lets us see what is going on, which may help explain why the test timed out. For this particular problem, http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=9861967 shows that output. Notice all the 'pttest's in the 'X' state.

So there are a couple of issues here. One, we didn't expect the test to take this long, and two, why are these processes stuck in the 'X' state?

Version-Release number of selected component (if applicable):
RHEL-5.4, the 08192009 tree (which is what I think RHEL-5.4 GA is).

How reproducible:
I am 3 for 3.

Steps to Reproduce:
1. install RHEL-5.4 GA
2. install the pttest (http://rhts.redhat.com/rpms/development/noarch/noarch/rh-tests-kernel-standards-pttest-1.2-11.noarch.rpm) (it may need other rpms from that host)
3. cd /mnt/tests/kernel/standards/pttest
4. make run
5. after 20 minutes kill it (killall pttest??)

Actual results:
processes hang

Expected results:
processes to be cleaned up

Additional info:
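To check a guest for the same symptom without waiting for the watchdog's sysrq dump, something like the following might help. It is only a minimal sketch: it assumes Python 3 is available on the guest (not a given on a stock RHEL-5 install), and parse_stat and tasks_named are made-up helper names, not part of RHTS or pttest. It relies on the state letter being the first field after the parenthesised command name in /proc/<pid>/task/<tid>/stat.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from the contents of a /proc stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    open_idx = stat_line.index("(")
    close_idx = stat_line.rindex(")")
    pid = int(stat_line[:open_idx].strip())
    comm = stat_line[open_idx + 1:close_idx]
    state = stat_line[close_idx + 1:].split()[0]
    return pid, comm, state


def tasks_named(name, proc_root="/proc"):
    """List (tid, state) for every task (thread) whose comm equals name."""
    found = []
    if not os.path.isdir(proc_root):
        return found
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        task_dir = os.path.join(proc_root, entry, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process went away between listdir calls
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as f:
                    pid, comm, state = parse_stat(f.read())
            except (OSError, ValueError, IndexError):
                continue  # task exited or stat was unreadable
            if comm == name:
                found.append((pid, state))
    return sorted(found)


if __name__ == "__main__":
    # "pttest" is the task name seen in the sysrq-T output of this report.
    for tid, state in tasks_named("pttest"):
        print(tid, state)
```

Run in a loop while "make run" is going, this would show whether tasks stay in 'X' rather than passing through it.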
Re "(it may need other rpms from that host)": got a handy .repo for that? Chasing deps by hand gets old pretty quick.
If you use the rhts reservation mechanism, then the machine automagically has a repo installed for you, and doing a 'yum install rh-tests-kernel-standards-pttest' would get the test and all of its dependencies.

Otherwise something like the below might work. I normally don't work outside of rhts and never see the repo files, but ..

[rhts]
name=Red Hat Test Suite - $basearch - Base
baseurl=http://qafiler.boston.redhat.com/rhts/prod
enabled=0
gpgcheck=0

[rhts-testing]
name=Red Hat Test Suite - $basearch - Testing
baseurl=http://qafiler.boston.redhat.com/rhts/devel
enabled=0
gpgcheck=0

[rhts-tests]
name=Red Hat Test Suite - $basearch - Testing
baseurl=http://rhts.redhat.com/rpms/development/noarch/noarch
enabled=1
gpgcheck=0
Installs a bunch with s/\.boston\./.bos./g in baseurl. Thanks!
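For reference, the substitution above restricted to the baseurl lines of a .repo file could be sketched as follows. This is an illustration only: rewrite_baseurls is a made-up helper name, and the sketch assumes the repo text is already in a string rather than reading any particular file under /etc/yum.repos.d.

```python
import re


def rewrite_baseurls(repo_text):
    """Apply s/\\.boston\\./.bos./g to baseurl= lines only."""
    out = []
    for line in repo_text.splitlines():
        if line.startswith("baseurl="):
            # Same pattern as the sed expression quoted in this comment.
            line = re.sub(r"\.boston\.", ".bos.", line)
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    sample = (
        "[rhts]\n"
        "name=Red Hat Test Suite - $basearch - Base\n"
        "baseurl=http://qafiler.boston.redhat.com/rhts/prod\n"
        "enabled=0\n"
        "gpgcheck=0"
    )
    print(rewrite_baseurls(sample))
```

Lines other than baseurl are passed through untouched, so section names and the name= fields stay as they were.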
I tried following the steps to reproduce, but for me "make run" executes in less than 7s.

# make run
chmod a+x ./runtest.sh
./runtest.sh
***** Start of runtest.sh *****
***** running test with 498 MB *****
./runtest.sh: line 42: [: missing `]' <--- suspicious
***** Running for 1000 *****
[...]
***** End of ulimit settings *****
***** End of runtest.sh *****
/kernel/standards/pttest result: PASS
metric: 2
Log: /tmp/tmp.Z15224

It leaves some droppings in /tmp. Want me to attach them?
(In reply to comment #4)
> I tried following the steps to reproduce, but for me "make run" executes in
> less than 7s.
>
> # make run
> chmod a+x ./runtest.sh
> ./runtest.sh
> ***** Start of runtest.sh *****
> ***** running test with 498 MB *****
> ./runtest.sh: line 42: [: missing `]' <--- suspicious
> ***** Running for 1000 *****

That is suspicious, but that piece of code only checks whether the test should run 10,000 times or 1,000. Yours ran 1,000 instead of 10,000. You could probably hand-edit line 44 for that if need be. Regardless..

> [...]
> ***** End of ulimit settings *****
> ***** End of runtest.sh *****
> /kernel/standards/pttest result: PASS
> metric: 2
> Log: /tmp/tmp.Z15224
>
> It leaves some droppings in /tmp. Want me to attach them?

No, just noise. The fact that your make run completed means you didn't get the same results I did. Are you using a 5.4 distro? I think 5.5 has changes to userspace that probably make this problem go away.
I use the repo http://porkchop.redhat.com/released/RHEL-5-Server/U4/x86_64/os/Server

Hmm, looks like I accidentally used a newer qemu. With the one from the repo, I get

***** End of ulimit settings *****
***** End of runtest.sh *****
/kernel/standards/pttest result: PASS
metric: 2
Log: /tmp/tmp.js2216
DMesg: /tmp/dmesg.log
/kernel/standards/pttest/dmesg result: FAIL
Log: /tmp/tmp.rhts-db-submit-result.y19Bpe
That failure is probably not the same thing, though if you attach the dmesg log I can double-check. Adding Jeff B. Maybe the way we have the guests set up causes this problem.
Created attachment 379012
xml reproducer for RHTS

Attached is the xml reproducer we used for the job listed in comment #1.
Created attachment 379063
left behind in /tmp by make run
Lovely, a broken hard drive. Jeff, do we have another machine?