Description of problem:
PPC64 systems are seen to fail in a consistent manner via RHTS:
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=2980305
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=2949623
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=2949542
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=2951973
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=2949836
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=2949793
These show both syscalls and filter test failures...

Version-Release number of selected component (if applicable):
2.4.21-70.EL

How reproducible:
75%

Steps to Reproduce:
1. Run the aforementioned rhts test on i386

Actual results:

Expected results:

Additional info:
This may be related to bz 444000, which is filed against the filter subtest within the same parent test (audit-test). As this is a consistent failure and we have no data to say whether it is a test or a kernel failure, I am marking it as a blocker to prompt further investigation.
Hey Josh -- would you mind updating this bz with the info we talked about the other day? Thanks :) P.
As noted in the bug form, this is ppc.

squad7-lp1.rhts.bos.redhat.com
http://rhts.redhat.com/testlogs/22699/80095/667442/sys.log.gz
  process_attrs
  open
  basic_bad

squad3.rhts.bos.redhat.com
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=2948317
  process_attrs
  open
  basic_bad

ibm-js20-01.lab.bos.redhat.com
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=2948311
  process_attrs

ibm-js20-02.lab.bos.redhat.com
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=2948314
  process_attrs

ibm-js20-01.lab.bos.redhat.com
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=2948311
  /kernel/syscalls/scrashme/multiple
  **Out of Memory Error
I looked at the first log in comment #0. It seems that augrok didn't find what it was looking for. The logs seem to have it, though. So I wonder if this is a race between doing the ptrace and having the results on disk. If it is, maybe it just needs a sleep added. Are the ppc machines slow enough that they need more time? Are they shared through virtualization to the point where the kernel is not getting enough run time?
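If the race theory holds, a bounded retry would be more robust than a fixed sleep, since a slow or heavily shared ppc partition gets more time while a fast machine does not wait needlessly. The following is only a minimal sketch of that idea, not code from audit-test or augrok: the function name wait_for_record, the log path, and the plain substring match are all hypothetical stand-ins for whatever lookup the real test performs.

```python
import time


def wait_for_record(log_path, needle, timeout=5.0, interval=0.1):
    """Poll log_path until a line containing needle appears.

    Hypothetical helper: returns the matching line (without the trailing
    newline), or None if nothing matched before the timeout expired.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(log_path, "r", errors="replace") as log:
                for line in log:
                    if needle in line:
                        return line.rstrip("\n")
        except FileNotFoundError:
            # The log may not have been created yet; keep waiting.
            pass
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

A test would call this right after the traced operation and fail only when it returns None. If the failures disappear with a generous timeout, that points at a timing problem in the test rather than at the kernel.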
------- Comment From dvelarde.com 2008-06-26 17:18 EDT-------
Hi Steve,

I don't have access to view the links. Is this one of the audit filter testcases we used for EAL4 certification?

-debbie
------- Comment From tbreeds.com 2008-06-26 23:57 EDT-------
(In reply to comment #3)
> Hi Steve,
>
> I don't have access to view the links. Is this one of the audit filter testcases
> we used for EAL4 certification?

Can we get the logs posted somewhere we can get at them?
------- Comment From bpeters.com 2008-06-27 11:24 EDT-------
At this point, the most likely cause of the problem appears to be a timing issue in the Red Hat Test Suite itself. I'm working with RH on a test case to positively identify the cause of the problem.

If anyone has a nagging curiosity and just really wants to tear into this, I've posted several complete sets of logs from the test results here:
http://people.redhat.com/bpeters/syscall_45928/
------- Comment From bpeters.com 2008-07-17 11:53 EDT-------
Steve Grubb, Jeff Burke, Peter M. and I have been discussing this, and we have not been able to conclusively prove it to be an RHTS problem (or, for that matter, that it is not).

Mike, at this point I think the next step is to find someone who can spend half a day or so doing further testing. If we can grab an identical system at the same RHEL level and show that more than one of the failing tests passes outside of RHTS, we'll have pretty good evidence that it is indeed an RHTS problem.
------- Comment From mjwolf.com 2008-07-17 12:06 EDT-------
Do we have the RHTS test suite at IBM?
The test suite within RHTS that is failing can be found here: http://sourceforge.net/projects/audit-test/
------- Comment From dvelarde.com 2008-07-17 12:38 EDT-------
It looks like the testcases Steve pointed to were developed by HP; they are not the ones we created and shared with Red Hat.
True, but they had been running fine.
------- Comment From dvelarde.com 2008-07-17 12:44 EDT-------
BTW, I only mentioned that the testcase was created by HP because I had thought we could try to reproduce the problem by running our filter testcases with our normal setup. But they are not the same testcases.
------- Comment From mjwolf.com 2008-07-17 13:02 EDT-------
Is the system that I should test with a JS20?
------- Comment From bpeters.com 2008-07-17 14:06 EDT-------
Mike, I've walked back through some of the RHTS logs, and it looks like the problems have been witnessed on a variety of systems, including two different P5s (models 911051A and 9117570) as well as a JS20.

This seems to be further evidence that the problem is in RHTS; I don't see how such a widely impacting problem could have gone unnoticed for so long.
I don't see this issue anymore. Running the latest kernel on the RHEL4.7 distro, I see the syscalls tests are passing now:
- http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=3739725
- http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=3739736
- http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=3739835

This bug should be closed "CURRENTRELEASE".

NOTE: There are still failures with PPC and the audit-test, specifically the filter/process_attrs test, but that should be entered as a new BZ.
Closing per jburke's comments above.