Bug 112195

Summary: /tset/LSB.os/mfiles/mmap_P/T.mmap_P hangs during an LSB run on x86
Product: Red Hat Enterprise Linux 3
Reporter: Martin Jenner <mjenner>
Component: kernel
Assignee: Ernie Petrides <petrides>
Status: CLOSED DUPLICATE
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: high
Version: 3.0
CC: bpeck, dff, jturner, tburke
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2005-10-06 00:05:21 UTC

Description Martin Jenner 2003-12-15 21:50:36 UTC
Description of problem:


/tset/LSB.os/mfiles/mmap_P/T.mmap_P hangs during an LSB test run on
x86 platform.


Version-Release number of selected component (if applicable):

HP Proliant DL 560 4CPU Xeon 2GHz/3GB memory

$ rpm -q redhat-lsb lsb-runtime-test
redhat-lsb-1.3-3
lsb-runtime-test-1.3.6-3

Installed tree: RHEL3-U1-re1211.nightly/i386/i386-as Everything.

kernel: Linux pro4.lab.boston.redhat.com 2.4.21-6.ELsmp #1 SMP Tue Dec
9 14:53:23 EST 2003 i686 i686 i386 GNU/Linux

How reproducible:

I am re-running a full LSB run now, but using a reduced scen.exec file
I was able to reproduce this every time.


Steps to Reproduce:

I ran this through the Test Grid but it can be run by hand as follows:

# rpm -ivh lsb-runtime-test-1.3.6-3.i386.rpm

o Set the password for the vsx0 user (typically set to vsx0).

  # passwd vsx0


o Start the LSB tests. Takes about 6-8 hours to run depending on your
system.

   Login as the vsx0 user.

   IMPORTANT:
   Actually log in as user vsx0 ---- DO NOT ---- "su - vsx0"

   $ run_tests

   Take defaults for all except where input is required. 
 

Actual results:

/tset/LSB.os/mfiles/mmap_P/T.mmap_P hangs



Expected results:

/tset/LSB.os/mfiles/mmap_P/T.mmap_P completes allowing harness to move
onto further tests.

Additional info:

Comment 1 Martin Jenner 2004-01-08 15:47:54 UTC
This regression is still in the latest build tree.

Installed tree RHEL3-U1-re0107.0/i386/i386-as Everything

# rpm -q redhat-lsb lsb-runtime-test
redhat-lsb-1.3-3
lsb-runtime-test-1.3.6-3

Comment 2 Bill Peck 2004-03-02 15:48:35 UTC
This regression is still present with U2.

kernel-2.4.21-9.10.EL
redhat-lsb-1.3-3
lsb-runtime-test-1.3.6-3


Comment 3 Bill Peck 2004-03-02 18:37:18 UTC
Donald - Can you add this to the U2 Blocker list?

Comment 4 Mark DeWandel 2004-03-11 14:27:03 UTC
The two T.mmap_P processes are not actually hung but rather are
chewing up cpu like crazy in the signal path.  They both have signal
handlers registered for SIGSEGV and are apparently faulting on a
bad address in user space.  The signal handler fires, returns, and
the bad dereference is re-executed causing the cycle to continue
indefinitely.  From the evidence, I cannot determine whether this
is a test design problem or a latent issue elicited by a kernel
regression but I lean toward the former given the failure mode.
Bill, can you help me obtain the sources for this test?
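
For illustration only, the cycle described above can be reproduced with a minimal C sketch. It is not taken from the LSB test source and assumes Linux behaviour, where returning normally from a SIGSEGV handler resumes execution at the faulting instruction. The names count_faults, on_segv, and the fault cap are hypothetical; the cap exists only so that the demonstration terminates instead of spinning like T.mmap_P.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std modes (glibc) */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>

static sigjmp_buf escape;             /* way out of the fault loop (demo only) */
static volatile sig_atomic_t faults;  /* number of times the handler has run   */
static volatile sig_atomic_t limit;   /* hypothetical cap so the demo ends     */

static void on_segv(int sig)
{
    (void)sig;
    if (++faults >= limit)
        siglongjmp(escape, 1);        /* break the cycle once the cap is hit */
    /* Plain return: on Linux the faulting instruction is re-executed and
       faults again, so without the cap this would spin indefinitely. */
}

/* Returns how many times the handler ran before the cap broke the loop,
   or -1 if the setup failed. */
int count_faults(int max_faults)
{
    struct sigaction sa, old;
    volatile char *page;
    char sink;

    /* A page with no access rights: any read raises SIGSEGV. */
    page = mmap(NULL, 4096, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    faults = 0;
    limit = max_faults;
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    assert(sigaction(SIGSEGV, &sa, &old) == 0);

    if (sigsetjmp(escape, 1) == 0) {
        sink = *page;                 /* bad dereference, retried after each return */
        (void)sink;
    }

    sigaction(SIGSEGV, &old, NULL);   /* restore the previous disposition */
    munmap((void *)page, 4096);
    return faults;
}
```

Called from a small main(), count_faults(5) is expected to return 5 on Linux: the handler returns normally four times, the read is retried each time, and the fifth invocation jumps out. Removing the cap gives the behaviour seen here, a process that looks hung but is burning CPU in the signal path.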

Comment 5 Martin Jenner 2004-03-11 15:22:34 UTC
A copy of the tests can be obtained via:

 # wget -N \
http://tg1.boston.redhat.com/tg/tests/lsb-runtime-test-1.3.6-3.i386.rpm

Install the collected rpm:

  # rpm -ivh lsb-runtime-test-1.3.6-3.i386.rpm

The sources are installed from the rpm in user account vsx0

  # su - vsx0



Comment 7 Tim Burke 2005-01-31 20:10:38 UTC
This bug report is ancient. Is it still a relevant issue?


Comment 8 Jay Turner 2005-04-18 09:22:28 UTC
Ping?

Comment 9 Ernie Petrides 2005-04-18 17:35:31 UTC
Haven't yet had time to investigate this.


Comment 10 Ernie Petrides 2005-06-15 02:59:54 UTC
Martin, could you please set up this test for me on a test machine
with a RHEL3 U5 environment?  I would like to see if the problem is
still reproducible and (if so) evaluate the test source code.  Please
send me the name of the machine and access instructions, and hopefully
I'll be able to investigate this within the next few business days.

Thanks in advance.  -ernie


Comment 11 Ernie Petrides 2005-06-15 03:00:58 UTC
Also, is it okay if I make this bug public?

Comment 12 Martin Jenner 2005-06-15 13:28:27 UTC
Ernie, this is still reproducible; I am not sure about making this public.


I have set up system edge1.lab to reproduce the hang.

$ ssh vsx0  (NOTE: log in directly as the user; do not su - vsx0)
$ run_tests

The hang will reproduce within 2 minutes.

FYI: I created a trimmed-down LSB-1.3 test input scen.exec file (to speed up
reproduction of this bug) as follows:

$ ssh vsx0
$ mv scen.exec scen.exec.save
$ echo all > scen.exec
$ grep mfiles scen.exec.save >> scen.exec

If you log in as root in another window and kill the hanging T.mmap_P processes,
the remaining tests will finish in about 2 minutes.




Comment 13 Ernie Petrides 2005-06-22 22:59:13 UTC
It seems that edge1.lab.boston.redhat.com is being identified as
192.168.76.147, which is not the IP address currently being used
by the machine.  What gives?

Comment 14 Ernie Petrides 2005-06-23 19:13:58 UTC
Martin, I've worked around the DNS issue by putting this in my /etc/hosts:

    192.168.78.50   edge1.lab.boston.redhat.com     edge1

I've been able to reproduce the test hang by using the scen.exec version
that you created (thanks!).  So now I need to locate the source code for
the test (and be able to rebuild the test executable with various debugging
code).  Where can I find the source?

The executable resides in /home/tet/test_sets/TESTROOT/tset/LSB.os/mfiles/mmap_P.


Comment 15 Martin Jenner 2005-06-24 13:24:29 UTC
The source can be found at:

  http://ftp.freestandards.org/pub/lsb/test_suites/released-1.3.0/source/runtime/

Comment 16 Ernie Petrides 2005-10-06 00:05:21 UTC
Please see bug 106330 comment #3 for an explanation of the bug
in the LSB test source code.  Closing as dup of a NOTABUG.

*** This bug has been marked as a duplicate of 106330 ***