Bug 179157

Summary: LTC21039-FC5T2 oopses on 1 out of 3 boots on 64-way Squadrons H
Product: [Fedora] Fedora
Reporter: IBM Bug Proxy <bugproxy>
Component: kernel
Assignee: David Woodhouse <dwmw2>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Brian Brock <bbrock>
Severity: high
Docs Contact:
Priority: medium
Version: 5
CC: davej, jonstanley, sundaram, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: powerpc   
OS: Linux   
Whiteboard: MassClosed
Doc Type: Bug Fix
Last Closed: 2008-01-20 04:40:26 UTC

Description IBM Bug Proxy 2006-01-27 20:29:00 UTC
LTC Originator is: gcwilson.com

Problem description:  Installed FC5T2 on a 64-way to do performance testing. 
Approximately 1 out of 3 bootups failed with an oops.

If this is not an installation problem,
       Describe any custom patches installed.  None.

       Provide output from "uname -a", if possible:  Not possible; disk is still
around if we can get time on the machine.  But it is the FC5T2 kernel and
userspace; nothing unusual.


Hardware Environment
    Machine type (p650, x235, SF2, etc.): 64-way Squadrons H
    Cpu type (Power4, Power5, IA-64, etc.): Power5
    Describe any special hardware you think might be relevant to this problem: 
64-way, 256G RAM


Please provide access information for the machine if it is available.  This was
on sqh.ltc.austin.ibm.com.  Others are using the machine now.  I have asked to
be notified if slack time becomes available.

Is this reproducible?
    If so, how long does it (did it) take to reproduce it?  Just boot the system
enough times.  It should take < 20 min.
    Describe the steps:  Install FC5T2; reboot several times.

    If not, describe how the bug was encountered:


Is the system (not just the application) hung?  Yes.
    If so, describe how you determined this:  Kernel oops.


Did the system produce an OOPS message on the console?
    If so, copy it here:

Here's the console log:

  Starting udev:[FAILED]
  Unable to handle kernel paging request for data at address 0x00000004
  Faulting instruction address: 0xc0000000001881a0
  cpu 0x6f: Vector: 300 (Data Access) at [c000001edfe6b860]
      pc: c0000000001881a0: ._raw_spin_lock+0x30/0x17c
      lr: c0000000003611ec: ._spin_lock+0x10/0x24
      sp: c000001edfe6bae0
     msr: 8000000000009032
     dar: 4
   dsisr: 40000000
    current = 0xc00000000343e000
    paca    = 0xc000000000481400
      pid   = 497, comm = events/111
  enter ? for help
  6f:mon>

And xmon exception and backtrace info:

  6f:mon> e
  cpu 0x6f: Vector: 300 (Data Access) at [c000001edfe6b860]
      pc: c0000000001881a0: ._raw_spin_lock+0x30/0x17c
      lr: c0000000003611ec: ._spin_lock+0x10/0x24
      sp: c000001edfe6bae0
     msr: 8000000000009032
     dar: 4
   dsisr: 40000000
    current = 0xc00000000343e000
    paca    = 0xc000000000481400
      pid   = 497, comm = events/111
  6f:mon> t
  [c000001edfe6bb80] c0000000003611ec ._spin_lock+0x10/0x24
  [c000001edfe6bc00] c00000000035e134 .klist_del+0x28/0x58
  [c000001edfe6bc90] c000000000225a94 .device_del+0x44/0x110
  [c000001edfe6bd20] d00000000008572c .scsi_target_reap_work+0xd4/0x128 [scsi_mod]
  [c000001edfe6bdc0] c000000000075a24 .worker_thread+0x204/0x2c8
  [c000001edfe6bee0] c00000000007bc7c .kthread+0x128/0x178
  [c000001edfe6bf90] c00000000002deac .kernel_thread+0x4c/0x68
  6f:mon>
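
Reading aid for the backtrace above (not part of the original report): a minimal Python sketch that extracts the symbol, offset, and module from each xmon frame line, and checks whether the faulting data address (dar: 4) falls within the first page of memory. A fault that close to zero is commonly, though not always, the sign of a NULL pointer plus a small field offset. The helper names `parse_frames` and `near_null` and the 4096-byte page size are assumptions chosen for illustration.

```python
import re

# Frame lines as printed by the xmon "t" command in this report:
#   [stack address] return address .symbol+0xoffset/0xsize [optional module]
FRAME_RE = re.compile(
    r"\[(?P<sp>[0-9a-f]+)\]\s+(?P<addr>[0-9a-f]+)\s+"
    r"\.(?P<symbol>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)

# Backtrace copied verbatim from the console log above.
TRACE = """\
  6f:mon> t
  [c000001edfe6bb80] c0000000003611ec ._spin_lock+0x10/0x24
  [c000001edfe6bc00] c00000000035e134 .klist_del+0x28/0x58
  [c000001edfe6bc90] c000000000225a94 .device_del+0x44/0x110
  [c000001edfe6bd20] d00000000008572c .scsi_target_reap_work+0xd4/0x128 [scsi_mod]
  [c000001edfe6bdc0] c000000000075a24 .worker_thread+0x204/0x2c8
  [c000001edfe6bee0] c00000000007bc7c .kthread+0x128/0x178
  [c000001edfe6bf90] c00000000002deac .kernel_thread+0x4c/0x68
  6f:mon>
"""


def parse_frames(text):
    """Return (symbol, offset, module) for every frame line found in text."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append(
                (
                    match.group("symbol"),
                    int(match.group("offset"), 16),
                    match.group("module"),  # None for core-kernel symbols
                )
            )
    return frames


def near_null(fault_address, page_size=4096):
    """Heuristic: True if the address lies in the first page of memory.

    The page size is an assumption; the check only says the address is
    consistent with a NULL-based access, not that one occurred.
    """
    return 0 <= fault_address < page_size


if __name__ == "__main__":
    for symbol, offset, module in parse_frames(TRACE):
        where = module if module else "kernel"
        print(f"{symbol}+{offset:#x} ({where})")
    print("dar near NULL:", near_null(0x4))
```

Read innermost frame first, the parsed call chain is scsi_target_reap_work (in scsi_mod), then device_del, klist_del, and _spin_lock, with the fault itself in _raw_spin_lock.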

Is the system sitting in a debugger right now? No.
    If so, how long may it stay there?

Comment 1 Rahul Sundaram 2006-02-20 11:06:18 UTC

These bugs are being closed because a large number of updates have been released
since the FC5 test1 and test2 releases. Please update your system by running
"yum update" as root, or try the third and final test version of FC5, which is
due to be released shortly, and verify whether the bugs are still present.
Reopen this bug or file new bug reports as appropriate after confirming that
the issue persists. Thanks.

Comment 2 Paul Nasrat 2006-02-28 22:32:03 UTC
Manoj, can you investigate this?

Comment 3 IBM Bug Proxy 2006-06-23 15:57:02 UTC
----- Additional Comments From gcwilson.com  2006-06-23 12:01 EDT -------
I installed the H with FC5 GA this time.  I have seen it drop into xmon once.  I
failed to get a backtrace from it, but I will the next time it happens.  I don't
know if this happens with the same frequency, or if it is actually the same
problem.  Now the machine is hanging on boot while starting udev.  I am going to
let it sit an hour or so--sometimes udev takes a _long_ time to start (which may
be a problem in itself).  If I can't get it to drop into xmon, I'll open a
separate bug on the udev problem.  Neither of these issues is helping me get
SELinux/MLS performance tests run.

Comment 4 IBM Bug Proxy 2006-06-25 22:01:27 UTC
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |REJECTED
         Resolution|                            |DUPLICATE




------- Additional Comments From gcwilson.com  2006-06-25 18:06 EDT -------
Once I worked around the start_udev variant of LTC23422/RH178229, the H boots
FC5 GA and lspp.38 test kernels reliably.  I don't have enough evidence to say
with complete certainty that this bug is really a manifestation of bug 23422. 
But I strongly suspect it is.  Closing as a duplicate of LTC23422/RH178229.

*** This bug has been marked as a duplicate of 23422 *** 

Comment 5 Dave Jones 2006-10-16 18:45:11 UTC
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed.  See bug 207474 for further details.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.

Thank you.

Comment 6 Jon Stanley 2008-01-20 04:40:26 UTC
(this is a mass-close to kernel bugs in NEEDINFO state)

As indicated previously, there has been no update on the progress of this bug,
so I am closing it as INSUFFICIENT_DATA. Please re-open if the issue
still occurs for you and I will try to assist in its resolution. Thank you for
taking the time to report the initial bug.

If you believe that this bug was closed in error, please feel free to reopen
this bug.