Bug 434939 - RHEL5: ibm-morrison timing issues due to bios & powernow-k8 driver incompatibility
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.1
Hardware: All Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Ed Pollard
QA Contact: Red Hat Kernel QE team
Depends On:
Blocks:
Reported: 2008-02-26 08:35 EST by Prarit Bhargava
Modified: 2013-08-05 20:03 EDT

Doc Type: Bug Fix
Last Closed: 2012-04-23 15:51:35 EDT


Attachments: None
Description Prarit Bhargava 2008-02-26 08:35:08 EST
Description of problem: ibm-morrison is brought up in a severely crippled state
because of a broken _PSS table in the IBM BIOS.


Version-Release number of selected component (if applicable): 2.6.18-79


How reproducible: 100%


Steps to Reproduce:
1. Boot the system; dmesg shows many soft lockup warnings.
  
Actual results:

BUG: soft lockup - CPU#3 stuck for 10s! [ip:1706]
CPU 3:
Modules linked in: dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport sg i2c_amd756 i2c_core shpchp k8temp hwmon pcspkr amd_rng serio_raw k8_edac edac_mc tg3 mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 1706, comm: ip Not tainted 2.6.18-79.el5dz #24
RIP: 0010:[<ffffffff8000c5f5>]  [<ffffffff8000c5f5>] __delay+0x6/0x10
RSP: 0018:ffff81010f1b5ca0  EFLAGS: 00000293
RAX: 0000000000999146 RBX: ffff81010dca2500 RCX: 000000003fbdb35a
RDX: 000000000000006f RSI: ffffc2000008044c RDI: 0000000001468ae9
RBP: 000000002dcd5b0b R08: ffff81010dca2000 R09: 000000000000a000
R10: 0000000000000009 R11: 0000000000000297 R12: ffff81010dca2000
R13: 000000000000a000 R14: 0000000000000009 R15: 0000000000000297
FS:  00002aaaaaabd7f0(0000) GS:ffff810103e7f640(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003396ccab60 CR3: 000000010f5b4000 CR4: 00000000000006e0

Call Trace:
 [<ffffffff8811fc68>] :tg3:tg3_writephy+0x37/0xe8
 [<ffffffff8812054d>] :tg3:tg3_phy_reset+0x366/0x73d
 [<ffffffff88120f33>] :tg3:tg3_write_mem+0xc6/0xe2
 [<ffffffff88123d7c>] :tg3:tg3_reset_hw+0x4d/0x15a6
 [<ffffffff8000c5f3>] __delay+0x4/0x10
 [<ffffffff8812bbc3>] :tg3:tg3_open+0x2cd/0x5e0
 [<ffffffff802159f4>] dev_open+0x2f/0x6e
 [<ffffffff802141cb>] dev_change_flags+0x5a/0x11a
 [<ffffffff80246f46>] devinet_ioctl+0x235/0x59c
 [<ffffffff8020c7b2>] sock_ioctl+0x1c1/0x1e5
 [<ffffffff800419e7>] do_ioctl+0x21/0x6b
 [<ffffffff8002ff0d>] vfs_ioctl+0x248/0x261
 [<ffffffff8004bd8a>] sys_ioctl+0x59/0x78
 [<ffffffff8005d116>] system_call+0x7e/0x83

BUG: soft lockup - CPU#3 stuck for 10s! [ip:1706]

etc.
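
To quantify "many" soft lockup warnings, the dmesg output can be tallied per
CPU and task. The following is a minimal sketch, assuming Python 3 and the
warning format shown in the trace above; the function name
count_soft_lockups is hypothetical and not part of any kernel tooling.

```python
import re
from collections import Counter

# Matches lines such as: BUG: soft lockup - CPU#3 stuck for 10s! [ip:1706]
SOFT_LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def count_soft_lockups(dmesg_text):
    """Return a Counter mapping (cpu, comm, pid) to the number of warnings.

    dmesg_text is the full text of the kernel log, e.g. captured with
    `dmesg > dmesg.txt` and read from that file.
    """
    hits = Counter()
    for match in SOFT_LOCKUP_RE.finditer(dmesg_text):
        key = (
            int(match.group("cpu")),
            match.group("comm"),
            int(match.group("pid")),
        )
        hits[key] += 1
    return hits
```

If every warning comes from the same CPU, as in the excerpt above (CPU#3),
that points at a per-CPU problem rather than a system-wide one.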

Expected results:  System should boot normally.


Additional info:

The issue here is that three of the four CPUs on ibm-morrison have no _PSS
table. The last CPU has a bogus _PSS table, which the powernow-k8 driver reads
in. The result is a system in which one CPU can switch to bogus power states,
and timers and other timing-dependent code break.

This has been broken for some time (since at least 5.1), and is NOT A BLOCKER.
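
The per-CPU asymmetry described above can be checked from userspace. The
following is a minimal sketch, assuming Python 3 and the standard cpufreq sysfs
layout (/sys/devices/system/cpu/cpuN/cpufreq/scaling_available_frequencies).
It also assumes that a CPU for which the driver found no usable table has no
cpufreq directory or an empty frequency list; that mapping to a missing _PSS
table is an assumption, not something confirmed in this report. The function
names are hypothetical.

```python
import glob
import os
import re

# Standard cpufreq sysfs location; matches cpu0, cpu1, ... but not cpufreq/cpuidle.
CPUFREQ_GLOB = "/sys/devices/system/cpu/cpu[0-9]*/cpufreq"


def read_available_frequencies(sysfs_glob=CPUFREQ_GLOB):
    """Map CPU number -> tuple of advertised frequencies (kHz).

    Only CPUs that have a cpufreq directory appear in the result.
    """
    result = {}
    for cpufreq_dir in glob.glob(sysfs_glob):
        cpu_name = os.path.basename(os.path.dirname(cpufreq_dir))
        match = re.match(r"cpu(\d+)$", cpu_name)
        if not match:
            continue
        path = os.path.join(cpufreq_dir, "scaling_available_frequencies")
        try:
            with open(path) as handle:
                freqs = tuple(sorted(int(tok) for tok in handle.read().split()))
        except (OSError, ValueError):
            freqs = ()  # attribute missing or unreadable: treat as no table
        result[int(match.group(1))] = freqs
    return result


def summarize_pstate_coverage(expected_cpus, freqs_by_cpu):
    """Report which CPUs advertise frequencies and whether coverage is uneven.

    "asymmetric" is True when at least one CPU has a table while another has
    none, or when CPUs advertise different tables.
    """
    expected = list(expected_cpus)
    with_table = sorted(cpu for cpu in expected if freqs_by_cpu.get(cpu))
    without_table = sorted(cpu for cpu in expected if not freqs_by_cpu.get(cpu))
    distinct_tables = {freqs_by_cpu[cpu] for cpu in with_table}
    asymmetric = bool(with_table) and (
        bool(without_table) or len(distinct_tables) > 1
    )
    return {
        "with_table": with_table,
        "without_table": without_table,
        "asymmetric": asymmetric,
    }
```

On a four-CPU machine this could be called as
summarize_pstate_coverage(range(4), read_available_frequencies()). A result
with one entry in "with_table" and three in "without_table" would match the
situation described here, although it would not by itself show that the one
table is bogus.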
Comment 1 Ed Pollard 2008-02-26 10:50:53 EST
I'm researching a new BIOS (though this one isn't really that old; it was
updated in the middle of last year). I'm wondering whether the problem is not
the BIOS but perhaps the CPU itself. I will look into it.
Comment 2 Ed Pollard 2008-03-13 12:27:53 EDT
There is a new BIOS for this system that might be promising.

The new version, 1.27, has the following fix; I don't know whether it will help:

Problem Fixed: Processor performance throttling
    * Added fix in ASL code for properly mapping operating region within the
correct EBDA offset. Operating system could not properly initialize ACPI P-State
tables to support processor performance throttling.

Unfortunately the blade is now non-functional, so I will have to look into why
that is. Once it is up and running again I will apply the new BIOS and we can
hopefully re-test.
Comment 3 Ed Pollard 2008-03-14 11:27:45 EDT
ibm-morrison is functional again, though it will need to be re-enabled in RHTS
before you can reserve it to test again.

I did upgrade the firmware to the 1.27 level that was released in January
(though I think it has a November date stamp on it).
Comment 4 Ed Pollard 2008-03-31 09:30:33 EDT
Did anyone have a chance to try this again with the new firmware? 
Comment 5 Jeff Burke 2008-03-31 09:43:46 EDT
Ed,
   This system is in status "Unavailable" in RHTS. It is impossible to schedule
tests on it. Reason: "Test was aborted by Scheduler power-reset failed"

   You will need to talk with engineering operations. It looks like it was
enabled for a couple of days but then was disabled again.
http://tinyurl.com/33r6ox

Jeff
Comment 6 Ed Pollard 2008-06-23 10:36:30 EDT
ibm-morrison has some issues that I have yet to get resolved. RHTS seems to be
the only thing that will reliably break it. My apologies for the slow response
time.
Can you reproduce the problem on ibm-morrison2? It is available in RHTS and
should have the same hardware.
Comment 9 Steve Best 2012-04-23 15:51:35 EDT
Closing this out. ibm-morrison was shipped back to IBM a while back.
