Bug 488471 - Problem with drive status leds after update to 2.6.18-128.el5
Problem with drive status leds after update to 2.6.18-128.el5
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel (Show other bugs)
5.3
All Linux
urgent Severity medium
: rc
: ---
Assigned To: David Milburn
Red Hat Kernel QE team
: ZStream
Depends On:
Blocks: 500120
  Show dependency treegraph
 
Reported: 2009-03-04 09:52 EST by Tuomo Soini
Modified: 2009-09-02 04:05 EDT (History)
15 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-09-02 04:05:38 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Patch to debug ahci LED problem (2.51 KB, patch)
2009-03-06 11:39 EST, David Milburn
no flags Details | Diff
ata/AHCI related lines from dmesg (2.18 KB, text/plain)
2009-03-06 16:16 EST, Tuomo Soini
no flags Details
Patch #2 to debug ahci LED problem (2.97 KB, patch)
2009-03-06 17:44 EST, David Milburn
no flags Details | Diff
ata/AHCI related lines from dmesg using patch #2 (2.69 KB, text/plain)
2009-03-07 15:33 EST, Tuomo Soini
no flags Details
Patch #3 to debug ahci LED problem (4.74 KB, patch)
2009-03-10 14:48 EDT, David Milburn
no flags Details | Diff
ata/AHCI related lines from dmesg using patch #3 (4.85 KB, text/plain)
2009-03-11 07:48 EDT, Tuomo Soini
no flags Details
Patch #4 to debug ahci LED problem (4.87 KB, patch)
2009-03-11 15:55 EDT, David Milburn
no flags Details | Diff
lspci -xxvvv output from ASUS RS160-E5 server (34.67 KB, text/plain)
2009-03-11 16:02 EDT, Tuomo Soini
no flags Details
Patch to disable Enclosure Management messaging for ICH9r (816 bytes, patch)
2009-03-12 09:36 EDT, David Milburn
no flags Details | Diff
ata/AHCI related lines from dmesg using ich9r work-around patch (3.14 KB, text/plain)
2009-03-12 18:03 EDT, Tuomo Soini
no flags Details
Patch #5 to debug ahci LED problem (4.94 KB, patch)
2009-03-17 12:46 EDT, David Milburn
no flags Details | Diff
Patch to disable Enclosure Management messaging for ICH9r - AHCI mode only (1.95 KB, patch)
2009-03-17 12:47 EDT, David Milburn
no flags Details | Diff
ata/AHCI related lines from dmesg using patch #5 (6.10 KB, text/plain)
2009-03-18 07:57 EDT, Tuomo Soini
no flags Details
Patch #6 to debug ahci LED problem (6.16 KB, patch)
2009-03-18 18:31 EDT, David Milburn
no flags Details | Diff
ata/AHCI related lines from dmesg using patch #6 (7.30 KB, text/plain)
2009-03-19 06:50 EDT, Tuomo Soini
no flags Details
dmesg from boot with patch #6 (45.51 KB, text/plain)
2009-03-19 09:50 EDT, Tuomo Soini
no flags Details
Patch #7 to debug ahci LED problem (6.82 KB, patch)
2009-03-19 12:10 EDT, David Milburn
no flags Details | Diff
dmesg from boot with patch #7 (85.33 KB, text/plain)
2009-03-19 16:42 EDT, Tuomo Soini
no flags Details
Patch to delay to wait for transmit bit to clear for ICH9r - AHCI mode only (3.59 KB, patch)
2009-03-23 18:25 EDT, David Milburn
no flags Details | Diff
Dmesg using ich9r workaround patch (40.90 KB, text/plain)
2009-03-24 16:35 EDT, Tuomo Soini
no flags Details
ICH10 SATA lspci -xxvvv (1.46 KB, text/plain)
2009-03-26 08:47 EDT, Gregory Fowler
no flags Details
Patch to delay to wait for transmit bit to clear for ICH9r and ICH10 - AHCI mode only (3.64 KB, patch)
2009-03-26 10:03 EDT, David Milburn
no flags Details | Diff
dmesg from patch 8 (28.52 KB, text/plain)
2009-03-26 16:04 EDT, Gregory Fowler
no flags Details
Real dmesg for patch 8 (3.45 KB, text/plain)
2009-03-26 18:13 EDT, Gregory Fowler
no flags Details
Patch to delay to wait for transmit bit to clear when in AHCI mode. (3.52 KB, patch)
2009-04-01 10:26 EDT, David Milburn
no flags Details | Diff
Patch to delay to wait for transmit bit to clear regardless of device id. (1.42 KB, patch)
2009-04-02 19:01 EDT, David Milburn
no flags Details | Diff

  None (edit)
Description Tuomo Soini 2009-03-04 09:52:16 EST
Description of problem:

When I upgraded ASUS RS160-E5 server with 4 1.5TB SATA disks running in AHCI mode to 2.6.18-128.1.1.el5 kernel drives 2 to 4 start to blink drive failure led.
Boot to 2.6.18-92.1.22.el5 fixes problem.

Version-Release number of selected component (if applicable):

Kernel 2.6.18-128.el5 and 2.6.18-128.1.1.el5

How reproducible:

Always

Steps to Reproduce:
1. Boot machine to new kernel, lights for drive 2 to 4 show failure.
2. switched drives to different slots, nothing changed. Always drives 2 to 4 show failure.
  
Actual results:
According smartctl drives are perfecty ok and everything seem to work as expected. First drive status led show green as all should.

Expected results:
All drive status leds should show green when everything should be ok.

Additional info:
ASUS RS160-E5 latest bios, disks ST3150341AS with latest bios SD1B. ICH9R in AHCI mode. Drive activity leds seem to be working all the time.
Comment 1 Tuomo Soini 2009-03-04 11:09:25 EST
http://lkml.org/lkml/2008/12/22/12

Upstream patch is broken. So this is regression in 5.3 caused by broken upstream patch which got included.

Upstream bug:

http://bugzilla.kernel.org/show_bug.cgi?id=12266
Comment 2 David Milburn 2009-03-04 19:18:51 EST
Hi,

I am building a test kernel to look more closely at ahci_sw_activity_blink which is the function that puts together the led messages to transmit to the controller.

Looking at the ahci spec, it would seem the ahci driver is setting Bits 8:6
when setting the activity LED.

The LED bit locations are:
       Bits 2:0 – Activity LED (may be driven by hardware)
       Bits 5:3 – Vendor Specific LED (e.g. locate)
       Bits 8:6 – Vendor Specific LED (e.g. fault)
       Bits 15:9 – Reserved

I will post a link to download the test kernel soon.
Comment 3 Tuomo Soini 2009-03-05 01:55:55 EST
I think I didn't say it clearly enough but fault led doesn't burn for sda. It only burns for sdb, sdc, sdd. That might be wy Intel verified patch and told leds work ok. If they only had one disk on their test system it did look like working ok.
Comment 4 David Milburn 2009-03-05 11:59:11 EST
Would you please boot kernel-2.6.18-131.el5.bz488471.1 and attach the output of
"dmesg -c"? I have added quite a bit of debug to ahci_sw_activity_blink, so
you may want to boot with the "log_buf_len=1000000" kernel parameter or hook
up a serial console, so we can capture all the messages from boot especially
when the ahci driver initializes and see how it it getting into this state.
Please let me know if the debug kernel generates too much output and I will tone
it down. Thank you.

 
http://people.redhat.com/dmilburn/
Comment 5 Tuomo Soini 2009-03-06 02:40:07 EST
I'm running xen on that machine. Can you just provide patch to try?
Comment 6 David Milburn 2009-03-06 11:39:04 EST
Created attachment 334318 [details]
Patch to debug ahci LED problem
Comment 7 David Milburn 2009-03-06 11:41:34 EST
Sure, here you go, this patch adds debug to the blinking function, if you can
capture the output after the driver loads we should be able to see the led
message that the driver is sending down to the controller.
Comment 8 Tuomo Soini 2009-03-06 16:16:20 EST
Created attachment 334351 [details]
ata/AHCI related lines from dmesg
Comment 9 David Milburn 2009-03-06 16:29:11 EST
em_ctl 117506304 = 7010100, which means the Transmit bit is set, so the driver
thinks the controller is still busy transmitting a previous message, if you
look a little farther where there any messages from "AHCI_SW_ACTIVITY_BLINK"?
Comment 10 David Milburn 2009-03-06 16:39:16 EST
Actually, it looks like bit 26 is set which means the driver doesn't use ahci_sw_activity_blink, let me look over the driver some more.
Comment 11 Tuomo Soini 2009-03-06 16:50:52 EST
No other AHCI related lines on dmesg.

Symptoms are still same. sda is ok, sdb to sdd have error led blinking.
Comment 12 David Milburn 2009-03-06 17:44:51 EST
Created attachment 334355 [details]
Patch #2 to debug ahci LED problem
Comment 13 David Milburn 2009-03-06 17:47:01 EST
I have added a couple of printks to ahci_start_port(), please re-test and 
see if they show up in dmesg.
Comment 14 Tuomo Soini 2009-03-07 15:33:19 EST
Created attachment 334415 [details]
ata/AHCI related lines from dmesg using patch #2
Comment 15 David Milburn 2009-03-10 14:48:06 EDT
Created attachment 334694 [details]
Patch #3 to debug ahci LED problem
Comment 16 David Milburn 2009-03-10 14:57:34 EDT
According to the dmesg output in Comment#14, after the first message is
transmitted it appears that Bit 8 (Transmit Message) of the Enclosure Management
Control register never clears. Once a message has been sent, the HBA should
be clearing this bit, but we are seeing subsequent transmits finding this bit
set so ahci_transmit_led_message is returning -EINVAL. I have added a few more
printks to determine if the driver may not be putting the LED message in the
real transmit buffer. Please retest and attach your dmesg output, thanks.
Comment 17 Tuomo Soini 2009-03-11 07:48:26 EDT
Created attachment 334785 [details]
ata/AHCI related lines from dmesg using patch #3
Comment 18 David Milburn 2009-03-11 15:50:24 EDT
I thought the driver maybe writing the enclosure messages outside the 
enclosure management buffer, but that doesn't seem to be the case

ABAR[EM_LOC.OFST*4]                      buffer starts here
ABAR[(EM_LOC.OFST*4) + EM_LOC.SIZE]      buffer ends here

Driver starts up

AHCI_INIT_ONE: em_ctl 0x7010000 em_loc 0x1000002

em_loc was read from global control register offset 0x1c, bits 15:0 specifies
the size of the transmit message buffer area in dwords, in this case 2 dwords.
Bits 31:16 specifies the offset of the message buffer in dwords from the beginning of ABAR, in this case 0x100.

So, a message really consists of 2 dwords, the first one is the message header,
the driver sets bits 15:8 of the message header to indicate the size of the
following message which is always 4 bytes.

The driver seems to be writing to the correct location of the enclosure buffer

        /* write message to EM_LOC */
        writel(message[0], mmio + hpriv->em_loc);
        writel(message[1], mmio + hpriv->em_loc+4);

Maybe the writes of the tranmit are getting out-of-order, I haved added a
readl after the above code to flush those writes, I will attach another patch.
Comment 19 David Milburn 2009-03-11 15:55:12 EDT
Created attachment 334854 [details]
Patch #4 to debug ahci LED problem

Same as patch #3 except added a readl in ahci_transmit_led_message.
Comment 20 David Milburn 2009-03-11 15:57:07 EDT
Would you please test the patch in Comment#19, and attach the output of
"lspci -xxvvv"?
Comment 21 Tuomo Soini 2009-03-11 16:02:34 EDT
Created attachment 334856 [details]
lspci -xxvvv output from ASUS RS160-E5 server
Comment 22 Tuomo Soini 2009-03-11 16:23:06 EDT
I'll test new patch asap.

Btw. At beginning of boot lights are ok. But when text Red Hat nash is shown on console, sdb to sdd lights change to drive error state.
Comment 23 Tuomo Soini 2009-03-12 02:57:17 EDT
I did testing with patch4. There were only one line change to previous ahci messages.

-ata3: SATA max UDMA/133 irq_stat 0x00400040, connection status changed
+ata3: SATA max UDMA/133 irq 20, connection status changed

lspci -xxvvv output didn't change.
Comment 24 David Milburn 2009-03-12 06:37:14 EDT
Did the drive error LEDs still blink for sdb to sdd with patch4?
Comment 25 Tuomo Soini 2009-03-12 07:08:26 EDT
Yes, drive error LEDs are still blinking with patch4.
Comment 26 David Milburn 2009-03-12 09:36:41 EDT
Created attachment 334924 [details]
Patch to disable Enclosure Management messaging for ICH9r
Comment 27 David Milburn 2009-03-12 09:40:18 EDT
I think all the upstream reports of this problem are using ICH9r, please test
this patch and make sure your LEDs function normally, and please post the
dmesg output showing the message stating that messaging has been disabled. I will
ask Intel if they are aware of any problems with this specfic controller. Thanks
for all your testing.
Comment 28 Tuomo Soini 2009-03-12 18:03:26 EDT
Created attachment 335014 [details]
ata/AHCI related lines from dmesg using ich9r work-around patch
Comment 29 Tuomo Soini 2009-03-12 18:03:58 EDT
And yes, drive leds seem to work normally with patch.
Comment 30 David Milburn 2009-03-17 12:46:10 EDT
Created attachment 335554 [details]
Patch #5 to debug ahci LED problem
Comment 31 David Milburn 2009-03-17 12:47:09 EDT
Created attachment 335555 [details]
Patch to disable Enclosure Management messaging for ICH9r - AHCI mode only
Comment 32 David Milburn 2009-03-17 12:54:44 EDT
Would you please try patch #5 and supply the dmesg output?

Also if after applying patch #5 (Comment #30) you end up in the same state
with drive error LEDs blinking, would you see if you can manually turn them
off? 

# echo 0 > /sys/class/scsi_host/host1/em_message
# echo 0 > /sys/class/scsi_host/host2/em_message
# echo 0 > /sys/class/scsi_host/host3/em_message
# echo 0 > /sys/class/scsi_host/host4/em_message
# echo 0 > /sys/class/scsi_host/host5/em_message

Finally, please test with the patch in Comment #31, it should only disable
enclosure management if ICH9r is in AHCI mode. We will not be able to disable
it in all cases, it has been tested in RAID mode without problems with enclosure
management LEDs.
Comment 33 David Milburn 2009-03-17 12:57:17 EDT
Also, please supply dmesg output when testing with patch in Comment #31, I
expect that your LEDs will behave appropriately, but I would also like to
see the message where enclosure management is disabled while configured in
AHCI mode.
Comment 34 Tuomo Soini 2009-03-17 13:13:29 EDT
Will test. But previously you couldn't turn off because
/sys/class/scsi_host/*/em_message were read only. Did you fix them to have correct mode so setting can be changed?
Comment 35 David Milburn 2009-03-17 13:26:27 EDT
No, actually I didn't make any change for managing LEDs through /sys,
I am sure that we have been able use this method in previous releases. Though,
the error that you see when using the /sys method maybe a side-effect of the
problem, when changing values the driver should be using ahci_led_store which
calls ahci_transmit_led_message which I expect is failing in this case too, the
dmesg output will tell us for sure. Actually, why don't you try the /sys method
with both patches (Comment #30 and Comment #31).
Comment 36 David Milburn 2009-03-17 13:57:16 EDT
On second thought you can skip the /sys test after applying the patch in
Comment #31 since we are disabling EMS for the ICH9r, I will see if I can
try it on a configuration known to work.
Comment 37 Tuomo Soini 2009-03-18 07:57:55 EDT
Created attachment 335685 [details]
ata/AHCI related lines from dmesg using patch #5

I couldn't disable em_messages when drives were blinking error, because they all were already disabled. But when I enabled em_messages, drive status led went to correct state.

You can see relevant debug messages when I enabled em messages.

Only strange thing is that even when sda had em_messages set to 0 it's error state led didn't show error while all other devices were in error state.

I guess work-around patch is wrong approach.
Comment 38 David Milburn 2009-03-18 13:12:06 EDT
When you say you enabled em_messages, are you saying that you echo'd a value
into 

/sys/class/scsi_host/hostX/em_message

em_message is a 32-bit value and the LED state is defined by bits 16-24

bits 24-22 Fault LED
bits 21-19 Locate LED
bits 18-16 Activity LED

And withing those 3-bit ranges, 000b turns off and 001b turns on

For instance,

# echo 4194304 > /sys/class/scsi_host/hostX/em_message

That should turn on the Fault LED and echo'ing 0 should turn all LEDs off, so 
enclosure management is enabled in patch #5, but the em_message /sys attribute
is a message value not enable/disable.

So could you please reboot with patch #5, at this point drive #2 should be
blinking the Fault LED, and then just echo 0 to the corresponding em_message
and lets see if the error LED goes off and see what ahci_transmit_message
says in dmesg.

Also see if you can echo 524288 to the same scsi_host, this should turn on
the Locate LED, and see what ahci_transmit_message.

Thanks for testing.
Comment 39 Tuomo Soini 2009-03-18 15:07:28 EDT
echo 0 > /sys/class/scsi_host/host1/em_message

Turns error led off and gives normal state.

AHCI_TRANSMIT_LED_MESSAGE: state 0x0
AHCI_TRANSMIT_LED_MESSAGE: pmp 0x0
AHCI_TRANSMIT_LED_MESSAGE: em_ctl 0x7010000
AHCI_TRANSMIT_LED_MESSAGE: mmio 0x2e800
AHCI_TRANSMIT_LED_MESSAGE: hpriv->em_loc 0x400
AHCI_TRANSMIT_LED_MESSAGE: mmio + hpriv->em_loc+4 0x2ec04
AHCI_TRANSMIT_LED_MESSAGE: mmio + (hpriv->em_loc+4) 0x2ec04


echo 524288 > /sys/class/scsi_host/host1/em_message

Makes error led blink twice the error speed so I think it works.

AHCI_TRANSMIT_LED_MESSAGE: state 0x80000
AHCI_TRANSMIT_LED_MESSAGE: pmp 0x0
AHCI_TRANSMIT_LED_MESSAGE: em_ctl 0x7010000
AHCI_TRANSMIT_LED_MESSAGE: mmio 0x2e800
AHCI_TRANSMIT_LED_MESSAGE: hpriv->em_loc 0x400
AHCI_TRANSMIT_LED_MESSAGE: mmio + hpriv->em_loc+4 0x2ec04
AHCI_TRANSMIT_LED_MESSAGE: mmio + (hpriv->em_loc+4) 0x2ec04

echo 4194304 > /sys/class/scsi_host/host1/em_message

Leaves red led burning and other leds off.

AHCI_TRANSMIT_LED_MESSAGE: state 0x400000
AHCI_TRANSMIT_LED_MESSAGE: pmp 0x0
AHCI_TRANSMIT_LED_MESSAGE: em_ctl 0x7010000
AHCI_TRANSMIT_LED_MESSAGE: mmio 0x2e800
AHCI_TRANSMIT_LED_MESSAGE: hpriv->em_loc 0x400
AHCI_TRANSMIT_LED_MESSAGE: mmio + hpriv->em_loc+4 0x2ec04
AHCI_TRANSMIT_LED_MESSAGE: mmio + (hpriv->em_loc+4) 0x2ec04

But I think you find this interesting:

cd /sys/class/scsi_host/host0/
cat em_message
0
echo 0 >em_message
cat em_message
0
cd ../host1
echo 0 >em_message
cat em_message
1
cd ../host2
cat em_message
0
echo 0 >em_message
2
cd ../host3
cat em_message
0
(Error led is blinking)
echo 0 >em_message
(Error led turns off)
cat em_message
3

Initially all em_message's are 0
Comment 40 David Milburn 2009-03-18 18:31:02 EDT
Created attachment 335775 [details]
Patch #6 to debug ahci LED problem
Comment 41 David Milburn 2009-03-18 18:38:04 EDT
I have added an ahci_led_init function to turn off LEDs near the end of the
driver loading, this should be the same as "echo'ing 0". Please apply this
patch, boot and see if the error LED is off for drives 2 - 4.

Let's come back to the potential "cat em_message" problem, it maybe that is
common to all the controllers not just ich9r, we can investigate after we solve
this issue.
Comment 42 Tuomo Soini 2009-03-19 06:50:14 EDT
Created attachment 335831 [details]
ata/AHCI related lines from dmesg using patch #6

No change in behaviour. sdb-sdd were still in error state.
Comment 43 David Milburn 2009-03-19 08:35:39 EDT
Would you please attach the dmesg for the last boot up with patch #6?
Comment 44 Tuomo Soini 2009-03-19 09:50:29 EDT
Created attachment 335850 [details]
dmesg from boot with patch #6
Comment 45 David Milburn 2009-03-19 12:10:25 EDT
Created attachment 335883 [details]
Patch #7 to debug ahci LED problem
Comment 46 David Milburn 2009-03-19 12:17:46 EDT
This patch adds some delays to ahci_transmit_led_message if it finds the 
controller has not cleared the transmit bit, I expect this to work since
we know that it eventually clears since you can change the led values thru
/sys. This patch will also tells us in the dmesg output how long it took
to clear.

Also, I think I see why "cat ...em_message" is giving those values,
ahci_transmit_led_message sets 

emp->led_state = message[1];

So, now emp->led_state includes the port no, then ahci_led_show reads
emp->led_state, so instead of cat'ing and seeing zero, you see (zero | port_no).

I have changed the above to

emp->led_state = state;

Please boot up, hopefully drive error LEDs will be off for drives 2 - 4, but,
I also need to see the dmesg output so I can see what kind of delay is needed
for the transmit bit to clear. Thanks again for all your testing.
Comment 47 Tuomo Soini 2009-03-19 16:42:51 EDT
Created attachment 335931 [details]
dmesg from boot with patch #7

And with patch#7 leds finally are ok. But it felt like boot took longer.

And verified, echo 0 > em_message doesn't change it's value any more. Now it's always 0 after this.
Comment 48 David Milburn 2009-03-19 17:12:17 EDT
Ok, thanks, I am sure the delay and printks are slowing boot up.

Let me see if I can ask Intel about the delay allowing the transmit bit
to clear.
Comment 49 David Milburn 2009-03-23 18:25:21 EDT
Created attachment 336405 [details]
Patch to delay to wait for transmit bit to clear for ICH9r - AHCI mode only
Comment 50 David Milburn 2009-03-23 18:35:58 EDT
It turns out that the BIOS is resetting the enclosure management message logic,
this is the reason the transmit bit is busy preventing the driver from turning
the LEDs off. The patch in Comment #49 is a work-around for the ahci.c driver,
would please verify that your system boots with the LEDs in the proper state,
please note this doesn't have the fix for cat'ing the /sys/....em_message values,
I have to treat them as seperate patches since they are different issues.

For this test I have added a printk to show how many delays were needed, would
you please include that from the dmesg output?
Comment 51 David Milburn 2009-03-24 10:34:28 EDT
I intended for the udelay(10) to actually be udelay(100) in ahci_start_port(),
please let me know if you need an updated patch; otherwise, please make the
change after applying the patch in Comment #49. Thanks.
Comment 57 Tuomo Soini 2009-03-24 16:35:22 EDT
Created attachment 336535 [details]
Dmesg using ich9r workaround patch

Tested patch was modified with s/10/100/ change.
Comment 58 Tuomo Soini 2009-03-24 16:37:52 EDT
Leds worked correctly with this patch.
Comment 59 Gregory Fowler 2009-03-25 19:17:43 EDT
Hi
i grabbed patch 7 from the Bugzilla and this seems to fix the issue.
The addition of the new function ahci_led_init and the call to it from the port init fuction seem to clear up the issue for us.

We run on ICH10 and had the same issue with port 0 LEds OK and port 1-3 FAULT and LOCATE LEDS on.

We were installing  sgpio  package 
and using it to turn the lights off.

[root@ban25c13bl8 ~]# yum install sgpio
[root@ban25c13bl8 ~]# sgpio -p 1 -s off
[root@ban25c13bl8 ~]# sgpio -p 2 -s off
[root@ban25c13bl8 ~]# sgpio -p 3 -s off


I have added myself to CC list.  
Please let me know if I can be of help with debug on ICH10 controller.
Comment 60 David Milburn 2009-03-26 08:40:48 EDT
Gregory,

Would please attach your "lspci -xxvvv"? I need to add your device ID to the
official patch which will be more like Comment #49? Thanks.
Comment 61 Gregory Fowler 2009-03-26 08:47:11 EDT
Created attachment 336790 [details]
ICH10  SATA lspci -xxvvv
Comment 62 Gregory Fowler 2009-03-26 08:51:00 EDT
Hi
Unreleased system and this is an open bugzilla.
For more info on my particular config please see private issue tracker
https://enterprise.redhat.com/issue-tracker/?module=issues&action=view&tid=273228

There you can get complete lspci output and anything else you require.
Comment 63 Gregory Fowler 2009-03-26 08:55:00 EDT
I tried patch from comment 49 and it did not work for me.

Let me re-patch and reboot and report back just to be sure, since you say the
official patch will look like that.

R u dropping the ahci_led_init function and it's call?
Comment 64 David Milburn 2009-03-26 10:03:34 EDT
Created attachment 336802 [details]
Patch to delay to wait for transmit bit to clear for ICH9r and ICH10 - AHCI mode only
Comment 65 David Milburn 2009-03-26 10:09:58 EDT
Hi, the patch in Comment #49 was a work-around for the ICH9r in ahci mode,
I have added your device ID to ahci_ems_bios_workaround(), the patch in Comment
#64 should work on your platform. I still have 1 printk to see in ahci_start_port
to see how many retries it takes for the ems transmit bit to clear, the driver
is already trying to turn off the LEDs so we don't need ahci_led_init.

Please attach your dmesg output after applying the patch and rebooting and
hopefully your LEDs will be in a good state.
Comment 66 Gregory Fowler 2009-03-26 16:04:06 EDT
Created attachment 336882 [details]
dmesg from patch 8
Comment 67 David Milburn 2009-03-26 16:22:07 EDT
Hi,

I think you must still be using older code, I don't have any of the printks
in the patch in Comment #64, would you please double-check?
Comment 68 Gregory Fowler 2009-03-26 18:13:22 EDT
Created attachment 336921 [details]
Real dmesg  for patch 8

ooops>  :-( 

Here is the real deal.
Comment 69 David Milburn 2009-03-26 18:24:42 EDT
Thanks, so the LEDs were ok when you booted, right?
Comment 70 Gregory Fowler 2009-03-26 18:42:51 EDT
Yes the LEDs were OK when booted.

Thanks
Comment 73 Ed Ciechanowski 2009-03-31 19:26:19 EDT
Tested on RHEL 5.3 x86_64 on a ICH10R platform. 

Tested Patch in Comment #64. Patch applied and complied no problems. Upon reboot the LED's are off. SGPIO application can still controll all LED's on all ports.

Can we make this patch not Specific to ICH9R and ICH10R?


Our next platform will have the same issues, I believe.? (testing now)
Thanks for the fix!
Comment 74 Ed Ciechanowski 2009-03-31 19:44:54 EDT
Tested on RHEL 5.3 x86_64 on a ICH10R platform. 

Applied RPM from the David Milburn's site. This RPM applied correctly and on next reboot LED are in the off state. I am able to controll the LED's with SGPIO application in the OS. 

This is a very nice RPM that creates another boot menu option! Thanks again!

*EDC*
Comment 75 David Milburn 2009-04-01 10:26:14 EDT
Created attachment 337538 [details]
Patch to delay to wait for transmit bit to clear when in AHCI mode.

Ed, thanks for testing.

I have modified the patch to look for device IDs greater than 0x2821 since
ICH8 supports enclosure management, this should should handle all cases
when the controller is configured in AHCI mode.

I am currently building test rpms, I will make them available as soon as they
are ready, we will need to test one more time.
Comment 76 David Milburn 2009-04-01 14:29:24 EDT
Ed, Gregory,

If someone could please test kernel-2.6.18-137.el5.bz488471.2

http://people.redhat.com/dmilburn/

Once we have confirmed test results I will send the patch upstream. Thanks.
Comment 77 Gregory Fowler 2009-04-01 15:32:28 EDT
Hi David
I have just booted your kernel and the Drive LED status is good.
No Fault or Locate LEDs were lit.

Looks good.

Thanks
Comment 78 James Puthukattukaran 2009-04-02 12:07:12 EDT
David,

I was just perusing the patch you proposed and, as Greg mentioned, it works for us. I just wanted to make a couple of comments

1. In the following code section 

+		pci_read_config_byte(pdev, SCC_REG, &tmp);
+		
+		if (tmp & AHCI_MODE)
+			return 1;
+		else 
+			return 0;
+	}
+	

Should the test be more strict? According to the Serial ATA AHCI 1.3 spec (http://download.intel.com/technology/serialata/pdf/rev1_3.pdf), page 8 (section 2.1.5), it says that you should also check the PI byte (offset 0xb) and check if that is 1 to be sure you are in AHCI mode. i.e

                pci_read_config_byte(pdev, SCC_REG, &tmp);
                pci_read_config_byte(pdev, PI_REG, &tmp2);
                if ((tmp & AHCI_MODE) && (tmp2 & 0x1))
                      return 1;
                else
                      return 0;

2. Since you have defined SCC_REG, should you use that in ahci_print_info for the following line in that function?

                    pci_read_config_word(pdev, 0x0a, &cc);
Use SCC_REG instead of 0x0a.

thanks
James
Comment 79 David Milburn 2009-04-02 12:30:51 EDT
James,

I think we may change the patch to not use EM when the controller is configured
in AHCI mode, I will look over the document again and see if we need the
stricter check.

Thanks,
David
Comment 80 David Milburn 2009-04-02 19:01:19 EDT
Created attachment 337951 [details]
Patch to delay to wait for transmit bit to clear regardless of device id.
Comment 81 David Milburn 2009-04-02 19:18:59 EDT
Ed,

I discussed furthur with Kristen, this patch will be much simplier not keying
off any particular Device ID, and in most cases the transmit will succeed
first time. I am currently building test rpms with the patch in Comment #80,
feel free to test the patch in the mean time.

James, I looked over the doc and the ICH9 docs, actually the PI (Programming
Interface Register) would be at offset = 0x9. And you are right it is
programmed differently based upon what value is in the Sub Class Code Reg (0xa).
And if the Sub Class Code Reg is 0x6 then the PI register would be 1 to indicate
the AHCI revision. I think checking the SCC_REG would have been sufficient,
but, hopefully the simplified patch in Comment #80 will be accepted upstream.

Thanks,
David
Comment 82 David Milburn 2009-04-03 09:34:17 EDT
Ed, Gregory,

Please test kernel-2.6.18-137.el5.bz488471.3, once I get positive confirmation
from both of you I will submit the patch in Comment #80 upstream.

http://people.redhat.com/dmilburn/

Thanks,
David
Comment 83 Ed Ciechanowski 2009-04-03 13:10:11 EDT
Tested on RHEL 5.3 x86_64 on a ICH10R platform. 

Applied RPM kernel-2.6.18-137.el5.bz488471.3 from the David Milburn's site. 

This RPM applied correctly and on next reboot LED are in the off state. I am able to controll the LED's with SGPIO application in the OS. 

I believe this patch/RPM is Good to go upstream! 

Thank you for all the work on this!

I think not keying off any particular Device ID is the perfect solution.  Thanks again!
*EDC*

I will continue to regression test platforms with different Chipsets (ICH8R,9R,10R, ibexpeak) and in RAID mode too. This will take a week or 2 as resources are slim right now.
Comment 84 Ed Ciechanowski 2009-04-03 13:45:38 EDT
Also, Verified ICH10R platform in RAID mode works correctly. All LED are in the off state on OS boot and can be controller from with in the OS via FS and SGPIO application. *EDC*
Comment 85 Gregory Fowler 2009-04-03 19:16:15 EDT
Hello
I have verified your latest test kernel.
Everything looks good on our system.

Thanks
Comment 87 RHEL Product and Program Management 2009-04-07 14:59:08 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 88 Don Zickus 2009-04-20 13:12:07 EDT
in kernel-2.6.18-140.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.
Comment 93 Evan McNabb 2009-06-26 11:23:48 EDT
Intel, Sun,

Can you please test this on one of our 5.4 builds?

http://people.redhat.com/dzickus/el5/
Comment 94 Chris Ward 2009-07-03 14:26:17 EDT
~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value.

Questions can be posted to this bug or your customer or partner representative.
Comment 95 Gregory Fowler 2009-07-06 17:58:59 EDT
(In reply to comment #93)
> Intel, Sun,
> 
> Can you please test this on one of our 5.4 builds?
> 
> http://people.redhat.com/dzickus/el5/  

I have tested 5.4 beta .
It's all good.

Verified fixed.
Comment 97 errata-xmlrpc 2009-09-02 04:05:38 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html

Note You need to log in before you can comment on or make changes to this bug.