Bug 182687 - lm_sensors fails with piix4_smbus errors on ServerWorks Grand Champion SL/w83781d
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i686 Linux
Priority: medium
Severity: medium
Assigned To: David Milburn
QA Contact: Martin Jenner
Blocks: 461304
Reported: 2006-02-23 20:26 EST by Mark T. Voelker
Modified: 2009-06-18 10:48 EDT

Doc Type: Bug Fix
Last Closed: 2009-05-18 15:29:04 EDT


Attachments
Dmesg from kernel-2.6.9-42.7.EL.TEST.bz182687. (73.74 KB, text/plain)
2008-04-23 14:56 EDT, Mark T. Voelker
dmesg from kernel 2.6.9-42.7.EL.TEST.bz182687.2 (133.11 KB, text/plain)
2008-04-25 08:04 EDT, Mark T. Voelker
Requested output of lspci -nnv (2.36 KB, text/plain)
2008-05-04 21:35 EDT, Mark T. Voelker
Untested patch (1.21 KB, patch)
2008-05-05 12:43 EDT, David Milburn
Improved patch (1.46 KB, patch)
2008-05-05 17:34 EDT, Jean Delvare
Inc delay for csb5 and remove fix_hstcfg (2.77 KB, patch)
2008-05-06 11:42 EDT, David Milburn

Description Mark T. Voelker 2006-02-23 20:26:48 EST
Description of problem:
I recently inherited a machine with a Tyan Trinity GC-SL (s2707) motherboard,
which uses Broadcom's ServerWorks Grand Champion SL chipset and a Winbond
W83782D monitoring ASIC (motherboard specs at:
http://www.tyan.com/products/html/trinitygcsl.html).  I ran sensors-detect, set
up modprobe.conf, and inserted the appropriate drivers.  However when I run
"sensors -s", I get:

[bash ~]# sensors -s
No sensors found!
[bash ~]#

I also tried using a sensors.conf file found on Tyan's support site (which
indicates to me that lm_sensors must have worked with this board at some point).
It can be found here:

http://tyan.com/support/html/software_utilities.html

It was meant for lm_sensors v2.7.0 but I figured it was worth a shot. 
Unfortunately I got the same results.  

When I manually inserted the drivers, I noticed the following errors in the syslog:

Feb 23 19:29:10 mybox kernel: mtrr: type mismatch for fc000000,800000 old:
uncachable new: write-combining
Feb 23 19:56:05 mybox kernel: piix4_smbus 0000:00:0f.0: Found 0000:00:0f.0 device
Feb 23 19:56:05 mybox kernel: piix4_smbus 0000:00:0f.0: Found 0000:00:0f.0 device
Feb 23 19:56:05 mybox kernel: piix4_smbus 0000:00:0f.0: Unusual config register
value
Feb 23 19:56:05 mybox kernel: piix4_smbus 0000:00:0f.0: Unusual config register
value
Feb 23 19:56:05 mybox kernel: piix4_smbus 0000:00:0f.0: Try using fix_hstcfg=1
if you experience problems
Feb 23 19:56:05 mybox kernel: piix4_smbus 0000:00:0f.0: Try using fix_hstcfg=1
if you experience problems
Feb 23 19:56:05 mybox kernel: piix4_smbus 0000:00:0f.0: Illegal Interrupt
configuration (or code out of date)!
Feb 23 19:56:05 mybox: piix4_smbus 0000:00:0f.0: Illegal Interrupt configuration
(or code out of date)!
Feb 23 19:56:14 mybox kernel: i2c /dev entries driver
Feb 23 19:56:14 mybox kernel: i2c /dev entries driver
Feb 23 19:56:18 mybox kernel: i2c_adapter i2c-0: SMBus Timeout!
Feb 23 19:56:18 mybox kernel: i2c_adapter i2c-0: SMBus Timeout!
Feb 23 19:56:18 mybox kernel: i2c_adapter i2c-0: Failed reset at end of
transaction (01)
Feb 23 19:56:18 mybox kernel: i2c_adapter i2c-0: Failed reset at end of
transaction (01)
Feb 23 19:56:18 mybox kernel: i2c_adapter i2c-0: Failed! (01)
Feb 23 19:56:18 mybox kernel: i2c_adapter i2c-0: Failed! (01)
Feb 23 19:58:58 mybox last message repeated 106 times

When I boot the box up, I see this:

Starting lm_sensors:  piix4_smbus 0000:00:0f.0: Illegal Interrupt
configuration (or code out of date)!

Having seen the syslog message, I tried adding "options i2c_piix4 fix_hstcfg=1"
to my /etc/modprobe.conf file.  Still no dice:

Feb 23 20:13:01 mybox kernel: piix4_smbus 0000:00:0f.0: Found 0000:00:0f.0 device
Feb 23 20:11:29 mybox kernel: last message repeated 7 times
Feb 23 20:13:01 mybox kernel: piix4_smbus 0000:00:0f.0: Found 0000:00:0f.0 device
Feb 23 20:13:01 mybox kernel: i2c_adapter i2c-0: Failed! (01)
Feb 23 20:13:01 mybox kernel: i2c_adapter i2c-0: Failed! (01)



Version-Release number of selected component (if applicable):
lm_sensors-2.8.7-2.40.3

How reproducible:
always

Steps to Reproduce:
1.  Run sensors-detect, going with default options the whole way through.
2.  Add a line reading "alias char-major-89 i2c-dev" to /etc/modprobe.conf.
3.  modprobe i2c-piix4 && modprobe w83781d
4.  Run "sensors -s".
5.  Add a line reading "options i2c_piix4 fix_hstcfg=1" to /etc/modprobe.conf.
6.  Remove all relevant modules with "rmmod w83781d eeprom i2c_piix4 i2c_dev
i2c_sensor i2c_core", then reinsert them with "modprobe i2c-piix4 && modprobe
w83781d".
7.  Run "sensors -s".
8.  Download and install Tyan's sensors.conf file:
ftp://ftp.tyan.com/software/lms/lms_s2707.tgz
9.  Since the file suggests doing so, add "options w83781d init=0" to
/etc/modprobe.conf.
10.  Remove all relevant modules with "rmmod w83781d eeprom i2c_piix4 i2c_dev
i2c_sensor i2c_core", then reinsert them with "modprobe i2c-piix4 && modprobe
w83781d".
11.  Run "sensors -s".
  
Actual results:
Errors in syslog and "sensors -s" fails to find any sensors.

Expected results:
No errors in syslog, sensors found.


Additional info:

[bash ~]# uname -a
Linux rtp-wbu-sr-backup1 2.6.9-22.0.2.EL #1 Thu Jan 5 17:03:45 EST 2006 i686
i686 i386 GNU/Linux
[bash ~]# cat /etc/redhat-release
Red Hat Enterprise Linux AS release 4 (Nahant Update 2)
[bash ~]#


Motherboard info: http://www.tyan.com/products/html/trinitygcsl.html
Broadcom GC SL chipset info:
http://www.broadcom.com/products/Enterprise-Small-Office/SystemI-O-Chips/GC-SL
Winbond W83782D info: http://www.winbond.com/PDF/sheet/w83782d.pdf
Comment 1 Phil Knirsch 2008-04-17 11:41:25 EDT
That sounds a lot more like kernel module problems though as they seem to fail
to initialize properly.

Have you tried this with the latest RHEL-4 update kernel?

Thanks,

Read ya, Phil
Comment 2 Mark T. Voelker 2008-04-17 13:16:33 EDT
The machine I discovered this on is no longer in active service (it's been over
two years), but it is still hanging around...I got it up and running with
2.6.9-22.0.2.EL first to make sure I could still reproduce the symptoms.  I can,
except that now I'm seeing the box reliably crash (three times in a row) when
i2c-piix4 is inserted using the fix_hstcfg=1 option.  No errors in the syslog or
on the console, just hangs until power cycled.  I agree that this probably looks
kernel-related.

I then upgraded to U4 with lm_sensors-2.8.7-2.40.3 and kernel 2.6.9-42.7.EL (the
latest in deployment at my organization).  I can now reinsert the i2c-piix4
module with fix_hstcfg=1 without crashing, but otherwise the symptoms are the same.
Comment 3 Phil Knirsch 2008-04-18 05:44:32 EDT
Reassigning to the kernel team then.

Read ya, Phil
Comment 4 David Milburn 2008-04-23 14:13:51 EDT
Mark,

Would it be possible to reproduce the original problem on
kernel-2.6.9-42.7.EL.TEST.bz182687.1.i686.rpm and then supply the dmesg output?

http://people.redhat.com/dmilburn/

I noticed that the upstream driver has removed the "fix_hstcfg" parameter. I did
the same in this test kernel, so you can remove that from your modprobe.conf. I
turned on some debugging in the i2c-piix4.c driver to dump out some of the
SMBus registers to see if we can get a better idea of why you are seeing the
"SMBus Timeout!" messages.

Thanks,
David
 
Comment 5 Mark T. Voelker 2008-04-23 14:56:51 EDT
Created attachment 303535 [details]
Dmesg from kernel-2.6.9-42.7.EL.TEST.bz182687.

Here's the output from dmesg when using kernel-2.6.9-42.7.EL.TEST.bz182687.
Comment 6 Mark T. Voelker 2008-04-23 15:00:25 EDT
Of particular interest in the above attachment will probably be the bus collision:

i2c_adapter i2c-0: Transaction (post): CNT=08, CMD=02, ADD=91, DAT0=4b, DAT1=00
i2c_adapter i2c-0: Transaction (pre): CNT=08, CMD=16, ADD=91, DAT0=4b, DAT1=00
i2c_adapter i2c-0: temp 09, timeout 501 MAX_TIMEOUT 500
i2c_adapter i2c-0: SMBus Timeout!
i2c_adapter i2c-0: Bus collision! SMBus may be locked until next hard reset.
(sorry!)
i2c_adapter i2c-0: Failed reset at end of transaction (01)
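
For readers following the register dump, here is a minimal Python sketch of how one of the "Transaction" debug lines above can be picked apart. It assumes the fields are hexadecimal and that ADD holds the 7-bit slave address shifted left by one with the read/write flag in bit 0; the function names are hypothetical and not part of the driver.

```python
import re

# One debug line copied from the excerpt above.
LINE = ("i2c_adapter i2c-0: Transaction (pre): "
        "CNT=08, CMD=16, ADD=91, DAT0=4b, DAT1=00")


def parse_transaction(line):
    """Return the NAME=hex fields of a 'Transaction' debug line as integers."""
    fields = re.findall(r"(\w+)=([0-9a-fA-F]+)", line)
    return {name: int(value, 16) for name, value in fields}


def split_address(add):
    """Split the address byte into (7-bit slave address, is_read).

    Assumption: bit 0 is the read/write flag and bits 7..1 are the address.
    """
    return add >> 1, bool(add & 0x01)


regs = parse_transaction(LINE)
addr, is_read = split_address(regs["ADD"])
print(f"slave address 0x{addr:02x}, {'read' if is_read else 'write'}")
```

Under those assumptions, ADD=91 corresponds to a read from the device at 7-bit address 0x48.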

Comment 7 David Milburn 2008-04-24 16:32:35 EDT
Looking through the driver code and the PIIX4 SMBus docs,

temp 09 is the value read from the SMBus Host status register, showing
bit 0 and bit 3 are set.

Bit 0 is read-only and if it is set then it should indicate that the host
controller is in the process of completing a command.
Bit 3 is set by the hardware and does indicate there was a transaction collision.

The driver is successfully clearing Bit 3, but Bit 0 remains set, and the
driver is unable to send any more commands since the host controller is reporting
that it is busy. Each time the driver tries to submit another command,
it checks the status register to make sure that the host is not busy
before hitting the start bit in the SMBus Host controller register. I did
see in the PIIX4 errata that there may be a delay between the time the start
bit is set and the time the host busy bit is set by the controller. The driver
currently has a delay between the time the start bit is set and the time that it
first polls the host busy bit. The only thing I can think of is that in this case
the delay isn't long enough, meaning the driver hits the start bit and sees
in the status register that the host isn't busy, but really the transaction
hasn't been started yet. To rule this out I have increased the time delay
in kernel-2.6.9-42.7.EL.TEST.bz182687.2.i686.rpm. Would you please repeat
the test and supply the dmesg output again? Thanks.

http://people.redhat.com/dmilburn/
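
The status decoding described in this comment can be sketched in a few lines of Python. Bits 0 and 3 are as stated above; the meanings given for bits 1, 2 and 4 are an assumption based on the usual PIIX4 register layout, and the names are illustrative rather than the driver's own identifiers.

```python
# SMBus host status bits. Bits 0 and 3 come from the comment above; the
# other entries are an assumption taken from the usual PIIX4 layout.
STATUS_BITS = {
    0x01: "host busy",
    0x02: "interrupt",
    0x04: "device error",
    0x08: "bus collision",
    0x10: "failed",
}


def decode_status(value):
    """List the names of the status bits set in value, lowest bit first."""
    return [name for bit, name in sorted(STATUS_BITS.items()) if value & bit]


def can_start(value):
    """A new command may only be started when the host busy bit is clear."""
    return not (value & 0x01)


logged = 0x09                    # "temp 09" from the debug output
after_clear = logged & ~0x08     # collision bit cleared, busy bit still set

print(decode_status(logged))
print(decode_status(after_clear), can_start(after_clear))
```

The remaining value 0x01 is consistent with the "(01)" printed in the "Failed reset at end of transaction" message.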
Comment 8 Mark T. Voelker 2008-04-25 08:04:43 EDT
Created attachment 303775 [details]
dmesg from kernel 2.6.9-42.7.EL.TEST.bz182687.2 

Here's the new output.	No bus collision this time.  And, things seem to be
working:

[root@loafersglory ~]# modprobe i2c-piix4;modprobe w83781d;modprobe eeprom


[root@loafersglory ~]# 
[root@loafersglory ~]# 
[root@loafersglory ~]# sensors -s
[root@loafersglory ~]# sensors
eeprom-i2c-0-52
Adapter: SMBus PIIX4 adapter at 0580
Memory type:		DDR SDRAM DIMM
Memory size (MB):	512

eeprom-i2c-0-50
Adapter: SMBus PIIX4 adapter at 0580
Memory type:		DDR SDRAM DIMM
Memory size (MB):	512

w83782d-i2c-0-28
Adapter: SMBus PIIX4 adapter at 0580
VCore 1:   +1.50 V  (min =  +0.00 V, max =  +0.00 V)	   ALARM  
2.5V:	   +2.50 V  (min =  +0.00 V, max =  +0.00 V)	   ALARM  
3.3V:	   +3.30 V  (min =  +3.14 V, max =  +3.46 V)		  
+5 V:	   +5.03 V  (min =  +4.73 V, max =  +5.24 V)	   ALARM  
+12V:	  +12.04 V  (min = +10.82 V, max = +13.19 V)	   ALARM  
-12V:	  -11.87 V  (min = -13.18 V, max = -10.88 V)	   ALARM  
-5 V:	   +3.54 V  (min =  -5.25 V, max =  -4.75 V)	   ALARM  
V5SB:	   +5.00 V  (min =  +4.73 V, max =  +5.24 V)	   ALARM  
VBat:	   +0.00 V  (min =  +2.40 V, max =  +3.60 V)	   ALARM  
CPU fan1:    0 RPM  (min = 21093 RPM, div = 2)		    ALARM  
chs fan2:    0 RPM  (min = 33750 RPM, div = 2)		    ALARM  
chs fan3:    0 RPM  (min = 21093 RPM, div = 2)		    ALARM  
sys1temp:    +43°C  (high =   +64°C, hyst =	+66°C)   sensor = 3904
transistor   ALARM   
sys2temp:  +45.0°C  (high =   +80°C, hyst =	+75°C)   sensor = 3904
transistor	     
CPU temp:  +15.5°C  (high =   +80°C, hyst =	+75°C)   sensor = PII/Celeron
diode		
ERROR: Can't get VID data!
alarms:   Chassis intrusion detection			   ALARM
beep_enable:
	  Sound alarm disabled

[root@loafersglory ~]#
Comment 9 David Milburn 2008-04-28 15:12:42 EDT
I have turned off the debugging statements and need to verify the minimal
delay needed between hitting the start bit and checking the host busy bit.
I have actually built 4 kernels, but I think kernel-2.6.9-42.7.EL.TEST.bz182687.3
will work. If it doesn't, would you please continue testing the other kernels
(.4, etc.) until you reach one that does work? I appreciate your time; this should
be the last test, assuming we can get the driver maintainer to accept the change.
Thanks again.

http://people.redhat.com/dmilburn/
Comment 10 Mark T. Voelker 2008-04-28 16:36:00 EDT
.3 does the trick.
Comment 11 Jean Delvare 2008-05-03 01:44:51 EDT
Mark, can you please attach the output of lspci -nnv?
Comment 12 Mark T. Voelker 2008-05-04 21:35:23 EDT
Created attachment 304504 [details]
Requested output of lspci -nnv 

Here's the requested output of lspci -nnv, run on pciutils-2.1.99.test8-3.2
with kernel 2.6.9-42.7.EL.TEST.bz182687.3
Comment 13 Jean Delvare 2008-05-05 05:03:25 EDT
Thanks Mark.

> 00:0f.0 Class 0601: 1166:0201

That's a CSB5 south bridge, as I expected. For the record, I couldn't reproduce
the problem on my OSB4 (which doesn't mean it's not there though, as we don't
really know what it takes for the failure to happen).
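
As an illustration of how the quoted lspci line identifies the south bridge, here is a small Python sketch. The lookup table is hypothetical and holds only the two ServerWorks parts mentioned in this thread; the OSB4 device ID is given from memory and should be treated as an assumption.

```python
import re

# Hypothetical lookup table for this thread only. 1166:0201 is the ID quoted
# above; the OSB4 entry (1166:0200) is an assumption.
KNOWN_BRIDGES = {
    (0x1166, 0x0200): "ServerWorks OSB4",
    (0x1166, 0x0201): "ServerWorks CSB5",
}


def parse_pci_id(line):
    """Extract (vendor, device) from a 'vvvv:dddd' pair in an lspci -n line."""
    match = re.search(r"\b([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\b", line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


pci_id = parse_pci_id("00:0f.0 Class 0601: 1166:0201")
print(KNOWN_BRIDGES.get(pci_id, "unknown device"))
```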
Comment 14 David Milburn 2008-05-05 12:43:57 EDT
Created attachment 304537 [details]
Untested patch

Jean, would this be a better approach? Thanks.
Comment 15 Jean Delvare 2008-05-05 17:32:28 EDT
David, your patch would have a very negative impact on performance for the CSB5.
Two consecutive msleep() can't sleep for less than 3 jiffies, which is 30 ms at
HZ=100. At HZ=1000 you'd still be waiting for 4 ms while 2 ms appear to be enough.

Given that the driver currently works fine at HZ<=250, the code change should
have no impact for these values of HZ. The key is to only have one initial
msleep. I'll attach a modified patch which should do the trick and works OK on
my PIIX4.
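
The arithmetic behind these numbers can be sketched with a simplified model. It assumes that msleep(ms) waits for the millisecond count rounded up to whole jiffies plus one extra tick, and that the untested patch issued two msleep(1) calls back to back; neither assumption is confirmed by this report, and the function names are hypothetical.

```python
def msecs_to_jiffies(ms, hz):
    """Round a millisecond count up to whole timer ticks (jiffies)."""
    return (ms * hz + 999) // 1000


def msleep_ticks(ms, hz):
    """Ticks one msleep(ms) waits for in this model: rounded jiffies plus one."""
    return msecs_to_jiffies(ms, hz) + 1


def consecutive_sleep_bounds_ms(sleeps_ms, hz):
    """Return (min, max) wall-clock milliseconds for back-to-back sleeps.

    The first call may start just before a tick, so its first tick can be
    nearly free; each later call starts on a tick and pays its full count.
    """
    max_ticks = sum(msleep_ticks(ms, hz) for ms in sleeps_ms)
    min_ticks = max_ticks - 1
    tick_ms = 1000 / hz
    return min_ticks * tick_ms, max_ticks * tick_ms


for hz in (100, 250, 1000):
    shortest, longest = consecutive_sleep_bounds_ms([1, 1], hz)
    print(f"HZ={hz}: between {shortest:g} ms and {longest:g} ms")
```

In this model two consecutive msleep(1) calls never finish in under 3 jiffies (30 ms at HZ=100) and can take up to 4 ms at HZ=1000, which is in line with the figures quoted above.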
Comment 16 Jean Delvare 2008-05-05 17:34:06 EDT
Created attachment 304564 [details]
Improved patch
Comment 17 David Milburn 2008-05-05 17:48:55 EDT
Thank you Jean, I will build a test kernel with the patch in Comment #16
Comment 18 David Milburn 2008-05-06 11:42:16 EDT
Created attachment 304646 [details]
Inc delay for csb5 and remove fix_hstcfg

Jean,

Actually for RHEL4, I will need to use the patch in Comment #16 and add the 
upstream commit since we saw the messages initially and it is ok for bit 1
to be set when reading the SMBUS host cfg register.

commit 54aaa1ca1022d95d854315743241bb6bf59f531f
Author: Rudolf Marek <r.marek@sh.cvut.cz>
Date:	Tue Apr 25 13:06:41 2006 +0200

    [PATCH] I2C: i2c-piix4: Remove the fix_hstcfg parameter

I am currently building some RHEL4 test kernels with this patch.
Comment 19 David Milburn 2008-05-06 14:47:12 EDT
Mark,

Would you please test kernel-2.6.9-42.7.EL.TEST.bz182687.4.i686.rpm?

http://people.redhat.com/dmilburn/

Comment 20 Mark T. Voelker 2008-05-07 06:00:54 EDT
David, this kernel works fine.
Comment 21 Jean Delvare 2008-05-07 08:07:03 EDT
Patch added to my i2c tree, I will send it to Linus for kernel 2.6.26-rc2.
Comment 24 RHEL Product and Program Management 2008-05-28 19:06:58 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 25 RHEL Product and Program Management 2008-09-03 09:10:07 EDT
Updating PM score.
Comment 26 Vivek Goyal 2008-09-25 09:17:19 EDT
Committed in 78.11.EL. RPMS are available at http://people.redhat.com/vgoyal/rhel4/
Comment 28 Chris Ward 2009-03-27 10:18:23 EDT
~~ Attention Partners! Snap 1 Released ~~
RHEL 4.8 Snapshot 1 has been released on partners.redhat.com. There should
be a fix present, which addresses this bug. NOTE: there is only a short time
left to test, please test and report back results on this bug
at your earliest convenience.

If you encounter any issues, please set the bug back to the ASSIGNED state and
describe the issues you encountered. If you have found a NEW bug, clone this
bug and describe the issues you encountered. Further questions can be
directed to your Red Hat Partner Manager.

If you have VERIFIED the bug fix, please select your PartnerID from the
Verified field above. Please leave a comment with your test result details.
Include which arches were tested, the package version, and any applicable logs.

 - Red Hat QE Partner Management
Comment 29 Chris Ward 2009-04-16 11:50:10 EDT
Confirmed that the patch verified by Cisco and customer linux-fr.org is included in the latest 4.8 kernel (-88.EL).
Comment 31 errata-xmlrpc 2009-05-18 15:29:04 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html
