Description of problem:

I recently inherited a machine with a Tyan Trinity GC-SL (s2707) motherboard, which uses Broadcom's ServerWorks Grand Champion SL chipset and a Winbond W83782D monitoring ASIC (motherboard specs at: http://www.tyan.com/products/html/trinitygcsl.html). I ran sensors-detect, set up modprobe.conf, and inserted the appropriate drivers. However, when I run "sensors -s", I get:

[bash ~]# sensors -s
No sensors found!
[bash ~]#

I also tried using a sensors.conf file found on Tyan's support site (which indicates to me that lm_sensors must have worked with this board at some point). It can be found here: http://tyan.com/support/html/software_utilities.html

It was meant for lm_sensors v2.7.0, but I figured it was worth a shot. Unfortunately, I got the same results.

When I manually inserted the drivers, I noticed the following errors in the syslog:

Feb 23 19:29:10 mybox kernel: mtrr: type mismatch for fc000000,800000 old: uncachable new: write-combining
Feb 23 19:56:05 mybox kernel: piix4_smbus 0000:00:0f.0: Found 0000:00:0f.0 device
Feb 23 19:56:05 mybox kernel: piix4_smbus 0000:00:0f.0: Found 0000:00:0f.0 device
Feb 23 19:56:05 mybox kernel: piix4_smbus 0000:00:0f.0: Unusual config register value
Feb 23 19:56:05 mybox kernel: piix4_smbus 0000:00:0f.0: Unusual config register value
Feb 23 19:56:05 mybox kernel: piix4_smbus 0000:00:0f.0: Try using fix_hstcfg=1 if you experience problems
Feb 23 19:56:05 mybox kernel: piix4_smbus 0000:00:0f.0: Try using fix_hstcfg=1 if you experience problems
Feb 23 19:56:05 mybox kernel: piix4_smbus 0000:00:0f.0: Illegal Interrupt configuration (or code out of date)!
Feb 23 19:56:05 mybox: piix4_smbus 0000:00:0f.0: Illegal Interrupt configuration (or code out of date)!
Feb 23 19:56:14 mybox kernel: i2c /dev entries driver
Feb 23 19:56:14 mybox kernel: i2c /dev entries driver
Feb 23 19:56:18 mybox kernel: i2c_adapter i2c-0: SMBus Timeout!
Feb 23 19:56:18 mybox kernel: i2c_adapter i2c-0: SMBus Timeout!
Feb 23 19:56:18 mybox kernel: i2c_adapter i2c-0: Failed reset at end of transaction (01)
Feb 23 19:56:18 mybox kernel: i2c_adapter i2c-0: Failed reset at end of transaction (01)
Feb 23 19:56:18 mybox kernel: i2c_adapter i2c-0: Failed! (01)
Feb 23 19:56:18 mybox kernel: i2c_adapter i2c-0: Failed! (01)
Feb 23 19:58:58 mybox last message repeated 106 times

When I boot the box up, I see this:

Starting lm_sensors: piix4_smbus 0000:00:0f.0: Illegal Interrupt configuration (or code out of date)!

Having seen the syslog message, I tried adding "options i2c_piix4 fix_hstcfg=1" to my /etc/modprobe.conf file. Still no dice:

Feb 23 20:13:01 mybox kernel: piix4_smbus 0000:00:0f.0: Found 0000:00:0f.0 device
Feb 23 20:11:29 mybox kernel: last message repeated 7 times
Feb 23 20:13:01 mybox kernel: piix4_smbus 0000:00:0f.0: Found 0000:00:0f.0 device
Feb 23 20:13:01 mybox kernel: i2c_adapter i2c-0: Failed! (01)
Feb 23 20:13:01 mybox kernel: i2c_adapter i2c-0: Failed! (01)

Version-Release number of selected component (if applicable):
lm_sensors-2.8.7-2.40.3

How reproducible:
always

Steps to Reproduce:
1. Run sensors-detect, going with default options the whole way through.
2. Add a line reading "alias char-major-89 i2c-dev" to /etc/modprobe.conf.
3. modprobe i2c-piix4 && modprobe w83781d
4. Run "sensors -s".
5. Add a line reading "options i2c_piix4 fix_hstcfg=1" to /etc/modprobe.conf.
6. Remove all relevant modules with "rmmod w83781d eeprom i2c_piix4 i2c_dev i2c_sensor i2c_core", then reinsert them with "modprobe i2c-piix4 && modprobe w83781d".
7. Run "sensors -s".
8. Download and install Tyan's sensors.conf file: ftp://ftp.tyan.com/software/lms/lms_s2707.tgz
9. Since the file suggests doing so, add "options w83781d init=0" to /etc/modprobe.conf.
10. Remove all relevant modules with "rmmod w83781d eeprom i2c_piix4 i2c_dev i2c_sensor i2c_core", then reinsert them with "modprobe i2c-piix4 && modprobe w83781d".
11. Run "sensors -s".

Actual results:
Errors in syslog and "sensors -s" fails to find any sensors.

Expected results:
No errors in syslog, sensors found.

Additional info:

[bash ~]# uname -a
Linux rtp-wbu-sr-backup1 2.6.9-22.0.2.EL #1 Thu Jan 5 17:03:45 EST 2006 i686 i686 i386 GNU/Linux
[bash ~]# cat /etc/redhat-release
Red Hat Enterprise Linux AS release 4 (Nahant Update 2)
[bash ~]#

Motherboard info: http://www.tyan.com/products/html/trinitygcsl.html
Broadcom GC SL chipset info: http://www.broadcom.com/products/Enterprise-Small-Office/SystemI-O-Chips/GC-SL
Winbond W83782D info: http://www.winbond.com/PDF/sheet/w83782d.pdf
That sounds a lot more like kernel module problems though as they seem to fail to initialize properly. Have you tried this with the latest RHEL-4 update kernel lately? Thanks, Read ya, Phil
The machine I discovered this on is no longer in active service (it's been over two years), but it is still hanging around...I got it up and running with 2.6.9-22.0.2.EL first to make sure I could still reproduce the symptoms. I can, except that now I'm seeing the box reliably crash (three times in a row) when i2c-piix4 is inserted using the fix_hstcfg=1 option. No errors in the syslog or on the console, just hangs until power cycled. I agree that this probably looks kernel-related. I then upgraded to U4 with lm_sensors-2.8.7-2.40.3 and kernel 2.6.9-42.7.EL (the latest in deployment at my organization). I can now reinsert the i2c-piix4 module with fix_hstcfg=1 without crashing, but otherwise the symptoms are the same.
Reassigning to the kernel team then. Read ya, Phil
Mark, would it be possible to reproduce the original problem on kernel-2.6.9-42.7.EL.TEST.bz182687.1.i686.rpm and then supply the dmesg output? http://people.redhat.com/dmilburn/ I noticed that the upstream driver has removed the "fix_hstcfg" parameter. I did the same in this test kernel, so you can remove that option from your modprobe.conf. I also turned on some debugging in the i2c-piix4.c driver to dump out some of the SMBus registers, to get a better idea of why you are seeing the "SMBus Timeout!" messages. Thanks, David
Created attachment 303535 [details] Dmesg from kernel-2.6.9-42.7.EL.TEST.bz182687. Here's the output from dmesg when using kernel-2.6.9-42.7.EL.TEST.bz182687.
Of particular interest in the above attachment will probably be the bus collision:

i2c_adapter i2c-0: Transaction (post): CNT=08, CMD=02, ADD=91, DAT0=4b, DAT1=00
i2c_adapter i2c-0: Transaction (pre): CNT=08, CMD=16, ADD=91, DAT0=4b, DAT1=00
i2c_adapter i2c-0: temp 09, timeout 501 MAX_TIMEOUT 500
i2c_adapter i2c-0: SMBus Timeout!
i2c_adapter i2c-0: Bus collision! SMBus may be locked until next hard reset. (sorry!)
i2c_adapter i2c-0: Failed reset at end of transaction (01)
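For anyone reading the register dump, the ADD field can be unpacked with a short sketch. It assumes that ADD is the raw SMBus address byte in the usual layout (7-bit slave address in bits 7..1, read/write flag in bit 0); the helper name decode_smbus_address is made up for illustration and is not part of the driver.

```python
def decode_smbus_address(add_byte: int) -> tuple[int, str]:
    """Split a raw SMBus address byte into (7-bit slave address, direction).

    Assumption: bit 0 is the read/write flag (1 = read) and the upper
    seven bits are the slave address, as in standard SMBus/I2C framing.
    """
    slave_addr = add_byte >> 1
    direction = "read" if add_byte & 0x01 else "write"
    return slave_addr, direction


# ADD=91 from the dmesg excerpt
addr, direction = decode_smbus_address(0x91)
print(f"slave address 0x{addr:02x}, {direction}")
```

Under that assumption, the transaction that timed out was a read from the device at 7-bit address 0x48.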
Looking through the driver code and the PIIX4 SMBus docs, temp 09 is the value read from the SMBus Host status register, showing that bit 0 and bit 3 are set. Bit 0 is read-only; when it is set, it indicates that the host controller is still in the process of completing a command. Bit 3 is set by the hardware and indicates that there was a transaction collision. The driver successfully clears bit 3, but bit 0 remains set, so the driver is unable to send any more commands because the host controller keeps reporting that it is busy. Each time the driver tries to submit another command, it checks the status register to make sure that the host is not busy before hitting the start bit in the SMBus Host controller register.

I did see in the PIIX4 errata that there may be a delay between the time the start bit is set and the time the host busy bit is set by the controller. The driver currently has a delay between setting the start bit and first polling the host busy bit. The only thing I can think of is that in this case the delay isn't long enough: the driver hits the start bit and sees in the status register that the host isn't busy, but really the transaction hasn't been started yet.

To rule this out, I have increased the time delay in kernel-2.6.9-42.7.EL.TEST.bz182687.2.i686.rpm. Would you please repeat the test and supply the dmesg output again? Thanks. http://people.redhat.com/dmilburn/
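As a rough illustration of the status value discussed above, the following sketch decodes 0x09 into the two bits named in this comment. Only bit 0 (host busy) and bit 3 (bus collision) are named, because those are the only ones described here; the constant and function names are hypothetical and do not come from i2c-piix4.c.

```python
# Bit meanings as described in this comment; other bits are left unnamed.
HOST_BUSY = 0x01      # bit 0: controller still completing a command
BUS_COLLISION = 0x08  # bit 3: transaction collision


def describe_status(value: int) -> list[str]:
    """Return the names of the known status bits set in value."""
    flags = []
    if value & HOST_BUSY:
        flags.append("host busy")
    if value & BUS_COLLISION:
        flags.append("bus collision")
    other = value & ~(HOST_BUSY | BUS_COLLISION) & 0xFF
    if other:
        flags.append(f"other bits 0x{other:02x}")
    return flags


status = 0x09  # "temp 09" from the dmesg output
print(describe_status(status))

# Clearing the collision bit leaves only the busy bit, which is the
# stuck state described above.
print(describe_status(status & ~BUS_COLLISION))
```

The second call shows why the driver cannot proceed: with the collision flag cleared, the value is still 0x01, so every later "is the host idle?" check fails.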
Created attachment 303775 [details]
dmesg from kernel 2.6.9-42.7.EL.TEST.bz182687.2

Here's the new output. No bus collision this time. And, things seem to be working:

[root@loafersglory ~]# modprobe i2c-piix4;modprobe w83781d;modprobe eeprom
[root@loafersglory ~]#
[root@loafersglory ~]#
[root@loafersglory ~]# sensors -s
[root@loafersglory ~]# sensors
eeprom-i2c-0-52
Adapter: SMBus PIIX4 adapter at 0580
Memory type: DDR SDRAM DIMM
Memory size (MB): 512

eeprom-i2c-0-50
Adapter: SMBus PIIX4 adapter at 0580
Memory type: DDR SDRAM DIMM
Memory size (MB): 512

w83782d-i2c-0-28
Adapter: SMBus PIIX4 adapter at 0580
VCore 1: +1.50 V (min = +0.00 V, max = +0.00 V) ALARM
2.5V: +2.50 V (min = +0.00 V, max = +0.00 V) ALARM
3.3V: +3.30 V (min = +3.14 V, max = +3.46 V)
+5 V: +5.03 V (min = +4.73 V, max = +5.24 V) ALARM
+12V: +12.04 V (min = +10.82 V, max = +13.19 V) ALARM
-12V: -11.87 V (min = -13.18 V, max = -10.88 V) ALARM
-5 V: +3.54 V (min = -5.25 V, max = -4.75 V) ALARM
V5SB: +5.00 V (min = +4.73 V, max = +5.24 V) ALARM
VBat: +0.00 V (min = +2.40 V, max = +3.60 V) ALARM
CPU fan1: 0 RPM (min = 21093 RPM, div = 2) ALARM
chs fan2: 0 RPM (min = 33750 RPM, div = 2) ALARM
chs fan3: 0 RPM (min = 21093 RPM, div = 2) ALARM
sys1temp: +43°C (high = +64°C, hyst = +66°C) sensor = 3904 transistor ALARM
sys2temp: +45.0°C (high = +80°C, hyst = +75°C) sensor = 3904 transistor
CPU temp: +15.5°C (high = +80°C, hyst = +75°C) sensor = PII/Celeron diode
ERROR: Can't get VID data!
alarms: Chassis intrusion detection ALARM
beep_enable: Sound alarm disabled
[root@loafersglory ~]#
I have turned off the debugging statements and need to verify the minimal delay needed between hitting the start bit and checking the host busy bit. I have actually built 4 kernels, but I think kernel-2.6.9-42.7.EL.TEST.bz182687.3 will work. If it doesn't, would you please continue testing the other kernels (.4, etc.) until you reach one that does work? I appreciate your time; this should be the last test, assuming we can get the driver maintainer to accept the change. Thanks again. http://people.redhat.com/dmilburn/
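The effect of that delay between hitting the start bit and checking the host busy bit can be sketched with a toy timing model. This is only an illustration of the suspected race, not the driver's code: the function name and all numbers (a 1.5 ms latency before the busy bit appears, a 3 ms transfer) are invented for the example and were not measured on the CSB5.

```python
def poll_until_idle(initial_delay_ms: float,
                    busy_latency_ms: float,
                    transfer_ms: float,
                    poll_ms: float = 1.0,
                    max_polls: int = 500) -> float:
    """Return the simulated time (ms after the start bit is written)
    at which the driver concludes that the controller is idle.

    Toy model: the busy bit reads as set only from busy_latency_ms
    until busy_latency_ms + transfer_ms.
    """
    def busy(t: float) -> bool:
        return busy_latency_ms <= t < busy_latency_ms + transfer_ms

    t = initial_delay_ms          # first look at the busy bit
    polls = 0
    while busy(t) and polls < max_polls:
        t += poll_ms              # sleep, then look again
        polls += 1
    return t


transfer_end = 1.5 + 3.0  # hypothetical: busy bit clears 4.5 ms after start

short = poll_until_idle(initial_delay_ms=1.0, busy_latency_ms=1.5, transfer_ms=3.0)
longer = poll_until_idle(initial_delay_ms=2.0, busy_latency_ms=1.5, transfer_ms=3.0)

print(f"1 ms initial delay: idle seen at {short} ms (premature: {short < transfer_end})")
print(f"2 ms initial delay: idle seen at {longer} ms (premature: {longer < transfer_end})")
```

In this model, an initial delay shorter than the busy-bit latency makes the first poll see "not busy" before the transfer has begun, which matches the hypothesis being tested with these kernels.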
.3 does the trick.
Mark, can you please attach the output of lspci -nnv?
Created attachment 304504 [details] Requested output of lspci -nnv Here's the requested output of lspci -nnv, run on pciutils-2.1.99.test8-3.2 with kernel 2.6.9-42.7.EL.TEST.bz182687.3
Thanks Mark.

> 00:0f.0 Class 0601: 1166:0201

That's a CSB5 south bridge, as I expected. For the record, I couldn't reproduce the problem on my OSB4 (which doesn't mean it's not there though, as we don't really know what it takes for the failure to happen).
Created attachment 304537 [details] Untested patch Jean, would this be a better approach? Thanks.
David, your patch would have a very negative impact on performance for the CSB5. Two consecutive msleep() can't sleep for less than 3 jiffies, which is 30 ms at HZ=100. At HZ=1000 you'd still be waiting for 4 ms while 2 ms appear to be enough. Given that the driver currently works fine at HZ<=250, the code change should have no impact for these values of HZ. The key is to only have one initial msleep. I'll attach a modified patch which should do the trick and works OK on my PIIX4.
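The 30 ms figure follows directly from the tick length, which a quick conversion makes visible. This sketch only converts jiffies to milliseconds for the HZ values mentioned; it does not model how msleep() rounds its argument, so it does not by itself reproduce the 4 ms figure quoted for HZ=1000. The function name is illustrative.

```python
def jiffies_to_ms(jiffies: int, hz: int) -> float:
    """Convert a number of timer ticks to milliseconds for a given HZ."""
    return jiffies * 1000.0 / hz


for hz in (100, 250, 1000):
    one = jiffies_to_ms(1, hz)
    three = jiffies_to_ms(3, hz)
    print(f"HZ={hz}: 1 jiffy = {one:g} ms, 3 jiffies = {three:g} ms")
```

At HZ=250 and below, a single tick already lasts at least 4 ms, which is longer than the 2 ms that appear to be enough. That would be consistent with the observation that the driver works fine at those values of HZ.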
Created attachment 304564 [details] Improved patch
Thank you Jean, I will build a test kernel with the patch in Comment #16
Created attachment 304646 [details]
Inc delay for csb5 and remove fix_hstcfg

Jean,

Actually, for RHEL4 I will need to use the patch in Comment #16 and add the following upstream commit, since we saw the messages initially and it is ok for bit 1 to be set when reading the SMBUS host cfg register.

commit 54aaa1ca1022d95d854315743241bb6bf59f531f
Author: Rudolf Marek <r.marek.cz>
Date: Tue Apr 25 13:06:41 2006 +0200

[PATCH] I2C: i2c-piix4: Remove the fix_hstcfg parameter

I am currently building some RHEL4 test kernels with this patch.
Mark, Would you please test kernel-2.6.9-42.7.EL.TEST.bz182687.4.i686.rpm? http://people.redhat.com/dmilburn/
David, this kernel works fine.
Patch added to my i2c tree, I will send it to Linus for kernel 2.6.26-rc2.
Patch made it in kernel 2.6.26-rc2: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b1c1759cd192fe1d27989f986c7f6b2939905e0c
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Updating PM score.
Committed in 78.11.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/
~~ Attention Partners! Snap 1 Released ~~ RHEL 4.8 Snapshot 1 has been released on partners.redhat.com. There should be a fix present which addresses this bug. NOTE: there is only a short time left to test, so please test and report back results on this bug at your earliest convenience. If you encounter any issues, please set the bug back to the ASSIGNED state and describe the issues you encountered. If you have found a NEW bug, clone this bug and describe the issues you encountered. Further questions can be directed to your Red Hat Partner Manager. If you have VERIFIED the bug fix, please select your PartnerID from the Verified field above. Please leave a comment with your test result details. Include which arches were tested, the package version, and any applicable logs. - Red Hat QE Partner Management
Confirmed that patch verified by cisco and customer linux-fr.org is included in the latest 4.8 kernel (-88.EL)
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-1024.html