Bug 55602
Summary: Possible 82559 EPROM corruption
Product: [Retired] Red Hat Linux
Component: kudzu
Version: 7.3
Hardware: ia64
OS: Linux
Status: CLOSED RAWHIDE
Severity: high
Priority: high
Reporter: Need Real Name <shaun.sloan>
Assignee: Bill Nottingham <notting>
QA Contact: David Lawrence <dkl>
CC: james.a.caufield, rvokal, shaun.sloan
Doc Type: Bug Fix
Last Closed: 2001-12-13 18:14:23 UTC

Description
Need Real Name
2001-11-02 19:13:53 UTC
I can't reproduce this here on a workstation machine; it always claims the eeprom is correct. Does it show this failure on all successive boots, or only on the first one?

Bill: Yes, we observe the error on all successive boots as well as the first time.
- Jim Caufield (no account; using Shaun's)

If you run '/usr/sbin/eepro100-diag -ee -f', does it claim the EEPROM checksum is correct?

Bill: I ran '/usr/sbin/eepro100-diag -ee -f' and saw:

    eepro100-diag.c:v2.05 6/13/2001 Donald Becker (becker)
     http://www.scyld.com/diag/index.html
    Index #1: Found a Intel i82557/8/9 EtherExpressPro100 adapter at 0x6f00.
    EEPROM contents, size 256x16:
        00: d000 c1b7 735f 0c13 0000 0201 4701 0000
      0x08: 7270 9504 48e2 3000 8086 0061 0000 0000
          ...
      0x30: 002c 0000 0000 0000 0000 0000 0000 0000
          ...
      0xf8: 0000 0000 0000 0000 0000 0000 0000 a807
     ***** The EEPROM checksum is INCORRECT! *****
    The checksum is 0x39B, it should be 0xBABA!
    Intel EtherExpress Pro 10/100 EEPROM contents:
      Station address 00:D0:B7:C1:5F:73.
      Board assembly 727095-004, Physical connectors present: RJ45
      Primary interface chip i82555 PHY #1.
      Sleep mode is enabled. This is not recommended.
      Under high load the card may not respond to
      PCI requests, and thus cause a master abort.

- Jim

Hm, that implies that not only is the checksum wrong, it didn't reset the bits. Can you post the output of the same command on a working machine before anything's touched it?

Bill: Here's the command output:

    eepro100-diag.c:v2.05 6/13/2001 Donald Becker (becker)
     http://www.scyld.com/diag/index.html
    Index #1: Found a Intel i82557/8/9 EtherExpressPro100 adapter at 0.
    This chip has not been assigned a valid I/O address, and will not function.
    If you have warm-booted from another operating system, a complete
    shut-down and power cycle may restore the card to normal operation.

This LOM was disabled in the kernel. Do you need this run with it enabled?
- Jim

If possible, yes.

Bill: Do you need anything further from us?
We understand that you don't have a Lion to play with and want to help. I also took a look at one of the planning docs that was used when we were first wrestling with this issue and found some pseudo code that describes how we flipped the bit in our driver:

-------------------------------------------------------------------
1. Code to clear the Standby Enable bit (0x2) in the EEPROM ID (word 0x0A).

[pseudo code]
    if (pci_hw_rev_id >= 8)
        WORD id_reg = ReadEepromWord(EEPROM_ID_WORD=0x0A)
        if (id_reg & 0x02)
            id_reg &= ~0x02
            WriteEepromWord(EEPROM_ID_WORD, id_reg)
            UpdateEepromChecksum()

2. Code to write a word to the EEPROM:

    #define EE_EWEN_CMD 19   /* erase/write enable */
    #define EE_EWDS_CMD 16   /* erase/write disable */

    STATIC void
    intl100_write_eeprom_word(intl100_ift_t *iftp, ubit16 ee_reg, ubit16 data)
    {
        ubit16 bits;

        bits = intl100_eeprom_address_size(iftp->eeprom_size);
        INTL100_REG_WRITE16(CSR_EEPROM, EE_CS);

        /* enable writes */
        intl100_shift_out_bits(iftp, EE_EWEN_CMD, 5);
        intl100_shift_out_bits(iftp, ee_reg, bits - 2);
        /* stall for 20ms */

        /* write the data word */
        intl100_shift_out_bits(iftp, EE_WRITE_CMD, 3);
        intl100_shift_out_bits(iftp, ee_reg, bits);
        intl100_shift_out_bits(iftp, data, 16);
        /* stall for 20ms */

        /* disable writes again */
        intl100_shift_out_bits(iftp, EE_EWDS_CMD, 5);
        intl100_shift_out_bits(iftp, ee_reg, bits - 2);
        /* stall for 20ms */

        intl100_eeprom_cleanup(iftp);
    }

3. Code to update the EEPROM checksum:

    STATIC void
    intl100_update_eeprom_checksum(intl100_ift_t *iftp)
    {
        ubit16 idx, xsumIndex, checksum = 0;

        /* the checksum lives in the last word of the EEPROM */
        xsumIndex = iftp->eeprom_size - 1;
        for (idx = 0; idx < xsumIndex; idx++)
            checksum += intl100_read_eeprom_word(iftp, idx);

        /* all words, including the checksum word, must sum to 0xBABA */
        checksum = (ubit16)0xBABA - checksum;
        intl100_write_eeprom_word(iftp, idx, checksum);
    }
------------------------------------------------------------------------
- Jim

Basically, I'd like the output of 'eepro100-diag -ee -f' on a machine that hasn't been changed at all (i.e., before the eeprom gets weird stuff in it); if this requires re-enabling the chip in the bios, that would be good. I can test it on a Lion next week.
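For reference, the checksum rule in the pseudo code above (every EEPROM word, including the final checksum word, summing to 0xBABA modulo 2^16) can be sketched against an in-memory copy of the EEPROM. This is only a minimal illustration in C: the function names, the macro names, and the idea of working on a plain array are assumptions made for the example, and it is neither the driver code quoted above nor the kudzu code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values quoted in this thread; the macro names are hypothetical. */
#define EEPROM_ID_WORD   0x0A    /* word holding the Standby Enable bit */
#define STANDBY_ENABLE   0x0002  /* Standby Enable (STB) bit            */
#define EEPROM_CHECKSUM  0xBABA  /* required sum of all EEPROM words    */

/* Sum every word of the image, including the stored checksum word. */
static uint16_t eeprom_sum(const uint16_t *image, size_t words)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < words; i++)
        sum = (uint16_t)(sum + image[i]);
    return sum;
}

/* Rewrite the last word so that the whole image sums to 0xBABA. */
static void eeprom_update_checksum(uint16_t *image, size_t words)
{
    assert(words > 0);
    image[words - 1] = 0;
    image[words - 1] = (uint16_t)(EEPROM_CHECKSUM - eeprom_sum(image, words));
}

/* Clear the Standby Enable bit; returns 1 if the image was modified. */
static int eeprom_clear_standby(uint16_t *image, size_t words)
{
    assert(words > EEPROM_ID_WORD);
    if (!(image[EEPROM_ID_WORD] & STANDBY_ENABLE))
        return 0;
    image[EEPROM_ID_WORD] = (uint16_t)(image[EEPROM_ID_WORD] & ~STANDBY_ENABLE);
    eeprom_update_checksum(image, words);
    return 1;
}
```

Applied to an image whose word 0x0A is 48e2 (as in the failing dump), eeprom_clear_standby() would turn it into 48e0 (as in the dump from the untouched machine) and rebalance the last word; eeprom_sum() over the result should then equal 0xBABA.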
Here's the output from a Lion that has not had version 0.99.25-1 of Kudzu run on it:

    eepro100-diag.c:v2.05 6/13/2001 Donald Becker (becker)
     http://www.scyld.com/diag/index.html
    Index #1: Found a Intel i82557/8/9 EtherExpressPro100 adapter at 0x6f00.
    EEPROM contents, size 256x16:
        00: d000 f8b7 d4d9 0c13 0000 0201 4701 0000
      0x08: 7270 9504 48e0 3000 8086 0061 0000 0000
          ...
      0x30: 002c 4000 4014 0000 0000 0000 0000 0000
          ...
      0xf8: 0000 0000 0000 0000 0000 0000 0000 469a
    The EEPROM checksum is correct.
    Intel EtherExpress Pro 10/100 EEPROM contents:
      Station address 00:D0:B7:F8:D9:D4.
      Board assembly 727095-004, Physical connectors present: RJ45
      Primary interface chip i82555 PHY #1.

- Jim

I've just tried to reproduce this on a Lion with an on-board adapter; I reset the bit manually, and then did an install. The bit was correctly changed, and the checksum was correct.

Bill,
Which release did you perform this test with?
Thanks,
- Jim

The 1207 tree, on a 4-CPU Dell PE7150, running Dell's A02 firmware (based on Intel's 97, IIRC.)

Bill,
We repeated our STB test today. We used the e100util program to enable the STB bit on the LOM. Next, we installed Red Hat 7.2 build 1207. On reboot, when the system is "Scanning Option ROMs", the following message is printed:

    PXE-E05: The LAN adapter's configuration is corrupted or has not been
    initialized. The Boot Agent cannot continue.

This message was not there before we installed Red Hat, and the message is printed on all subsequent reboots. This is the same problem that existed in previous versions of Red Hat 7.2. When e100util is run in query mode after the Red Hat install, it reports that the bit is disabled.
Jim

Can you post the e100util program?

OK. The problem appears to be that when eepro100-diag reads the value back from the register it writes to disable the bit, it (for some reason) occasionally gets 0xffff; it uses this to compute the checksum, so the checksum ends up wrong.
A workaround has been added in kudzu-0.99.30-1, available from http://people.redhat.com/notting/kudzu/

How to test this outside of an install tree:

1. Enable the bit with e100util.
2. Remove /etc/sysconfig/hwconf
3. Run kudzu -q

With the old kudzu, it would exhibit the problem (or, at least, it *could*). With the new one, it should disable the bit, and the eeprom should be correct. (Running /usr/sbin/eepro100-diag -ee -f should tell you if the checksum is correct or not.)

Created attachment 40493
e100util utility for modifying the STB bit
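The failure mode described above (a read-back that occasionally returns 0xffff and then gets folded into the checksum) could be guarded against along the following lines. The thread does not show the actual change made in kudzu-0.99.30-1, so this is only a sketch of one possible approach: the callback type, the function names, and the retry count are all hypothetical, and since 0xffff can be a legitimate EEPROM word, treating it as suspect is only reasonable for a word just written with a different value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader callback: returns the 16-bit word at index idx. */
typedef uint16_t (*eeprom_read_fn)(void *ctx, unsigned idx);

/*
 * Read a word up to max_tries times, rejecting the suspicious 0xffff value.
 * Returns 0 and stores the word in *out on success, or -1 if every
 * attempt read back 0xffff.
 */
static int eeprom_read_checked(eeprom_read_fn read_word, void *ctx,
                               unsigned idx, int max_tries, uint16_t *out)
{
    assert(read_word != NULL && out != NULL);
    for (int attempt = 0; attempt < max_tries; attempt++) {
        uint16_t value = read_word(ctx, idx);
        if (value != 0xFFFF) {
            *out = value;
            return 0;
        }
    }
    return -1;
}

/* Test double: returns 0xffff on the first call, then a fixed word. */
static uint16_t flaky_read(void *ctx, unsigned idx)
{
    int *calls = ctx;
    (void)idx;
    return (*calls)++ == 0 ? 0xFFFF : 0x48E0;
}
```

With flaky_read as the reader and three tries allowed, the first read is discarded and the second value is used; a caller would only recompute the checksum after a successful return.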
We ran the above test and found that the new version of Kudzu disabled that bit and did not corrupt the checksum. Looks like a good fix.

OK, marking resolved. This should be in the final release.