Bug 55602

Summary: Possible 82559 EPROM corruption
Product: [Retired] Red Hat Linux
Reporter: Need Real Name <shaun.sloan>
Component: kudzu
Assignee: Bill Nottingham <notting>
Status: CLOSED RAWHIDE
QA Contact: David Lawrence <dkl>
Severity: high
Priority: high
Version: 7.3
CC: james.a.caufield, rvokal, shaun.sloan
Target Milestone: ---   
Target Release: ---   
Hardware: ia64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2001-12-13 18:14:23 UTC
Attachments:
e100util utility for modifying the STB bit (flags: none)

Description Need Real Name 2001-11-02 19:13:53 UTC
Description of Problem:

Bugzilla Bug 53566 was closed after Kudzu was modified to disable the 
Standby Enable (STB) bit on 82559 network adapters. While verifying that 
fix, we found possible corruption of the 82559 adapter's EEPROM.

Version-Release number of selected component (if applicable):

0.99.25-1

How Reproducible:

Always.

Steps to Reproduce:

1. On a Lion IA-64 platform, enable the STB bit on the 559 adapter using 
your utility of choice (we used Intel's 'e100util' program).
2. Install Pensacola release candidate 3.
3. When the installation is completed and the machine reboots, watch for 
the message:

"PXE-E05: The LAN adapters configuration is corrupt or has not been 
initialized. The Boot Agent cannot continue."

or:

"PXE-E05: Checksum Error"

Actual Results:
The STB bit is disabled. The machine proceeds to reboot Linux with no 
problems. But a network boot would have failed had it been attempted.

Expected Results:

No errors from PXE regarding adapter corruption.

Additional Information:

We ran this test on two different Lion platforms and observed the two 
different messages listed above. We then removed the adapter and checked 
its EEPROM checksum with the 'ST' DOS utility. The utility showed no change 
in the checksum from the value we recorded before the test. In addition, 
when 'ST' was asked to recompute the checksum, the value did not change.

Comment 1 Bill Nottingham 2001-11-02 19:58:08 UTC
I can't reproduce this here on a workstation machine; it always claims the
eeprom is correct.

Comment 2 Bill Nottingham 2001-11-02 19:59:42 UTC
Does it show this failure on all successive boots, or only on the first one?

Comment 3 Need Real Name 2001-11-02 20:15:31 UTC
Bill: Yes, we observe the error on all successive boots as well as the first 
time. - Jim Caufield (no account; using Shaun's)

Comment 4 Bill Nottingham 2001-11-02 20:21:38 UTC
If you run '/usr/sbin/eepro100-diag -ee -f', does it claim the EEPROM checksum
is correct?

Comment 5 Need Real Name 2001-11-02 20:58:08 UTC
Bill: I ran '/usr/sbin/eepro100-diag -ee -f' and saw:

eepro100-diag.c:v2.05 6/13/2001 Donald Becker (becker)
 http://www.scyld.com/diag/index.html
Index #1: Found a Intel i82557/8/9 EtherExpressPro100 adapter at 0x6f00.
EEPROM contents, size 256x16:
    00: d000 c1b7 735f 0c13 0000 0201 4701 0000
  0x08: 7270 9504 48e2 3000 8086 0061 0000 0000
      ...
  0x30: 002c 0000 0000 0000 0000 0000 0000 0000
      ...
  0xf8: 0000 0000 0000 0000 0000 0000 0000 a807
 *****  The EEPROM checksum is INCORRECT!  *****
  The checksum is 0x39B, it should be 0xBABA!
Intel EtherExpress Pro 10/100 EEPROM contents:
  Station address 00:D0:B7:C1:5F:73.
  Board assembly 727095-004, Physical connectors present: RJ45
  Primary interface chip i82555 PHY #1.
   Sleep mode is enabled.  This is not recommended.
   Under high load the card may not respond to
    PCI requests, and thus cause a master abort.

<Jim


Comment 6 Bill Nottingham 2001-11-02 21:02:28 UTC
Hm, that implies that not only is the checksum wrong, it didn't reset the bits.

Can you post the output of the same command on a working machine before
anything's touched it?

Comment 7 Need Real Name 2001-11-02 21:14:37 UTC
Bill: Here's the command output:

eepro100-diag.c:v2.05 6/13/2001 Donald Becker (becker)
 http://www.scyld.com/diag/index.html
Index #1: Found a Intel i82557/8/9 EtherExpressPro100 adapter at 0.
This chip has not been assigned a valid I/O address, and will not function.
 If you have warm-booted from another operating system, a complete 
 shut-down and power cycle may restore the card to normal operation.

This LOM was disabled in the kernel. Do you need this run with it enabled?

<Jim


Comment 8 Bill Nottingham 2001-11-02 21:19:14 UTC
If possible, yes.

Comment 9 Need Real Name 2001-11-02 23:45:42 UTC
Bill: Do you need anything further from us? We understand that you don't have a 
Lion to play with and want to help.

I also took a look at one of the planning docs that was used when we were first 
wrestling with this issue and found some pseudo code that describes how we 
flipped the bit in our driver:
-------------------------------------------------------------------
1. Code to clear the Standby Enable bit (0x2) in the EEPROM ID (word 0x0A).

        [pseudo code]

        if (pci_hw_rev_id >= 8)
                WORD id_reg = ReadEepromWord(EEPROM_ID_WORD=0x0A)
                if (id_reg & 0x02)
                        id_reg &= ~0x02
                        WriteEepromWord(EEPROM_ID_WORD, id_reg)
                        UpdateEepromChecksum()

2. Code to write a word to the EEPROM:

        #define EE_EWEN_CMD             19
        #define EE_EWDS_CMD             16

        STATIC void
        intl100_write_eeprom_word(intl100_ift_t *iftp, ubit16 ee_reg, ubit16 data)
        {
                ubit16 bits;

                bits = intl100_eeprom_address_size(iftp->eeprom_size);

                INTL100_REG_WRITE16(CSR_EEPROM, EE_CS);

                intl100_shift_out_bits(iftp, EE_EWEN_CMD, 5);
                intl100_shift_out_bits(iftp, ee_reg, bits - 2);
               
                /* stall for 20ms */   

                intl100_shift_out_bits(iftp, EE_WRITE_CMD, 3);         
                intl100_shift_out_bits(iftp, ee_reg, bits);
                intl100_shift_out_bits(iftp, data, 16);

                /* stall for 20ms */   

                intl100_shift_out_bits(iftp, EE_EWDS_CMD, 5);
                intl100_shift_out_bits(iftp, ee_reg, bits - 2);
       
                /* stall for 20ms */   

                intl100_eeprom_cleanup(iftp);
        }


3. Code to update the EEPROM checksum:

        STATIC void
        intl100_update_eeprom_checksum(intl100_ift_t *iftp)
        {
                ubit16 idx, xsumIndex, checksum = 0;

                xsumIndex = iftp->eeprom_size - 1;

                for (idx = 0; idx < xsumIndex; idx++)
                        checksum += intl100_read_eeprom_word(iftp, idx);

                checksum = (ubit16)0xBABA - checksum;
                intl100_write_eeprom_word(iftp, idx, checksum);
        }
------------------------------------------------------------------------
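
The three steps above can be sketched on an in-memory copy of the EEPROM, with no hardware access. This is only an illustration under stated assumptions: the image is a plain array of 16-bit words, the checksum lives in the last word, and the names eeprom_sum, eeprom_checksum_ok and eeprom_clear_standby are hypothetical, not part of the driver quoted above or of kudzu.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EEPROM_ID_WORD   0x0Au    /* word holding the Standby Enable bit */
#define EEPROM_STB_BIT   0x0002u  /* Standby Enable bit within that word */
#define EEPROM_SUM_MAGIC 0xBABAu  /* all words, checksum included, add up to this */

/* 16-bit sum of the first n words of the image. */
static uint16_t eeprom_sum(const uint16_t *image, size_t n)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = (uint16_t)(sum + image[i]);
    return sum;
}

/* Nonzero when the whole image, checksum word included, sums to 0xBABA. */
static int eeprom_checksum_ok(const uint16_t *image, size_t nwords)
{
    assert(nwords > 0);
    return eeprom_sum(image, nwords) == EEPROM_SUM_MAGIC;
}

/*
 * Clear the Standby Enable bit if it is set, then rewrite the last word
 * so the image sums to 0xBABA again. Returns 1 if the image was changed.
 */
static int eeprom_clear_standby(uint16_t *image, size_t nwords)
{
    assert(nwords > (size_t)EEPROM_ID_WORD + 1);
    if (!(image[EEPROM_ID_WORD] & EEPROM_STB_BIT))
        return 0;
    image[EEPROM_ID_WORD] &= (uint16_t)~EEPROM_STB_BIT;
    image[nwords - 1] =
        (uint16_t)(EEPROM_SUM_MAGIC - eeprom_sum(image, nwords - 1));
    return 1;
}
```

Applied to an image whose word 0x0A holds 0x48e2 (the value in the dump from Comment 5), eeprom_clear_standby leaves 0x48e0 there and eeprom_checksum_ok then reports the image as valid.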

<Jim

Comment 10 Bill Nottingham 2001-11-03 00:22:44 UTC
Basically, I'd like the output of 'eepro100-diag -ee -f' on a machine that
hasn't been changed at all (i.e., before the eeprom gets weird stuff in it); if
this requires re-enabling the chip in the bios, that would be good. I can test
it on a Lion next week.

Comment 11 Need Real Name 2001-11-05 18:03:20 UTC
Here's the output from a Lion that has not had version 0.99.25-1 of Kudzu run 
on it:

eepro100-diag.c:v2.05 6/13/2001 Donald Becker (becker)
 http://www.scyld.com/diag/index.html
Index #1: Found a Intel i82557/8/9 EtherExpressPro100 adapter at 0x6f00.
EEPROM contents, size 256x16:
    00: d000 f8b7 d4d9 0c13 0000 0201 4701 0000
  0x08: 7270 9504 48e0 3000 8086 0061 0000 0000
      ...
  0x30: 002c 4000 4014 0000 0000 0000 0000 0000
      ...
  0xf8: 0000 0000 0000 0000 0000 0000 0000 469a
 The EEPROM checksum is correct.
Intel EtherExpress Pro 10/100 EEPROM contents:
  Station address 00:D0:B7:F8:D9:D4.
  Board assembly 727095-004, Physical connectors present: RJ45
  Primary interface chip i82555 PHY #1.

<Jim

Comment 12 Bill Nottingham 2001-12-11 20:50:24 UTC
I've just tried to reproduce this on a Lion with an on-board adapter; I reset
the bit manually, and then did an install. The bit was correctly changed, and
the checksum was correct.

Comment 13 Need Real Name 2001-12-12 16:52:04 UTC
Bill,

Which release did you perform this test with?

Thanks,

<Jim

Comment 14 Bill Nottingham 2001-12-12 16:53:47 UTC
The 1207 tree, on a 4-CPU Dell PE7150, running Dell's A02 firmware (based on
Intel's 97, IIRC.)

Comment 15 Need Real Name 2001-12-12 22:36:04 UTC
Bill,

We repeated our STB test today.

We used the e100util program to enable the STB bit on the LOM. Next, we 
installed Red Hat 7.2 build 1207. On reboot, when the system is "Scanning 
Option ROMs", the following message is printed:

PXE-E05: The LAN adapter's configuration is corrupted or has not been 
initialized. The Boot Agent cannot continue.

This message was not there before we installed Red Hat, and the message is 
printed on all subsequent reboots. This is the same problem that existed in 
previous versions of Red Hat 7.2. When e100util is run in query mode after 
the Red Hat install, it reports that the bit is disabled.

Jim


Comment 16 Bill Nottingham 2001-12-12 23:59:41 UTC
Can you post the e100util program?

Comment 17 Bill Nottingham 2001-12-13 06:07:14 UTC
OK. The problem appears to be that when eepro100-diag reads the value back from
the register it writes to disable the bit, it (for some reason) occasionally
gets 0xffff; it uses this to compute the checksum, so the checksum ends up wrong.

A workaround has been added in kudzu-0.99.30-1, available from
http://people.redhat.com/notting/kudzu/

How to test this outside of an install tree:

1. Enable the bit with e100util.
2. Remove /etc/sysconfig/hwconf
3. Run kudzu -q

With the old kudzu, it would exhibit the problem (or, at least, it *could*).
With the new one, it should disable the bit, and the eeprom should be correct.

(Running /usr/sbin/eepro100-diag -ee -f should tell you if the checksum is
correct or not.)
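
To illustrate the failure described above, here is a minimal sketch of how a single 0xffff read-back skews the checksum, plus one possible guard. It is an assumption-laden model on an in-memory array, not the workaround that went into kudzu-0.99.30-1 (this report does not describe that change); the names eeprom_total, checksum_from_readback and trusted_word are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EEPROM_SUM_MAGIC 0xBABAu

/* 16-bit sum of all nwords words, checksum word included. */
static uint16_t eeprom_total(const uint16_t *image, size_t nwords)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < nwords; i++)
        sum = (uint16_t)(sum + image[i]);
    return sum;
}

/*
 * Compute the checksum word as if word idx had read back as `seen`.
 * Passing image[idx] models a clean read; 0xffff models the glitch.
 */
static uint16_t checksum_from_readback(const uint16_t *image, size_t nwords,
                                       size_t idx, uint16_t seen)
{
    assert(nwords > 0 && idx < nwords - 1);
    uint16_t sum = 0;
    for (size_t i = 0; i < nwords - 1; i++)
        sum = (uint16_t)(sum + (i == idx ? seen : image[i]));
    return (uint16_t)(EEPROM_SUM_MAGIC - sum);
}

/* Hypothetical guard: distrust an all-ones read-back, use the value just written. */
static uint16_t trusted_word(uint16_t written, uint16_t readback)
{
    return readback == 0xFFFFu ? written : readback;
}
```

A checksum computed from a 0xffff read-back leaves the image summing to 0xBABB plus the word's real contents instead of 0xBABA. With 0x48e0 actually stored at word 0x0A, that total is 0x039B, which happens to be the sum eepro100-diag printed in Comment 5 (that dump shows 0x48e2 at the word, so the match is suggestive only).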

Comment 18 Need Real Name 2001-12-13 17:34:02 UTC
Created attachment 40493
e100util utility for modifying the STB bit

Comment 19 Need Real Name 2001-12-13 18:14:18 UTC
We ran the above test and found that the new version of Kudzu disabled that bit 
and did not corrupt the checksum.

Looks like a good fix.

Comment 20 Bill Nottingham 2001-12-13 19:25:02 UTC
OK, marking resolved. This should be in the final release.