Bug 60592 - eeprom checksum corrupted on CS20/DS20L
Summary: eeprom checksum corrupted on CS20/DS20L
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.3
Hardware: alpha
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Phil Copeland
QA Contact: Beth Uptagrafft
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2002-03-01 21:50 UTC by john.goshdigian
Modified: 2007-04-18 16:40 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2002-03-14 01:20:32 UTC
Embargoed:


Attachments
session log (12.59 KB, text/plain)
2002-03-01 23:02 UTC, Diego Novillo
Updated kernel-utils package (rpm --rebuild kernel-utils-2.4-3.6.4.src.rpm) (237.12 KB, application/octet-stream)
2002-03-05 20:17 UTC, Phil Copeland
Output of 'show dev' after power cycling the machine (1.02 KB, text/plain)
2002-03-08 22:52 UTC, Diego Novillo

Description john.goshdigian 2002-03-01 21:50:56 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.4)
Gecko/20011128 Netscape6/6.2.1

Description of problem:
After installing the beta2 kit on a DS20L (CS20), the system comes up, the
network is configured, and other nodes on the network can be pinged.
Entering ifconfig on the serial-line console displays the interface
information, but the console then hangs and accepts no further input.
Powering the console terminal off and logging back in works,
but sooner or later the console hangs again.
Entering ifconfig reliably locks up the serial line quickly.



Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Install 7.2 for Alpha beta2 on a DS20L (CS20) over the serial line
2. Boot the system with an active network
3. Type ifconfig on the serial console


Actual Results:  The serial console hangs.

Expected Results:  The serial console continues to operate.

Additional info:

This happened with beta2 on two DS20L systems.
It appears that something corrupts the eeprom on the DS20L,
because upon reboot the h/w diagnostics complain of an eeprom checksum error.

Comment 1 Phil Copeland 2002-03-01 22:07:52 UTC
John, does this also happen on a DS10, or is it DS20-specific? (It's just easier
for me to get my hands on a DS10.) From the description, I'm wondering whether
the interrupt routing isn't initialized correctly (the same sort of bug the 4100
patch was trying to address).

Cheers

Phil
=--=


Comment 2 john.goshdigian 2002-03-01 22:30:02 UTC
This has only happened on the CS20, not the DS10.
That's the CS20, which is to be called the DS20L.

Comment 3 Diego Novillo 2002-03-01 23:02:49 UTC
Created attachment 47163 [details]
session log

Comment 4 Diego Novillo 2002-03-01 23:03:29 UTC
I can't reproduce this bug on the CS20D we have in Toronto.  Attached session 
log.


Comment 5 john.goshdigian 2002-03-05 15:17:37 UTC
It appears that the eepro100-diag program modified the eeprom.
Diego, did you power down the CS20 and do a fresh install (not upgrade)?

As a result of these installs, we now have two CS20s with bad checksums on the eeprom.
At boot time, the console shows:
intializing GCT/FRU at 1d400
*** Error (eib0.0.0.4.0), Warning, Bad Checksum on eeprom 
so at the SRM >>> prompt, show dev only shows eia0.

After booting beta2, entering eepro100-diag -f -ee shows
***** The EEPROM checksum is INCORRECT| *****
the checksum is 0xA125, it should be 0xBABA
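
For reference, the 0xBABA value reflects how the Linux eepro100 driver validates these EEPROMs: the 16-bit words are summed, and the low 16 bits of the total must equal 0xBABA, with the last word serving as the checksum word. The Python sketch below illustrates that rule only. The function names are hypothetical, and it works on a list of words already read from the device rather than touching any hardware.

```python
# Sketch of the EEPROM checksum rule: the sum of all 16-bit words,
# truncated to 16 bits, must equal 0xBABA. Function names are
# hypothetical and chosen for this example only.

EXPECTED_SUM = 0xBABA


def eeprom_sum(words):
    """Return the sum of all 16-bit words, truncated to 16 bits."""
    return sum(words) & 0xFFFF


def checksum_ok(words):
    """True if the word list satisfies the 0xBABA rule."""
    return eeprom_sum(words) == EXPECTED_SUM


def fix_checksum(words):
    """Return a copy whose last word makes the total come out to 0xBABA."""
    fixed = list(words)
    # Sum everything except the checksum word, then pick the value
    # that brings the truncated total to EXPECTED_SUM.
    fixed[-1] = (EXPECTED_SUM - eeprom_sum(fixed[:-1])) & 0xFFFF
    return fixed


if __name__ == "__main__":
    # Made-up contents, not a dump from a real card.
    image = [0x1234] * 63 + [0x0000]
    print(checksum_ok(image))
    print(checksum_ok(fix_checksum(image)))
```

Recomputing the last word only makes the sum consistent. It cannot restore any other word that was overwritten, so a valid checksum on its own may not be enough for SRM to accept the EEPROM contents.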

Re-installed 7.1, but 7.1 did not have eepro100-diag, so checksum still bad.
Help! How can we reset the eeprom?

Note: the eepro100-diag shipped with beta2 is based on v2.05; on Scyld it's v2.07.

Comment 6 john.goshdigian 2002-03-05 15:32:54 UTC
Note: the serial line problem went away with a different terminal.
The summary should be "eeprom checksum corrupted on CS20/DS20L".


Comment 7 Phil Copeland 2002-03-05 20:14:27 UTC
I had a quick chat with arjan this morning about this.
He'd like us to try using the current CVS version of the eepro100-diag program.

I'm attaching the srpm.



Comment 8 Phil Copeland 2002-03-05 20:17:35 UTC
Created attachment 47464 [details]
Updated kernel-utils package (rpm --rebuild kernel-utils-2.4-3.6.4.src.rpm)

Comment 9 john.goshdigian 2002-03-06 18:40:43 UTC
Tried to reset the eeprom with the eepro100-diag that was in the rpm attachment,
but still no progress. The commands used were:
# /usr/sbin/eepro100-diag -E -f
then 
# /usr/sbin/eepro100-diag -ee -f | less
and it again reports
Index #1 checksum is 0x8FBD, it should be 0xBABA!
Index #2 checksum is 0xFF00, it should be 0xBABA!
Upon power up, SRM still complains about the bad checksum and
show dev still doesn't show eib0.

What else can we try to reset the eeprom?

Comment 10 Bill Nottingham 2002-03-06 19:42:17 UTC
Will be fixed in kudzu-0.99.34-1; however, that won't reset the EEPROM for you.

You can try running eepro100-diag -f -G3 -w -w ; this *should* sanitize the
eeprom, barring a bug in eepro100-diag.

Comment 11 john.goshdigian 2002-03-07 00:31:39 UTC
Tried the suggested command, eepro100-diag -f -G3 -w -w.
It still results in an incorrect eeprom checksum
and the same SRM problem: eib0 is not enabled.



Comment 12 Diego Novillo 2002-03-08 22:52:53 UTC
Created attachment 47960 [details]
Output of 'show dev' after power cycling the machine

Comment 13 Diego Novillo 2002-03-08 22:57:12 UTC
I still can't seem to reproduce the problem.  I power cycled the machine and
everything came up normally.

Anything else I could try?

Comment 14 Phil Copeland 2002-03-12 21:08:46 UTC
Ok, talked to Bill about this.
Seems that eepro100-diag is now pulled from the kudzu package, so this should
not be a future issue.
(so for now zap /usr/sbin/eepro100-diag)

As for what to do in the meantime on the machine that's got the corrupted eeprom,
I don't have an answer.
I'm going to guess this is an onboard controller too, so not something that you
can lift off onto an Intel box.

Phil
=--=

Comment 15 George France 2002-03-13 23:23:52 UTC
It appears that the DS20L was manufactured with two different ethernet chip sets:
the Intel 82550 and the Intel 82559.  This probably explains why we see the problem
in Nashua and you do not see it in Toronto.  We should probably compare chip
numbers.

--George

Comment 16 Diego Novillo 2002-03-14 01:20:27 UTC
lspci returns:

$ lspci
00:03.0 SCSI storage controller: LSI Logic / Symbios Logic (formerly NCR) 53c1010 66MHz  Ultra3 SCSI Adapter (rev 01)
00:04.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
00:07.0 ISA bridge: Acer Laboratories Inc. [ALi] M1533 PCI to ISA Bridge [Aladdin IV] (rev c3)
00:10.0 IDE interface: Acer Laboratories Inc. [ALi] M5229 IDE (rev c2)
00:11.0 Non-VGA unclassified device: Acer Laboratories Inc. [ALi] M7101 PMU
01:03.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
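
To compare chip numbers between sites, the Ethernet entries and their revisions can be pulled out of lspci output like the listing above. This is a minimal Python sketch, assuming the default format with one device per line (text pasted into a comment may be wrapped and would need rejoining first). The names LINE_RE and ethernet_devices are made up for this example. The name string comes from lspci's ID lookup, so the revision may be the more useful field to compare if several chips share a device ID.

```python
import re

# Matches lines such as:
#   00:04.0 Ethernet controller: Intel Corporation 82557 [...] (rev 08)
# The trailing "(rev xx)" part is optional because some devices omit it.
LINE_RE = re.compile(
    r"^(?P<slot>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"(?P<cls>[^:]+): "
    r"(?P<name>.*?)"
    r"(?: \(rev (?P<rev>[0-9a-f]{2})\))?$"
)


def ethernet_devices(lspci_text):
    """Return (slot, name, revision) for each Ethernet controller line."""
    devices = []
    for line in lspci_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group("cls") == "Ethernet controller":
            devices.append(
                (match.group("slot"), match.group("name"), match.group("rev"))
            )
    return devices


if __name__ == "__main__":
    sample = (
        "00:04.0 Ethernet controller: Intel Corporation 82557 "
        "[Ethernet Pro 100] (rev 08)\n"
        "00:11.0 Non-VGA unclassified device: Acer Laboratories Inc. "
        "[ALi] M7101 PMU\n"
    )
    for slot, name, rev in ethernet_devices(sample):
        print(slot, name, rev)
```

Running the same function on lspci output from an affected machine in Nashua and from the Toronto machine would show whether the revisions differ.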



Comment 17 Beth Uptagrafft 2002-04-03 22:03:01 UTC
fixed in next release

