Bug 99333
Summary: | (NET TG3) bcm5701/SX @ 33Mhz PCI doesn't init after system reset | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 2.1 | Reporter: | Glen A. Foster <glen.foster> |
Component: | kernel | Assignee: | Jason Baron <jbaron> |
Status: | CLOSED WORKSFORME | QA Contact: | Brian Brock <bbrock> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 2.1 | CC: | jgarzik, knoel, riel, shillman, tao |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | ia64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2003-09-03 14:46:18 UTC | Type: | --- |
Bug Blocks: | 87937 |
Description
Glen A. Foster
2003-07-17 19:25:47 UTC
======= ORIGINAL defect report =========
In environment: EV01 in SV
Bug reported on: Tuesday 06/03 2003 at 15:59 154
Abe revision: 20.01
-------------------------------------------------------
SYMPTOMS:
During RHAS2.1 testing on Madison for Everest, the system hung. After a reboot the OS found all the cards, but the GigE-SX in slot 7 had no connection. Sometimes another reboot fixed the problem and the connection came back. Sometimes I had to pull out the cable and plug it in again. Pulling the cable out always fixes the problem. I saw this problem several times; it comes and goes.
-------------------------------------------------------
CONDITIONS WHICH RELIABLY REPRODUCE BUG SYMPTOMS:
1 Way Everest Madison with 1G memory.
SFW 3.1, BMC 1.30 and MP E.02.10
IO config:
slot 2: VGA/USB
slot 4: GigE-SX
slot 5: U160x2
slot 6: HVD/FW
slot 7: GigE-SX
slot 8: 10/100BT
slot 9: U160x2
slot 10: U160
slot 11: U160
slot 12: 10/100BT

======== UPDATED INFO 09-Jul-2003 ========
Updated on: Wednesday 07/09 2003 at 15:28 190
Abe revision: 20.02
-------------------------------------------------------
UPDATE:
With the RHAS2.1 Q2 update, the reboot hang and the GigE-SX loss of connection still happen. Note that the card is running at 33MHz and shares its bus with the HVD/FW card. The log follows.
Jul 8 18:35:04 ev01 kernel: tg3.c:v1.4c (Feb 18, 2003)
Jul 8 18:35:04 ev01 kernel: PCI: Found IRQ 56 for device 21:04.0
Jul 8 18:35:04 ev01 kernel: eth0: Tigon3 [partno(A6794-60001) rev 0105 PHY(5701)] (PCI:66MHz:64-bit) 10/100/1000BaseT Ethernet 00:30:6e:28:17:6b
Jul 8 18:35:04 ev01 kernel: PCI: Found IRQ 68 for device 80:01.0
Jul 8 18:35:04 ev01 kernel: eth1: Tigon3 [partno(A6847-60001) rev 0105 PHY(serdes)] (PCI:66MHz:64-bit) 10/100/1000BaseT Ethernet 00:04:76:df:be:0d
Jul 8 18:35:04 ev01 kernel: PCI: Found IRQ 92 for device e0:02.0
Jul 8 18:35:04 ev01 kernel: eth2: Tigon3 [partno(A6847-60101) rev 0105 PHY(serdes)] (PCI:33MHz:64-bit) 10/100/1000BaseT Ethernet 00:30:6e:49:98:7d
Jul 8 18:35:04 ev01 kernel: tg3: eth0: Link is up at 100 Mbps, full duplex.
Jul 8 18:35:04 ev01 kernel: tg3: eth0: Flow control is off for TX and off for RX.
Jul 8 18:35:04 ev01 kernel: tg3: eth1: Link is up at 1000 Mbps, full duplex.
Jul 8 18:35:04 ev01 kernel: tg3: eth1: Flow control is off for TX and off for RX.
Jul 8 18:35:04 ev01 kernel: tg3: eth2: Link is up at 1000 Mbps, full duplex.
Jul 8 18:35:04 ev01 kernel: tg3: eth2: Flow control is off for TX and off for RX.

tg3.c:v1.4c (Feb 18, 2003)
PCI: Found IRQ 56 for device 21:04.0
divert: allocating divert_blk for eth0
eth0: Tigon3 [partno(A6794-60001) rev 0105 PHY(5701)] (PCI:66MHz:64-bit) 10/100/1000BaseT Ethernet 00:30:6e:28:17:6b
PCI: Found IRQ 68 for device 80:01.0
divert: allocating divert_blk for eth1
eth1: Tigon3 [partno(A6847-60001) rev 0105 PHY(serdes)] (PCI:66MHz:64-bit) 10/100/1000BaseT Ethernet 00:04:76:df:be:0d
PCI: Found IRQ 92 for device e0:02.0
divert: allocating divert_blk for eth2
eth2: Tigon3 [partno(A6847-60101) rev 0105 PHY(serdes)] (PCI:33MHz:64-bit) 10/100/1000BaseT Ethernet 00:30:6e:49:98:7d
tg3: eth0: Link is up at 100 Mbps, full duplex.
tg3: eth0: Flow control is off for TX and off for RX.
tg3: eth1: Link is up at 1000 Mbps, full duplex.
tg3: eth1: Flow control is off for TX and off for RX.
tg3: eth2: Link is up at 1000 Mbps, full duplex.
tg3: eth2: Flow control is off for TX and off for RX.
eepro100.c:v1.09j-t 9/29/99 Donald Becker http://www.scyld.com/network/eepro100.html
eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin <saw@saw.sw.com.sg> and others

ev01 -> ifconfig eth2
eth2      Link encap:Ethernet  HWaddr 00:30:6E:49:98:7D
          inet addr:10.20.90.121  Bcast:10.20.90.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:22 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:0 (0.0 b)  TX bytes:1408 (1.3 Kb)
          Interrupt:92

ev01 -> ping 10.20.90.69
PING 10.20.90.69 (10.20.90.69) from 10.20.90.121 : 56(84) bytes of data.
From 10.20.90.121: Destination Host Unreachable
From 10.20.90.121: Destination Host Unreachable
From 10.20.90.121: Destination Host Unreachable

ev01 -> route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.10.14.0      0.0.0.0         255.255.255.0   U     0      0        0 eth3
10.20.91.0      0.0.0.0         255.255.255.0   U     0      0        0 eth1
10.10.15.0      0.0.0.0         255.255.255.0   U     0      0        0 eth4
10.20.90.0      0.0.0.0         255.255.255.0   U     0      0        0 eth2
10.0.0.0        0.0.0.0         255.255.254.0   U     0      0        0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo
0.0.0.0         10.0.0.1        0.0.0.0         UG    0      0        0 eth0

======== UPDATED INFO on 09-Jul-2003 ========
Updated on: Wednesday 07/09 2003 at 17:17 190
Abe revision: 20.02
-------------------------------------------------------
UPDATE:
The symptom is that one can't ping other hosts on the same subnet. The tg3 driver *thinks* it properly initialized the card. The tg3 driver is v1.4c (RHAS 2.1 Q2), which seems to be equivalent to 1.5 (Feb 18, 2003). As shown in the previous update, "route" has the proper entries.

The same type of card in a 66MHz slot (shared with a 53c1010) initializes just fine. The failing card shares its PCI bus with a 53c875 (33MHz only) SCSI controller. This seems to be yet another timing problem with tg3 PHY initialization.
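The tg3 probe lines in the log already carry the bus speed and PHY type of each card, which is what singles out eth2. As a rough illustration only, the sketch below pulls those fields out of the three probe lines quoted above (with the wrapped fragments rejoined) and lists the SerDes cards sitting on a 33MHz bus. The helper names `parse_probe_lines` and `suspects` are hypothetical, and the regular expression is written against these log lines, not against any documented tg3 output format.

```python
import re

# Probe lines copied from the log in this report (line wraps rejoined).
PROBE_LOG = """\
eth0: Tigon3 [partno(A6794-60001) rev 0105 PHY(5701)] (PCI:66MHz:64-bit) 10/100/1000BaseT Ethernet 00:30:6e:28:17:6b
eth1: Tigon3 [partno(A6847-60001) rev 0105 PHY(serdes)] (PCI:66MHz:64-bit) 10/100/1000BaseT Ethernet 00:04:76:df:be:0d
eth2: Tigon3 [partno(A6847-60101) rev 0105 PHY(serdes)] (PCI:33MHz:64-bit) 10/100/1000BaseT Ethernet 00:30:6e:49:98:7d
"""

# Pattern derived from the lines above; an assumption, not a stable format.
PROBE_RE = re.compile(
    r"(?P<iface>eth\d+): Tigon3 \[partno\((?P<partno>[^)]+)\) rev \w+ "
    r"PHY\((?P<phy>[^)]+)\)\] \(PCI:(?P<mhz>\d+)MHz:(?P<width>\d+)-bit\)"
)


def parse_probe_lines(text):
    """Return one dict per tg3 probe line found in text."""
    cards = []
    for m in PROBE_RE.finditer(text):
        cards.append({
            "iface": m.group("iface"),
            "partno": m.group("partno"),
            "phy": m.group("phy"),
            "pci_mhz": int(m.group("mhz")),
            "pci_bits": int(m.group("width")),
        })
    return cards


def suspects(cards, max_mhz=33):
    """Interfaces with a SerDes PHY on a bus running at or below max_mhz."""
    return [c["iface"] for c in cards
            if c["phy"] == "serdes" and c["pci_mhz"] <= max_mhz]


if __name__ == "__main__":
    cards = parse_probe_lines(PROBE_LOG)
    for card in cards:
        print(card)
    print("suspects:", suspects(cards))
```

On the lines quoted in this report, only eth2 matches both conditions, which agrees with the observation that the same card type in a 66MHz slot works.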
Next steps:
o backport the v1.5->v1.6 tg3 changes and see if that works better.
o start inserting MMIO reads to flush pending MMIO writes. Last time I checked, the driver had about 53 instances of "write(); udelay()" and I was only allowed to fix three of them because those are the only ones I could demonstrate caused problems at the time.

======== UPDATED INFO 15-Jul-2003 ========
Updated on: Tuesday 07/15 2003 at 14:36 196
Abe revision: 20.02
-------------------------------------------------------
UPDATE:
I ran rmmod on the tg3 driver after verifying that "ping 10.20.90.69" fails. ifconfig reports that "eth2" (running at 33MHz) is using 10.20.90.121. "insmod /root/bcm5700.o" came up without errors:

Broadcom Gigabit Ethernet Driver bcm5700 with Broadcom NIC Extension (NICE) ver. 6.2.11 (05/16/03)
divert: allocating divert_blk for eth0
PCI: Found IRQ 56 for device 21:04.0
eth0: Broadcom BCM5701 found at mem 90000000, IRQ 56, node addr 00306e28176b
eth0: Broadcom BCM5701 Integrated Copper transceiver found
eth0: Scatter-gather ON, 64-bit DMA ON, Tx Checksum ON, Rx Checksum ON, 802.1Q VLAN ON, NAPI ON
divert: allocating divert_blk for eth1
PCI: Found IRQ 68 for device 80:01.0
eth1: Broadcom BCM5701 found at mem c0050000, IRQ 68, node addr 000476dfbe0d
eth1: Agilent HDMP-1636 SerDes transceiver found
eth1: Scatter-gather ON, 64-bit DMA ON, Tx Checksum ON, Rx Checksum ON, 802.1Q VLAN ON, NAPI ON
divert: allocating divert_blk for eth2
PCI: Found IRQ 92 for device e0:02.0
eth2: Broadcom BCM5701 found at mem f0010000, IRQ 92, node addr 00306e49987d
eth2: Agilent HDMP-1636 SerDes transceiver found
eth2: Scatter-gather ON, 64-bit DMA ON, Tx Checksum ON, Rx Checksum ON, 802.1Q VLAN ON, NAPI ON
bcm5700: eth1 NIC Link is UP, 1000 Mbps full duplex
bcm5700: eth2 NIC Link is UP, 1000 Mbps full duplex
bcm5700: eth0 NIC Link is UP, 100 Mbps full duplex
[root@ev01 root]#

And then "ping" still fails with bcm5700. As before, after a reboot, tg3 can ping 10.20.90.69.
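To illustrate why the "write(); udelay()" pattern mentioned in the next steps can be a timing hazard, here is a toy model in Python, not driver code. It assumes a bridge that posts (buffers) MMIO writes for a fixed time and that a read from the device cannot complete until earlier posted writes have been delivered, which is the PCI ordering behaviour a read-back flush relies on. The class `PostedBus`, the register name `PHY_CTRL`, and the latency and delay figures are all hypothetical; nothing here is measured from the rx5670 or taken from tg3.

```python
class PostedBus:
    """Toy model of a bridge that posts MMIO writes (all values hypothetical)."""

    def __init__(self, bridge_latency_us):
        self.bridge_latency_us = bridge_latency_us  # time a write sits in the bridge
        self.now_us = 0        # simulated CPU clock
        self.pending = []      # (due_time, reg, value) writes not yet delivered
        self.regs = {}         # register state as seen by the device
        self.arrival_us = {}   # when each register write reached the device

    def _drain(self, up_to_us):
        """Deliver every posted write whose due time has passed."""
        still_pending = []
        for due, reg, value in self.pending:
            if due <= up_to_us:
                self.regs[reg] = value
                self.arrival_us[reg] = due
            else:
                still_pending.append((due, reg, value))
        self.pending = still_pending

    def write(self, reg, value):
        """The CPU returns at once; the write lands bridge_latency_us later."""
        self.pending.append((self.now_us + self.bridge_latency_us, reg, value))

    def read(self, reg):
        """A read stalls the CPU until all earlier posted writes are delivered."""
        if self.pending:
            self.now_us = max(self.now_us, max(d for d, _, _ in self.pending))
        self._drain(self.now_us)
        return self.regs.get(reg, 0)

    def udelay(self, us):
        """Busy-wait on the CPU side; posted writes may land meanwhile."""
        self.now_us += us
        self._drain(self.now_us)


def settle_time(bus, delay_us, flush):
    """Microseconds the device had after seeing the write, before the CPU moved on."""
    bus.write("PHY_CTRL", 1)       # hypothetical register name
    if flush:
        bus.read("PHY_CTRL")       # read-back forces the posted write out first
    bus.udelay(delay_us)
    arrived = bus.arrival_us.get("PHY_CTRL")
    return 0 if arrived is None else bus.now_us - arrived


if __name__ == "__main__":
    print("no flush:", settle_time(PostedBus(15), 40, flush=False))
    print("flush:   ", settle_time(PostedBus(15), 40, flush=True))
    print("no flush, slow bridge:", settle_time(PostedBus(50), 40, flush=False))
```

In this model the delay only covers the full settle time when the write is flushed by a read before the delay starts; without the flush, part or all of the delay elapses while the write is still in the bridge. In a Linux driver the same idea is usually expressed as a `writel()` followed by a `readl()` of a register on the same device before calling `udelay()`.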
I mentioned this to my counterpart at Broadcom. Michael Chan <mchan> wrote back:

| Since we have the rx5670 in our lab, I'm going to ask our lab guys to
| reproduce this problem and then debug it. This is going to take at least a
| few days. I will keep you posted.

Did all these tests pass on QU1? On e.25? This would help narrow the focus.

Our internal records show this defect appears on the original "stock" AS2.1, so that would be the 2.4.18-e.12 kernel.

ISSUE TRACKER 26123 OPENED AS SEV 1 -- QU3 BLOCKER

FROM ISSUE TRACKER
Event posted 08-25-2003 07:47pm by charline.polifka with duration of 0.00
HP could not reproduce; NOT QU3 Blocker.
Status set to: Waiting on Client (Long Term)

Closing as WORKSFORME since HP can't recreate the problem. They can reopen the bug later if it shows up again.