Bug 214717
Summary: NETDEV WATCHDOG: peth0: transmit timed out -> no network connectivity
Product: [Fedora] Fedora
Component: kernel-xen
Version: 6
Hardware: All
OS: Linux
Status: CLOSED WONTFIX
Severity: medium
Priority: medium
Reporter: Andy Bailey <andy>
Assignee: Herbert Xu <herbert.xu>
QA Contact: Virtualization Bugs <virt-bugs>
CC: xen-maint
Doc Type: Bug Fix
Last Closed: 2008-02-26 23:40:48 UTC
Description
Andy Bailey
2006-11-08 23:15:05 UTC
Nov 8 16:15:13 haz kernel: SoftMAC: ASSERTION FAILED (0) at: net/ieee80211/softmac/ieee80211softmac_wx.c:306:ieee80211softmac_wx_get_rate()
Nov 8 16:15:14 haz last message repeated 2 times
Nov 8 16:57:34 haz kernel: NETDEV WATCHDOG: peth0: transmit timed out
Nov 8 16:57:37 haz kernel: peth0: link up, 100Mbps, full-duplex, lpa 0x41E1
Nov 8 16:57:37 haz kernel: peth0: Promiscuous mode enabled.
Nov 8 16:57:49 haz kernel: NETDEV WATCHDOG: peth0: transmit timed out
Nov 8 16:57:52 haz kernel: peth0: link up, 100Mbps, full-duplex, lpa 0x41E1
Nov 8 16:57:52 haz kernel: peth0: Promiscuous mode enabled.
Nov 8 16:58:04 haz kernel: NETDEV WATCHDOG: peth0: transmit timed out

certainly looks like a kernel problem. And networking problems related to Xen usually occur at boot time. Can you reproduce this on non-xen kernels too?

I have had fc5 installed on the same machine and never had a network problem like this. Today I installed the non-xen kernel and have been connected for over an hour and a half without a problem, with the same 2 programs running (pup and wireshark). So the answer to your question is no. However, I did a google for "peth0: transmit timed out" and found
http://lists.xensource.com/archives/html/xen-users/2006-05/msg00753.html
http://lists.xensource.com/archives/cgi-bin/mesg.cgi?a=xen-users&i=44759305.30301%40freemail.hu
http://www.linuxquestions.org/questions/showthread.php?p=2137801
http://blog.gmane.org/gmane.comp.emulators.xen.user/page=1?set_skin=leftmenu
Not sure if they are related.
Andy

OK, but how often were you having the trouble under Xen? The peth0: isn't my main concern; I'd *expect* the driver to show further problems after an initial assert failure. It's the "SoftMAC: ASSERTION FAILED" that we need to get to the bottom of first.

I just installed fc6 yesterday and it happened several times, in fact every time the system was up for a reasonable period of time.
Would it have anything to do with a Broadcom wireless card that was present but not configured (it uses the bcwl43xxx driver, if I remember rightly)? I thought it wouldn't, so I didn't send the whole ifconfig. It seems strange that the MAC assertion happened almost 45 minutes before the problem presented itself. What information do you need?

Please start by attaching the full dmesg output to show all kernel messages since boot. Thanks.

Created attachment 141298 [details]
The dmesg output
I upgraded to the latest xen kernel available via yum
uname -a
Linux haz.hazlorealidad.com 2.6.18-1.2849.fc6xen #1 SMP Fri Nov 10 12:57:36 EST 2006 x86_64 x86_64 x86_64 GNU/Linux
The same problem happened again, this time 20 minutes after boot;
the network was working fine up to that point.
uptime
20:13:33 up 20 min, 1 user, load average: 0.15, 0.18, 0.23
Forgot to add that I used neat to down eth0 and then up it.

Andy, could you email me a copy of the dmesg? I'm having problems downloading it from Bugzilla. Thanks.

Never mind, I've got the dmesg now.

Thanks for the dmesg. Could you please get a dmesg from a baremetal kernel as well for comparison?

Created attachment 141409 [details]
The dmesg for the standard kernel
uname -a for this kernel gives
Linux haz.hazlorealidad.com 2.6.18-1.2849.fc6 #1 SMP Fri Nov 10 12:34:46 EST 2006 x86_64 x86_64 x86_64 GNU/Linux
When the problem occurs, can you check /proc/interrupts to see whether you're getting any more interrupts on the IRQ for your NIC? Also, are there any messages in "xm dmesg" when it happens? Thanks.

Created attachment 143006 [details]
Dmesg
I've attached my dmesg to this bug report because I believe I am experiencing similar issues.
1. After booting into the FC6 Xen kernel, the network will function for 60 seconds to 30 minutes, then hangs.
2. Pinging from outside the box fails; pinging from the box to the outside fails.
3. Restarting the network does not fix the issue.
4. rmmod'ing the forcedeth NIC module and modprobe'ing it does not fix the issue.
5. Interrupts appear to stop incrementing when the network hangs.
6. Nothing but a reboot appears to bring the network back.
I'll attach some additional information that may be useful.

Created attachment 143007 [details]
Output of lspci -vvv
Created attachment 143008 [details]
Output of ifconfig
Created attachment 143009 [details]
Output of lsmod
Created attachment 143010 [details]
xend.log
I turned off tx crc's on *ETH0, not *PETH0. PETH0 said the operation was not supported on that device. I'll follow up here to let you know if it still locks up after some period of time. Below is my xm dmesg after the hang. I don't see anything obvious:

[Xen ASCII-art banner]
http://www.cl.cam.ac.uk/netos/xen
University of Cambridge Computer Laboratory
Xen version 3.0.3-rc5-1.2849.fc6 (brewbuilder.com) (gcc version 4.1.1 20061011 (Red Hat 4.1.1-30)) Fri Nov 10 12:30:42 EST 2006
Latest ChangeSet: unavailable
(XEN) Command line: /xen.gz-2.6.18-1.2849.fc6
(XEN) Physical RAM map:
(XEN) 0000000000000000 - 000000000009fc00 (usable)
(XEN) 000000000009fc00 - 00000000000a0000 (reserved)
(XEN) 00000000000e7000 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 000000003ffc0000 (usable)
(XEN) 000000003ffc0000 - 000000003ffd0000 (ACPI data)
(XEN) 000000003ffd0000 - 0000000040000000 (ACPI NVS)
(XEN) 00000000fec00000 - 00000000fec01000 (reserved)
(XEN) 00000000fee00000 - 00000000fee01000 (reserved)
(XEN) 00000000ff7c0000 - 0000000100000000 (reserved)
(XEN) System RAM: 1023MB (1047932kB)
(XEN) Xen heap: 14MB (14420kB)
(XEN) found SMP MP-table at 000ff780
(XEN) DMI 2.3 present.
(XEN) Using APIC driver default
(XEN) ACPI: RSDP (v000 ACPIAM ) @ 0x00000000000fa6d0
(XEN) ACPI: RSDT (v001 A M I OEMRSDT 0x08000431 MSFT 0x00000097) @ 0x000000003ffc0000
(XEN) ACPI: FADT (v002 A M I OEMFACP 0x08000431 MSFT 0x00000097) @ 0x000000003ffc0200
(XEN) ACPI: MADT (v001 A M I OEMAPIC 0x08000431 MSFT 0x00000097) @ 0x000000003ffc0390
(XEN) ACPI: OEMB (v001 A M I OEMBIOS 0x08000431 MSFT 0x00000097) @ 0x000000003ffd0040
(XEN) ACPI: DSDT (v001 N8XLA N8XLA308 0x00000308 INTL 0x02002026) @ 0x0000000000000000
(XEN) ACPI: Local APIC address 0xfee00000
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
(XEN) Processor #0 15:4 APIC version 16
(XEN) ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 1, version 17, address 0xfec00000, GSI 0-23
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) ACPI: IRQ9 used by override.
(XEN) ACPI: IRQ14 used by override.
(XEN) ACPI: IRQ15 used by override.
(XEN) Enabling APIC mode: Flat. Using 1 I/O APICs
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Initializing CPU#0
(XEN) Detected 2194.523 MHz processor.
(XEN) CPU0: AMD Flush Filter disabled
(XEN) CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
(XEN) CPU: L2 Cache: 1024K (64 bytes/line)
(XEN) Intel machine check architecture supported.
(XEN) Intel machine check reporting enabled on CPU#0.
(XEN) CPU0: AMD Athlon(tm) 64 Processor 3400+ stepping 08
(XEN) Total of 1 processors activated.
(XEN) ENABLING IO-APIC IRQs
(XEN) -> Using new ACK method
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) ..MP-BIOS bug: 8254 timer not connected to IO-APIC
(XEN) ...trying to set up timer (IRQ0) through the 8259A ... failed.
(XEN) ...trying to set up timer as Virtual Wire IRQ... failed.
(XEN) ...trying to set up timer as ExtINT IRQ...spurious 8259A interrupt: IRQ7.
(XEN) works.
(XEN) Platform timer is 1.193MHz PIT
(XEN) Brought up 1 CPUs
(XEN) Machine check exception polling timer started.
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Domain 0 kernel supports features = { 0000001f }.
(XEN) Domain 0 kernel requires features = { 00000000 }.
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN) Dom0 alloc.: 0000000003000000->0000000004000000 (234879 pages to be allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN) Loaded kernel: ffffffff80200000->ffffffff806b8264
(XEN) Init. ramdisk: ffffffff806b9000->ffffffff80c7ca00
(XEN) Phys-Mach map: ffffffff80c7d000->ffffffff80e4fbf8
(XEN) Start info: ffffffff80e50000->ffffffff80e5049c
(XEN) Page tables: ffffffff80e51000->ffffffff80e5c000
(XEN) Boot stack: ffffffff80e5c000->ffffffff80e5d000
(XEN) TOTAL: ffffffff80000000->ffffffff81000000
(XEN) ENTRY ADDRESS: ffffffff80200000
(XEN) Dom0 has maximum 1 VCPUs
(XEN) Initrd len 0x5c3a00, start at 0xffffffff806b9000
(XEN) Scrubbing Free RAM: ...........done.
(XEN) Xen trace buffers: disabled
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen).
(XEN) (file=io_apic.c, line=2085)
(XEN) ioapic_guest_write: apic=0, pin=2, old_irq=-1, new_irq=0
(XEN) ioapic_guest_write: old_entry=00010000, new_entry=000009f0
(XEN) ioapic_guest_write: Installing bogus unmasked IO-APIC entry!

Just an update: as I mentioned before, I booted the FC6 Xen kernel and as soon as I could I typed "/usr/sbin/ethtool -K eth0 tx off". The network stayed alive for 27 minutes, during which time I was able to get roughly 763 pings to my local router.
Restarting the network seems to do nothing. Interrupts on peth0 are static at 171755. Neither dmesg nor xm dmesg shows anything different from above.

Interestingly enough, when I "/sbin/rmmod forcedeth" and then "/sbin/modprobe forcedeth", dmesg shows the module attaching to device "ETH1" instead of "ETH0". Running dhclient on eth1 ("/sbin/dhclient eth1") does not yield an IP address; rather, it simply fails to get a DHCPOFFER. xm dmesg shows nothing different at this point. What other information can I provide?

Hi Micah, please use #218733 for your problem.

Created attachment 143250 [details]
assorted files
Shortly before the network locked up completely, I had a problem with an
NFS-mounted filesystem. I was using mplayer to play some music; at first it
worked fine, then it would only play 2 seconds or so and stop. I tried it on
several mp3 files and the same thing happened each time.
The contents of the zip are
  Length     Date   Time    Name
 --------    ----   ----    ----
     2166  12-10-06 12:12   ifconfig
     5141  12-10-06 12:10   xm-dmesg
       74  12-10-06 12:27   ethtool-i_peth0
      847  12-10-06 12:10   int1
      847  12-10-06 12:10   int2
    37043  12-10-06 12:13   messages
       63  12-10-06 12:11   uptime
    26796  12-10-06 12:14   dmesg
       60  12-10-06 12:30   ethtool-K_peth0_tx_off
      320  12-10-06 12:28   ethtool-k_peth0
      847  12-10-06 12:13   int3
    14431  12-10-06 12:17   lspci-vvv
     3313  12-10-06 12:17   lsmod
      116  12-10-06 12:11   uname-a
 --------                   -------
    92064                   14 files
The only files whose names are not self-explanatory are int1, int2 and int3;
these are snapshots of /proc/interrupts taken at different times.
I rebooted and tried
ethtool -K eth0 tx off
and have been up for 60 minutes with no network problem yet (a record). However,
yum update is stuck (I don't think it's related, but who knows).
It does seem to be reproducible with the Xen kernel: it has happened every time
so far.
I have never had a network problem with the non-Xen kernel.
Let me know what more information is needed.
Andy Bailey
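The /proc/interrupts check requested earlier in this thread (is the NIC's interrupt counter still incrementing, as in the int1, int2 and int3 snapshots?) could be automated roughly as follows. This is only a minimal Python sketch, not something posted in the bug: the function names nic_irq_count and is_stalled are hypothetical, and the sample snapshots are made up for illustration (IRQ number, "Phys-irq" label and timer counts are assumptions). Only the peth0 count of 171755 comes from the report above.

```python
import re
from typing import Optional

# Illustrative snapshots in /proc/interrupts layout. Everything here is
# assumed sample data except the peth0 count (171755), which is the value
# reported as "static" earlier in the thread.
SAMPLE_BEFORE = """\
           CPU0
  0:     502113  Phys-irq  timer
 17:     171755  Phys-irq  peth0
NMI:          0
"""
SAMPLE_AFTER = """\
           CPU0
  0:     509871  Phys-irq  timer
 17:     171755  Phys-irq  peth0
NMI:          0
"""


def nic_irq_count(snapshot: str, device: str) -> Optional[int]:
    """Sum the per-CPU interrupt counts on the line that names `device`.

    Returns None when no line in the snapshot mentions the device.
    """
    for line in snapshot.splitlines():
        _label, sep, rest = line.partition(":")
        if not sep:
            continue  # the CPU header line has no "IRQ:" prefix
        fields = rest.split()
        counts = []
        # Leading numeric fields are the per-CPU counters.
        while fields and fields[0].isdigit():
            counts.append(int(fields.pop(0)))
        # What remains is the controller type plus device names, which may
        # be comma-separated when an IRQ is shared.
        names = re.split(r"[,\s]+", " ".join(fields))
        if device in names:
            return sum(counts)
    return None


def is_stalled(before: str, after: str, device: str) -> bool:
    """True if the device's interrupt count did not change between snapshots."""
    first = nic_irq_count(before, device)
    second = nic_irq_count(after, device)
    if first is None or second is None:
        raise ValueError(f"{device} not found in one of the snapshots")
    return first == second


if __name__ == "__main__":
    print("peth0 stalled:", is_stalled(SAMPLE_BEFORE, SAMPLE_AFTER, "peth0"))
    print("timer stalled:", is_stalled(SAMPLE_BEFORE, SAMPLE_AFTER, "timer"))
```

On a live system the two snapshots would be read from /proc/interrupts a few seconds apart while traffic is being attempted. The match is on the exact device token, so "eth0" does not accidentally match the "peth0" line.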
Hi Andy, could you please try disabling xend while running the Xen kernel to see if the problem persists? If it does, please attach the dmesg for it. Thanks.

change QA contact

This report targets FC6, which is now end-of-life. Please re-test against Fedora 7 or later, and if the issue persists, open a new bug. Thanks