Bug 30839

Summary: timeouts and hiccups with wvlan_cs (RC2 kernel)
Product: [Retired] Red Hat Linux
Reporter: James Manning <jmm>
Component: kernel
Assignee: Michael K. Johnson <johnsonm>
Status: CLOSED NOTABUG
QA Contact: Brock Organ <borgan>
Severity: high
Priority: medium
Version: 7.1
CC: notting
Hardware: i386   
OS: Linux   
Last Closed: 2001-03-17 07:47:35 UTC
Attachments:
  log of random packets on the wvlan_cs card (flags: none)

Description James Manning 2001-03-06 19:37:08 UTC
Server:
celeron 366, Ricoh Co Ltd RL5c476 II Cardbus bridge, wavelan gold

Socket 0:
  product info: "Lucent Technologies", "WaveLAN/IEEE", "Version 01.01", ""
  manfid: 0x0156, 0x0002
  function: 6 (network)

laptop:
pentium II 300, i82365 pcmcia, wavelan gold (same card)

email #1 to testers-list

Message-ID: <20010304204100.A21642.com>
kernel 2.4.2-0.1.16 out of rawhide a day or two ago, on both machines
(i'll be trying -0.1.19 once I can transfer it :)

from laptop:

Mar  4 20:19:01 laptop kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar  4 20:19:01 laptop kernel: wvlan_cs: eth0 Tx timed out! Resetting card
Mar  4 20:19:01 laptop kernel: wvlan_cs: MAC address on eth0 is 00 02 2d 09 35 46
Mar  4 20:19:01 laptop kernel: wvlan_cs: Valid channels: 1 2 3 4 5 6 7 8 9 10 11
Mar  4 20:19:37 laptop kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar  4 20:19:37 laptop kernel: wvlan_cs: eth0 Tx timed out! Resetting card
Mar  4 20:19:37 laptop kernel: wvlan_cs: MAC address on eth0 is 00 02 2d 09 35 46
Mar  4 20:19:37 laptop kernel: wvlan_cs: Valid channels: 1 2 3 4 5 6 7 8 9 10 11
Mar  4 20:20:29 laptop kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar  4 20:20:29 laptop kernel: wvlan_cs: eth0 Tx timed out! Resetting card
Mar  4 20:20:29 laptop kernel: wvlan_cs: MAC address on eth0 is 00 02 2d 09 35 46
Mar  4 20:20:29 laptop kernel: wvlan_cs: Valid channels: 1 2 3 4 5 6 7 8 9 10 11
Mar  4 20:21:37 laptop kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar  4 20:21:37 laptop kernel: wvlan_cs: eth0 Tx timed out! Resetting card
Mar  4 20:21:37 laptop kernel: wvlan_cs: MAC address on eth0 is 00 02 2d 09 35 46
Mar  4 20:21:37 laptop kernel: wvlan_cs: Valid channels: 1 2 3 4 5 6 7 8 9 10 11

eth0      Link encap:Ethernet  HWaddr 00:02:2D:09:35:46
          inet addr:192.168.2.2  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::202:2dff:fe09:3546/10 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:31102 errors:0 dropped:0 overruns:0 frame:0
          TX packets:30313 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0

eth0      IEEE 802.11-DS  ESSID:"sublogic"  Nickname:"laptop"
          Frequency:2.422GHz  Sensitivity:1/3  Mode:Ad-Hoc
          Access Point: 00:02:2D:09:35:46
          Bit Rate:2Mb/s   RTS thr:off   Fragment thr:off
          Encryption key:off
          Power Management:off
          Link quality:0/92  Signal level:-102 dBm  Noise level:-102 dBm
          Rx invalid nwid:0  invalid crypt:0  invalid misc:0

from server:
Mar  4 20:19:09 bp6 kernel: NETDEV WATCHDOG: eth2: transmit timed out
Mar  4 20:19:09 bp6 kernel: wvlan_cs: eth2 Tx timed out! Resetting card
Mar  4 20:19:09 bp6 kernel: wvlan_cs: MAC address on eth2 is 00 02 2d 09 35 39
Mar  4 20:19:09 bp6 kernel: wvlan_cs: Valid channels: 1 2 3 4 5 6 7 8 9 10 11
Mar  4 20:20:13 bp6 kernel: NETDEV WATCHDOG: eth2: transmit timed out
Mar  4 20:20:13 bp6 kernel: wvlan_cs: eth2 Tx timed out! Resetting card
Mar  4 20:20:13 bp6 kernel: wvlan_cs: MAC address on eth2 is 00 02 2d 09 35 39
Mar  4 20:20:13 bp6 kernel: wvlan_cs: Valid channels: 1 2 3 4 5 6 7 8 9 10 11
Mar  4 20:20:45 bp6 kernel: NETDEV WATCHDOG: eth2: transmit timed out
Mar  4 20:20:45 bp6 kernel: wvlan_cs: eth2 Tx timed out! Resetting card
Mar  4 20:20:45 bp6 kernel: wvlan_cs: MAC address on eth2 is 00 02 2d 09 35 39
Mar  4 20:20:45 bp6 kernel: wvlan_cs: Valid channels: 1 2 3 4 5 6 7 8 9 10 11
Mar  4 20:21:53 bp6 kernel: NETDEV WATCHDOG: eth2: transmit timed out
Mar  4 20:21:53 bp6 kernel: wvlan_cs: eth2 Tx timed out! Resetting card
Mar  4 20:21:53 bp6 kernel: wvlan_cs: MAC address on eth2 is 00 02 2d 09 35 39
Mar  4 20:21:53 bp6 kernel: wvlan_cs: Valid channels: 1 2 3 4 5 6 7 8 9 10 11
Mar  4 20:22:33 bp6 kernel: NETDEV WATCHDOG: eth2: transmit timed out
Mar  4 20:22:33 bp6 kernel: wvlan_cs: eth2 Tx timed out! Resetting card
Mar  4 20:22:33 bp6 kernel: wvlan_cs: MAC address on eth2 is 00 02 2d 09 35 39
Mar  4 20:22:33 bp6 kernel: wvlan_cs: Valid channels: 1 2 3 4 5 6 7 8 9 10 11


eth2      Link encap:Ethernet  HWaddr 00:02:2D:09:35:39
          inet addr:192.168.2.1  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::202:2dff:fe09:3539/10 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:34303 errors:0 dropped:0 overruns:0 frame:0
          TX packets:35766 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0

eth2      IEEE 802.11-DS  ESSID:"sublogic"  Nickname:"server"
          Frequency:2.422GHz  Sensitivity:1/3  Mode:Ad-Hoc
          Access Point: 00:02:2D:09:35:46
          Bit Rate:2Mb/s   RTS thr:off   Fragment thr:off
          Power Management:off
          Link quality:0/92  Signal level:-102 dBm  Noise level:-102 dBm
          Rx invalid nwid:0  invalid crypt:0  invalid misc:1


On each reset the card appears to cycle through rates (1M, 11M, then
2M again).  I wouldn't care that much, except that the same hardware in
the same mode is rock-solid at 11Mbps under hours of stressing if I boot
the machines to Windows 98.  The two machines are about 20 feet apart,
with a direct line of sight between the two cards.  The resets
are more frequent (I think) with a higher load on the network.
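
To put numbers on how often the resets happen, the watchdog lines can be
turned into gaps between consecutive timeouts.  The following is only a
rough sketch in Python: the helper name watchdog_gaps is made up for
illustration, the sample text is the four laptop watchdog lines quoted
above, and the year is an assumption because syslog timestamps do not
carry one.

```python
from datetime import datetime

# The four watchdog lines from the laptop log quoted in this report.
LAPTOP_LOG = """\
Mar  4 20:19:01 laptop kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar  4 20:19:37 laptop kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar  4 20:20:29 laptop kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar  4 20:21:37 laptop kernel: NETDEV WATCHDOG: eth0: transmit timed out
"""


def watchdog_gaps(log_text, year=2001):
    """Return the seconds between consecutive NETDEV WATCHDOG timeouts.

    `year` is an assumption: syslog lines omit it, so one must be supplied.
    """
    stamps = []
    for line in log_text.splitlines():
        if "NETDEV WATCHDOG" not in line:
            continue
        # First three whitespace-separated fields are month, day, HH:MM:SS.
        month, day, clock = line.split()[:3]
        stamps.append(
            datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
        )
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    print(watchdog_gaps(LAPTOP_LOG))  # [36, 52, 68]
```

For the laptop excerpt the gaps come out as 36, 52 and 68 seconds, so the
resets are not on a fixed period.  Running the same function over a longer
capture taken under load and one taken idle would be one way to check the
"more frequent under load" impression.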

email #2 to testers-list:

Message-ID: <20010305233158.A1771.com>
ok, with 2.4.2-0.1.19:

even holding the cards less than 1 inch apart, the rate now fluctuates
between 2Mbps (usual) and 11Mbps (about 2 or 3 times out of every 10)
when I run "while true; do iwconfig | grep Rate; done"

the fluctuation is the same with or without a load on the network.
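
Rather than eyeballing the output of that loop, the captured iwconfig
output could be tallied.  This is a small sketch, not a tested tool: the
function name rate_counts is hypothetical, and the regular expression
assumes the "Bit Rate:2Mb/s" / "Bit Rate=0kb/s" spellings that appear in
the iwconfig output pasted in this report.

```python
import re
from collections import Counter

# Matches both "Bit Rate:2Mb/s" and "Bit Rate=0kb/s" as seen in this report.
RATE_RE = re.compile(r"Bit Rate[:=]\s*([\d.]+\s*[kMG]b/s)")


def rate_counts(iwconfig_output):
    """Count how often each bit rate appears in captured iwconfig output."""
    return Counter(m.group(1) for m in RATE_RE.finditer(iwconfig_output))


if __name__ == "__main__":
    import sys

    # Usage (assumed): save the loop's output to a file, then pipe it in:
    #   python rate_counts.py < rates.txt
    for rate, count in rate_counts(sys.stdin.read()).most_common():
        print(f"{rate}\t{count}")
```

Feeding it a few minutes of output with and without network load would
give comparable counts for the two cases.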

the weirdest thing is that the essid and mode on the bp6 (again, same kernel
on both sides) suddenly change (twice so far)

normally:
eth2      IEEE 802.11-DS  ESSID:"sublogic"  Nickname:"bp6.sublogic.com"
          Frequency:2.422GHz  Sensitivity:1/3  Mode:Ad-Hoc
          Access Point: 00:02:2D:09:35:46
          Bit Rate:2Mb/s   RTS thr:off   Fragment thr:off
          Power Management:off
          Link quality:0/92  Signal level:-102 dBm  Noise level:-102 dBm
          Rx invalid nwid:0  invalid crypt:0  invalid misc:0

after the random hiccup #1:
eth2      IEEE 802.11-DS  ESSID:"@mQM ^F^[K?y^B@"  Nickname:"^Y"
          Channel:0  Sensitivity:0/3  Mode:Ad-Hoc
          Access Point: FF:BF:0D:00:01:00
          Bit Rate=0kb/s   RTS thr=0 B   Fragment thr:off
          Power Management:off
          Link quality:92/92  Signal level:-11 dBm  Noise level:-102 dBm
          Rx invalid nwid:0  invalid crypt:0  invalid misc:0

after the random hiccup #2:
eth2      IEEE 802.11-DS  ESSID:"`iQM 8GJ?y^B@"  Nickname:"^Y"
          Channel:0  Sensitivity:0/3  Mode:Ad-Hoc
          Access Point: FF:BF:0D:00:01:00
          Bit Rate=0kb/s   RTS thr=0 B   Fragment thr:off
          Power Management:off
          Link quality:92/92  Signal level:-11 dBm  Noise level:-102 dBm
          Rx invalid nwid:0  invalid crypt:0  invalid misc:1

after the random hiccup #3:
eth2      IEEE 802.11-DS  ESSID:""  Nickname:"^Y"
          Channel:0  Sensitivity:0/3  Mode:Ad-Hoc
          Access Point: FF:BF:01:00:01:00
          Bit Rate=0kb/s   RTS thr=0 B   Fragment thr:off
          Power Management:off
          Link quality:92/92  Signal level:-11 dBm  Noise level:-102 dBm
          Rx invalid nwid:0  invalid crypt:0  invalid misc:0


no module options, ESSID="sublogic" and MODE="Ad-Hoc" are the only
params in wireless.opts

I'm also now noticing:

Mar  5 18:34:57 bp6 kernel: Undo loss 216.27.17.226/61605 c2 l0 ss2/3 p0
Mar  5 19:10:01 bp6 kernel: Undo loss 216.239.46.27/6067 c2 l0 ss2/65535 p0
Mar  5 19:31:48 bp6 kernel: Undo loss 216.239.46.15/10235 c2 l0 ss2/65535 p0
Mar  5 21:38:53 bp6 kernel: Undo loss 216.239.46.41/6750 c2 l0 ss2/65535 p0
Mar  5 23:18:05 bp6 kernel: Undo loss 216.239.46.168/7931 c2 l0 ss2/65535 p0
          
these appear to happen at the same time as the hiccups, which I
"service pcmcia restart" to recover from (although i might be able
to iwconfig, never tried)

anyone know a kernel/pcmcia-cs/hotplug/whatever combo that works
in or around RC2 with wavelan gold cards?

Comment 1 Arjan van de Ven 2001-03-06 19:48:05 UTC
Did any of this work on 0.1.9 or earlier ?

Comment 2 Bill Nottingham 2001-03-06 21:07:20 UTC
I've seen this a couple of times, but it's not very reproducible for
me (this is with the kernel config.) I suppose the driver could be
doing something bad that interacts poorly with zerocopy.

What I've noticed is that when these timeouts happen, the entire
machine freezes for the period of the timeout (e.g., interrupts
are disabled.)  One way to get the machine out of this state, if
you don't feel like waiting for the watchdog, is to just eject
the card. :)


Comment 3 James Manning 2001-03-06 22:21:47 UTC
1) I'm going to check with 0.1.9 as soon as I can get home and play with it
   tonight

2) to take the cardbus bridge out of the equation, I'd love to try an ad-hoc
   between laptops.  Anyone at RH's 2600 Meridian location willing to do
   lunch wed/thurs at Sarah's Empanada's? :)

3) I saw the "freezes" under 2.2.17 (a freeze that would go away once the card
   was ejected), but only on the celeron (using the cardbus bridge).  I haven't
   seen that again since going to 2.4.x on the celeron

Comment 4 James Manning 2001-03-07 07:39:46 UTC
same behavior with 2.4.1-0.1.9:

Mar  7 02:39:18 laptop kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar  7 02:39:18 laptop kernel: wvlan_cs: eth0 Tx timed out! Resetting card
Mar  7 02:39:18 laptop kernel: wvlan_cs: MAC address on eth0 is 00 02 2d 09 35 46 
Mar  7 02:39:18 laptop kernel: wvlan_cs: Valid channels: 1 2 3 4 5 6 7 8 9 10 11

Comment 5 Michael K. Johnson 2001-03-12 19:32:26 UTC
Speaking of Sarah's, did you buy these cards and bridge next door?  Are
they Buffalo cards?

Comment 6 James Manning 2001-03-12 19:56:36 UTC
nope, all on-line.

bridge: http://www.amtron.com/reader/pcdrp202e.htm
cards:  http://www.cdw.com/shop/products/default.asp?EDC=202683

Funny enough, i just got back from Sarah's for lunch :)

mmmmmmm empanadas ....

Comment 7 Michael K. Johnson 2001-03-13 04:48:28 UTC
I have seen these messages with Buffalo cards (re-badged lucent) and
the Ricoh bridge with the 2.2 kernel, but not with the 2.4 kernel.

Comment 8 James Manning 2001-03-15 01:32:07 UTC
Created attachment 12683 [details]
log of random packets on the wvlan_cs card

Comment 9 James Manning 2001-03-15 01:34:10 UTC
the attached log is a tcpdump of packets I noticed.  The activity light on
the cards stays fairly constant even with nothing going on.  From the
laptop, I did a tcpdump and saw nothing.  I put it into promisc and
saw the packets in the log I attached.  I'll go ahead and try 0.1.28
on both ends, though.

Comment 10 James Manning 2001-03-15 02:02:00 UTC
trying on 2.4.2-0.1.28 from rawhide, the activity light still acts
the same, but now even in promisc mode, tcpdump shows nothing! weird.

Comment 11 Michael K. Johnson 2001-03-15 17:38:21 UTC
Could you try iwconfiging the card instead of re-starting pcmcia
when this happens and see if that also correctly re-inits the card?

Comment 12 James Manning 2001-03-15 17:46:33 UTC
what do you mean by "when this happens"?

On the laptop, I just get the eth timeouts, and it reinits the card and
keeps going just fine.  On the celeron (same card, through the Ricoh cardbus
bridge using yenta_socket), I have to keep a loop that iwconfig's to set the
essid (and only the essid) every 10 seconds, since it goes to garbage on some
of the timeouts (not all).

Recap: 
 - nothing but waiting for the timeout on the laptop, resets fine.
 - celeron needs the occasional iwconfig to reset essid from garbage.
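
The celeron workaround described above (re-set the essid every 10 seconds)
could be narrowed so it only touches the card when the essid has actually
gone bad.  What follows is a sketch under assumptions, not what was run
here: the helper names essid_needs_reset and watch_essid are made up, the
interface (eth2), essid ("sublogic") and 10 second interval are taken from
this report, and the loop shells out to the standard
"iwconfig <iface> essid <name>" form, which needs root.

```python
import re
import subprocess
import time

# Pulls the quoted ESSID out of iwconfig output, e.g. ESSID:"sublogic".
ESSID_RE = re.compile(r'ESSID:"([^"]*)"')


def essid_needs_reset(iwconfig_output, expected="sublogic"):
    """True if the reported ESSID is missing or differs from `expected`."""
    match = ESSID_RE.search(iwconfig_output)
    return match is None or match.group(1) != expected


def watch_essid(iface="eth2", expected="sublogic", interval=10):
    """Poll iwconfig and re-set the ESSID only when it has been corrupted.

    Assumes wireless-tools is installed and the script runs as root.
    """
    while True:
        out = subprocess.run(
            ["iwconfig", iface], capture_output=True, text=True
        ).stdout
        if essid_needs_reset(out, expected):
            subprocess.run(["iwconfig", iface, "essid", expected])
        time.sleep(interval)


if __name__ == "__main__":
    watch_essid()
```

Checking before writing also leaves a natural place to log each
corruption, which would show how many of the timeouts actually garble the
essid.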

Comment 13 James Manning 2001-03-17 07:47:29 UTC
OMFG!

Based on a few entries from the forum over at pcmcia-cs on sourceforge,
I hit wavelan.com and got the update for the firmware.  I upgraded the
firmware on both cards from 6.06 to 6.16, and it's like night and day!

I've had the cards going solid for about an hour now, with nothing
even close to a problem, aside from some kernel messages:

kernel: Undo loss 192.168.2.2/1079 c2 l0 ss2/65535 p0
last message repeated 2 times

Admittedly, that was under load testing and from what I've heard is
probably an issue elsewhere given how many things have collapsed under
load in early 2.4 kernels.

Sorry it took so long to try the firmware upgrade, but I can only find it
from wavelan.com, and only as a win 95/98/me/nt/2k program :(

Comment 14 Arjan van de Ven 2001-03-17 11:06:33 UTC
Thank you for this information!

kernel: Undo loss 192.168.2.2/1079 c2 l0 ss2/65535 p0
is a harmless debugging message which has now been turned off.

Since this problem seemed to be a firmware issue, I'm closing this bug
as "NOTABUG" (rather NOTOURBUG but that does not exist :)
If you object to that, please reopen this bug.

Comment 15 James Manning 2001-03-17 16:49:16 UTC
NOTABUG is certainly fine by me... is it acceptable to open this up outside
the beta program?  I ask mainly because I'd like to make it as easy as possible
for others to find this bug and try a firmware upgrade before bothering
an already-overloaded kernel team at Red Hat. :)

I'm going to go ahead and post in the forum of pcmcia-cs at sourceforge about
my success, but anything that helps get less bugs like this filed is hopefully
a Good Thing (tm)

If it's deemed *not* acceptable to kill off the beta-only checkbox from this
bug, do you want me to open a similar public bug, mark this a dup, and paste
at least the initial behavior and solution?

Whatever works best for you is fine by me... I feel pretty bad having
wasted time of Bill, Michael *and* Arjan over a NOTABUG :)

Comment 16 Arjan van de Ven 2001-03-17 16:56:33 UTC
I will try to get this added to the knowledgebase;
That is the place for such things.

Comment 17 James Manning 2001-11-26 17:12:09 UTC
Since this is ancient and knowledge-base fodder, 
i'm gonna try to remove the beta-only toggle