Bug 1022733 - Kernel 3.11 hangs on boot with VIA Velocity network adapters
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 19
Hardware: x86_64 Linux
Priority: unspecified
Severity: unspecified
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Whiteboard: AcceptedFreezeException
Depends On:
Blocks: F20FinalFreezeException
 
Reported: 2013-10-23 17:14 EDT by Juha Heljoranta
Modified: 2013-12-10 01:14 EST

Fixed In Version: kernel-3.11.10-100.fc18
Doc Type: Bug Fix
Last Closed: 2013-12-04 11:49:34 EST
Type: Bug


Attachments
Photo of kernel stack trace (29.39 KB, image/png)
2013-10-23 17:27 EDT, Juha Heljoranta
Revised patch (1.15 KB, patch)
2013-11-15 03:09 EST, Michele Baldessari

Description Juha Heljoranta 2013-10-23 17:14:53 EDT
Comment 1 Juha Heljoranta 2013-10-23 17:23:08 EDT
Description of problem:
Regression. Kernel 3.11 hangs almost immediately after login prompt appears. 3.10 works fine. 

Version-Release number of selected component:
kernel-3.11.6-200.fc19.x86_64

How reproducible:
Every time I boot with 3.11 kernel.

Actual results:
System hang. See attached screen shot.

Expected results:
Kernel booting normally.

Additional info:
I suspect network adapter (via_velocity) has something to do with it:

02:00.0 Ethernet controller [0200]: VIA Technologies, Inc. VT6120/VT6121/VT6122 Gigabit Ethernet Adapter [1106:3119] (rev 82)
03:00.0 Ethernet controller [0200]: VIA Technologies, Inc. VT6120/VT6121/VT6122 Gigabit Ethernet Adapter [1106:3119] (rev 82)
Comment 2 Juha Heljoranta 2013-10-23 17:27:02 EDT
Created attachment 815568
Photo of kernel stack trace

Kernel was booted without rhgb and quiet options.
Comment 4 Alex A. Schmidt 2013-10-24 13:23:59 EDT
I have been experiencing this very same bug since I upgraded to fc19.i686, i.e., since I
started to use kernel 3.11. I am able to boot just fine because my VIA_VELOCITY ethernet card is a secondary one, but the system freezes as soon as the card is put into use (a ping to its address is enough), and it happens so fast that I don't even get crash messages.

Linux haar 3.11.4-201.fc19.i686 #1 SMP Thu Oct 10 14:59:49 UTC 2013 i686 i686 i386 GNU/Linux

AMD Athlon(tm) II X4 620 Processor

05:00.0 Ethernet controller: VIA Technologies, Inc. VT6120/VT6121/VT6122 Gigabit Ethernet Adapter (rev 82)
Comment 5 Michele Baldessari 2013-11-10 18:40:49 EST
Alex, Juha,

could you try the patch posted on the mailing list and confirm that it fixes the issue for you
as well?

diff --git a/drivers/net/ethernet/via/via-velocity.c b/drivers/net/ethernet/via/via-velocity.c
index d022bf9..64c42be 100644
--- a/drivers/net/ethernet/via/via-velocity.c
+++ b/drivers/net/ethernet/via/via-velocity.c
@@ -2172,16 +2172,13 @@ static int velocity_poll(struct napi_struct *napi, int budget)
        unsigned int rx_done;
        unsigned long flags;

-       spin_lock_irqsave(&vptr->lock, flags);
        /*
         * Do rx and tx twice for performance (taken from the VIA
         * out-of-tree driver).
         */
-       rx_done = velocity_rx_srv(vptr, budget / 2);
-       velocity_tx_srv(vptr);
-       rx_done += velocity_rx_srv(vptr, budget - rx_done);
+       rx_done = velocity_rx_srv(vptr, budget);
+       spin_lock_irqsave(&vptr->lock, flags);
        velocity_tx_srv(vptr);
-
        /* If budget not fully consumed, exit the polling mode */
        if (rx_done < budget) {
                napi_complete(napi);

Once you confirm I'll ping Francois/netdev again.

Thanks,
Michele
Comment 6 Alex A. Schmidt 2013-11-11 18:25:37 EST
The proposed fix corrected the problem for me!!

Thanks Michele.

Alex
Comment 7 Michele Baldessari 2013-11-11 18:47:38 EST
Thanks for confirming Alex, I've pinged Francois again to push it upstream.
Comment 8 Michele Baldessari 2013-11-14 04:17:45 EST
I've exchanged mails with Francois and he will push it upstream. I'll put a note here once it hits the net tree or Linus' tree.

Thanks for testing Alex,
Michele
Comment 9 Michele Baldessari 2013-11-15 03:09:01 EST
Created attachment 824353
Revised patch

Hi Alex & Juha,

could you please test the new revised patch from Francois? This one should
be safe against MTU changes, whereas the previous one was not.

If you could test it like the following, that'd be great:
- run some netperf/iperf
- during the above network load change the MTU to a few values in a loop

Let me know if it works for you or if there are any issues.

Thanks again,
Michele
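
The MTU-cycling step requested above could be scripted roughly as follows. This is only a minimal sketch and not part of the original test request: the interface name "p4p1", the MTU values, the delay, and the helper names are all assumptions, and it relies on the iproute2 command "ip link set dev <iface> mtu <n>" being available and run as root. By default it only prints the commands instead of executing them.

```python
import itertools
import subprocess
import time


def mtu_commands(iface, mtus, rounds):
    """Build one `ip link` command per MTU change, cycling through `mtus`."""
    cycle = itertools.islice(itertools.cycle(mtus), rounds * len(mtus))
    return [["ip", "link", "set", "dev", iface, "mtu", str(mtu)] for mtu in cycle]


def cycle_mtu(iface, mtus, rounds=3, delay=5.0, dry_run=True):
    """Step through the MTU values while a netperf/iperf run is active elsewhere."""
    for cmd in mtu_commands(iface, mtus, rounds):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # needs root privileges
            time.sleep(delay)                # give the traffic time to recover


if __name__ == "__main__":
    # Hypothetical interface name and MTU values; adjust to the machine under test.
    cycle_mtu("p4p1", [1500, 1000, 1500], rounds=2)
```

Start the netperf/iperf load first, then call cycle_mtu(..., dry_run=False) on the machine whose MTU is being changed.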
Comment 10 Alex A. Schmidt 2013-11-18 14:09:31 EST
(In reply to Michele Baldessari from comment #9)
> Created attachment 824353
> Revised patch
> 
> Hi Alex & Juha,
> 
> could you please test the new revised patch from Francois. This one should 
> be safe against MTU changes, whereas the previous one was not. 
> 
> If you could test it like the following, that'd be great:
> - run some netperf/iperf
> - during the above network load change the MTU to a few values in a loop
> 
> Let me know if it works for you or if there are any issues.
> 
> Thanks again,
> Michele

Hello Michele,

Thanks for the new patch. The driver is working just as well, but I am not sure the result regarding the MTU change is as expected. What follows is an iperf chitchat between a client and a server, both using the VIA ethernet card and the patched via_velocity driver (on a 3.11.7-200.fc19.i686 kernel). The chitchat goes on OK until I change the MTU on the server side from 1500 to, say, 3000 (with "ifconfig p4p1 mtu 3000 up"). From that moment onwards the server (or would it be the client?) misses all the messages, as can be seen below. Is that OK?

But things get worse if I change the MTU on the client side during the chitchat. The iperf output is held back forever and never resumes...
Fortunately, a ctrl-c can interrupt iperf just fine and no error messages are shown.

Regards,

Alex 

iperf client/server chitchat: ------------------------------------------------

server side:

harten|~> [  9] local 192.168.1.105 port 5001 connected with 192.168.1.102 port 43296
[  9]  0.0-3288.1 sec  79.2 MBytes   202 Kbits/sec
[  9] MSS size 1448 bytes (MTU 1500 bytes, ethernet)


client side: 

mallat|~> iperf -c lc_harten -P 1 -i 1 -f m -t 20
------------------------------------------------------------
Client connecting to lc_harten, TCP port 5001
TCP window size: 0.02 MByte (default)
------------------------------------------------------------
[  3] local 192.168.1.102 port 43296 connected with 192.168.1.105 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  11.4 MBytes  95.4 Mbits/sec
[  3]  1.0- 2.0 sec  11.1 MBytes  93.3 Mbits/sec
[  3]  2.0- 3.0 sec  11.2 MBytes  94.4 Mbits/sec
[  3]  3.0- 4.0 sec  11.2 MBytes  94.4 Mbits/sec
[  3]  4.0- 5.0 sec  11.2 MBytes  94.4 Mbits/sec
[  3]  5.0- 6.0 sec  11.2 MBytes  94.4 Mbits/sec
[  3]  6.0- 7.0 sec  11.0 MBytes  92.3 Mbits/sec
[  3]  7.0- 8.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3]  8.0- 9.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3]  9.0-10.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3] 10.0-11.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3] 11.0-12.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3] 12.0-13.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3] 13.0-14.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3] 14.0-15.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3] 15.0-16.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3] 16.0-17.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3] 17.0-18.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3] 18.0-19.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3] 19.0-20.0 sec  0.62 MBytes  5.24 Mbits/sec
[  3]  0.0-20.0 sec  79.2 MBytes  33.2 Mbits/sec
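
As a cross-check on the summary lines above: the figures are consistent with iperf reporting transfer in units of 2^20 bytes and bandwidth in units of 10^6 bits. That unit convention is an assumption verified here only against the numbers in this log, and the function name is made up for illustration.

```python
def mbits_per_sec(mbytes, seconds):
    """Convert a transfer in MBytes (2**20 bytes) over `seconds` into Mbits/sec (10**6 bits)."""
    return mbytes * 2**20 * 8 / seconds / 1e6


# Client summary line: 79.2 MBytes in 20.0 s, reported as 33.2 Mbits/sec.
print(round(mbits_per_sec(79.2, 20.0), 1))

# Server summary line: 79.2 MBytes in 3288.1 s, reported as 202 Kbits/sec.
print(round(mbits_per_sec(79.2, 3288.1) * 1000))
```

The low average on the client side is therefore just the roughly 12 seconds of zero transfer averaged into the 20-second run.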
Comment 11 Michele Baldessari 2013-11-18 14:37:52 EST
Hi Alex,

a couple of questions:
- did the previous version of the patch also exhibit this behaviour?
- if both client and server have mtu set to 3000 things work correctly yes? 
  (i.e. it is only the mtu change itself that breaks things)

I'm assuming here that client and server are connected through a medium
that supports L2 frames > 1500, yes?

Thanks,
Michele
Comment 12 Alex A. Schmidt 2013-11-22 14:10:01 EST
(In reply to Michele Baldessari from comment #11)
> Hi Alex,
> 
> a couple of questions:
> - did the previous version of the patch also exhibit this behaviour?
> - if both client and server have mtu set to 3000 things work correctly yes? 
>   (i.e. it is only the mtu change itself that breaks things)
> 
> I'm assuming here that client and server are connected through a medium
> that supports L2 frames > 1500, yes?
> 
> Thanks,
> Michele

Hello Michele,

I apologise for taking so long to answer...

I use these VIA cards as a secondary network between some machines in our lab.
I am not sure about the L2 frame support in this intranet, since I would have to look around the building to find out which switch the cables go to... So I assumed NO for L2 frame support and made further tests with MTU <= 1500. This is what I got:

a) both previous and current patches behave in the same way:

- there seems to be a limit on the workable MTU size on the server. For instance, iperf would work with 1600 but not with 1700... But I think this is related to the L2 support. More about this below.

- the iperf chitchat works with any starting MTU value <= 1500 (on both client and server); an MTU change -- on either the client or the server -- breaks the message exchange for a while, but it resumes later, as can be seen below:

mallat|~> iperf -c lc_harten -P 1 -i 1 -f m -t 20
------------------------------------------------------------
Client connecting to lc_harten, TCP port 5001
TCP window size: 0.02 MByte (default)
------------------------------------------------------------
[  3] local 192.168.1.102 port 43329 connected with 192.168.1.105 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  11.4 MBytes  95.4 Mbits/sec
[  3]  1.0- 2.0 sec  11.1 MBytes  93.3 Mbits/sec
[  3]  2.0- 3.0 sec  11.2 MBytes  94.4 Mbits/sec
[  3]  3.0- 4.0 sec  11.2 MBytes  94.4 Mbits/sec
[  3]  4.0- 5.0 sec  10.8 MBytes  90.2 Mbits/sec
[  3]  5.0- 6.0 sec  0.00 MBytes  0.00 Mbits/sec <<<< MTU change from 1500 to 1000 on the server
[  3]  6.0- 7.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3]  7.0- 8.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3]  8.0- 9.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3]  9.0-10.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3] 10.0-11.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3] 11.0-12.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3] 12.0-13.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3] 13.0-14.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3] 14.0-15.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3] 15.0-16.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3] 16.0-17.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3] 17.0-18.0 sec  0.88 MBytes  7.34 Mbits/sec
[  3] 18.0-19.0 sec  11.2 MBytes  94.4 Mbits/sec
[  3] 19.0-20.0 sec  11.1 MBytes  93.3 Mbits/sec
[  3]  0.0-20.0 sec  79.1 MBytes  33.2 Mbits/sec

mallat|~> iperf -c lc_harten -P 1 -i 1 -f m -t 20
------------------------------------------------------------
Client connecting to lc_harten, TCP port 5001
TCP window size: 0.02 MByte (default)
------------------------------------------------------------
[  3] local 192.168.1.102 port 43330 connected with 192.168.1.105 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  11.4 MBytes  95.4 Mbits/sec
[  3]  1.0- 2.0 sec  11.1 MBytes  93.3 Mbits/sec
[  3]  2.0- 3.0 sec  11.2 MBytes  94.4 Mbits/sec
[  3]  3.0- 4.0 sec  5.12 MBytes  43.0 Mbits/sec
[  3]  4.0- 5.0 sec  0.00 MBytes  0.00 Mbits/sec  <<<< MTU change from 1500  to 1000 on the client
[  3]  5.0- 6.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3]  6.0- 7.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3]  7.0- 8.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3]  8.0- 9.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3]  9.0-10.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3] 10.0-11.0 sec  2.62 MBytes  22.0 Mbits/sec
[  3] 11.0-12.0 sec  10.8 MBytes  90.2 Mbits/sec
[  3] 12.0-13.0 sec  11.0 MBytes  92.3 Mbits/sec
[  3] 13.0-14.0 sec  10.9 MBytes  91.2 Mbits/sec
[  3] 14.0-15.0 sec  10.9 MBytes  91.2 Mbits/sec
[  3] 15.0-16.0 sec  10.9 MBytes  91.2 Mbits/sec
[  3] 16.0-17.0 sec  10.9 MBytes  91.2 Mbits/sec
[  3] 17.0-18.0 sec  10.9 MBytes  91.2 Mbits/sec
[  3] 18.0-19.0 sec  10.9 MBytes  91.2 Mbits/sec
[  3] 19.0-20.0 sec  10.9 MBytes  91.2 Mbits/sec
[  3]  0.0-20.0 sec   140 MBytes  58.5 Mbits/sec

I then connected the two VIA cards directly using a cat 5e ethernet cable and repeated the tests without the L2 frame support issues involved. The iperf chitchat now works for MTU > 1500 (up to 9000) and the bandwidth was much greater, as expected, but the behaviour regarding an MTU change during the chitchat was exactly as before:

mallat|.../ethernet/via> iperf -c lc_harten -P 1 -i 1 -f m -t 20 
-----------------------------------------------------------
Client connecting to lc_harten, TCP port 5001
TCP window size: 0.06 MByte (default)
------------------------------------------------------------
[  3] local 192.168.1.102 port 35426 connected with 192.168.1.105 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  66.6 MBytes   559 Mbits/sec
[  3]  1.0- 2.0 sec  66.2 MBytes   556 Mbits/sec
[  3]  2.0- 3.0 sec  66.1 MBytes   555 Mbits/sec
[  3]  3.0- 4.0 sec  66.4 MBytes   557 Mbits/sec
[  3]  4.0- 5.0 sec  66.4 MBytes   557 Mbits/sec
[  3]  5.0- 6.0 sec  66.1 MBytes   555 Mbits/sec
[  3]  6.0- 7.0 sec  66.4 MBytes   557 Mbits/sec
[  3]  7.0- 8.0 sec  66.2 MBytes   556 Mbits/sec
[  3]  8.0- 9.0 sec  66.4 MBytes   557 Mbits/sec
[  3]  9.0-10.0 sec  66.2 MBytes   556 Mbits/sec
[  3] 10.0-11.0 sec  19.4 MBytes   163 Mbits/sec
[  3] 11.0-12.0 sec  0.00 MBytes  0.00 Mbits/sec <<<MTU change from 9000 to 3000 on the server
[  3] 12.0-13.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3] 13.0-14.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3] 14.0-15.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3] 15.0-16.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3] 16.0-17.0 sec  28.0 MBytes   235 Mbits/sec
[  3] 17.0-18.0 sec  66.2 MBytes   556 Mbits/sec
[  3] 18.0-19.0 sec  66.1 MBytes   555 Mbits/sec
[  3] 19.0-20.0 sec  66.1 MBytes   555 Mbits/sec
[  3]  0.0-20.0 sec   909 MBytes   381 Mbits/sec


mallat|.../ethernet/via> iperf -c lc_harten -P 1 -i 1 -f m -t 20 
------------------------------------------------------------
Client connecting to lc_harten, TCP port 5001
TCP window size: 0.02 MByte (default)
------------------------------------------------------------
[  3] local 192.168.1.102 port 35427 connected with 192.168.1.105 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  93.5 MBytes   784 Mbits/sec
[  3]  1.0- 2.0 sec  93.2 MBytes   782 Mbits/sec
[  3]  2.0- 3.0 sec  93.2 MBytes   782 Mbits/sec
[  3]  3.0- 4.0 sec  93.2 MBytes   782 Mbits/sec
[  3]  4.0- 5.0 sec  55.8 MBytes   468 Mbits/sec
[  3]  5.0- 6.0 sec  0.00 MBytes  0.00 Mbits/sec <<< MTU change from 9000 to 3000 on the client
[  3]  6.0- 7.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3]  7.0- 8.0 sec  28.8 MBytes   241 Mbits/sec
[  3]  8.0- 9.0 sec  93.1 MBytes   781 Mbits/sec
[  3]  9.0-10.0 sec  93.4 MBytes   783 Mbits/sec
[  3] 10.0-11.0 sec  93.1 MBytes   781 Mbits/sec
[  3] 11.0-12.0 sec  93.1 MBytes   781 Mbits/sec
[  3] 12.0-13.0 sec  93.4 MBytes   783 Mbits/sec
[  3] 13.0-14.0 sec  93.1 MBytes   781 Mbits/sec
[  3] 14.0-15.0 sec  93.4 MBytes   783 Mbits/sec
[  3] 15.0-16.0 sec  93.4 MBytes   783 Mbits/sec
[  3] 16.0-17.0 sec  93.2 MBytes   782 Mbits/sec
[  3] 17.0-18.0 sec  93.4 MBytes   783 Mbits/sec
[  3] 18.0-19.0 sec  90.4 MBytes   758 Mbits/sec
[  3] 19.0-20.0 sec  92.4 MBytes   775 Mbits/sec
[  3]  0.0-20.0 sec  1573 MBytes   660 Mbits/sec

Regards,

Alex
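
To put numbers on the pauses in the runs above, the zero-transfer intervals can be extracted from the iperf client output. The following is a minimal sketch, not something used in the original tests: the regular expression and the function name are assumptions that fit the interval lines as they are printed in this report, and the sample is an excerpt of the last run.

```python
import re

# Matches interval lines such as "[  3]  5.0- 6.0 sec  0.00 MBytes  0.00 Mbits/sec".
INTERVAL = re.compile(
    r"\[\s*\d+\]\s+(\d+\.\d+)-\s*(\d+\.\d+)\s+sec\s+([\d.]+)\s+MBytes\s+([\d.]+)\s+Mbits/sec"
)


def stall_windows(report):
    """Return (start, end) pairs covering runs of consecutive zero-transfer intervals."""
    windows = []
    stall_start = None
    last_end = 0.0
    for line in report.splitlines():
        m = INTERVAL.search(line)
        if m is None:
            continue
        start, end, mbytes = (float(g) for g in m.group(1, 2, 3))
        if start < last_end:
            continue  # the closing summary line restarts at 0.0; ignore it
        last_end = end
        if mbytes == 0.0:
            if stall_start is None:
                stall_start = start
        elif stall_start is not None:
            windows.append((stall_start, start))
            stall_start = None
    if stall_start is not None:
        windows.append((stall_start, last_end))
    return windows


SAMPLE = """\
[  3]  4.0- 5.0 sec  55.8 MBytes   468 Mbits/sec
[  3]  5.0- 6.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3]  6.0- 7.0 sec  0.00 MBytes  0.00 Mbits/sec
[  3]  7.0- 8.0 sec  28.8 MBytes   241 Mbits/sec
"""

print(stall_windows(SAMPLE))
```

Reading the four runs above the same way, at one-second resolution, gives pauses of about 12 s and 6 s for the 1500 to 1000 changes (server and client side) and about 5 s and 2 s for the 9000 to 3000 changes.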
Comment 13 Michele Baldessari 2013-11-23 13:24:38 EST
Hi Alex,

thanks for your tests. Well, as long as the traffic eventually starts again, a certain amount of downtime after an MTU change can be reasonable, depending on what
the driver needs to do.

I'll tell Francois that this patch works (equally) well as the other one so
we can get this upstream.

Thanks,
Michele
Comment 14 Michele Baldessari 2013-11-26 05:28:04 EST
The patch has been submitted upstream, and its inclusion in stable 3.11.x and 3.12.y has been requested.
Comment 15 Josh Boyer 2013-11-26 14:03:03 EST
I've added the patch.  It isn't going to make 3.11.10, and that's the last 3.11 stable release.  F20 is shipping with that, so we need this as a patch there anyway.

Proposing as an F20 freeze exception.
Comment 16 Adam Williamson 2013-11-26 14:29:08 EST
+1 FE. Hanging on boot is bad, yo.
Comment 17 Mike Ruckman 2013-11-26 14:35:13 EST
+1 FE.
Comment 18 Adam Williamson 2013-11-27 14:25:57 EST
Discussed at 2013-11-27 freeze exception review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2013-11-27/f20-blocker-review-3.2013-11-27-17.01.log.txt . Accepted as a freeze exception issue; hanging on boot is clearly not good, and can't be cleanly fixed post-release.
Comment 19 Fedora Update System 2013-11-30 09:11:43 EST
kernel-3.11.10-300.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/kernel-3.11.10-300.fc20
Comment 20 Fedora Update System 2013-12-01 12:41:35 EST
Package kernel-3.11.10-300.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.11.10-300.fc20'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-22531/kernel-3.11.10-300.fc20
then log in and leave karma (feedback).
Comment 21 Fedora Update System 2013-12-02 23:43:26 EST
kernel-3.11.10-200.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/kernel-3.11.10-200.fc19
Comment 22 Fedora Update System 2013-12-02 23:45:04 EST
kernel-3.11.10-100.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/kernel-3.11.10-100.fc18
Comment 23 Fedora Update System 2013-12-04 11:49:34 EST
kernel-3.11.10-300.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 24 Fedora Update System 2013-12-07 01:57:39 EST
kernel-3.11.10-200.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 25 Fedora Update System 2013-12-10 00:27:04 EST
kernel-3.11.10-100.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 26 Fedora Update System 2013-12-10 01:14:15 EST
kernel-3.11.10-100.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.
