Bug 1026359
Summary: Unstable link speed with e1000e module
Product: [Fedora] Fedora
Component: kernel
Version: 20
Hardware: x86_64
OS: Linux
Status: CLOSED NOTABUG
Severity: medium
Priority: unspecified
Reporter: John Greene <jogreene>
Assignee: John Greene <jogreene>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: anton, asn, cwawak, dnelson, gansalmon, itamar, jcall, jeder, jogreene, jonathan, jskarvad, kernel-maint, madhu.chinakonda
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-05-15 16:33:15 UTC

Description
John Greene
2013-11-04 13:53:50 UTC
John,

Please attach the whole dmesg log. It looks like your link stays up for long periods of time and then flaps once. That suggests that some local system event may be precipitating the link going down. It might be caused by the link partner too, of course, but it's easier to start looking on the local end.

Given you've now eliminated cables, switches and ASPM, I am wondering: at locations where others may share the router setup and possibly have identical hardware (a T520 with the same OS, even), does anyone else report this problem on the same hardware? That would point away from your particular machine and make it less likely that your laptop is "bad". Have you heard of this problem from co-workers who are similarly situated?

I am encountering the same behavior, and I believe I tracked it down to the "powersave" tuned profile. I typically use the "virtual-host" profile while plugged in, but to reproduce, I have switched my laptop to "powersave", and hope to trigger this behavior soon. It typically takes an hour or so before I notice the instability. I'll upload a dmesg once I trigger it.

.. that didn't take long.
Nov 4 10:42:27 localhost gnome-session[1766]: [3243:3243:1104/104227:ERROR:vsync_provider.cc(70)] glXGetSyncValuesOML should not return TRUE with a media stream counter of 0.
Nov 4 10:52:27 localhost gnome-session[1766]: [3243:3243:1104/105227:ERROR:vsync_provider.cc(70)] glXGetSyncValuesOML should not return TRUE with a media stream counter of 0.
Nov 4 11:02:28 localhost gnome-session[1766]: [3243:3243:1104/110228:ERROR:vsync_provider.cc(70)] glXGetSyncValuesOML should not return TRUE with a media stream counter of 0.
Nov 4 11:12:28 localhost gnome-session[1766]: [3243:3243:1104/111228:ERROR:vsync_provider.cc(70)] glXGetSyncValuesOML should not return TRUE with a media stream counter of 0.
Nov 4 11:15:19 localhost systemd-logind[658]: New session 379 of user cwawak.
>>> MARK tuned-adm profile powersave <<<
Nov 4 11:16:14 localhost NetworkManager[770]: <info> (em1): carrier now OFF (device state 100, deferring action for 4 seconds)
Nov 4 11:16:17 localhost kernel: [70964.594522] e1000e: em1 NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx/Tx
Nov 4 11:16:17 localhost kernel: [70964.594528] e1000e 0000:00:19.0 em1: 10/100 speed: disabling TSO
Nov 4 11:16:17 localhost NetworkManager[770]: <info> (em1): carrier now ON (device state 100)
... and another big hmm, this time from /var/log/tuned/tuned.log:
2013-11-04 11:15:14,318 INFO tuned.plugins.plugin_cpu: setting new cpu latency 100
2013-11-04 11:16:14,325 INFO tuned.plugins.plugin_net: wlp3s0: setting 100Mbps
2013-11-04 11:16:14,325 INFO tuned.plugins.plugin_net: em1: setting 100Mbps
2013-11-04 11:16:44,659 INFO tuned.plugins.plugin_cpu: setting new cpu latency 1000
This doesn't explain why the link seems to flap, though.
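One way to check how many link events actually line up with tuned's speed changes is to parse both logs and pair them by time. The following is only a minimal sketch: the function names and the 10-second window are illustrative assumptions, and the regular expressions are written against the log excerpts pasted in this report, so other kernels or tuned versions may format these lines differently.

```python
import re
from datetime import datetime

# Patterns written against the excerpts in this bug report (assumption:
# other kernel/tuned versions may format these lines differently).
LINK_UP = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}) .*e1000e: (\S+) NIC Link is Up (\d+) Mbps"
)
TUNED_SET = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+\s+INFO\s+"
    r"tuned\.plugins\.plugin_net: (\S+): setting (100Mbps|max speed)"
)


def parse_link_ups(lines, year):
    """Return (timestamp, device, speed_mbps) for each kernel link-up line."""
    events = []
    for line in lines:
        m = LINK_UP.match(line)
        if m:
            # syslog timestamps carry no year, so the caller supplies one;
            # month names are parsed with the current (English/C) locale.
            stamp_text = " ".join(m.group(1).split())
            stamp = datetime.strptime(str(year) + " " + stamp_text, "%Y %b %d %H:%M:%S")
            events.append((stamp, m.group(2), int(m.group(3))))
    return events


def parse_tuned_changes(lines):
    """Return (timestamp, device, action) for each plugin_net speed change."""
    changes = []
    for line in lines:
        m = TUNED_SET.match(line)
        if m:
            stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            changes.append((stamp, m.group(2), m.group(3)))
    return changes


def split_by_tuned(link_ups, changes, window_s=10):
    """Split link-ups into those shortly after a tuned change on the same
    device (within window_s seconds, a hypothetical tolerance) and the rest."""
    explained, unexplained = [], []
    for event in link_ups:
        stamp, dev, _speed = event
        near = any(
            d == dev and 0 <= (stamp - t).total_seconds() <= window_s
            for t, d, _action in changes
        )
        (explained if near else unexplained).append(event)
    return explained, unexplained


# Sample lines copied from the excerpts above.
MESSAGES = [
    "Nov 4 11:16:17 localhost kernel: [70964.594522] e1000e: em1 NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx/Tx",
    "Nov 4 11:16:17 localhost kernel: [70964.594528] e1000e 0000:00:19.0 em1: 10/100 speed: disabling TSO",
]
TUNED = [
    "2013-11-04 11:15:14,318 INFO tuned.plugins.plugin_cpu: setting new cpu latency 100",
    "2013-11-04 11:16:14,325 INFO tuned.plugins.plugin_net: wlp3s0: setting 100Mbps",
    "2013-11-04 11:16:14,325 INFO tuned.plugins.plugin_net: em1: setting 100Mbps",
    "2013-11-04 11:16:44,659 INFO tuned.plugins.plugin_cpu: setting new cpu latency 1000",
]

if __name__ == "__main__":
    ups = parse_link_ups(MESSAGES, 2013)
    explained, unexplained = split_by_tuned(ups, parse_tuned_changes(TUNED))
    print("link-ups shortly after a tuned change:", len(explained))
    print("link-ups with no nearby tuned change:", len(unexplained))
```

Run against the full /var/log/messages and /var/log/tuned/tuned.log, a large "no nearby tuned change" count would suggest that something other than the speed change itself is bouncing the link.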
(In reply to Christopher Wawak from comment #3)
> I am encountering the same behavior, and I believe I tracked it down to the
> "powersave" tuned profile. I typically use the "virtual-host" profile while
> plugged in, but to reproduce, I have switched my laptop to "powersave", and
> hope to trigger this behavior soon. It typically takes an hour or so before
> I notice the instability. I'll upload a dmesg once I trigger it.

Christopher: appreciate you sharing..To put a finer edge on this:
You don't see the problem with the virtual host profile, but do when you switch over to power save? A good clue if that's true.

John C: does this help your situation? Have you used this virtual-host profile or were you aware of it? Give it a try if you haven't, at least it might give your hair a break till we get a better solution..lol

(In reply to John Greene from comment #5)
> Christopher: appreciate you sharing..To put a finer edge on this:
> You don't see the problem with the virtual host profile, but do when you
> switch over to power save? A good clue if that's true.

John, that's correct. Looking from the tuned logs in my last comment, it seems that plugin_net is setting my interface to 100Mbps. Digging through plugin_net (/usr/lib/python2.7/site-packages/tuned/plugin_net.py), I see the following:

    if idle["level"] == 0 and idle["read"] >= self._level_steps and idle["write"] >= self._level_steps:
        idle["level"] = 1
        log.info("%s: setting 100Mbps" % device)
        ethcard(device).set_speed(100)
    elif idle["level"] == 1 and (idle["read"] == 0 or idle["write"] == 0):
        idle["level"] = 0
        log.info("%s: setting max speed" % device)
        ethcard(device).set_max_speed()

So it certainly looks like tuned is setting the interface to 100Mbps, which is a known best practice for saving power. It seems like it's supposed to only tweak at idle, but I've never seen "setting max speed" in my logs, and I don't know what idle means in this setting. I'd be curious to see what John's /var/log/tuned/tuned.log looks like, and I'm going to guess this is what is going on.

In any case, I don't think this is a bug in e1000e, but you might be able to argue that a modification should be made in tuned if the speed changes are disruptive enough. I wonder if there's a way to tweak tuned so it ignores a particular network interface.

John Call,

Can you do the following:

1. Set tuned to the powersave profile
   (# tuned-adm profile powersave)

2. Let the interface drop to 100.

3. Once it does, can you send us the last few lines of /var/log/tuned/tuned.log?

We're looking for something like "tuned.plugins.plugin_net: em1: setting 100Mbps".

(In reply to John Greene from comment #5)
> (In reply to Christopher Wawak from comment #3)
> > I am encountering the same behavior, and I believe I tracked it down to the
> > "powersave" tuned profile. I typically use the "virtual-host" profile while
> > plugged in, but to reproduce, I have switched my laptop to "powersave", and
> > hope to trigger this behavior soon. It typically takes an hour or so before
> > I notice the instability. I'll upload a dmesg once I trigger it.
>
> Christopher: appreciate you sharing..To put a finer edge on this:
> You don't see the problem with the virtual host profile, but do when you
> switch over to power save? A good clue if that's true.
>
> John C: does this help your situation? Have you used this virtual-host
> profile or were you aware of it? Give it a try if you haven't, at least it
> might give your hair a break till we get a better solution..lol

JohnG,

Yes, tuned is a problem for me. I observed my system drop to 100Mb/s when configured to use "powersave" mode -- my initial complaint. After setting my tuned mode to "virtual-host" the link automatically re-established at 1,000Mb/s. I'm using my T520 laptop, FYI. My preference would be to have full gigabit speed in addition to using the balanced/power-save mode of tuned.

I recognize that some additional requests have been made of me in subsequent comments and will reply later to those specific questions shortly...

(In reply to Christopher Wawak from comment #6)
> (In reply to John Greene from comment #5)
>
> > Christopher: appreciate you sharing..To put a finer edge on this:
> > You don't see the problem with the virtual host profile, but do when you
> > switch over to power save? A good clue if that's true.
>
> John, that's correct. Looking from the tuned logs in my last comment, it
> seems that plugin_net is setting my interface to 100Mbps. Digging through
> plugin_net (/usr/lib/python2.7/site-packages/tuned/plugin_net.py), I see the
> following:
>
>     if idle["level"] == 0 and idle["read"] >= self._level_steps
>             and idle["write"] >= self._level_steps:
>         idle["level"] = 1
>         log.info("%s: setting 100Mbps" % device)
>         ethcard(device).set_speed(100)
>     elif idle["level"] == 1 and (idle["read"] == 0 or
>             idle["write"] == 0):
>         idle["level"] = 0
>         log.info("%s: setting max speed" % device)
>         ethcard(device).set_max_speed()
>
> So it certainly looks like tuned is setting the interface to 100Mbps, which
> is a known best practice for saving power. It seems like it's supposed to
> only tweak at idle, but I've never seen "setting max speed" in my logs, and
> I don't know what idle means in this setting. I'd be curious to see what
> John's /var/log/tuned/tuned.log looks like, and I'm going to guess this is
> what is going on.
>
> In any case, I don't think this is a bug in e1000e, but you might be able to
> argue that a modification should be made in tuned if the speed changes are
> disruptive enough. I wonder if there's a way to tweak tuned so it ignores a
> particular network interface.

Christopher, if the intent was to save power by capping the NIC at 100Mb/s, that's fine -- but I still have issue with the link going up and down all the time (flapping). Comment #1 shows the link flapping 5 times and each time coming back up at the same speed. These link up/down events tend to break VPN connections. I'm not sure if they happen in the middle of large file transfers, but I do know that they impact web browsing and SSH connections.

Created attachment 819518 [details]
tuned.log
The tuned log, which shows the powersave profile (configured prior to boot) producing an unstable link at 100Mb/s. The virtual-host profile was then enabled, which stabilized the link and increased the speed to 1,000Mb/s.
Created attachment 819519 [details]
messages
The output from /var/log/messages, which aligns with the times reported in the tuned.log file attached to this bug.
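The idle check quoted from plugin_net.py earlier in this thread can be modelled in isolation to see when each log message would be emitted. This is only a sketch under stated assumptions: the `IdleModel` class is hypothetical, the `level_steps` default is an arbitrary stand-in for `self._level_steps` (whose real value is not shown here), and the rule for updating the `read`/`write` counters (count consecutive idle intervals, reset to 0 on traffic) is a guess, since that part of the plugin was not quoted. Only the two branches in `tick()` mirror the quoted snippet.

```python
from dataclasses import dataclass, field


@dataclass
class IdleModel:
    """Toy model of the two-level idle logic quoted from plugin_net.py."""

    level_steps: int = 6  # hypothetical; stands in for self._level_steps
    level: int = 0        # 0 = max speed, 1 = capped at 100Mbps
    read: int = 0         # consecutive intervals without receive traffic
    write: int = 0        # consecutive intervals without transmit traffic
    log: list = field(default_factory=list)

    def tick(self, rx_busy: bool, tx_busy: bool) -> None:
        # Assumption: counters grow while the direction is idle and reset
        # to 0 as soon as traffic is seen. This part was not in the quote.
        self.read = 0 if rx_busy else self.read + 1
        self.write = 0 if tx_busy else self.write + 1

        # These two branches follow the snippet quoted in the thread.
        if (self.level == 0 and self.read >= self.level_steps
                and self.write >= self.level_steps):
            self.level = 1
            self.log.append("setting 100Mbps")
        elif self.level == 1 and (self.read == 0 or self.write == 0):
            self.level = 0
            self.log.append("setting max speed")


if __name__ == "__main__":
    model = IdleModel(level_steps=3)
    # Three idle intervals, one interval with receive traffic, three idle again.
    pattern = [(False, False)] * 3 + [(True, False)] + [(False, False)] * 3
    for rx_busy, tx_busy in pattern:
        model.tick(rx_busy, tx_busy)
    print(model.log)
```

Under these assumptions every idle/busy transition produces one speed change, and the excerpt earlier in this report shows a carrier OFF/ON pair at the moment tuned logged "setting 100Mbps". Whether that accounts for all of the link events counted in the attached logs is not established by this model.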
(In reply to Christopher Wawak from comment #7)
> John Call,
>
> Can you do the following:
>
> 1. Set tuned to the powersave profile
> (# tuned-adm profile powersave)
>
> 2. Let the interface drop to 100.
>
> 3. Once it does, can you send us the last few lines of
> /var/log/tuned/tuned.log?
>
> We're looking for something like "tuned.plugins.plugin_net: em1: setting
> 100Mbps".

Chris, I've attached the logs. While I was preparing the logs, I counted 30 link up/down events from boot (powersave profile) until I changed to virtual-host profile. The tuned log reports only two messages about changing the link speed, but the kernel reports 30 up/down events. Are you seeing similar instabilities?

I wonder if this is due to the NIC being set at a static 100Mb/s Full duplex configuration, while the switch is configured for automatic negotiation. I was connected to a WesternDigital wifi router (N750). I wonder if the amount of link flap or up/down would change if I had been connected to the Cisco or IBM enterprise-class switches.

Took a quick look at your log, need to look a bit more. I'm thinking out loud a bit here, maybe Chris knows: could the flap issue be caused by NetworkManager, the driver's rate negotiation with the switch, and tuned all banging heads? Need to look at that more. Perhaps making the NIC unmanaged for testing might be interesting. Thoughts?

*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.12.6-200.fc19. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 20, and are still experiencing this issue, please change the version to Fedora 20.

If you experience different issues, please open a new bug report for those.

*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 20 kernel bugs.

Fedora 20 has now been rebased to 3.13.4-200.fc20. Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Thank you for the additional background info, Jeremy!

I'm using tuned-2.3.0-2.fc20.noarch

I disabled dynamic_tuning and set my profile to 'powersave'. The NIC link and speed are stable.

As an aside, does the "kernel-tools" package need to be marked as a dependency of tuned? Nasty errors show up in tuned.log about missing "cpupower" and "x86_energy_perf_policy" without that package installed. See below... Thanks again!

2014-03-04 14:39:51,016 ERROR tuned.utils.commands: Executing cpupower error: [Errno 2] No such file or directory
2014-03-04 14:39:51,016 WARNING tuned.plugins.plugin_cpu: using sysfs fallback, is cpupower installed?
2014-03-04 14:39:51,019 ERROR tuned.utils.commands: Executing x86_energy_perf_policy error: [Errno 2] No such file or directory
2014-03-04 14:39:51,019 WARNING tuned.plugins.plugin_cpu: error executing x86_energy_perf_policy tool, ignoring CPU energy performance bias, is the tool installed?

(In reply to John Call from comment #17)
> Thank you for the additional background info, Jeremy!
>
> I'm using tuned-2.3.0-2.fc20.noarch
>

Dynamic tuning is not globally disabled in Fedora.

> I disabled dynamic_tuning and set my profile to 'powersave'. The NIC link
> and speed are stable.
>

Or alternatively you can create your own powersave profile which disables the network plugin (which does the dynamic tuning of the network):

# mkdir /etc/tuned/custom-powersave && cat << :EOF > /etc/tuned/custom-powersave/tuned.conf
[main]
include=powersave

[net]
disabled=true
:EOF
# tuned-adm profile custom-powersave

> As an aside, does the "kernel-tools" package need to be marked as a
> dependency of tuned? Nasty errors show up in tuned.log about missing
> "cpupower" and "x86_energy_perf_policy" without that package installed. See
> below...

Thanks for spotting this, I created bug 1072981.

(In reply to Jaroslav Škarvada from comment #18)
> Or alternatively you can create your own powersave profile which disables the
> network plugin (which does the dynamic tuning of the network):
> # mkdir /etc/tuned/custom-powersave && cat << :EOF >
> /etc/tuned/custom-powersave/tuned.conf
> [main]
> include=powersave
>
> [net]
> disabled=true
> :EOF

Jaroslav, I tried this, and I see the following errors in tuned.log:

2014-03-05 15:10:11,360 INFO tuned.plugins.plugin_net: devices: set([u'wlp3s0', u'em1'])
2014-03-05 15:10:11,360 WARNING tuned.plugins.base: Unknown option 'disabled' for plugin 'NetTuningPlugin'.
2014-03-05 15:10:11,360 INFO tuned.plugins.base: instance net: assigning devices wlp3s0, em1
2014-03-05 15:11:01,670 INFO tuned.plugins.plugin_net: wlp3s0: setting 100Mbps
2014-03-05 15:11:11,681 INFO tuned.plugins.plugin_net: em1: setting 100Mbps

So, I'm not sure if the disabled=true is working as intended for the net plugin.

> Jaroslav, I tried this, and I see the following errors in tuned.log:
>
> 2014-03-05 15:10:11,360 INFO tuned.plugins.plugin_net: devices:
> set([u'wlp3s0', u'em1'])
> 2014-03-05 15:10:11,360 WARNING tuned.plugins.base: Unknown option
> 'disabled' for plugin 'NetTuningPlugin'.
> 2014-03-05 15:10:11,360 INFO tuned.plugins.base: instance net: assigning
> devices wlp3s0, em1
> 2014-03-05 15:11:01,670 INFO tuned.plugins.plugin_net: wlp3s0: setting
> 100Mbps
> 2014-03-05 15:11:11,681 INFO tuned.plugins.plugin_net: em1: setting
> 100Mbps
>
> So, I'm not sure if the disabled=true is working as intended for the net
> plugin.
I am sorry, it should be:
[net]
enabled=false
:)
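With that correction applied, the custom profile from the earlier comment can also be generated from a script. This is a minimal sketch, not part of tuned: `write_custom_profile` is a hypothetical helper, and only the file contents (`include=powersave`, `[net]` with `enabled=false`) and the /etc/tuned/custom-powersave location come from this thread.

```python
import configparser
from pathlib import Path


def write_custom_profile(base_dir, name="custom-powersave", parent="powersave"):
    """Write <base_dir>/<name>/tuned.conf inheriting from `parent` with the
    net plugin turned off, and return the path of the file written."""
    profile_dir = Path(base_dir) / name
    profile_dir.mkdir(parents=True, exist_ok=True)

    conf = configparser.ConfigParser()
    conf.optionxform = str  # keep option names exactly as written
    conf["main"] = {"include": parent}
    # Corrected option from this thread: enabled=false, not disabled=true.
    conf["net"] = {"enabled": "false"}

    path = profile_dir / "tuned.conf"
    with path.open("w") as fh:
        # Write "key=value" without spaces, matching the heredoc in the thread.
        conf.write(fh, space_around_delimiters=False)
    return path


if __name__ == "__main__":
    import tempfile

    # Demonstrate in a scratch directory rather than touching /etc/tuned.
    with tempfile.TemporaryDirectory() as tmp:
        written = write_custom_profile(tmp)
        print(written.read_text())
```

To use it for real, the same call would be pointed at /etc/tuned as root, followed by `# tuned-adm profile custom-powersave` as in the earlier comment.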
Closing this bug as "notabug". My issue was caused by the dynamic tuning of tuned and was solved by choosing a more appropriate profile (virtual-host) or customizing the existing profiles. Thanks everybody!