1074605 – Unstable network connectivity, frequent timeouts

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	NetworkManager
Sub Component:
Version:	20
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	Dan Williams
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2014-03-10 15:57 UTC by Sijis Aviles
Modified:	2014-04-12 00:08 UTC
CC List:	4 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2014-04-12 00:08:30 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments
ip route output (178 bytes, text/plain) 2014-03-11 14:26 UTC, Sijis Aviles
ip addr output (646 bytes, text/plain) 2014-03-11 14:26 UTC, Sijis Aviles
nmcli con show active output (160 bytes, text/plain) 2014-03-11 14:27 UTC, Sijis Aviles
nmcli dev output (128 bytes, text/plain) 2014-03-11 14:27 UTC, Sijis Aviles
dmesg output (62.43 KB, text/plain) 2014-03-11 14:28 UTC, Sijis Aviles
journalctl output from today (16.94 KB, text/plain) 2014-03-11 14:29 UTC, Sijis Aviles
same commands with NM turned off (57.04 KB, application/gzip) 2014-03-11 14:39 UTC, Sijis Aviles
journalctl output, ethtool output and sample ping response to gateway (155.48 KB, text/plain) 2014-03-14 16:25 UTC, Sijis Aviles

Description Sijis Aviles 2014-03-10 15:57:21 UTC

Description of problem:

After a recent yum update this past Thursday (03/06), my system is seeing frequent timeouts. A continuous ping to google.com (or to internal servers or the gateway) is spotty: the first 10 responses may be OK, then there is no response for the next 30-40, then responses are OK again for the next 15-20, and so on.

However, I've noticed that if I stop NetworkManager (service NetworkManager stop) and then do 'ifdown em1; ifup em1', pings to any server are steady, with no skipped icmp_seq numbers.
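
A minimal sketch of how the skipped icmp_seq numbers could be pulled out of a saved ping log, assuming the usual Linux iputils reply format ("64 bytes from ...: icmp_seq=N ..."). The name find_seq_gaps and the sample addresses are illustrative only; the sample is not taken from the attachments on this bug, and sequence-number wraparound is not handled.

```python
import re

# Only real replies count as received; "Destination Host Unreachable"
# lines also carry an icmp_seq value but do not contain "bytes from".
REPLY_RE = re.compile(r"bytes from .*icmp_seq=(\d+)")


def find_seq_gaps(ping_output):
    """Return a list of (first_missing, last_missing) icmp_seq ranges."""
    seqs = sorted({int(m.group(1)) for m in REPLY_RE.finditer(ping_output)})
    gaps = []
    for prev, cur in zip(seqs, seqs[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps


# Hypothetical sample using documentation addresses (192.0.2.0/24).
SAMPLE = """\
64 bytes from 192.0.2.1: icmp_seq=1 ttl=64 time=0.41 ms
64 bytes from 192.0.2.1: icmp_seq=2 ttl=64 time=0.39 ms
64 bytes from 192.0.2.1: icmp_seq=3 ttl=64 time=0.40 ms
From 192.0.2.50 icmp_seq=5 Destination Host Unreachable
64 bytes from 192.0.2.1: icmp_seq=7 ttl=64 time=0.42 ms
64 bytes from 192.0.2.1: icmp_seq=8 ttl=64 time=0.38 ms
"""

for first, last in find_seq_gaps(SAMPLE):
    print(f"missing icmp_seq {first}-{last}")
```

Running the same function over a log captured with NetworkManager running and one captured after the ifdown/ifup step would give a direct comparison of how many replies are lost in each case.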

Version-Release number of selected component (if applicable):
kernel-3.13.5-200.fc20.x86_64
kernel-3.13.5-202.fc20.x86_64
kernel-3.13.6-200.fc20.x86_64 <-- running kernel
kernel-debug-devel-3.13.5-200.fc20.x86_64
kernel-debug-devel-3.13.5-202.fc20.x86_64
kernel-debug-devel-3.13.6-200.fc20.x86_64
kernel-devel-3.13.5-200.fc20.x86_64
kernel-devel-3.13.5-202.fc20.x86_64
kernel-devel-3.13.6-200.fc20.x86_64
kernel-headers-3.13.6-200.fc20.x86_64
kernel-modules-extra-3.13.5-200.fc20.x86_64
kernel-modules-extra-3.13.5-202.fc20.x86_64
kernel-modules-extra-3.13.6-200.fc20.x86_64
NetworkManager-0.9.9.0-31.git20131003.fc20.x86_64
NetworkManager-glib-0.9.9.0-31.git20131003.fc20.x86_64
NetworkManager-l2tp-0.9.8.6-1.fc20.x86_64
NetworkManager-openconnect-0.9.8.0-2.fc20.x86_64
NetworkManager-openvpn-0.9.9.0-0.1.git20140128.fc20.x86_64
NetworkManager-openvpn-gnome-0.9.9.0-0.1.git20140128.fc20.x86_64
NetworkManager-pptp-0.9.8.2-3.fc20.x86_64
NetworkManager-pptp-gnome-0.9.8.2-3.fc20.x86_64
NetworkManager-vpnc-0.9.8.2-2.fc20.x86_64
NetworkManager-vpnc-gnome-0.9.8.2-2.fc20.x86_64

How reproducible:
Always


Steps to Reproduce:
1. boot system and login (get ip via dhcp)
2. ping gateway (spotty)
3. kill ping
4. service NetworkManager stop
5. ifdown em1; ifup em1
6. ping gateway (ok)

Actual results:
ping is spotty and times out often

Expected results:
constant ping

Additional info:
[    0.491934] drop_monitor: Initializing network drop monitor service
[    3.542617] e1000e: Intel(R) PRO/1000 Network Driver - 2.3.2-k
[    3.719471] e1000e 0000:00:19.0 eth0: Intel(R) PRO/1000 Network Connection
[    3.891007] systemd-udevd[477]: renamed network interface eth0 to em1
[    4.015895] systemd-udevd[466]: renamed network interface wlan0 to wlp3s0

Comment 1 Jirka Klimes 2014-03-11 14:01:13 UTC

It is strange that NM would influence ping times, because it just configures the interface and then kernel/driver handles the traffic.

The only reason that comes to my mind is that there could be some re-activations (disconnects) of the network connection. Also you might be connected to a Wi-Fi with a weak signal.

Would you provide more information, so that we can analyse?
a) /var/log/messages (or journalctl -u NetworkManager)
b) dmesg output
c) nmcli dev
   nmcli con show active
when NetworkManager is running
d) ip addr
   ip route
for both with and without NM

BTW, does running with older kernel help?

Comment 2 Sijis Aviles 2014-03-11 14:26:12 UTC

Created attachment 873162 [details]
ip route output

Comment 3 Sijis Aviles 2014-03-11 14:26:35 UTC

Created attachment 873163 [details]
ip addr output

Comment 4 Sijis Aviles 2014-03-11 14:27:02 UTC

Created attachment 873164 [details]
nmcli con show active output

Comment 5 Sijis Aviles 2014-03-11 14:27:36 UTC

Created attachment 873165 [details]
nmcli dev output

Comment 6 Sijis Aviles 2014-03-11 14:28:09 UTC

Created attachment 873166 [details]
dmesg output

Comment 7 Sijis Aviles 2014-03-11 14:29:01 UTC

Created attachment 873167 [details]
journalctl output from today

Comment 8 Sijis Aviles 2014-03-11 14:39:45 UTC

Created attachment 873173 [details]
same commands with NM turned off

note: the journal output is from everything, not just NetworkManager

Comment 9 Sijis Aviles 2014-03-11 14:43:45 UTC

(In reply to Jirka Klimes from comment #1)
> .... Also you might be
> connected to a Wi-Fi with a weak signal.

I'm not connected via wifi. I'm on a docking station though.

> 
> Would you provide more information, so that we can analyse?
> a) /var/log/messages (or journalctl -u NetworkManager)
> b) dmesg output
> c) nmcli dev
>    nmcli con show active
> when NetworkManager is running
> d) ip addr
>    ip route
> for both with and without NM
> 

I've uploaded them.
(I should have zipped the first set of outputs.. sorry)

> BTW, does running with older kernel help?
I did try both previous kernel versions and neither helped.
I even reverted the update via 'yum history undo #', but that did not help either.

Please let me know if you need any more info.

Sijis

Comment 10 collura 2014-03-12 07:35:50 UTC

Though this bug isn't kernel sensitive, maybe it is related to one that is (https://bugzilla.redhat.com/show_bug.cgi?id=1075443), since it started at about the same time?

Comment 11 Sijis Aviles 2014-03-13 15:52:15 UTC

(In reply to collura from comment #10)
> Though this bug isn't kernel sensitive, maybe it is related to one that is
> (https://bugzilla.redhat.com/show_bug.cgi?id=1075443), since it started at
> about the same time?
Yeah, around the same time, but I don't use wireless much.

Comment 12 Jirka Klimes 2014-03-14 10:12:45 UTC

You can run "ethtool -S em1" to see if there are some dropped packets or some other problem.

Also, please try switching SELinux to permissive mode to see if that helps, as I see many issues in the logs.
# setenforce Permissive

It might be good to relabel the system:
https://ask.fedoraproject.org/en/question/37289/selinux-is-alerting-everything-after-upgrade/

Mar 11 09:20:53 saviles-t440.gogo.local NetworkManager[2162]: suspect value in domain_search option - discarded
Mar 11 09:20:53 saviles-t440.gogo.local NetworkManager[2162]: Error: could not connect to NetworkManager DBus socket: (org.freedesktop.DBus.Error.NoServer) Failed to connect to socket /var/run/NetworkManager/private-dhcp: Connection refused
Mar 11 09:20:53 saviles-t440.gogo.local NetworkManager[2162]: Fatal error occured, killing dhclient instance with pid 2419.

Also, you have a bad domain_search:
Mar 11 09:20:53 saviles-t440.gogo.local NetworkManager[2162]: suspect value in domain_search option - discarded
Mar 11 09:21:09 saviles-t440.gogo.local dhclient[2780]: /var/lib/dhclient/dhclient-fb3a1d3f-b5b3-4736-845f-5f08c9610d6e-em1.lease line 26: Expecting a domain string.
Mar 11 09:21:09 saviles-t440.gogo.local dhclient[2780]: option domain-search ;

What do you have in:
$ nmcli con show em1 | grep dns-search

Comment 13 Sijis Aviles 2014-03-14 16:23:11 UTC

(In reply to Jirka Klimes from comment #12)
> You can run "ethtool -S em1" to see if there are some dropped packets or
> some other problem.

I will attach the full output; however, there were no errors:
     rx_errors: 0
     tx_errors: 0
     rx_length_errors: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     rx_missed_errors: 0
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_window_errors: 0
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     rx_align_errors: 0
     rx_csum_offload_errors: 0
     uncorr_ecc_errors: 0
     corr_ecc_errors: 0
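
For the full attachment, a small filter can save reading every counter by eye. This is only a sketch, assuming the "name: value" layout that "ethtool -S" prints; the function name nonzero_counters, the keyword list, and the non-zero sample values are made up for illustration and are not from this system.

```python
def nonzero_counters(ethtool_output, keywords=("error", "drop", "miss")):
    """Return {counter: value} for non-zero counters whose name matches a keyword."""
    result = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        value = value.strip()
        # Skips the "NIC statistics:" header and any non-numeric line.
        if not sep or not value.isdigit():
            continue
        if int(value) != 0 and any(k in name for k in keywords):
            result[name] = int(value)
    return result


# Hypothetical sample: the zero counters mirror the list above,
# the non-zero ones are invented to show what a hit looks like.
SAMPLE = """\
NIC statistics:
     rx_packets: 1000
     rx_errors: 0
     tx_errors: 0
     rx_missed_errors: 12
     rx_crc_errors: 0
"""

print(nonzero_counters(SAMPLE))
```

An empty result for the real output would match what is reported here: no error counters have incremented.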


> 
> Also, please try switching SELinux to permissive mode to see if that helps,
> as I see many issues in the logs.
> # setenforce Permissive
Done. I also set it in /etc/sysconfig/selinux.
> 
> It might be good to relabel the system:
> https://ask.fedoraproject.org/en/question/37289/selinux-is-alerting-
> everything-after-upgrade/
I also did this as requested.

> 
> Also, you have a bad domain_search:
> What do you have in:
> $ nmcli con show em1 | grep dns-search

The output of 'nmcli con show active em1 | grep dns' was blank, although name resolution works fine on the system.

I am also attaching a newer journalctl after all these changes to see if something else comes to mind.

Comment 14 Sijis Aviles 2014-03-14 16:25:25 UTC

Created attachment 874512 [details]
journalctl output, ethtool output and sample ping response to gateway

Comment 15 Sijis Aviles 2014-03-19 14:44:55 UTC

I just realized something interesting.

When I'm using NetworkManager I'm getting an IP ending in .160 from DHCP; however, after I stop NetworkManager and do the ifdown/ifup em1, I'm getting a different IP, .153.

With DHCP I should be getting the same IP, as the mac is the same, or so I thought.

I reached out to our network team and they see two different macs for .160 and .153.

(output from switch)
10.1.8.160      4437-e634-0388 108      BAGG40                   20    D
10.1.8.153      28d2-444f-90ed 108      BAGG40                   20    D

I do not know where the mac for .160 (03:88) is coming from as the mac registered on my system is ending with 90:ed.

I am on a docking station but this setup worked prior to any update a few weeks ago.

Anything else I should look into?

Comment 16 Dan Williams 2014-03-19 15:29:49 UTC

(In reply to Sijis Aviles from comment #15)
> I just realized something interesting.
> 
> When I'm using NetworkManager I'm getting an IP ending in .160 from DHCP;
> however, after I stop NetworkManager and do the ifdown/ifup em1, I'm getting
> a different IP, .153.
> 
> With DHCP I should be getting the same IP, as the mac is the same, or so I
> thought.

You'll only get the same IP address if either (a) the DHCP server remembers your MAC address and hands you the same address or (b) manual DHCP and NetworkManager use the same leasefile.

(a) is rare

(b) is unlikely because a manual dhclient run does not use the same leasefile as NetworkManager does; NM uses connection-specific leasefiles because (obviously) the lease is different with each network you connect to.  But a manual dhclient run uses a single leasefile per interface.  So it's not surprising you'd get a different IP address from a manual dhclient run.

> I reached out to our network team and they see two different macs for .160
> and .153.
> 
> (output from switch)
> 10.1.8.160      4437-e634-0388 108      BAGG40                   20    D
> 10.1.8.153      28d2-444f-90ed 108      BAGG40                   20    D
> 
> I do not know where the mac for .160 (03:88) is coming from as the mac
> registered on my system is ending with 90:ed.

44:37:E6 is the OUI for Hon Hai Precision, which is Foxconn.
28:d2:44 is LCFC (HeFei) Electronics Technology Co. which is a Lenovo/Compal joint-venture.

If your laptop has onboard ethernet, and the dock itself has ethernet too, the hardware will be slightly different even though they look like the same ethernet device.  So it could be that the dock actually has a different MAC address than the laptop itself.
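
To make the comparison concrete, here is a rough sketch that converts the switch's dash-grouped MAC notation to the colon form shown by "ip addr" and extracts the OUI (the first three octets). The helper names normalize_mac and oui are illustrative, and the vendor table holds only the two prefixes mentioned in this comment; it is not a real OUI database.

```python
HEX_DIGITS = set("0123456789abcdef")

# Only the two prefixes discussed in this comment.
KNOWN_OUIS = {
    "44:37:e6": "Hon Hai Precision",
    "28:d2:44": "LCFC (HeFei) Electronics Technology",
}


def normalize_mac(mac):
    """Convert e.g. '4437-e634-0388' or '44:37:E6:34:03:88' to 'aa:bb:cc:dd:ee:ff' form."""
    digits = "".join(c for c in mac.lower() if c in HEX_DIGITS)
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def oui(mac):
    """Return the vendor prefix (first three octets) of a MAC address."""
    return normalize_mac(mac)[:8]


for switch_mac in ("4437-e634-0388", "28d2-444f-90ed"):
    vendor = KNOWN_OUIS.get(oui(switch_mac), "unknown")
    print(normalize_mac(switch_mac), vendor)
```

Two different vendor prefixes on the same switch port are consistent with two physical ethernet controllers (dock and laptop) rather than one controller changing its address.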

Comment 17 Sijis Aviles 2014-03-19 21:09:03 UTC

(In reply to Dan Williams from comment #16)
> If your laptop has onboard ethernet, and the dock itself has ethernet too,
> the hardware will be slightly different even though they look like the same
> ethernet device.  So it could be that the dock actually has a different MAC
> address than the laptop itself.

Thanks for the explanation. I am using a dock so that's likely the reason for the 2 macs. I'll see if there's a way to find out the mac that's on the dock just to validate.

Unfortunately, none of this explains the lost pings when using NM.

Comment 18 Sijis Aviles 2014-04-04 21:07:42 UTC

The issue seems to have been solved in the last week or so.

The only major event I recall is that I plugged a cable directly into the NIC port on the laptop and then docked without a reboot.

The next day I saw kernel, NM, and dhclient updates, which I installed, and there are still no issues.

I'm not sure what to attribute the fix to, but plugging a network cable directly into the laptop had some effect.
