Bug 1193127

Summary: dhclient is started by NetworkManager at disconnection instead of at connection time
Product: [Fedora] Fedora Reporter: Pavel Šimerda (pavlix) <psimerda>
Component: NetworkManagerAssignee: Dan Williams <dcbw>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 22CC: awilliam, danofsatx, dcbw, erik-fedora, jpopelka, moez.roy, pbrobinson, petersen, pschindl, psimerda, robatino, satellitgo, sgallagh, thozza
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: AcceptedBlocker
Fixed In Version: NetworkManager-1.0.0-7.fc22 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-03-07 00:08:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1043121, 1199098    
Attachments:
Description Flags
dmesg
none
Debugging journal none

Description Pavel Šimerda (pavlix) 2015-02-16 15:48:46 UTC
1) NetworkManager doesn't start dhclient nor does it configure network via other means after the cable is connected. The logs show that kernel and NetworkManager learn about the connected cable.

2) On the other hand, when the cable is disconnected, NetworkManager starts dhclient which then attempts to perform a DHCP transaction over no link at all.


Version-Release number of selected component (if applicable):

NetworkManager-1.0.0-4.fc22.x86_64
dhcp-client-4.3.2-0.2b1.fc23.x86_64


How reproducible:

100%


Steps to Reproduce:

When the cable is physically connected:

1) Disconnect it.

2) Start watching the logs (e.g. using journalctl -f)

3) Connect it and wait a bit.

4) Disconnect it and wait a bit.


Actual results:

After #3, no DHCP transaction is reported and the network does not get configured. After #4, dhclient is started and attempts to perform configuration.


Expected results:

After #3, network should be configured via DHCP. After #4, dhclient should *not* be started.


Additional info:

With a good timing, you can start the dhclient by disconnecting and catch its results by connecting in time.

Wireless works without issues.

Tested on Lenovo T540P, recent installation of F22, just updated.

Comment 1 Jens Petersen 2015-02-18 13:29:42 UTC
Dunno if it is related but I can't get an ipv4 address on a fresh F22 install
(via F21 boot.iso) nor in F22 boot.iso.

Comment 2 Pavel Šimerda (pavlix) 2015-02-19 09:20:01 UTC
(In reply to Jens Petersen from comment #1)
> Dunno if it is related but I can't get an ipv4 address on a fresh F22 install
> (via F21 boot.iso) nor in F22 boot.iso.

You can run `journalctl -f` and see whether you experience the same behavior. Pushing in the cable results in a few messages informing that kernel and NM knows about it. Pulling it out results in a lot of messages including those from dhclient.

Comment 3 Jiri Popelka 2015-02-19 09:48:18 UTC
(In reply to Pavel Šimerda (pavlix) from comment #0)
> 4) Disconnect it and wait a bit.
> After #4, dhclient is started and attempts to perform configuration.

I don't see anything related in dhcp-4.3.2 changelog.
The only vague idea I have is that it's related to a new feature in bind-9.10
"Network interfaces are re-scanned automatically whenever they change."
http://fedoraproject.org/wiki/Changes/BIND_9.10

(In reply to Jens Petersen from comment #1)
> Dunno if it is related but I can't get an ipv4 address on a fresh F22 install
> (via F21 boot.iso) nor in F22 boot.iso.

There are also bug #1193944 and bug #1193988.

Comment 4 Pavel Šimerda (pavlix) 2015-02-19 10:07:43 UTC
(In reply to Jiri Popelka from comment #3)
> The only vague idea I have is that it's related to a new feature in bind-9.10
> "Network interfaces are re-scanned automatically whenever they change."
> http://fedoraproject.org/wiki/Changes/BIND_9.10

I don't see how that would explain the behavior. After all it's NetworkManager's dhclient instance. With a bit of woodoo described in comment #0, NetworkManager can even pick up the result.

Comment 5 Jiri Popelka 2015-02-19 10:12:23 UTC
Sure, I for some reason thought the BZ has been reassigned to me (dhcp), while you just added me to CC. Sorry for the noise.

Comment 6 Fedora Blocker Bugs Application 2015-02-23 09:56:16 UTC
Proposed as a Blocker for 22-alpha by Fedora user pavlix using the blocker tracking app because:

 Wired networking is critical to many Fedora installations. We cannot expect users to configure network manually nor can we expect them to use the woodoo-style workaround mentioned in the bug report.

Comment 7 Pavel Šimerda (pavlix) 2015-02-23 10:24:54 UTC
Wait a bit... I may have misidentified the system, will create a new F22 installation and re-test today.

Comment 8 Dan Mossor [danofsatx] 2015-02-23 18:06:16 UTC
Discussed at today's blocker review meeting [1].

Punt - A decision on this bug has been postponed in order to perform further testing.

http://meetbot.fedoraproject.org/fedora-blocker-review/2015-02-23/

Comment 9 Pavel Šimerda (pavlix) 2015-02-23 18:35:44 UTC
So I wasn't able to install F22 and test it. Any reliable way to get an F22 installation up and running?

Comment 10 Pavel Šimerda (pavlix) 2015-02-23 21:36:13 UTC
Finally got it installed and it seems to behave as described. Will add more information later. So far I insist on the blocker proposal.

Comment 11 Stephen Gallagher 2015-02-24 14:57:01 UTC
I also have a T540p system, and can confirm this behavior, though I'm experiencing it far less consistently than Pavel apparently. It worked fine if I was booted up while connected.

Once I disconnect and reconnect the cable, I get the following behavior:

* I receive an IP address
* I can successfully ping any address on the same subnet
* I cannot ping 8.8.8.8

I'm casting a vote in favor of blocker status.


(Probably related: when this machine is on the docking station, the wired network connection bounces to and from connectivity every few minutes).

Comment 12 Dan Mossor [danofsatx] 2015-03-02 14:40:54 UTC
+1 AlphaBlocker. This is a conditional requirement for the Alpha Post-Install requirements:

Updates
The installed system must be able to download and install updates with the default console package manager. 

The system having network connectivity is required for the ability to download updates.

Comment 13 Petr Schindler 2015-03-02 18:46:13 UTC
Discussed at today's blocker review meeting [1].

This bug was accepted as Alpha Blocker - This bug violates the Alpha criterion: "The installed system must be able to download and install updates with the default console package manager."

http://meetbot.fedoraproject.org/fedora-blocker-review/2015-03-02/

Comment 14 Dan Williams 2015-03-03 15:33:40 UTC
Can we get some NetworkManager log output for this?  Bonus points if you run "sudo nmcli gen log level debug" and then get the log output.

Comment 15 Dan Williams 2015-03-03 15:35:11 UTC
Further bonus points if, when NM doesn't start dhclient, you grab "/sys/class/net/<interface>/carrier" and then do the same right after you disconnect the cable.

'dmesg' would also be helpful here.

Comment 16 Stephen Gallagher 2015-03-03 18:04:51 UTC
Created attachment 997626 [details]
dmesg

Comment 17 Stephen Gallagher 2015-03-03 18:14:30 UTC
Created attachment 997627 [details]
Debugging journal

Comment 18 Dan Williams 2015-03-03 20:52:23 UTC
Based on Stephen's logs, here's what I see:

Mar 03 13:07:51 sgallagh540.sgallagh.rht kernel: e1000e: enp0s25 NIC Link is Down
Mar 03 13:07:55 sgallagh540.sgallagh.rht NetworkManager[1084]: <info>  (enp0s25): device state change: activated -> unavailable (reason 'carrier-changed') [100 20 40]

** So far so good; no carrier so connection is terminated after a 4-second delay.

Mar 03 13:07:56 sgallagh540.sgallagh.rht NetworkManager[1084]: <info>  (enp0s25): found matching connection 'enp0s25'
Mar 03 13:07:56 sgallagh540.sgallagh.rht NetworkManager[1084]: <debug> [1425406076.509458] [nm-manager.c:1611] assume_connection(): (enp0s25): will attempt to assume connection
Mar 03 13:07:56 sgallagh540.sgallagh.rht NetworkManager[1084]: <info>  (enp0s25): device state change: unavailable -> disconnected (reason 'connection-assumed') [20 30 41]

** Not good, NM apparently assumes an existing connection on the interface, which may be due to an IPv6LL address.  NM shouldn't be doing this, especially if the interface is unavailable.

Mar 03 13:08:05 sgallagh540.sgallagh.rht NetworkManager[1084]: <info>  (enp0s25): link connected
Mar 03 13:08:05 sgallagh540.sgallagh.rht kernel: e1000e: enp0s25 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Mar 03 13:08:05 sgallagh540.sgallagh.rht dhclient[14872]: DHCPREQUEST on enp0s25 to 255.255.255.255 port 67 (xid=0x103fd021)
Mar 03 13:08:05 sgallagh540.sgallagh.rht dhclient[14872]: DHCPACK from 10.0.0.1 (xid=0x103fd021)

** A short time later, the device gets a link again and the ongoing DHCP request is answered.


So at least from Stephen's logs, I'm not seeing the originally reported problem where NM disconnects when it should be connecting.  But I am seeing an error where NM tries to "assume" a connection that probably isn't there.

Comment 19 Dan Williams 2015-03-03 20:53:23 UTC
Pavel, if you could attach some logs showing the issue you originally reported, then we can diagnose that issue.  Thanks!

Comment 20 Stephen Gallagher 2015-03-04 01:06:28 UTC
I can at least state that, once I'm in the situation of those logs, I cannot get network access to anything any longer. All pings, etc. return "no route to destination" or something equivalent. I have to reboot to get any network connectivity back. So it seems like it's breaking something in the routing table. If this is a different bug, we can split the BZs, but I suspect that the DHCP thing may actually be a symptom of this, rather than a separate bug.

Comment 21 Moez Roy 2015-03-04 15:14:02 UTC
(In reply to Dan Williams from comment #14)
> Can we get some NetworkManager log output for this? 

You should also be able to reproduce this. Download yesterdays rawhide livecd image and run it in VirtualBox. Be sure to enable EFI boot so X starts up. 

NetworkMager is able to get an IP from the DHCP server and says it is connected, but in reality there is no connectivity.

I was able to get networking up after putting in a manual IP address,  gateway IP, and DNS IP. Also set IPv6 to ignore.

Here is the link to today's rawhide image https://kojipkgs.fedoraproject.org//work/tasks/5680/9135680/Fedora-Live-Workstation-x86_64-rawhide-20150304.iso

Comment 22 Dan Williams 2015-03-04 22:27:49 UTC
Running the linked iso via live USB on a real machine I am able to ping external addresses like 4.2.2.1.

I do see two issues:

1) DNS doesn't work; /etc/resolv.conf is a symlink to /run/systemd/resolve/resolv.conf, I think this was discussed on fedora-devel@ earlier this week in the "systemd-219 issues with 22 and Rawhide composes" thread

2) NM continuously tries to check for connectivity and when DNS fails; this looks like https://bugzilla.gnome.org/show_bug.cgi?id=742823 which has been fixed upstream but after NM 1.0 that is in F22.

Comment 23 Adam Williamson 2015-03-04 22:32:30 UTC
The resolv.conf change was reverted for F22, so that is not likely to be involved.

Comment 24 Adam Williamson 2015-03-04 22:51:08 UTC
I seem to be able to reproduce this (at least the case where NM behaves badly on disconnect/reconnect) in a KVM as follows:

1. Boot 22 Alpha TC8, with 'virtio' network adapter (though I don't know if the model actually matters)
2. Verify network works
3. From the host, do virsh domif-setlink (domain) (interface) down
4. From the host, do virsh domif-setlink (domain) (interface) up

repeat 3, 4 as desired, while watching logs in the guest. (domain) is the virt domain, (interface) is the 'interface device', which for me is 'vnet0' (try the interfaces you see in 'ip addr' on the host, not the guest).

Indeed I see a lot of DHCP activity and such when I do 'down', and nothing happening when I do 'up'.

Comment 25 Dan Williams 2015-03-04 23:52:43 UTC
Adams logs have this bit which is probably the original issue:

Mar 04 17:44:37 localhost NetworkManager[1190]: <info>  (eth0): link disconnected (deferring action for 4 seconds)
Mar 04 17:44:41 localhost NetworkManager[1190]: <info>  (eth0): link disconnected (calling deferred action)
Mar 04 17:44:41 localhost NetworkManager[1190]: <info>  (eth0): device state change: activated -> unavailable (reason 'carrier-changed') [100 20 40]
Mar 04 17:44:41 localhost NetworkManager[1190]: <info>  (eth0): deactivating device (reason 'carrier-changed') [40]
Mar 04 17:44:42 localhost NetworkManager[1190]: <info>  (eth0): canceled DHCP transaction, DHCP client pid 1543
Mar 04 17:44:42 localhost NetworkManager[1190]: <info>  (eth0): DHCPv4 state changed bound -> done
Mar 04 17:44:42 localhost NetworkManager[1190]: <info>  NetworkManager state is now CONNECTED_LOCAL
Mar 04 17:44:42 localhost.localdomain NetworkManager[1190]: <info>  (eth0): found matching connection 'Wired connection 1'
Mar 04 17:44:42 localhost.localdomain NetworkManager[1190]: <info>  (eth0): device state change: unavailable -> disconnected (reason 'connection-assumed') [20 30 41]

I was able to reproduce that locall, and I've cherry-picked a patch from upstream stable 1.0 branch that fixes this issue for me, please test these builds:

http://koji.fedoraproject.org/koji/taskinfo?taskID=9139901

Comment 26 Fedora Update System 2015-03-05 02:53:16 UTC
NetworkManager-1.0.0-5.fc22 has been submitted as an update for Fedora 22.
https://admin.fedoraproject.org/updates/NetworkManager-1.0.0-5.fc22

Comment 27 Stephen Gallagher 2015-03-05 13:18:21 UTC
I tested that update today. It does not fix the more serious issue I was seeing (that of networking being nonfunctional when a wired network connection is present). It did seem to make the wired issue more consistently fail, which I suppose aids in reproducibility. I've opened BZ #1199098 to track that separately.

Comment 28 Dan Williams 2015-03-05 13:40:56 UTC
Stephen, can you post logs from the updated NM showing the carrier-loss & dhclient stuff?  I was able to reproduce the issue that your previous logs showed, and that adam's logs showed, and the patch fixes the issue for me...  since there seem to be a few things going on here, I'd like to confirm/deny whether the original issue is fixed or not.

(also posted a request for more info in bz #1199098)

Comment 29 Fedora Update System 2015-03-05 15:26:53 UTC
NetworkManager-1.0.0-7.fc22 has been submitted as an update for Fedora 22.
https://admin.fedoraproject.org/updates/NetworkManager-1.0.0-7.fc22

Comment 30 Fedora Update System 2015-03-05 17:12:01 UTC
Package NetworkManager-1.0.0-7.fc22:
* should fix your issue,
* was pushed to the Fedora 22 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing NetworkManager-1.0.0-7.fc22'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2015-3298/NetworkManager-1.0.0-7.fc22
then log in and leave karma (feedback).

Comment 31 Fedora Update System 2015-03-07 00:08:33 UTC
NetworkManager-1.0.0-7.fc22 has been pushed to the Fedora 22 stable repository.  If problems still persist, please make note of it in this bug report.