Bug 1193127
Summary: | dhclient is started by NetworkManager at disconnection instead of at connection time | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Pavel Šimerda (pavlix) <psimerda> | ||||||
Component: | NetworkManager | Assignee: | Dan Williams <dcbw> | ||||||
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||
Severity: | urgent | Docs Contact: | |||||||
Priority: | unspecified | ||||||||
Version: | 22 | CC: | awilliam, danofsatx, dcbw, erik-fedora, jpopelka, moez.roy, pbrobinson, petersen, pschindl, psimerda, robatino, satellitgo, sgallagh, thozza | ||||||
Target Milestone: | --- | ||||||||
Target Release: | --- | ||||||||
Hardware: | Unspecified | ||||||||
OS: | Unspecified | ||||||||
Whiteboard: | AcceptedBlocker | ||||||||
Fixed In Version: | NetworkManager-1.0.0-7.fc22 | Doc Type: | Bug Fix | ||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2015-03-07 00:08:33 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | |||||||||
Bug Blocks: | 1043121, 1199098 | ||||||||
Attachments: |
|
Description
Pavel Šimerda (pavlix)
2015-02-16 15:48:46 UTC
Dunno if it is related but I can't get an ipv4 address on a fresh F22 install (via F21 boot.iso) nor in F22 boot.iso. (In reply to Jens Petersen from comment #1) > Dunno if it is related but I can't get an ipv4 address on a fresh F22 install > (via F21 boot.iso) nor in F22 boot.iso. You can run `journalctl -f` and see whether you experience the same behavior. Pushing in the cable results in a few messages informing that kernel and NM knows about it. Pulling it out results in a lot of messages including those from dhclient. (In reply to Pavel Šimerda (pavlix) from comment #0) > 4) Disconnect it and wait a bit. > After #4, dhclient is started and attempts to perform configuration. I don't see anything related in dhcp-4.3.2 changelog. The only vague idea I have is that it's related to a new feature in bind-9.10 "Network interfaces are re-scanned automatically whenever they change." http://fedoraproject.org/wiki/Changes/BIND_9.10 (In reply to Jens Petersen from comment #1) > Dunno if it is related but I can't get an ipv4 address on a fresh F22 install > (via F21 boot.iso) nor in F22 boot.iso. There are also bug #1193944 and bug #1193988. (In reply to Jiri Popelka from comment #3) > The only vague idea I have is that it's related to a new feature in bind-9.10 > "Network interfaces are re-scanned automatically whenever they change." > http://fedoraproject.org/wiki/Changes/BIND_9.10 I don't see how that would explain the behavior. After all it's NetworkManager's dhclient instance. With a bit of woodoo described in comment #0, NetworkManager can even pick up the result. Sure, I for some reason thought the BZ has been reassigned to me (dhcp), while you just added me to CC. Sorry for the noise. Proposed as a Blocker for 22-alpha by Fedora user pavlix using the blocker tracking app because: Wired networking is critical to many Fedora installations. We cannot expect users to configure network manually nor can we expect them to use the woodoo-style workaround mentioned in the bug report. Wait a bit... I may have misidentified the system, will create a new F22 installation and re-test today. Discussed at today's blocker review meeting [1]. Punt - A decision on this bug has been postponed in order to perform further testing. http://meetbot.fedoraproject.org/fedora-blocker-review/2015-02-23/ So I wasn't able to install F22 and test it. Any reliable way to get an F22 installation up and running? Finally got it installed and it seems to behave as described. Will add more information later. So far I insist on the blocker proposal. I also have a T540p system, and can confirm this behavior, though I'm experiencing it far less consistently than Pavel apparently. It worked fine if I was booted up while connected. Once I disconnect and reconnect the cable, I get the following behavior: * I receive an IP address * I can successfully ping any address on the same subnet * I cannot ping 8.8.8.8 I'm casting a vote in favor of blocker status. (Probably related: when this machine is on the docking station, the wired network connection bounces to and from connectivity every few minutes). +1 AlphaBlocker. This is a conditional requirement for the Alpha Post-Install requirements: Updates The installed system must be able to download and install updates with the default console package manager. The system having network connectivity is required for the ability to download updates. Discussed at today's blocker review meeting [1]. This bug was accepted as Alpha Blocker - This bug violates the Alpha criterion: "The installed system must be able to download and install updates with the default console package manager." http://meetbot.fedoraproject.org/fedora-blocker-review/2015-03-02/ Can we get some NetworkManager log output for this? Bonus points if you run "sudo nmcli gen log level debug" and then get the log output. Further bonus points if, when NM doesn't start dhclient, you grab "/sys/class/net/<interface>/carrier" and then do the same right after you disconnect the cable. 'dmesg' would also be helpful here. Created attachment 997626 [details]
dmesg
Created attachment 997627 [details]
Debugging journal
Based on Stephen's logs, here's what I see: Mar 03 13:07:51 sgallagh540.sgallagh.rht kernel: e1000e: enp0s25 NIC Link is Down Mar 03 13:07:55 sgallagh540.sgallagh.rht NetworkManager[1084]: <info> (enp0s25): device state change: activated -> unavailable (reason 'carrier-changed') [100 20 40] ** So far so good; no carrier so connection is terminated after a 4-second delay. Mar 03 13:07:56 sgallagh540.sgallagh.rht NetworkManager[1084]: <info> (enp0s25): found matching connection 'enp0s25' Mar 03 13:07:56 sgallagh540.sgallagh.rht NetworkManager[1084]: <debug> [1425406076.509458] [nm-manager.c:1611] assume_connection(): (enp0s25): will attempt to assume connection Mar 03 13:07:56 sgallagh540.sgallagh.rht NetworkManager[1084]: <info> (enp0s25): device state change: unavailable -> disconnected (reason 'connection-assumed') [20 30 41] ** Not good, NM apparently assumes an existing connection on the interface, which may be due to an IPv6LL address. NM shouldn't be doing this, especially if the interface is unavailable. Mar 03 13:08:05 sgallagh540.sgallagh.rht NetworkManager[1084]: <info> (enp0s25): link connected Mar 03 13:08:05 sgallagh540.sgallagh.rht kernel: e1000e: enp0s25 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx Mar 03 13:08:05 sgallagh540.sgallagh.rht dhclient[14872]: DHCPREQUEST on enp0s25 to 255.255.255.255 port 67 (xid=0x103fd021) Mar 03 13:08:05 sgallagh540.sgallagh.rht dhclient[14872]: DHCPACK from 10.0.0.1 (xid=0x103fd021) ** A short time later, the device gets a link again and the ongoing DHCP request is answered. So at least from Stephen's logs, I'm not seeing the originally reported problem where NM disconnects when it should be connecting. But I am seeing an error where NM tries to "assume" a connection that probably isn't there. Pavel, if you could attach some logs showing the issue you originally reported, then we can diagnose that issue. Thanks! I can at least state that, once I'm in the situation of those logs, I cannot get network access to anything any longer. All pings, etc. return "no route to destination" or something equivalent. I have to reboot to get any network connectivity back. So it seems like it's breaking something in the routing table. If this is a different bug, we can split the BZs, but I suspect that the DHCP thing may actually be a symptom of this, rather than a separate bug. (In reply to Dan Williams from comment #14) > Can we get some NetworkManager log output for this? You should also be able to reproduce this. Download yesterdays rawhide livecd image and run it in VirtualBox. Be sure to enable EFI boot so X starts up. NetworkMager is able to get an IP from the DHCP server and says it is connected, but in reality there is no connectivity. I was able to get networking up after putting in a manual IP address, gateway IP, and DNS IP. Also set IPv6 to ignore. Here is the link to today's rawhide image https://kojipkgs.fedoraproject.org//work/tasks/5680/9135680/Fedora-Live-Workstation-x86_64-rawhide-20150304.iso Running the linked iso via live USB on a real machine I am able to ping external addresses like 4.2.2.1. I do see two issues: 1) DNS doesn't work; /etc/resolv.conf is a symlink to /run/systemd/resolve/resolv.conf, I think this was discussed on fedora-devel@ earlier this week in the "systemd-219 issues with 22 and Rawhide composes" thread 2) NM continuously tries to check for connectivity and when DNS fails; this looks like https://bugzilla.gnome.org/show_bug.cgi?id=742823 which has been fixed upstream but after NM 1.0 that is in F22. The resolv.conf change was reverted for F22, so that is not likely to be involved. I seem to be able to reproduce this (at least the case where NM behaves badly on disconnect/reconnect) in a KVM as follows: 1. Boot 22 Alpha TC8, with 'virtio' network adapter (though I don't know if the model actually matters) 2. Verify network works 3. From the host, do virsh domif-setlink (domain) (interface) down 4. From the host, do virsh domif-setlink (domain) (interface) up repeat 3, 4 as desired, while watching logs in the guest. (domain) is the virt domain, (interface) is the 'interface device', which for me is 'vnet0' (try the interfaces you see in 'ip addr' on the host, not the guest). Indeed I see a lot of DHCP activity and such when I do 'down', and nothing happening when I do 'up'. Adams logs have this bit which is probably the original issue: Mar 04 17:44:37 localhost NetworkManager[1190]: <info> (eth0): link disconnected (deferring action for 4 seconds) Mar 04 17:44:41 localhost NetworkManager[1190]: <info> (eth0): link disconnected (calling deferred action) Mar 04 17:44:41 localhost NetworkManager[1190]: <info> (eth0): device state change: activated -> unavailable (reason 'carrier-changed') [100 20 40] Mar 04 17:44:41 localhost NetworkManager[1190]: <info> (eth0): deactivating device (reason 'carrier-changed') [40] Mar 04 17:44:42 localhost NetworkManager[1190]: <info> (eth0): canceled DHCP transaction, DHCP client pid 1543 Mar 04 17:44:42 localhost NetworkManager[1190]: <info> (eth0): DHCPv4 state changed bound -> done Mar 04 17:44:42 localhost NetworkManager[1190]: <info> NetworkManager state is now CONNECTED_LOCAL Mar 04 17:44:42 localhost.localdomain NetworkManager[1190]: <info> (eth0): found matching connection 'Wired connection 1' Mar 04 17:44:42 localhost.localdomain NetworkManager[1190]: <info> (eth0): device state change: unavailable -> disconnected (reason 'connection-assumed') [20 30 41] I was able to reproduce that locall, and I've cherry-picked a patch from upstream stable 1.0 branch that fixes this issue for me, please test these builds: http://koji.fedoraproject.org/koji/taskinfo?taskID=9139901 NetworkManager-1.0.0-5.fc22 has been submitted as an update for Fedora 22. https://admin.fedoraproject.org/updates/NetworkManager-1.0.0-5.fc22 I tested that update today. It does not fix the more serious issue I was seeing (that of networking being nonfunctional when a wired network connection is present). It did seem to make the wired issue more consistently fail, which I suppose aids in reproducibility. I've opened BZ #1199098 to track that separately. Stephen, can you post logs from the updated NM showing the carrier-loss & dhclient stuff? I was able to reproduce the issue that your previous logs showed, and that adam's logs showed, and the patch fixes the issue for me... since there seem to be a few things going on here, I'd like to confirm/deny whether the original issue is fixed or not. (also posted a request for more info in bz #1199098) NetworkManager-1.0.0-7.fc22 has been submitted as an update for Fedora 22. https://admin.fedoraproject.org/updates/NetworkManager-1.0.0-7.fc22 Package NetworkManager-1.0.0-7.fc22: * should fix your issue, * was pushed to the Fedora 22 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing NetworkManager-1.0.0-7.fc22' as soon as you are able to, then reboot. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2015-3298/NetworkManager-1.0.0-7.fc22 then log in and leave karma (feedback). NetworkManager-1.0.0-7.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report. |