Bug 1475279 - dhclient doesn't update /etc/resolv.conf if NetworkManager crashes upon start
Summary: dhclient doesn't update /etc/resolv.conf if NetworkManager crashes upon start
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: dhcp
Version: 26
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Pavel Zhukov
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-07-26 11:34 UTC by Jaroslav Škarvada
Modified: 2017-10-26 11:22 UTC (History)

Fixed In Version: dhcp-4.3.5-9.fc26
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-06 22:25:37 UTC
Type: Bug
Embargoed:



Description Jaroslav Škarvada 2017-07-26 11:34:26 UTC
Description of problem:
It's a faulty condition, but dhclient should handle it. In fact, dhclient doesn't need NetworkManager to work.

The problem is that if NetworkManager crashes it doesn't create:
/var/run/NetworkManager/resolv.conf

/etc/resolv.conf is a symlink to /var/run/NetworkManager/resolv.conf

dhclient tries to create the resolv.conf file but it fails because there is no /var/run/NetworkManager directory.

The fix is simple: just add the following to dhclient-script:
mkdir -p /var/run/NetworkManager

Version-Release number of selected component (if applicable):
dhcp-client-4.3.5-7.fc26.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Make NetworkManager crash upon start (in my case it crashed due to a broken openssl library)
2. Run dhclient INTERFACE

Actual results:
/usr/sbin/dhclient-script: line 704: /etc/resolv.conf: No such file or directory
/etc/resolv.conf is not created and the resolver does not work.

Expected results:
No error; /etc/resolv.conf is created and the resolver works.

Additional info:
I didn't test whether it's reproducible by just disabling the NetworkManager service, but I think it could be.
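
The failure mode described above can be reproduced without touching the real system files. The following is a minimal sketch under stated assumptions: the function name demo_dangling_write and the scratch paths are hypothetical stand-ins for /etc/resolv.conf and /var/run/NetworkManager, and the nameserver address is a documentation placeholder.

```shell
#!/bin/sh
# Reproduce the failure in a scratch directory instead of the real /etc.
# All paths are illustrative stand-ins, not the real system files.
demo_dangling_write() {
    sandbox=$(mktemp -d) || return 1
    target_dir="$sandbox/run/NetworkManager"   # stand-in for /var/run/NetworkManager
    link="$sandbox/resolv.conf"                # stand-in for /etc/resolv.conf

    # The symlink exists, but the directory it points into does not.
    ln -s "$target_dir/resolv.conf" "$link"

    # Writing through the link fails because the parent directory is missing.
    if ( echo "nameserver 192.0.2.1" > "$link" ) 2>/dev/null; then
        echo "before mkdir: write succeeded"
    else
        echo "before mkdir: write failed"
    fi

    # The fix proposed in the report: create the directory first.
    mkdir -p "$target_dir"
    if ( echo "nameserver 192.0.2.1" > "$link" ) 2>/dev/null; then
        echo "after mkdir: write succeeded"
    else
        echo "after mkdir: write failed"
    fi

    rm -rf "$sandbox"
}

demo_dangling_write
```

The first write should fail and the second should succeed, which matches the "No such file or directory" error reported for the real /etc/resolv.conf.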

Comment 1 Jaroslav Škarvada 2017-07-26 11:36:30 UTC
Or a simpler reproducer:
 systemctl stop NetworkManager
 pkill dhclient
 rm -rf /var/run/NetworkManager/resolv.conf
 dhclient INTERFACE

Comment 2 Jaroslav Škarvada 2017-07-26 11:37:35 UTC
(In reply to Jaroslav Škarvada from comment #1)
> Or simple reproducer:
>  systemctl stop NetworkManager
>  pkill dhclient
>  rm -rf /var/run/NetworkManager/resolv.conf
>  dhclient INTERFACE

systemctl stop NetworkManager
pkill dhclient
rm -rf /var/run/NetworkManager
dhclient INTERFACE

Of course.

Comment 3 Pavel Zhukov 2017-07-26 11:42:12 UTC
Hi,
Thank you for the report.
Agreed, I've hit the problem a few times as well. In my case NM created the symlink and then deleted the target for some reason, so we ended up with a dangling symlink /etc/resolv.conf -> <some NM dir>/resolv.conf that dhclient could not update.

Workaround: rm /etc/resolv.conf && touch /etc/resolv.conf && ifup <iface>
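
Before applying the workaround, it may help to confirm that the file really is a dangling symlink. A minimal read-only sketch follows; the helper name resolv_conf_state is hypothetical and is not part of dhclient-script.

```shell
#!/bin/sh
# resolv_conf_state [PATH]
# Hypothetical read-only helper: report what kind of object PATH is.
# PATH defaults to /etc/resolv.conf; nothing is modified.
resolv_conf_state() {
    conf="${1:-/etc/resolv.conf}"
    if [ -h "$conf" ]; then
        # -e follows the link, so it is false when the target is missing.
        if [ -e "$conf" ]; then
            echo "symlink"
        else
            echo "dangling symlink"
        fi
    elif [ -f "$conf" ]; then
        echo "regular file"
    elif [ -e "$conf" ]; then
        echo "other"
    else
        echo "missing"
    fi
}
```

Calling resolv_conf_state with no argument inspects /etc/resolv.conf. The workaround above is only relevant when it reports "dangling symlink".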

Comment 4 Petr Pisar 2017-08-01 09:02:12 UTC
The dhcp-client-4.3.6-3.fc27.x86_64 does not work for me:

# dhclient ens3
/usr/sbin/dhclient-script: line 964: unexpected EOF while looking for matching `"'
/usr/sbin/dhclient-script: line 970: syntax error: unexpected end of file
/usr/sbin/dhclient-script: line 964: unexpected EOF while looking for matching `"'
/usr/sbin/dhclient-script: line 970: syntax error: unexpected end of file
^C

It keeps printing these two messages until I kill it, and it does not add an IPv4 address to the interface.

I think the problem is in this change:

+validate_resolv_conf()
+{
+    # It's possible to have broken symbolic link $RESOLVCONF -> <some_nm_dir>
+    # https://bugzilla.redhat.com/1475279
+    # Remove broken link and hope NM will survive
+    if [ -h "${RESOLVCONF}" -a ! -e "${RESOLVCONF}" ];
+    then
+        logmessage "${RESOLVCONF) is broken symlink. Recreating..."
+        unlink "${RESOLVCONF}"
+        touch "${RESOLVCONF}"
+    fi;
+}
+
+

The logmessage call misquotes the RESOLVCONF variable: the expansion is closed with a parenthesis instead of a curly brace. If I fix that, it works again.
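
For illustration, the quoted function with the closing brace restored might look like the sketch below. Two things are assumptions made only to keep it self-contained: logmessage is stubbed, and RESOLVCONF is given a default, whereas in dhclient-script both are defined elsewhere. The combined `-a` test is also split into two `[ ]` tests, which is equivalent here.

```shell
#!/bin/sh
# Assumption: RESOLVCONF is normally set by dhclient-script; it is defaulted
# here only so the sketch can run on its own.
RESOLVCONF="${RESOLVCONF:-/etc/resolv.conf}"

# Stub standing in for the logmessage helper of dhclient-script.
logmessage() {
    echo "dhclient-script: $*" >&2
}

validate_resolv_conf() {
    # A dangling symlink is a symlink (-h) whose target does not exist (! -e).
    if [ -h "${RESOLVCONF}" ] && [ ! -e "${RESOLVCONF}" ]; then
        # The expansion below is closed with a curly brace; the shipped line
        # closed it with a parenthesis, which left the quote unterminated.
        logmessage "${RESOLVCONF} is broken symlink. Recreating..."
        unlink "${RESOLVCONF}"
        touch "${RESOLVCONF}"
    fi
}
```

Running `sh -n` or `bash -n` on the script is a quick way to catch this class of quoting error before shipping, since it parses the file without executing it.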

Comment 5 Jaroslav Škarvada 2017-08-01 09:24:24 UTC
I am not sure whether unlinking is the right way to go. Maybe you could check the link target, create the whole directory structure if it is missing (with mkdir -p), and touch the target; only if that fails, unlink and create a new file.
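
The alternative suggested in this comment could be sketched roughly as follows. This is not the code that was shipped. The helper name repair_resolv_conf is hypothetical, and the sketch only handles a single level of symlink indirection.

```shell
#!/bin/sh
# repair_resolv_conf PATH
# Hypothetical helper: try to make a dangling symlink usable by creating its
# target; fall back to replacing the link with a regular file.
repair_resolv_conf() {
    conf="$1"

    # Nothing to do unless PATH is a symlink whose target is missing.
    if [ ! -h "$conf" ] || [ -e "$conf" ]; then
        return 0
    fi

    target=$(readlink "$conf")
    # Resolve a relative target against the directory holding the link.
    case "$target" in
        /*) ;;
        *) target="$(dirname "$conf")/$target" ;;
    esac

    # First choice: recreate the directory structure and touch the target.
    if mkdir -p "$(dirname "$target")" 2>/dev/null && touch "$target" 2>/dev/null; then
        echo "created target"
        return 0
    fi

    # Fallback: drop the link and create a regular file in its place.
    rm -f "$conf" && touch "$conf" && echo "replaced link"
}
```

The trade-off is that the first branch recreates a directory that belongs to another package, and the fallback ends in the same state as plain unlinking.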

Comment 6 Pavel Zhukov 2017-08-01 09:59:09 UTC
(In reply to Jaroslav Škarvada from comment #5)
> I am not sure whether unlinking is the right way to go. Maybe you could
> check the link target, create all the directory structure if missing (by
> mkdir -p), touch the target, if it fails, then unlink and create new.

But neither NM nor dhclient owns /etc/resolv.conf, so NM can be removed after it is stopped; in that case recreating /var/run/NM (each time?) doesn't make any sense to me.
Anyway, on a fresh installation of F26 /etc/resolv.conf is a regular file, so this looks like a leftover from an upgrade only. Do you have an upgraded F26 installation?

Comment 7 Pavel Zhukov 2017-08-01 10:03:12 UTC
(In reply to Petr Pisar from comment #4)
> The dhcp-client-4.3.6-3.fc27.x86_64 does not work for me:
> 
> # dhclient ens3
> /usr/sbin/dhclient-script: line 964: unexpected EOF while looking for
> matching `"'
> /usr/sbin/dhclient-script: line 970: syntax error: unexpected end of file
> /usr/sbin/dhclient-script: line 964: unexpected EOF while looking for
> matching `"'
> /usr/sbin/dhclient-script: line 970: syntax error: unexpected end of file
> ^C
> 
> And it prints these two messages until I kill it. It does not add IPv4
> address to the interface.
> 
> I think the problem is in this change:
> 
> +validate_resolv_conf()
> +{
> +    # It's possible to have broken symbolic link $RESOLVCONF ->
> <some_nm_dir>
> +    # https://bugzilla.redhat.com/1475279
> +    # Remove broken link and hope NM will survive
> +    if [ -h "${RESOLVCONF}" -a ! -e "${RESOLVCONF}" ];
> +    then
> +        logmessage "${RESOLVCONF) is broken symlink. Recreating..."
> +        unlink "${RESOLVCONF}"
> +        touch "${RESOLVCONF}"
> +    fi;
> +}
> +
> +
> 
> The logmessage function has misquoted RESOLVCONF variable. There is a
> parentheses instead of a curly bracket. If I fix it, it works again.
Thank you. There's one more typo in the script. Fixing it now.

Comment 8 Pavel Zhukov 2017-08-01 10:17:25 UTC
Unlinking seems safe enough. systemd-resolved can handle it (https://bugzilla.redhat.com/show_bug.cgi?id=1313085#c9), and so can NetworkManager.

Comment 9 Jaroslav Škarvada 2017-08-01 11:03:14 UTC
(In reply to Pavel Zhukov from comment #6)
> (In reply to Jaroslav Škarvada from comment #5)
> > I am not sure whether unlinking is the right way to go. Maybe you could
> > check the link target, create all the directory structure if missing (by
> > mkdir -p), touch the target, if it fails, then unlink and create new.
> 
> But NM doesn't own /etc/resolv.conf nor dhclient does so you can remove NM
> after stopping it in that case recreating of /var/run/NM (each time?)
> doesn't make any sense for me.

/var/run is tmpfs
 
> Anyway in fresh installation of F26 /etc/resolv.conf is regular file so
> looks like it's leftover after upgrade only. Do you have upgraded F26
> installation?

It's probably handled by NetworkManager (I didn't check). It makes some sense to me: with DHCP, resolv.conf is more like state than configuration.

I am not against unlinking; I was only against the statement:
# Remove broken link and hope NM will survive
But if unlinking works OK according to comment 8, I am OK with it.

Comment 10 Pavel Zhukov 2017-08-01 11:12:24 UTC
(In reply to Jaroslav Škarvada from comment #9)
> 
> /var/run is tmpfs
Right, and I don't like the idea of recreating /var/run/NM each time, even if NM was removed from the system many years ago and replaced with systemd-whateverd.

> I am not against unlinking, I was just against statement:
> # Remove broken link and hope NM will survive
> But if unlinking works OK according to comment 8, I am OK with it.
I've checked it on my F25 system.

systemctl stop NetworkManager
rm /etc/resolv.conf
echo "nameserver 127.0.0.1" > /etc/resolv.conf
systemctl start NetworkManager

$ ll -l /etc/resolv.conf
lrwxrwxrwx. 1 root root 35 Aug  1 12:15 /etc/resolv.conf -> /var/run/NetworkManager/resolv.conf


F26 (default Server installation, IIRC) doesn't create the symlink at all, so it should be safe.

In any case, having a regular /etc/resolv.conf should be forward compatible.

Comment 11 Fedora Update System 2017-08-01 14:08:50 UTC
dhcp-4.3.5-9.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-7c5356a635

Comment 12 Fedora Update System 2017-08-03 00:52:28 UTC
dhcp-4.3.5-9.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-7c5356a635

Comment 13 Fedora Update System 2017-08-06 22:25:37 UTC
dhcp-4.3.5-9.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.

Comment 14 Oneata Mircea Teodor 2017-10-26 09:53:39 UTC
(In reply to Pavel Zhukov from comment #10)
> (In reply to Jaroslav Škarvada from comment #9)
> > 
> > /var/run is tmpfs
> Right and I don't like the idea to recreate /var/run/NM each time even if NM
> is deleted from the system many years ago and replaced with systemd-whateverd
> 
> > I am not against unlinking, I was just against statement:
> > # Remove broken link and hope NM will survive
> > But if unlinking works OK according to comment 8, I am OK with it.
> I've checked in on my F25 system.  
> 
> sctl stop NM
> rm /etc/resolv.conf 
> echo "nameserver 127.0.0.1" > /etc/resolv.conf
> sctl start NM
> 
> $ ll -l /etc/resolv.conf
> lrwxrwxrwx. 1 root root 35 Aug  1 12:15 /etc/resolv.conf ->
> /var/run/NetworkManager/resolv.conf
> 
> 
> F26 (default Server installation iirc) doesn't have symlink created at all
> so should be safe. 
> 
> In any case having regular /etc/resolv.conf should be forward compatible.

Hi Pavel,
I ran into the same issue this morning. Could you please ping me if you are available?
Thanks, Teo

Comment 15 Pavel Zhukov 2017-10-26 11:05:29 UTC
(In reply to Oneata Mircea Teodor from comment #14)

> Hi Pavol,
> This morning got into the same issues. Could you ping me please if available.
> Thanks Teo

Which version are you using? Can you elaborate please?

Comment 16 Oneata Mircea Teodor 2017-10-26 11:22:42 UTC
(In reply to Pavel Zhukov from comment #15)
> (In reply to Oneata Mircea Teodor from comment #14)
> 
> > Hi Pavol,
> > This morning got into the same issues. Could you ping me please if available.
> > Thanks Teo
> 
> Which version are you using? Can you elaborate please?

Solved it with the network team: it was a conflict with another network manager. I removed connman-1.34-1.fc26.x86_64 and now all seems to be in order.

