Bug 1358193

Summary: nameserver received from dhcp, but missing in /etc/resolv.conf
Product: Red Hat Enterprise Virtualization Manager Reporter: Yedidyah Bar David <didi>
Component: rhev-hypervisor Assignee: Douglas Schilling Landgraf <dougsland>
Status: CLOSED NEXTRELEASE QA Contact: Huijuan Zhao <huzhao>
Severity: high Docs Contact:
Priority: medium    
Version: 3.6.8 CC: cshao, dguo, didi, dougsland, gklein, huzhao, leiwang, lsurette, mgoldboi, pstehlik, sbonazzo, srevivo, weiwang, yaniwang, ycui, ykaul
Target Milestone: ovirt-3.6.11 Keywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-01-31 14:43:41 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Node RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Yedidyah Bar David 2016-07-20 09:32:12 UTC
Description of problem:

$subject

Version-Release number of selected component (if applicable):

Red Hat Enterprise Virtualization Hypervisor release 7.2 (20160711.0.el7ev)

How reproducible:

Not sure

Steps to Reproduce:
1. Install and deploy hosted-engine 3.5 on two hosts with rhel6
2. Remove one host from engine, reinstall with rhev-h 3.5/el7 rhev-hypervisor7-7.2-20160219.0.iso and add to engine
3. On engine, yum install rhev-hypervisor7-7.2-20160711.0.el6ev (which is from 3.6 QE repo)
4. From web admin: Move the host to maintenance, Upgrade

Actual results:

# cat /etc/resolv.conf 
; Please make changes through the TUI or management server. Manual edits to this file will be lost on reboot
search home.local

# grep name-server /var/lib/dhclient/dhclient--rhevm.lease 
  option domain-name-servers 192.168.3.1;

Expected results:

To have 192.168.3.1 in /etc/resolv.conf
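
The check behind this expectation can be scripted. The following is a minimal sketch, assuming the lease file uses the usual "option domain-name-servers a, b;" syntax shown above. The function names are hypothetical and are not part of any RHEV-H tool.

```python
import re


def lease_nameservers(lease_text):
    """Return DNS servers from 'option domain-name-servers' lines, in order."""
    servers = []
    pattern = r"option\s+domain-name-servers\s+([^;]+);"
    for match in re.finditer(pattern, lease_text):
        for addr in match.group(1).split(","):
            addr = addr.strip()
            if addr and addr not in servers:
                servers.append(addr)
    return servers


def resolv_nameservers(resolv_text):
    """Return addresses from 'nameserver' lines, skipping comments."""
    servers = []
    for line in resolv_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def missing_nameservers(lease_text, resolv_text):
    """Return servers offered in the lease but absent from resolv.conf."""
    present = set(resolv_nameservers(resolv_text))
    return [s for s in lease_nameservers(lease_text) if s not in present]


if __name__ == "__main__":
    # Sample data copied from the "Actual results" above.
    lease = "  option domain-name-servers 192.168.3.1;\n"
    resolv = (
        "; Please make changes through the TUI or management server.\n"
        "search home.local\n"
    )
    print(missing_nameservers(lease, resolv))  # ['192.168.3.1']
```

On an affected host the same functions could be fed the contents of /var/lib/dhclient/dhclient--rhevm.lease and /etc/resolv.conf. An empty list would mean the file matches the lease.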

Additional info:

Looked at this a bit, consulted others, did not manage to find what process wrote this file. Was pointed at bug 1351095, but that one is about static conf, not dhcp. I am trying this flow as a solution for bug 1328382, so need to know that it works...

stat says it was written at 2016-07-20 08:23:04. Can't find this timestamp anywhere in /var/log.

/var/log/messages has:
Jul 20 08:22:41 didi-box1 NET[2147]: /usr/sbin/dhclient-script : updated /etc/resolv.conf

So dhclient wrote it, and 23 seconds later it was rewritten.

Comment 2 Yedidyah Bar David 2016-07-20 10:31:05 UTC
'service network restart' temporarily fixes it.

'service ovirt-post restart' kills it again.

Comment 3 Fabian Deutsch 2016-07-20 12:00:42 UTC
Comment 2 (ovirt-post restart) indicates that it is a Node issue.

Comment 4 Ying Cui 2016-10-10 06:48:59 UTC
Huijuan, could you reproduce this issue on QE env.?

Comment 5 Huijuan Zhao 2016-10-12 03:26:30 UTC

(In reply to Yedidyah Bar David from comment #0)
> Description of problem:
> 
> $subject
> 
> Version-Release number of selected component (if applicable):
> 
> Red Hat Enterprise Virtualization Hypervisor release 7.2 (20160711.0.el7ev)
> 
> How reproducible:
> 
> Not sure
> 
> Steps to Reproduce:
> 1. Install and deploy hosted-engine 3.5 on two hosts with rhel6

Hi  Yedidyah Bar David,

Could you offer hosted-engine 3.5 ova build version please? 
We tried to reproduce this bug with 20160222.0-1.3.5.ova on RHEV-H release 7.2 for 3.6.8 (20160711.0.el7ev), but deployment failed: the host (7.2_3.6.8) cannot come up in hosted-engine 3.5 because the network configuration is inconsistent between 3.5 and 3.6.

Thanks!

Comment 6 Yedidyah Bar David 2016-10-13 08:04:47 UTC
(In reply to Huijuan Zhao from comment #5)
> Hi  Yedidyah Bar David,
> 
> Could you offer hosted-engine 3.5 ova build version please? 

In initial 3.5 setup I didn't use an appliance or node (rhev-h image). Both engine and hosts were plain rhel6.

> We tried to reproduce this bug with 20160222.0-1.3.5.ova on RHEV-H release
> 7.2 for 3.6.8 (20160711.0.el7ev), but deployment failed: the host
> (7.2_3.6.8) cannot come up in hosted-engine 3.5 because the network
> configuration is inconsistent between 3.5 and 3.6.

Not sure I fully understand. What exact flow did you follow, and what error did you get?

Generally speaking, the flow described in the bug did work for me, when applying the workaround in comment 2.

Comment 7 Huijuan Zhao 2016-10-13 10:04:51 UTC
(In reply to Yedidyah Bar David from comment #6)
> In initial 3.5 setup I didn't use an appliance or node (rhev-h image). Both
> engine and hosts were plain rhel6.
>
> Not sure I fully understand. What exact flow did you follow, and what error
> did you get?
> 
> Generally speaking, the flow described in the bug did work for me, when
> applying the workaround in comment 2.

Thanks for your explanation. I mistakenly thought that you had deployed rhevm-appliance-3.5.ova on RHEVH 3.6 20160711.0.el7ev, so I used those builds to reproduce, and that combination is not supported. Sorry for that.

I will reproduce this bug according to comment 0 with rhevm-appliance-3.6.ova on RHEVH 3.6 20160711.0.el7ev later.

Comment 8 Douglas Schilling Landgraf 2016-10-13 17:52:36 UTC
Hi Huijuan,

Could you please test the following scenario? To me, ovirt-post service is reconfiguring the dns and removing 'nameserver 192.168.123.1' from resolv.conf like Didi reported. This will also happen in the upgrade scenario, as it requires a reboot.

 - Install: rhev-hypervisor7-7.2-20160219.0.iso
 - setup DHCP network
 - Press F2, check /etc/resolv.conf
   # cat /etc/resolv.conf
   <save the result in a place for later comparison>
 - Reboot
 - In TUI it should show network as configured
 - Press F2 to go again to shell
 - Compare the resolv.conf you have now with previous
   # cat /etc/resolv.conf

Comment 9 Huijuan Zhao 2016-10-14 03:20:41 UTC
(In reply to Douglas Schilling Landgraf from comment #8)
> Hi Huijuan,
> 
> Could you please test the following scenario? To me, ovirt-post service is
> reconfiguring the dns and removing 'nameserver 192.168.123.1' from
> resolv.conf like Didi reported. This will also happen in the upgrade
> scenario, as it requires a reboot.
> 
>  - Install: rhev-hypervisor7-7.2-20160219.0.iso
>  - setup DHCP network
>  - Press F2, check /etc/resolv.conf
>    # cat /etc/resolv.conf
>    <save the result in a place for later comparison>
>  - Reboot
>  - In TUI it should show network as configured
>  - Press F2 to go again to shell
>  - Compare the resolv.conf you have now with previous
>    # cat /etc/resolv.conf

Hi Douglas,

I tested according to your above steps, result is the same as you predicted, detailed info as below.

Test steps:
1. Install: rhev-hypervisor7-7.2-20160219.0.iso
2. setup DHCP network
3. Press F2, check /etc/resolv.conf
------------------------------
# cat /etc/resolv.conf
; generated by /usr/sbin/dhclient-script
search nay.redhat.com. redhat.com.
nameserver 10.72.17.5
nameserver 10.68.5.26
------------------------------
4. Reboot host
5. In TUI it shows network as configured
6. Press F2 to go again to shell
7. Compare the resolv.conf I have now with previous
------------------------------
# cat /etc/resolv.conf
; Please make changes through the TUI or management server. Manual edits to this file will be lost on reboot
search nay.redhat.com. redhat.com.
nameserver 10.68.5.26
------------------------------

Test results:
Compared /etc/resolv.conf in Step3 and Step7, "nameserver 10.72.17.5" disappeared!
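
The Step 3 / Step 7 comparison can also be done mechanically. This is a small sketch using Python's standard difflib module. The helper name resolv_diff and the before/after labels are illustrative only, and the two snapshots are the outputs pasted above.

```python
import difflib


def resolv_diff(before_text, after_text):
    """Return unified-diff lines between two resolv.conf snapshots."""
    return list(
        difflib.unified_diff(
            before_text.splitlines(),
            after_text.splitlines(),
            fromfile="resolv.conf (before reboot)",
            tofile="resolv.conf (after reboot)",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Snapshot taken in Step 3, before the reboot.
    before = (
        "; generated by /usr/sbin/dhclient-script\n"
        "search nay.redhat.com. redhat.com.\n"
        "nameserver 10.72.17.5\n"
        "nameserver 10.68.5.26\n"
    )
    # Snapshot taken in Step 7, after the reboot.
    after = (
        "; Please make changes through the TUI or management server. "
        "Manual edits to this file will be lost on reboot\n"
        "search nay.redhat.com. redhat.com.\n"
        "nameserver 10.68.5.26\n"
    )
    for line in resolv_diff(before, after):
        print(line)
```

Lines starting with "-" exist only in the pre-reboot file, so the dropped nameserver shows up as a removed line next to the changed header comment.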

Comment 10 Huijuan Zhao 2016-10-14 03:23:30 UTC
Douglas, is it enough to reproduce this bug as comment 9 ? I do not have to reproduce comment 0, is it right?
Thanks

Comment 11 Douglas Schilling Landgraf 2016-10-18 04:40:50 UTC
(In reply to Huijuan Zhao from comment #10)
> Douglas, is it enough to reproduce this bug as comment 9 ? I do not have to
> reproduce comment 0, is it right?
> Thanks

That's the way I could reproduce what I believe is the reported issue. For now, I am discussing this one with Fabian. Even setting PEERDNS="no" didn't help, as the init script calls configure_dns() with no args and resets /etc/resolv.conf.

We could check if users set PEERDNS="no" in ifcfg and do not reset /etc/resolv.conf in the "force mode" as we do today.
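
A rough sketch of that check, for illustration only. It is not the actual ovirt-node code. The helper names ifcfg_value and should_reset_resolv_conf are hypothetical, and the sketch assumes the ifcfg file holds simple shell-style KEY=value assignments.

```python
import shlex


def ifcfg_value(ifcfg_text, key):
    """Return the last value assigned to key in ifcfg-style text, or None."""
    value = None
    for line in ifcfg_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, raw = line.partition("=")
        if name.strip() != key:
            continue
        try:
            # shlex strips quotes and trailing comments, as a shell would.
            parts = shlex.split(raw, comments=True)
        except ValueError:
            # Unbalanced quotes: fall back to the raw text.
            parts = [raw.strip()]
        value = parts[0] if parts else ""
    return value


def should_reset_resolv_conf(ifcfg_text):
    """Assumed policy: skip the forced reset when PEERDNS is set to 'no'."""
    peerdns = ifcfg_value(ifcfg_text, "PEERDNS")
    return not (peerdns is not None and peerdns.lower() == "no")


if __name__ == "__main__":
    sample = 'DEVICE=rhevm\nBOOTPROTO=dhcp\nPEERDNS="no"\n'
    print(should_reset_resolv_conf(sample))  # False
```

With a guard like this, the init path would leave /etc/resolv.conf alone only when the user has explicitly opted out, and would keep today's reset behaviour in every other case.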

Comment 15 Douglas Schilling Landgraf 2017-01-31 14:43:41 UTC
This behaviour is specific to RHEV-H 3.x. RHVH 4.x doesn't have the same boot sequence and is not affected by this report.

The workaround for 3.6 is changing the resolv.conf manually.