Bug 1613094
| Summary: | 99-origin-dns.sh cannot handle unexpected order of ip command columns | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Daein Park <dapark> |
| Component: | Installer | Assignee: | Michael Gugino <mgugino> |
| Status: | CLOSED ERRATA | QA Contact: | Johnny Liu <jialiu> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.9.0 | CC: | aos-bugs, dapark, jokerman, mmccomas |
| Target Milestone: | --- | | |
| Target Release: | 3.11.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-10-11 07:24:00 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Daein Park
2018-08-07 00:57:58 UTC
The above workaround has a typo; the following one is correct:

```
def_route_ip=$(/sbin/ip route get to ${def_route} | awk -F 'src' '{print $2}' | awk '{print $1}')
```

And I've opened a PR here: https://github.com/openshift/openshift-ansible/pull/9448

To avoid issues caused by depending on the output column order, we should filter for the required values by field name.

Do you know what needs to be done to get a host into the condition where it fails? I'm not familiar enough with routing tables to know what introduced the problematic line, but it would be helpful for QE to verify the fix.

@Scott, that's a good point. The usual "ip route get to" output format is as follows:

```
# /sbin/ip route get to 10.0.1.68
10.0.1.68 dev eth0 src 10.0.1.68
    cache

--- return the network interface
# /sbin/ip route get to 10.0.1.68 | awk '{print $3}'
eth0

--- return the src ip address (the eth0 ip address)
# /sbin/ip route get to 10.0.1.68 | awk '{print $5}'
10.0.1.68
```

My case is as follows:

```
# /sbin/ip route get to 10.0.1.68
local 10.0.1.68 dev lo src 10.0.1.68
    cache <local>

--- not returning the device name
# /sbin/ip route get to 10.0.1.68 | awk '{print $3}'
dev

--- not returning the ip address of the device
# /sbin/ip route get to 10.0.1.68 | awk '{print $5}'
src
```

Should be in openshift-ansible-3.11.0-0.15.0

Going through all the comments, this seems related to the user's specific network environment: the user is running a node install on a host whose eth0 is the gateway of the local network. In QE's environment, the network gateway is always located elsewhere (out of my control), and we never run a node install on the network gateway. I am not a network expert, so I have no way to run a real verification against an environment just like the customer's. I only ran some regression testing against the latest playbook to avoid introducing new install issues, and the results look good to me.
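The order-independent parsing adopted by the PR can be sketched as below. The sample route lines are taken from the two cases above; `parse_dev` and `parse_src` are illustrative helper names for this sketch, not the actual function names used in 99-origin-dns.sh:

```shell
# Two possible outputs of `ip route get to <gw>` (from the comments above):
normal='10.0.1.68 dev eth0 src 10.0.1.68'
local_route='local 10.0.1.68 dev lo src 10.0.1.68'

# Positional parsing breaks when the word order shifts:
echo "$normal" | awk '{print $3}'        # eth0 (correct)
echo "$local_route" | awk '{print $3}'   # dev  (wrong)

# Keyed parsing: split on the field name, then take the next token.
parse_dev() { awk -F 'dev' '{print $2}' | head -n1 | awk '{print $1}'; }
parse_src() { awk -F 'src' '{print $2}' | head -n1 | awk '{print $1}'; }

echo "$normal" | parse_dev        # eth0
echo "$local_route" | parse_dev   # lo
echo "$local_route" | parse_src   # 10.0.1.68
```

Splitting on the literal field names `dev` and `src` yields the token that follows them regardless of where they appear in the line, which is exactly why both output variants parse correctly.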
Version: openshift-ansible-3.11.0-0.16.0.git.0.e82689aNone.noarch

In my env, my eth0 ip is "172.18.14.180", and the Gateway is "172.18.0.1":

```
[root@ip-172-18-14-180 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
    link/ether 0e:d7:a3:c0:67:28 brd ff:ff:ff:ff:ff:ff
    inet 172.18.14.180/20 brd 172.18.15.255 scope global noprefixroute dynamic eth0
       valid_lft 2651sec preferred_lft 2651sec
    inet6 fe80::cd7:a3ff:fec0:6728/64 scope link
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:a5:71:e3:d4 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever

[root@ip-172-18-14-180 ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.18.0.1      0.0.0.0         UG    100    0        0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
172.18.0.0      0.0.0.0         255.255.240.0   U     100    0        0 eth0

[root@ip-172-18-14-180 ~]# cat /etc/resolv.conf
# nameserver updated by /etc/NetworkManager/dispatcher.d/99-origin-dns.sh
# Generated by NetworkManager
search cluster.local ec2.internal
nameserver 172.18.14.180
```

/etc/resolv.conf is updated and pointing to the node dnsmasq successfully. On this env, run some commands to add the Gateway IP to the eth0 interface to emulate the customer env:
```
[root@ip-172-18-14-180 ~]# ip addr add 172.18.0.1 dev eth0
[root@ip-172-18-14-180 ~]# ip addr
<--snip-->
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
    link/ether 0e:d7:a3:c0:67:28 brd ff:ff:ff:ff:ff:ff
    inet 172.18.14.180/20 brd 172.18.15.255 scope global noprefixroute dynamic eth0
       valid_lft 2561sec preferred_lft 2561sec
    inet 172.18.0.1/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::cd7:a3ff:fec0:6728/64 scope link
       valid_lft forever preferred_lft forever
<--snip-->

[root@ip-172-18-14-180 ~]# /sbin/ip route get to 172.18.0.1
local 172.18.0.1 dev lo src 172.18.0.1
    cache <local>
```

Now the output is the same as in the user's case. After restarting NetworkManager, /etc/resolv.conf is updated, but it does not point to the node dnsmasq as the expected result:

```
[root@ip-172-18-14-180 ~]# cat /etc/resolv.conf
# Generated by NetworkManager
search ec2.internal
nameserver 172.18.0.2
```

Adding some echo lines into /etc/NetworkManager/dispatcher.d/99-origin-dns.sh to do some debugging:

```
<--snip-->
def_route=$(/sbin/ip route list match 0.0.0.0/0 | awk '{print $3 }')
def_route_int=$(/sbin/ip route get to ${def_route} | awk -F 'dev' '{print $2}' | head -n1 | awk '{print $1}')
def_route_ip=$(/sbin/ip route get to ${def_route} | awk -F 'src' '{print $2}' | head -n1 | awk '{print $1}')
echo "def_route_int=${def_route_int} def_route_ip=${def_route_ip} DEVICE_IFACE=${DEVICE_IFACE}" >/tmp/test
<--snip-->

[root@ip-172-18-14-180 ~]# cat /tmp/test
def_route_int=lo def_route_ip=172.18.0.1 DEVICE_IFACE=eth0
```

The PR is working as expected, but ${DEVICE_IFACE} != ${def_route_int}, which causes the following code to be skipped, so /etc/resolv.conf is not updated at all.
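The skip QE observed comes from a guard of roughly the following shape (a simplified sketch for illustration, not the verbatim dispatcher script): the script only rewrites resolv.conf when the interface NetworkManager dispatched the event for matches the default-route interface.

```shell
# Simplified sketch of the interface guard (illustrative, not verbatim
# 99-origin-dns.sh code). NetworkManager sets DEVICE_IFACE to the interface
# that triggered the event; def_route_int is parsed from `ip route get`.
DEVICE_IFACE=eth0   # set by NetworkManager in the real script
def_route_int=lo    # what `ip route get` returned in the emulated env

if [ "${DEVICE_IFACE}" = "${def_route_int}" ]; then
    echo "would update /etc/resolv.conf"
else
    echo "skipping: ${DEVICE_IFACE} != ${def_route_int}"
fi
```

In the emulated environment the route lookup resolves to `lo`, so the comparison fails and the resolv.conf update branch never runs, which matches the debug output above.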
I am not sure what the network environment is in the customer case; it can only be double-confirmed by the customer (according to the initial report and comment 1, the PR seems to be working well against the customer env). The PR is working as the reporter expects, and no regression is introduced, so I am moving this bug to VERIFIED. If that is not the case, feel free to move it back and provide more info about how to re-create such a special network env.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652