Bug 2003231
| Summary: | anything above 19.2 of cloud init it fails to assign default route and connect to the meta data service | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Jean-Francois Beaudoin <jbeaudoi> |
| Component: | cloud-init | Assignee: | Emanuele Giuseppe Esposito <eesposit> |
| Status: | CLOSED ERRATA | QA Contact: | Huijuan Zhao <huzhao> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 7.9 | CC: | adevolder, bgalvani, eesposit, eterrell, huzhao, jgreguske, jreznik, ldu, lrintel, mrezanin, nm-team, pveiga, ribarry, thaller, till, xiachen, xiliang, yacao, yuxisun |
| Target Milestone: | rc | Keywords: | Triaged, ZStream |
| Target Release: | --- | Flags: | pm-rhel:
mirror+
|
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | cloud-init-19.4-7.el7_9.6 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-02-22 17:03:06 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 2028028, 2028031 | ||
I agree with Priscila that this is more like a configuration issue, as it is failed when manually setting two routes (the default and a static) to the same IP and interface.
I can not reproduce this issue with normal network(no duplicate routes) with the same cloud-init version as customer(cloud-init-20.3-10.el8_4.2.noarch).
I checked the sosreport-hoontestserver84nodrive-03028339-2021-09-09-prwgwbz.tar.xz customer provided, found the duplicate routes configuration maybe comes from /var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhcp.leases
-----------------------------
2021-09-09 19:28:34,021 - dhcp.py[DEBUG]: Performing a dhcp discovery on eth0
2021-09-09 19:28:34,021 - util.py[DEBUG]: Copying /usr/sbin/dhclient to /var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhclient
2021-09-09 19:28:34,023 - subp.py[DEBUG]: Running command ['ip', 'link', 'set', 'dev', 'eth0', 'up'] with allowed return codes [0] (shell=False, capture=True)
2021-09-09 19:28:34,037 - subp.py[DEBUG]: Running command ['/var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhclient', '-1', '-v', '-lf', '/var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhcp.leases', '-pf', '/var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhclient.pid', 'eth0', '-sf', '/bin/true'] with allowed return codes [0] (shell=False, capture=True)
2021-09-09 19:28:34,066 - util.py[DEBUG]: All files appeared after 0 seconds: ['/var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhclient.pid', '/var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhcp.leases']
2021-09-09 19:28:34,066 - util.py[DEBUG]: Reading from /var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhclient.pid (quiet=False)
2021-09-09 19:28:34,066 - util.py[DEBUG]: Read 4 bytes from /var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhclient.pid
2021-09-09 19:28:34,066 - util.py[DEBUG]: Reading from /proc/852/stat (quiet=True)
2021-09-09 19:28:34,066 - util.py[DEBUG]: Read 0 bytes from /proc/852/stat
2021-09-09 19:28:34,076 - util.py[DEBUG]: Reading from /var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhclient.pid (quiet=False)
2021-09-09 19:28:34,077 - util.py[DEBUG]: Read 4 bytes from /var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhclient.pid
2021-09-09 19:28:34,077 - util.py[DEBUG]: Reading from /proc/855/stat (quiet=True)
2021-09-09 19:28:34,077 - util.py[DEBUG]: Read 297 bytes from /proc/855/stat
2021-09-09 19:28:34,077 - dhcp.py[DEBUG]: killing dhclient with pid=855
2021-09-09 19:28:34,077 - util.py[DEBUG]: Reading from /var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhcp.leases (quiet=False)
2021-09-09 19:28:34,078 - util.py[DEBUG]: Read 640 bytes from /var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhcp.leases
2021-09-09 19:28:34,078 - dhcp.py[DEBUG]: Received dhcp lease on eth0 for 10.112.140.248/255.255.252.0
2021-09-09 19:28:34,079 - __init__.py[DEBUG]: Attempting setup of ephemeral network on eth0 with 10.112.140.248/22 brd 10.112.143.255
2021-09-09 19:28:34,079 - subp.py[DEBUG]: Running command ['ip', '-family', 'inet', 'addr', 'add', '10.112.140.248/22', 'broadcast', '10.112.143.255', 'dev', 'eth0'] with allowed return codes [0] (shell=False, capture=True)
2021-09-09 19:28:34,083 - subp.py[DEBUG]: Running command ['ip', '-family', 'inet', 'link', 'set', 'dev', 'eth0', 'up'] with allowed return codes [0] (shell=False, capture=True)
2021-09-09 19:28:34,087 - subp.py[DEBUG]: Running command ['ip', '-4', 'route', 'add', '169.254.169.254/31', 'via', '0.0.0.0', 'dev', 'eth0'] with allowed return codes [0] (shell=False, capture=True)
2021-09-09 19:28:34,089 - subp.py[DEBUG]: Running command ['ip', '-4', 'route', 'add', '169.254.169.254/31', 'via', '10.112.140.248', 'dev', 'eth0'] with allowed return codes [0] (shell=False, capture=True)
2021-09-09 19:28:34,092 - handlers.py[DEBUG]: finish: init-local/search-OpenStackLocal: FAIL: no local data found from DataSourceOpenStackLocal
2021-09-09 19:28:34,092 - util.py[WARNING]: Getting data from <class 'cloudinit.sources.DataSourceOpenStack.DataSourceOpenStackLocal'> failed
2021-09-09 19:28:34,092 - util.py[DEBUG]: Getting data from <class 'cloudinit.sources.DataSourceOpenStack.DataSourceOpenStackLocal'> failed
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/cloudinit/sources/__init__.py", line 770, in find_source
if s.update_metadata([EventType.BOOT_NEW_INSTANCE]):
File "/usr/lib/python3.6/site-packages/cloudinit/sources/__init__.py", line 659, in update_metadata
result = self.get_data()
File "/usr/lib/python3.6/site-packages/cloudinit/sources/__init__.py", line 278, in get_data
return_value = self._get_data()
File "/usr/lib/python3.6/site-packages/cloudinit/sources/DataSourceOpenStack.py", line 134, in _get_data
with EphemeralDHCPv4(self.fallback_interface):
File "/usr/lib/python3.6/site-packages/cloudinit/net/dhcp.py", line 57, in __enter__
return self.obtain_lease()
File "/usr/lib/python3.6/site-packages/cloudinit/net/dhcp.py", line 110, in obtain_lease
ephipv4.__enter__()
File "/usr/lib/python3.6/site-packages/cloudinit/net/__init__.py", line 1004, in __enter__
self._bringup_static_routes()
File "/usr/lib/python3.6/site-packages/cloudinit/net/__init__.py", line 1058, in _bringup_static_routes
['dev', self.interface], capture=True)
File "/usr/lib/python3.6/site-packages/cloudinit/subp.py", line 293, in subp
cmd=args)
cloudinit.subp.ProcessExecutionError: Unexpected error while running command.
Command: ['ip', '-4', 'route', 'add', '169.254.169.254/31', 'via', '10.112.140.248', 'dev', 'eth0']
Exit code: 2
Reason: -
Stdout:
Stderr: RTNETLINK answers: File exists
-----------------------------
I also checked the cloud-init.log_18.5 which does not has this issue, there is no actions of config duplicate routes.
--------------------------------
2021-09-15 18:50:54,705 - util.py[DEBUG]: Read 568 bytes from /var/tmp/cloud-init/cloud-init-dhcp-lDKZf8/dhcp.leases
2021-09-15 18:50:54,706 - dhcp.py[DEBUG]: Received dhcp lease on eth0 for 10.112.143.127/255.255.252.0
2021-09-15 18:50:54,706 - __init__.py[DEBUG]: Attempting setup of ephemeral network on eth0 with 10.112.143.127/22 brd 10.112.143.255
2021-09-15 18:50:54,706 - util.py[DEBUG]: Running command ['ip', '-family', 'inet', 'addr', 'add', '10.112.143.127/22', 'broadcast', '10.112.143.255', 'dev', 'eth0'] with allowed return codes [0] (shell=False, capture=True)
2021-09-15 18:50:54,714 - util.py[DEBUG]: Running command ['ip', '-family', 'inet', 'link', 'set', 'dev', 'eth0', 'up'] with allowed return codes [0] (shell=False, capture=True)
2021-09-15 18:50:54,719 - util.py[DEBUG]: Resolving URL: http://169.254.169.254 took 0.002 seconds
--------------------------------
@jbeaudoi, could you please help to check and provide the file /var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhcp.leases? As no this file in sosreport.
And could you please also check/provide the below file/config:
1. The file /run/cloud-init/instance-data.json
2. Are the two systems(cloud-init-20.3-10.el8_4.2, cloud-init-18.5) using the same network during instances deployment?
Thanks!
dhclient applies lease configuration through dhclient-script, and the script calls "ip route replace" instead of "ip route add": https://src.fedoraproject.org/rpms/dhcp/blob/f33/f/dhclient-script#_474 NetworkManager uses netlink commands equivalent of "ip route append" to configure routes, and has additional logic to delete unwanted existing routes. (In reply to Beniamino Galvani from comment #15) > dhclient applies lease configuration through dhclient-script, and the script > calls "ip route replace" instead of "ip route add": > > https://src.fedoraproject.org/rpms/dhcp/blob/f33/f/dhclient-script#_474 > > NetworkManager uses netlink commands equivalent of "ip route append" to > configure routes, and has additional logic to delete unwanted existing > routes. Thank you so much Beniamino and Till. Eduardo, could cloud-init refer to the solution Beniamino provided? How about using "ip route replace" instead of "ip route add", and adding additional logic to handle the specific routes such as 0.0.0.0 or unwanted existing routes(e.g. when the same destination has gw 0.0.0.0 and other route, choose other route instead of 0.0.0.0) Move to VERIFIED per comment 40 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (cloud-init bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:0626 |
Description of problem: Anything above 19.2 of cloud init it fails to assign default route and connect to the meta data service Version-Release number of selected component (if applicable): cloud-init-20.3-10.el8_4.2.noarch Tue May 4 13:20:11 2021 cloud-utils-growpart-0.31-1.el8.noarch Tue May 4 13:20:14 2021 How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Being able to assign route and connect to metadata service. Additional info: We could also confirm the same with RHEL 8.4 Observed in cloud-init log; ~~~ 2021-09-09 19:28:34,077 - util.py[DEBUG]: Read 297 bytes from /proc/855/stat 2021-09-09 19:28:34,077 - dhcp.py[DEBUG]: killing dhclient with pid=855 2021-09-09 19:28:34,077 - util.py[DEBUG]: Reading from /var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhcp.leases (quiet=False) 2021-09-09 19:28:34,078 - util.py[DEBUG]: Read 640 bytes from /var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhcp.leases 2021-09-09 19:28:34,078 - dhcp.py[DEBUG]: Received dhcp lease on eth0 for 10.112.140.248/255.255.252.0 2021-09-09 19:28:34,079 - __init__.py[DEBUG]: Attempting setup of ephemeral network on eth0 with 10.112.140.248/22 brd 10.112.143.255 2021-09-09 19:28:34,079 - subp.py[DEBUG]: Running command ['ip', '-family', 'inet', 'addr', 'add', '10.112.140.248/22', 'broadcast', '10.112.143.255', 'dev', 'eth0'] with allowed return codes [0] (shell=False, capture=True) 2021-09-09 19:28:34,083 - subp.py[DEBUG]: Running command ['ip', '-family', 'inet', 'link', 'set', 'dev', 'eth0', 'up'] with allowed return codes [0] (shell=False, capture=True) 2021-09-09 19:28:34,087 - subp.py[DEBUG]: Running command ['ip', '-4', 'route', 'add', '169.254.169.254/31', 'via', '0.0.0.0', 'dev', 'eth0'] with allowed return codes [0] (shell=False, capture=True) 2021-09-09 19:28:34,089 - subp.py[DEBUG]: Running command ['ip', '-4', 'route', 'add', '169.254.169.254/31', 'via', '10.112.140.248', 'dev', 'eth0'] with allowed return codes [0] (shell=False, capture=True) 2021-09-09 19:28:34,092 - handlers.py[DEBUG]: finish: init-local/search-OpenStackLocal: FAIL: no local data found from DataSourceOpenStackLocal 2021-09-09 19:28:34,092 - util.py[WARNING]: Getting data from <class 'cloudinit.sources.DataSourceOpenStack.DataSourceOpenStackLocal'> failed 2021-09-09 19:28:34,092 - util.py[DEBUG]: Getting data from <class 'cloudinit.sources.DataSourceOpenStack.DataSourceOpenStackLocal'> failed Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/cloudinit/sources/__init__.py", line 770, in find_source if s.update_metadata([EventType.BOOT_NEW_INSTANCE]): File "/usr/lib/python3.6/site-packages/cloudinit/sources/__init__.py", line 659, in update_metadata result = self.get_data() File "/usr/lib/python3.6/site-packages/cloudinit/sources/__init__.py", line 278, in get_data return_value = self._get_data() File "/usr/lib/python3.6/site-packages/cloudinit/sources/DataSourceOpenStack.py", line 134, in _get_data with EphemeralDHCPv4(self.fallback_interface): File "/usr/lib/python3.6/site-packages/cloudinit/net/dhcp.py", line 57, in __enter__ return self.obtain_lease() File "/usr/lib/python3.6/site-packages/cloudinit/net/dhcp.py", line 110, in obtain_lease ephipv4.__enter__() File "/usr/lib/python3.6/site-packages/cloudinit/net/__init__.py", line 1004, in __enter__ self._bringup_static_routes() File "/usr/lib/python3.6/site-packages/cloudinit/net/__init__.py", line 1058, in _bringup_static_routes ['dev', self.interface], capture=True) File "/usr/lib/python3.6/site-packages/cloudinit/subp.py", line 293, in subp cmd=args) cloudinit.subp.ProcessExecutionError: Unexpected error while running command. Command: ['ip', '-4', 'route', 'add', '169.254.169.254/31', 'via', '10.112.140.248', 'dev', 'eth0'] Exit code: 2 Reason: - Stdout: Stderr: RTNETLINK answers: File exists 2021-09-09 19:28:34,102 - main.py[DEBUG]: No local datasource found 2021-09-09 19:28:34,102 - util.py[DEBUG]: Reading from /sys/class/net/eth0/address (quiet=False) 2021-09-09 19:28:34,103 - util.py[DEBUG]: Read 18 bytes from /sys/class/net/eth0/address 2021-09-09 19:28:34,103 - util.py[DEBUG]: Reading from /sys/class/net/lo/address (quiet=False) 2021-09-09 19:28:34,103 - util.py[DEBUG]: Read 18 bytes from /sys/class/net/lo/address 2021-09-09 19:28:34,103 - __init__.py[DEBUG]: Stable ifnames disabled by net.ifnames=0 in /proc/cmdline 2021-09-09 19:28:34,103 - util.py[DEBUG]: Reading from /sys/class/net/eth0/carrier (quiet=False) 2021-09-09 19:28:34,103 - util.py[DEBUG]: Read 2 bytes from /sys/class/net/eth0/carrier ~~~