Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 2003231

Summary: anything above 19.2 of cloud init it fails to assign default route and connect to the meta data service
Product: Red Hat Enterprise Linux 7 Reporter: Jean-Francois Beaudoin <jbeaudoi>
Component: cloud-initAssignee: Emanuele Giuseppe Esposito <eesposit>
Status: CLOSED ERRATA QA Contact: Huijuan Zhao <huzhao>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.9CC: adevolder, bgalvani, eesposit, eterrell, huzhao, jgreguske, jreznik, ldu, lrintel, mrezanin, nm-team, pveiga, ribarry, thaller, till, xiachen, xiliang, yacao, yuxisun
Target Milestone: rcKeywords: Triaged, ZStream
Target Release: ---Flags: pm-rhel: mirror+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: cloud-init-19.4-7.el7_9.6 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-02-22 17:03:06 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2028028, 2028031    

Description Jean-Francois Beaudoin 2021-09-10 17:35:10 UTC
Description of problem:
Anything above 19.2 of cloud init it fails to assign default route and connect to the meta data service

Version-Release number of selected component (if applicable):
cloud-init-20.3-10.el8_4.2.noarch                           Tue May  4 13:20:11 2021
cloud-utils-growpart-0.31-1.el8.noarch                      Tue May  4 13:20:14 2021


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:
Being able to assign route and connect to metadata service.


Additional info:
We could also confirm the same with RHEL 8.4

Observed in cloud-init log;
~~~
2021-09-09 19:28:34,077 - util.py[DEBUG]: Read 297 bytes from /proc/855/stat
2021-09-09 19:28:34,077 - dhcp.py[DEBUG]: killing dhclient with pid=855
2021-09-09 19:28:34,077 - util.py[DEBUG]: Reading from /var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhcp.leases (quiet=False)
2021-09-09 19:28:34,078 - util.py[DEBUG]: Read 640 bytes from /var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhcp.leases
2021-09-09 19:28:34,078 - dhcp.py[DEBUG]: Received dhcp lease on eth0 for 10.112.140.248/255.255.252.0
2021-09-09 19:28:34,079 - __init__.py[DEBUG]: Attempting setup of ephemeral network on eth0 with 10.112.140.248/22 brd 10.112.143.255
2021-09-09 19:28:34,079 - subp.py[DEBUG]: Running command ['ip', '-family', 'inet', 'addr', 'add', '10.112.140.248/22', 'broadcast', '10.112.143.255', 'dev', 'eth0'] with allowed return codes [0] (shell=False, capture=True)
2021-09-09 19:28:34,083 - subp.py[DEBUG]: Running command ['ip', '-family', 'inet', 'link', 'set', 'dev', 'eth0', 'up'] with allowed return codes [0] (shell=False, capture=True)
2021-09-09 19:28:34,087 - subp.py[DEBUG]: Running command ['ip', '-4', 'route', 'add', '169.254.169.254/31', 'via', '0.0.0.0', 'dev', 'eth0'] with allowed return codes [0] (shell=False, capture=True)
2021-09-09 19:28:34,089 - subp.py[DEBUG]: Running command ['ip', '-4', 'route', 'add', '169.254.169.254/31', 'via', '10.112.140.248', 'dev', 'eth0'] with allowed return codes [0] (shell=False, capture=True)
2021-09-09 19:28:34,092 - handlers.py[DEBUG]: finish: init-local/search-OpenStackLocal: FAIL: no local data found from DataSourceOpenStackLocal
2021-09-09 19:28:34,092 - util.py[WARNING]: Getting data from <class 'cloudinit.sources.DataSourceOpenStack.DataSourceOpenStackLocal'> failed
2021-09-09 19:28:34,092 - util.py[DEBUG]: Getting data from <class 'cloudinit.sources.DataSourceOpenStack.DataSourceOpenStackLocal'> failed
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/cloudinit/sources/__init__.py", line 770, in find_source
    if s.update_metadata([EventType.BOOT_NEW_INSTANCE]):
  File "/usr/lib/python3.6/site-packages/cloudinit/sources/__init__.py", line 659, in update_metadata
    result = self.get_data()
  File "/usr/lib/python3.6/site-packages/cloudinit/sources/__init__.py", line 278, in get_data
    return_value = self._get_data()
  File "/usr/lib/python3.6/site-packages/cloudinit/sources/DataSourceOpenStack.py", line 134, in _get_data
    with EphemeralDHCPv4(self.fallback_interface):
  File "/usr/lib/python3.6/site-packages/cloudinit/net/dhcp.py", line 57, in __enter__
    return self.obtain_lease()
  File "/usr/lib/python3.6/site-packages/cloudinit/net/dhcp.py", line 110, in obtain_lease
    ephipv4.__enter__()
  File "/usr/lib/python3.6/site-packages/cloudinit/net/__init__.py", line 1004, in __enter__
    self._bringup_static_routes()
  File "/usr/lib/python3.6/site-packages/cloudinit/net/__init__.py", line 1058, in _bringup_static_routes
    ['dev', self.interface], capture=True)
  File "/usr/lib/python3.6/site-packages/cloudinit/subp.py", line 293, in subp
    cmd=args)
cloudinit.subp.ProcessExecutionError: Unexpected error while running command.
Command: ['ip', '-4', 'route', 'add', '169.254.169.254/31', 'via', '10.112.140.248', 'dev', 'eth0']
Exit code: 2
Reason: -
Stdout: 
Stderr: RTNETLINK answers: File exists
2021-09-09 19:28:34,102 - main.py[DEBUG]: No local datasource found
2021-09-09 19:28:34,102 - util.py[DEBUG]: Reading from /sys/class/net/eth0/address (quiet=False)
2021-09-09 19:28:34,103 - util.py[DEBUG]: Read 18 bytes from /sys/class/net/eth0/address
2021-09-09 19:28:34,103 - util.py[DEBUG]: Reading from /sys/class/net/lo/address (quiet=False)
2021-09-09 19:28:34,103 - util.py[DEBUG]: Read 18 bytes from /sys/class/net/lo/address
2021-09-09 19:28:34,103 - __init__.py[DEBUG]: Stable ifnames disabled by net.ifnames=0 in /proc/cmdline
2021-09-09 19:28:34,103 - util.py[DEBUG]: Reading from /sys/class/net/eth0/carrier (quiet=False)
2021-09-09 19:28:34,103 - util.py[DEBUG]: Read 2 bytes from /sys/class/net/eth0/carrier
~~~

Comment 6 Huijuan Zhao 2021-09-30 08:02:35 UTC
I agree with Priscila that this is more like a configuration issue, as it is failed when manually setting two routes (the default and a static) to the same IP and interface.

I can not reproduce this issue with normal network(no duplicate routes) with the same cloud-init version as customer(cloud-init-20.3-10.el8_4.2.noarch).

I checked the sosreport-hoontestserver84nodrive-03028339-2021-09-09-prwgwbz.tar.xz customer provided, found the duplicate routes configuration maybe comes from /var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhcp.leases

-----------------------------
2021-09-09 19:28:34,021 - dhcp.py[DEBUG]: Performing a dhcp discovery on eth0
2021-09-09 19:28:34,021 - util.py[DEBUG]: Copying /usr/sbin/dhclient to /var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhclient
2021-09-09 19:28:34,023 - subp.py[DEBUG]: Running command ['ip', 'link', 'set', 'dev', 'eth0', 'up'] with allowed return codes [0] (shell=False, capture=True)
2021-09-09 19:28:34,037 - subp.py[DEBUG]: Running command ['/var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhclient', '-1', '-v', '-lf', '/var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhcp.leases', '-pf', '/var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhclient.pid', 'eth0', '-sf', '/bin/true'] with allowed return codes [0] (shell=False, capture=True)
2021-09-09 19:28:34,066 - util.py[DEBUG]: All files appeared after 0 seconds: ['/var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhclient.pid', '/var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhcp.leases']
2021-09-09 19:28:34,066 - util.py[DEBUG]: Reading from /var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhclient.pid (quiet=False)
2021-09-09 19:28:34,066 - util.py[DEBUG]: Read 4 bytes from /var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhclient.pid
2021-09-09 19:28:34,066 - util.py[DEBUG]: Reading from /proc/852/stat (quiet=True)
2021-09-09 19:28:34,066 - util.py[DEBUG]: Read 0 bytes from /proc/852/stat
2021-09-09 19:28:34,076 - util.py[DEBUG]: Reading from /var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhclient.pid (quiet=False)
2021-09-09 19:28:34,077 - util.py[DEBUG]: Read 4 bytes from /var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhclient.pid
2021-09-09 19:28:34,077 - util.py[DEBUG]: Reading from /proc/855/stat (quiet=True)
2021-09-09 19:28:34,077 - util.py[DEBUG]: Read 297 bytes from /proc/855/stat
2021-09-09 19:28:34,077 - dhcp.py[DEBUG]: killing dhclient with pid=855
2021-09-09 19:28:34,077 - util.py[DEBUG]: Reading from /var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhcp.leases (quiet=False)
2021-09-09 19:28:34,078 - util.py[DEBUG]: Read 640 bytes from /var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhcp.leases
2021-09-09 19:28:34,078 - dhcp.py[DEBUG]: Received dhcp lease on eth0 for 10.112.140.248/255.255.252.0
2021-09-09 19:28:34,079 - __init__.py[DEBUG]: Attempting setup of ephemeral network on eth0 with 10.112.140.248/22 brd 10.112.143.255
2021-09-09 19:28:34,079 - subp.py[DEBUG]: Running command ['ip', '-family', 'inet', 'addr', 'add', '10.112.140.248/22', 'broadcast', '10.112.143.255', 'dev', 'eth0'] with allowed return codes [0] (shell=False, capture=True)
2021-09-09 19:28:34,083 - subp.py[DEBUG]: Running command ['ip', '-family', 'inet', 'link', 'set', 'dev', 'eth0', 'up'] with allowed return codes [0] (shell=False, capture=True)
2021-09-09 19:28:34,087 - subp.py[DEBUG]: Running command ['ip', '-4', 'route', 'add', '169.254.169.254/31', 'via', '0.0.0.0', 'dev', 'eth0'] with allowed return codes [0] (shell=False, capture=True)
2021-09-09 19:28:34,089 - subp.py[DEBUG]: Running command ['ip', '-4', 'route', 'add', '169.254.169.254/31', 'via', '10.112.140.248', 'dev', 'eth0'] with allowed return codes [0] (shell=False, capture=True)
2021-09-09 19:28:34,092 - handlers.py[DEBUG]: finish: init-local/search-OpenStackLocal: FAIL: no local data found from DataSourceOpenStackLocal
2021-09-09 19:28:34,092 - util.py[WARNING]: Getting data from <class 'cloudinit.sources.DataSourceOpenStack.DataSourceOpenStackLocal'> failed
2021-09-09 19:28:34,092 - util.py[DEBUG]: Getting data from <class 'cloudinit.sources.DataSourceOpenStack.DataSourceOpenStackLocal'> failed
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/cloudinit/sources/__init__.py", line 770, in find_source
    if s.update_metadata([EventType.BOOT_NEW_INSTANCE]):
  File "/usr/lib/python3.6/site-packages/cloudinit/sources/__init__.py", line 659, in update_metadata
    result = self.get_data()
  File "/usr/lib/python3.6/site-packages/cloudinit/sources/__init__.py", line 278, in get_data
    return_value = self._get_data()
  File "/usr/lib/python3.6/site-packages/cloudinit/sources/DataSourceOpenStack.py", line 134, in _get_data
    with EphemeralDHCPv4(self.fallback_interface):
  File "/usr/lib/python3.6/site-packages/cloudinit/net/dhcp.py", line 57, in __enter__
    return self.obtain_lease()
  File "/usr/lib/python3.6/site-packages/cloudinit/net/dhcp.py", line 110, in obtain_lease
    ephipv4.__enter__()
  File "/usr/lib/python3.6/site-packages/cloudinit/net/__init__.py", line 1004, in __enter__
    self._bringup_static_routes()
  File "/usr/lib/python3.6/site-packages/cloudinit/net/__init__.py", line 1058, in _bringup_static_routes
    ['dev', self.interface], capture=True)
  File "/usr/lib/python3.6/site-packages/cloudinit/subp.py", line 293, in subp
    cmd=args)
cloudinit.subp.ProcessExecutionError: Unexpected error while running command.
Command: ['ip', '-4', 'route', 'add', '169.254.169.254/31', 'via', '10.112.140.248', 'dev', 'eth0']
Exit code: 2
Reason: -
Stdout:
Stderr: RTNETLINK answers: File exists
-----------------------------


I also checked the cloud-init.log_18.5 which does not has this issue, there is no actions of config duplicate routes.
--------------------------------
2021-09-15 18:50:54,705 - util.py[DEBUG]: Read 568 bytes from /var/tmp/cloud-init/cloud-init-dhcp-lDKZf8/dhcp.leases
2021-09-15 18:50:54,706 - dhcp.py[DEBUG]: Received dhcp lease on eth0 for 10.112.143.127/255.255.252.0
2021-09-15 18:50:54,706 - __init__.py[DEBUG]: Attempting setup of ephemeral network on eth0 with 10.112.143.127/22 brd 10.112.143.255
2021-09-15 18:50:54,706 - util.py[DEBUG]: Running command ['ip', '-family', 'inet', 'addr', 'add', '10.112.143.127/22', 'broadcast', '10.112.143.255', 'dev', 'eth0'] with allowed return codes [0] (shell=False, capture=True)
2021-09-15 18:50:54,714 - util.py[DEBUG]: Running command ['ip', '-family', 'inet', 'link', 'set', 'dev', 'eth0', 'up'] with allowed return codes [0] (shell=False, capture=True)
2021-09-15 18:50:54,719 - util.py[DEBUG]: Resolving URL: http://169.254.169.254 took 0.002 seconds
--------------------------------


@jbeaudoi, could you please help to check and provide the file /var/tmp/cloud-init/cloud-init-dhcp-f0rie5tm/dhcp.leases? As no this file in sosreport.
And could you please also check/provide the below file/config:
1. The file /run/cloud-init/instance-data.json
2. Are the two systems(cloud-init-20.3-10.el8_4.2, cloud-init-18.5) using the same network during instances deployment?

Thanks!

Comment 15 Beniamino Galvani 2021-10-25 11:50:47 UTC
dhclient applies lease configuration through dhclient-script, and the script calls "ip route replace" instead of "ip route add":

https://src.fedoraproject.org/rpms/dhcp/blob/f33/f/dhclient-script#_474

NetworkManager uses netlink commands equivalent of "ip route append" to configure routes, and has additional logic to delete unwanted existing routes.

Comment 16 Huijuan Zhao 2021-10-26 01:37:14 UTC
(In reply to Beniamino Galvani from comment #15)
> dhclient applies lease configuration through dhclient-script, and the script
> calls "ip route replace" instead of "ip route add":
> 
> https://src.fedoraproject.org/rpms/dhcp/blob/f33/f/dhclient-script#_474
> 
> NetworkManager uses netlink commands equivalent of "ip route append" to
> configure routes, and has additional logic to delete unwanted existing
> routes.

Thank you so much Beniamino and Till.

Eduardo, could cloud-init refer to the solution Beniamino provided? 
How about using "ip route replace" instead of "ip route add", and adding additional logic to handle the specific routes such as 0.0.0.0 or unwanted existing routes(e.g. when the same destination has gw 0.0.0.0 and other route, choose other route instead of 0.0.0.0)

Comment 43 Huijuan Zhao 2022-01-26 03:07:34 UTC
Move to VERIFIED per comment 40

Comment 48 errata-xmlrpc 2022-02-22 17:03:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (cloud-init bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0626