Bug 2179718

Summary: [RHEL-9.3][AWS]nm-cloud-setup fail to get EC2 metadata: timeout expired(NetworkManager-cloud-setup)
Product: Red Hat Enterprise Linux 9 Reporter: libhe
Component: NetworkManagerAssignee: Lubomir Rintel <lrintel>
Status: CLOSED ERRATA QA Contact: David Jaša <djasa>
Severity: medium Docs Contact:
Priority: medium    
Version: 9.3CC: bgalvani, djasa, fge, linl, lrintel, rkhan, sfaye, sukulkar, thaller, till, vbenes, xiliang, ymao
Target Milestone: rcKeywords: CustomerScenariosInitiative, Regression, Triaged
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: NetworkManager-1.43.5-1.el9 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-11-07 08:38:04 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description libhe 2023-03-20 03:27:36 UTC
Description of problem:

nm-cloud-setup service fails to get EC2 metadata: timeout expired. As a result, our automation test case os_tests.tests.test_network_test.TestNetworkTest.test_second_ip_hotplug failed to get the second ip.IP.

Below are part of trace logs of nm-cloud-setup service:
"http://169.254.169.254/latest/api/token"]: start get ...
Mar 20 01:52:45 ip-10-116-2-10.us-west-2.compute.internal nm-cloud-setup[1263]: [899.08098] <debug> http-request[d8da25f9ee1b9fbc, "http://169.254.169.254/latest/api/token"]: success getting 0 bytes (response code 405)
Mar 20 01:52:45 ip-10-116-2-10.us-west-2.compute.internal nm-cloud-setup[1263]: [899.08098] <trace> http-request[d8da25f9ee1b9fbc, "http://169.254.169.254/latest/api/token"]: received 0 bytes: [[(null)]]
Mar 20 01:52:46 ip-10-116-2-10.us-west-2.compute.internal nm-cloud-setup[1263]: [900.08109] <debug> http-request[9147277c3b08c9b9, "http://169.254.169.254/latest/api/token"]: start get ...
Mar 20 01:52:46 ip-10-116-2-10.us-west-2.compute.internal nm-cloud-setup[1263]: [900.08117] <debug> http-request[9147277c3b08c9b9, "http://169.254.169.254/latest/api/token"]: success getting 0 bytes (response code 405)
Mar 20 01:52:46 ip-10-116-2-10.us-west-2.compute.internal nm-cloud-setup[1263]: [900.08117] <trace> http-request[9147277c3b08c9b9, "http://169.254.169.254/latest/api/token"]: received 0 bytes: [[(null)]]
Mar 20 01:52:47 ip-10-116-2-10.us-west-2.compute.internal nm-cloud-setup[1263]: [901.08049] <info>  provider ec2 not detected: failure to get EC2 metadata: timeout expired
Mar 20 01:52:47 ip-10-116-2-10.us-west-2.compute.internal nm-cloud-setup[1263]: [901.08051] <info>  no provider detected
Mar 20 01:52:47 ip-10-116-2-10.us-west-2.compute.internal systemd[1]: nm-cloud-setup.service: Deactivated successfully.
Mar 20 01:52:47 ip-10-116-2-10.us-west-2.compute.internal systemd[1]: Finished Automatically configure NetworkManager in cloud.

RHEL Version:
RHEL9.3(5.14.0-285.el9.x86_64)
NetworkManager-cloud-setup-1.43.3-1.el9.x86_64

How reproducible:
100%

Steps to Reproduce:

1. Launch an aws instance with the latest RHEL-9.3 AMI.
2.Enable trace " Environment=NM_CLOUD_SETUP_LOG=TRACE" for nm-cloud-setup service
3. Restart nm-cloud-setup service
4. Check journal log 

Actual results:
"http://169.254.169.254/latest/api/token"]: received 0 bytes: [[(null)]]
Mar 20 01:52:47 ip-10-116-2-10.us-west-2.compute.internal nm-cloud-setup[1263]: [901.08049] <info>  provider ec2 not detected: failure to get EC2 metadata: timeout expired
Mar 20 01:52:47 ip-10-116-2-10.us-west-2.compute.internal nm-cloud-setup[1263]: [901.08051] <info>  no provider detected
Mar 20 01:52:47 ip-10-116-2-10.us-west-2.compute.internal systemd[1]: nm-cloud-setup.service: Deactivated successfully.
Mar 20 01:52:47 ip-10-116-2-10.us-west-2.compute.internal systemd[1]: Finished Automatically configure NetworkManager in cloud.

See more logs at http://10.73.196.244/results/iscsi/ec2_xen/tmo/journal_nm_cloud_setup.log

Expected results:
nm-cloud-setup is able to get EC2 metadata.

Additional info:
- No such issue in RHEL-9.2 with NetworkManager-cloud-setup-1.42.2-1.el9.x86_64

Comment 1 Thomas Haller 2023-03-27 08:38:54 UTC
bug 2181466 is the same issue for rhel-8.9

Comment 4 David Jaša 2023-05-01 19:44:37 UTC
VERIFIED: Fixed in NetworkManager-1.43.6-1.el9 together with bug 2151986, HTTP method PUT is now correctly used:

# tcpdump -i any -w .tmp/ec2.pcap 'host 169.254.169.254
(...)

# tshark -r .tmp/ec2.pcap -V -2 -R http
(...)
Hypertext Transfer Protocol
    PUT /latest/api/token HTTP/1.1\r\n
        [Expert Info (Chat/Sequence): PUT /latest/api/token HTTP/1.1\r\n]
            [PUT /latest/api/token HTTP/1.1\r\n]
            [Severity level: Chat]
            [Group: Sequence]
        Request Method: PUT
        Request URI: /latest/api/token
        Request Version: HTTP/1.1
    Host: 169.254.169.254\r\n
    Accept: */*\r\n
    X-aws-ec2-metadata-token-ttl-seconds: 180\r\n
    \r\n
    [Full request URI: http://169.254.169.254/latest/api/token]
    [HTTP request 1/1]
    [Response in frame: 2]

Comment 8 errata-xmlrpc 2023-11-07 08:38:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (NetworkManager bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:6585