Bug 2179718

Summary: [RHEL-9.3][AWS]nm-cloud-setup fail to get EC2 metadata: timeout expired(NetworkManager-cloud-setup)
Product: Red Hat Enterprise Linux 9 Reporter: libhe
Component: NetworkManagerAssignee: Lubomir Rintel <lrintel>
Status: VERIFIED --- QA Contact: David Jaša <djasa>
Severity: medium Docs Contact:
Priority: medium    
Version: 9.3CC: bgalvani, djasa, linl, lrintel, rkhan, sfaye, sukulkar, thaller, till, vbenes, xiliang, ymao
Target Milestone: rcKeywords: CustomerScenariosInitiative, Regression, Triaged
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: NetworkManager-1.43.5-1.el9 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description libhe 2023-03-20 03:27:36 UTC
Description of problem:

nm-cloud-setup service fails to get EC2 metadata: timeout expired. As a result, our automation test case os_tests.tests.test_network_test.TestNetworkTest.test_second_ip_hotplug failed to get the second ip.IP.

Below are part of trace logs of nm-cloud-setup service:
"http://169.254.169.254/latest/api/token"]: start get ...
Mar 20 01:52:45 ip-10-116-2-10.us-west-2.compute.internal nm-cloud-setup[1263]: [899.08098] <debug> http-request[d8da25f9ee1b9fbc, "http://169.254.169.254/latest/api/token"]: success getting 0 bytes (response code 405)
Mar 20 01:52:45 ip-10-116-2-10.us-west-2.compute.internal nm-cloud-setup[1263]: [899.08098] <trace> http-request[d8da25f9ee1b9fbc, "http://169.254.169.254/latest/api/token"]: received 0 bytes: [[(null)]]
Mar 20 01:52:46 ip-10-116-2-10.us-west-2.compute.internal nm-cloud-setup[1263]: [900.08109] <debug> http-request[9147277c3b08c9b9, "http://169.254.169.254/latest/api/token"]: start get ...
Mar 20 01:52:46 ip-10-116-2-10.us-west-2.compute.internal nm-cloud-setup[1263]: [900.08117] <debug> http-request[9147277c3b08c9b9, "http://169.254.169.254/latest/api/token"]: success getting 0 bytes (response code 405)
Mar 20 01:52:46 ip-10-116-2-10.us-west-2.compute.internal nm-cloud-setup[1263]: [900.08117] <trace> http-request[9147277c3b08c9b9, "http://169.254.169.254/latest/api/token"]: received 0 bytes: [[(null)]]
Mar 20 01:52:47 ip-10-116-2-10.us-west-2.compute.internal nm-cloud-setup[1263]: [901.08049] <info>  provider ec2 not detected: failure to get EC2 metadata: timeout expired
Mar 20 01:52:47 ip-10-116-2-10.us-west-2.compute.internal nm-cloud-setup[1263]: [901.08051] <info>  no provider detected
Mar 20 01:52:47 ip-10-116-2-10.us-west-2.compute.internal systemd[1]: nm-cloud-setup.service: Deactivated successfully.
Mar 20 01:52:47 ip-10-116-2-10.us-west-2.compute.internal systemd[1]: Finished Automatically configure NetworkManager in cloud.

RHEL Version:
RHEL9.3(5.14.0-285.el9.x86_64)
NetworkManager-cloud-setup-1.43.3-1.el9.x86_64

How reproducible:
100%

Steps to Reproduce:

1. Launch an aws instance with the latest RHEL-9.3 AMI.
2.Enable trace " Environment=NM_CLOUD_SETUP_LOG=TRACE" for nm-cloud-setup service
3. Restart nm-cloud-setup service
4. Check journal log 

Actual results:
"http://169.254.169.254/latest/api/token"]: received 0 bytes: [[(null)]]
Mar 20 01:52:47 ip-10-116-2-10.us-west-2.compute.internal nm-cloud-setup[1263]: [901.08049] <info>  provider ec2 not detected: failure to get EC2 metadata: timeout expired
Mar 20 01:52:47 ip-10-116-2-10.us-west-2.compute.internal nm-cloud-setup[1263]: [901.08051] <info>  no provider detected
Mar 20 01:52:47 ip-10-116-2-10.us-west-2.compute.internal systemd[1]: nm-cloud-setup.service: Deactivated successfully.
Mar 20 01:52:47 ip-10-116-2-10.us-west-2.compute.internal systemd[1]: Finished Automatically configure NetworkManager in cloud.

See more logs at http://10.73.196.244/results/iscsi/ec2_xen/tmo/journal_nm_cloud_setup.log

Expected results:
nm-cloud-setup is able to get EC2 metadata.

Additional info:
- No such issue in RHEL-9.2 with NetworkManager-cloud-setup-1.42.2-1.el9.x86_64

Comment 1 Thomas Haller 2023-03-27 08:38:54 UTC
bug 2181466 is the same issue for rhel-8.9

Comment 4 David Jaša 2023-05-01 19:44:37 UTC
VERIFIED: Fixed in NetworkManager-1.43.6-1.el9 together with bug 2151986, HTTP method PUT is now correctly used:

# tcpdump -i any -w .tmp/ec2.pcap 'host 169.254.169.254
(...)

# tshark -r .tmp/ec2.pcap -V -2 -R http
(...)
Hypertext Transfer Protocol
    PUT /latest/api/token HTTP/1.1\r\n
        [Expert Info (Chat/Sequence): PUT /latest/api/token HTTP/1.1\r\n]
            [PUT /latest/api/token HTTP/1.1\r\n]
            [Severity level: Chat]
            [Group: Sequence]
        Request Method: PUT
        Request URI: /latest/api/token
        Request Version: HTTP/1.1
    Host: 169.254.169.254\r\n
    Accept: */*\r\n
    X-aws-ec2-metadata-token-ttl-seconds: 180\r\n
    \r\n
    [Full request URI: http://169.254.169.254/latest/api/token]
    [HTTP request 1/1]
    [Response in frame: 2]