Bug 2163657
Summary: | [RHEL-8]Launching EC2 Instance in IPv6-Only subnet leads to unreachable instance | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Neil Hanlon <neil> | ||||
Component: | cloud-init | Assignee: | Ani Sinha <anisinha> | ||||
Status: | CLOSED MIGRATED | QA Contact: | xiachen | ||||
Severity: | unspecified | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | CentOS Stream | CC: | andavis, anisinha, b2b-redhat-augustineas-2011, bstinson, carl, eesposit, eterrell, huzhao, jgreguske, jwboyer, riehecky, toracat, xiliang, yacao | ||||
Target Milestone: | rc | Keywords: | MigratedToJIRA, Triaged | ||||
Target Release: | --- | Flags: | pm-rhel: mirror+ | ||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | |||||||
: | 2227771 | Environment: | |||||
Last Closed: | 2023-09-22 15:44:28 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | 2172811 | ||||||
Bug Blocks: | 2227771 | ||||||
Attachments: |
|
Description
Neil Hanlon
2023-01-24 06:08:38 UTC
This is also present in RHEL/CentOS Stream 9. Should I file another bug against that product as well? I am not sure what the proper procedure is. Thank you!

Hi Huijuan,

While investigating this, I found a handful of other bugs which I feel should be addressed in RHEL 8 (and 9). I rebased cloud-init to version 22.2.2 in a COPR [1] and have tested it on Rocky Linux 8 and 9. In addition to the IPv6 problem reported in this bug, the rebase also resolves the following problems:

* cloud-init no longer poisons the /etc/NetworkManager/NetworkManager.conf file owned by the NetworkManager package.
* The system behaviour after cloud-init runs is now consistent with the RHEL docs, which state that keyfiles have priority over ifcfg files.
* cloud-init no longer generates antique ifcfg files; it generates keyfiles instead.
* The network configuration that cloud-init now generates has priority 0 (the default), so it can be overridden with your own keyfiles. (Previously, the generated ifcfg files had maximum priority and couldn't be overridden.)

Given this information, I would like to suggest (and request) that cloud-init be rebased to 22.2.2 for RHEL 8 and 9. I understand that RHEL 8 may be reaching the point where this rebase is no longer possible; if so, I still think RHEL 9 would benefit from the change. I am happy to close this ticket in favor of new ones requesting a rebase in RHEL 9, if that would be preferable. Please let me know your thoughts and whether I can provide any further clarification.

Best,
Neil

[1] https://copr.fedorainfracloud.org/coprs/neil/cloud-init/

Hi Neil,

Thanks a lot for raising the issue and for all the investigation. Regarding the investigation in comment 2, you are right: rebasing to version 22.2 will change the network configuration method from ifcfg to keyfiles.
This change comes from patch [1], which was added in cloud-init-22.2.
[1] https://github.com/canonical/cloud-init/pull/1224

We backported patch [1] to 22.1, but it introduced some additional issues in RHEL, and due to release timing and resource limits we dropped it from 22.1. Please refer to https://bugzilla.redhat.com/show_bug.cgi?id=2118235 for detailed information. We will rebase to 22.2 or a later version in the next release when resources are available.

For the issue described in comment 0, I'm not sure whether we can make this release (rhel-8.8/9.1) even if we only backport patch [2]. We will discuss this in the team ASAP.
[2] https://github.com/canonical/cloud-init/pull/1160/

Also, I have a question about the test steps/environment for this issue. Could you please help clarify? Thanks! In the IPv6-only subnet there is no IPv4 address, right? If so, does the metadata HTTP service (169.254.169.254) have an IPv6 address that the IPv6-only network can reach to get the metadata?

Sharing some information: IPv6 endpoints for the Amazon EC2 Instance Metadata Service have been available on AWS since Aug 2021.
https://aws.amazon.com/about-aws/whats-new/2021/08/Ipv6-amazon-ec2-metadata-time-sync-vpc-dns/

On how to retrieve instance metadata, I found more information in the AWS docs:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html
https://aws.amazon.com/blogs/networking-and-content-delivery/introducing-ipv6-only-subnets-and-ec2-instances/

IPv4: http://169.254.169.254/latest/meta-data/
IPv6: http://[fd00:ec2::254]/latest/meta-data/

As described in this bug, support was added to cloud-init by this commit: https://github.com/canonical/cloud-init/pull/1160/

cloudinit/sources/DataSourceEc2.py:

    metadata_urls = [
        "http://169.254.169.254",
        "http://[fd00:ec2::254]",
        "http://instance-data.:8773",
    ]

In addition, support for IPv6 metadata in `DataSourceOpenStack` was added to cloud-init by another commit, which is included in upstream 22.4.1:
https://github.com/canonical/cloud-init/pull/1805

- Add openstack IPv6 metadata url `fe80::a9fe:a9fe`
- Enable requesting multiple metadata sources in parallel

This PR is very similar to #1160, reusing the provided `url_helper` logic.
https://bugs.launchpad.net/cloud-init/+bug/1906849

(In reply to Huijuan Zhao from comment #3)
> Hi Neil,
>
> Thanks a lot for raising the issue and all the investigations.
>
> Regarding the investigations in comment 2, you are right, rebasing to
> version 22.2 will change the network config methods, from ifcfg to keyfiles.
> This change comes from patch[1], which was added to cloud-init-22.2
> [1] https://github.com/canonical/cloud-init/pull/1224
>
> We backported the patch[1] to 22.1, but there were some additional issues in
> rhel produced by patch[1], due to the release time and resource limit, we
> dropped it from 22.1.
> Please refer to
> https://bugzilla.redhat.com/show_bug.cgi?id=2118235 for detail information.

That issue is restricted, but I understand the scenario and why it was dropped.

>
> We will rebase to 22.2 or higher version in next release when the resource
> is available.

Thank you!

>
> For this issue described in comment 0, not sure if we can catch up with this
> release(rhel-8.8/9.1) if only backport the patch[2]. Will discuss this in
> team ASAP.
> [2] https://github.com/canonical/cloud-init/pull/1160/
>

Again, thank you. I appreciate it. As I mentioned in comment 0, I was able to backport patch [2]. I have attached the patch (9998-Add-Ec2-IPV6-IMDS.patch) to this ticket for posterity, and in case it is helpful.

>
> And for this issue, I have a question about the test steps/environment.
> Could you please help to clarify it? Thanks!
> For the IPv6-Only subnet, there is no ipv4 address, right? If yes, does the
> metadata http service(169.254.169.254) has ipv6 address to make the
> IPv6-Only network reach out it to get the metadata?

Absolutely. In this test scenario, the server has no IPv4 address or route table besides the loopback. This is a relatively new feature [1a] and is probably a fairly rare use case, but it does seem like a good thing to fix (along with the similar OpenStack patch mentioned in comment 4; thank you!).

In this case, the current behavior is that cloud-init eventually times out with `no route to host`, since no IPv4 route is installed in this subnet. There are instructions for setting up such a subnet here: [2b]

Thank you!

[1a] https://aws.amazon.com/blogs/networking-and-content-delivery/introducing-ipv6-only-subnets-and-ec2-instances/
[2b] https://docs.aws.amazon.com/vpc/latest/userguide/vpc-subnets-commands-example-ipv6-v2.html

Created attachment 1945360
backport of EC2 IPv6 fix & tests to EL8
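On an IPv6-only instance, every request to the IPv4 endpoint can only fail, so it helps not to wait on it before trying the IPv6 one. PR #1805 (comment 4) describes requesting multiple metadata sources in parallel; the idea can be sketched roughly as below: probe every candidate URL concurrently and take the first one that answers. This is a minimal sketch, not cloud-init's `url_helper` code. `first_reachable` and `fake_probe` are hypothetical names, and the probe is a stub that simulates an IPv6-only host so the example runs without network access.

```python
"""Sketch: probe several metadata URLs concurrently, keep the first that answers.

Hypothetical helper for illustration only; cloud-init's real logic lives in
cloudinit/url_helper.py and differs in detail.
"""
from concurrent.futures import ThreadPoolExecutor, as_completed
from concurrent.futures import TimeoutError as FuturesTimeoutError
from typing import Callable, Optional, Sequence

# Candidate endpoints, in the order listed in DataSourceEc2.py (comment 4).
EC2_METADATA_URLS = [
    "http://169.254.169.254",
    "http://[fd00:ec2::254]",
    "http://instance-data.:8773",
]


def first_reachable(
    urls: Sequence[str],
    probe: Callable[[str], bool],
    timeout: float = 5.0,
) -> Optional[str]:
    """Run probe(url) for all URLs in parallel and return the first URL whose
    probe returns True, or None if every probe fails, raises OSError, or the
    overall timeout expires."""
    if not urls:
        return None
    pool = ThreadPoolExecutor(max_workers=len(urls))
    try:
        futures = {pool.submit(probe, url): url for url in urls}
        try:
            for future in as_completed(futures, timeout=timeout):
                try:
                    if future.result():
                        return futures[future]
                except OSError:
                    # e.g. "No route to host" for the IPv4 URL on an IPv6-only host
                    continue
        except FuturesTimeoutError:
            pass
        return None
    finally:
        # Do not block the caller on slower probes once we have an answer.
        pool.shutdown(wait=False, cancel_futures=True)


def fake_probe(url: str) -> bool:
    """Stub probe simulating an IPv6-only instance (assumption, no real I/O)."""
    if url.startswith("http://169.254."):
        raise OSError(113, "No route to host")
    return url == "http://[fd00:ec2::254]"


if __name__ == "__main__":
    print(first_reachable(EC2_METADATA_URLS, fake_probe))  # prints http://[fd00:ec2::254]
```

A real probe would issue an HTTP request with a short timeout; the stub only stands in for that. Probing concurrently means the answer arrives as soon as the fastest working endpoint replies, regardless of its position in the list.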
Thanks Neil and Amy for the updates and information. I'm not sure we have enough resources/time to backport this patch to rhel-8.8/9.2; there may be some conflicts, as Neil mentioned in comment 0: "There were three total chunks which failed to apply cleanly; two of them were in tests and trivial to fix. One patch for cloudinit/url_helper.py needed to be rebased slightly to match the EL8 source."

Emanuele, could you please help review and evaluate this issue? Is it possible to backport the patch in rhel-8.8/9.2, or should we rebase in the next release? Thanks!

Filed another bug, BZ#2171957, to track support for IPv6 metadata in `DataSourceOpenStack`.

(In reply to Emanuele Giuseppe Esposito from comment #9)
> Hi Huijuan, do you think the issue is high priority enough to ask for
> exception+?
>
> Otherwise we could wait for the next rebase.

Thanks Emanuele. I agree with Neil's comment 5 that this is probably a fairly rare use case; IMO it is OK to wait for the next rebase.

Neil, is it OK to fix the issue in the next rebase/release? rhel-8.8/9.2 is entering the exception+ phase, which means only high-priority bugs can still be fixed at this point.

(In reply to Huijuan Zhao from comment #10)
> (In reply to Emanuele Giuseppe Esposito from comment #9)
> > Hi Huijuan, do you think the issue is high priority enough to ask for
> > exception+?
> >
> > Otherwise we could wait for the next rebase.
>
> Thanks Emanuele, I agree with Neil's comment 5 this is probably a fairly
> rare use case, IMO it is ok to wait for the next rebase.
>
> Neil, is it ok to fix the issue in next rebase/release? As the rhel-8.8/9.2
> is coming to the exception+ phase, it means high priority bugs can still be
> fixed currently.

Thank you Huijuan, Emanuele -

I think it is acceptable to wait until the next release for this bug (in comment 0) related to IPv6 in AWS.
Regarding the other bugs patched in 22.2 mentioned in comment 2, it would be great if these could be fixed in the current release, but I understand the timelines and deadlines for release support. I am OK providing an updated cloud-init for my purposes via my COPR repository until 8.8 and 9.2 are released.

Best,
Neil

(In reply to Neil Hanlon from comment #11)
> Thank you Huijuan, Emanuele -
>
> I think it is acceptable to wait until the next release for this bug (in
> comment 0 ) related to IPv6 in AWS.
>
> Regarding the other bugs patched in 22.2 mentioned in comment 2 - it would
> be great if these could be fixed in the current release, but I understand
> the timelines and deadlines for the release support. I am OK providing an
> updated cloud-init for my purposes via my COPR repository until 8.8 and 9.2
> are released.
>
> Best,
> Neil

Neil, thank you so much for your understanding. Regarding the other bugs patched in 22.2 mentioned in comment 2, given the current time and resource limitations, I think they will also be fixed in the next rebase/release. I will update here once the official cloud-init version is available, and then you can update your COPR repository. Emanuele, please correct me if anything is incorrect. Thanks!

Agree with you, thanks!

Did these fixes get merged into the 9.2 release? I seem to be having similar symptoms with a RHEL 9.2 box in an AWS IPv6-only subnet. It does eventually boot (after several minutes), but it is unresponsive to pings and SSH. It also fails the "Instance status checks 1/2: Instance reachability check failure". It differs in that there don't appear to be /any/ cloud-init lines or actual network interface configuration in the system log. I have cross-checked with a similarly configured (at least as far as security groups and ACLs go) Amazon Linux instance, and it boots quickly, passes the reachability check, is pingable, and I can SSH to it.

Hi Adam,

The fix was not merged into the 9.2 release; it is planned for 9.3.
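To check by hand whether an instance in an IPv6-only subnet can reach the IPv6 IMDS endpoint listed in comment 4, the AWS retrieval docs linked there describe the IMDSv2 flow: a PUT to /latest/api/token carrying a TTL header, then a GET that sends the returned token. The following is a minimal standard-library sketch of that flow. It assumes the IPv6 endpoint has been enabled in the instance's metadata options and that it runs on the instance itself. `build_token_request`, `build_metadata_request` and `fetch_metadata` are hypothetical helpers, not cloud-init APIs.

```python
"""Sketch: IMDSv2 requests against the IPv6 metadata endpoint.

Assumes the instance's metadata options have the IPv6 endpoint enabled.
All function names here are hypothetical helpers, not cloud-init APIs.
"""
import urllib.request

IMDS_V6 = "http://[fd00:ec2::254]"
TOKEN_TTL_HEADER = "X-aws-ec2-metadata-token-ttl-seconds"
TOKEN_HEADER = "X-aws-ec2-metadata-token"


def build_token_request(base: str = IMDS_V6, ttl: int = 21600) -> urllib.request.Request:
    """Session token request: PUT /latest/api/token with a TTL (seconds) header."""
    return urllib.request.Request(
        f"{base}/latest/api/token",
        method="PUT",
        headers={TOKEN_TTL_HEADER: str(ttl)},
    )


def build_metadata_request(
    token: str, path: str = "meta-data/", base: str = IMDS_V6
) -> urllib.request.Request:
    """Metadata request: GET /latest/<path>, authenticated with the token."""
    return urllib.request.Request(
        f"{base}/latest/{path}",
        headers={TOKEN_HEADER: token},
    )


def fetch_metadata(path: str = "meta-data/", base: str = IMDS_V6, timeout: float = 2.0) -> str:
    """Perform both requests. Only meaningful from inside an EC2 instance;
    failures surface as urllib.error.URLError (an OSError subclass)."""
    with urllib.request.urlopen(build_token_request(base), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(build_metadata_request(token, path, base), timeout=timeout) as resp:
        return resp.read().decode()
```

On an instance, something like `fetch_metadata("meta-data/instance-id")` would be expected to return the instance ID. If it instead raises `URLError` wrapping "no route to host" or a timeout, the endpoint is not reachable from that host. Separating request construction from `urlopen` keeps the header and URL logic checkable without network access.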
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated. Be sure to add yourself to the Jira issue's "Watchers" field to continue receiving updates, and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of two footprints next to it and begin with "RHEL-" followed by an integer. You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.