Description of problem:
Bootstrap node fails to download the ignition fragment from S3, even though the S3 eu-central-1 prefix lists are accessible. The logs show that the Ignition component tries to reach S3 in us-east-1 and therefore fails, as access to that network is not allowed and is blocked by the VPC.

> Bootstrap node is in a eu-central-1 VPC
> rhcos-42.80.20191002.0-hvm (ami-092b69120ecf915ed)

Ignition snippet from user_data:
> {"ignition":{"config":{"append":[{"source":"s3://foo-eu-central-1-bar-ocp-cluster-management/bootstrap.ign","verification":{}}]},

Logs found:
> [  127.326019] ignition[636]: Ignition failed: RequestError: send request failed
> [  127.326019] caused by: dial tcp 52.217.10.22:443: i/o timeout
> [FAILED] Failed to start Ignition (disks)
> [  127.339017] systemd[1]: ignition-disks.service: Main process exited, code=exited, status=1/FAILURE

When using an S3 source, if the network isn't ready for the first metadata retrieval, the regionHint is nil and so defaults to us-east-1 in https://github.com/coreos/ignition/blob/v0.33.0/internal/providers/ec2/ec2.go#L62

Version-Release number of selected component (if applicable):
RHEL CoreOS 42.80.20191002.0

How reproducible:
Always

Steps to Reproduce:
1. Restrict S3 access to S3 eu-central-1 only; bootstrap may fail

Actual results:
Bootstrap fails.

Expected results:
Bootstrap works.

Additional info:
The fix from https://github.com/coreos/ignition/pull/830 is likely needed.
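For illustration, the failure mode described above can be sketched as follows. This is not the actual Ignition source, just a minimal Go sketch of the fallback behavior in the v0.33.0 EC2 provider: when the region hint from instance metadata is unavailable (nil), the S3 client silently defaults to us-east-1, which a locked-down VPC may block. The function name `bucketRegion` is hypothetical.

```go
package main

import "fmt"

// bucketRegion mimics the v0.33.0 behavior linked above: if the
// instance-metadata region lookup failed (regionHint == nil), fall
// back to us-east-1 instead of retrying until metadata is available.
func bucketRegion(regionHint *string) string {
	if regionHint == nil {
		// Network not ready at first metadata retrieval:
		// this hard-coded default is what sends the S3 request
		// to the wrong region in a eu-central-1-only VPC.
		return "us-east-1"
	}
	return *regionHint
}

func main() {
	// Before the fix, a nil hint means the bootstrap.ign fetch
	// goes to us-east-1 and times out behind the VPC restriction.
	fmt.Println(bucketRegion(nil))

	region := "eu-central-1"
	fmt.Println(bucketRegion(&region))
}
```

The fix in https://github.com/coreos/ignition/pull/830 addresses this by not defaulting to a hard-coded region when the hint is missing.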
We should produce a newer version of Ignition for RHCOS 4.2 that includes that PR, as well as other fixes that have landed. However, the tricky part is that the new Ignition would also need to be included in the boot images used to initially provision the bootstrap node and the rest of the nodes. We currently do not have a good mechanism for doing that, so just noting here for visibility.

@behoward this seems like something the Tools team could handle: the rebuild of Ignition for 4.2.
Right, it would need new bootimages, which so far we haven't done. I'm not sure this warrants it versus getting 4.3 out.
I've tagged Ignition v0.34.0 with the fix needed. Not sure what needs to happen to get it into the relevant RHCOS.
Verified 4.4 nightly has the correct version of ignition:

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2019-12-16-124946   True        False         4h11m   Cluster version is 4.4.0-0.nightly-2019-12-16-124946

$ oc get nodes
NAME                           STATUS   ROLES    AGE     VERSION
ip-10-0-130-59.ec2.internal    Ready    master   4h30m   v1.16.2
ip-10-0-133-136.ec2.internal   Ready    worker   4h20m   v1.16.2
ip-10-0-154-85.ec2.internal    Ready    worker   4h21m   v1.16.2
ip-10-0-157-203.ec2.internal   Ready    master   4h31m   v1.16.2
ip-10-0-160-63.ec2.internal    Ready    worker   4h21m   v1.16.2
ip-10-0-163-177.ec2.internal   Ready    master   4h31m   v1.16.2

$ oc debug node/ip-10-0-130-59.ec2.internal
Starting pod/ip-10-0-130-59ec2internal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# ls
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  ostree  proc  root  run  sbin  srv  sys  sysroot  tmp  usr  var
sh-4.4# rpm -q ignition
ignition-0.34.0-0.rhaos4.3.git92f874c.el8.x86_64
sh-4.4# exit
exit
sh-4.2# exit
exit
Removing debug pod ...
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.