Description of problem:
The bootstrap node fails to download the Ignition fragment from S3, even though the s3-eu-central-1 prefix lists are reachable. The logs show that Ignition tries to reach S3 in us-east-1, which fails because access to that network is not allowed and is blocked by the VPC.
> Bootstrap node is in a eu-central-1 VPC
> rhcos-42.80.20191002.0-hvm (ami-092b69120ecf915ed)
Ignition snippet from user_data
> [ 127.326019] ignition: Ignition failed: RequestError: send request failed
> [ 127.326019] caused by: dial tcp 184.108.40.206:443: i/o timeout
> [FAILED] Failed to start Ignition (disks)
> [ 127.339017] systemd: ignition-disks.service: Main process exited, code=exited, status=1/FAILURE
When using an S3 source, if the network is not yet ready at the time of Ignition's first metadata retrieval, the regionHint is nil, so the S3 client defaults to us-east-1; see https://github.com/coreos/ignition/blob/v0.33.0/internal/providers/ec2/ec2.go#L62
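For illustration, here is a minimal sketch (in Go, using aws-sdk-go; this is not the actual Ignition code, and the helper name s3RegionHint is hypothetical) of the failure mode and the obvious mitigation: retry the instance-metadata region lookup for a short window instead of immediately falling back to us-east-1.

package main

import (
	"fmt"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/ec2metadata"
	"github.com/aws/aws-sdk-go/aws/session"
)

// s3RegionHint retries the EC2 instance-metadata region lookup a few times.
// If the network is not up yet on the first attempt, a single lookup would
// fail and the caller would silently fall back to us-east-1, which is the
// behaviour described in this bug.
func s3RegionHint(attempts int, delay time.Duration) string {
	sess := session.Must(session.NewSession())
	md := ec2metadata.New(sess)

	for i := 0; i < attempts; i++ {
		if region, err := md.Region(); err == nil && region != "" {
			return region
		}
		time.Sleep(delay)
	}
	// Last resort: the problematic default this bug is about.
	return "us-east-1"
}

func main() {
	region := s3RegionHint(5, 2*time.Second)
	cfg := aws.NewConfig().WithRegion(region)
	fmt.Println("S3 requests would use region:", *cfg.Region)
}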
Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux CoreOS (RHCOS) 42.80.20191002.0
Steps to Reproduce:
1. Restrict S3 access to the eu-central-1 region only and run the bootstrap; the bootstrap may fail
Actual results:
Bootstrap fails.

Expected results:
Bootstrap succeeds.
Likely the fix from https://github.com/coreos/ignition/pull/830 is needed
We should produce a newer version of Ignition for RHCOS 4.2 that includes that PR, as well as other fixes that have landed. However, the tricky part is that the new Ignition would also need to be included in the boot images that are used to initially provision the bootstrap node and the rest of the nodes.
We currently do not have a good mechanism for doing that, so just noting here for visibility.
@behoward this seems like something the Tools team could handle: the rebuild of Ignition for 4.2.
Right, it would need new bootimages, which so far we haven't done. I'm not sure this warrants it versus getting 4.3 out.
I've tagged Ignition v0.34.0 with the fix needed. Not sure what needs to happen to get it into the relevant RHCOS.
Verified that the 4.4 nightly has the correct version of Ignition:
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2019-12-16-124946   True        False         4h11m   Cluster version is 4.4.0-0.nightly-2019-12-16-124946
$ oc get nodes
NAME                           STATUS   ROLES    AGE     VERSION
ip-10-0-130-59.ec2.internal    Ready    master   4h30m   v1.16.2
ip-10-0-133-136.ec2.internal   Ready    worker   4h20m   v1.16.2
ip-10-0-154-85.ec2.internal    Ready    worker   4h21m   v1.16.2
ip-10-0-157-203.ec2.internal   Ready    master   4h31m   v1.16.2
ip-10-0-160-63.ec2.internal    Ready    worker   4h21m   v1.16.2
ip-10-0-163-177.ec2.internal   Ready    master   4h31m   v1.16.2
$ oc debug node/ip-10-0-130-59.ec2.internal
Starting pod/ip-10-0-130-59ec2internal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
bin boot dev etc home lib lib64 media mnt opt ostree proc root run sbin srv sys sysroot tmp usr var
sh-4.4# rpm -q ignition
Removing debug pod ...
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.