Bug 1902996 - [AWS] UPI on USGov, bootstrap machine can not fetch ignition file via s3:// URI
Summary: [AWS] UPI on USGov, bootstrap machine can not fetch ignition file via s3:// URI
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: slowrie
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks: 1915617
Reported: 2020-12-01 05:07 UTC by Yunfei Jiang
Modified: 2021-02-24 15:37 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:36:34 UTC
Target Upstream Version:
Embargoed:


Attachments
bootstrap console output (64.00 KB, text/plain)
2020-12-01 09:22 UTC, Yunfei Jiang


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | coreos ignition pull 1155 | 0 | None | closed | internal/providers: Run platform Init function before fetching config | 2021-02-07 06:22:41 UTC
Red Hat Product Errata | RHSA-2020:5633 | 0 | None | None | None | 2021-02-24 15:37:05 UTC

Description Yunfei Jiang 2020-12-01 05:07:19 UTC
When installing a UPI private cluster on AWS US-Gov with an s3:// URI as the bootstrap Ignition file location, the bootstrap machine cannot start successfully because it fails to fetch the Ignition file.

Error from the bootstrap instance system log:
<--snip-->
Displaying logs from failed units: ignition-fetch.service
-- Logs begin at Tue 2020-12-01 01:07:45 UTC, end at Tue 2020-12-01 01:07:49 UTC. --
Dec 01 01:07:48 ignition[706]: GET http://169.254.169.254/2019-10-01/user-data: attempt #2
Dec 01 01:07:48 ignition[706]: GET result: OK
Dec 01 01:07:48 ignition[706]: parsing config with SHA512: 6e5554447563ef436c94ad7631c340275bf0bf6dec3b4347212ae9c07190f5ae024ec0c309b9f16746364604abad3451abfa0f9b42d3aabda1cb011bb1d43b20
Dec 01 01:07:48 ignition[706]: failed to fetch config: couldn't determine the region for bucket "yunjiang-47s3b-2020-11-30-05-17-13": NotFound: Not Found
                                       status code: 404, request id: 529FD52ED261BB4F, host id: /dRzrIGSMh0tnzS4BLCWTapx4jDOcjPdCcTRN2dvRuvwzKGYlU3sy4QPqGcerVfV6YGwtQKwM00=
<--snip-->
Dec 01 01:07:48 systemd[1]: ignition-fetch.service: Main process exited, code=exited, status=1/FAILURE
Dec 01 01:07:48 systemd[1]: ignition-fetch.service: Failed with result 'exit-code'.
Dec 01 01:07:48 systemd[1]: Failed to start Ignition (fetch).
Dec 01 01:07:48 systemd[1]: ignition-fetch.service: Triggering OnFailure= dependencies.
Use the ^ and v keys to change the selection.
<--snip-->

Create bootstrap machine command:
aws --region us-gov-west-1 cloudformation create-stack \
  --stack-name yunjiang-47s3b-bs \
  --template-body 'file:///home/jenkins/workspace/Launch Environment Flexy/private-templates/functionality-testing/aos-4_7/hosts/upi_on_aws-cloudformation-templates/04_cluster_bootstrap-private_cluster.yaml' \
  --parameters \
    ParameterKey=InfrastructureName,ParameterValue=yunjiang-47s3b-b7mdm \
    ParameterKey=RhcosAmi,ParameterValue=ami-d3f2cab2 \
    ParameterKey=PublicSubnet,ParameterValue=subnet-0ed111fd26fa40392 \
    ParameterKey=MasterSecurityGroupId,ParameterValue=sg-09a2d9d4052eb206e \
    ParameterKey=VpcId,ParameterValue=vpc-0aa033e3fa7f57441 \
    ParameterKey=BootstrapIgnitionLocation,ParameterValue='s3://yunjiang-47s3b-2020-11-30-05-17-13/bootstrap_2020-11-30-05-17-13.ign' \
    ParameterKey=RegisterNlbIpTargetsLambdaArn,ParameterValue=arn:aws-us-gov:lambda:us-gov-west-1:225746144451:function:yunjiang-47s3b-inf-RegisterNlbIpTargets-1KJUA4FFRPYXJ \
    ParameterKey=InternalApiTargetGroupArn,ParameterValue=arn:aws-us-gov:elasticloadbalancing:us-gov-west-1:225746144451:targetgroup/yunji-Inter-1S9UI10G8IO2X/3500e06b65781529 \
    ParameterKey=InternalServiceTargetGroupArn,ParameterValue=arn:aws-us-gov:elasticloadbalancing:us-gov-west-1:225746144451:targetgroup/yunji-Inter-1X85EOK1W1VG8/5c6a448b0963f5c0 \
  --capabilities CAPABILITY_NAMED_IAM



install-config.yaml:
apiVersion: v1
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  replicas: 3
  platform:
    aws:
      amiID: ami-d3f2cab2
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  replicas: 0
  platform:
    aws:
      amiID: ami-d3f2cab2
metadata:
  name: yunjiang-47s3b
platform:
  aws:
    region: us-gov-west-1
pullSecret: HIDDEN
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - 172.30.0.0/16
  networkType: OpenShiftSDN
  machineNetwork:
  - cidr: 10.0.0.0/16
publish: Internal
baseDomain: qe.devcluster.openshift.com
sshKey: HIDDEN

Version-Release number of the following components: 
4.7.0-0.nightly-2020-11-29-133728

How reproducible: 
always 

Steps to Reproduce: 
1. Following the UPI process, create a private cluster in us-gov-west-1, using an s3:// URI for the bootstrap Ignition file location.

Actual results: 
The bootstrap machine fails to fetch the Ignition file.

Expected results: 
The bootstrap machine fetches the Ignition file successfully.

Additional info:
An s3:// URI for the bootstrap Ignition location works on 4.6.
A pre-signed URL for the bootstrap Ignition location works on 4.6.
A pre-signed URL for the bootstrap Ignition location works on 4.7.

Comment 1 Yunfei Jiang 2020-12-01 09:22:52 UTC
Created attachment 1735153
bootstrap console output

Comment 2 Matthew Staebler 2020-12-02 01:54:41 UTC
Unless I am misunderstanding something, this seems to be a problem with Ignition rather than the installer.

Comment 3 Micah Abbott 2020-12-02 15:55:08 UTC
Targeting for 4.7; this appears to be a regression from 4.6

Comment 4 Micah Abbott 2020-12-03 14:55:30 UTC
This looks similar to BZ#1892521

Comment 5 Colin Walters 2020-12-03 15:27:07 UTC
It might be that Ignition needs to be updated to understand "alternative" AWS endpoints; that has been happening in other parts of OpenShift.  xref https://github.com/openshift/enhancements/pull/163

Possibly though the installer could mitigate this by providing an explicit region?  Needs investigation.

I filed https://github.com/coreos/ignition/pull/1139 related to this.

Comment 6 Benjamin Gilbert 2020-12-03 17:56:54 UTC
Ignition doesn't support specifying fully custom endpoints, but does understand non-standard partitions (GovCloud and China).  That functionality was previously broken but some fixes went into Ignition 2.7.0, which probably explains the change of behavior.

Yunfei, what region is that S3 bucket actually in?
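
As a rough illustration of why the partition matters: AWS partitions are isolated from each other, so a bucket-region lookup that starts from an endpoint in the standard partition cannot see a GovCloud bucket, which would be consistent with the 404 in the description. Below is a minimal Python sketch, assuming only the three partitions discussed here. The helper names and the choice of hint regions are illustrative assumptions, not Ignition's actual code.

```python
from typing import Optional

# Assumed starting region per partition for a bucket-region lookup.
# These values are illustrative, not taken from Ignition's source.
PARTITION_HINTS = {
    "aws": "us-east-1",
    "aws-cn": "cn-north-1",
    "aws-us-gov": "us-gov-west-1",
}


def partition_for_region(region: str) -> str:
    """Map a region name to its partition using public naming prefixes."""
    if region.startswith("us-gov-"):
        return "aws-us-gov"
    if region.startswith("cn-"):
        return "aws-cn"
    # Everything else is treated as the standard partition in this sketch.
    return "aws"


def region_hint(instance_region: Optional[str]) -> str:
    """Pick a lookup starting region in the same partition as the instance.

    If the instance region is unknown, fall back to the standard partition,
    which is the case where a GovCloud bucket would not be found.
    """
    if not instance_region:
        return PARTITION_HINTS["aws"]
    return PARTITION_HINTS[partition_for_region(instance_region)]
```

With the instance region known (for example us-gov-west-1), the lookup stays inside GovCloud. With it unknown, the sketch falls back to the standard partition.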

Comment 7 Colin Walters 2020-12-03 18:03:13 UTC
> Possibly though the installer could mitigate this by providing an explicit region?  Needs investigation.

OK the installer is using the AWS SDK to generate a "pre-signed" URL https://docs.aws.amazon.com/AmazonS3/latest/dev/PresignedUrlUploadObject.html
Which...hm, I guess doesn't include the region, but maybe we can convince it to do so?

Comment 8 Benjamin Gilbert 2020-12-03 18:07:04 UTC
A pre-signed URL should use the HTTPS scheme with a suitable endpoint, so those should work fine.
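
To make the difference concrete: an s3:// URI carries only a bucket and a key, so the client has to work out the region itself, while an HTTPS URL already names the endpoint host. Here is a small Python sketch using only the standard library. The function name and the returned dictionary layout are hypothetical, chosen for illustration.

```python
from urllib.parse import urlsplit


def describe_source(url: str) -> dict:
    """Split a config source URL into the parts relevant to region handling."""
    parts = urlsplit(url)
    key = parts.path.lstrip("/")
    if parts.scheme == "s3":
        # s3://bucket/key: no endpoint in the URL, region must be detected.
        return {"scheme": "s3", "bucket": parts.netloc, "key": key,
                "endpoint": None, "needs_region_detection": True}
    if parts.scheme in ("http", "https"):
        # The host is the endpoint, so no bucket-region lookup is required.
        return {"scheme": parts.scheme, "bucket": None, "key": key,
                "endpoint": parts.netloc, "needs_region_detection": False}
    raise ValueError(f"unsupported scheme: {parts.scheme!r}")


# The s3:// location from the description.
s3_info = describe_source(
    "s3://yunjiang-47s3b-2020-11-30-05-17-13/bootstrap_2020-11-30-05-17-13.ign"
)

# A made-up HTTPS URL in the shape of a pre-signed link (not from this report).
https_info = describe_source(
    "https://example-bucket.s3.us-gov-west-1.amazonaws.com/bootstrap.ign?X-Amz-Signature=abc"
)
```

This matches the results in the description, where only the s3:// form regressed on 4.7.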

Comment 9 Benjamin Gilbert 2020-12-03 18:26:24 UTC
It appears that 2.7.0 broke region detection for child configs hosted in S3; see https://github.com/coreos/ignition/pull/1139#issuecomment-738194344.

I'd still like to get confirmation of the bucket region for that S3 bucket, though.
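
For context on "child configs": the log in the description shows user data being fetched and parsed before the failure, which fits a small pointer config in user data that references the real config in S3. The actual user data is not included in this report, so the following Python sketch only builds such a pointer config under assumptions. The function name is hypothetical and the spec version is a guess. `ignition.config.replace.source` is the Ignition v3 field for replacing the config with one fetched from a URL.

```python
import json


def pointer_config(source: str, spec_version: str = "3.1.0") -> str:
    """Build a minimal Ignition pointer config that defers to `source`.

    spec_version is an assumption; use whatever the target RHCOS supports.
    """
    config = {
        "ignition": {
            "version": spec_version,
            "config": {"replace": {"source": source}},
        }
    }
    return json.dumps(config)


user_data = pointer_config(
    "s3://yunjiang-47s3b-2020-11-30-05-17-13/bootstrap_2020-11-30-05-17-13.ign"
)
```

In this arrangement the first fetch (user data) succeeds and the second fetch (the s3:// child config) is where region detection comes into play.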

Comment 10 Yunfei Jiang 2020-12-04 00:57:00 UTC
(In reply to Micah Abbott from comment #4)
> This looks similar to BZ#1892521

Differences:
  1. This bug fails in a private UPI install, whereas BZ#1892521 works in a private UPI install and fails only in a disconnected environment.
  2. This bug produces a clear error message ("couldn't determine the region for bucket"), whereas BZ#1892521 gives no clear message indicating that the Ignition file could not be fetched.

Comment 11 Yunfei Jiang 2020-12-04 00:59:42 UTC
(In reply to Benjamin Gilbert from comment #6)
> Ignition doesn't support specifying fully custom endpoints, but does
> understand non-standard partitions (GovCloud and China).  That functionality
> was previously broken but some fixes went into Ignition 2.7.0, which
> probably explains the change of behavior.
> 
> Yunfei, what region is that S3 bucket actually in?

In this case, the S3 bucket is in us-gov-west-1.

There are two regions in GovCloud: us-gov-west-1 and us-gov-east-1.

Comment 12 Yunfei Jiang 2020-12-04 01:01:24 UTC
(In reply to Benjamin Gilbert from comment #8)
> A pre-signed URL should use the HTTPS scheme with a suitable endpoint, so
> those should work fine.

Yes, as I mentioned in the description:

Additional info:
An s3:// URI for the bootstrap Ignition location works on 4.6.
A pre-signed URL for the bootstrap Ignition location works on 4.6.
A pre-signed URL for the bootstrap Ignition location works on 4.7.

Comment 13 Benjamin Gilbert 2020-12-04 06:06:18 UTC
Okay, great.  Since the bucket is in the same AWS partition as the instance, this is indeed expected to work, and was broken by https://github.com/coreos/ignition/pull/1078.

Comment 14 Micah Abbott 2020-12-05 15:53:59 UTC
Higher priority work related to 4.7 features prevented this from being worked on; setting UpcomingSprint

Comment 15 Benjamin Gilbert 2021-01-08 10:48:04 UTC
Setting No Doc Update since this was a 4.7 regression.

Comment 17 Yunfei Jiang 2021-01-21 05:17:37 UTC
Verified from the OpenShift installer side; the test passes.

OCP version: 4.7.0-0.nightly-2021-01-19-095812
RHCOS version: 47.83.202101161239-0

The cluster was installed successfully.

Comment 18 Michael Nguyen 2021-01-22 13:45:06 UTC
Closing as verified based on https://bugzilla.redhat.com/show_bug.cgi?id=1902996#c17

Comment 21 errata-xmlrpc 2021-02-24 15:36:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

