Bug 2009111
| Summary: | [IPI-on-GCP] 'Install a cluster with nested virtualization enabled' failed due to unable to launch compute instances | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jianli Wei <jiwei> | |
| Component: | Cloud Compute | Assignee: | dmoiseev | |
| Cloud Compute sub component: | Cloud Controller Manager | QA Contact: | sunzhaohua <zhsun> | |
| Status: | CLOSED ERRATA | Docs Contact: | ||
| Severity: | high | |||
| Priority: | unspecified | CC: | aarapov, aos-bugs, athomas, calfonso, dmoiseev, mfedosin, nstielau, revyas, wking, zhsun | |
| Version: | 4.9 | |||
| Target Milestone: | --- | |||
| Target Release: | 4.10.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | Due to a backward-incompatible change in the Google Cloud SDK, the machine controller was unable to create machines because the resulting image URL was incorrect. The image URL logic has been repaired to match the latest Google SDK changes. | Story Points: | --- |
| Clone Of: | ||||
| : | 2009738 | Environment: | ||
| Last Closed: | 2022-03-10 16:14:44 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 2009738 | |||
|
Description
Jianli Wei
2021-09-30 01:03:49 UTC
Upgraded a cluster from 4.3.40->4.4.33->4.5.41->4.6.46->4.7.32->4.8.13->4.9.0-0.nightly-2021-09-29-172320, then scaled up the machineset and hit the same issue.
$ oc get machine
NAME                     PHASE     TYPE            REGION        ZONE            AGE
zhsun9-z8mz5-m-0         Running   n1-standard-4   us-central1   us-central1-a   23h
zhsun9-z8mz5-m-1         Running   n1-standard-4   us-central1   us-central1-b   23h
zhsun9-z8mz5-m-2         Running   n1-standard-4   us-central1   us-central1-c   23h
zhsun9-z8mz5-w-a-r2f8l   Running   n1-standard-4   us-central1   us-central1-a   20h
zhsun9-z8mz5-w-b-lc8hm   Running   n1-standard-4   us-central1   us-central1-b   13h
zhsun9-z8mz5-w-c-gxz78   Failed                                                  5m6s
zhsun9-z8mz5-w-c-pj9dw   Failed                                                  5m6s
zhsun9-z8mz5-w-c-r4cjs   Running   n1-standard-4   us-central1   us-central1-c   20h
zhsun9-z8mz5-w-c-r8cd5   Failed                                                  5m6s
zhsun9-z8mz5-w-f-fbrqp   Running   n1-standard-4   us-central1   us-central1-f   133m
zhsun9-z8mz5-w-f-wsxp7   Failed                                                  24m
$ oc edit machine zhsun9-z8mz5-w-c-gxz78
status:
  conditions:
  - lastTransitionTime: "2021-09-30T03:53:57Z"
    message: Instance has not been created
    reason: InstanceNotCreated
    severity: Warning
    status: "False"
    type: InstanceExists
  errorMessage: 'error launching instance: googleapi: Error 400: Invalid value for
    field ''resource.disks[0].initializeParams.sourceImage'': ''https://compute.googleapis.com/compute/v1/openshift-qe/global/images/zhsun9-z8mz5-rhcos-image''.
    The URL is malformed., invalid'
  errorReason: InvalidConfiguration
  lastUpdated: "2021-09-30T03:53:57Z"
  phase: Failed
  providerStatus:
    conditions:
    - lastProbeTime: "2021-09-30T03:53:57Z"
      lastTransitionTime: "2021-09-30T03:53:57Z"
      message: 'googleapi: Error 400: Invalid value for field ''resource.disks[0].initializeParams.sourceImage'':
        ''https://compute.googleapis.com/compute/v1/openshift-qe/global/images/zhsun9-z8mz5-rhcos-image''.
        The URL is malformed., invalid'
      reason: MachineCreationFailed
      status: "False"
      type: MachineCreated
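The malformed sourceImage above is missing the projects/ segment that Compute v1 REST paths use: projects/{project}/global/images/{image}. The Go sketch below shows one way such a URL could arise. It appends "<project>/global/images/<image>" to an SDK base path that no longer ends in "projects/". It then contrasts that with a builder that does not depend on the base path.

The two base-path constants are an assumption about the before/after shape of the change described later in this thread. The helpers naiveImageURL, imageURL and hasProjectsSegment are hypothetical. None of these is the actual cluster-api-provider-gcp or google-api-go-client code.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical base paths (assumption): the SDK base path before and after
// the change. These are NOT copied from the vendored compute-gen.go.
const (
	oldBasePath = "https://compute.googleapis.com/compute/v1/projects/"
	newBasePath = "https://compute.googleapis.com/compute/v1/"
)

// naiveImageURL (hypothetical) appends "<project>/global/images/<image>" to
// the base path. This only produces a valid URL if the base path already
// ends in "projects/".
func naiveImageURL(basePath, project, image string) string {
	return basePath + project + "/global/images/" + image
}

// imageURL (hypothetical) strips any trailing "/" and "/projects" from the
// base path, then adds the "projects/" segment explicitly. The result is the
// same whichever base path is used.
func imageURL(basePath, project, image string) string {
	root := strings.TrimSuffix(strings.TrimSuffix(basePath, "/"), "/projects")
	return fmt.Sprintf("%s/projects/%s/global/images/%s", root, project, image)
}

// hasProjectsSegment (hypothetical) reports whether a compute v1 image URL
// has the shape projects/{project}/global/images/{image}.
func hasProjectsSegment(u string) bool {
	const prefix = "https://compute.googleapis.com/compute/v1/"
	if !strings.HasPrefix(u, prefix) {
		return false
	}
	parts := strings.Split(strings.TrimPrefix(u, prefix), "/")
	return len(parts) == 5 &&
		parts[0] == "projects" && parts[1] != "" &&
		parts[2] == "global" && parts[3] == "images" && parts[4] != ""
}

func main() {
	project, image := "openshift-qe", "zhsun9-z8mz5-rhcos-image"
	for _, base := range []string{oldBasePath, newBasePath} {
		naive := naiveImageURL(base, project, image)
		fixed := imageURL(base, project, image)
		fmt.Println("base:  ", base)
		fmt.Println("naive: ", naive, "valid:", hasProjectsSegment(naive))
		fmt.Println("fixed: ", fixed, "valid:", hasProjectsSegment(fixed))
	}
}
```

With the assumed new base path, naiveImageURL reproduces exactly the malformed URL from the error message above. imageURL yields the projects/openshift-qe/global/images/... form with either base path.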
The URL is missing a "projects" segment between v1 and openshift-qe, based on https://cloud.google.com/compute/docs/reference/rest/v1/images/list#http-request

I'm investigating... It seems like the installer generates correct URLs for images: https://github.com/openshift/installer/blob/master/pkg/asset/rhcos/image.go#L103 So something goes wrong after that. Could you please provide must-gather output for this issue?

Mike, this bug seems to be the same as https://bugzilla.redhat.com/show_bug.cgi?id=2009127#c1

must-gather has been provided.

(In reply to sunzhaohua from comment #1)
Can you please share the machineset and machine (running and failed) manifests?

Looking into the code, I don't understand why this test was passing before. This line https://github.com/openshift/cluster-api-provider-gcp/blob/release-4.9/pkg/cloud/gcp/actuators/machine/reconciler.go#L74 has been there for at least the last year, and the related installer parts I'm aware of have not changed for quite a while either. Need to investigate this.

OK, the base path was changed inside the Google SDK, so my fix should be valid. Previous OCP versions should not be affected. Evidence can be found in the diff:
git diff --output=diff c6faa4bae2ca201573c628e92b112971833284e7~1..HEAD vendor/google.golang.org/api/compute/v1/compute-gen.go

I changed the target release so that the patch can be backported to 4.9 using your existing automation.

*** Bug 2009127 has been marked as a duplicate of this bug. ***

Verified.
clusterversion: 4.10.0-0.nightly-2021-10-07-212540
Upgraded from 4.9.0-rc.1 to 4.10.0-0.nightly-2021-10-07-212540; the upgrade was successful. After the upgrade, machines could be created successfully.
$ oc get machine
NAME                             PHASE     TYPE            REGION        ZONE            AGE
zhsun1081-j96wp-master-0         Running   n1-standard-4   us-central1   us-central1-a   129m
zhsun1081-j96wp-master-1         Running   n1-standard-4   us-central1   us-central1-b   129m
zhsun1081-j96wp-master-2         Running   n1-standard-4   us-central1   us-central1-c   129m
zhsun1081-j96wp-worker-a-z2r2p   Running   n1-standard-4   us-central1   us-central1-a   122m
zhsun1081-j96wp-worker-b-vdbkz   Running   n1-standard-4   us-central1   us-central1-b   122m
zhsun1081-j96wp-worker-c-4b74m   Running   n1-standard-4   us-central1   us-central1-c   3m13s
zhsun1081-j96wp-worker-c-f9wfn   Running   n1-standard-4   us-central1   us-central1-c   122m

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056
|