Bug 2017680
| Summary: | [gcp] Couldn’t enable support for instances with GPUs on GCP | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | sunzhaohua <zhsun> |
| Component: | Cloud Compute | Assignee: | Samuel Stuchly <sstuchly> |
| Cloud Compute sub component: | Other Providers | QA Contact: | sunzhaohua <zhsun> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | ||
| Priority: | unspecified | ||
| Version: | 4.10 | ||
| Target Milestone: | --- | ||
| Target Release: | 4.10.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-03-10 16:22:12 UTC | Type: | Bug |
@Sam, please make sure that the accelerated network fields have been copied over to the openshift/api repo as part of the migration and that the MAO repo has the latest copy of the api dependency. If you have issues, please speak to Alex, who has been working on this migration.

Tested with nightly build 4.10.0-0.nightly-2021-12-18-034942; everything works as expected, moving to verified.
$ oc get machine
NAME PHASE TYPE REGION ZONE AGE
zhsungcp201-r79l8-master-0 Running n1-standard-4 us-central1 us-central1-a 169m
zhsungcp201-r79l8-master-1 Running n1-standard-4 us-central1 us-central1-b 169m
zhsungcp201-r79l8-master-2 Running n1-standard-4 us-central1 us-central1-c 169m
zhsungcp201-r79l8-worker-a-9knlf Running n1-standard-4 us-central1 us-central1-a 165m
zhsungcp201-r79l8-worker-b-xsflz Running n1-standard-4 us-central1 us-central1-b 165m
zhsungcp201-r79l8-worker-c-vcw54 Deleting n1-standard-1 us-central1 us-central1-c 124m
$ oc edit machineset zhsungcp201-r79l8-worker-c
gpus:
- count: 1
  type: nvidia-tesla-p100
kind: GCPMachineProviderSpec
machineType: n1-standard-1
onHostMaintenance: Terminate
restartPolicy: Always
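
The fields shown in the verified spec above can also be checked programmatically. The following is a minimal sketch, not part of the fix itself: it assumes the providerSpec has already been loaded into a plain Python dict, and the helper name `gpu_spec_problems` is hypothetical. The `onHostMaintenance` check reflects the GCP requirement that instances with GPUs attached terminate on host maintenance rather than live-migrate.

```python
def gpu_spec_problems(provider_spec):
    """Return a list of problems found in the GPU-related fields of a
    GCP providerSpec dict (hypothetical helper, for illustration only)."""
    problems = []
    gpus = provider_spec.get("gpus") or []
    if not gpus:
        problems.append("no gpus entry present")
    for i, gpu in enumerate(gpus):
        if int(gpu.get("count", 0)) < 1:
            problems.append(f"gpus[{i}].count must be at least 1")
        if not gpu.get("type"):
            problems.append(f"gpus[{i}].type is empty")
    # Assumption: GPU instances cannot be live-migrated on GCP,
    # so host maintenance has to be set to Terminate.
    if gpus and provider_spec.get("onHostMaintenance") != "Terminate":
        problems.append("onHostMaintenance should be Terminate when GPUs are attached")
    return problems


# The fields shown in the verification comment above.
verified_spec = {
    "gpus": [{"count": 1, "type": "nvidia-tesla-p100"}],
    "kind": "GCPMachineProviderSpec",
    "machineType": "n1-standard-1",
    "onHostMaintenance": "Terminate",
    "restartPolicy": "Always",
}

print(gpu_spec_problems(verified_spec))  # prints []
```

An empty list means the GPU fields look consistent; each string in a non-empty list names one field to fix before editing the machineset.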
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2022:0056
Description of problem:
Couldn’t enable support for instances with GPUs on GCP

Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2021-10-25-190146

How reproducible:
always

Steps to Reproduce:
1. Create a new machineset
    providerSpec:
      value:
        apiVersion: gcpprovider.openshift.io/v1beta1
        ...
        guestAccelerators:
        - acceleratorCount: 1
          acceleratorType: nvidia-tesla-p100
        kind: GCPMachineProviderSpec
        machineType: n1-standard-1
2. Check new created machines
3.

Actual results:
GuestAccelerators are ignored in machine yaml file
    providerSpec:
      value:
        apiVersion: gcpprovider.openshift.io/v1beta1
        canIPForward: false
        credentialsSecret:
          name: gcp-cloud-credentials
        deletionProtection: false
        disks:
        - autoDelete: true
          boot: true
          image: projects/rhcos-cloud/global/images/rhcos-410-84-202110140201-0-gcp-x86-64
          labels: null
          sizeGb: 128
          type: pd-ssd
        kind: GCPMachineProviderSpec
        machineType: n1-standard-1
        metadata:
          creationTimestamp: null
        networkInterfaces:
        - network: wewang-gcp10-r5h4b-network
          subnetwork: wewang-gcp10-r5h4b-worker-subnet
        projectID: openshift-qe
        region: us-central1
        serviceAccounts:
        - email: wewang-gcp10-r5h4b-w.gserviceaccount.com
          scopes:
          - https://www.googleapis.com/auth/cloud-platform
        tags:
        - wewang-gcp10-r5h4b-worker
        userDataSecret:
          name: worker-user-data
        zone: us-central1-c

Expected results:
Could create instances with GPU successfully

Additional info:
https://issues.redhat.com/browse/OCPCLOUD-812
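
The symptom in "Actual results" (a field set on the machineset that is absent from the resulting machine) can be detected by comparing the two providerSpec values. This is a rough sketch under stated assumptions: both specs are already available as Python dicts, only top-level keys are compared, and the function name `dropped_fields` is hypothetical. The sample dicts contain only a subset of the fields from this report.

```python
def dropped_fields(template_spec, machine_spec):
    """Return the top-level keys present in the machineset template's
    providerSpec but missing from the created machine's providerSpec.
    (Hypothetical helper; it does not compare nested values.)"""
    return sorted(key for key in template_spec if key not in machine_spec)


# Subset of the machineset template from "Steps to Reproduce".
template = {
    "apiVersion": "gcpprovider.openshift.io/v1beta1",
    "guestAccelerators": [
        {"acceleratorCount": 1, "acceleratorType": "nvidia-tesla-p100"}
    ],
    "kind": "GCPMachineProviderSpec",
    "machineType": "n1-standard-1",
}

# Subset of the created machine from "Actual results".
machine = {
    "apiVersion": "gcpprovider.openshift.io/v1beta1",
    "canIPForward": False,
    "deletionProtection": False,
    "kind": "GCPMachineProviderSpec",
    "machineType": "n1-standard-1",
    "zone": "us-central1-c",
}

print(dropped_fields(template, machine))  # prints ['guestAccelerators']
```

Fields that the machine gains through defaulting (such as `canIPForward` here) are intentionally not reported; only fields that disappeared are listed.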