Description of problem:
Couldn't enable support for instances with GPUs on GCP

Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2021-10-25-190146

How reproducible:
always

Steps to Reproduce:
1. Create a new machineset
      providerSpec:
        value:
          apiVersion: gcpprovider.openshift.io/v1beta1
          ...
          guestAccelerators:
          - acceleratorCount: 1
            acceleratorType: nvidia-tesla-p100
          kind: GCPMachineProviderSpec
          machineType: n1-standard-1
2. Check the newly created machines

Actual results:
guestAccelerators are ignored in the machine yaml file
      providerSpec:
        value:
          apiVersion: gcpprovider.openshift.io/v1beta1
          canIPForward: false
          credentialsSecret:
            name: gcp-cloud-credentials
          deletionProtection: false
          disks:
          - autoDelete: true
            boot: true
            image: projects/rhcos-cloud/global/images/rhcos-410-84-202110140201-0-gcp-x86-64
            labels: null
            sizeGb: 128
            type: pd-ssd
          kind: GCPMachineProviderSpec
          machineType: n1-standard-1
          metadata:
            creationTimestamp: null
          networkInterfaces:
          - network: wewang-gcp10-r5h4b-network
            subnetwork: wewang-gcp10-r5h4b-worker-subnet
          projectID: openshift-qe
          region: us-central1
          serviceAccounts:
          - email: wewang-gcp10-r5h4b-w.gserviceaccount.com
            scopes:
            - https://www.googleapis.com/auth/cloud-platform
          tags:
          - wewang-gcp10-r5h4b-worker
          userDataSecret:
            name: worker-user-data
          zone: us-central1-c

Expected results:
Instances with GPUs can be created successfully

Additional info:
https://issues.redhat.com/browse/OCPCLOUD-812
@Sam, please make sure that the accelerated network fields have been copied over to the openshift/api repo as part of the migration, and that the MAO repo has the latest copy of the API dependency. If you run into issues, please speak to Alex, who has been working on this migration.
Tested with nightly build 4.10.0-0.nightly-2021-12-18-034942; everything works well, moving to verified.

$ oc get machine
NAME                               PHASE      TYPE            REGION        ZONE            AGE
zhsungcp201-r79l8-master-0         Running    n1-standard-4   us-central1   us-central1-a   169m
zhsungcp201-r79l8-master-1         Running    n1-standard-4   us-central1   us-central1-b   169m
zhsungcp201-r79l8-master-2         Running    n1-standard-4   us-central1   us-central1-c   169m
zhsungcp201-r79l8-worker-a-9knlf   Running    n1-standard-4   us-central1   us-central1-a   165m
zhsungcp201-r79l8-worker-b-xsflz   Running    n1-standard-4   us-central1   us-central1-b   165m
zhsungcp201-r79l8-worker-c-vcw54   Deleting   n1-standard-1   us-central1   us-central1-c   124m

$ oc edit machineset zhsungcp201-r79l8-worker-c
          gpus:
          - count: 1
            type: nvidia-tesla-p100
          kind: GCPMachineProviderSpec
          machineType: n1-standard-1
          onHostMaintenance: Terminate
          restartPolicy: Always
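For anyone re-checking this, the difference between the spec in the original report and the one edited above can be sketched as a small offline check. This is only an illustration, not part of the operator: the function name `check_gpu_provider_spec` is hypothetical, the dictionaries are hand-copied fragments of the two specs in this bug, and the rule that `onHostMaintenance` should be `Terminate` whenever `gpus` is set is an assumption based on the edited machineset above.

```python
def check_gpu_provider_spec(spec):
    """Return a list of problems found in a providerSpec value; empty means none.

    Hypothetical helper for illustration only. `spec` is the dict found under
    providerSpec.value in a machineset or machine.
    """
    problems = []
    gpus = spec.get("gpus") or []

    # The original report used guestAccelerators, which was dropped from the
    # resulting machine; the verified machineset uses gpus instead.
    if "guestAccelerators" in spec and not gpus:
        problems.append("guestAccelerators is set but gpus is empty")

    for i, gpu in enumerate(gpus):
        count = gpu.get("count")
        if not isinstance(count, int) or count < 1:
            problems.append(f"gpus[{i}].count must be a positive integer")
        if not gpu.get("type"):
            problems.append(f"gpus[{i}].type is missing")

    # Assumption: GPU machines need onHostMaintenance set to Terminate,
    # as in the verified machineset.
    if gpus and spec.get("onHostMaintenance") != "Terminate":
        problems.append("onHostMaintenance should be Terminate when gpus are set")

    return problems


# Fragment of the machineset from the verification comment.
verified_spec = {
    "gpus": [{"count": 1, "type": "nvidia-tesla-p100"}],
    "kind": "GCPMachineProviderSpec",
    "machineType": "n1-standard-1",
    "onHostMaintenance": "Terminate",
    "restartPolicy": "Always",
}

# Fragment of the machineset from the original report.
reported_spec = {
    "guestAccelerators": [
        {"acceleratorCount": 1, "acceleratorType": "nvidia-tesla-p100"}
    ],
    "kind": "GCPMachineProviderSpec",
    "machineType": "n1-standard-1",
}

print(check_gpu_provider_spec(verified_spec))
print(check_gpu_provider_spec(reported_spec))
```

The check works on plain dictionaries, so it can be run against a spec pasted from `oc get machineset -o yaml` after loading it with any YAML parser.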
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056