Description of problem:
UPI on OSP: the machine status does not become "Failed" when a machine is created with an invalid image.

Version-Release number of selected component (if applicable):
4.4.0-0.nightly-2020-02-17-211020

How reproducible:
Always

Steps to Reproduce:
1. Create a machine, setting its providerSpec with an invalid image
2. Check machine status
3. Check logs

Actual results:
Machine stuck in Provisioning status, doesn't become "Failed"

$ oc get machine
NAME                          PHASE          TYPE           REGION      ZONE   AGE
zhsun9-wxcfz-worker-aaaaa     Provisioning   ci.m1.xlarge   regionOne   nova   42m
zhsun9-wxcfz-worker-b-llxd8   Running        ci.m1.xlarge   regionOne   nova   16m
zhsun9-wxcfz-worker-bfwjk     Running        ci.m1.xlarge   regionOne   nova   19h
zhsun9-wxcfz-worker-c-6tjtg   Running        ci.m1.xlarge   regionOne   nova   18h

I0220 03:35:35.617474 1 controller.go:164] Reconciling Machine "zhsun9-wxcfz-worker-aaaaa"
I0220 03:35:35.617633 1 controller.go:376] Machine "zhsun9-wxcfz-worker-aaaaa" in namespace "openshift-machine-api" doesn't specify "cluster.k8s.io/cluster-name" label, assuming nil cluster
I0220 03:35:35.627688 1 machineservice.go:229] Cloud provider CA cert not provided, using system trust bundle
I0220 03:35:37.209780 1 controller.go:319] Reconciling machine object zhsun9-wxcfz-worker-aaaaa triggers idempotent create.
I0220 03:35:37.222271 1 machineservice.go:229] Cloud provider CA cert not provided, using system trust bundle
I0220 03:35:37.296965 1 machineservice.go:229] Cloud provider CA cert not provided, using system trust bundle
E0220 03:35:44.641047 1 actuator.go:474] Machine error zhsun9-wxcfz-worker-aaaaa: error creating Openstack instance: Create new server err: no image with the name rhcos-44.81.202002071430-0000 could be found
W0220 03:35:44.641072 1 controller.go:321] Failed to create machine "zhsun9-wxcfz-worker-aaaaa": error creating Openstack instance: Create new server err: no image with the name rhcos-44.81.202002071430-0000 could be found
I0220 03:38:28.481553 1 controller.go:164] Reconciling Machine "zhsun9-wxcfz-worker-aaaaa"

Expected results:
Machine status becomes "Failed"

Additional info:
*** Bug 1805023 has been marked as a duplicate of this bug. ***
The team considers this bug valid. Considering this bug's priority and our capacity, we are deferring this bug to an upcoming sprint. If there are reasons for us to reprioritise, please let us know.
Considering the priority assigned to this bug and our team capacity, we are deferring this bug to an upcoming sprint. Please let us know if there are reasons for us to reprioritize.
Deferring to an upcoming sprint. Please let us know if there are reasons for us to reprioritize.
This may have been fixed in 4.6 and 4.5 by updating cluster-api-provider-openstack's dependencies [1]. Can you still reproduce the issue on 4.5 and 4.6?

[1]: https://github.com/openshift/cluster-api-provider-openstack/pull/101
(4.5 OR 4.6, sorry)
@Pierre Prinetti yes, I could reproduce this in 4.5

$ oc get clusterversion
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-rc.5   True        False         64m     Cluster version is 4.5.0-rc.5

$ oc get machine
NAME                               PHASE          TYPE        REGION      ZONE   AGE
hongli-share-mtkvw-master-0        Running        m1.xlarge   regionOne   nova   5h11m
hongli-share-mtkvw-master-1        Running        m1.xlarge   regionOne   nova   5h11m
hongli-share-mtkvw-master-2        Running        m1.xlarge   regionOne   nova   5h11m
hongli-share-mtkvw-worker-lm57w    Running        m1.xlarge   regionOne   nova   4h54m
hongli-share-mtkvw-worker-q2bnm    Running        m1.xlarge   regionOne   nova   4h54m
hongli-share-mtkvw-worker-zzgwx    Running        m1.xlarge   regionOne   nova   4h54m
hongli-share-mtkvw-worker1-7zlg4   Provisioning                                  27m

I0702 08:16:29.247088 1 controller.go:313] hongli-share-mtkvw-worker1-7zlg4: reconciling machine triggers idempotent create
I0702 08:16:29.264637 1 utils.go:99] Cloud provider CA cert not provided, using system trust bundle
I0702 08:16:29.330715 1 utils.go:99] Cloud provider CA cert not provided, using system trust bundle
E0702 08:16:37.659537 1 actuator.go:538] Machine error hongli-share-mtkvw-worker1-7zlg4: error creating Openstack instance: Create new server err: no image with the name hongli-share-mtkvw-rhcos-invalid could be found
W0702 08:16:37.659572 1 controller.go:315] hongli-share-mtkvw-worker1-7zlg4: failed to create machine: error creating Openstack instance: Create new server err: no image with the name hongli-share-mtkvw-rhcos-invalid could be found
Validated on the cluster version below (IPI)

[miyadav@miyadav ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-09-20-184226   True        False         77m     Cluster version is 4.6.0-0.nightly-2020-09-20-184226

Steps: created a machineset with an invalid spec; the machine is stuck in Provisioning state.

[miyadav@miyadav ~]$ oc get machines
NAME                                  PHASE          TYPE        REGION      ZONE   AGE
miyadav-2109-rgtw2-master-0           Running        m1.xlarge   regionOne   nova   109m
miyadav-2109-rgtw2-master-1           Running        m1.xlarge   regionOne   nova   109m
miyadav-2109-rgtw2-master-2           Running        m1.xlarge   regionOne   nova   109m
miyadav-2109-rgtw2-worker-0-9trlc     Running        m1.large    regionOne   nova   103m
miyadav-2109-rgtw2-worker-0-p7sb6     Running        m1.large    regionOne   nova   103m
miyadav-2109-rgtw2-worker-0-z9tlj     Running        m1.large    regionOne   nova   103m
miyadav-2109-rgtw2-worker-inv-44zff   Provisioning                                  41m

[miyadav@miyadav ~]$ oc get pods
NAME                                           READY   STATUS    RESTARTS   AGE
cluster-autoscaler-operator-67f5fbc644-k458r   2/2     Running   0          69m
machine-api-controllers-5445c7f675-pbhwp       7/7     Running   0          66m
machine-api-operator-7858f579db-tfg8f          2/2     Running   0          66m

[miyadav@miyadav ~]$ oc logs -f machine-api-controllers-5445c7f675-pbhwp -c machine-controller
. . .
E0921 06:36:40.088279 1 actuator.go:550] Machine error miyadav-2109-rgtw2-worker-inv-44zff: Unable to find flavor with name m1.invalid
W0921 06:36:40.088317 1 controller.go:315] miyadav-2109-rgtw2-worker-inv-44zff: failed to create machine: Unable to find flavor with name m1.invalid
I0921 06:36:41.088670 1 controller.go:169] miyadav-2109-rgtw2-worker-inv-44zff: reconciling Machine
I0921 06:36:41.112408 1 utils.go:99] Cloud provider CA cert not provided, using system trust bundle
I0921 06:36:41.556962 1 controller.go:313] miyadav-2109-rgtw2-worker-inv-44zff: reconciling machine triggers idempotent create
I0921 06:36:41.619936 1 utils.go:99] Cloud provider CA cert not provided, using system trust bundle
I0921 06:36:41.716050 1 utils.go:99] Cloud provider CA cert not provided, using system trust bundle
E0921 06:36:42.052502 1 actuator.go:550] Machine error miyadav-2109-rgtw2-worker-inv-44zff: Unable to find flavor with name m1.invalid
W0921 06:36:42.052536 1 controller.go:315] miyadav-2109-rgtw2-worker-inv-44zff: failed to create machine: Unable to find flavor with name m1.invalid
I0921 06:36:43.053162 1 controller.go:169] miyadav-2109-rgtw2-worker-inv-44zff: reconciling Machine
I0921 06:36:43.071554 1 utils.go:99] Cloud provider CA cert not provided, using system trust bundle
. . .

Expected: the machine should move to Failed status
Actual: the machine is stuck in Provisioning state
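The manual check above (look at the machine list and spot a machine that has sat in Provisioning for too long) could be automated along these lines. This is a rough sketch, not an existing tool: findStuck, stuckThreshold and sampleOutput are hypothetical names, the 30-minute threshold is arbitrary, and the sample rows are copied from the output in this comment. It assumes the AGE column uses only s/m/h units; ages containing days (for example "2d3h") are not handled and such rows are skipped.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
	"time"
)

// stuckThreshold is an arbitrary cut-off chosen for this sketch.
const stuckThreshold = 30 * time.Minute

// sampleOutput holds rows copied from the `oc get machines` output above.
const sampleOutput = `NAME                                  PHASE          TYPE        REGION      ZONE   AGE
miyadav-2109-rgtw2-master-0           Running        m1.xlarge   regionOne   nova   109m
miyadav-2109-rgtw2-worker-0-9trlc     Running        m1.large    regionOne   nova   103m
miyadav-2109-rgtw2-worker-inv-44zff   Provisioning                                  41m
`

type stuckMachine struct {
	Name string
	Age  time.Duration
}

// findStuck scans `oc get machines` text output and returns the machines
// whose phase is Provisioning and whose age is at least threshold.
// The name is the first field, the phase the second and the age the last,
// which also covers rows where TYPE, REGION and ZONE are empty.
func findStuck(output string, threshold time.Duration) []stuckMachine {
	var stuck []stuckMachine
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 3 || fields[1] != "Provisioning" {
			continue // header, blank line, or a machine in another phase
		}
		age, err := time.ParseDuration(fields[len(fields)-1])
		if err != nil {
			continue // unsupported age format, e.g. one containing days
		}
		if age >= threshold {
			stuck = append(stuck, stuckMachine{Name: fields[0], Age: age})
		}
	}
	return stuck
}

func main() {
	for _, m := range findStuck(sampleOutput, stuckThreshold) {
		fmt.Printf("%s has been in Provisioning for %s\n", m.Name, m.Age)
	}
}
```

A machine that is merely slow to boot would also be reported, so the threshold has to be picked with the normal provisioning time of the platform in mind.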
Validated at -

oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-11-09-235738   True        False         16m     Cluster version is 4.7.0-0.nightly-2020-11-09-235738

Steps : created machineset with invalid spec (flavor: invalid)

[root@miyadav miyadav]# oc get machines -w
NAME                                PHASE     TYPE        REGION      ZONE   AGE
miyadav-b025-jzgt7-master-0         Running   m1.xlarge   regionOne   nova   52m
miyadav-b025-jzgt7-master-1         Running   m1.xlarge   regionOne   nova   52m
miyadav-b025-jzgt7-master-2         Running   m1.xlarge   regionOne   nova   52m
miyadav-b025-jzgt7-worker-0-6d4l5   Running   m1.large    regionOne   nova   50m
miyadav-b025-jzgt7-worker-0-lg4r4   Running   m1.large    regionOne   nova   50m
miyadav-b025-jzgt7-worker-0-xnjzz   Running   m1.large    regionOne   nova   50m
miyadav-b025-jzgt7-worker-i-47tw4   Failed                                   7s

Expected & Actual : Machine should be in failed status

Additional Info:
Moved to VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633