Bug 1890456

Summary: [vsphere] mapi_instance_create_failed doesn't work on vsphere
Product: OpenShift Container Platform
Reporter: sunzhaohua <zhsun>
Component: Cloud Compute
Assignee: Danil Grigorev <dgrigore>
Cloud Compute sub component: Other Providers
QA Contact: Milind Yadav <miyadav>
Status: CLOSED ERRATA
Docs Contact:
Severity: low    
Priority: low    
Version: 4.6   
Target Milestone: ---   
Target Release: 4.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Only certain errors caused the failure metric to be updated.
Consequence: Not all errors resulted in the failure metric being incremented.
Fix: Ensure all errors for machine creation update the metric.
Result: Any machine creation error updates the failure metric.
Story Points: ---
Clone Of:
: 1900538 (view as bug list)
Environment:
Last Closed: 2021-02-24 15:27:41 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1900538    
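
The Fix in the Doc Text above amounts to counting every machine creation error rather than only selected ones. The following is a minimal Python sketch of that pattern, not the actual patch. The names create_failed, InvalidTemplateError, create_instance, and reconcile_create are hypothetical, and a plain Counter stands in for the mapi_instance_create_failed metric.

```python
from collections import Counter

# Hypothetical stand-in for the mapi_instance_create_failed counter,
# keyed by machine name.
create_failed = Counter()


class InvalidTemplateError(Exception):
    """Hypothetical error: the requested template does not exist."""


def create_instance(machine_name, template, known_templates):
    """Hypothetical provider call that fails for an unknown template."""
    if template not in known_templates:
        raise InvalidTemplateError(f"template {template!r} not found")
    return f"{machine_name}-vm"


def reconcile_create(machine_name, template, known_templates):
    """Create the instance and count every failure, whatever its type."""
    try:
        return create_instance(machine_name, template, known_templates)
    except Exception:
        # Increment on any creation error, not only on selected error types.
        create_failed[machine_name] += 1
        raise
```

Catching the broad exception type and re-raising keeps the error visible to the caller while making sure the counter cannot be skipped by an error type nobody anticipated.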

Description sunzhaohua 2020-10-22 09:40:26 UTC
Description of problem:
mapi_instance_create_failed doesn't work on vsphere

Version-Release number of selected component (if applicable):
4.6.0-rc.4

How reproducible:
Always

Steps to Reproduce:
1. Create a failed machine by setting the template to an invalid one.
2. Check the Prometheus metrics.

Actual results:
The Prometheus web console shows "No datapoints found".

$ token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
$  oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep "mapi_instance_"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 64475    0 64475    0     0   530k      0 --:--:-- --:--:-- --:--:--  533k
$ oc get machine
NAME                            PHASE     TYPE   REGION   ZONE   AGE
zhsunvs22-tr2bv-master-0        Running                          15h
zhsunvs22-tr2bv-master-1        Running                          15h
zhsunvs22-tr2bv-master-2        Running                          15h
zhsunvs22-tr2bv-worker-5d6xw    Running                          15h
zhsunvs22-tr2bv-worker-xrw84    Running                          15h
zhsunvs22-tr2bv-worker1-sjkss   Failed                           13h
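
The metric-name check above can also be done without jq and grep. The following is a minimal Python sketch, assuming the body returned by the /api/v1/label/__name__/values call has been saved to a string. The function name mapi_instance_metrics is hypothetical.

```python
import json


def mapi_instance_metrics(response_body):
    """Return the sorted mapi_instance_* metric names from a Prometheus
    label-values response body (a JSON string)."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"unexpected status: {payload.get('status')!r}")
    # "data" holds the list of metric names known to Prometheus.
    return sorted(
        name for name in payload.get("data", [])
        if name.startswith("mapi_instance_")
    )
```

An empty list corresponds to the result seen here, where grep printed no matching metric names.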

Expected results:
Prometheus should show the mapi_instance_create_failed metric with its details.

Additional info:

Comment 1 Danil Grigorev 2020-11-12 18:10:34 UTC
The PR should be merged today or tomorrow, and QA has already confirmed that the bug is not present. I will still tag this BZ for the upcoming sprint in case of unexpected delays.

Comment 3 Milind Yadav 2020-11-23 09:05:50 UTC
Validated on - 


Steps:
1. Copy a machineset to create a machineset with an invalid image.
2. Scale the new machineset; the machine is created in the Failed state.


Result:
The mapi_instance_create_failed metric was recorded successfully.


Additional Info:
Moved to verified

Comment 6 errata-xmlrpc 2021-02-24 15:27:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633