Bug 1890456

Summary:	[vsphere] mapi_instance_create_failed doesn't work on vsphere
Product:	OpenShift Container Platform	Reporter:	sunzhaohua <zhsun>
Component:	Cloud Compute	Assignee:	Danil Grigorev <dgrigore>
Cloud Compute sub component:	Other Providers	QA Contact:	Milind Yadav <miyadav>
Status:	CLOSED ERRATA	Docs Contact:
Severity:	low
Priority:	low
Version:	4.6
Target Milestone:	---
Target Release:	4.7.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Cause: Only certain errors caused the failure metric to be updated. Consequence: Not all errors resulted in the failure metric being incremented. Fix: Ensure all errors for machine creation update the metric. Result: Any machine creation error updates the failure metric.	Story Points:	---
Clone Of:
Clones:	1900538		Environment:
Last Closed:	2021-02-24 15:27:41 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1900538

Description sunzhaohua 2020-10-22 09:40:26 UTC

Description of problem:
The mapi_instance_create_failed metric is not reported on vSphere when machine creation fails.

Version-Release number of selected component (if applicable):
4.6.0-rc.4

How reproducible:
Always

Steps to Reproduce:
1. Create a machine that fails by setting its template to an invalid one
2. Check the Prometheus metrics

Actual results:
The Prometheus web console shows "No datapoints found", and listing the metric names returns no mapi_instance_ entries:

$ token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep "mapi_instance_"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 64475    0 64475    0     0   530k      0 --:--:-- --:--:-- --:--:--  533k
$ oc get machine
NAME                            PHASE     TYPE   REGION   ZONE   AGE
zhsunvs22-tr2bv-master-0        Running                          15h
zhsunvs22-tr2bv-master-1        Running                          15h
zhsunvs22-tr2bv-master-2        Running                          15h
zhsunvs22-tr2bv-worker-5d6xw    Running                          15h
zhsunvs22-tr2bv-worker-xrw84    Running                          15h
zhsunvs22-tr2bv-worker1-sjkss   Failed                           13h
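
As a sketch of the metric-name check above, the helper below filters the JSON returned by the /api/v1/label/__name__/values endpoint for names with a given prefix, using only grep and tr. The function name list_metrics_with_prefix is hypothetical and not part of oc or Prometheus, and it assumes metric names contain only letters, digits, underscores, and colons.

```shell
# Hypothetical helper (not part of oc or Prometheus): reads the JSON body of
# /api/v1/label/__name__/values on stdin and prints every metric name that
# starts with the given prefix, one per line. Prints nothing if none match.
list_metrics_with_prefix() {
  prefix="$1"
  # Match quoted strings that begin with the prefix, then drop the quotes.
  grep -o "\"${prefix}[A-Za-z0-9_:]*\"" | tr -d '"'
}
```

The curl output from the transcript could be piped into list_metrics_with_prefix mapi_instance_ in place of the jq and grep stages; adding -s to curl suppresses the progress meter. An empty result corresponds to the failure reported here.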

Expected results:
Prometheus should show the mapi_instance_create_failed metric with its details.

Additional info:

Comment 1 Danil Grigorev 2020-11-12 18:10:34 UTC

The PR is going to be merged today or tomorrow, and QA has already confirmed that the bug is not present. Still, I will tag this BZ for the upcoming sprint in case of unexpected delays.

Comment 3 Milind Yadav 2020-11-23 09:05:50 UTC

Validated on - 


Steps:
1. Copy a machineset to create a machineset with an invalid image
2. When the new machineset is scaled up, the machine is created in the Failed state


Result:
The mapi_instance_create_failed metric was recorded successfully.


Additional Info:
Moved to verified

Comment 6 errata-xmlrpc 2021-02-24 15:27:41 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633