Bug 1890456 - [vsphere] mapi_instance_create_failed doesn't work on vsphere
Summary: [vsphere] mapi_instance_create_failed doesn't work on vsphere
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.7.0
Assignee: Danil Grigorev
QA Contact: Milind Yadav
Depends On:
Blocks: 1900538
Reported: 2020-10-22 09:40 UTC by sunzhaohua
Modified: 2021-02-24 15:28 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Only certain errors caused the failure metric to be updated.
Consequence: Not all errors resulted in the failure metric being incremented.
Fix: Ensure all errors for machine creation update the metric.
Result: Any machine creation error updates the failure metric.
Clone Of:
Clones: 1900538
Last Closed: 2021-02-24 15:27:41 UTC
Target Upstream Version:

System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift machine-api-operator pull 733 | 0 | None | closed | Bug 1890456: Cleanup and refactor vSphere metrics | 2021-02-17 16:28:17 UTC
Red Hat Product Errata | RHSA-2020:5633 | 0 | None | None | None | 2021-02-24 15:28:13 UTC

Description sunzhaohua 2020-10-22 09:40:26 UTC
Description of problem:
mapi_instance_create_failed doesn't work on vsphere

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create a failed machine by setting the template to an invalid one.
2. Check the Prometheus metrics.

Actual results:
The Prometheus web console shows "No datapoints found".

$ token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep "mapi_instance_"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 64475    0 64475    0     0   530k      0 --:--:-- --:--:-- --:--:--  533k
$ oc get machine
NAME                            PHASE     TYPE   REGION   ZONE   AGE
zhsunvs22-tr2bv-master-0        Running                          15h
zhsunvs22-tr2bv-master-1        Running                          15h
zhsunvs22-tr2bv-master-2        Running                          15h
zhsunvs22-tr2bv-worker-5d6xw    Running                          15h
zhsunvs22-tr2bv-worker-xrw84    Running                          15h
zhsunvs22-tr2bv-worker1-sjkss   Failed                           13h
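
The check above relies on grep printing nothing when no mapi_instance_* metric name exists. A minimal sketch of the same check as a reusable helper is shown below, so the outcome is also visible in the exit status. The function name list_metrics and the sample JSON are hypothetical and only for illustration; the only assumption about Prometheus is that the label-values endpoint returns a JSON object whose "data" array holds the metric names.

```shell
#!/bin/sh
# Hypothetical helper (not part of oc or Prometheus): reads a label-values
# JSON response on stdin and prints every metric name that starts with the
# given prefix. Returns 1 when no such name is present.
list_metrics() {
  prefix="$1"
  # Match quoted names such as "mapi_instance_create_failed", then drop quotes.
  names=$(grep -o "\"${prefix}[A-Za-z0-9_:]*\"" | tr -d '"')
  [ -n "$names" ] || return 1
  printf '%s\n' "$names"
}

# Illustrative sample response, not captured from a real cluster.
sample='{"status":"success","data":["mapi_instance_create_failed","up"]}'
printf '%s\n' "$sample" | list_metrics mapi_instance_ \
  || echo "no mapi_instance_ metrics found"

# Against a cluster, the curl command from this report could be piped in:
#   oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- \
#     curl -sk -H "Authorization: Bearer $token" \
#     'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' \
#     | list_metrics mapi_instance_
```

With the output shown in this report, the helper would return 1, because no metric name with that prefix was listed.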

Expected results:
The mapi_instance_create_failed metric should be reported with its details.

Additional info:

Comment 1 Danil Grigorev 2020-11-12 18:10:34 UTC
The PR is going to be merged today or tomorrow, and QA has already confirmed that the bug is not present. Still, I will tag this BZ with upcoming sprint in case of unexpected delays.

Comment 3 Milind Yadav 2020-11-23 09:05:50 UTC
Validated on - 

1. Copy a machineset to create a machineset with an invalid image.
2. When scaled, the machine is created in the Failed state.

The mapi_instance_create_failed metric was recorded successfully.

Additional Info:
Moved to verified
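
One way to confirm that the metric is recorded is to query it by name through the Prometheus HTTP API instant-query endpoint (/api/v1/query). The sketch below only builds the request URL; the helper name query_url is hypothetical, and the host is the one used earlier in this report.

```shell
#!/bin/sh
# Hypothetical helper: print the instant-query URL for a metric name.
# $1 is the Prometheus base URL, $2 is the metric name.
query_url() {
  base="$1"
  metric="$2"
  printf '%s/api/v1/query?query=%s\n' "$base" "$metric"
}

query_url https://prometheus-k8s.openshift-monitoring.svc:9091 mapi_instance_create_failed

# Against a cluster, the URL could be fetched the same way as in the description:
#   oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- \
#     curl -sk -H "Authorization: Bearer $token" "$(query_url ...)"
```

A non-empty "result" array in the response would indicate that at least one series exists for the metric.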

Comment 6 errata-xmlrpc 2021-02-24 15:27:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

