1496727 – No value recorded for metric cloudprovider_gce_api_request_errors when gce pd attached failed to node

Bug 1496727 - No value recorded for metric cloudprovider_gce_api_request_errors when gce pd attached failed to node

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Storage
Sub Component:
Version:	3.7.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	low
Severity:	low
Target Milestone:	---
Target Release:	3.10.0
Assignee:	Hemant Kumar
QA Contact:	Chao Yang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-09-28 08:52 UTC by Chao Yang
Modified:	2019-02-19 21:42 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-02-19 21:42:37 UTC
Target Upstream Version:
Embargoed:

Description Chao Yang 2017-09-28 08:52:19 UTC

Description of problem:
No error metric is recorded for cloudprovider_gce_api_request_errors when attaching a GCE PD to a node fails.
Version-Release number of selected component (if applicable):
oc v3.7.0-0.127.0
kubernetes v1.7.0+80709908fd
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://qe-chaoyang-master-etcd-nfs-1:8443
openshift v3.7.0-0.127.0
kubernetes v1.7.0+80709908fd

How reproducible:
Always
Steps to Reproduce:
1. Create a dynamic PVC.
2. Delete the PV from the GCE web console.
3. Create a pod that uses the above PVC.
4. The pod stays in "ContainerCreating" status.
5. Run oc describe pods:
Events:
  FirstSeen	LastSeen	Count	From						SubObjectPath	Type		Reason			Message
  ---------	--------	-----	----						-------------	--------	------			-------
  1h		1h		1	default-scheduler						Normal		Scheduled		Successfully assigned gce1 to qe-chaoyang-node-registry-router-1
  1h		1h		1	kubelet, qe-chaoyang-node-registry-router-1			Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "default-token-n1kkg" 
  1h		28m		15	kubelet, qe-chaoyang-node-registry-router-1			Warning		FailedMount		Unable to mount volumes for pod "gce1_test(9112be4d-a414-11e7-ad1f-42010af00005)": timeout expired waiting for volumes to attach/mount for pod "test"/"gce1". list of unattached/unmounted volumes=[pvol]
  1h		1m		27	kubelet, qe-chaoyang-node-registry-router-1			Warning		FailedSync		Error syncing pod
  1h		1m		38	attachdetach							Warning		FailedMount		AttachVolume.Attach failed for volume "pvc-4e7f9d70-a414-11e7-ad1f-42010af00005" : GCE persistent disk not found: diskName="kubernetes-dynamic-pvc-4e7f9d70-a414-11e7-ad1f-42010af00005" zone="us-central1-a

Actual results:
No value is recorded for cloudprovider_gce_api_request_errors after 50 minutes.

Expected results:
"cloudprovider_gce_api_request_errors { request = "attach_disk"}" should be displayed on the Prometheus web console.
Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:

Comment 1 Hemant Kumar 2017-10-06 15:48:03 UTC

I think this is currently by design. We do not record "disk not found" as an error at the moment: https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/gce/gce_disks.go#L583

It is debatable whether we should or should not; we will have to figure that out with upstream.

Comment 2 Hemant Kumar 2017-10-06 15:49:46 UTC

BTW, what the linked code shows is that we don't initialize the metric object if the disk is not found, which is why no metric is recorded in that case.

Comment 3 Hemant Kumar 2019-02-19 21:42:37 UTC

I am going to close this, since there hasn't been a request to provide this metric yet.
