Bug 1496727 - No value recorded for metric cloudprovider_gce_api_request_errors when gce pd attached failed to node
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 3.10.0
Assignee: Hemant Kumar
QA Contact: Chao Yang
Depends On:
Reported: 2017-09-28 08:52 UTC by Chao Yang
Modified: 2019-02-19 21:42 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2019-02-19 21:42:37 UTC
Target Upstream Version:


Description Chao Yang 2017-09-28 08:52:19 UTC
Description of problem:
No error is recorded for the metric cloudprovider_gce_api_request_errors when attaching a GCE PD to a node fails.
Version-Release number of selected component (if applicable):
oc v3.7.0-0.127.0
kubernetes v1.7.0+80709908fd
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://qe-chaoyang-master-etcd-nfs-1:8443
openshift v3.7.0-0.127.0
kubernetes v1.7.0+80709908fd

How reproducible:
Steps to Reproduce:
1. Create a dynamic PVC
2. Delete the PV from the GCE web console
3. Create a pod using the above PVC
4. The pod is in "ContainerCreating" status
5. oc describe pods
  FirstSeen	LastSeen	Count	From						SubObjectPath	Type		Reason			Message
  ---------	--------	-----	----						-------------	--------	------			-------
  1h		1h		1	default-scheduler						Normal		Scheduled		Successfully assigned gce1 to qe-chaoyang-node-registry-router-1
  1h		1h		1	kubelet, qe-chaoyang-node-registry-router-1			Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "default-token-n1kkg" 
  1h		28m		15	kubelet, qe-chaoyang-node-registry-router-1			Warning		FailedMount		Unable to mount volumes for pod "gce1_test(9112be4d-a414-11e7-ad1f-42010af00005)": timeout expired waiting for volumes to attach/mount for pod "test"/"gce1". list of unattached/unmounted volumes=[pvol]
  1h		1m		27	kubelet, qe-chaoyang-node-registry-router-1			Warning		FailedSync		Error syncing pod
  1h		1m		38	attachdetach							Warning		FailedMount		AttachVolume.Attach failed for volume "pvc-4e7f9d70-a414-11e7-ad1f-42010af00005" : GCE persistent disk not found: diskName="kubernetes-dynamic-pvc-4e7f9d70-a414-11e7-ad1f-42010af00005" zone="us-central1-a

Actual results:
No value was recorded for cloudprovider_gce_api_request_errors within 50 minutes.

Expected results:
"cloudprovider_gce_api_request_errors { request = "attach_disk"}" should be displayed on the Prometheus web console.
Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:

Comment 1 Hemant Kumar 2017-10-06 15:48:03 UTC
I think this is currently by design: we don't record "disk not found" as an error. https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/gce/gce_disks.go#L583

It is debatable whether we should or should not; we will have to figure that out with upstream.

Comment 2 Hemant Kumar 2017-10-06 15:49:46 UTC
BTW, what the linked code shows is that we don't initialize the metric object if the disk is not found, which is why no metric is recorded in that case.
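
The ordering described in this comment can be illustrated with a minimal Go sketch. This is not the code from gce_disks.go: the names errorCounts, metricContext, attachDisk and attachDiskCounted are hypothetical, and a plain map stands in for the real Prometheus counter. It only shows why returning on "disk not found" before the metric object exists leaves the error counter untouched, and how the alternative ordering would count it.

```go
package main

import (
	"errors"
	"fmt"
)

// errorCounts is a hypothetical stand-in for an error counter such as
// cloudprovider_gce_api_request_errors, keyed by request name.
var errorCounts = map[string]int{}

var errDiskNotFound = errors.New("GCE persistent disk not found")

// metricContext is a hypothetical helper that records the outcome of one request.
type metricContext struct{ request string }

// observe increments the error counter when err is non-nil and returns err unchanged.
func (mc *metricContext) observe(err error) error {
	if err != nil {
		errorCounts[mc.request]++
	}
	return err
}

// attachDisk mimics the ordering described in the comment: the disk lookup
// runs before the metric context is created, so a lookup failure returns
// early and is never counted.
func attachDisk(disks map[string]bool, name string) error {
	if !disks[name] {
		return fmt.Errorf("%w: diskName=%q", errDiskNotFound, name)
	}
	mc := &metricContext{request: "attach_disk"}
	return mc.observe(nil) // a real attach API call would be observed here
}

// attachDiskCounted shows the alternative ordering: the metric context is
// created first, so "disk not found" is counted as an error.
func attachDiskCounted(disks map[string]bool, name string) error {
	mc := &metricContext{request: "attach_disk"}
	if !disks[name] {
		return mc.observe(fmt.Errorf("%w: diskName=%q", errDiskNotFound, name))
	}
	return mc.observe(nil)
}

func main() {
	disks := map[string]bool{"present-disk": true}

	_ = attachDisk(disks, "missing-disk")
	fmt.Println("early return, errors counted:", errorCounts["attach_disk"])

	_ = attachDiskCounted(disks, "missing-disk")
	fmt.Println("context first, errors counted:", errorCounts["attach_disk"])
}
```

Whether "disk not found" should count as an API request error is the open question raised in Comment 1; the sketch only makes the two orderings concrete.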

Comment 3 Hemant Kumar 2019-02-19 21:42:37 UTC
I am going to close this since there hasn't been an ask to provide this metric yet.
