Description of problem:
No cloudprovider_gce_api_request_errors metric is recorded when attaching a GCE PD to a node fails.

Version-Release number of selected component (if applicable):
oc v3.7.0-0.127.0
kubernetes v1.7.0+80709908fd
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://qe-chaoyang-master-etcd-nfs-1:8443
openshift v3.7.0-0.127.0
kubernetes v1.7.0+80709908fd

How reproducible:
Always

Steps to Reproduce:
1. Create a dynamic PVC
2. Delete the PV from the GCE web console
3. Create a pod using the above PVC
4. Pod is in "ContainerCreating" status
5. oc describe pods

Events:
  FirstSeen  LastSeen  Count  From                                         SubObjectPath  Type     Reason                 Message
  ---------  --------  -----  ----                                         -------------  ----     ------                 -------
  1h         1h        1      default-scheduler                                           Normal   Scheduled              Successfully assigned gce1 to qe-chaoyang-node-registry-router-1
  1h         1h        1      kubelet, qe-chaoyang-node-registry-router-1                 Normal   SuccessfulMountVolume  MountVolume.SetUp succeeded for volume "default-token-n1kkg"
  1h         28m       15     kubelet, qe-chaoyang-node-registry-router-1                 Warning  FailedMount            Unable to mount volumes for pod "gce1_test(9112be4d-a414-11e7-ad1f-42010af00005)": timeout expired waiting for volumes to attach/mount for pod "test"/"gce1". list of unattached/unmounted volumes=[pvol]
  1h         1m        27     kubelet, qe-chaoyang-node-registry-router-1                 Warning  FailedSync             Error syncing pod
  1h         1m        38     attachdetach                                                Warning  FailedMount            AttachVolume.Attach failed for volume "pvc-4e7f9d70-a414-11e7-ad1f-42010af00005" : GCE persistent disk not found: diskName="kubernetes-dynamic-pvc-4e7f9d70-a414-11e7-ad1f-42010af00005" zone="us-central1-a

Actual results:
No record for cloudprovider_gce_api_request_errors after 50 minutes.

Expected results:
"cloudprovider_gce_api_request_errors { request = "attach_disk"}" should be displayed on the Prometheus web console.

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:
I think this is currently by design. We don't record "disk not found" as an error. https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/gce/gce_disks.go#L583 Whether we should is debatable; we will have to figure that out with upstream.
BTW, what the linked code shows is that we don't initialize the metric object if the disk is not found, and that is why no metric is recorded in that case.
I am going to close this, since there hasn't been a request to provide this metric yet.