Description of problem:
Since NFS can fail to mount for various reasons, it is very hard to debug without an obvious/verbose error message.

Version-Release number of selected component (if applicable):
openshift v3.3.0.10
kubernetes v1.3.0+57fb9ac
etcd 2.3.0+git

How reproducible:
Always

Steps to Reproduce:
1. Prepare an NFS server.
2. Create a PV using the above NFS storage, but with an invalid export path (e.g., a non-existent path).
3. Create a PVC.
4. Check the PV/PVC status.
5. Create a pod using the above PVC.
6. Check the pod details: oc describe pod <podname>

Actual results:
No obvious/verbose error message explaining why the mount failed.

Expected results:
Show a more obvious/verbose error message for easier debugging.

Additional info:
# oc describe pod mypod-o4702 -n o4702
Name:            mypod-o4702
Namespace:       o4702
Security Policy: restricted
Node:            ip-172-18-3-22.ec2.internal/172.18.3.22
Start Time:      Tue, 26 Jul 2016 03:24:43 -0400
Labels:          name=frontendhttp
Status:          Pending
IP:
Controllers:     <none>
Containers:
  myfrontend:
    Container ID:
    Image:          aosqe/nfs-web
    Image ID:
    Port:           80/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Volume Mounts:
      /mnt from nfs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-u8ut5 (ro)
    Environment Variables: <none>
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
Volumes:
  nfs:
    Type:      PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName: nfsc-o4702
    ReadOnly:  false
  default-token-u8ut5:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-u8ut5
QoS Tier: BestEffort
Events:
  FirstSeen LastSeen Count From                                  SubobjectPath Type    Reason      Message
  --------- -------- ----- ----                                  ------------- ----    ------      -------
  3m        3m       1     {default-scheduler }                                Normal  Scheduled   Successfully assigned mypod-o4702 to ip-172-18-3-22.ec2.internal
  1m        1m       1     {kubelet ip-172-18-3-22.ec2.internal}               Warning FailedMount Unable to mount volumes for pod "mypod-o4702_o4702(06085133-5302-11e6-bc71-0e147d09b24f)": timeout expired waiting for volumes to attach/mount for pod "mypod-o4702"/"o4702". list of unattached/unmounted volumes=[nfs]
  1m        1m       1     {kubelet ip-172-18-3-22.ec2.internal}               Warning FailedSync  Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "mypod-o4702"/"o4702". list of unattached/unmounted volumes=[nfs]
https://github.com/kubernetes/kubernetes/pull/27778
This will be fixed in the next release. There is already an LGTM on the current PR.
This is going to be fixed in kubernetes-1.4. We do not plan to backport it to openshift-3.3.
Please retest.
Tried the same steps as in comment #0; there is still no obvious error message explaining why it failed (the error message is the same as in comment #0).

# openshift version
openshift v3.4.0.16+cc70b72
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

# oc describe pods mypod-od6vc -n od6vc
Name:            mypod-od6vc
Namespace:       od6vc
Security Policy: restricted
Node:            qe-lxia-ocp34-node-registry-router-1/10.240.0.4
Start Time:      Fri, 28 Oct 2016 01:36:30 -0400
Labels:          name=frontendhttp
Status:          Pending
IP:
Controllers:     <none>
Containers:
  myfrontend:
    Container ID:
    Image:          aosqe/nfs-web
    Image ID:
    Port:           80/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Volume Mounts:
      /mnt from nfs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-ia0ce (ro)
    Environment Variables: <none>
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
Volumes:
  nfs:
    Type:      PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName: nfsc-od6vc
    ReadOnly:  false
  default-token-ia0ce:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-ia0ce
QoS Class:   BestEffort
Tolerations: <none>
Events:
  FirstSeen LastSeen Count From                                           SubobjectPath Type    Reason      Message
  --------- -------- ----- ----                                           ------------- ----    ------      -------
  3m        3m       1     {default-scheduler }                                         Normal  Scheduled   Successfully assigned mypod-od6vc to qe-lxia-ocp34-node-registry-router-1
  1m        1m       1     {kubelet qe-lxia-ocp34-node-registry-router-1}               Warning FailedMount Unable to mount volumes for pod "mypod-od6vc_od6vc(7ad6d6e8-9cd0-11e6-98fc-42010af00003)": timeout expired waiting for volumes to attach/mount for pod "mypod-od6vc"/"od6vc". list of unattached/unmounted volumes=[nfs]
  1m        1m       1     {kubelet qe-lxia-ocp34-node-registry-router-1}               Warning FailedSync  Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "mypod-od6vc"/"od6vc". list of unattached/unmounted volumes=[nfs]
I just tested the latest Origin and I can see the upstream PR #27778 working in OCP:

Events:
  FirstSeen LastSeen Count From                SubobjectPath Type    Reason      Message
  --------- -------- ----- ----                ------------- ----    ------      -------
  2m        2m       1     {default-scheduler }              Normal  Scheduled   Successfully assigned nfs-bb-pod2 to ose1.rhs
  44s       44s      1     {kubelet ose1.rhs}                Warning FailedMount Unable to mount volumes for pod "nfs-bb-pod2_default(b2e7a914-9d28-11e6-aa36-525400f26441)": timeout expired waiting for volumes to attach/mount for pod "nfs-bb-pod2"/"default". list of unattached/unmounted volumes=[nfsvol]
  44s       44s      1     {kubelet ose1.rhs}                Warning FailedSync  Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "nfs-bb-pod2"/"default". list of unattached/unmounted volumes=[nfsvol]
  38s       38s      1     {kubelet ose1.rhs}                Warning FailedMount MountVolume.SetUp failed for volume "kubernetes.io/nfs/b2e7a914-9d28-11e6-aa36-525400f26441-pv-nfs" (spec.Name: "pv-nfs") pod "b2e7a914-9d28-11e6-aa36-525400f26441" (UID: "b2e7a914-9d28-11e6-aa36-525400f26441") with: mount failed: exit status 32
Mounting arguments: nfs1.rhs:/opt/data12 /etc/openshift/data/pods/b2e7a914-9d28-11e6-aa36-525400f26441/volumes/kubernetes.io~nfs/pv-nfs nfs []
Output: mount.nfs: Connection timed out

The true mount error from the provider might come after the generic "timeout expired waiting for volumes" message - did you check again a short time after you saw the original one?
Another example using the latest Origin code, more closely aligned with the test scenario that QE is using:

Events:
  FirstSeen LastSeen Count From                SubobjectPath Type    Reason      Message
  --------- -------- ----- ----                ------------- ----    ------      -------
  12s       12s      1     {default-scheduler }              Normal  Scheduled   Successfully assigned nfs-bb-pod2 to ose1.rhs
  11s       3s       5     {kubelet ose1.rhs}                Warning FailedMount MountVolume.SetUp failed for volume "kubernetes.io/nfs/1a23dbb1-9d2b-11e6-aa36-525400f26441-pv-nfs1" (spec.Name: "pv-nfs1") pod "1a23dbb1-9d2b-11e6-aa36-525400f26441" (UID: "1a23dbb1-9d2b-11e6-aa36-525400f26441") with: mount failed: exit status 32
Mounting arguments: nfs1.rhs:/opt/data36 /etc/openshift/data/pods/1a23dbb1-9d2b-11e6-aa36-525400f26441/volumes/kubernetes.io~nfs/pv-nfs1 nfs []
Output: mount.nfs: access denied by server while mounting nfs1.rhs:/opt/data36
This has been merged into OSE and is in OSE v3.4.0.17 or newer.
Tried again today; I can now get the true error in the output.

Events:
  FirstSeen LastSeen Count From                                               SubobjectPath Type    Reason      Message
  --------- -------- ----- ----                                               ------------- ----    ------      -------
  14m       14m      1     {default-scheduler }                                             Normal  Scheduled   Successfully assigned mypod-9xu-r to qe-chaoyang-node-registry-router-1
  12m       56s      6     {kubelet qe-chaoyang-node-registry-router-1}                     Warning FailedMount Unable to mount volumes for pod "mypod-9xu-r_9xu-r(e0f1cefe-9f37-11e6-93b7-42010af0000f)": timeout expired waiting for volumes to attach/mount for pod "mypod-9xu-r"/"9xu-r". list of unattached/unmounted volumes=[nfs]
  12m       56s      6     {kubelet qe-chaoyang-node-registry-router-1}                     Warning FailedSync  Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "mypod-9xu-r"/"9xu-r". list of unattached/unmounted volumes=[nfs]
  9m        37s      5     {kubelet qe-chaoyang-node-registry-router-1}                     Warning FailedMount MountVolume.SetUp failed for volume "kubernetes.io/nfs/e0f1cefe-9f37-11e6-93b7-42010af0000f-nfs-9xu-r" (spec.Name: "nfs-9xu-r") pod "e0f1cefe-9f37-11e6-93b7-42010af0000f" (UID: "e0f1cefe-9f37-11e6-93b7-42010af0000f") with: exit status 32
No docs are needed for this change
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0066