Eric had the following pod:

        "level": "s0:c84,c64"
      }
    },
    "terminationMessagePath": "/dev/termination-log",
    "terminationMessagePolicy": "File",
    "volumeMounts": [
      {
        "mountPath": "/app/upload",
        "name": "volume-wctkf",
        "subPath": "limesurvey"
      },
      {
        "mountPath": "/var/lib/mysql",
        "name": "volume-d5igd",
        "subPath": "mysql"
      },
      {
        "mountPath": "/etc/mysql",
        "name": "volume-qsjfa",
        "subPath": "etc"
      },
      {
        "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
        "name": "default-token-jtz0h",
        "readOnly": true
      }
    ]
  }
],
"dnsPolicy": "ClusterFirst",
"imagePullSecrets": [
  {
    "name": "default-dockercfg-027m4"
  }
],
"nodeName": "ip-172-31-64-240.us-east-2.compute.internal",
"nodeSelector": {
  "type": "compute"
},
"restartPolicy": "Always",
"schedulerName": "default-scheduler",
"securityContext": {
  "fsGroup": 1007100000,
  "seLinuxOptions": {
    "level": "s0:c84,c64"
  }
},
"serviceAccount": "default",
"serviceAccountName": "default",
"terminationGracePeriodSeconds": 30,
"volumes": [
  {
    "name": "volume-wctkf",
    "persistentVolumeClaim": {
      "claimName": "mysql"
    }
  },
  {
    "name": "volume-d5igd",
    "persistentVolumeClaim": {
      "claimName": "mysql"
    }
  },
  {
    "name": "volume-qsjfa",
    "persistentVolumeClaim": {
      "claimName": "mysql"
    }
  },
  {
    "name": "default-token-jtz0h",
    "secret": {
      "defaultMode": 420,
      "secretName": "default-token-jtz0h"
    }
  }
]
},

The user has 3 different volume names that refer to the same PVC, and the pod failed to start while waiting for volumes to attach/mount.

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:
I think there are two different issues here:

1. A pod does not need more than one volume entry in order to mount different subpaths within the same volume. This should perhaps be fixed in the web console.

2. OpenShift has a problem where, if more than one volume mounts the same PVC, the pod can't start, because subpath mounts are created very late in the process and the volume manager will create only one mount. In a nutshell, the check at https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/volumemanager/volume_manager.go#L396 will always fail, so the pod can't start: the kubelet thinks it should wait for three mounts/attaches.
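For reference, a minimal sketch of the pattern #1 describes - a single volume entry for the PVC, mounted twice with different subPaths. The pod name, image, mount paths, and subPaths below are illustrative, not taken from the pod above:

kind: Pod
apiVersion: v1
metadata:
  name: subpath-example
spec:
  containers:
  - name: app
    image: example/app:latest
    volumeMounts:
    # The same volume mounted twice, each time with a different subPath.
    - mountPath: /app/upload
      name: data
      subPath: upload
    - mountPath: /var/lib/mysql
      name: data
      subPath: mysql
  volumes:
  # One volume entry referencing the PVC is sufficient.
  - name: data
    persistentVolumeClaim:
      claimName: mysql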
#1 should likely go to the console team or, better yet, to the storage experience team. Can you make that a separate bug? I was using the web console, and this is the only thing it seemed to offer...
Tested on below version:

openshift v3.9.0-0.47.0
kubernetes v1.9.1+a0ce1bc657

The pod failed to use the existing PVC when `oc set volume` added a new volume with a different mount path:

1. oc new-app --image-stream=openshift/postgresql --env=POSTGRESQL_USER\=tester --env=POSTGRESQL_PASSWORD\=xxx --env=POSTGRESQL_DATABASE\=testdb --name=mydb
2. oc set volume dc mydb --add --type=pvc --claim-mode=rwo --claim-name=pvcsc --claim-size=1G --name=gcevolume --mount-path=/opt111 --claim-class=sc-zypw5
3. oc set volume dc/mydb --add --type=pvc --claim-name=pvcsc --mount-path=/opt2 --name=volume2

# oc get pods
NAME            READY     STATUS              RESTARTS   AGE
mydb-2-scsn9    1/1       Running             0          5m
mydb-3-deploy   1/1       Running             0          4m
mydb-3-mwk6v    0/1       ContainerCreating   0          4m

# oc describe pods mydb-3-mwk6v
Name:           mydb-3-mwk6v
Namespace:      zypw5
Node:           wehe-node-registry-router-2/10.1.2.8
Start Time:     Sat, 24 Feb 2018 03:29:34 +0000
Labels:         app=mydb
                deployment=mydb-3
                deploymentconfig=mydb
Annotations:    openshift.io/deployment-config.latest-version=3
                openshift.io/deployment-config.name=mydb
                openshift.io/deployment.name=mydb-3
                openshift.io/generated-by=OpenShiftNewApp
                openshift.io/scc=restricted
Status:         Pending
IP:
Controlled By:  ReplicationController/mydb-3
Containers:
  mydb:
    Container ID:
    Image:          registry.access.redhat.com/rhscl/postgresql-96-rhel7@sha256:8837530839f3e0db75c0cfc435952c031d3010d1032168535698b5def13b6b68
    Image ID:
    Port:           5432/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      POSTGRESQL_DATABASE:  testdb
      POSTGRESQL_PASSWORD:  xxx
      POSTGRESQL_USER:      tester
    Mounts:
      /opt111 from gcevolume (rw)
      /opt2 from volume2 (rw)
      /var/lib/pgsql/data from mydb-volume-1 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-7djsc (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  mydb-volume-1:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  gcevolume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pvcsc
    ReadOnly:   false
  volume2:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pvcsc
    ReadOnly:   false
  default-token-7djsc:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-7djsc
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     <none>
Events:
  Type     Reason                 Age   From                                  Message
  ----     ------                 ----  ----                                  -------
  Normal   Scheduled              2m    default-scheduler                     Successfully assigned mydb-3-mwk6v to wehe-node-registry-router-2
  Normal   SuccessfulMountVolume  2m    kubelet, wehe-node-registry-router-2  MountVolume.SetUp succeeded for volume "mydb-volume-1"
  Normal   SuccessfulMountVolume  2m    kubelet, wehe-node-registry-router-2  MountVolume.SetUp succeeded for volume "default-token-7djsc"
  Normal   SuccessfulMountVolume  2m    kubelet, wehe-node-registry-router-2  MountVolume.SetUp succeeded for volume "pvc-cba6efd3-1912-11e8-9aac-000d3a11082a"
  Warning  FailedMount            29s   kubelet, wehe-node-registry-router-2  Unable to mount volumes for pod "mydb-3-mwk6v_zypw5(ef1527e1-1912-11e8-9aac-000d3a11082a)": timeout expired waiting for volumes to attach/mount for pod "zypw5"/"mydb-3-mwk6v".
list of unattached/unmounted volumes=[gcevolume]

# oc get dc -o yaml
apiVersion: v1
items:
- apiVersion: apps.openshift.io/v1
  kind: DeploymentConfig
  metadata:
    annotations:
      openshift.io/generated-by: OpenShiftNewApp
    creationTimestamp: 2018-02-24T03:28:23Z
    generation: 4
    labels:
      app: mydb
    name: mydb
    namespace: zypw5
    resourceVersion: "143810"
    selfLink: /apis/apps.openshift.io/v1/namespaces/zypw5/deploymentconfigs/mydb
    uid: c4dcf7df-1912-11e8-9aac-000d3a11082a
  spec:
    replicas: 1
    revisionHistoryLimit: 10
    selector:
      app: mydb
      deploymentconfig: mydb
    strategy:
      activeDeadlineSeconds: 21600
      resources: {}
      rollingParams:
        intervalSeconds: 1
        maxSurge: 25%
        maxUnavailable: 25%
        timeoutSeconds: 600
        updatePeriodSeconds: 1
      type: Rolling
    template:
      metadata:
        annotations:
          openshift.io/generated-by: OpenShiftNewApp
        creationTimestamp: null
        labels:
          app: mydb
          deploymentconfig: mydb
      spec:
        containers:
        - env:
          - name: POSTGRESQL_DATABASE
            value: testdb
          - name: POSTGRESQL_PASSWORD
            value: xxx
          - name: POSTGRESQL_USER
            value: tester
          image: registry.access.redhat.com/rhscl/postgresql-96-rhel7@sha256:8837530839f3e0db75c0cfc435952c031d3010d1032168535698b5def13b6b68
          imagePullPolicy: IfNotPresent
          name: mydb
          ports:
          - containerPort: 5432
            protocol: TCP
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /var/lib/pgsql/data
            name: mydb-volume-1
          - mountPath: /opt111
            name: gcevolume
          - mountPath: /opt2
            name: volume2
        dnsPolicy: ClusterFirst
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        terminationGracePeriodSeconds: 30
        volumes:
        - emptyDir: {}
          name: mydb-volume-1
        - name: gcevolume
          persistentVolumeClaim:
            claimName: pvcsc
        - name: volume2
          persistentVolumeClaim:
            claimName: pvcsc
    test: false
    triggers:
    - type: ConfigChange
    - imageChangeParams:
        automatic: true
        containerNames:
        - mydb
        from:
          kind: ImageStreamTag
          name: postgresql:9.6
          namespace: openshift
        lastTriggeredImage: registry.access.redhat.com/rhscl/postgresql-96-rhel7@sha256:8837530839f3e0db75c0cfc435952c031d3010d1032168535698b5def13b6b68
      type: ImageChange
  status:
    availableReplicas: 1
    conditions:
    - lastTransitionTime: 2018-02-24T03:28:31Z
      lastUpdateTime: 2018-02-24T03:28:31Z
      message: Deployment config has minimum availability.
      status: "True"
      type: Available
    - lastTransitionTime: 2018-02-24T03:39:36Z
      lastUpdateTime: 2018-02-24T03:39:36Z
      message: replication controller "mydb-3" has failed progressing
      reason: ProgressDeadlineExceeded
      status: "False"
      type: Progressing
    details:
      causes:
      - type: ConfigChange
      message: config change
    latestVersion: 3
    observedGeneration: 4
    readyReplicas: 1
    replicas: 1
    unavailableReplicas: 0
    updatedReplicas: 0
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
The original bug was about auto-generated volume names: when the user does not specify a volume name, the `oc` command automatically generates one. You are testing the case where the user deliberately uses different volume names that refer to the same PVC. In that case the fix has to be different, and I think we should simply reject such changes to the pod spec, because they result in a pod definition that can't work.
Agreed, let's clone that as a new bug, and let's reject the pod spec.
Cloned the new error described towards the end of this BZ into bug 1550666, marking this issue as MODIFIED (with the new issue still unresolved).
Tested with below version:

openshift v3.9.4
kubernetes v1.9.1+a0ce1bc657

I tried with a pod that uses the same PVC with different volume mounts:

$ cat pod_2mounts.yaml
kind: Pod
apiVersion: v1
metadata:
  name: pod
spec:
  containers:
  - name: dynamic
    image: jhou/hello-openshift
    ports:
    - containerPort: 80
      name: "http-server"
    volumeMounts:
    - mountPath: "/mnt/azf"
      subPath: a
      name: nfsmount
    volumeMounts:
    - mountPath: "/mnt/azd"
      subPath: b
      name: mountnfs
  volumes:
  - name: nfsmount
    persistentVolumeClaim:
      claimName: pvc
  - name: mountnfs
    persistentVolumeClaim:
      claimName: pvc

But the pod could not get running:

$ oc get pods
NAME      READY     STATUS              RESTARTS   AGE
pod       0/1       ContainerCreating   0          5m

$ oc describe pods pod
Name:         pod
Namespace:    wehe
Node:         wehe-node-registry-router-2/10.1.2.8
Start Time:   Mon, 12 Mar 2018 16:29:20 +0800
Labels:       <none>
Annotations:  openshift.io/scc=restricted
Status:       Pending
IP:
Containers:
  dynamic:
    Container ID:
    Image:          jhou/hello-openshift
    Image ID:
    Port:           80/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /mnt/azd from mountnfs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-qt4mn (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  nfsmount:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pvc
    ReadOnly:   false
  mountnfs:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pvc
    ReadOnly:   false
  default-token-qt4mn:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-qt4mn
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/compute=true
Tolerations:     <none>
Events:
  Type     Reason                 Age              From                                  Message
  ----     ------                 ----             ----                                  -------
  Normal   Scheduled              6m               default-scheduler                     Successfully assigned pod to wehe-node-registry-router-2
  Normal   SuccessfulMountVolume  6m               kubelet, wehe-node-registry-router-2  MountVolume.SetUp succeeded for volume "default-token-qt4mn"
  Normal   SuccessfulMountVolume  6m               kubelet, wehe-node-registry-router-2  MountVolume.SetUp succeeded for volume "nfs"
  Warning  FailedMount            1m (x2 over 4m)  kubelet, wehe-node-registry-router-2  Unable to mount volumes for pod "pod_wehe(7601cd79-25cf-11e8-84ad-000d3a1a2ac7)": timeout expired waiting for volumes to attach/mount for pod "wehe"/"pod".
list of unattached/unmounted volumes=[nfsmount]

$ oc get pods pod -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    openshift.io/scc: restricted
  creationTimestamp: 2018-03-12T08:29:20Z
  name: pod
  namespace: wehe
  resourceVersion: "41422"
  selfLink: /api/v1/namespaces/wehe/pods/pod
  uid: 7601cd79-25cf-11e8-84ad-000d3a1a2ac7
spec:
  containers:
  - image: jhou/hello-openshift
    imagePullPolicy: Always
    name: dynamic
    ports:
    - containerPort: 80
      name: http-server
      protocol: TCP
    resources: {}
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
      runAsUser: 1000170000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /mnt/azd
      name: mountnfs
      subPath: b
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-qt4mn
      readOnly: true
  dnsPolicy: ClusterFirst
  imagePullSecrets:
  - name: default-dockercfg-zs6m2
  nodeName: wehe-node-registry-router-2
  nodeSelector:
    node-role.kubernetes.io/compute: "true"
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000170000
    seLinuxOptions:
      level: s0:c13,c7
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  volumes:
  - name: nfsmount
    persistentVolumeClaim:
      claimName: pvc
  - name: mountnfs
    persistentVolumeClaim:
      claimName: pvc
  - name: default-token-qt4mn
    secret:
      defaultMode: 420
      secretName: default-token-qt4mn
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2018-03-12T08:29:20Z
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2018-03-12T08:29:20Z
    message: 'containers with unready status: [dynamic]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: 2018-03-12T08:29:20Z
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: jhou/hello-openshift
    imageID: ""
    lastState: {}
    name: dynamic
    ready: false
    restartCount: 0
    state:
      waiting:
        reason: ContainerCreating
  hostIP: 10.1.2.8
  phase: Pending
  qosClass: BestEffort
  startTime: 2018-03-12T08:29:20Z
Hmm, we are talking about two different cases of this bug here.

The original bug that I opened was about the auto-generated volume names that are created each time you try to add a volume to a deploymentconfig etc. The bug was: if you try to add new mount points from the same PVC, a new auto-generated volume name is created for each mount point. We fixed this bug - now, if you don't specify a volume name and you try to mount the same PVC multiple times, the `oc` command will not generate a new volume name if there is already a volume using that PVC.

What you are reporting is a different case. What you are attempting is deliberately using different volume names that refer to the same PVC. Such a pod can never start, and there is no fix (within reason) I can make that will make this error go away. In other words, we can't prevent users from shooting themselves in the foot. But to make this easier, a validation (https://github.com/kubernetes/kubernetes/pull/60934) will prevent creation of such pods in the first place. Obviously the pods still can't start, but at least the user will get immediate feedback.

The second part of the bug is being tracked here: https://bugzilla.redhat.com/show_bug.cgi?id=1550666

I will try and edit this bug to reflect more accurate information.
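To make the distinction concrete, a small sketch (with hypothetical volume and claim names) of the volumes section such a validation would reject - two volume entries deliberately pointing at the same claim, as opposed to the single-volume pattern shown earlier in this bug:

  volumes:
  - name: data-a
    persistentVolumeClaim:
      claimName: shared-claim
  - name: data-b
    persistentVolumeClaim:
      claimName: shared-claim   # same PVC referenced again under a different volume name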
Hemant modified the description to reflect the actual bug and fix. The other issue mentioned here by QE is accounted for in another ticket. The bug should go back to ON_QA.
Sorry, I misunderstood this bug. Tested on below version again:

openshift v3.9.4
kubernetes v1.9.1+a0ce1bc657

`oc set volume` auto-generated a new volume name when we did not set one:

$ oc set volume dc mydb --add --type=pvc --claim-mode=rwo --claim-name=pvcsc --claim-size=1G --mount-path=/opt111 --claim-class=azddef

$ oc get dc -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: DeploymentConfig
  metadata:
    annotations:
      openshift.io/generated-by: OpenShiftNewApp
    creationTimestamp: 2018-03-13T03:54:58Z
    generation: 3
    labels:
      app: mydb
    name: mydb
    namespace: wehe
    resourceVersion: "88862"
    selfLink: /oapi/v1/namespaces/wehe/deploymentconfigs/mydb
    uid: 4c9f35a7-2672-11e8-b596-000d3a1a2ac7
  spec:
    replicas: 1
    revisionHistoryLimit: 10
    selector:
      app: mydb
      deploymentconfig: mydb
    strategy:
      activeDeadlineSeconds: 21600
      resources: {}
      rollingParams:
        intervalSeconds: 1
        maxSurge: 25%
        maxUnavailable: 25%
        timeoutSeconds: 600
        updatePeriodSeconds: 1
      type: Rolling
    template:
      metadata:
        annotations:
          openshift.io/generated-by: OpenShiftNewApp
        creationTimestamp: null
        labels:
          app: mydb
          deploymentconfig: mydb
      spec:
        containers:
        - env:
          - name: POSTGRESQL_DATABASE
            value: testdb
          - name: POSTGRESQL_PASSWORD
            value: xxx
          - name: POSTGRESQL_USER
            value: tester
          image: registry.access.redhat.com/rhscl/postgresql-96-rhel7@sha256:06b86e301a272c1861571e1c514d3f71f7a1bd0f4cbc283b352bb6c5a34b62ec
          imagePullPolicy: IfNotPresent
          name: mydb
          ports:
          - containerPort: 5432
            protocol: TCP
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /var/lib/pgsql/data
            name: mydb-volume-1
          - mountPath: /opt111
            name: volume-xv5sx
        dnsPolicy: ClusterFirst
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        terminationGracePeriodSeconds: 30
        volumes:
        - emptyDir: {}
          name: mydb-volume-1
        - name: volume-xv5sx
          persistentVolumeClaim:
            claimName: pvcsc
    test: false
    triggers:
    - type: ConfigChange
    - imageChangeParams:
        automatic: true
        containerNames:
        - mydb
        from:
          kind: ImageStreamTag
          name: postgresql:9.6
          namespace: openshift
        lastTriggeredImage: registry.access.redhat.com/rhscl/postgresql-96-rhel7@sha256:06b86e301a272c1861571e1c514d3f71f7a1bd0f4cbc283b352bb6c5a34b62ec
      type: ImageChange
  status:
    availableReplicas: 1
    conditions:
    - lastTransitionTime: 2018-03-13T03:55:06Z
      lastUpdateTime: 2018-03-13T03:55:06Z
      message: Deployment config has minimum availability.
      status: "True"
      type: Available
    - lastTransitionTime: 2018-03-13T03:58:08Z
      lastUpdateTime: 2018-03-13T03:58:10Z
      message: replication controller "mydb-2" successfully rolled out
      reason: NewReplicationControllerAvailable
      status: "True"
      type: Progressing
    details:
      causes:
      - type: ConfigChange
      message: config change
    latestVersion: 2
    observedGeneration: 3
    readyReplicas: 1
    replicas: 1
    unavailableReplicas: 0
    updatedReplicas: 1
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

$ oc get pods
NAME           READY     STATUS    RESTARTS   AGE
mydb-2-6dwzp   1/1       Running   0          4m
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489