Bug 1876219 - manila nfs share is stuck and is not able to be mounted by the containers
Summary: manila nfs share is stuck and is not able to be mounted by the containers
Keywords:
Status: CLOSED DUPLICATE of bug 1867152
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: aos-storage-staff@redhat.com
QA Contact: Qin Ping
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-09-06 13:17 UTC by Mohamed Belal
Modified: 2020-09-29 22:47 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-08 08:26:18 UTC
Target Upstream Version:
Embargoed:


Attachments:
OSP manila for NFS share network topology (140.61 KB, image/png)
2020-09-06 13:17 UTC, Mohamed Belal

Description Mohamed Belal 2020-09-06 13:17:03 UTC
Created attachment 1713881 [details]
OSP manila for NFS share network topology

Description of problem:
manila-csi creates the requested share on OSP and the PV is bound to the PVC, but the containers are not able to mount the created volume. The NFS share is reachable through a private network that is attached to the OCP workers as a second interface, and the share can be mounted manually on a worker.

The issue appears to be that pod/csi-nodeplugin-nfsplugin does not run in the worker's host network, so it cannot reach the share over the second interface and the automatic mount fails. A suggested solution is given under "Expected results".


Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Install OSP integrated with Ceph and enable manila for file shares, following the standard architecture design, with the network laid out as in the attached topology diagram.
2. Install OCP on OSP.
3. Install the manila-csi operator and driver, and attach a second interface to the workers for StorageNFS reachability.
4. Create a PVC using the newly generated manila storage class.
5. Attach the PVC to one of the containers, such as image-registry.

Actual results:
$ oc get pods -n openshift-image-registry
NAME                                               READY   STATUS              RESTARTS   AGE
cluster-image-registry-operator-7c7c9d6bf6-99hg2   2/2     Running             0          43h
image-pruner-1599350400-9gtp8                      0/1     Completed           0          12h
image-registry-747ccb4b66-b8wjf                    0/1     ContainerCreating   0          9s   --------------> stuck in ContainerCreating
node-ca-9cw9w                                      1/1     Running             0          43h
node-ca-cngrl                                      1/1     Running             0          43h
node-ca-ffpmz                                      1/1     Running             0          39h
node-ca-kvgwh                                      1/1     Running             0          39h
node-ca-pcqt4                                      1/1     Running             0          43h
node-ca-vwxth                                      1/1     Running             0          43h
node-ca-w74rh                                      1/1     Running             0          43h

4m7s        Warning   FailedMount              pod/image-registry-747ccb4b66-b8wjf    MountVolume.SetUp failed for volume "pvc-15a5342e-6870-4950-9d1a-0eba0378746e" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
20m         Warning   FailedMount              pod/image-registry-747ccb4b66-b8wjf    Unable to attach or mount volumes: unmounted volumes=[registry-storage], unattached volumes=[registry-certificates trusted-ca installation-pull-secrets registry-token-gkswx registry-storage registry-tls]: timed out waiting for the condition
7m45s       Warning   FailedMount              pod/image-registry-747ccb4b66-b8wjf    Unable to attach or mount volumes: unmounted volumes=[registry-storage], unattached volumes=[registry-storage registry-tls registry-certificates trusted-ca installation-pull-secrets registry-token-gkswx]: timed out waiting for the condition
5m42s       Warning   FailedMount              pod/image-registry-747ccb4b66-b8wjf    Unable to attach or mount volumes: unmounted volumes=[registry-storage], unattached volumes=[registry-tls registry-certificates trusted-ca installation-pull-secrets registry-token-gkswx registry-storage]: timed out waiting for the condition
13m         Warning   FailedMount              pod/image-registry-747ccb4b66-b8wjf    Unable to attach or mount volumes: unmounted volumes=[registry-storage], unattached volumes=[trusted-ca installation-pull-secrets registry-token-gkswx registry-storage registry-tls registry-certificates]: timed out waiting for the condition
3m39s       Warning   FailedMount              pod/image-registry-747ccb4b66-b8wjf    Unable to attach or mount volumes: unmounted volumes=[registry-storage], unattached volumes=[installation-pull-secrets registry-token-gkswx registry-storage registry-tls registry-certificates trusted-ca]: timed out waiting for the condition
3m38s       Warning   FailedMount              pod/image-registry-747ccb4b66-b8wjf    MountVolume.SetUp failed for volume "pvc-15a5342e-6870-4950-9d1a-0eba0378746e" : rpc error: code = Unavailable desc = transport is closing


$ oc logs csi-nodeplugin-nfsplugin-f8c9w -n openshift-manila-csi-driver
I0906 12:02:25.734215       1 nfs.go:49] Driver: nfs.csi.k8s.io version: 2.0.0
I0906 12:02:25.734918       1 nfs.go:99] Enabling volume access mode: SINGLE_NODE_WRITER
I0906 12:02:25.734923       1 nfs.go:99] Enabling volume access mode: SINGLE_NODE_READER_ONLY
I0906 12:02:25.734925       1 nfs.go:99] Enabling volume access mode: MULTI_NODE_READER_ONLY
I0906 12:02:25.734928       1 nfs.go:99] Enabling volume access mode: MULTI_NODE_SINGLE_WRITER
I0906 12:02:25.734930       1 nfs.go:99] Enabling volume access mode: MULTI_NODE_MULTI_WRITER
I0906 12:02:25.734936       1 nfs.go:110] Enabling controller service capability: UNKNOWN
I0906 12:02:25.736767       1 server.go:92] Listening for connections on address: &net.UnixAddr{Name:"/plugin/csi.sock", Net:"unix"}
E0906 12:02:27.668921       1 utils.go:50] GRPC error: rpc error: code = NotFound desc = Volume not mounted
E0906 12:03:16.979205       1 utils.go:50] GRPC error: rpc error: code = NotFound desc = Volume not mounted
E0906 12:03:17.280677       1 utils.go:50] GRPC error: rpc error: code = NotFound desc = Volume not mounted
E0906 12:05:19.048241       1 utils.go:50] GRPC error: rpc error: code = NotFound desc = Volume not mounted
E0906 12:05:19.349408       1 utils.go:50] GRPC error: rpc error: code = NotFound desc = Volume not mounted
E0906 12:05:52.343538       1 utils.go:50] GRPC error: rpc error: code = NotFound desc = Volume not mounted
E0906 12:05:52.343774       1 utils.go:50] GRPC error: rpc error: code = NotFound desc = Volume not mounted
E0906 12:07:21.119619       1 utils.go:50] GRPC error: rpc error: code = NotFound desc = Volume not mounted
E0906 12:07:21.417581       1 utils.go:50] GRPC error: rpc error: code = NotFound desc = Volume not mounted
E0906 12:07:52.507238       1 utils.go:50] GRPC error: rpc error: code = NotFound desc = Volume not mounted
E0906 12:07:52.507226       1 utils.go:50] GRPC error: rpc error: code = NotFound desc = Volume not mounted
E0906 12:08:13.255901       1 mount_linux.go:139] Mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t nfs -o nfsvers=4.1 10.206.120.6:/volumes/_nogroup/9bcfc69b-788f-4ac7-a201-cead05379743 /var/lib/kubelet/pods/0de89c17-75b0-4702-bcdc-41c87e02ad7b/volumes/kubernetes.io~csi/pvc-15a5342e-6870-4950-9d1a-0eba0378746e/mount
Output: mount.nfs: Connection timed out

E0906 12:08:13.257048       1 utils.go:50] GRPC error: rpc error: code = Internal desc = mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t nfs -o nfsvers=4.1 10.206.120.6:/volumes/_nogroup/9bcfc69b-788f-4ac7-a201-cead05379743 /var/lib/kubelet/pods/0de89c17-75b0-4702-bcdc-41c87e02ad7b/volumes/kubernetes.io~csi/pvc-15a5342e-6870-4950-9d1a-0eba0378746e/mount
Output: mount.nfs: Connection timed out
E0906 12:09:23.196106       1 utils.go:50] GRPC error: rpc error: code = NotFound desc = Volume not mounted
E0906 12:09:23.497019       1 utils.go:50] GRPC error: rpc error: code = NotFound desc = Volume not mounted
E0906 12:09:52.687017       1 utils.go:50] GRPC error: rpc error: code = NotFound desc = Volume not mounted
E0906 12:09:52.687208       1 utils.go:50] GRPC error: rpc error: code = NotFound desc = Volume not mounted
E0906 12:11:25.291848       1 utils.go:50] GRPC error: rpc error: code = NotFound desc = Volume not mounted
E0906 12:11:25.593596       1 utils.go:50] GRPC error: rpc error: code = NotFound desc = Volume not mounted
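
The "mount.nfs: Connection timed out" output above suggests the plugin pod has no network path to the share. One way to check this is to run the same TCP probe from the worker and from inside the plugin pod. The following is only a sketch: the function names are hypothetical, port 2049 is assumed to be the NFS port in use, and it is assumed (not confirmed in this report) that the plugin image provides bash and timeout.

```shell
#!/usr/bin/env bash
# Values copied from this report; adjust them for another environment.
NFS_SERVER=10.206.120.6                   # share address from the mount arguments
NFS_PORT=2049                             # assumption: default NFS port
NAMESPACE=openshift-manila-csi-driver
PLUGIN_POD=csi-nodeplugin-nfsplugin-f8c9w

# Return 0 if a TCP connection to host $1, port $2 opens within 5 seconds.
probe_tcp() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Run the same probe inside the plugin pod, i.e. in the pod's network namespace.
# Assumption: the plugin image provides bash and timeout.
probe_from_plugin_pod() {
  oc exec -n "$NAMESPACE" "$PLUGIN_POD" -- \
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$NFS_SERVER" "$NFS_PORT"
}

# Usage (nothing is run automatically):
#   probe_tcp "$NFS_SERVER" "$NFS_PORT" && echo "reachable from this host"
#   probe_from_plugin_pod && echo "reachable from the plugin pod"
```

If probe_tcp succeeds when run on the worker itself (for example from a shell opened with "oc debug node/<node-name>") while probe_from_plugin_pod fails, that would be consistent with the pod not sharing the host's network.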


Expected results:
The container should be able to mount the nfs share.

$ oc get pods -n openshift-image-registry
NAME                                               READY   STATUS      RESTARTS   AGE
cluster-image-registry-operator-7c7c9d6bf6-99hg2   2/2     Running     0          44h
image-pruner-1599350400-9gtp8                      0/1     Completed   0          12h
image-registry-747ccb4b66-b8wjf                    1/1     Running     0          47m   --------------> up and running
node-ca-9cw9w                                      1/1     Running     0          44h
node-ca-cngrl                                      1/1     Running     0          44h
node-ca-ffpmz                                      1/1     Running     0          39h
node-ca-kvgwh                                      1/1     Running     0          39h
node-ca-pcqt4                                      1/1     Running     0          44h
node-ca-vwxth                                      1/1     Running     0          44h
node-ca-w74rh                                      1/1     Running     0          44h


A suggested solution is to add "hostNetwork: true" to the pod template spec (spec.template.spec) of the csi-nodeplugin-nfsplugin daemonset, as marked in the output below:

$ oc get daemonset.apps/csi-nodeplugin-nfsplugin -o yaml  -n openshift-manila-csi-driver
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "4"
  labels:
    app: openstack-manila-csi
    component: nfs-nodeplugin
  name: csi-nodeplugin-nfsplugin
  namespace: openshift-manila-csi-driver
  resourceVersion: "1289489"
  selfLink: /apis/apps/v1/namespaces/openshift-manila-csi-driver/daemonsets/csi-nodeplugin-nfsplugin
  uid: b538297c-3074-4fdc-8a59-bc7d8eb6427a
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: openstack-manila-csi
      component: nfs-nodeplugin
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: openstack-manila-csi
        component: nfs-nodeplugin
    spec:
      containers:
      - args:
        - --nodeid=$(NODE_ID)
        - --endpoint=unix://plugin/csi.sock
        - --mount-permissions=0777
        env:
        - name: NODE_ID
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        image: registry.redhat.io/openshift4/ose-csi-driver-nfs-rhel7@sha256:da67709ab66079b798914f1fe5cf867d6a050635534d2e56588164d8d9189183
        imagePullPolicy: IfNotPresent
        name: nfs
        resources: {}
        securityContext:
          allowPrivilegeEscalation: true
          capabilities:
            add:
            - SYS_ADMIN
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /plugin
          name: plugin-dir
        - mountPath: /var/lib/kubelet/pods
          mountPropagation: Bidirectional
          name: pods-mount-dir
      dnsPolicy: ClusterFirst
      hostNetwork: true  # should be added so that the manila nfs plugin will see the worker/host networks
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: csi-nodeplugin
      serviceAccountName: csi-nodeplugin
      terminationGracePeriodSeconds: 30
      volumes:
      - hostPath:
          path: /var/lib/kubelet/plugins/csi-nfsplugin
          type: DirectoryOrCreate
        name: plugin-dir
      - hostPath:
          path: /var/lib/kubelet/pods
          type: Directory
        name: pods-mount-dir
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
status:
  currentNumberScheduled: 4
  desiredNumberScheduled: 4
  numberAvailable: 4
  numberMisscheduled: 0
  numberReady: 4
  observedGeneration: 4
  updatedNumberScheduled: 4
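
The suggested change can also be applied as a patch instead of editing the whole manifest. The following is a minimal sketch, not a supported procedure: the function names are hypothetical, it assumes the oc client is logged in with enough privileges to modify the daemonset, and it assumes the operator that manages the daemonset does not revert manual changes (this report does not say either way).

```shell
#!/usr/bin/env bash
NAMESPACE=openshift-manila-csi-driver
DAEMONSET=csi-nodeplugin-nfsplugin

# JSON merge patch that sets hostNetwork on the pod template only.
PATCH='{"spec":{"template":{"spec":{"hostNetwork":true}}}}'

# Apply the patch and wait for the DaemonSet pods to be rolled out again.
enable_host_network() {
  oc patch daemonset "$DAEMONSET" -n "$NAMESPACE" --type merge -p "$PATCH"
  oc rollout status "daemonset/$DAEMONSET" -n "$NAMESPACE"
}

# Print the current value of the field from the live object.
show_host_network() {
  oc get daemonset "$DAEMONSET" -n "$NAMESPACE" \
    -o jsonpath='{.spec.template.spec.hostNetwork}'
}

# Usage (nothing is run automatically):
#   enable_host_network && show_host_network
```

The patch leaves dnsPolicy at ClusterFirst. With hostNetwork enabled, Kubernetes treats that setting like the Default policy, which should not matter here because the share is mounted by IP address.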


Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:

Comment 1 Jan Safranek 2020-09-08 08:26:18 UTC
Thanks for the report. It looks like a dup of bug #1867152 - different symptoms, but the same solution.

*** This bug has been marked as a duplicate of bug 1867152 ***

