Bug 1466708

Summary: Fail to create nfs-provisioner pod if docker has "Deferred Removal" disabled
Product: OpenShift Container Platform Reporter: Wenqi He <wehe>
Component: Storage Assignee: Matthew Wong <mawong>
Status: CLOSED NOTABUG QA Contact: Wenqi He <wehe>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.6.0 CC: aos-bugs, bchilds, jsafrane, wehe
Target Milestone: ---   
Target Release: 3.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-09-07 19:47:55 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Wenqi He 2017-06-30 09:56:05 UTC
Description of problem:
Debugged this for a whole day to find the root cause. The nfs-provisioner pod fails to create if docker is set to "Deferred Removal Enabled: false".

Version-Release number of selected component (if applicable):
oc v3.6.126.4
kubernetes v1.6.1+5115d708d7

How reproducible:
Always

Steps to Reproduce:
1. Create a project
2. Create a scc
$ oc create -f https://raw.githubusercontent.com/kubernetes-incubator/nfs-provisioner/master/deploy/kube-config/openshift-scc.yaml
3. Add user to this scc
$ oadm policy add-scc-to-user nfs-provisioner <user> 
4. Create a nfs-provisioner pod
$ oc create -f https://raw.githubusercontent.com/kubernetes-incubator/external-storage/master/nfs/deploy/kubernetes/pod.yaml
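The four steps above can be sketched as a single script (the project name "nfs-test" and user "qe-user" are placeholders, not part of the original report):

```shell
#!/bin/sh
# Sketch of the reproduction steps; "nfs-test" and "qe-user" are
# placeholder names for illustration only.
reproduce() {
    user="${1:-qe-user}"
    # 1. Create a project
    oc new-project nfs-test
    # 2. Create the scc
    oc create -f https://raw.githubusercontent.com/kubernetes-incubator/nfs-provisioner/master/deploy/kube-config/openshift-scc.yaml
    # 3. Add the user to this scc
    oadm policy add-scc-to-user nfs-provisioner "$user"
    # 4. Create the nfs-provisioner pod
    oc create -f https://raw.githubusercontent.com/kubernetes-incubator/external-storage/master/nfs/deploy/kubernetes/pod.yaml
}
```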

Actual results:
Pod fails to create
# oc get pods
NAME                    READY     STATUS              RESTARTS   AGE
nfs-provisioner-df60e   0/1       ContainerCreating   0          17m
# oc describe pods nfs-provisioner-df60e
After this failure, even if we delete the nfs-provisioner pod, the node cannot schedule any other pod; they all stay in ContainerCreating status.
After I changed "Deferred Removal Enabled" to true, everything worked well.

Expected results:
The nfs-provisioner pod should be running.
Nothing bad should happen afterwards; all other pods should still be schedulable and able to run.
 

Master Log:
Jun 30 07:00:36 storage-master dockerd-current: time="2017-06-30T07:00:36.664650412Z" level=error msg="Handler for POST /v1.24/containers/create?name=k8s_POD_nfs-provisioner-df60e_qxni3_d146724d-5d61-11e7-832a-000d3a181970_0 returned error: devmapper: Error activating devmapper device for '8c62cc05b67414e3d6b33fee98ce91fcede16432ca69e82040dde327d97df061-init': devicemapper: Can't set cookie dm_task_set_cookie failed"
Jun 30 07:00:36 storage-master dockerd-current: time="2017-06-30T07:00:36.664706912Z" level=error msg="Handler for POST /v1.24/containers/create returned error: devmapper: Error activating devmapper device for '8c62cc05b67414e3d6b33fee98ce91fcede16432ca69e82040dde327d97df061-init': devicemapper: Can't set cookie dm_task_set_cookie failed"
Jun 30 07:00:36 storage-master atomic-openshift-node: E0630 07:00:36.665145   23505 remote_runtime.go:86] RunPodSandbox from runtime service failed: rpc error: code = 2 desc = failed to create a sandbox for pod "nfs-provisioner-df60e": Error response from daemon: {"message":"devmapper: Error activating devmapper device for '8c62cc05b67414e3d6b33fee98ce91fcede16432ca69e82040dde327d97df061-init': devicemapper: Can't set cookie dm_task_set_cookie failed"}
Jun 30 07:00:36 storage-master atomic-openshift-node: E0630 07:00:36.665183   23505 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "nfs-provisioner-df60e_qxni3(d146724d-5d61-11e7-832a-000d3a181970)" failed: rpc error: code = 2 desc = failed to create a sandbox for pod "nfs-provisioner-df60e": Error response from daemon: {"message":"devmapper: Error activating devmapper device for '8c62cc05b67414e3d6b33fee98ce91fcede16432ca69e82040dde327d97df061-init': devicemapper: Can't set cookie dm_task_set_cookie failed"}
Jun 30 07:00:36 storage-master atomic-openshift-node: E0630 07:00:36.665199   23505 kuberuntime_manager.go:619] createPodSandbox for pod "nfs-provisioner-df60e_qxni3(d146724d-5d61-11e7-832a-000d3a181970)" failed: rpc error: code = 2 desc = failed to create a sandbox for pod "nfs-provisioner-df60e": Error response from daemon: {"message":"devmapper: Error activating devmapper device for '8c62cc05b67414e3d6b33fee98ce91fcede16432ca69e82040dde327d97df061-init': devicemapper: Can't set cookie dm_task_set_cookie failed"}
Jun 30 07:00:36 storage-master atomic-openshift-node: E0630 07:00:36.665271   23505 pod_workers.go:182] Error syncing pod d146724d-5d61-11e7-832a-000d3a181970 ("nfs-provisioner-df60e_qxni3(d146724d-5d61-11e7-832a-000d3a181970)"), skipping: failed to "CreatePodSandbox" for "nfs-provisioner-df60e_qxni3(d146724d-5d61-11e7-832a-000d3a181970)" with CreatePodSandboxError: "CreatePodSandbox for pod \"nfs-provisioner-df60e_qxni3(d146724d-5d61-11e7-832a-000d3a181970)\" failed: rpc error: code = 2 desc = failed to create a sandbox for pod \"nfs-provisioner-df60e\": Error response from daemon: {\"message\":\"devmapper: Error activating devmapper device for '8c62cc05b67414e3d6b33fee98ce91fcede16432ca69e82040dde327d97df061-init': devicemapper: Can't set cookie dm_task_set_cookie failed\"}"


Additional info:
$ docker info | grep Deferred
Deferred Removal Enabled: false

Then I ssh to the node and try docker run directly; I get a similar error message:
# docker run --rm -ti docker.io/aosqe/nfs-server bash
/usr/bin/docker-current: Error response from daemon: devmapper: Error activating devmapper device for 'fa63515664d65c6a3466e428db15aa1cfad264b849cfe812dbf31a37bfc3a71c-init': devicemapper: Can't set cookie dm_task_set_cookie failed.
See '/usr/bin/docker-current run --help'.
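For reference, the deferred-removal state can be read from `docker info`, and on RHEL-based hosts it is usually toggled through the devicemapper storage options; a minimal sketch (the `/etc/sysconfig/docker-storage` location is the RHEL/Atomic convention and an assumption for other installs):

```shell
#!/bin/sh
# Returns success if the `docker info` text piped in reports deferred
# removal as enabled.
deferred_removal_enabled() {
    grep -q 'Deferred Removal Enabled: true'
}

# Enabling it (RHEL/Atomic convention; adjust the file for your install,
# then restart docker):
#   echo 'DOCKER_STORAGE_OPTIONS="--storage-opt dm.use_deferred_removal=true"' \
#       >> /etc/sysconfig/docker-storage
#   systemctl restart docker
```

Usage: `docker info | deferred_removal_enabled && echo enabled || echo disabled`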

Comment 1 Matthew Wong 2017-06-30 15:25:17 UTC
What version of docker is this? I don't think there is anything special about the container that should prevent it from starting it in the first place. I could only find this https://bugzilla.redhat.com/show_bug.cgi?id=1456138 which points to a recent issue with docker where that cookie error message shows up.

The "solution" is to just not use the pod.yaml in test cases and use a deployment.yaml instead. The pod mounts an emptyDir and relies on docker storage driver behaving while the deployment mounts a hostPath. I have removed the pod.yaml from all documentation, can we modify the test cases to use a deployment.yaml?

https://github.com/kubernetes-incubator/external-storage/blob/master/nfs/deploy/kubernetes/auth/deployment-sa.yaml this mounts /srv on the host. It could be patched to mount e.g. /tmp/something.
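Repointing the mount could look like the sketch below, run against a locally downloaded copy of deployment-sa.yaml; the `repoint_hostpath` helper name, the `/tmp/nfs-provisioner` target, and the exact `path: /srv` match string are all assumptions:

```shell
#!/bin/sh
# Rewrite the hostPath of a local copy of deployment-sa.yaml
# (fetch it first with curl/wget, then `oc create -f` the result).
# $1: yaml file, $2: new host path.
repoint_hostpath() {
    sed -i "s|path: /srv|path: $2|" "$1"
}
```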

Comment 2 Wenqi He 2017-07-03 06:44:58 UTC
(In reply to Matthew Wong from comment #1)
> What version of docker is this? I don't think there is anything special
> about the container that should prevent it from starting it in the first
> place. I could only find this
> https://bugzilla.redhat.com/show_bug.cgi?id=1456138 which points to a recent
> issue with docker where that cookie error message shows up.
> 
> The "solution" is to just not use the pod.yaml in test cases and use a
> deployment.yaml instead. The pod mounts an emptyDir and relies on docker
> storage driver behaving while the deployment mounts a hostPath. I have
> removed the pod.yaml from all documentation, can we modify the test cases to
> use a deployment.yaml?
> 
> https://github.com/kubernetes-incubator/external-storage/blob/master/nfs/
> deploy/kubernetes/auth/deployment-sa.yaml this mounts /srv on the host. It
> could be patched to mount e.g. /tmp/something.

Yeah, I noticed that bug. After debugging I found the issue comes from the docker option "Deferred Removal Enabled: false". If docker is started with "Deferred Removal Enabled: true", this does not reproduce.

# docker version
Client:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-39.1.git6ffd653.el7.x86_64
 Go version:      go1.8.3
 Git commit:      6ffd653/1.12.6
 Built:           Wed Jun 14 14:31:17 2017
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-39.1.git6ffd653.el7.x86_64
 Go version:      go1.8.3
 Git commit:      6ffd653/1.12.6
 Built:           Wed Jun 14 14:31:17 2017
 OS/Arch:         linux/amd64

# docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 1.12.6
Storage Driver: devicemapper
 Pool Name: docker-8:2-475-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 11.8 MB
 Data Space Total: 107.4 GB
 Data Space Available: 7.441 GB
 Metadata Space Used: 581.6 kB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.147 GB
 Thin Pool Minimum Free Space: 10.74 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data

I have also tried some other emptyDir cases and did not hit the issue the way the nfs-provisioner pod does.
BTW, deployment.yaml works without any issue. Thanks.

Comment 3 Wenqi He 2017-07-03 09:51:35 UTC
This seems related to NFS. QE has an nfs-server pod whose purpose is to create an NFS server in the project for testing, and I found this pod can also reproduce the issue:

1. Create scc, service account
2. Create nfs-server pod:
$ oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/storage/nfs/nfs-server.yaml

Check the node log:
Jul  3 06:44:38 wehe-node-1 dockerd-current: time="2017-07-03T06:44:38.055266578Z" level=error msg="devmapper: Error unmounting device 3102fa5f99655a415a3e84c755dc5101f38c4886fc1ef86082bcf5ac5f9621db: devicemapper: Can not set cookie: dm_task_set_cookie failed"
Jul  3 06:44:38 wehe-node-1 dockerd-current: time="2017-07-03T06:44:38.055340579Z" level=error msg="Error unmounting container 8b1a0322f493d41aa1525a56b943e82116fa33bcebe27c5a0f9299f321addd15: devicemapper: Can not set cookie: dm_task_set_cookie failed"
Jul  3 06:44:38 wehe-node-1 atomic-openshift-node: I0703 06:44:38.112036   12994 kubelet_pods.go:1168] Generating status for "nfs-server_1q9kb(4b01b604-5fb6-11e7-b37e-000d3a1bc5aa)"
Jul  3 06:44:38 wehe-node-1 atomic-openshift-node: I0703 06:44:38.112087   12994 kubelet.go:1847] SyncLoop (PLEG): "nfs-server_1q9kb(4b01b604-5fb6-11e7-b37e-000d3a1bc5aa)", event: &pleg.PodLifecycleEvent{ID:"4b01b604-5fb6-11e7-b37e-000d3a1bc5aa", Type:"ContainerDied", Data:"f8f2a32fa0a04d7cc1f8afb728da7e2ad09e91e3781b2601dc5c8b08053a6540"}
Jul  3 06:44:38 wehe-node-1 atomic-openshift-node: W0703 06:44:38.112116   12994 pod_container_deletor.go:77] Container "f8f2a32fa0a04d7cc1f8afb728da7e2ad09e91e3781b2601dc5c8b08053a6540" not found in pod's containers
Jul  3 06:44:38 wehe-node-1 atomic-openshift-node: I0703 06:44:38.112260   12994 status_manager.go:340] Status Manager: adding pod: "4b01b604-5fb6-11e7-b37e-000d3a1bc5aa", with status: ('\x04', {Running [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-07-03 06:10:20 +0000 UTC  } {Ready False 0001-01-01 00:00:00 +0000 UTC 2017-07-03 06:44:38 +0000 UTC ContainersNotReady containers with unready status: [nfs-server]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-07-03 06:10:20 +0000 UTC  }]   172.27.17.5  2017-07-03 06:10:20 +0000 UTC [] [{nfs-server {nil nil &ContainerStateTerminated{ExitCode:137,Signal:0,Reason:Error,Message:,StartedAt:2017-07-03 06:10:49 +0000 UTC,FinishedAt:2017-07-03 06:44:25 +0000 UTC,ContainerID:docker://8b1a0322f493d41aa1525a56b943e82116fa33bcebe27c5a0f9299f321addd15,}} {nil nil nil} false 0 docker.io/aosqe/nfs-server:latest docker-pullable://docker.io/aosqe/nfs-server@sha256:3e5e1fbfc51571a007451a98a43f6f30d7abbcc4a3ce63bb0b39db253956e8b0 docker://8b1a0322f493d41aa1525a56b943e82116fa33bcebe27c5a0f9299f321addd15}] BestEffort}) to podStatusChannel
Jul  3 06:44:38 wehe-node-1 atomic-openshift-node: I0703 06:44:38.112464   12994 status_manager.go:147] Status Manager: syncing pod: "4b01b604-5fb6-11e7-b37e-000d3a1bc5aa", with status: (4, {Running [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-07-03 06:10:20 +0000 UTC  } {Ready False 0001-01-01 00:00:00 +0000 UTC 2017-07-03 06:44:38 +0000 UTC ContainersNotReady containers with unready status: [nfs-server]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-07-03 06:10:20 +0000 UTC  }]   172.27.17.5  2017-07-03 06:10:20 +0000 UTC [] [{nfs-server {nil nil &ContainerStateTerminated{ExitCode:137,Signal:0,Reason:Error,Message:,StartedAt:2017-07-03 06:10:49 +0000 UTC,FinishedAt:2017-07-03 06:44:25 +0000 UTC,ContainerID:docker://8b1a0322f493d41aa1525a56b943e82116fa33bcebe27c5a0f9299f321addd15,}} {nil nil nil} false 0 docker.io/aosqe/nfs-server:latest docker-pullable://docker.io/aosqe/nfs-server@sha256:3e5e1fbfc51571a007451a98a43f6f30d7abbcc4a3ce63bb0b39db253956e8b0 docker://8b1a0322f493d41aa1525a56b943e82116fa33bcebe27c5a0f9299f321addd15}] BestEffort}) from podStatusChannel
Jul  3 06:45:07 wehe-node-1 dockerd-current: time="2017-07-03T06:45:07.149023906Z" level=error msg="Error removing mounted layer 8b1a0322f493d41aa1525a56b943e82116fa33bcebe27c5a0f9299f321addd15: failed to remove device 3102fa5f99655a415a3e84c755dc5101f38c4886fc1ef86082bcf5ac5f9621db:devicemapper: Can not set cookie: dm_task_set_cookie failed"
Jul  3 06:45:07 wehe-node-1 dockerd-current: time="2017-07-03T06:45:07.149112407Z" level=error msg="Handler for DELETE /v1.24/containers/8b1a0322f493d41aa1525a56b943e82116fa33bcebe27c5a0f9299f321addd15?v=1 returned error: Driver devicemapper failed to remove root filesystem 8b1a0322f493d41aa1525a56b943e82116fa33bcebe27c5a0f9299f321addd15: failed to remove device 3102fa5f99655a415a3e84c755dc5101f38c4886fc1ef86082bcf5ac5f9621db:devicemapper: Can not set cookie: dm_task_set_cookie failed"
Jul  3 06:45:07 wehe-node-1 dockerd-current: time="2017-07-03T06:45:07.149143207Z" level=error msg="Handler for DELETE /v1.24/containers/8b1a0322f493d41aa1525a56b943e82116fa33bcebe27c5a0f9299f321addd15 returned error: Driver devicemapper failed to remove root filesystem 8b1a0322f493d41aa1525a56b943e82116fa33bcebe27c5a0f9299f321addd15: failed to remove device 3102fa5f99655a415a3e84c755dc5101f38c4886fc1ef86082bcf5ac5f9621db:devicemapper: Can not set cookie: dm_task_set_cookie failed"
Jul  3 06:45:07 wehe-node-1 atomic-openshift-node: E0703 06:45:07.149642   12994 remote_runtime.go:241] RemoveContainer "8b1a0322f493d41aa1525a56b943e82116fa33bcebe27c5a0f9299f321addd15" from runtime service failed: rpc error: code = 2 desc = failed to remove container "8b1a0322f493d41aa1525a56b943e82116fa33bcebe27c5a0f9299f321addd15": Error response from daemon: {"message":"Driver devicemapper failed to remove root filesystem 8b1a0322f493d41aa1525a56b943e82116fa33bcebe27c5a0f9299f321addd15: failed to remove device 3102fa5f99655a415a3e84c755dc5101f38c4886fc1ef86082bcf5ac5f9621db:devicemapper: Can not set cookie: dm_task_set_cookie failed"}
Jul  3 06:45:07 wehe-node-1 atomic-openshift-node: E0703 06:45:07.149696   12994 kuberuntime_gc.go:123] Failed to remove container "8b1a0322f493d41aa1525a56b943e82116fa33bcebe27c5a0f9299f321addd15": rpc error: code = 2 desc = failed to remove container "8b1a0322f493d41aa1525a56b943e82116fa33bcebe27c5a0f9299f321addd15": Error response from daemon: {"message":"Driver devicemapper failed to remove root filesystem 8b1a0322f493d41aa1525a56b943e82116fa33bcebe27c5a0f9299f321addd15: failed to remove device 3102fa5f99655a415a3e84c755dc5101f38c4886fc1ef86082bcf5ac5f9621db:devicemapper: Can not set cookie: dm_task_set_cookie failed"}
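The repeated "Can not set cookie: dm_task_set_cookie failed" errors typically point at leaked device-mapper udev cookie semaphores on the node. A hedged way to inspect them is sketched below; the `dmsetup udevcookies` / `dmsetup udevcomplete_all` subcommands come from lvm2, and clearing cookies should be treated as a last-resort recovery, not a fix:

```shell
#!/bin/sh
# Count outstanding device-mapper udev cookies from `dmsetup udevcookies`
# output piped in (the first line is a column header).
count_udev_cookies() {
    awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }'
}

# On the affected node (as root):
#   dmsetup udevcookies                        # list outstanding cookies
#   dmsetup udevcookies | count_udev_cookies   # how many are stuck
#   dmsetup udevcomplete_all                   # release them (last resort)
```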

Comment 4 Jan Safranek 2017-07-04 14:48:25 UTC
I tried to reproduce this bug with today's docker and openshift and I failed
docker-1.12.6-32.git88a4867.el7.x86_64
atomic-openshift-node-3.6.133-1.git.0.524e4c8.el7.x86_64

$ docker info | grep Deferred
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0

$ oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/storage/nfs/nfs-server.yaml

The pod gets created and running and can be deleted without any problems.

I tried also docker-1.12.6-39.1.git6ffd653.el7.x86_64, still the same - works like a charm. There must be something else wrong on the system.

Comment 5 Wenqi He 2017-07-05 02:34:20 UTC
(In reply to Jan Safranek from comment #4)
> I tried to reproduce this bug with today's docker and openshift and I failed
> docker-1.12.6-32.git88a4867.el7.x86_64
> atomic-openshift-node-3.6.133-1.git.0.524e4c8.el7.x86_64
> 
> $ docker info | grep Deferred
>  Deferred Removal Enabled: false
>  Deferred Deletion Enabled: false
>  Deferred Deleted Device Count: 0
> 
> $ oc create -f
> https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/storage/
> nfs/nfs-server.yaml
> 
> The pod gets created and running and can be deleted without any problems.
> 
> I tried also docker-1.12.6-39.1.git6ffd653.el7.x86_64, still the same -
> works like charm. There must be something else wrong on the system.

The first time, this pod deploys and runs. But after we delete it, no other pod can be scheduled again, with the errors from my comment 3. Do you still have the env? Could you try to create another pod:
$ oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/pods/hello-pod.json

Comment 6 Wenqi He 2017-07-05 10:28:00 UTC
I have also tried creating only the pod https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/storage/nfs/nfs-server.yaml; that alone does not reproduce it. But if we create another pod that uses the nfs volume, the issue reproduces:

Detailed repro step:
1. Create a project
2. Create a scc from https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/authorization/scc/scc_super_template.yaml and add the service account and user to this scc
# oc get scc super-wehe -o yaml
allowHostDirVolumePlugin: true
allowHostIPC: false
allowHostNetwork: true
allowHostPID: false
allowHostPorts: true
allowPrivilegedContainer: true
allowedCapabilities: []
apiVersion: v1
defaultAddCapabilities: []
fsGroup:
  type: RunAsAny
groups:
- system:serviceaccounts:9tw98
kind: SecurityContextConstraints
metadata:
  creationTimestamp: 2017-07-05T09:51:43Z
  name: super-wehe
  resourceVersion: "63304"
  selfLink: /api/v1/securitycontextconstraints/super-wehe
  uid: 8ce63cbc-6167-11e7-b249-000d3a1a72a9
priority: null
readOnlyRootFilesystem: false
requiredDropCapabilities: []
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: RunAsAny
supplementalGroups:
  type: RunAsAny
users:
- wehe
volumes:
- '*'
3. Create nfs server pod
# oc create -f https://github.com/openshift-qe/v3-testfiles/raw/master/storage/nfs/nfs-server.yaml
4. Check the service
$ oc get svc
NAME          CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
nfs-service   172.30.109.118   <none>        2049/TCP   9s
5. Update an nfs pv to use this nfs-server
# cat pvrwo.json | oc create -f -
{
  "apiVersion": "v1",
  "kind": "PersistentVolume",
  "metadata": {
    "name": "wehepv"
  },
  "spec": {
    "capacity": {
        "storage": "2Gi"
    },
    "accessModes": [ "ReadWriteOnce" ],
    "nfs": {
        "path": "/",
        "server": "172.30.109.118"
    },
    "persistentVolumeReclaimPolicy": "Retain"
  }
}
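The PV above hardcodes the cluster IP reported by step 4; a small helper that renders the same JSON for whatever IP the service actually got could look like this (the `make_nfs_pv` function name is hypothetical):

```shell
#!/bin/sh
# Emit a PersistentVolume JSON for the given name and NFS server address,
# matching the pvrwo.json shown above.
make_nfs_pv() {
    name=$1
    server=$2
    cat <<EOF
{
  "apiVersion": "v1",
  "kind": "PersistentVolume",
  "metadata": { "name": "$name" },
  "spec": {
    "capacity": { "storage": "2Gi" },
    "accessModes": [ "ReadWriteOnce" ],
    "nfs": { "path": "/", "server": "$server" },
    "persistentVolumeReclaimPolicy": "Retain"
  }
}
EOF
}
```

Usage: `make_nfs_pv wehepv "$(oc get svc nfs-service -o jsonpath='{.spec.clusterIP}')" | oc create -f -`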
6. Create a pvc to bind this pv
# cat pvc.json | oc create -f -
{
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {
        "name": "pvc"
    },
    "spec": {
        "accessModes": [ "ReadWriteOnce" ],
        "resources": {
            "requests": {
                "storage": "1Gi"
            }
        }
    }
}
7. Create a pod to consume this pvc
$ oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/persistent-volumes/nfs/auto/web-pod.json
8. Touch some file to check the read/write on mounted dir
$ oc get pods
NAME         READY     STATUS    RESTARTS   AGE
nfs          1/1       Running   0          45s
nfs-server   1/1       Running   0          8m
$ oc exec -it nfs sh
sh-4.2$ ls /mnt/                                                                                                                                 
sh-4.2$ touch /mnt/test
sh-4.2$ exit

9. Delete all the pods and svc
$ oc delete pods --all
pod "nfs" deleted
pod "nfs-server" deleted
[wehe@dhcp-136-45 octest]$ oc get pods
NAME         READY     STATUS        RESTARTS   AGE
nfs          0/1       Terminating   0          2m
nfs-server   1/1       Terminating   0          9m

10. Create a pod again; the pod will stay in ContainerCreating status
$ oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/pods/hello-pod.json

$ oc get pods
NAME              READY     STATUS              RESTARTS   AGE
hello-openshift   0/1       ContainerCreating   0          10m

11. SSH to the node and run docker directly
# docker run --rm -ti docker.io/aosqe/nfs-web bash
/usr/bin/docker-current: Error response from daemon: devmapper: Error activating devmapper device for 'f76117935a9892fefc77651e018f8e0e6da797c4cd96b439053ba31925cf9154-init': devicemapper: Can't set cookie dm_task_set_cookie failed.

I think this has the same root cause as the nfs-provisioner pod issue. Thanks.

Comment 7 Matthew Wong 2017-07-12 15:23:57 UTC
I wonder if this would happen with an "ordinary" NFS server, since you say it happens only when a pod actually *uses* the NFS volume. Could you try creating a pod that uses an NFS volume backed by a non-nfs-provisioner-pod NFS server and see if anything breaks? Something weird may be going on between devicemapper and the NFS mount. Thanks.

Regardless I'm leaning heavily towards it being a Docker problem that we have little control over. The NFS server pod is not doing anything particularly bad or different from x months ago, only Docker has changed.

Comment 8 Wenqi He 2017-07-14 10:36:08 UTC
I have tried with a real NFS server deployed directly on the node; the issue did not reproduce. Thanks.