Description of problem:
On a CNS setup, a script was run to create block PVCs. While the block devices were being created, the tcmu-runner process was killed on two of the three gluster pods, one after the other. As expected, the block device creation failed. However, when the tcmu-runner service was then restarted manually, it failed with the following error:

sh-4.2# systemctl status tcmu-runner
● tcmu-runner.service - LIO Userspace-passthrough daemon
   Loaded: loaded (/usr/lib/systemd/system/tcmu-runner.service; static; vendor preset: disabled)
   Active: failed (Result: timeout) since Mon 2018-07-09 05:03:18 UTC; 22s ago
 Main PID: 29211
   CGroup: /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pode95e4671_7f74_11e8_974c_005056a525c4.slice/docker-ccd4f307cd274a7d2bedf1ddc20a9c3115a2fc07e432074b1a7e0a5d3665c737.scope/system.slice/tcmu-runner.service
           └─29211 /usr/bin/tcmu-runner --tcmu-log-dir /var/log/glusterfs/gluster-block

Jul 09 04:58:50 dhcp46-244.lab.eng.blr.redhat.com tcmu-runner[29211]: 2018-07-09 04:58:50.726 29211 [ERROR] tcmu_glfs_open:529 test-vol_glusterfs_claim30_fcf9c007-80cd-11e8-a4e5-0a580a810203: glfs_open(vol=vol_37b50be8ac1...e or directory
Jul 09 04:58:50 dhcp46-244.lab.eng.blr.redhat.com tcmu-runner[29211]: tcmu_glfs_open:529 test-vol_glusterfs_claim30_fcf9c007-80cd-11e8-a4e5-0a580a810203: glfs_open(vol=vol_37b50be8ac1fb551ad7f1b2985d8b6a7, file=block-stor...e or directory
Jul 09 04:58:50 dhcp46-244.lab.eng.blr.redhat.com tcmu-runner[29211]: 2018-07-09 04:58:50.726 29211 [ERROR] add_device:486 : handler open failed for uio29
Jul 09 04:58:50 dhcp46-244.lab.eng.blr.redhat.com tcmu-runner[29211]: add_device:486 : handler open failed for uio29
Jul 09 05:00:17 dhcp46-244.lab.eng.blr.redhat.com systemd[1]: tcmu-runner.service start operation timed out. Terminating.
Jul 09 05:01:48 dhcp46-244.lab.eng.blr.redhat.com systemd[1]: tcmu-runner.service stop-final-sigterm timed out. Killing.
Jul 09 05:03:18 dhcp46-244.lab.eng.blr.redhat.com systemd[1]: tcmu-runner.service still around after final SIGKILL. Entering failed mode.
Jul 09 05:03:18 dhcp46-244.lab.eng.blr.redhat.com systemd[1]: Failed to start LIO Userspace-passthrough daemon.
Jul 09 05:03:18 dhcp46-244.lab.eng.blr.redhat.com systemd[1]: Unit tcmu-runner.service entered failed state.
Jul 09 05:03:18 dhcp46-244.lab.eng.blr.redhat.com systemd[1]: tcmu-runner.service failed.

# for i in `oc get pods -o wide| grep glusterfs|cut -d " " -f1` ; do echo $i; echo +++++++++++++++++++++++; oc exec $i -- systemctl is-active tcmu-runner; done
glusterfs-storage-cctj8
+++++++++++++++++++++++
active
glusterfs-storage-qpk4g
+++++++++++++++++++++++
failed
command terminated with exit code 3
glusterfs-storage-w9jcs
+++++++++++++++++++++++
failed
command terminated with exit code 3

# for i in `oc get pods -o wide| grep glusterfs|cut -d " " -f1` ; do echo $i; echo +++++++++++++++++++++++; oc exec $i -- ps aux|grep Ds; done
glusterfs-storage-cctj8
+++++++++++++++++++++++
glusterfs-storage-qpk4g
+++++++++++++++++++++++
root 1423 0.0 0.0 122668 11028 ? Ds 04:51 0:00 /usr/bin/python /usr/bin/targetctl clear
root 2437 0.0 0.0 527176 19532 ? Ds 05:01 0:00 /usr/bin/tcmu-runner --tcmu-log-dir /var/log/glusterfs/gluster-block
glusterfs-storage-w9jcs
+++++++++++++++++++++++
root 25246 0.0 0.0 122668 11040 ? Ds 04:51 0:00 /usr/bin/python /usr/bin/targetctl clear
root 29211 0.0 0.0 842380 19692 ? Ds 04:58 0:00 /usr/bin/tcmu-runner --tcmu-log-dir /var/log/glusterfs/gluster-block

Version-Release number of selected component (if applicable):
# oc version
oc v3.10.0-0.67.0
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

# rpm -qa|grep gluster
glusterfs-client-xlators-3.8.4-54.12.el7rhgs.x86_64
glusterfs-fuse-3.8.4-54.12.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-54.12.el7rhgs.x86_64
glusterfs-libs-3.8.4-54.12.el7rhgs.x86_64
glusterfs-3.8.4-54.12.el7rhgs.x86_64
glusterfs-api-3.8.4-54.12.el7rhgs.x86_64
glusterfs-cli-3.8.4-54.12.el7rhgs.x86_64
glusterfs-server-3.8.4-54.12.el7rhgs.x86_64
gluster-block-0.2.1-20.el7rhgs.x86_64
heketi-7.0.0-2.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
1. Create block devices.
2. While the block devices are being created, kill the tcmu-runner process (kill -9 <process_id>).
3. Restart the tcmu-runner service (systemctl start tcmu-runner).

Actual results:
tcmu-runner fails to start.

Expected results:
tcmu-runner should start.

Additional info:
Logs will be attached soon.
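The leftover tcmu-runner and `targetctl clear` processes in the ps output are in state D (uninterruptible sleep). That is consistent with systemd reporting that tcmu-runner was "still around after final SIGKILL", since a task in uninterruptible sleep generally does not act on SIGKILL until the blocking kernel call returns. The `ps aux|grep Ds` check matches any line containing the string "Ds", so it can miss states such as "D", "D+" or "Dl" and can also match "Ds" inside a command line. A slightly more precise per-pod check could look like the POSIX shell sketch below. It assumes `ps -eo pid,stat,args` works inside the gluster pods and that `oc` is logged in to the right project; `list_dstate` and `check_gluster_pods` are hypothetical helper names, not part of gluster-block or tcmu-runner.

```shell
#!/bin/sh
# Sketch only: list processes in uninterruptible sleep (state "D")
# in each glusterfs pod.

# Read `ps -eo pid,stat,args` output on stdin and print only the rows
# whose STAT column (field 2) starts with "D". This matches "D", "Ds",
# "D+", "Dsl", etc., and ignores "Ds" appearing in the command line.
# The header row has "STAT" in field 2, so it is never printed.
list_dstate() {
    awk '$2 ~ /^D/'
}

# Hypothetical helper: run list_dstate against every pod whose name
# contains "glusterfs". Assumes `oc` is logged in to the project that
# runs the glusterfs pods.
check_gluster_pods() {
    for pod in $(oc get pods -o name | grep glusterfs); do
        pod=${pod##*/}    # strip the resource-type prefix, e.g. "pod/"
        echo "$pod"
        echo "+++++++++++++++++++++++"
        oc exec "$pod" -- ps -eo pid,stat,args | list_dstate
    done
}
```

After sourcing the file, running `check_gluster_pods` prints each pod name followed by any rows for processes stuck in D state, in the same layout as the loops above.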