Bug 1474273 - [Gluster-block]: Deletion of a volume containing block devices leaves the tcmu-runner in a sub-optimal state.
Status: NEW
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: tcmu-runner
Version: 3.3
Hardware / OS: Unspecified / Unspecified
Priority: unspecified
Severity: high
Assigned To: Prasanna Kumar Kalever
QA Contact: Rahul Hinduja
Depends On:
Blocks:
Reported: 2017-07-24 05:30 EDT by Sweta Anandpara
Modified: 2017-09-28 13:43 EDT (History)
CC: 4 users

Type: Bug
Attachments: None

Description Sweta Anandpara 2017-07-24 05:30:19 EDT
Description of problem:
=========================
Hit this while verifying bug 1456227.

Had about 35 blocks present on a volume, stopped and deleted the volume, and then executed 'targetcli clearconfig confirm=True' - which resulted in multiple errors in the tcmu-runner logs as well as in the status output of tcmu-runner.
'systemctl status tcmu-runner', even though it shows the service as 'active (running)', displayed the same error messages, indicating that tcmu-runner on that node was not healthy.

Restarting gluster-blockd did not help, and restarting tcmu-runner hung. When I tried to get the node (and its services) back to normal after the weekend, I rebooted the node. The status of tcmu-runner now shows the service as dead, and any attempt to restart it results in the same behaviour - 'active (running)' but with lots of errors (pasted below) in the logs. Restarting gluster-blockd remains hung (I am assuming because it tries to bring tcmu-runner up due to the spec dependency).
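For reference, the recovery attempts described above boil down to the following commands (a sketch; the unit names are assumed to be the gluster-blockd.service and tcmu-runner.service units shipped with these packages):

# On the affected node
systemctl restart gluster-blockd     # did not help
systemctl restart tcmu-runner        # hung
systemctl status tcmu-runner         # 'active (running)' but remove_device errors in the logs
journalctl -u tcmu-runner            # shows the same remove_device errors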

Version-Release number of selected component (if applicable):
============================================================
glusterfs-3.8.4-33 and gluster-block-0.2.1-6

How reproducible:
=================
1:1


Steps to Reproduce:     
======================
// Same as mentioned in bz 1456227; a command-level sketch follows the steps below.

1. Create some gluster-block devices on a volume
2. Stop the gluster volume and delete it
3. Run 'targetcli clearconfig confirm=True'
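A command-level sketch of the above steps (the volume/block names, host IPs and sizes are illustrative; the gluster-block create syntax is the one documented for the 0.2.1 CLI):

# 1. Create a few block devices on the volume
gluster-block create blockvol/block1 ha 3 192.168.1.11,192.168.1.12,192.168.1.13 1GiB
gluster-block create blockvol/block2 ha 3 192.168.1.11,192.168.1.12,192.168.1.13 1GiB

# 2. Stop and delete the backing volume while the block devices still exist
gluster volume stop blockvol
gluster volume delete blockvol

# 3. Clear the targetcli configuration
targetcli clearconfig confirm=True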

Actual results:
===============
tcmu-runner goes down, taking gluster-blockd along with it.

Expected results:
================
The services should not be affected.
Alternatively, if something is expected to go wrong, 'gluster volume delete' should error out at the outset, stating that block devices are still present on the volume.
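For illustration, a teardown that avoids the problem would enumerate and delete the block devices before touching the volume (using the same illustrative names as in the sketch above):

# Check whether the volume still hosts block devices
gluster-block list blockvol

# Delete each block first, then stop/delete the volume
gluster-block delete blockvol/block1
gluster-block delete blockvol/block2
gluster volume stop blockvol
gluster volume delete blockvol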


Additional info:
=================

I did have a few blocks (maybe 3, but I am not sure) that were in a failed state due to some negative testing I was doing. However, I have no idea what the devices uio1, uio2 and uio3 mentioned in the logs are.

[root@dhcp47-115 ~]# systemctl status tcmu-runner
● tcmu-runner.service - LIO Userspace-passthrough daemon
   Loaded: loaded (/usr/lib/systemd/system/tcmu-runner.service; static; vendor preset: disabled)
   Active: active (running) since Wed 2017-07-19 03:21:54 EDT; 1 day 20h ago
 Main PID: 2737 (tcmu-runner)
   CGroup: /system.slice/tcmu-runner.service
           └─2737 /usr/bin/tcmu-runner --tcmu-log-dir=/var/log/gluster-block/

Jul 19 03:21:54 dhcp47-115.lab.eng.blr.redhat.com systemd[1]: Starting LIO Userspace-passthrough daemon...
Jul 19 03:21:54 dhcp47-115.lab.eng.blr.redhat.com systemd[1]: Started LIO Userspace-passthrough daemon.
Jul 20 23:35:10 dhcp47-115.lab.eng.blr.redhat.com tcmu-runner[2737]: tcmu-runner
                                                                     : remove_device:522 : Could not remove device uio3: not found.
Jul 20 23:35:10 dhcp47-115.lab.eng.blr.redhat.com tcmu-runner[2737]: 2017-07-20 23:35:10.254 2737 [ERROR] remove_device:522 : Could not remove device uio3: not found.
Jul 20 23:35:10 dhcp47-115.lab.eng.blr.redhat.com tcmu-runner[2737]: tcmu-runner
                                                                     : remove_device:522 : Could not remove device uio2: not found.
Jul 20 23:35:10 dhcp47-115.lab.eng.blr.redhat.com tcmu-runner[2737]: 2017-07-20 23:35:10.258 2737 [ERROR] remove_device:522 : Could not remove device uio2: not found.
Jul 20 23:35:10 dhcp47-115.lab.eng.blr.redhat.com tcmu-runner[2737]: tcmu-runner
                                                                     : remove_device:522 : Could not remove device uio1: not found.
Jul 20 23:35:10 dhcp47-115.lab.eng.blr.redhat.com tcmu-runner[2737]: 2017-07-20 23:35:10.261 2737 [ERROR] remove_device:522 : Could not remove device uio1: not found.
[root@dhcp47-115 ~]# 


/var/log/messages also shows the same errors with respect to tcmu-runner.
------------------

Jul 20 23:35:10 localhost tcmu-runner: 2017-07-20 23:35:10.254 2737 [ERROR] remove_device:522 : Could not remove device uio3: not found.
Jul 20 23:35:10 localhost journal: tcmu-runner#012: remove_device:522 : Could not remove device uio2: not found.
Jul 20 23:35:10 localhost tcmu-runner: 2017-07-20 23:35:10.258 2737 [ERROR] remove_device:522 : Could not remove device uio2: not found.
Jul 20 23:35:10 localhost journal: tcmu-runner#012: remove_device:522 : Could not remove device uio1: not found.
Jul 20 23:35:10 localhost tcmu-runner: 2017-07-20 23:35:10.261 2737 [ERROR] remove_device:522 : Could not remove device uio1: not found.
Comment 2 Sweta Anandpara 2017-07-24 05:36:43 EDT
Gluster-block logs are present at this location: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/<bugnumber>/

[qe@rhsqe-repo 1474273]$ hostname
rhsqe-repo.lab.eng.blr.redhat.com
[qe@rhsqe-repo 1474273]$ 
[qe@rhsqe-repo 1474273]$ pwd
/home/repo/sosreports/1474273
[qe@rhsqe-repo 1474273]$ 
[qe@rhsqe-repo 1474273]$ ll
total 12
drwxr-xr-x. 2 qe qe 4096 Jul 24 15:03 gluster-block_dhcp47-115
drwxr-xr-x. 2 qe qe 4096 Jul 24 15:02 gluster-block_dhcp47-116
drwxr-xr-x. 2 qe qe 4096 Jul 24 15:01 gluster-block_dhcp47-117
[qe@rhsqe-repo 1474273]$ 
[qe@rhsqe-repo 1474273]$
Comment 7 Sweta Anandpara 2017-08-18 03:34:58 EDT
Removing the need-info on this bug as Humble has replied in comment 4.
