Bug 1474273 - [Gluster-block]: Deletion of a block hosting volume containing block devices leaves the tcmu-runner in a sub-optimal state.
Summary: [Gluster-block]: Deletion of a block hosting volume containing block devices ...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: tcmu-runner
Version: cns-3.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Prasanna Kumar Kalever
QA Contact: Rahul Hinduja
URL:
Whiteboard:
Duplicates: 1467851 (view as bug list)
Depends On:
Blocks:
 
Reported: 2017-07-24 09:30 UTC by Sweta Anandpara
Modified: 2019-02-07 08:13 UTC (History)
5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-02-07 08:13:35 UTC
Embargoed:



Description Sweta Anandpara 2017-07-24 09:30:19 UTC
Description of problem:
=========================
Hit this while verifying bug 1456227.

There were about 35 block devices present on a volume. I stopped and deleted the volume, then executed 'targetcli clearconfig confirm=True', which resulted in multiple errors in the tcmu-runner logs as well as in the status of tcmu-runner.
'systemctl status tcmu-runner', even though it shows the service as 'active (running)', displayed the same error messages, indicating that tcmu-runner on that node was not healthy.

Restarting gluster-blockd did not help, and restarting tcmu-runner hung. When I tried to get the node (and its services) back to normal after the weekend, I rebooted the node. The status of tcmu-runner now shows the service as dead, and any attempt to restart it results in the same behaviour: 'active (running)' but with many errors (pasted below) in the logs. Restarting gluster-blockd remains hung (I assume because it tries to bring tcmu-runner up due to the spec dependency).

Version-Release number of selected component (if applicable):
============================================================
glusterfs-3.8.4-33 and gluster-block-0.2.1-6

How reproducible:
=================
1:1


Steps to Reproduce:     
======================
// Same as mentioned in bz 1456227

1. Create some gluster-blocks
2. Gluster volume stop the volume and delete it
3. targetcli clearconfig confirm=True
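The reproduction steps above can be sketched as a shell session. The volume name, block name, and host IPs below are hypothetical placeholders, and this assumes a working gluster-block deployment:

```shell
# Hypothetical names: blockvol is the block hosting volume, block-1 a block device.

# 1. Create some gluster-block devices on the volume
gluster-block create blockvol/block-1 ha 3 192.168.1.11,192.168.1.12,192.168.1.13 1GiB

# 2. Stop and delete the hosting volume while block devices still exist,
#    bypassing gluster-block's own cleanup
gluster volume stop blockvol
gluster volume delete blockvol

# 3. Clear the local target configuration directly via targetcli
targetcli clearconfig confirm=True
```

After step 3, tcmu-runner on the node ends up in the unhealthy state described below.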

Actual results:
===============
tcmu-runner goes down, taking gluster-blockd along with it.

Expected results:
================
The services should not be affected.
Alternatively, if something is expected to go wrong, 'volume delete' should error out at the outset, stating that block devices are present on the volume.


Additional info:
=================

I did have a few blocks (maybe 3, but I am not sure) which were in a failed state, due to some negative testing I was doing. But I have no idea what the devices uio1, uio2, and uio3 mentioned in the logs are.

[root@dhcp47-115 ~]# systemctl status tcmu-runner
● tcmu-runner.service - LIO Userspace-passthrough daemon
   Loaded: loaded (/usr/lib/systemd/system/tcmu-runner.service; static; vendor preset: disabled)
   Active: active (running) since Wed 2017-07-19 03:21:54 EDT; 1 day 20h ago
 Main PID: 2737 (tcmu-runner)
   CGroup: /system.slice/tcmu-runner.service
           └─2737 /usr/bin/tcmu-runner --tcmu-log-dir=/var/log/gluster-block/

Jul 19 03:21:54 dhcp47-115.lab.eng.blr.redhat.com systemd[1]: Starting LIO Userspace-passthrough daemon...
Jul 19 03:21:54 dhcp47-115.lab.eng.blr.redhat.com systemd[1]: Started LIO Userspace-passthrough daemon.
Jul 20 23:35:10 dhcp47-115.lab.eng.blr.redhat.com tcmu-runner[2737]: tcmu-runner
                                                                     : remove_device:522 : Could not remove device uio3: not found.
Jul 20 23:35:10 dhcp47-115.lab.eng.blr.redhat.com tcmu-runner[2737]: 2017-07-20 23:35:10.254 2737 [ERROR] remove_device:522 : Could not remove device uio3: not found.
Jul 20 23:35:10 dhcp47-115.lab.eng.blr.redhat.com tcmu-runner[2737]: tcmu-runner
                                                                     : remove_device:522 : Could not remove device uio2: not found.
Jul 20 23:35:10 dhcp47-115.lab.eng.blr.redhat.com tcmu-runner[2737]: 2017-07-20 23:35:10.258 2737 [ERROR] remove_device:522 : Could not remove device uio2: not found.
Jul 20 23:35:10 dhcp47-115.lab.eng.blr.redhat.com tcmu-runner[2737]: tcmu-runner
                                                                     : remove_device:522 : Could not remove device uio1: not found.
Jul 20 23:35:10 dhcp47-115.lab.eng.blr.redhat.com tcmu-runner[2737]: 2017-07-20 23:35:10.261 2737 [ERROR] remove_device:522 : Could not remove device uio1: not found.
[root@dhcp47-115 ~]# 


/var/log/messages also shows the same errors with respect to tcmu-runner.
------------------

Jul 20 23:35:10 localhost tcmu-runner: 2017-07-20 23:35:10.254 2737 [ERROR] remove_device:522 : Could not remove device uio3: not found.
Jul 20 23:35:10 localhost journal: tcmu-runner#012: remove_device:522 : Could not remove device uio2: not found.
Jul 20 23:35:10 localhost tcmu-runner: 2017-07-20 23:35:10.258 2737 [ERROR] remove_device:522 : Could not remove device uio2: not found.
Jul 20 23:35:10 localhost journal: tcmu-runner#012: remove_device:522 : Could not remove device uio1: not found.
Jul 20 23:35:10 localhost tcmu-runner: 2017-07-20 23:35:10.261 2737 [ERROR] remove_device:522 : Could not remove device uio1: not found.

Comment 2 Sweta Anandpara 2017-07-24 09:36:43 UTC
Gluster-block logs present in this location: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/<bugnumber>/

[qe@rhsqe-repo 1474273]$ hostname
rhsqe-repo.lab.eng.blr.redhat.com
[qe@rhsqe-repo 1474273]$ 
[qe@rhsqe-repo 1474273]$ pwd
/home/repo/sosreports/1474273
[qe@rhsqe-repo 1474273]$ 
[qe@rhsqe-repo 1474273]$ ll
total 12
drwxr-xr-x. 2 qe qe 4096 Jul 24 15:03 gluster-block_dhcp47-115
drwxr-xr-x. 2 qe qe 4096 Jul 24 15:02 gluster-block_dhcp47-116
drwxr-xr-x. 2 qe qe 4096 Jul 24 15:01 gluster-block_dhcp47-117
[qe@rhsqe-repo 1474273]$ 
[qe@rhsqe-repo 1474273]$

Comment 7 Sweta Anandpara 2017-08-18 07:34:58 UTC
Removing the need-info on this bug as Humble has replied in comment 4.

Comment 10 Prasanna Kumar Kalever 2018-09-21 09:22:13 UTC
*** Bug 1467851 has been marked as a duplicate of this bug. ***

Comment 11 Prasanna Kumar Kalever 2019-02-07 08:13:35 UTC
We shouldn't use targetcli to clean up the block volumes; only use gluster-block.
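A minimal sketch of the cleanup path this implies (the volume and block names are hypothetical): delete the block devices through gluster-block first, so the target configuration and tcmu-runner state stay consistent, and only then remove the hosting volume. Never run targetcli directly against gluster-block-managed devices.

```shell
# List the block devices on the hosting volume (name is hypothetical)
gluster-block list blockvol

# Delete each block device through gluster-block, which tears down the
# corresponding target configuration and tcmu-runner device cleanly
gluster-block delete blockvol/block-1

# Only once no block devices remain, stop and delete the hosting volume
gluster volume stop blockvol
gluster volume delete blockvol
```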

