Bug 1966873 - [RFE] Create Ansible role for removing stale LUNs example remove_mpath_device.yml
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-ansible-collection
Version: 4.4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.4.8-1
Target Release: 4.4.8
Assignee: Vojtech Juranek
QA Contact: Amit Sharir
URL:
Whiteboard:
Depends On: 1310330
Blocks:
Reported: 2021-06-02 06:09 UTC by Eyal Shenitzky
Modified: 2021-09-08 14:13 UTC (History)

Fixed In Version: ovirt-ansible-collection-1.6.2
Doc Type: Enhancement
Doc Text:
A new role, `remove_stale_lun`, has been added to the oVirt Ansible collection. The `remove_stale_lun` role iterates through all the hosts in a data center and removes the stale LUN device from each host. This role accepts two parameters: the data center name and the LUN WWID. Before running this role, the LUN must be unzoned from the storage server by the storage server administrator. Otherwise, the LUN will reappear on the hosts shortly after the operation.
Clone Of:
Environment:
Last Closed: 2021-09-08 14:13:18 UTC
oVirt Team: Storage
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | oVirt ovirt-ansible-collection pull 320 | 0 | None | open | roles: add role for removing stale LUN | 2021-08-10 08:17:29 UTC
Github | oVirt ovirt-ansible-collection pull 334 | 0 | None | None | None | 2021-08-26 11:06:56 UTC
Red Hat Product Errata | RHBA-2021:3464 | 0 | None | None | None | 2021-09-08 14:13:20 UTC

Description Eyal Shenitzky 2021-06-02 06:09:09 UTC
Description of problem:

The remove_mpath_device.yml example playbook in ovirt-ansible-collection/examples should be wrapped in an Ansible role.

https://github.com/oVirt/ovirt-ansible-collection/blob/master/examples/remove_mpath_device.yml

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Avihai 2021-07-18 11:27:53 UTC
As Ilan already verified this here[1] with the ansible example in 4.4.5, what is the verification scenario in this bug?
Is it the same as just using an ansible role instead of the ansible example?


[1] https://bugzilla.redhat.com/show_bug.cgi?id=1310330#c112

Comment 3 Vojtech Juranek 2021-07-19 07:25:04 UTC
(In reply to Avihai from comment #2)
> As Ilan already verified this here[1] with the ansible example in 4.4.5,
> what is the verification scenario in this bug?
> Is it the same as just using an ansible role instead of the ansible example?

Yes. The only difference is probably that in the previous verification Ilan had all hosts listed in the inventory (not mentioned in the verification steps), whereas now the role should gather all the affected hosts (all hosts in the data center) itself, so listing the hosts in the Ansible inventory won't be needed. All other steps are the same.
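
A rough sketch of the per-host cleanup, going only by the task names visible in the example playbook's output in comment 6 (find the paths of the multipath device, flush the multipath map, delete each path from the SCSI subsystem). The helper names `removal_commands` and `plan`, the host names, and the exact shell commands are assumptions for illustration, not the role's actual implementation. The sketch only builds command strings and does not run anything.

```python
import shlex


def removal_commands(wwid, paths):
    """Build the shell commands that would remove one stale LUN from one host.

    Assumed steps: flush the multipath map first, then delete each
    underlying SCSI path through sysfs.
    """
    commands = [f"multipath -f {shlex.quote(wwid)}"]
    for dev in paths:
        commands.append(f"echo 1 > /sys/block/{shlex.quote(dev)}/device/delete")
    return commands


def plan(hosts, wwid, paths_by_host):
    """Map every host to its removal commands.

    Hosts that report no paths for the WWID get an empty command list,
    so running the plan twice is harmless.
    """
    result = {}
    for host in hosts:
        paths = paths_by_host.get(host, [])
        result[host] = removal_commands(wwid, paths) if paths else []
    return result


if __name__ == "__main__":
    # Hypothetical host names; the WWID and path are taken from comment 6.
    wwid = "3600a09803830447a4f244c4657616f79"
    for host, commands in plan(["host1", "host2"], wwid, {"host1": ["sdh"]}).items():
        print(host, commands)
```

Building the plan for every host in one place mirrors the point above: the caller supplies the list of hosts once (in the role's case, all hosts in the data center) instead of maintaining them in an inventory.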

Comment 6 Amit Sharir 2021-08-18 13:25:24 UTC
I used the following steps to verify this:

Verified on rhv-release-4.4.5-11

1. create an iSCSI domain using LUNs from the same storage server (if you create only one and detach it, hosts will log out from storage server and there won't be any stale LUNs)

sdh                                                     8:112  0   10G  0 disk  
└─3600a09803830447a4f244c4657616f79                   253:7    0   10G  0 mpath 
  ├─a605a814--2774--4995--830b--e9d477ff92e4-metadata 253:29   0  128M  0 lvm   
  ├─a605a814--2774--4995--830b--e9d477ff92e4-ids      253:30   0  128M  0 lvm   
  ├─a605a814--2774--4995--830b--e9d477ff92e4-inbox    253:31   0  128M  0 lvm   
  ├─a605a814--2774--4995--830b--e9d477ff92e4-outbox   253:32   0  128M  0 lvm   
  ├─a605a814--2774--4995--830b--e9d477ff92e4-leases   253:33   0    2G  0 lvm   
  ├─a605a814--2774--4995--830b--e9d477ff92e4-xleases  253:34   0    1G  0 lvm   
  └─a605a814--2774--4995--830b--e9d477ff92e4-master   253:35   0    1G  0 lvm  


2. put one of the domains into maintenance
3. detach this storage domain
4. remove the corresponding LUN on the storage server
5. verify that there is still a multipath device corresponding to the LUN used by this SD on the hosts, e.g.

sdh                                                     8:112  0   10G  0 disk  
└─3600a09803830447a4f244c4657616f79                   253:7    0   10G  0 mpath 


6. run attached ansible script

[root@storage-ge13-vdsm1 ~]# ansible-playbook --extra-vars "lun=3600a09803830447a4f244c4657616f79" /usr/share/doc/ovirt-ansible-collection/examples/remove_mpath_device.yml
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'

PLAY [Cleanly remove unzoned storage devices (LUNs)] *********************************************************************************

TASK [Gathering Facts] ***************************************************************************************************************
ok: [localhost]

TASK [Get underlying disks (paths) for a multipath device and turn them into a list.] ************************************************
changed: [localhost]

TASK [Remove from multipath device.] *************************************************************************************************
changed: [localhost]

TASK [Remove each path from the SCSI subsystem.] *************************************************************************************
changed: [localhost] => (item=sdh)

PLAY RECAP ***************************************************************************************************************************
localhost                  : ok=4    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   



7. verify the mpath device is removed from the hosts and there are no complaints in the system log that multipath device is not accessible

[root@storage-ge13-vdsm1 ~]# multipath -ll | grep 616f79
[root@storage-ge13-vdsm1 ~]# 
[root@storage-ge13-vdsm1 ~]# 
[root@storage-ge13-vdsm1 ~]# lsblk | grep 616f79
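
The checks in steps 5 and 7 can also be scripted. The following is a minimal sketch that parses captured `lsblk` tree output and lists the disks that still sit under a multipath device with a given WWID. It assumes the default `lsblk` column order (NAME, MAJ:MIN, RM, SIZE, RO, TYPE); the function name `paths_for_wwid` is made up for this example.

```python
import re

# Leading tree-drawing characters and indentation that lsblk puts before names.
_TREE_PREFIX = re.compile(r"^[\s│├└─|`\-]+")


def paths_for_wwid(lsblk_text, wwid):
    """Return the disks whose multipath child in `lsblk` output is `wwid`."""
    paths = []
    current_disk = None
    for line in lsblk_text.splitlines():
        fields = _TREE_PREFIX.sub("", line).split()
        if len(fields) < 6:
            continue  # blank or truncated line
        name, dev_type = fields[0], fields[5]
        if dev_type == "disk":
            current_disk = name
        elif dev_type == "mpath" and name == wwid and current_disk:
            paths.append(current_disk)
    return paths


if __name__ == "__main__":
    wwid = "3600a09803830447a4f244c4657616f79"
    sample = (
        "sdh                                   8:112  0   10G  0 disk\n"
        "└─3600a09803830447a4f244c4657616f79 253:7    0   10G  0 mpath\n"
    )
    print(paths_for_wwid(sample, wwid))  # prints ['sdh']
```

An empty result after running the role corresponds to the empty `grep` output shown in step 7.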



From what I can see, the remove_mpath_device.yml example playbook in ovirt-ansible-collection/examples is wrapped in an Ansible role, as required.
I did find an issue with the Ansible connection from the engine machine to the host. The connection requires SSH key-based authentication to be configured between the engine and the host for the root user (something that our customers will not have). It would be better to use the private key of the oVirt engine, which is used for the SSH connection.

So this bug isn't verified: the ovirt-ansible-collection/examples playbook is wrapped in an Ansible role, but our end users won't be able to use this feature in its current state.
In other words, this feature is missing documentation or an additional fix/workaround.

Comment 7 Vojtech Juranek 2021-08-26 10:48:02 UTC
ack, should be fixed in https://github.com/oVirt/ovirt-ansible-collection/pull/334

Comment 12 Vojtech Juranek 2021-08-27 08:43:44 UTC
> 6. run attached ansible script
> 
> [root@storage-ge13-vdsm1 ~]# ansible-playbook --extra-vars
> "lun=3600a09803830447a4f244c4657616f79"
> /usr/share/doc/ovirt-ansible-collection/examples/remove_mpath_device.yml

This example was removed and the role should be used instead. You should use /usr/share/doc/ovirt-ansible-collection/roles/remove_stale_lun/examples/remove_stale_lun.yml

Comment 14 Amit Sharir 2021-08-31 15:28:16 UTC
Version: 
vdsm-4.40.80.5-1.el8ev.x86_64
ovirt-engine-4.4.8.5-0.4.el8ev.noarch

Verification steps:
I did the steps mentioned above with this example script (https://github.com/oVirt/ovirt-ansible-collection/blob/master/roles/remove_stale_lun/examples/remove_stale_lun.yml)

Verification conclusions:
The expected output matched the actual output.
The whole flow was completed with no errors; the ovirt-ansible-collection/examples playbook is now wrapped in an Ansible role, as required.
In addition, the issue I mentioned in comment 6 (the connection requiring SSH key-based authentication to be configured between the engine and the host for the root user) is fixed, so our end users and customers will be able to use this feature in their current environment configuration.

Bug verified.

Comment 18 errata-xmlrpc 2021-09-08 14:13:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHV Engine and Host Common Packages security update [ovirt-4.4.8] 0-day), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3464

