Bug 2023224
Summary: | multipath -f fails with "map in use" error while removing the LUNs using "ovirt_remove_stale_lun" | ||
---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | nijin ashok <nashok> |
Component: | ovirt-ansible-collection | Assignee: | Vojtech Juranek <vjuranek> |
Status: | CLOSED ERRATA | QA Contact: | Amit Sharir <asharir> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | 4.4.8 | CC: | aefrat, ahadas, apinnick, ddacosta, gveitmic, lsvaty, mgandhi, michal.skrivanek, mperina, sfishbai, vjuranek |
Target Milestone: | ovirt-4.4.10 | Keywords: | ZStream |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | ovirt-ansible-collection-1.6.6 | Doc Type: | Enhancement |
Doc Text: |
Previously, when running the 'ovirt_remove_stale_lun' Ansible role, the removal of the multipath device map could fail because of a conflict with a VGS scan. In the current release, the 'ovirt_remove_stale_lun' role retries the removal of the multipath device map six times to allow the removal to succeed.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2022-02-08 10:07:34 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
nijin ashok
2021-11-15 09:07:15 UTC
Can you please supply the verification flow that is required to verify this bug? We want a flow that resembles the customer's flow as closely as possible. To be more specific, please update on the following:
1. How to create the stale LUNs in the setup of the test.
2. Where the Ansible script was executed from on the customer side (from the engine?).
3. Does the customer modify the Ansible script in some way before running it?
4. The relevant commands that were used in the process/flow.
5. Is there some way to reproduce this error in a smaller environment? (QE doesn't have an environment with so many resources: 60+ hosts, 20+ storage domains, and 100+ LUNs.)
Thanks.

(In reply to Amit Sharir from comment #3)
> 1. How to create the stale luns in the setup of the test.

You can try to remove any LUNs that are mapped to the hosts but are not used by a storage domain or VM.

> 2. Where the ansible script was executed from on the customer side (from the
> engine?).

Engine.

> 3. Does the customer modify the ansible script in some way before running it?

No.

> 4. The relevant commands that were used in the process/flow.

Used the example yml https://github.com/oVirt/ovirt-ansible-collection/blob/master/roles/remove_stale_lun/examples/remove_stale_lun.yml and changed the values to match the environment.

> 5. Is there some way to reproduce this error in a smaller environment? (QE
> doesn't have an environment with so many resources - 60+ hosts, 20+ storage
> domains, and 100+ LUNs).

We can ask vdsm to monitor SDs more aggressively by setting the values below in the vdsm conf, so that it runs vgs every 2 seconds.

[irs]
repo_stats_cache_refresh_timeout=2
sd_health_check_delay=1

I hit the issue in 1 out of 5 runs in my test environment after setting the above values. It was an environment with 2 hosts, 2 SDs, and 3 LUNs.

>
> Thanks.
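
For anyone reproducing this on a small setup, the two [irs] values quoted above could be applied with a short script. This is only a sketch: the helper name set_irs_options is hypothetical, the config path in the usage comment (/etc/vdsm/vdsm.conf) is an assumption about the host layout, and vdsm presumably has to be restarted before the new values take effect.

```python
import configparser

# Values quoted in the reply above; with them vdsm is reported to run vgs
# about every 2 seconds, which makes the "map in use" race easier to hit.
AGGRESSIVE_IRS = {
    "repo_stats_cache_refresh_timeout": "2",
    "sd_health_check_delay": "1",
}


def set_irs_options(conf_path, options=AGGRESSIVE_IRS):
    """Merge `options` into the [irs] section of the file at `conf_path`.

    Hypothetical helper. Other sections are kept, but configparser drops
    comments when it rewrites the file, so back the file up first.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(conf_path)  # a missing file is treated as empty
    if not parser.has_section("irs"):
        parser.add_section("irs")
    for key, value in options.items():
        parser.set("irs", key, value)
    with open(conf_path, "w") as fh:
        parser.write(fh)


# Usage (path is an assumption; run as root on a test host only):
# set_irs_options("/etc/vdsm/vdsm.conf")
```

These settings are meant for a test environment; revert them after the reproduction attempt.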
Fixed in nightly quay.io/ovirt/el8stream-ansible-executor:latest as of today; it can be used with Ansible 2.11.

Following #c14 and #c10, moving to verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHV Engine and Host Common Packages [ovirt-4.4.10]), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2022:0463

According to the specific verification flow in comment 10:
https://bugzilla.redhat.com/show_bug.cgi?id=2023224#c10
Some steps in the verification flow require operations on the LUNs from the "NetApp system manager" UI, and we can't add them to our automation. The operations use the initiators mapping option in the "NetApp system manager" UI. There is a TC in our automation that covers removing a stale LUN from the hypervisor (TestCase27720).
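
For reference, the retry behavior described in this bug's Doc Text (the multipath map removal is retried six times) can be illustrated roughly as below. This is not the role's actual implementation, which is an Ansible task: the function name, the 5-second delay between attempts, and the injectable run/sleep parameters are assumptions made for the sketch. Only the `multipath -f` command and the count of six come from this report.

```python
import subprocess
import time


def remove_multipath_map(wwid, attempts=6, delay=5.0,
                         run=subprocess.run, sleep=time.sleep):
    """Try `multipath -f <wwid>` up to `attempts` times.

    Returns True as soon as one attempt exits with status 0, False if
    every attempt fails (for example with "map in use" while a vgs scan
    holds the device open). `delay` is an assumed pause, not a value
    taken from the role.
    """
    for attempt in range(1, attempts + 1):
        result = run(["multipath", "-f", wwid],
                     capture_output=True, text=True)
        if result.returncode == 0:
            return True
        if attempt < attempts:
            sleep(delay)  # give the concurrent scan time to release the map
    return False
```

The `run` and `sleep` parameters exist only so the loop can be exercised without touching real devices.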