Bug 2023344
| Summary: | vdsmd fails to shut down cleanly when it tries to deactivate a used LV | | |
|---|---|---|---|
| Product: | [oVirt] vdsm | Reporter: | Milan Zamazal <mzamazal> |
| Component: | Core | Assignee: | Nir Soffer <nsoffer> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | sshmulev |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.50.0.1 | CC: | bugs, dfodor, eshames |
| Target Milestone: | ovirt-4.4.10 | Keywords: | ZStream |
| Target Release: | 4.40.100.1 | Flags: | pm-rhel: ovirt-4.4+ |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | vdsm-4.40.100.1 | Doc Type: | Bug Fix |
| Doc Text: | Cause: Vdsm tried to deactivate logical volumes during shutdown. Consequence: Vdsm shutdown can fail, or programs using logical volumes managed by vdsm could fail. Fix: Vdsm does not deactivate logical volumes during shutdown. Result: Vdsm shuts down faster, and programs using logical volumes managed by vdsm are not affected. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-01-19 07:00:13 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
(In reply to Milan Zamazal from comment #0)

> Expected results:
>
> vdsmd stops cleanly within TimeoutStopSec interval, logging "[vds] Exiting"
> as its last log message.

This is not possible to guarantee. The expected result is that storage is not torn down during shutdown. Tearing down storage must be done only when the engine asks to deactivate a storage domain or to disconnect from storage. When the vdsmd service stops, storage should be left as is: storage domains that were active before vdsm was stopped should remain active.

Testing this change:
1. Make sure host is up in a data center with active storage domain
2. List active special volumes:
# lvs | egrep 'ids|inbox|leases|master|metadata|outbox|xleases'
ids 313e6d78-80f7-41ab-883b-d1bddf77a5da -wi-ao---- 128.00m
inbox 313e6d78-80f7-41ab-883b-d1bddf77a5da -wi-a----- 128.00m
leases 313e6d78-80f7-41ab-883b-d1bddf77a5da -wi-a----- 2.00g
metadata 313e6d78-80f7-41ab-883b-d1bddf77a5da -wi-a----- 128.00m
outbox 313e6d78-80f7-41ab-883b-d1bddf77a5da -wi-a----- 128.00m
xleases 313e6d78-80f7-41ab-883b-d1bddf77a5da -wi-a----- 1.00g
ids 4345b708-77ab-4477-9d7b-b9e6d053cb1f -wi-ao---- 128.00m
inbox 4345b708-77ab-4477-9d7b-b9e6d053cb1f -wi-a----- 128.00m
leases 4345b708-77ab-4477-9d7b-b9e6d053cb1f -wi-a----- 2.00g
metadata 4345b708-77ab-4477-9d7b-b9e6d053cb1f -wi-a----- 128.00m
outbox 4345b708-77ab-4477-9d7b-b9e6d053cb1f -wi-a----- 128.00m
xleases 4345b708-77ab-4477-9d7b-b9e6d053cb1f -wi-a----- 1.00g
ids feab3738-c158-4d48-8a41-b5a95c057a50 -wi-ao---- 128.00m
inbox feab3738-c158-4d48-8a41-b5a95c057a50 -wi-a----- 128.00m
leases feab3738-c158-4d48-8a41-b5a95c057a50 -wi-a----- 2.00g
metadata feab3738-c158-4d48-8a41-b5a95c057a50 -wi-a----- 128.00m
outbox feab3738-c158-4d48-8a41-b5a95c057a50 -wi-a----- 128.00m
xleases feab3738-c158-4d48-8a41-b5a95c057a50 -wi-a----- 1.00g
ids ffc11717-c272-4a41-be97-9ac75d401413 -wi-ao---- 128.00m
inbox ffc11717-c272-4a41-be97-9ac75d401413 -wi-a----- 128.00m
leases ffc11717-c272-4a41-be97-9ac75d401413 -wi-a----- 2.00g
metadata ffc11717-c272-4a41-be97-9ac75d401413 -wi-a----- 128.00m
outbox ffc11717-c272-4a41-be97-9ac75d401413 -wi-a----- 128.00m
xleases ffc11717-c272-4a41-be97-9ac75d401413 -wi-a----- 1.00g
3. On the host, stop the vdsmd service:
   # systemctl stop vdsmd
4. Make sure this log message does not appear:
   INFO ... [storage.blocksd] Tearing down domain 186180cb-5cc5-4aa4-868a-9e1ed7965ddf (blockSD:996)
5. Check that the active special volumes were not deactivated.

To reproduce, use vdsm from 4.4.9. After stopping it, all the special LVs except
"ids" will be inactive.
Verified.

Versions:
ovirt-engine-4.4.10-0.17.el8ev
vdsm-4.40.100.1-1.el8ev

Steps to reproduce as described in comment 2. Also tried to reproduce it on 4.4.9 with the same steps and got the line:

2021-11-24 15:29:18,672+0200 INFO (monitor/6cf08a4) [storage.BlockSD] Tearing down domain 6cf08a47-7cdc-42b1-b7d5-7c685a0ef45e (blockSD:995)

In this version (4.4.10), it does not reproduce, and the special volumes were not deactivated.

This bugzilla is included in oVirt 4.4.10 release, published on January 18th 2022. Since the problem described in this bug report should be resolved in oVirt 4.4.10 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.
Description of problem:

In some environments, Vdsm tries, when it is being stopped by systemd, to tear down a logical volume that is being used. This may block further shutdown steps performed by vdsmd, resulting in exceeding the service TimeoutStopSec timeout, vdsmd being killed forcefully by systemd, and leaving the vdsmd shutdown procedure unfinished.

Version-Release number of selected component (if applicable):

current master (4.5 development)

How reproducible:

100% in ost-basic-suite-master-host-1 after test_004_basic_sanity.py completes while vm0 is running there.

Steps to Reproduce:
1. Run `systemctl stop vdsmd' on the host.
2. Check vdsm.log.

Actual results:

vdsmd doesn't complete its shutdown actions. The end of the log looks like this (notice it does not finish with the "Exiting" message indicating vdsmd stopped its activity completely):

2021-11-12 07:09:32,820+0000 INFO (MainThread) [storage.monitor] Shutting down domain monitors (monitor:243)
2021-11-12 07:09:32,820+0000 INFO (MainThread) [storage.monitor] Stop monitoring 15fa3d6c-671b-46ef-af9a-00337011fa26 (shutdown=True) (monitor:268)
2021-11-12 07:09:32,820+0000 INFO (MainThread) [storage.monitor] Stop monitoring a8bab4ef-2952-4c42-ba44-dbb3e1b8c87c (shutdown=True) (monitor:268)
2021-11-12 07:09:32,820+0000 INFO (MainThread) [storage.monitor] Stop monitoring 3a729aa1-8e14-4ea0-8794-e3d67fbde542 (shutdown=True) (monitor:268)
2021-11-12 07:09:32,820+0000 INFO (MainThread) [storage.monitor] Stop monitoring 0187cf2f-2344-48de-a2a0-dd007315399f (shutdown=True) (monitor:268)
2021-11-12 07:09:32,820+0000 INFO (MainThread) [storage.monitor] Stop monitoring 186180cb-5cc5-4aa4-868a-9e1ed7965ddf (shutdown=True) (monitor:268)
2021-11-12 07:09:32,831+0000 INFO (monitor/0187cf2) [storage.check] Stop checking '/rhev/data-center/mnt/ost-he-basic-suite-master-storage:_exports_nfs_exported/0187cf2f-2344-48de-a2a0-dd007315399f/dom_md/metadata' (check:135)
2021-11-12 07:09:32,838+0000 INFO (monitor/a8bab4e) [storage.check] Stop checking '/rhev/data-center/mnt/ost-he-basic-suite-master-storage:_exports_nfs_share1/a8bab4ef-2952-4c42-ba44-dbb3e1b8c87c/dom_md/metadata' (check:135)
2021-11-12 07:09:32,844+0000 INFO (monitor/15fa3d6) [storage.check] Stop checking '/rhev/data-center/mnt/ost-he-basic-suite-master-storage:_exports_nfs__he/15fa3d6c-671b-46ef-af9a-00337011fa26/dom_md/metadata' (check:135)
2021-11-12 07:09:32,864+0000 INFO (monitor/3a729aa) [storage.check] Stop checking '/rhev/data-center/mnt/ost-he-basic-suite-master-storage:_exports_nfs_share2/3a729aa1-8e14-4ea0-8794-e3d67fbde542/dom_md/metadata' (check:135)
2021-11-12 07:09:32,865+0000 INFO (monitor/186180c) [storage.check] Stop checking '/dev/186180cb-5cc5-4aa4-868a-9e1ed7965ddf/metadata' (check:135)
2021-11-12 07:09:32,867+0000 INFO (monitor/186180c) [storage.blocksd] Tearing down domain 186180cb-5cc5-4aa4-868a-9e1ed7965ddf (blockSD:996)
2021-11-12 07:09:32,867+0000 DEBUG (monitor/186180c) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/lvm vgchange --config 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/36001405423a0b45cef54e6e9fd0c7df4$|^/dev/mapper/36001405e9302e3f55e74aedbf352b79f$|", "r|.*|"] hints="none" obtain_device_list_from_udev=0 } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 use_lvmpolld=1 } backup { retain_min=50 retain_days=0 }' --available n 186180cb-5cc5-4aa4-868a-9e1ed7965ddf (cwd None) (commands:154)
2021-11-12 07:09:38,063+0000 DEBUG (monitor/186180c) [common.commands] FAILED: <err> = b' Logical volume 186180cb-5cc5-4aa4-868a-9e1ed7965ddf/ids in use.\n Can\'t deactivate volume group "186180cb-5cc5-4aa4-868a-9e1ed7965ddf" with 1 open logical volume(s)\n'; <rc> = 5 (commands:186)
2021-11-12 07:09:38,066+0000 WARN (monitor/186180c) [storage.lvm] Command with specific filter failed or returned no data, retrying with a wider filter: LVM command failed: 'cmd=[\'/sbin/lvm\', \'vgchange\', \'--config\', \'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/36001405423a0b45cef54e6e9fd0c7df4$|^/dev/mapper/36001405e9302e3f55e74aedbf352b79f$|", "r|.*|"] hints="none" obtain_device_list_from_udev=0 } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 use_lvmpolld=1 } backup { retain_min=50 retain_days=0 }\', \'--available\', \'n\', \'186180cb-5cc5-4aa4-868a-9e1ed7965ddf\'] rc=5 out=[] err=[\' Logical volume 186180cb-5cc5-4aa4-868a-9e1ed7965ddf/ids in use.\', \' Can\\\'t deactivate volume group "186180cb-5cc5-4aa4-868a-9e1ed7965ddf" with 1 open logical volume(s)\']' (lvm:532)
2021-11-12 07:09:38,067+0000 DEBUG (monitor/186180c) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/lvm vgchange --config 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/36001405423a0b45cef54e6e9fd0c7df4$|^/dev/mapper/36001405d40a7e761def4ed09aac63282$|^/dev/mapper/36001405e9302e3f55e74aedbf352b79f$|^/dev/mapper/36001405ece10ae48e99470295edda1bb$|^/dev/mapper/36001405f269e8e9efb34e5f9170b349b$|^/dev/mapper/36001405f8c5fe6c239e46a0976b842df$|", "r|.*|"] hints="none" obtain_device_list_from_udev=0 } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 use_lvmpolld=1 } backup { retain_min=50 retain_days=0 }' --available n 186180cb-5cc5-4aa4-868a-9e1ed7965ddf (cwd None) (commands:154)

Expected results:

vdsmd stops cleanly within TimeoutStopSec interval, logging "[vds] Exiting" as its last log message.
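The failing deactivation can also be checked by hand: the lv_attr field reported by lvs marks open devices, and any open LV in a VG makes "vgchange --available n" fail with the "in use" error shown above. A minimal sketch (assuming standard lvm2 tools; the VG name is the one from this log):

    vg=186180cb-5cc5-4aa4-868a-9e1ed7965ddf
    # The 6th lv_attr character is the "device open" bit ("o"), e.g. the
    # "-wi-ao----" shown for the ids LV in comment 2. Each LV listed here
    # would block "vgchange --available n $vg" with an "in use" error.
    lvs --noheadings -o lv_name,lv_attr "$vg" | awk 'substr($2,6,1)=="o" {print $1}'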