Description of problem:

In some environments, when Vdsm is being stopped by systemd, it tries to tear down a logical volume that is still in use. This can block further shutdown steps performed by vdsmd, so that the service exceeds its TimeoutStopSec timeout, is killed forcefully by systemd, and leaves its shutdown procedure unfinished.

Version-Release number of selected component (if applicable):

current master (4.5 development)

How reproducible:

100% on ost-basic-suite-master-host-1 after test_004_basic_sanity.py completes, while vm0 is still running there.

Steps to Reproduce:
1. Run `systemctl stop vdsmd` on the host.
2. Check vdsm.log.

Actual results:

vdsmd doesn't complete its shutdown actions. The end of the log looks like this (notice it does not finish with the "Exiting" message indicating that vdsmd stopped its activity completely):

2021-11-12 07:09:32,820+0000 INFO (MainThread) [storage.monitor] Shutting down domain monitors (monitor:243)
2021-11-12 07:09:32,820+0000 INFO (MainThread) [storage.monitor] Stop monitoring 15fa3d6c-671b-46ef-af9a-00337011fa26 (shutdown=True) (monitor:268)
2021-11-12 07:09:32,820+0000 INFO (MainThread) [storage.monitor] Stop monitoring a8bab4ef-2952-4c42-ba44-dbb3e1b8c87c (shutdown=True) (monitor:268)
2021-11-12 07:09:32,820+0000 INFO (MainThread) [storage.monitor] Stop monitoring 3a729aa1-8e14-4ea0-8794-e3d67fbde542 (shutdown=True) (monitor:268)
2021-11-12 07:09:32,820+0000 INFO (MainThread) [storage.monitor] Stop monitoring 0187cf2f-2344-48de-a2a0-dd007315399f (shutdown=True) (monitor:268)
2021-11-12 07:09:32,820+0000 INFO (MainThread) [storage.monitor] Stop monitoring 186180cb-5cc5-4aa4-868a-9e1ed7965ddf (shutdown=True) (monitor:268)
2021-11-12 07:09:32,831+0000 INFO (monitor/0187cf2) [storage.check] Stop checking '/rhev/data-center/mnt/ost-he-basic-suite-master-storage:_exports_nfs_exported/0187cf2f-2344-48de-a2a0-dd007315399f/dom_md/metadata' (check:135)
2021-11-12 07:09:32,838+0000 INFO (monitor/a8bab4e) [storage.check] Stop checking '/rhev/data-center/mnt/ost-he-basic-suite-master-storage:_exports_nfs_share1/a8bab4ef-2952-4c42-ba44-dbb3e1b8c87c/dom_md/metadata' (check:135)
2021-11-12 07:09:32,844+0000 INFO (monitor/15fa3d6) [storage.check] Stop checking '/rhev/data-center/mnt/ost-he-basic-suite-master-storage:_exports_nfs__he/15fa3d6c-671b-46ef-af9a-00337011fa26/dom_md/metadata' (check:135)
2021-11-12 07:09:32,864+0000 INFO (monitor/3a729aa) [storage.check] Stop checking '/rhev/data-center/mnt/ost-he-basic-suite-master-storage:_exports_nfs_share2/3a729aa1-8e14-4ea0-8794-e3d67fbde542/dom_md/metadata' (check:135)
2021-11-12 07:09:32,865+0000 INFO (monitor/186180c) [storage.check] Stop checking '/dev/186180cb-5cc5-4aa4-868a-9e1ed7965ddf/metadata' (check:135)
2021-11-12 07:09:32,867+0000 INFO (monitor/186180c) [storage.blocksd] Tearing down domain 186180cb-5cc5-4aa4-868a-9e1ed7965ddf (blockSD:996)
2021-11-12 07:09:32,867+0000 DEBUG (monitor/186180c) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/lvm vgchange --config 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/36001405423a0b45cef54e6e9fd0c7df4$|^/dev/mapper/36001405e9302e3f55e74aedbf352b79f$|", "r|.*|"] hints="none" obtain_device_list_from_udev=0 } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 use_lvmpolld=1 } backup { retain_min=50 retain_days=0 }' --available n 186180cb-5cc5-4aa4-868a-9e1ed7965ddf (cwd None) (commands:154)
2021-11-12 07:09:38,063+0000 DEBUG (monitor/186180c) [common.commands] FAILED: <err> = b' Logical volume 186180cb-5cc5-4aa4-868a-9e1ed7965ddf/ids in use.\n Can\'t deactivate volume group "186180cb-5cc5-4aa4-868a-9e1ed7965ddf" with 1 open logical volume(s)\n'; <rc> = 5 (commands:186)
2021-11-12 07:09:38,066+0000 WARN (monitor/186180c) [storage.lvm] Command with specific filter failed or returned no data, retrying with a wider filter: LVM command failed: 'cmd=[\'/sbin/lvm\', \'vgchange\', \'--config\', \'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/36001405423a0b45cef54e6e9fd0c7df4$|^/dev/mapper/36001405e9302e3f55e74aedbf352b79f$|", "r|.*|"] hints="none" obtain_device_list_from_udev=0 } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 use_lvmpolld=1 } backup { retain_min=50 retain_days=0 }\', \'--available\', \'n\', \'186180cb-5cc5-4aa4-868a-9e1ed7965ddf\'] rc=5 out=[] err=[\' Logical volume 186180cb-5cc5-4aa4-868a-9e1ed7965ddf/ids in use.\', \' Can\\\'t deactivate volume group "186180cb-5cc5-4aa4-868a-9e1ed7965ddf" with 1 open logical volume(s)\']' (lvm:532)
2021-11-12 07:09:38,067+0000 DEBUG (monitor/186180c) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/lvm vgchange --config 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/36001405423a0b45cef54e6e9fd0c7df4$|^/dev/mapper/36001405d40a7e761def4ed09aac63282$|^/dev/mapper/36001405e9302e3f55e74aedbf352b79f$|^/dev/mapper/36001405ece10ae48e99470295edda1bb$|^/dev/mapper/36001405f269e8e9efb34e5f9170b349b$|^/dev/mapper/36001405f8c5fe6c239e46a0976b842df$|", "r|.*|"] hints="none" obtain_device_list_from_udev=0 } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 use_lvmpolld=1 } backup { retain_min=50 retain_days=0 }' --available n 186180cb-5cc5-4aa4-868a-9e1ed7965ddf (cwd None) (commands:154)

Expected results:

vdsmd stops cleanly within the TimeoutStopSec interval, logging "[vds] Exiting" as its last log message.
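The blocking teardown boils down to deactivating the domain's VG while sanlock still holds the "ids" LV open. A minimal Python sketch of that failing step follows; it is illustrative only, the helper name is made up, and vdsm's real lvm module builds the full --config seen in the log above:

import subprocess

def deactivate_vg(vg_name):
    # Equivalent of the logged "lvm vgchange --available n <vg>" call,
    # without vdsm's device filter and locking configuration.
    proc = subprocess.run(
        ["sudo", "-n", "/sbin/lvm", "vgchange", "--available", "n", vg_name],
        capture_output=True, text=True)
    if proc.returncode != 0:
        # With the "ids" LV held open by sanlock, LVM refuses to
        # deactivate the whole VG and exits with rc=5, exactly as in
        # the FAILED log record above.
        raise RuntimeError(
            "vgchange failed rc=%d: %s" % (proc.returncode, proc.stderr.strip()))

# deactivate_vg("186180cb-5cc5-4aa4-868a-9e1ed7965ddf")
# -> raises RuntimeError as long as the domain is in use.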
(In reply to Milan Zamazal from comment #0)
> Expected results:
>
> vdsmd stops cleanly within TimeoutStopSec interval, logging "[vds] Exiting"
> as its last log message.

This cannot be guaranteed. The expected result is that storage is not torn down during shutdown. Tearing down storage must be done only when the engine asks to deactivate a storage domain or to disconnect from storage. When the vdsmd service stops, storage should be left as is: storage domains that were active before vdsm was stopped should remain active.
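For illustration, this is the shape of the behavior requested above, as a hedged Python sketch; the class and method names are hypothetical, not vdsm's actual monitor code:

class DomainMonitor:
    def __init__(self, domain):
        self.domain = domain

    def stop(self, shutdown=False):
        self._stop_checking()  # stop the path/metadata checkers first
        if shutdown:
            # Service stop: leave LVs and mounts as they are, so an
            # in-use "ids" LV can never block vdsmd shutdown.
            return
        # Engine-initiated stop (deactivate/disconnect): tearing down
        # the domain is safe and expected here.
        self.domain.teardown()

    def _stop_checking(self):
        pass  # placeholder for the real checker shutdown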
Testing this change:

1. Make sure the host is up in a data center with an active storage domain.

2. List the active special volumes:

# lvs | egrep 'ids|inbox|leases|master|metadata|outbox|xleases'
  ids      313e6d78-80f7-41ab-883b-d1bddf77a5da -wi-ao---- 128.00m
  inbox    313e6d78-80f7-41ab-883b-d1bddf77a5da -wi-a----- 128.00m
  leases   313e6d78-80f7-41ab-883b-d1bddf77a5da -wi-a-----   2.00g
  metadata 313e6d78-80f7-41ab-883b-d1bddf77a5da -wi-a----- 128.00m
  outbox   313e6d78-80f7-41ab-883b-d1bddf77a5da -wi-a----- 128.00m
  xleases  313e6d78-80f7-41ab-883b-d1bddf77a5da -wi-a-----   1.00g
  ids      4345b708-77ab-4477-9d7b-b9e6d053cb1f -wi-ao---- 128.00m
  inbox    4345b708-77ab-4477-9d7b-b9e6d053cb1f -wi-a----- 128.00m
  leases   4345b708-77ab-4477-9d7b-b9e6d053cb1f -wi-a-----   2.00g
  metadata 4345b708-77ab-4477-9d7b-b9e6d053cb1f -wi-a----- 128.00m
  outbox   4345b708-77ab-4477-9d7b-b9e6d053cb1f -wi-a----- 128.00m
  xleases  4345b708-77ab-4477-9d7b-b9e6d053cb1f -wi-a-----   1.00g
  ids      feab3738-c158-4d48-8a41-b5a95c057a50 -wi-ao---- 128.00m
  inbox    feab3738-c158-4d48-8a41-b5a95c057a50 -wi-a----- 128.00m
  leases   feab3738-c158-4d48-8a41-b5a95c057a50 -wi-a-----   2.00g
  metadata feab3738-c158-4d48-8a41-b5a95c057a50 -wi-a----- 128.00m
  outbox   feab3738-c158-4d48-8a41-b5a95c057a50 -wi-a----- 128.00m
  xleases  feab3738-c158-4d48-8a41-b5a95c057a50 -wi-a-----   1.00g
  ids      ffc11717-c272-4a41-be97-9ac75d401413 -wi-ao---- 128.00m
  inbox    ffc11717-c272-4a41-be97-9ac75d401413 -wi-a----- 128.00m
  leases   ffc11717-c272-4a41-be97-9ac75d401413 -wi-a-----   2.00g
  metadata ffc11717-c272-4a41-be97-9ac75d401413 -wi-a----- 128.00m
  outbox   ffc11717-c272-4a41-be97-9ac75d401413 -wi-a----- 128.00m
  xleases  ffc11717-c272-4a41-be97-9ac75d401413 -wi-a-----   1.00g

3. On the host, stop the vdsmd service:

# systemctl stop vdsmd

4. Make sure this log line does not appear:

INFO ... [storage.blocksd] Tearing down domain 186180cb-5cc5-4aa4-868a-9e1ed7965ddf (blockSD:996)

5. Check that the active special volumes were not deactivated (see the helper sketch right after these steps).

To reproduce the original problem, use vdsm from 4.4.9. After stopping it, all the special lvs except "ids" will be inactive.
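For step 5, a small hedged Python helper (illustrative only, not part of vdsm or the test suite) that checks no special LV was deactivated after stopping vdsmd:

import subprocess

SPECIAL_LVS = {"ids", "inbox", "leases", "master", "metadata", "outbox", "xleases"}

def inactive_special_lvs():
    out = subprocess.run(
        ["lvs", "--noheadings", "-o", "vg_name,lv_name,lv_attr"],
        capture_output=True, text=True, check=True).stdout
    inactive = []
    for line in out.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        vg, lv, attr = fields
        # The fifth character of lv_attr is the state: "a" means active.
        if lv in SPECIAL_LVS and attr[4] != "a":
            inactive.append("%s/%s" % (vg, lv))
    return inactive

if __name__ == "__main__":
    bad = inactive_special_lvs()
    if bad:
        print("FAIL: deactivated special LVs:", ", ".join(bad))
    else:
        print("PASS: all special LVs are still active")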
Verified.

Versions:
ovirt-engine-4.4.10-0.17.el8ev
vdsm-4.40.100.1-1.el8ev

Steps to reproduce as described in comment 2.

Also tried to reproduce it on 4.4.9 with the same steps and got the line:

2021-11-24 15:29:18,672+0200 INFO (monitor/6cf08a4) [storage.BlockSD] Tearing down domain 6cf08a47-7cdc-42b1-b7d5-7c685a0ef45e (blockSD:995)

In this version (4.4.10) the problem does not reproduce, and the special volumes were not deactivated.
This bugzilla is included in the oVirt 4.4.10 release, published on January 18th, 2022. Since the problem described in this bug report should be resolved in oVirt 4.4.10, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.