Bug 1429203
| Field | Value |
|---|---|
| Summary: | LVM operations sometimes hang forever |
| Product: | Red Hat Enterprise Linux 6 |
| Reporter: | Greg Scott <gscott> |
| Component: | lvm2 |
| lvm2 sub component: | Default / Unclassified (RHEL6) |
| Assignee: | LVM and device-mapper development team <lvm-team> |
| QA Contact: | cluster-qe <cluster-qe> |
| Docs Contact: | |
| Status: | CLOSED WONTFIX |
| Severity: | urgent |
| Priority: | urgent |
| CC: | agk, cshao, cww, gscott, heinzm, jbrassow, jscalf, lsurette, mkalinin, msnitzer, nsoffer, prajnoha, prockai, pstehlik, srevivo, teigland, tnisan, ycui, ykaul, zkabelac |
| Version: | 6.7 |
| Target Milestone: | pre-dev-freeze |
| Target Release: | --- |
| Hardware: | All |
| OS: | All |
| Whiteboard: | |
| Fixed In Version: | |
| Doc Type: | If docs needed, set a value |
| Doc Text: | |
| Story Points: | --- |
| Clone Of: | |
| Environment: | |
| Last Closed: | 2017-06-13 19:02:52 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| CRM: | |
| Verified Versions: | |
| Category: | --- |
| oVirt Team: | Storage |
| RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- |
| Target Upstream Version: | |
| Embargoed: | |
| Bug Depends On: | 1428637 |
| Bug Blocks: | 1424853 |
| Attachments: | attachment 1260964 (ps auxf from Quincy; but missing key details), attachment 1261518 (ps auxf and ls -al /var/lock/lvm from hosts 0001 - 0015 in Grafton) |
Since 4.0, a stuck lvm command will not break monitoring, so a host will no longer become non-operational just because monitoring stopped. However, a stuck lvm command may still cause:

- the host never activating (deactivating LVs never finishes on startup)
- timeouts in various flows
- other failures when code takes a lock before calling an lvm operation (we usually call lvm under various locks)

This issue should be handled by the lvm team. David, can you look into this? We will need to see where the lvm command is blocked, e.g. on disk I/O or on a file lock. You may also want to reassign this bz to lvm when you have more data.

This is what we see in the sosreport on host 0027, generated on Feb 25.
$ grep /var/lock/lvm/ process/lsof_-b_M_-n_-l
lvm 9548 0 3uR REG 0,17 0 277616082 /var/lock/lvm/P_global (deleted)
lvm 9548 0 4r REG 0,17 0 277587507 /var/lock/lvm/V_dc25e385-332e-4feb-8ea2-78a7036ac8d1:aux (deleted)
lvm 10410 0 3uW REG 0,17 0 277587507 /var/lock/lvm/V_dc25e385-332e-4feb-8ea2-78a7036ac8d1:aux (deleted)
lvm 10410 0 4r REG 0,17 0 291072 /var/lock/lvm/V_dc25e385-332e-4feb-8ea2-78a7036ac8d1 (deleted)
lvm 10839 0 3u REG 0,17 0 277587507 /var/lock/lvm/V_dc25e385-332e-4feb-8ea2-78a7036ac8d1:aux (deleted)
lvm 10839 0 4rR REG 0,17 0 277616082 /var/lock/lvm/P_global (deleted)
lvm 12188 0 3r REG 0,17 0 277587507 /var/lock/lvm/V_dc25e385-332e-4feb-8ea2-78a7036ac8d1:aux (deleted)
lvm 15428 0 3r REG 0,17 0 277587507 /var/lock/lvm/V_dc25e385-332e-4feb-8ea2-78a7036ac8d1:aux (deleted)
lvm 30425 0 3r REG 0,17 0 277587507 /var/lock/lvm/V_dc25e385-332e-4feb-8ea2-78a7036ac8d1:aux (deleted)
lvm 34189 0 3u REG 0,17 0 335188164 /var/lock/lvm/V_e93af66a-cdcb-46a4-b52c-cf888902d354
lvm 37487 0 3r REG 0,17 0 277587507 /var/lock/lvm/V_dc25e385-332e-4feb-8ea2-78a7036ac8d1:aux (deleted)
lvm 42248 0 3uR REG 0,17 0 291072 /var/lock/lvm/V_dc25e385-332e-4feb-8ea2-78a7036ac8d1 (deleted)
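A note on reading the lsof output above: the uppercase R/W after the file descriptor numbers (e.g. 3uR, 3uW) mean the process holds or is waiting for a whole-file read/write flock(2) lock on that /var/lock/lvm file; with locking_type=1 this is how lvm serializes its global and per-VG locks. A minimal sketch, using a hypothetical lock file rather than a real lvm lock, that reproduces the kind of wait seen in the ps output below (WCHAN=flock_):

$ touch /tmp/demo.lock
# shell 1: take an exclusive lock and hold it for a while
$ flock -x /tmp/demo.lock -c 'sleep 600' &
# shell 2: this blocks until the first flock releases the lock
$ flock -x /tmp/demo.lock -c true
# while it blocks, ps shows the second flock sleeping in flock_, just like
# the stuck lvm commands in this report
$ ps -o pid,wchan:25,cmd -C flock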
$ grep lvm process/ps_alxwww
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
4 0 9547 1 0 -20 72072 2516 poll_s S< ? 0:00 /usr/bin/sudo -n /sbin/lvm pvs --config devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c514840001b|/dev/mapper/3514f0c514840001c|/dev/mapper/3514f0c514840001d|/dev/mapper/3514f0c514840001e|/dev/mapper/3514f0c5148400023|/dev/mapper/3514f0c5148400043|/dev/mapper/3514f0c51484000df|/dev/mapper/3514f0c51484000e0|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c51484000e3|/dev/mapper/3514f0c51484000e4|/dev/mapper/3514f0c51484000e5|/dev/mapper/3514f0c51484000e6|/dev/mapper/3514f0c5148400177|/dev/mapper/3514f0c5148400178|/dev/mapper/3514f0c5148400179|/dev/mapper/3514f0c514840017a|/dev/mapper/3514f0c51484001a0|/dev/mapper/3514f0c51484001a1|/dev/mapper/3514f0c51484001a2|/dev/mapper/3514f0c51484001a3|/dev/mapper/3514f0c51484001a4|/dev/mapper/3514f0c51484001a5|/dev/mapper/3514f0c51484001a6|/dev/mapper/3514f0c51484001a7|/dev/mapper/3514f0c51484001a8|/dev/mapper/3514f0c51484001a9|/dev/mapper/3514f0c51484001aa|/dev/mapper/3514f0c51484001bb|/dev/mapper/3514f0c51484001bc|/dev/mapper/3514f0c51484001bd|/dev/mapper/3514f0c51484001be|/dev/mapper/3514f0c51484001bf|/dev/mapper/3514f0c51484001c0|/dev/mapper/3514f0c51484001c1|/dev/mapper/3514f0c51484001c2|/dev/mapper/3514f0c51484001c3|/dev/mapper/3514f0c51484001c4|/dev/mapper/3514f0c51484001c5|', 'r|.*|' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } --noheadings --units b --nosuffix --separator | --ignoreskippedcluster -o uuid,name,size,vg_name,vg_uuid,pe_start,pe_count,pe_alloc_count,mda_count,dev_size
4 0 9548 9547 0 -20 44780 21816 flock_ S< ? 0:01 /sbin/lvm pvs --config devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c514840001b|/dev/mapper/3514f0c514840001c|/dev/mapper/3514f0c514840001d|/dev/mapper/3514f0c514840001e|/dev/mapper/3514f0c5148400023|/dev/mapper/3514f0c5148400043|/dev/mapper/3514f0c51484000df|/dev/mapper/3514f0c51484000e0|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c51484000e3|/dev/mapper/3514f0c51484000e4|/dev/mapper/3514f0c51484000e5|/dev/mapper/3514f0c51484000e6|/dev/mapper/3514f0c5148400177|/dev/mapper/3514f0c5148400178|/dev/mapper/3514f0c5148400179|/dev/mapper/3514f0c514840017a|/dev/mapper/3514f0c51484001a0|/dev/mapper/3514f0c51484001a1|/dev/mapper/3514f0c51484001a2|/dev/mapper/3514f0c51484001a3|/dev/mapper/3514f0c51484001a4|/dev/mapper/3514f0c51484001a5|/dev/mapper/3514f0c51484001a6|/dev/mapper/3514f0c51484001a7|/dev/mapper/3514f0c51484001a8|/dev/mapper/3514f0c51484001a9|/dev/mapper/3514f0c51484001aa|/dev/mapper/3514f0c51484001bb|/dev/mapper/3514f0c51484001bc|/dev/mapper/3514f0c51484001bd|/dev/mapper/3514f0c51484001be|/dev/mapper/3514f0c51484001bf|/dev/mapper/3514f0c51484001c0|/dev/mapper/3514f0c51484001c1|/dev/mapper/3514f0c51484001c2|/dev/mapper/3514f0c51484001c3|/dev/mapper/3514f0c51484001c4|/dev/mapper/3514f0c51484001c5|', 'r|.*|' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } --noheadings --units b --nosuffix --separator | --ignoreskippedcluster -o uuid,name,size,vg_name,vg_uuid,pe_start,pe_count,pe_alloc_count,mda_count,dev_size
4 0 10409 1 0 -20 72072 2512 poll_s S< ? 0:00 /usr/bin/sudo -n /sbin/lvm lvextend --config devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c5148400178|', 'r|.*|' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } --autobackup n --size 12288m dc25e385-332e-4feb-8ea2-78a7036ac8d1/62fa7c35-5be0-4498-b24f-42fca2629754
4 0 10410 10409 0 -20 39224 16236 flock_ S< ? 0:00 /sbin/lvm lvextend --config devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c5148400178|', 'r|.*|' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } --autobackup n --size 12288m dc25e385-332e-4feb-8ea2-78a7036ac8d1/62fa7c35-5be0-4498-b24f-42fca2629754
4 0 10834 1 0 -20 72072 2516 poll_s S< ? 0:00 /usr/bin/sudo -n /sbin/lvm pvs --config devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c514840001b|/dev/mapper/3514f0c514840001c|/dev/mapper/3514f0c514840001d|/dev/mapper/3514f0c514840001e|/dev/mapper/3514f0c5148400023|/dev/mapper/3514f0c5148400043|/dev/mapper/3514f0c51484000df|/dev/mapper/3514f0c51484000e0|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c51484000e3|/dev/mapper/3514f0c51484000e4|/dev/mapper/3514f0c51484000e5|/dev/mapper/3514f0c51484000e6|/dev/mapper/3514f0c5148400177|/dev/mapper/3514f0c5148400178|/dev/mapper/3514f0c5148400179|/dev/mapper/3514f0c514840017a|/dev/mapper/3514f0c51484001a0|/dev/mapper/3514f0c51484001a1|/dev/mapper/3514f0c51484001a2|/dev/mapper/3514f0c51484001a3|/dev/mapper/3514f0c51484001a4|/dev/mapper/3514f0c51484001a5|/dev/mapper/3514f0c51484001a6|/dev/mapper/3514f0c51484001a7|/dev/mapper/3514f0c51484001a8|/dev/mapper/3514f0c51484001a9|/dev/mapper/3514f0c51484001aa|/dev/mapper/3514f0c51484001bb|/dev/mapper/3514f0c51484001bc|/dev/mapper/3514f0c51484001bd|/dev/mapper/3514f0c51484001be|/dev/mapper/3514f0c51484001bf|/dev/mapper/3514f0c51484001c0|/dev/mapper/3514f0c51484001c1|/dev/mapper/3514f0c51484001c2|/dev/mapper/3514f0c51484001c3|/dev/mapper/3514f0c51484001c4|/dev/mapper/3514f0c51484001c5|', 'r|.*|' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } --noheadings --units b --nosuffix --separator | --ignoreskippedcluster -o uuid,name,size,vg_name,vg_uuid,pe_start,pe_count,pe_alloc_count,mda_count,dev_size /dev/mapper/3514f0c51484000e2
4 0 10839 10834 0 -20 44756 22052 flock_ S< ? 0:00 /sbin/lvm pvs --config devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c514840001b|/dev/mapper/3514f0c514840001c|/dev/mapper/3514f0c514840001d|/dev/mapper/3514f0c514840001e|/dev/mapper/3514f0c5148400023|/dev/mapper/3514f0c5148400043|/dev/mapper/3514f0c51484000df|/dev/mapper/3514f0c51484000e0|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c51484000e3|/dev/mapper/3514f0c51484000e4|/dev/mapper/3514f0c51484000e5|/dev/mapper/3514f0c51484000e6|/dev/mapper/3514f0c5148400177|/dev/mapper/3514f0c5148400178|/dev/mapper/3514f0c5148400179|/dev/mapper/3514f0c514840017a|/dev/mapper/3514f0c51484001a0|/dev/mapper/3514f0c51484001a1|/dev/mapper/3514f0c51484001a2|/dev/mapper/3514f0c51484001a3|/dev/mapper/3514f0c51484001a4|/dev/mapper/3514f0c51484001a5|/dev/mapper/3514f0c51484001a6|/dev/mapper/3514f0c51484001a7|/dev/mapper/3514f0c51484001a8|/dev/mapper/3514f0c51484001a9|/dev/mapper/3514f0c51484001aa|/dev/mapper/3514f0c51484001bb|/dev/mapper/3514f0c51484001bc|/dev/mapper/3514f0c51484001bd|/dev/mapper/3514f0c51484001be|/dev/mapper/3514f0c51484001bf|/dev/mapper/3514f0c51484001c0|/dev/mapper/3514f0c51484001c1|/dev/mapper/3514f0c51484001c2|/dev/mapper/3514f0c51484001c3|/dev/mapper/3514f0c51484001c4|/dev/mapper/3514f0c51484001c5|', 'r|.*|' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } --noheadings --units b --nosuffix --separator | --ignoreskippedcluster -o uuid,name,size,vg_name,vg_uuid,pe_start,pe_count,pe_alloc_count,mda_count,dev_size /dev/mapper/3514f0c51484000e2
0 0 12187 1 20 0 4124 672 wait S ? 0:00 timeout 300s /sbin/lvm vgs -v -o +tags
4 0 12188 12187 20 0 40964 18232 flock_ S ? 0:00 /sbin/lvm vgs -v -o +tags
0 0 15427 1 20 0 4124 672 wait S ? 0:00 timeout 300s /sbin/lvm vgs -v -o +tags
4 0 15428 15427 20 0 40964 18232 flock_ S ? 0:00 /sbin/lvm vgs -v -o +tags
4 0 19033 1 20 0 10724 1480 wait Ss ? 0:00 bash -c ?VERSION=`/bin/rpm -q --qf '[%{VERSION}]' sos | /bin/sed 's/\.//'`;?if [ $VERSION -ge 32 ]; then? /usr/sbin/sosreport --batch --all-logs -o logs,libvirt,vdsm,general,networking,hardware,process,yum,filesys,devicemapper,selinux,kernel,memory,rpm,processor,pci,md,block,scsi,multipath,systemd,sanlock,lvm2,ovirt_hosted_engine?elif [ $VERSION -ge 30 ]; then? /usr/sbin/sosreport --batch -k logs.all_logs=True -o logs,libvirt,vdsm,general,networking,hardware,process,yum,filesys,devicemapper,selinux,kernel,memory,rpm,processor,pci,md,block,scsi,multipath,systemd,sanlock,lvm2?elif [ $VERSION -ge 22 ]; then? /usr/sbin/sosreport --batch -k general.all_logs=True -o libvirt,vdsm,general,networking,hardware,process,yum,filesys,devicemapper,selinux,kernel,memory,rpm?elif [ $VERSION -ge 17 ]; then? /usr/sbin/sosreport --no-progressbar -k general.all_logs=True -o vdsm,general,networking,hardware,process,yum,filesys?else? /bin/echo No valid version of sosreport found. 1>&2? exit 1?fi?
4 0 19039 19033 20 0 245416 32748 pipe_w S ? 0:13 /usr/bin/python /usr/sbin/sosreport --batch --all-logs -o logs,libvirt,vdsm,general,networking,hardware,process,yum,filesys,devicemapper,selinux,kernel,memory,rpm,processor,pci,md,block,scsi,multipath,systemd,sanlock,lvm2,ovirt_hosted_engine
0 0 30424 19039 20 0 4124 672 wait S ? 0:00 timeout 300s /sbin/lvm vgs -v -o +tags
4 0 30425 30424 20 0 40964 18300 flock_ S ? 0:00 /sbin/lvm vgs -v -o +tags
4 0 34213 33208 0 -20 72068 2516 poll_s S< ? 0:00 /usr/bin/sudo -n /sbin/lvm lvs --config devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c51484001a2|', 'r|.*|' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } --noheadings --units b --nosuffix --separator | --ignoreskippedcluster -o uuid,name,vg_name,attr,size,seg_start_pe,devices,tags e93af66a-cdcb-46a4-b52c-cf888902d354
4 0 34217 34213 0 -20 28156 5088 - R< ? 0:00 /sbin/lvm lvs --config devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c51484001a2|', 'r|.*|' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } --noheadings --units b --nosuffix --separator | --ignoreskippedcluster -o uuid,name,vg_name,attr,size,seg_start_pe,devices,tags e93af66a-cdcb-46a4-b52c-cf888902d354
0 0 37486 1 20 0 4124 672 wait S ? 0:00 timeout 300s /sbin/lvm vgs -v -o +tags
4 0 37487 37486 20 0 40964 18232 flock_ S ? 0:00 /sbin/lvm vgs -v -o +tags
4 0 42247 1 0 -20 72072 2504 poll_s S< ? 0:00 /usr/bin/sudo -n /sbin/lvm lvchange --config devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c5148400178|', 'r|.*|' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } --autobackup n --available n dc25e385-332e-4feb-8ea2-78a7036ac8d1/26b7b6ba-20e3-449d-bbbb-179dbdb0d214 dc25e385-332e-4feb-8ea2-78a7036ac8d1/87c69927-a9a4-4a8d-9701-0203d9f7bb95 dc25e385-332e-4feb-8ea2-78a7036ac8d1/429951f3-50fd-43f4-ba77-1ee4a0a6f54f dc25e385-332e-4feb-8ea2-78a7036ac8d1/e374a525-6aed-4f23-9716-2531fa81b401 dc25e385-332e-4feb-8ea2-78a7036ac8d1/2e772c45-3af9-4553-bb19-e4ca1e154bb2 dc25e385-332e-4feb-8ea2-78a7036ac8d1/3e92d7eb-e3f2-45d7-a37b-5389ba872f55 dc25e385-332e-4feb-8ea2-78a7036ac8d1/47e12eda-a416-4fc7-8641-24d6cdcb1cd8 dc25e385-332e-4feb-8ea2-78a7036ac8d1/ed22a4f8-1698-4828-b200-48da3916fdc6 dc25e385-332e-4feb-8ea2-78a7036ac8d1/ecc56472-4271-4633-9e86-6f0074befb2e dc25e385-332e-4feb-8ea2-78a7036ac8d1/0010acf7-294b-4767-a470-929fff5f492b dc25e385-332e-4feb-8ea2-78a7036ac8d1/85165b4d-5ae4-4b0a-b052-f2bbdf02c983 dc25e385-332e-4feb-8ea2-78a7036ac8d1/c8968cb1-e454-4b41-ad5d-400b40f8125d dc25e385-332e-4feb-8ea2-78a7036ac8d1/eec01f1b-aa7b-46ec-adcf-727967b09bbb dc25e385-332e-4feb-8ea2-78a7036ac8d1/d5c5e1ce-15e0-4b98-8937-1b17f6b57cc3 dc25e385-332e-4feb-8ea2-78a7036ac8d1/aae4e6b4-79a4-47d1-9ef9-d1dddc5f923b dc25e385-332e-4feb-8ea2-78a7036ac8d1/f3cd6cd6-9b85-4f66-9fc9-c0e96f10d6ff dc25e385-332e-4feb-8ea2-78a7036ac8d1/f09dd0ae-7e4e-45bb-baf1-552513459e19 dc25e385-332e-4feb-8ea2-78a7036ac8d1/cd36bf4f-2754-4471-94c9-d286eab7433f dc25e385-332e-4feb-8ea2-78a7036ac8d1/528258e7-a51f-4025-8437-a19832f1c34b dc25e385-332e-4feb-8ea2-78a7036ac8d1/27148909-c5ed-4159-87f6-a817fb38073e dc25e385-332e-4feb-8ea2-78a7036ac8d1/8ef0917b-80f7-428b-8ab3-4fff25424e58 dc25e385-332e-4feb-8ea2-78a7036ac8d1/445c5052-6e5e-4b7d-b49c-7637578075c4 dc25e385-332e-4feb-8ea2-78a7036ac8d1/1285e1f4-fb70-4f23-9278-71c352f249d1 dc25e385-332e-4feb-8ea2-78a7036ac8d1/e233a812-9de7-4a32-b962-142539e3a4b7 dc25e385-332e-4feb-8ea2-78a7036ac8d1/faa9869c-6f31-441c-a601-2dc11ac2aa17 dc25e385-332e-4feb-8ea2-78a7036ac8d1/d34d117f-46e0-4141-85b6-b092381e064e dc25e385-332e-4feb-8ea2-78a7036ac8d1/01c5b0d3-042c-4215-8e8f-184519c5fcb5 dc25e385-332e-4feb-8ea2-78a7036ac8d1/f9598b6d-a0c0-4f41-9b02-ba344443665c dc25e385-332e-4feb-8ea2-78a7036ac8d1/1e9f5266-4298-4aa1-9c3a-608b8ddebf12 dc25e385-332e-4feb-8ea2-78a7036ac8d1/a904a94d-9f7d-428c-860f-ac13f731d36d dc25e385-332e-4feb-8ea2-78a7036ac8d1/932d8557-5182-4718-8f75-93b87c432537 dc25e385-332e-4feb-8ea2-78a7036ac8d1/fc21d80c-4ad4-4177-8c18-2802d72bc8d0 dc25e385-332e-4feb-8ea2-78a7036ac8d1/1673589a-0b47-47ad-8501-f1f23fd4ea61 dc25e385-332e-4feb-8ea2-78a7036ac8d1/ccb03021-50bc-48d1-ad1c-398a6c500d5e dc25e385-332e-4feb-8ea2-78a7036ac8d1/43dabf02-7ad6-445e-863e-c02c75e422b4 dc25e385-332e-4feb-8ea2-78a7036ac8d1/6e95dc39-1888-42b9-902f-98ca999f21dd dc25e385-332e-4feb-8ea2-78a7036ac8d1/41e422fa-c765-46f5-9d73-9db73b109213 dc25e385-332e-4feb-8ea2-78a7036ac8d1/4d690bff-7e78-4a64-b773-a40e7ecc7eff dc25e385-332e-4feb-8ea2-78a7036ac8d1/7d578e19-4b9a-4f57-8d09-384a607199ab dc25e385-332e-4feb-8ea2-78a7036ac8d1/b1f66a72-3332-4759-9ef4-9bbdc2112e69 dc25e385-332e-4feb-8ea2-78a7036ac8d1/62fa7c35-5be0-4498-b24f-42fca2629754 
dc25e385-332e-4feb-8ea2-78a7036ac8d1/73f92179-a02e-4803-934d-8454c12e8695 dc25e385-332e-4feb-8ea2-78a7036ac8d1/33ee6c71-557b-4853-b873-b008afdbb69f dc25e385-332e-4feb-8ea2-78a7036ac8d1/8a186eba-1037-4438-9ee1-6b23f6c97a45 dc25e385-332e-4feb-8ea2-78a7036ac8d1/bafe8fc0-0098-4eb4-b801-81d801d7121c dc25e385-332e-4feb-8ea2-78a7036ac8d1/34e8045c-45a9-4089-bc0c-f7d752564c5e dc25e385-332e-4feb-8ea2-78a7036ac8d1/abda7a11-ca27-4694-8df6-77264f8eddf6 dc25e385-332e-4feb-8ea2-78a7036ac8d1/37df4990-acce-44a7-81ad-6ba340eef483 dc25e385-332e-4feb-8ea2-78a7036ac8d1/666295b9-36cd-4709-ba1d-2c2
4 0 42248 42247 0 -20 48404 27448 semtim S< ? 0:01 /sbin/lvm lvchange --config devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c5148400178|', 'r|.*|' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } --autobackup n --available n dc25e385-332e-4feb-8ea2-78a7036ac8d1/26b7b6ba-20e3-449d-bbbb-179dbdb0d214 dc25e385-332e-4feb-8ea2-78a7036ac8d1/87c69927-a9a4-4a8d-9701-0203d9f7bb95 dc25e385-332e-4feb-8ea2-78a7036ac8d1/429951f3-50fd-43f4-ba77-1ee4a0a6f54f dc25e385-332e-4feb-8ea2-78a7036ac8d1/e374a525-6aed-4f23-9716-2531fa81b401 dc25e385-332e-4feb-8ea2-78a7036ac8d1/2e772c45-3af9-4553-bb19-e4ca1e154bb2 dc25e385-332e-4feb-8ea2-78a7036ac8d1/3e92d7eb-e3f2-45d7-a37b-5389ba872f55 dc25e385-332e-4feb-8ea2-78a7036ac8d1/47e12eda-a416-4fc7-8641-24d6cdcb1cd8 dc25e385-332e-4feb-8ea2-78a7036ac8d1/ed22a4f8-1698-4828-b200-48da3916fdc6 dc25e385-332e-4feb-8ea2-78a7036ac8d1/ecc56472-4271-4633-9e86-6f0074befb2e dc25e385-332e-4feb-8ea2-78a7036ac8d1/0010acf7-294b-4767-a470-929fff5f492b dc25e385-332e-4feb-8ea2-78a7036ac8d1/85165b4d-5ae4-4b0a-b052-f2bbdf02c983 dc25e385-332e-4feb-8ea2-78a7036ac8d1/c8968cb1-e454-4b41-ad5d-400b40f8125d dc25e385-332e-4feb-8ea2-78a7036ac8d1/eec01f1b-aa7b-46ec-adcf-727967b09bbb dc25e385-332e-4feb-8ea2-78a7036ac8d1/d5c5e1ce-15e0-4b98-8937-1b17f6b57cc3 dc25e385-332e-4feb-8ea2-78a7036ac8d1/aae4e6b4-79a4-47d1-9ef9-d1dddc5f923b dc25e385-332e-4feb-8ea2-78a7036ac8d1/f3cd6cd6-9b85-4f66-9fc9-c0e96f10d6ff dc25e385-332e-4feb-8ea2-78a7036ac8d1/f09dd0ae-7e4e-45bb-baf1-552513459e19 dc25e385-332e-4feb-8ea2-78a7036ac8d1/cd36bf4f-2754-4471-94c9-d286eab7433f dc25e385-332e-4feb-8ea2-78a7036ac8d1/528258e7-a51f-4025-8437-a19832f1c34b dc25e385-332e-4feb-8ea2-78a7036ac8d1/27148909-c5ed-4159-87f6-a817fb38073e dc25e385-332e-4feb-8ea2-78a7036ac8d1/8ef0917b-80f7-428b-8ab3-4fff25424e58 dc25e385-332e-4feb-8ea2-78a7036ac8d1/445c5052-6e5e-4b7d-b49c-7637578075c4 dc25e385-332e-4feb-8ea2-78a7036ac8d1/1285e1f4-fb70-4f23-9278-71c352f249d1 dc25e385-332e-4feb-8ea2-78a7036ac8d1/e233a812-9de7-4a32-b962-142539e3a4b7 dc25e385-332e-4feb-8ea2-78a7036ac8d1/faa9869c-6f31-441c-a601-2dc11ac2aa17 dc25e385-332e-4feb-8ea2-78a7036ac8d1/d34d117f-46e0-4141-85b6-b092381e064e dc25e385-332e-4feb-8ea2-78a7036ac8d1/01c5b0d3-042c-4215-8e8f-184519c5fcb5 dc25e385-332e-4feb-8ea2-78a7036ac8d1/f9598b6d-a0c0-4f41-9b02-ba344443665c dc25e385-332e-4feb-8ea2-78a7036ac8d1/1e9f5266-4298-4aa1-9c3a-608b8ddebf12 dc25e385-332e-4feb-8ea2-78a7036ac8d1/a904a94d-9f7d-428c-860f-ac13f731d36d dc25e385-332e-4feb-8ea2-78a7036ac8d1/932d8557-5182-4718-8f75-93b87c432537 dc25e385-332e-4feb-8ea2-78a7036ac8d1/fc21d80c-4ad4-4177-8c18-2802d72bc8d0 dc25e385-332e-4feb-8ea2-78a7036ac8d1/1673589a-0b47-47ad-8501-f1f23fd4ea61 dc25e385-332e-4feb-8ea2-78a7036ac8d1/ccb03021-50bc-48d1-ad1c-398a6c500d5e dc25e385-332e-4feb-8ea2-78a7036ac8d1/43dabf02-7ad6-445e-863e-c02c75e422b4 dc25e385-332e-4feb-8ea2-78a7036ac8d1/6e95dc39-1888-42b9-902f-98ca999f21dd dc25e385-332e-4feb-8ea2-78a7036ac8d1/41e422fa-c765-46f5-9d73-9db73b109213 dc25e385-332e-4feb-8ea2-78a7036ac8d1/4d690bff-7e78-4a64-b773-a40e7ecc7eff dc25e385-332e-4feb-8ea2-78a7036ac8d1/7d578e19-4b9a-4f57-8d09-384a607199ab dc25e385-332e-4feb-8ea2-78a7036ac8d1/b1f66a72-3332-4759-9ef4-9bbdc2112e69 dc25e385-332e-4feb-8ea2-78a7036ac8d1/62fa7c35-5be0-4498-b24f-42fca2629754 
dc25e385-332e-4feb-8ea2-78a7036ac8d1/73f92179-a02e-4803-934d-8454c12e8695 dc25e385-332e-4feb-8ea2-78a7036ac8d1/33ee6c71-557b-4853-b873-b008afdbb69f dc25e385-332e-4feb-8ea2-78a7036ac8d1/8a186eba-1037-4438-9ee1-6b23f6c97a45 dc25e385-332e-4feb-8ea2-78a7036ac8d1/bafe8fc0-0098-4eb4-b801-81d801d7121c dc25e385-332e-4feb-8ea2-78a7036ac8d1/34e8045c-45a9-4089-bc0c-f7d752564c5e dc25e385-332e-4feb-8ea2-78a7036ac8d1/abda7a11-ca27-4694-8df6-77264f8eddf6 dc25e385-332e-4feb-8ea2-78a7036ac8d1/37df4990-acce-44a7-81ad-6ba340eef483 dc25e385-332e-4feb-8ea2-78a7036ac8d1/666295b9-36cd-4709-ba1d-2c278bfc2053 dc25e38
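The WCHAN column above (flock_, semtim) already tells us roughly where each command sleeps. On a live host (this cannot be done from the sosreport alone) one could confirm with something like the following; pid 42248 is just the example used below, and /proc/<pid>/stack is only available if the kernel exposes it:

$ pid=42248
$ cat /proc/$pid/wchan; echo                 # kernel function the task is sleeping in
$ cat /proc/$pid/stack                       # full kernel stack, if available
$ grep State /proc/$pid/status               # S = sleeping, D = uninterruptible (disk) sleep
$ ls -l /proc/$pid/fd | grep /var/lock/lvm   # which lvm lock files it has open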
If we match this with output from process/ps_auxwww:
$ awk '/lvm/ {print "pid="$2, "start="$9}' process/ps_auxwww
pid=9547 start=12:55
pid=9548 start=12:55
pid=10409 start=Feb24
pid=10410 start=Feb24
pid=10834 start=Feb24
pid=10839 start=Feb24
pid=12187 start=Feb24
pid=12188 start=Feb24
pid=15427 start=Feb24
pid=15428 start=Feb24
pid=19033 start=Feb24
pid=19039 start=Feb24
pid=30424 start=Feb24
pid=30425 start=Feb24
pid=37486 start=Feb24
pid=37487 start=Feb24
pid=42247 start=Feb19
pid=42248 start=Feb19
These processes have been stuck since Feb 19:
$ egrep 42248.+lvm process/ps_alxwww
4 0 42248 42247 0 -20 48404 27448 semtim S< ? 0:01 /sbin/lvm lvchange --config devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c5148400178|', 'r|.*|' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } --autobackup n --available n dc25e385-332e-4feb-8ea2-78a7036ac8d1/26b7b6ba-20e3-449d-bbbb-179dbdb0d214 dc25e385-332e-4feb-8ea2-78a7036ac8d1/87c69927-a9a4-4a8d-9701-0203d9f7bb95 dc25e385-332e-4feb-8ea2-78a7036ac8d1/429951f3-50fd-43f4-ba77-1ee4a0a6f54f dc25e385-332e-4feb-8ea2-78a7036ac8d1/e374a525-6aed-4f23-9716-2531fa81b401 dc25e385-332e-4feb-8ea2-78a7036ac8d1/2e772c45-3af9-4553-bb19-e4ca1e154bb2 dc25e385-332e-4feb-8ea2-78a7036ac8d1/3e92d7eb-e3f2-45d7-a37b-5389ba872f55 dc25e385-332e-4feb-8ea2-78a7036ac8d1/47e12eda-a416-4fc7-8641-24d6cdcb1cd8 dc25e385-332e-4feb-8ea2-78a7036ac8d1/ed22a4f8-1698-4828-b200-48da3916fdc6 dc25e385-332e-4feb-8ea2-78a7036ac8d1/ecc56472-4271-4633-9e86-6f0074befb2e dc25e385-332e-4feb-8ea2-78a7036ac8d1/0010acf7-294b-4767-a470-929fff5f492b dc25e385-332e-4feb-8ea2-78a7036ac8d1/85165b4d-5ae4-4b0a-b052-f2bbdf02c983 dc25e385-332e-4feb-8ea2-78a7036ac8d1/c8968cb1-e454-4b41-ad5d-400b40f8125d dc25e385-332e-4feb-8ea2-78a7036ac8d1/eec01f1b-aa7b-46ec-adcf-727967b09bbb dc25e385-332e-4feb-8ea2-78a7036ac8d1/d5c5e1ce-15e0-4b98-8937-1b17f6b57cc3 dc25e385-332e-4feb-8ea2-78a7036ac8d1/aae4e6b4-79a4-47d1-9ef9-d1dddc5f923b dc25e385-332e-4feb-8ea2-78a7036ac8d1/f3cd6cd6-9b85-4f66-9fc9-c0e96f10d6ff dc25e385-332e-4feb-8ea2-78a7036ac8d1/f09dd0ae-7e4e-45bb-baf1-552513459e19 dc25e385-332e-4feb-8ea2-78a7036ac8d1/cd36bf4f-2754-4471-94c9-d286eab7433f dc25e385-332e-4feb-8ea2-78a7036ac8d1/528258e7-a51f-4025-8437-a19832f1c34b dc25e385-332e-4feb-8ea2-78a7036ac8d1/27148909-c5ed-4159-87f6-a817fb38073e dc25e385-332e-4feb-8ea2-78a7036ac8d1/8ef0917b-80f7-428b-8ab3-4fff25424e58 dc25e385-332e-4feb-8ea2-78a7036ac8d1/445c5052-6e5e-4b7d-b49c-7637578075c4 dc25e385-332e-4feb-8ea2-78a7036ac8d1/1285e1f4-fb70-4f23-9278-71c352f249d1 dc25e385-332e-4feb-8ea2-78a7036ac8d1/e233a812-9de7-4a32-b962-142539e3a4b7 dc25e385-332e-4feb-8ea2-78a7036ac8d1/faa9869c-6f31-441c-a601-2dc11ac2aa17 dc25e385-332e-4feb-8ea2-78a7036ac8d1/d34d117f-46e0-4141-85b6-b092381e064e dc25e385-332e-4feb-8ea2-78a7036ac8d1/01c5b0d3-042c-4215-8e8f-184519c5fcb5 dc25e385-332e-4feb-8ea2-78a7036ac8d1/f9598b6d-a0c0-4f41-9b02-ba344443665c dc25e385-332e-4feb-8ea2-78a7036ac8d1/1e9f5266-4298-4aa1-9c3a-608b8ddebf12 dc25e385-332e-4feb-8ea2-78a7036ac8d1/a904a94d-9f7d-428c-860f-ac13f731d36d dc25e385-332e-4feb-8ea2-78a7036ac8d1/932d8557-5182-4718-8f75-93b87c432537 dc25e385-332e-4feb-8ea2-78a7036ac8d1/fc21d80c-4ad4-4177-8c18-2802d72bc8d0 dc25e385-332e-4feb-8ea2-78a7036ac8d1/1673589a-0b47-47ad-8501-f1f23fd4ea61 dc25e385-332e-4feb-8ea2-78a7036ac8d1/ccb03021-50bc-48d1-ad1c-398a6c500d5e dc25e385-332e-4feb-8ea2-78a7036ac8d1/43dabf02-7ad6-445e-863e-c02c75e422b4 dc25e385-332e-4feb-8ea2-78a7036ac8d1/6e95dc39-1888-42b9-902f-98ca999f21dd dc25e385-332e-4feb-8ea2-78a7036ac8d1/41e422fa-c765-46f5-9d73-9db73b109213 dc25e385-332e-4feb-8ea2-78a7036ac8d1/4d690bff-7e78-4a64-b773-a40e7ecc7eff dc25e385-332e-4feb-8ea2-78a7036ac8d1/7d578e19-4b9a-4f57-8d09-384a607199ab dc25e385-332e-4feb-8ea2-78a7036ac8d1/b1f66a72-3332-4759-9ef4-9bbdc2112e69 dc25e385-332e-4feb-8ea2-78a7036ac8d1/62fa7c35-5be0-4498-b24f-42fca2629754 
dc25e385-332e-4feb-8ea2-78a7036ac8d1/73f92179-a02e-4803-934d-8454c12e8695 dc25e385-332e-4feb-8ea2-78a7036ac8d1/33ee6c71-557b-4853-b873-b008afdbb69f dc25e385-332e-4feb-8ea2-78a7036ac8d1/8a186eba-1037-4438-9ee1-6b23f6c97a45 dc25e385-332e-4feb-8ea2-78a7036ac8d1/bafe8fc0-0098-4eb4-b801-81d801d7121c dc25e385-332e-4feb-8ea2-78a7036ac8d1/34e8045c-45a9-4089-bc0c-f7d752564c5e dc25e385-332e-4feb-8ea2-78a7036ac8d1/abda7a11-ca27-4694-8df6-77264f8eddf6 dc25e385-332e-4feb-8ea2-78a7036ac8d1/37df4990-acce-44a7-81ad-6ba340eef483 dc25e385-332e-4feb-8ea2-78a7036ac8d1/666295b9-36cd-4709-ba1d-2c278bfc2053 dc25e38
$ egrep 42247.+lvm process/ps_alxwww
4 0 42248 42247 0 -20 48404 27448 semtim S< ? 0:01 /sbin/lvm lvchange --config devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c5148400178|', 'r|.*|' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } --autobackup n --available n dc25e385-332e-4feb-8ea2-78a7036ac8d1/26b7b6ba-20e3-449d-bbbb-179dbdb0d214 dc25e385-332e-4feb-8ea2-78a7036ac8d1/87c69927-a9a4-4a8d-9701-0203d9f7bb95 dc25e385-332e-4feb-8ea2-78a7036ac8d1/429951f3-50fd-43f4-ba77-1ee4a0a6f54f dc25e385-332e-4feb-8ea2-78a7036ac8d1/e374a525-6aed-4f23-9716-2531fa81b401 dc25e385-332e-4feb-8ea2-78a7036ac8d1/2e772c45-3af9-4553-bb19-e4ca1e154bb2 dc25e385-332e-4feb-8ea2-78a7036ac8d1/3e92d7eb-e3f2-45d7-a37b-5389ba872f55 dc25e385-332e-4feb-8ea2-78a7036ac8d1/47e12eda-a416-4fc7-8641-24d6cdcb1cd8 dc25e385-332e-4feb-8ea2-78a7036ac8d1/ed22a4f8-1698-4828-b200-48da3916fdc6 dc25e385-332e-4feb-8ea2-78a7036ac8d1/ecc56472-4271-4633-9e86-6f0074befb2e dc25e385-332e-4feb-8ea2-78a7036ac8d1/0010acf7-294b-4767-a470-929fff5f492b dc25e385-332e-4feb-8ea2-78a7036ac8d1/85165b4d-5ae4-4b0a-b052-f2bbdf02c983 dc25e385-332e-4feb-8ea2-78a7036ac8d1/c8968cb1-e454-4b41-ad5d-400b40f8125d dc25e385-332e-4feb-8ea2-78a7036ac8d1/eec01f1b-aa7b-46ec-adcf-727967b09bbb dc25e385-332e-4feb-8ea2-78a7036ac8d1/d5c5e1ce-15e0-4b98-8937-1b17f6b57cc3 dc25e385-332e-4feb-8ea2-78a7036ac8d1/aae4e6b4-79a4-47d1-9ef9-d1dddc5f923b dc25e385-332e-4feb-8ea2-78a7036ac8d1/f3cd6cd6-9b85-4f66-9fc9-c0e96f10d6ff dc25e385-332e-4feb-8ea2-78a7036ac8d1/f09dd0ae-7e4e-45bb-baf1-552513459e19 dc25e385-332e-4feb-8ea2-78a7036ac8d1/cd36bf4f-2754-4471-94c9-d286eab7433f dc25e385-332e-4feb-8ea2-78a7036ac8d1/528258e7-a51f-4025-8437-a19832f1c34b dc25e385-332e-4feb-8ea2-78a7036ac8d1/27148909-c5ed-4159-87f6-a817fb38073e dc25e385-332e-4feb-8ea2-78a7036ac8d1/8ef0917b-80f7-428b-8ab3-4fff25424e58 dc25e385-332e-4feb-8ea2-78a7036ac8d1/445c5052-6e5e-4b7d-b49c-7637578075c4 dc25e385-332e-4feb-8ea2-78a7036ac8d1/1285e1f4-fb70-4f23-9278-71c352f249d1 dc25e385-332e-4feb-8ea2-78a7036ac8d1/e233a812-9de7-4a32-b962-142539e3a4b7 dc25e385-332e-4feb-8ea2-78a7036ac8d1/faa9869c-6f31-441c-a601-2dc11ac2aa17 dc25e385-332e-4feb-8ea2-78a7036ac8d1/d34d117f-46e0-4141-85b6-b092381e064e dc25e385-332e-4feb-8ea2-78a7036ac8d1/01c5b0d3-042c-4215-8e8f-184519c5fcb5 dc25e385-332e-4feb-8ea2-78a7036ac8d1/f9598b6d-a0c0-4f41-9b02-ba344443665c dc25e385-332e-4feb-8ea2-78a7036ac8d1/1e9f5266-4298-4aa1-9c3a-608b8ddebf12 dc25e385-332e-4feb-8ea2-78a7036ac8d1/a904a94d-9f7d-428c-860f-ac13f731d36d dc25e385-332e-4feb-8ea2-78a7036ac8d1/932d8557-5182-4718-8f75-93b87c432537 dc25e385-332e-4feb-8ea2-78a7036ac8d1/fc21d80c-4ad4-4177-8c18-2802d72bc8d0 dc25e385-332e-4feb-8ea2-78a7036ac8d1/1673589a-0b47-47ad-8501-f1f23fd4ea61 dc25e385-332e-4feb-8ea2-78a7036ac8d1/ccb03021-50bc-48d1-ad1c-398a6c500d5e dc25e385-332e-4feb-8ea2-78a7036ac8d1/43dabf02-7ad6-445e-863e-c02c75e422b4 dc25e385-332e-4feb-8ea2-78a7036ac8d1/6e95dc39-1888-42b9-902f-98ca999f21dd dc25e385-332e-4feb-8ea2-78a7036ac8d1/41e422fa-c765-46f5-9d73-9db73b109213 dc25e385-332e-4feb-8ea2-78a7036ac8d1/4d690bff-7e78-4a64-b773-a40e7ecc7eff dc25e385-332e-4feb-8ea2-78a7036ac8d1/7d578e19-4b9a-4f57-8d09-384a607199ab dc25e385-332e-4feb-8ea2-78a7036ac8d1/b1f66a72-3332-4759-9ef4-9bbdc2112e69 dc25e385-332e-4feb-8ea2-78a7036ac8d1/62fa7c35-5be0-4498-b24f-42fca2629754 
dc25e385-332e-4feb-8ea2-78a7036ac8d1/73f92179-a02e-4803-934d-8454c12e8695 dc25e385-332e-4feb-8ea2-78a7036ac8d1/33ee6c71-557b-4853-b873-b008afdbb69f dc25e385-332e-4feb-8ea2-78a7036ac8d1/8a186eba-1037-4438-9ee1-6b23f6c97a45 dc25e385-332e-4feb-8ea2-78a7036ac8d1/bafe8fc0-0098-4eb4-b801-81d801d7121c dc25e385-332e-4feb-8ea2-78a7036ac8d1/34e8045c-45a9-4089-bc0c-f7d752564c5e dc25e385-332e-4feb-8ea2-78a7036ac8d1/abda7a11-ca27-4694-8df6-77264f8eddf6 dc25e385-332e-4feb-8ea2-78a7036ac8d1/37df4990-acce-44a7-81ad-6ba340eef483 dc25e385-332e-4feb-8ea2-78a7036ac8d1/666295b9-36cd-4709-ba1d-2c278bfc2053 dc25e38
These commands have been stuck since Feb 24:
$ egrep 10409.+lvm process/ps_alxwww
4 0 10409 1 0 -20 72072 2512 poll_s S< ? 0:00 /usr/bin/sudo -n /sbin/lvm lvextend --config devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c5148400178|', 'r|.*|' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } --autobackup n --size 12288m dc25e385-332e-4feb-8ea2-78a7036ac8d1/62fa7c35-5be0-4498-b24f-42fca2629754
$ egrep 10410.+lvm process/ps_alxwww
4 0 10410 10409 0 -20 39224 16236 flock_ S< ? 0:00 /sbin/lvm lvextend --config devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c5148400178|', 'r|.*|' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } --autobackup n --size 12288m dc25e385-332e-4feb-8ea2-78a7036ac8d1/62fa7c35-5be0-4498-b24f-42fca2629754
$ egrep 10834.+lvm process/ps_alxwww
4 0 10834 1 0 -20 72072 2516 poll_s S< ? 0:00 /usr/bin/sudo -n /sbin/lvm pvs --config devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c514840001b|/dev/mapper/3514f0c514840001c|/dev/mapper/3514f0c514840001d|/dev/mapper/3514f0c514840001e|/dev/mapper/3514f0c5148400023|/dev/mapper/3514f0c5148400043|/dev/mapper/3514f0c51484000df|/dev/mapper/3514f0c51484000e0|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c51484000e3|/dev/mapper/3514f0c51484000e4|/dev/mapper/3514f0c51484000e5|/dev/mapper/3514f0c51484000e6|/dev/mapper/3514f0c5148400177|/dev/mapper/3514f0c5148400178|/dev/mapper/3514f0c5148400179|/dev/mapper/3514f0c514840017a|/dev/mapper/3514f0c51484001a0|/dev/mapper/3514f0c51484001a1|/dev/mapper/3514f0c51484001a2|/dev/mapper/3514f0c51484001a3|/dev/mapper/3514f0c51484001a4|/dev/mapper/3514f0c51484001a5|/dev/mapper/3514f0c51484001a6|/dev/mapper/3514f0c51484001a7|/dev/mapper/3514f0c51484001a8|/dev/mapper/3514f0c51484001a9|/dev/mapper/3514f0c51484001aa|/dev/mapper/3514f0c51484001bb|/dev/mapper/3514f0c51484001bc|/dev/mapper/3514f0c51484001bd|/dev/mapper/3514f0c51484001be|/dev/mapper/3514f0c51484001bf|/dev/mapper/3514f0c51484001c0|/dev/mapper/3514f0c51484001c1|/dev/mapper/3514f0c51484001c2|/dev/mapper/3514f0c51484001c3|/dev/mapper/3514f0c51484001c4|/dev/mapper/3514f0c51484001c5|', 'r|.*|' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } --noheadings --units b --nosuffix --separator | --ignoreskippedcluster -o uuid,name,size,vg_name,vg_uuid,pe_start,pe_count,pe_alloc_count,mda_count,dev_size /dev/mapper/3514f0c51484000e2
$ egrep 10839.+lvm process/ps_alxwww
4 0 10839 10834 0 -20 44756 22052 flock_ S< ? 0:00 /sbin/lvm pvs --config devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c514840001b|/dev/mapper/3514f0c514840001c|/dev/mapper/3514f0c514840001d|/dev/mapper/3514f0c514840001e|/dev/mapper/3514f0c5148400023|/dev/mapper/3514f0c5148400043|/dev/mapper/3514f0c51484000df|/dev/mapper/3514f0c51484000e0|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c51484000e3|/dev/mapper/3514f0c51484000e4|/dev/mapper/3514f0c51484000e5|/dev/mapper/3514f0c51484000e6|/dev/mapper/3514f0c5148400177|/dev/mapper/3514f0c5148400178|/dev/mapper/3514f0c5148400179|/dev/mapper/3514f0c514840017a|/dev/mapper/3514f0c51484001a0|/dev/mapper/3514f0c51484001a1|/dev/mapper/3514f0c51484001a2|/dev/mapper/3514f0c51484001a3|/dev/mapper/3514f0c51484001a4|/dev/mapper/3514f0c51484001a5|/dev/mapper/3514f0c51484001a6|/dev/mapper/3514f0c51484001a7|/dev/mapper/3514f0c51484001a8|/dev/mapper/3514f0c51484001a9|/dev/mapper/3514f0c51484001aa|/dev/mapper/3514f0c51484001bb|/dev/mapper/3514f0c51484001bc|/dev/mapper/3514f0c51484001bd|/dev/mapper/3514f0c51484001be|/dev/mapper/3514f0c51484001bf|/dev/mapper/3514f0c51484001c0|/dev/mapper/3514f0c51484001c1|/dev/mapper/3514f0c51484001c2|/dev/mapper/3514f0c51484001c3|/dev/mapper/3514f0c51484001c4|/dev/mapper/3514f0c51484001c5|', 'r|.*|' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } --noheadings --units b --nosuffix --separator | --ignoreskippedcluster -o uuid,name,size,vg_name,vg_uuid,pe_start,pe_count,pe_alloc_count,mda_count,dev_size /dev/mapper/3514f0c51484000e2
$ egrep 12187.+lvm process/ps_alxwww
0 0 12187 1 20 0 4124 672 wait S ? 0:00 timeout 300s /sbin/lvm vgs -v -o +tags
4 0 12188 12187 20 0 40964 18232 flock_ S ? 0:00 /sbin/lvm vgs -v -o +tags
This was run by support, trying to understand why lvm commands have been hanging since Feb 19.
$ egrep 15427.+lvm process/ps_alxwww
0 0 15427 1 20 0 4124 672 wait S ? 0:00 timeout 300s /sbin/lvm vgs -v -o +tags
4 0 15428 15427 20 0 40964 18232 flock_ S ? 0:00 /sbin/lvm vgs -v -o +tags
This was run by support, trying to understand why lvm commands have been hanging since Feb 19.
$ egrep 19033.+lvm process/ps_alxwww
4 0 19033 1 20 0 10724 1480 wait Ss ? 0:00 bash -c ?VERSION=`/bin/rpm -q --qf '[%{VERSION}]' sos | /bin/sed 's/\.//'`;?if [ $VERSION -ge 32 ]; then? /usr/sbin/sosreport --batch --all-logs -o logs,libvirt,vdsm,general,networking,hardware,process,yum,filesys,devicemapper,selinux,kernel,memory,rpm,processor,pci,md,block,scsi,multipath,systemd,sanlock,lvm2,ovirt_hosted_engine?elif [ $VERSION -ge 30 ]; then? /usr/sbin/sosreport --batch -k logs.all_logs=True -o logs,libvirt,vdsm,general,networking,hardware,process,yum,filesys,devicemapper,selinux,kernel,memory,rpm,processor,pci,md,block,scsi,multipath,systemd,sanlock,lvm2?elif [ $VERSION -ge 22 ]; then? /usr/sbin/sosreport --batch -k general.all_logs=True -o libvirt,vdsm,general,networking,hardware,process,yum,filesys,devicemapper,selinux,kernel,memory,rpm?elif [ $VERSION -ge 17 ]; then? /usr/sbin/sosreport --no-progressbar -k general.all_logs=True -o vdsm,general,networking,hardware,process,yum,filesys?else? /bin/echo No valid version of sosreport found. 1>&2? exit 1?fi?
4 0 19039 19033 20 0 245416 32748 pipe_w S ? 0:13 /usr/bin/python /usr/sbin/sosreport --batch --all-logs -o logs,libvirt,vdsm,general,networking,hardware,process,yum,filesys,devicemapper,selinux,kernel,memory,rpm,processor,pci,md,block,scsi,multipath,systemd,sanlock,lvm2,ovirt_hosted_engine
This is the sosreport command, stuck since Feb 24.
$ egrep 30424.+lvm process/ps_alxwww
0 0 30424 19039 20 0 4124 672 wait S ? 0:00 timeout 300s /sbin/lvm vgs -v -o +tags
4 0 30425 30424 20 0 40964 18300 flock_ S ? 0:00 /sbin/lvm vgs -v -o +tags
More commands run by support...
$ egrep 37486.+lvm process/ps_alxwww
0 0 37486 1 20 0 4124 672 wait S ? 0:00 timeout 300s /sbin/lvm vgs -v -o +tags
4 0 37487 37486 20 0 40964 18232 flock_ S ? 0:00 /sbin/lvm vgs -v -o +tags
So what I see here is:
- 2 lvm commands stuck since Feb 19
- more lvm commands stuck since Feb 24
Smells like deadlock in lvm to me.
David, do we need more data to assign this bug to LVM?
Greg, this host was booted on Feb 19.
We see two stuck commands since Feb 19 - these commands are run during vdsm startup
to deactivate logical volumes.
We see that the host was active on Feb 24 - it even became the SPM.
Now, if vdsm startup was stuck on Feb 19, how did the host become the SPM on Feb 24?
This means vdsm was restarted.
$ grep 'I am' vdsm.log*
vdsm.log:MainThread::INFO::2017-02-25 13:05:01,799::vdsm::132::vds::(run) (PID: 33208) I am the actual vdsm 4.16.32-1.el6ev gdcphx0027.statestr.com (2.6.32-573.12.1.el6.x86_64)
vdsm.log.1:MainThread::INFO::2017-02-25 12:55:04,061::vdsm::132::vds::(run) (PID: 9276) I am the actual vdsm 4.16.32-1.el6ev gdcphx0027.statestr.com (2.6.32-573.12.1.el6.x86_64)
vdsm.log.1:MainThread::INFO::2017-02-25 12:59:19,572::vdsm::132::vds::(run) (PID: 14184) I am the actual vdsm 4.16.32-1.el6ev gdcphx0027.statestr.com (2.6.32-573.12.1.el6.x86_64)
vdsm.log.86:MainThread::INFO::2017-02-19 15:04:10,928::vdsm::132::vds::(run) (PID: 45457) I am the actual vdsm 4.16.32-1.el6ev gdcphx0027.statestr.com (2.6.32-573.12.1.el6.x86_64)
vdsm.log.87:MainThread::INFO::2017-02-19 14:58:10,688::vdsm::132::vds::(run) (PID: 40842) I am the actual vdsm 4.16.32-1.el6ev gdcphx0027.statestr.com (2.6.32-573.12.1.el6.x86_64)
And we indeed see that vdsm was restarted on Feb 19 twice.
In sanlock.log we see that sanlock was started on:
2017-02-19 14:57:42+0000 936 [35696]: sanlock daemon started 2.8 host 4567d320-02bd-469a-90e1-62ca27c20548.gdcphx0027
sanlock usually starts during boot, so I assume the machine was rebooted on Feb 19
at 14:57.
In vdsm.log.86 we see the lvchange command:
storageRefresh::DEBUG::2017-02-19 15:00:33,594::lvm::291::Storage.Misc.excCmd::(cmd) /usr/bin/sudo -n /sbin/lvm lvchange --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ '\''a|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c5148400178|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --autobackup n --available n dc25e385-332e-4feb-8ea2-78a7036ac8d1/26b7b6ba-20e3-449d-bbbb-179dbdb0d214 dc25e385-332e-4feb-8ea2-78a7036ac8d1/87c69927-a9a4-4a8d-9701-0203d9f7bb95 dc25e385-332e-4feb-8ea2-78a7036ac8d1/429951f3-50fd-43f4-ba77-1ee4a0a6f54f dc25e385-332e-4feb-8ea2-78a7036ac8d1/e374a525-6aed-4f23-9716-2531fa81b401 dc25e385-332e-4feb-8ea2-78a7036ac8d1/2e772c45-3af9-4553-bb19-e4ca1e154bb2 dc25e385-332e-4feb-8ea2-78a7036ac8d1/3e92d7eb-e3f2-45d7-a37b-5389ba872f55 dc25e385-332e-4feb-8ea2-78a7036ac8d1/47e12eda-a416-4fc7-8641-24d6cdcb1cd8 dc25e385-332e-4feb-8ea2-78a7036ac8d1/ed22a4f8-1698-4828-b200-48da3916fdc6 dc25e385-332e-4feb-8ea2-78a7036ac8d1/ecc56472-4271-4633-9e86-6f0074befb2e dc25e385-332e-4feb-8ea2-78a7036ac8d1/0010acf7-294b-4767-a470-929fff5f492b dc25e385-332e-4feb-8ea2-78a7036ac8d1/85165b4d-5ae4-4b0a-b052-f2bbdf02c983 dc25e385-332e-4feb-8ea2-78a7036ac8d1/c8968cb1-e454-4b41-ad5d-400b40f8125d dc25e385-332e-4feb-8ea2-78a7036ac8d1/eec01f1b-aa7b-46ec-adcf-727967b09bbb dc25e385-332e-4feb-8ea2-78a7036ac8d1/d5c5e1ce-15e0-4b98-8937-1b17f6b57cc3 dc25e385-332e-4feb-8ea2-78a7036ac8d1/aae4e6b4-79a4-47d1-9ef9-d1dddc5f923b dc25e385-332e-4feb-8ea2-78a7036ac8d1/f3cd6cd6-9b85-4f66-9fc9-c0e96f10d6ff dc25e385-332e-4feb-8ea2-78a7036ac8d1/f09dd0ae-7e4e-45bb-baf1-552513459e19 dc25e385-332e-4feb-8ea2-78a7036ac8d1/cd36bf4f-2754-4471-94c9-d286eab7433f dc25e385-332e-4feb-8ea2-78a7036ac8d1/528258e7-a51f-4025-8437-a19832f1c34b dc25e385-332e-4feb-8ea2-78a7036ac8d1/27148909-c5ed-4159-87f6-a817fb38073e dc25e385-332e-4feb-8ea2-78a7036ac8d1/8ef0917b-80f7-428b-8ab3-4fff25424e58 dc25e385-332e-4feb-8ea2-78a7036ac8d1/445c5052-6e5e-4b7d-b49c-7637578075c4 dc25e385-332e-4feb-8ea2-78a7036ac8d1/1285e1f4-fb70-4f23-9278-71c352f249d1 dc25e385-332e-4feb-8ea2-78a7036ac8d1/e233a812-9de7-4a32-b962-142539e3a4b7 dc25e385-332e-4feb-8ea2-78a7036ac8d1/faa9869c-6f31-441c-a601-2dc11ac2aa17 dc25e385-332e-4feb-8ea2-78a7036ac8d1/d34d117f-46e0-4141-85b6-b092381e064e dc25e385-332e-4feb-8ea2-78a7036ac8d1/01c5b0d3-042c-4215-8e8f-184519c5fcb5 dc25e385-332e-4feb-8ea2-78a7036ac8d1/f9598b6d-a0c0-4f41-9b02-ba344443665c dc25e385-332e-4feb-8ea2-78a7036ac8d1/1e9f5266-4298-4aa1-9c3a-608b8ddebf12 dc25e385-332e-4feb-8ea2-78a7036ac8d1/a904a94d-9f7d-428c-860f-ac13f731d36d dc25e385-332e-4feb-8ea2-78a7036ac8d1/932d8557-5182-4718-8f75-93b87c432537 dc25e385-332e-4feb-8ea2-78a7036ac8d1/fc21d80c-4ad4-4177-8c18-2802d72bc8d0 dc25e385-332e-4feb-8ea2-78a7036ac8d1/1673589a-0b47-47ad-8501-f1f23fd4ea61 dc25e385-332e-4feb-8ea2-78a7036ac8d1/ccb03021-50bc-48d1-ad1c-398a6c500d5e dc25e385-332e-4feb-8ea2-78a7036ac8d1/43dabf02-7ad6-445e-863e-c02c75e422b4 dc25e385-332e-4feb-8ea2-78a7036ac8d1/6e95dc39-1888-42b9-902f-98ca999f21dd dc25e385-332e-4feb-8ea2-78a7036ac8d1/41e422fa-c765-46f5-9d73-9db73b109213 dc25e385-332e-4feb-8ea2-78a7036ac8d1/4d690bff-7e78-4a64-b773-a40e7ecc7eff dc25e385-332e-4feb-8ea2-78a7036ac8d1/7d578e19-4b9a-4f57-8d09-384a607199ab dc25e385-332e-4feb-8ea2-78a7036ac8d1/b1f66a72-3332-4759-9ef4-9bbdc2112e69 
dc25e385-332e-4feb-8ea2-78a7036ac8d1/62fa7c35-5be0-4498-b24f-42fca2629754 dc25e385-332e-4feb-8ea2-78a7036ac8d1/73f92179-a02e-4803-934d-8454c12e8695 dc25e385-332e-4feb-8ea2-78a7036ac8d1/33ee6c71-557b-4853-b873-b008afdbb69f dc25e385-332e-4feb-8ea2-78a7036ac8d1/8a186eba-1037-4438-9ee1-6b23f6c97a45 dc25e385-332e-4feb-8ea2-78a7036ac8d1/bafe8fc0-0098-4eb4-b801-81d801d7121c dc25e385-332e-4feb-8ea2-78a7036ac8d1/34e8045c-45a9-4089-bc0c-f7d752564c5e dc25e385-332e-4feb-8ea2-78a7036ac8d1/abda7a11-ca27-4694-8df6-77264f8eddf6 dc25e385-332e-4feb-8ea2-78a7036ac8d1/37df4990-acce-44a7-81ad-6ba340eef483 dc25e385-332e-4feb-8ea2-78a7036ac8d1/666295b9-36cd-4709-ba1d-2c278bfc2053 dc25e385-332e-4feb-8ea2-78a7036ac8d1/2e38684c-3d70-4027-b10b-3b35d89b76ad dc25e385-332e-4feb-8ea2-78a7036ac8d1/4326c752-32d6-42ac-817f-2ddceb4137a5 dc25e385-332e-4feb-8ea2-78a7036ac8d1/51472f0e-54ee-4e3d-8419-3a17e0a32fcc dc25e385-332e-4feb-8ea2-78a7036ac8d1/648e0f35-820c-469e-900b-5ee8adcf32d8 dc25e385-332e-4feb-8ea2-78a7036ac8d1/d553debb-dfa0-4739-bddb-7abac9122bc2 dc25e385-332e-4feb-8ea2-78a7036ac8d1/58977c32-a861-4615-87b5-eebab5fb1c1b dc25e385-332e-4feb-8ea2-78a7036ac8d1/5a8bdf22-4590-4d0f-9736-f32e188ac898 dc25e385-332e-4feb-8ea2-78a7036ac8d1/381942ce-41af-42fe-a87b-9c7d785da54d dc25e385-332e-4feb-8ea2-78a7036ac8d1/0b4a02e8-bc93-492c-845f-02b6bda559eb dc25e385-332e-4feb-8ea2-78a7036ac8d1/0cd2cdba-b2d4-4e3a-8661-015afcbfa214 dc25e385-332e-4feb-8ea2-78a7036ac8d1/32374523-8ab9-4075-9375-68c537f6fa28 dc25e385-332e-4feb-8ea2-78a7036ac8d1/67cc7f87-5f71-4969-bfe4-65b3d93541b6 dc25e385-332e-4feb-8ea2-78a7036ac8d1/c640fd9e-3d32-49b0-9228-b6cc0fd4da53 dc25e385-332e-4feb-8ea2-78a7036ac8d1/b18964cd-d7c6-48cd-8beb-c41142694f5e dc25e385-332e-4feb-8ea2-78a7036ac8d1/4d5c054b-4b3a-406a-b772-37deb5ff515b dc25e385-332e-4feb-8ea2-78a7036ac8d1/a37bf01e-1c95-474c-ad4a-523c8b14dbe8 dc25e385-332e-4feb-8ea2-78a7036ac8d1/d2cc2f42-e24a-4aa2-8c0c-6b30d0832fe7 (cwd None)
This seems to be the same stuck command we see in the previous comment.
In the next log, vdsm.log.86, we see that vdsm was restarted:
MainThread::DEBUG::2017-02-19 15:03:38,306::vdsm::59::vds::(sigtermHandler) Received signal 15
MainThread::INFO::2017-02-19 15:04:10,928::vdsm::132::vds::(run) (PID: 45457) I am the actual vdsm 4.16.32-1.el6ev gdcphx0027.statestr.com (2.6.32-573.12.1.el6.x86_64)
This time vdsm startup succeeds:
storageRefresh::DEBUG::2017-02-19 15:10:34,085::hsm::410::Storage.HSM::(storageRefresh) HSM is ready
But the stuck lvm commands are probably still running.
Greg, is it possible that on Feb 19 an operator deleted the lock files
held by the stuck lvchange commands and restarted vdsm? I know this bad practice
has been used at this site.
On RHEL 7, systemd will kill any child command of vdsm using cgroups. On RHEL 6,
terminating vdsm may leave vdsm child commands running, since the init script
gives vdsm only 10 seconds to terminate and then kills it. If vdsm could not
terminate all of its child processes within those 10 seconds, some may still be
running after the vdsm service was stopped.
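A minimal sketch of the RHEL 6 style stop sequence described above (illustrative only, not the actual vdsmd init script; the service name, pidof lookup, and timeout are assumptions), showing why a stuck lvm child survives:

#!/bin/sh
# stop(): ask vdsm to exit, wait up to ~10 seconds, then kill it
stop() {
    pid=$(pidof -s vdsm)                     # assumed way to find the daemon
    kill -TERM "$pid"
    for i in $(seq 1 10); do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
    done
    kill -KILL "$pid"                        # vdsm dies, but its stuck lvm children
                                             # are re-parented to init and keep running
}

On RHEL 7 the systemd default (KillMode=control-group) signals every process in the unit's cgroup, so the child lvm commands would be killed along with vdsm.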
Remember the symptoms we saw at the time. SPM hosts suddenly went non-operational for no apparent reason and then would not activate when we rebooted them. And when we failed over to a different SPM host, *that* host also went non-operational. We had datacenters that could not come back alive, 2000 users, and lots of high-level managers screaming. So, yeah, we improvised.

As I recall, that host booted on 2/12. It left 3 files in /var/lock/lvm dated 2/12 and I'm looking for the case where I preserved that. We tried restarting VDSM and the restart had a problem - then we got rid of those files in /var/lock/lvm and VDSM successfully started after that. I *think* that was the sequence of events. I'm digging through a bunch of cases to find where we documented it.

This is a challenge to put back together because we had multiple hosts failing in multiple ways at two different sites, and multiple cases opening and closing.

- Greg

(In reply to Greg Scott from comment #5)
> Remember the symptoms we saw at the time. SPM hosts suddenly went
> non-operational for no apparent reason and then would not activate when we
> rebooted them.

This started with a bad firmware upgrade on the storage server, then hosts were rebooted on Feb 19 - the sanlock log is full of errors on that date, and we see a reboot after that.

> And when we failed over to a different SPM host, *that* host
> also went non-operational. We had datacenters that could not come back
> alive and 2000 users and lots of high level managers screaming. So, yeah,
> we improvised.

But since we have the problem of lvm commands getting stuck during boot, and they were not terminated when restarting vdsm, we may have a time bomb on these machines. This means that the issue we saw on the SPM on this host may be related to the lvm commands stuck for 5 days on this host.

> As I recall, that host booted on 2/12. It left 3 files in /var/lock/lvm
> dated 2/12 and I'm looking for the case where I preserved that. We tried
> restarting VDSM and the restart had a problem - then we got rid of those
> files in /var/lock/lvm and VDSM successfully started after that. I *think*
> that was the sequence of events. I'm digging through a bunch of cases to
> find where we documented it.
>
> This is a challenge to put back together because we had multiple hosts
> failing in multiple ways at two different sites, and multiple cases opening
> and closing.

I think we need to make sure that no hosts are running stuck lvm commands. Can you ask the operators to get the output of:

ps auxf

on all hosts? Let's check if we have stuck lvm commands running silently, and since when.

Sorry for continuing to repeat this, but it seems to me that all the lvm issues can likely be traced back to the well known filter bug (causing all lvm commands to read all devices). If we can trace the hanging LVM problem back to that filter bug and prove it's fixed with RHVH-7, you'll hear me yell for joy from half-way around the world. How do we prove it?

Nir - I left a request for the customer to get the data you asked for. I'll leave the needinfo turned on until we get the data back.

- Greg

And I was assuming that fixing the filter issue will not only avoid all of the unwanted scanning, but also avoid unwanted LV activations. Activating a huge number of unwanted LVs on the host would also be a contributing factor and needs to be fixed.

So how do we test that assumption? Let's say I do an LVM operation with devices that should have been filtered out. Does it make sense for that LVM operation to hang forever? Or should it return an error? I vote for choice #2. If that's our problem, which we don't know for sure yet.

(In reply to David Teigland from comment #9)
> And I was assuming that fixing the filter issue will not only avoid all of
> the unwanted scanning, but also avoid unwanted LV activations. Activating a
> huge number of unwanted LVs on the host would also be a contributing factor
> and needs to be fixed.

David, how will fixing the filter bug (bug 1428637) prevent unwanted activation? Or do you mean setting up a proper lvm filter on the host, allowing access only to devices needed by the host?

Right, I mean that both issues (the filter bug and using a filter to prevent unwanted LVs from being activated) probably contributed to the problems and both should be fixed.

Created attachment 1260964 [details]
ps auxf from Quincy; but missing key details.
I'm attaching the output from ps auxf from every host in the Quincy datacenter. But we're missing task details because their terminal emulation session scrolled off to the right instead of wrapping. I asked them to run it again, this time doing:
ps auxf | grep lvm
ls -al /var/lock/lvm
- Greg
Thanks, Greg, there is a lot of good data there!
Please do not filter this through grep; it makes it harder to see the big
picture and to detect orphaned lvm processes.
We just need to redirect the output to a file and copy the file, instead of
copying the console output.
Here is a summary of stuck processes from attachment 1260964 [details]:
# jabphx0005
root 3498 0.0 0.0 72080 2620 ? S< Feb18 0:00 /usr/bin/sudo -n /sbin/lvm lvchange --config devices { p
root 3499 0.0 0.0 50132 29176 ? S< Feb18 0:02 \_ /sbin/lvm lvchange --config devices { preferred_name
# jabphx0008
root 19944 0.0 0.0 72076 2608 ? S< Feb12 0:00 /usr/bin/sudo -n /sbin/lvm lvchange --config devices { p
root 19945 0.0 0.0 49300 28344 ? S< Feb12 0:02 \_ /sbin/lvm lvchange --config devices { preferred_name
# jabphx0018
root 4646 0.0 0.0 72080 2624 ? S< Feb26 0:00 /usr/bin/sudo -n /sbin/lvm lvchange --config devices { p
root 4647 0.0 0.0 51424 30468 ? S< Feb26 0:02 \_ /sbin/lvm lvchange --config devices { preferred_name
# jabphx0022
root 33532 0.0 0.0 72080 2620 ? S< Feb12 0:00 /usr/bin/sudo -n /sbin/lvm lvchange --config devices { p
root 33533 0.0 0.0 49368 28412 ? S< Feb12 0:01 \_ /sbin/lvm lvchange --config devices { preferred_name
So we have processes stuck for almost one month.
David, do you need more info?
If I were David, I'd want to know what UUIDs they're stuck on.

Now that we have it down to 5 nodes, why don't I have them do a ps auxf to a file on those and ls -al /var/lock/lvm. Ya know - as long as I'm asking for data, why don't I get sosreports too?

- Greg

Here is what I asked for in the support case:

*****************************
For hosts 0005, 0008, 0018, and 0022 would you do this for us?

ps auxf > /tmp/$(hostname)-auxf.txt
ls -al /var/lock/lvm > /tmp/$(hostname)-ls.txt

Also do sosreports from those hosts. Post the sosreports and outputs from the commands above to the case. I know this is a hassle, but we're looking for stuck LVM processes and this data will help isolate them and figure out why they're stuck.

One more - which host is SPM? If it's not one of these above, let's do an sosreport from that one too. And flag it in the name that it's SPM.

thanks
******************************

- Greg
> root 3498 0.0 0.0 72080 2620 ? S< Feb18 0:00
> /usr/bin/sudo -n /sbin/lvm lvchange --config devices { p
> root 3499 0.0 0.0 50132 29176 ? S< Feb18 0:02 \_
> /sbin/lvm lvchange --config devices { preferred_name
Was this lvchange -an to deactivate LVs? I think I saw that somewhere. Didn't you tell me once that vdsm had this problem of trying to deactivate LVs that were in use? I suspect this goes back to the issue of activating LVs that you shouldn't be. Is vdsm running this lvchange? How could vdsm run lvchange and have it block for days without noticing any problem? There could also be an lvm bug related to running many lvchange -a commands concurrently.
(In reply to David Teigland from comment #17)
> > root 3498 0.0 0.0 72080 2620 ? S< Feb18 0:00
> > /usr/bin/sudo -n /sbin/lvm lvchange --config devices { p
> > root 3499 0.0 0.0 50132 29176 ? S< Feb18 0:02 \_
> > /sbin/lvm lvchange --config devices { preferred_name
>
> Was this lvchange -an to deactivate LVs?

Correct.

> Didn't you tell me once that vdsm had this problem of trying to deactivate
> LVs that were in use? I suspect this goes back to the issue of activating
> LVs that you shouldn't be.

Correct, we added this 3 years ago as a workaround for auto activation during boot with FC storage.

> Is vdsm running this lvchange?

Yes. When vdsm starts, it gets the status of all LVs on the system, and then runs one lvchange command per VG, with all the LVs that are active but not open on that VG (a hedged sketch of this per-VG deactivation appears after the data request below).

> How could vdsm
> run lvchange and have it block for days without noticing any problem?

What happened here is this:
1. vdsm started this command during vdsm startup
2. The command did not return
3. An operator restarted vdsm
4. vdsm was probably killed by the init script because it could not terminate within 10 seconds
5. The lvm command was orphaned, and because it was stuck, it continued to run for almost a month

> There
> could also be an lvm bug related to running many lvchange -a commands
> concurrently.

vdsm runs the commands serially during startup, one command per VG. But later, during normal operation, vdsm deactivates LVs concurrently in many flows: when shutting down a VM, when finishing storage operations on LVs, etc.

Moving to LVM after discussion with David.

A core from the stuck lvchange commands might help us explain things in more detail, but the diagnosis and solution will probably remain the same as above: apply the fix for the filters bug and avoid activating unwanted LVs (then vdsm should not even need the lvchange -an workaround.)

Can we fix lvm such that it returns an error status instead of hanging forever?

Greg, we need a core dump of the stuck lvm processes mentioned in comment 14. Here is what I asked the customer to get:
On host jabphx0005
gcore -o /tmp/$(hostname) 3498
gcore -o /tmp/$(hostname) 3499
ls -al /var/lock/lvm > /tmp/$(hostname)-lvm.txt
On host jabphx0008
gcore -o /tmp/$(hostname) 19944
gcore -o /tmp/$(hostname) 19945
ls -al /var/lock/lvm > /tmp/$(hostname)-lvm.txt
On host jabphx0018
gcore -o /tmp/$(hostname) 4646
gcore -o /tmp/$(hostname) 4647
ls -al /var/lock/lvm > /tmp/$(hostname)-lvm.txt
On host jabphx0022
gcore -o /tmp/$(hostname) 33532
gcore -o /tmp/$(hostname) 33533
ls -al /var/lock/lvm > /tmp/$(hostname)-lvm.txt
On each host, those commands will create two core files in /tmp, named {hostname}.{pid}, plus one file named {hostname}-lvm.txt. Post all of those to the case.
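Once the cores are posted, a backtrace can be pulled out of each one with gdb. A rough sketch only (the binary path matches the ps output above; the core file name shown here is just an example of what gcore produces):

# Sketch only: extract a backtrace from one of the cores collected above.
# /tmp/jabphx0005.3499 is an example name; substitute the real host/pid.
# Installing the matching lvm2 debuginfo packages makes the symbols readable.
gdb /sbin/lvm /tmp/jabphx0005.3499 -batch -ex 'bt full'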
And I still need
ps auxf
ls -al /var/lock/lvm
from all hosts in the other site.
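While collecting that, it may also be worth checking where each stuck process is actually sleeping - on disk I/O versus on a lock file, for example. A minimal sketch, reusing one of the PIDs already identified above:

# Sketch only: show the kernel wait channel, age, and open LVM lock files
# for a stuck lvm process (3499 is an example PID; repeat per process).
ps -o pid,stat,wchan:32,etime,cmd -p 3499
cat /proc/3499/stack                        # kernel stack; needs root
ls -l /proc/3499/fd | grep /var/lock/lvm    # which LVM lock files it holds open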
I took this from a test RHEV-H 6.7 system here at my place that boots from a local disk. It has never connected to a manager. I don't have a Fibre Channel SAN here. My multipath.conf is untouched.
As you can see below, all the names and all the entries in /dev/mapper look generic. Are the names different with boot from SAN? I wonder whether udev set up those symlinks in /dev/mapper, or whether /dev/dm-0 through /dev/dm-4 always map the boot devices. (A quick way to check is sketched after the output below.)
[root@rhevhtest admin]# mount
/dev/mapper/live-rw on / type ext2 (ro,noatime)
/dev/mapper/HostVG-Config on /config type ext4 (rw,noatime)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
/var/log on /tmp/early-logs type none (rw,bind)
tmpfs on /dev/shm type tmpfs (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
/data/images on /var/lib/libvirt/images type none (rw,bind)
/data/core on /var/log/core type none (rw,bind)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
[root@rhevhtest admin]#
[root@rhevhtest admin]#
[root@rhevhtest admin]# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
Config HostVG -wi-ao---- 8.00m
Data HostVG -wi-ao---- 43.38g
Logging HostVG -wi-ao---- 2.00g
Swap HostVG -wi-ao---- 3.87g
[root@rhevhtest admin]#
[root@rhevhtest admin]# vgs
VG #PV #LV #SN Attr VSize VFree
HostVG 1 4 0 wz--n- 49.28g 28.00m
[root@rhevhtest admin]#
[root@rhevhtest admin]# pvs
PV VG Fmt Attr PSize PFree
/dev/sda4 HostVG lvm2 a-- 49.28g 28.00m
[root@rhevhtest admin]#
[root@rhevhtest admin]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/live-rw 1.5G 527M 971M 36% /
/dev/mapper/HostVG-Config
6.8M 1.5M 5.0M 23% /config
tmpfs 1.9G 52K 1.9G 1% /dev/shm
[root@rhevhtest admin]#
[root@rhevhtest admin]# more /etc/redhat-release
Red Hat Enterprise Virtualization Hypervisor release 6.7 (20160104.0.el6ev)
[root@rhevhtest admin]#
[root@rhevhtest admin]# ls /dev/mapper -al
total 0
drwxr-xr-x. 2 root root 160 2017-02-24 15:42 .
drwxr-xr-x. 20 root root 8820 2017-02-24 15:44 ..
crw-rw----. 1 root root 10, 58 2017-02-24 15:42 control
lrwxrwxrwx. 1 root root 7 2017-02-24 15:42 HostVG-Config -> ../dm-2
lrwxrwxrwx. 1 root root 7 2017-02-24 15:42 HostVG-Data -> ../dm-4
lrwxrwxrwx. 1 root root 7 2017-02-24 15:42 HostVG-Logging -> ../dm-3
lrwxrwxrwx. 1 root root 7 2017-02-24 15:42 HostVG-Swap -> ../dm-1
lrwxrwxrwx. 1 root root 7 2017-02-24 15:42 live-rw -> ../dm-0
[root@rhevhtest admin]#
[root@rhevhtest admin]#
[root@rhevhtest admin]# more /etc/multipath.conf
blacklist {
}
defaults {
find_multipaths yes
getuid_callout "/lib/udev/scsi_id --replace-whitespace --whitelisted --device=/dev/%n"
}
[root@rhevhtest admin]#
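As for the symlink question above: a quick way to see how the /dev/mapper names map to the dm-N minors, and which symlinks udev created for each node, is something like this (a sketch; dm-2 is just the example from the listing above):

# Sketch only: map /dev/mapper names to dm-N devices and list udev symlinks.
dmsetup ls            # name -> (major, minor) for every device-mapper device
dmsetup info -c       # tabular view with open count and UUID per device
udevadm info --query=symlink --name=/dev/dm-2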
Created attachment 1261518 [details]
ps auxf and ls -al /var/lock/lvm from hosts 0001 - 0015 in Grafton
I'm attaching ps auxf output from Grafton hosts 0001 through 0015. The operators tell me they will have the rest ready soon. Most of these have lvm lvchange commands that look like they've been sitting there a while.
My phone rang again today (Sunday). Here we go again. We have two hypervisors that won't activate after a reboot. RHEV-H 6.7-2016-0104. They have 23,000+ lsblk objects, hald is consuming 100 percent of the CPU, and we have a couple of processes trying a massive lvchange operation on each hypervisor. So far, restarting vdsmd has been ineffective. The uptime is 68 minutes. I said to give it another hour and watch for hald in top to finish grinding, then try restarting vdsmd. Hopefully I gave good advice.

By now, we've identified the problem. We know LVM filtering is a solution, but it won't be easy to do with boot from SAN and RHEV-H 6.8. And going to 6.8 with 34 hypervisors x 2 sites will take time - especially since every reboot will need at least an hour to grind through it all.

Here is what we need:

We need a workaround to make the best we can with what's in place right now.
We need an LVM filtering solution that works with boot from SAN (a rough sketch of the kind of filter we mean is included below).
We need a universal LVM filtering fix for RHEL 7.3.n, since other pure RHEL sites see a variation on this same theme.
We need that filtering to go into the soonest RHVH we can find after the RHEL one is built.

And, of course, we need it yesterday.

Red Hat Enterprise Linux 6 transitioned to the Production 3 Phase on May 10, 2017. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available. The official life cycle policy can be reviewed here:

http://redhat.com/rhel/lifecycle

This issue does not appear to meet the inclusion criteria for Production Phase 3 and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com
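For reference, the kind of per-host filter being asked for above would look roughly like this in /etc/lvm/lvm.conf - the same accept/reject style the pvs and lvchange commands in this bug already pass with --config, but made permanent on the host. This is a sketch only: the WWID is a placeholder, and on a boot-from-SAN host the accept line has to match that host's actual boot multipath device.

# Sketch only (placeholder WWID): accept the host's own boot device and
# reject everything else, so guest LVs are never scanned or auto-activated.
devices {
    filter = [ "a|^/dev/mapper/3514f0cXXXXXXXXXX$|", "r|.*|" ]
}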
Description of problem: We've had several cases where RHEV-H hosts start an LVM operation that hangs forever. This triggers a series of ugly consequences that sometimes leads to entire data centers going non-operational. See additional info below for log extracts. Version-Release number of selected component (if applicable): Working with RHEVM 3.5.8 and RHEV-H 6.7-20160104 How reproducible: Intermittent Steps to Reproduce: 1. Working on it 2. 3. Actual results: LVM operations hang forever. Sometimes. Expected results: LVM operations should either complete successfully or return an error. Additional info: Here is one example Nir Soffer put together ******************** Here is whats going on (all on vdsm.log.15 in host 0027) Thread-122 got the operation mutex... Thread-122::DEBUG::2017-02-24 09:51:00,141::lvm::320::Storage.OperationMutex::(_reloadpvs) Operation 'lvm reload operation' got the operation mutex And tried to reload pvs... Thread-122::DEBUG::2017-02-24 09:51:00,150::lvm::291::Storage.Misc.excCmd::(cmd) /usr/bin/sudo -n /sbin/lvm pvs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ '\''a|/dev/mapper/3514f0c514840001b|/dev/mapper/3514f0c514840001c|/dev/mapper/3514f0c514840001d|/dev/mapper/3514f0c514840001e|/dev/mapper/3514f0c5148400023|/dev/mapper/3514f0c5148400043|/dev/mapper/3514f0c51484000df|/dev/mapper/3514f0c51484000e0|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c51484000e3|/dev/mapper/3514f0c51484000e4|/dev/mapper/3514f0c51484000e5|/dev/mapper/3514f0c51484000e6|/dev/mapper/3514f0c5148400177|/dev/mapper/3514f0c5148400178|/dev/mapper/3514f0c5148400179|/dev/mapper/3514f0c514840017a|/dev/mapper/3514f0c51484001a0|/dev/mapper/3514f0c51484001a1|/dev/mapper/3514f0c51484001a2|/dev/mapper/3514f0c51484001a3|/dev/mapper/3514f0c51484001a4|/dev/mapper/3514f0c51484001a5|/dev/mapper/3514f0c51484001a6|/dev/mapper/3514f0c51484001a7|/dev/mapper/3514f0c51484001a8|/dev/mapper/3514f0c51484001a9|/dev/mapper/3514f0c51484001aa|/dev/mapper/3514f0c51484001bb|/dev/mapper/3514f0c51484001bc|/dev/mapper/3514f0c51484001bd|/dev/mapper/3514f0c51484001be|/dev/mapper/3514f0c51484001bf|/dev/mapper/3514f0c51484001c0|/dev/mapper/3514f0c51484001c1|/dev/mapper/3514f0c51484001c2|/dev/mapper/3514f0c51484001c3|/dev/mapper/3514f0c51484001c4|/dev/mapper/3514f0c51484001c5|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,size,vg_name,vg_uuid,pe_start,pe_count,pe_alloc_count,mda_count,dev_size /dev/mapper/3514f0c51484000e2 (cwd None) This command never returned. All other monitoring threads are stuck waiting on the mutex: Thread-463::DEBUG::2017-02-24 09:51:00,257::lvm::514::Storage.OperationMutex::(_invalidatelvs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-70::DEBUG::2017-02-24 09:51:00,266::lvm::371::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... 02ef0b2e-9448-41b0-86ad-430712d82c64::DEBUG::2017-02-24 09:51:01,441::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... 
8c2691fb-64f4-4e20-a964-eeb3688977e7::DEBUG::2017-02-24 09:51:01,547::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-27::DEBUG::2017-02-24 09:51:04,238::lvm::371::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-436::DEBUG::2017-02-24 09:51:05,052::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-23::DEBUG::2017-02-24 09:51:07,631::lvm::371::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-475::DEBUG::2017-02-24 09:51:17,678::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-447::DEBUG::2017-02-24 09:52:24,797::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-26::DEBUG::2017-02-24 09:53:15,834::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-15::DEBUG::2017-02-24 09:53:17,194::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-14::DEBUG::2017-02-24 09:53:31,227::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-20::DEBUG::2017-02-24 09:53:44,925::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-16::DEBUG::2017-02-24 09:53:51,046::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-19::DEBUG::2017-02-24 09:53:54,883::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-22::DEBUG::2017-02-24 09:53:58,980::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-28::DEBUG::2017-02-24 09:54:01,094::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-18::DEBUG::2017-02-24 09:54:03,989::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-29::DEBUG::2017-02-24 09:54:08,633::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-30::DEBUG::2017-02-24 09:54:13,039::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-458::DEBUG::2017-02-24 09:54:15,864::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-17::DEBUG::2017-02-24 09:54:17,464::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-21::DEBUG::2017-02-24 09:54:18,787::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... 
Thread-469::DEBUG::2017-02-24 09:54:25,064::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-441::DEBUG::2017-02-24 09:54:37,475::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-420::DEBUG::2017-02-24 09:54:58,368::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-431::DEBUG::2017-02-24 09:55:00,166::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... 70f3ee6d-51de-40a0-8347-594cfebdb506::DEBUG::2017-02-24 09:55:10,619::lvm::428::Storage.OperationMutex::(_reloadlvs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-425::DEBUG::2017-02-24 09:56:01,422::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-453::DEBUG::2017-02-24 09:56:01,688::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Thread-1889::DEBUG::2017-02-24 09:56:06,857::lvm::498::Storage.OperationMutex::(_invalidateAllPvs) Operation 'lvm reload operation' is holding the operation mutex, waiting... Since the lvm command never returns, monitoring practically stopped, and the host became non-operational. ************************** We have other examples of LVM operations that never complete in various support cases and I'll post those in comments as I go back and find them.
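Until the root cause of the hang is found, one operational mitigation worth considering (a sketch only - not claiming this is what vdsm or lvm does today) is to put an upper bound on each lvm invocation, so a stuck command surfaces as an error to the caller instead of holding the operation mutex indefinitely:

# Sketch only: bound an lvm call so a hang becomes an error instead of
# blocking forever. 600 seconds is an arbitrary example value.
/usr/bin/sudo -n timeout 600 /sbin/lvm pvs --noheadings -o uuid,name,vg_name \
    || echo "lvm pvs did not finish within 10 minutes (status $?)" >&2
# Caveat: a process stuck in uninterruptible I/O (D state) may not die even
# on SIGKILL, so a timeout reports the problem but cannot always clear it.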