Bug 1429203 - LVM operations sometimes hang forever
Summary: LVM operations sometimes hang forever
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.7
Hardware: All
OS: All
Priority: urgent
Severity: urgent
Target Milestone: pre-dev-freeze
Target Release: ---
Assignee: LVM and device-mapper development team
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On: 1428637
Blocks: 1424853
 
Reported: 2017-03-05 16:07 UTC by Greg Scott
Modified: 2020-05-14 15:42 UTC
CC List: 20 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-13 19:02:52 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
ps auxf from Quincy; but missing key details. (4.72 MB, text/plain)
2017-03-07 20:45 UTC, Greg Scott
ps auxf and ls -al /var/lock/lvm from hosts 0001 - 0015 in Grafton (175.28 KB, application/x-gzip)
2017-03-09 12:00 UTC, Greg Scott

Description Greg Scott 2017-03-05 16:07:21 UTC
Description of problem:
We've had several cases where RHEV-H hosts start an LVM operation that hangs forever.  This triggers a series of ugly consequences that sometimes lead to entire data centers going non-operational.  See additional info below for log extracts.

Version-Release number of selected component (if applicable):
Working with RHEVM 3.5.8 and RHEV-H 6.7-20160104

How reproducible:
Intermittent

Steps to Reproduce:
1. Working on it

Actual results:
LVM operations hang forever.  Sometimes.

Expected results:
LVM operations should either complete successfully or return an error.

Additional info:

Here is one example Nir Soffer put together:

********************

Here is what's going on (all in vdsm.log.15 on host 0027):

Thread-122 got the operation mutex...

Thread-122::DEBUG::2017-02-24 09:51:00,141::lvm::320::Storage.OperationMutex::(_reloadpvs) Operation 'lvm reload operation' got the operation mutex

And tried to reload pvs...

Thread-122::DEBUG::2017-02-24 09:51:00,150::lvm::291::Storage.Misc.excCmd::(cmd) /usr/bin/sudo -n /sbin/lvm pvs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ '\''a|/dev/mapper/3514f0c514840001b|/dev/mapper/3514f0c514840001c|/dev/mapper/3514f0c514840001d|/dev/mapper/3514f0c514840001e|/dev/mapper/3514f0c5148400023|/dev/mapper/3514f0c5148400043|/dev/mapper/3514f0c51484000df|/dev/mapper/3514f0c51484000e0|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c51484000e3|/dev/mapper/3514f0c51484000e4|/dev/mapper/3514f0c51484000e5|/dev/mapper/3514f0c51484000e6|/dev/mapper/3514f0c5148400177|/dev/mapper/3514f0c5148400178|/dev/mapper/3514f0c5148400179|/dev/mapper/3514f0c514840017a|/dev/mapper/3514f0c51484001a0|/dev/mapper/3514f0c51484001a1|/dev/mapper/3514f0c51484001a2|/dev/mapper/3514f0c51484001a3|/dev/mapper/3514f0c51484001a4|/dev/mapper/3514f0c51484001a5|/dev/mapper/3514f0c51484001a6|/dev/mapper/3514f0c51484001a7|/dev/mapper/3514f0c51484001a8|/dev/mapper/3514f0c51484001a9|/dev/mapper/3514f0c51484001aa|/dev/mapper/3514f0c51484001bb|/dev/mapper/3514f0c51484001bc|/dev/mapper/3514f0c51484001bd|/dev/mapper/3514f0c51484001be|/dev/mapper/3514f0c51484001bf|/dev/mapper/3514f0c51484001c0|/dev/mapper/3514f0c51484001c1|/dev/mapper/3514f0c51484001c2|/dev/mapper/3514f0c51484001c3|/dev/mapper/3514f0c51484001c4|/dev/mapper/3514f0c51484001c5|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,size,vg_name,vg_uuid,pe_start,pe_count,pe_alloc_count,mda_count,dev_size /dev/mapper/3514f0c51484000e2 (cwd None)

This command never returned.

All other monitoring threads are stuck waiting on the mutex:

Thread-463::DEBUG::2017-02-24 09:51:00,257::lvm::514::Storage.OperationMutex::(_invalidatelvs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-70::DEBUG::2017-02-24 09:51:00,266::lvm::371::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
02ef0b2e-9448-41b0-86ad-430712d82c64::DEBUG::2017-02-24 09:51:01,441::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
8c2691fb-64f4-4e20-a964-eeb3688977e7::DEBUG::2017-02-24 09:51:01,547::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-27::DEBUG::2017-02-24 09:51:04,238::lvm::371::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-436::DEBUG::2017-02-24 09:51:05,052::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-23::DEBUG::2017-02-24 09:51:07,631::lvm::371::Storage.OperationMutex::(_reloadvgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-475::DEBUG::2017-02-24 09:51:17,678::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-447::DEBUG::2017-02-24 09:52:24,797::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-26::DEBUG::2017-02-24 09:53:15,834::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-15::DEBUG::2017-02-24 09:53:17,194::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-14::DEBUG::2017-02-24 09:53:31,227::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-20::DEBUG::2017-02-24 09:53:44,925::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-16::DEBUG::2017-02-24 09:53:51,046::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-19::DEBUG::2017-02-24 09:53:54,883::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-22::DEBUG::2017-02-24 09:53:58,980::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-28::DEBUG::2017-02-24 09:54:01,094::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-18::DEBUG::2017-02-24 09:54:03,989::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-29::DEBUG::2017-02-24 09:54:08,633::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-30::DEBUG::2017-02-24 09:54:13,039::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-458::DEBUG::2017-02-24 09:54:15,864::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-17::DEBUG::2017-02-24 09:54:17,464::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-21::DEBUG::2017-02-24 09:54:18,787::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-469::DEBUG::2017-02-24 09:54:25,064::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-441::DEBUG::2017-02-24 09:54:37,475::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-420::DEBUG::2017-02-24 09:54:58,368::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-431::DEBUG::2017-02-24 09:55:00,166::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
70f3ee6d-51de-40a0-8347-594cfebdb506::DEBUG::2017-02-24 09:55:10,619::lvm::428::Storage.OperationMutex::(_reloadlvs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-425::DEBUG::2017-02-24 09:56:01,422::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-453::DEBUG::2017-02-24 09:56:01,688::lvm::504::Storage.OperationMutex::(_invalidatevgs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-1889::DEBUG::2017-02-24 09:56:06,857::lvm::498::Storage.OperationMutex::(_invalidateAllPvs) Operation 'lvm reload operation' is holding the operation mutex, waiting...

Since the lvm command never returns, monitoring practically stopped, and the host
became non-operational.

**************************

We have other examples of LVM operations that never complete in various support cases and I'll post those in comments as I go back and find them.

Comment 1 Nir Soffer 2017-03-06 12:36:43 UTC
Since 4.0, a stuck lvm command will not harm monitoring, so a host will not
become non-operational because monitoring was stopped.

However, a stuck lvm command may cause:
- a host that never activates (deactivating lvs never finishes on startup)
- timeouts in various flows
- other failures when code takes a lock before calling an lvm operation (we
  usually call lvm under various locks)

This issue should be handled by the lvm team.

David, can you look into this?
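A hypothetical mitigation on the caller's side (a sketch, not vdsm's actual code) is to run every lvm command under a deadline, so a child wedged on a lock surfaces as an error in the flows above instead of an indefinite hang:

```python
import subprocess

def run_with_deadline(cmd, timeout=300):
    # subprocess.run kills the child and waits for it when the timeout
    # expires; exiting also drops any flock() the child holds, letting
    # queued commands proceed.
    try:
        return subprocess.run(cmd, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        raise RuntimeError("%s did not finish within %ss" % (cmd[0], timeout))
```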

Comment 2 David Teigland 2017-03-06 19:13:18 UTC
We will need to see where the lvm command is blocked, e.g. on disk io, on a file lock.  You may also want to reassign this bz to lvm when you have more data.
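For reference, one quick way to see where a wedged process is blocked is to read its wait channel and kernel stack from /proc. This is a generic sketch (reading /proc/&lt;pid&gt;/stack usually requires root, and the file may be absent on some kernels):

```python
from pathlib import Path

def where_blocked(pid):
    # WCHAN names the kernel function the process is sleeping in
    # (e.g. a flock- or semaphore-related symbol for a lock wait);
    # the kernel stack gives the full picture when it is readable.
    base = Path("/proc") / str(pid)
    wchan = (base / "wchan").read_text()
    try:
        stack = (base / "stack").read_text()
    except (PermissionError, FileNotFoundError):
        stack = "<unreadable without root>"
    return wchan, stack
```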

Comment 3 Nir Soffer 2017-03-07 07:57:31 UTC
This is what we see in the sosreport on host 0027, generated on Feb 25.


$ grep /var/lock/lvm/ process/lsof_-b_M_-n_-l

lvm        9548        0    3uR     REG               0,17             0  277616082 /var/lock/lvm/P_global (deleted)
lvm        9548        0    4r      REG               0,17             0  277587507 /var/lock/lvm/V_dc25e385-332e-4feb-8ea2-78a7036ac8d1:aux (deleted)
lvm       10410        0    3uW     REG               0,17             0  277587507 /var/lock/lvm/V_dc25e385-332e-4feb-8ea2-78a7036ac8d1:aux (deleted)
lvm       10410        0    4r      REG               0,17             0     291072 /var/lock/lvm/V_dc25e385-332e-4feb-8ea2-78a7036ac8d1 (deleted)
lvm       10839        0    3u      REG               0,17             0  277587507 /var/lock/lvm/V_dc25e385-332e-4feb-8ea2-78a7036ac8d1:aux (deleted)
lvm       10839        0    4rR     REG               0,17             0  277616082 /var/lock/lvm/P_global (deleted)
lvm       12188        0    3r      REG               0,17             0  277587507 /var/lock/lvm/V_dc25e385-332e-4feb-8ea2-78a7036ac8d1:aux (deleted)
lvm       15428        0    3r      REG               0,17             0  277587507 /var/lock/lvm/V_dc25e385-332e-4feb-8ea2-78a7036ac8d1:aux (deleted)
lvm       30425        0    3r      REG               0,17             0  277587507 /var/lock/lvm/V_dc25e385-332e-4feb-8ea2-78a7036ac8d1:aux (deleted)
lvm       34189        0    3u      REG               0,17             0  335188164 /var/lock/lvm/V_e93af66a-cdcb-46a4-b52c-cf888902d354
lvm       37487        0    3r      REG               0,17             0  277587507 /var/lock/lvm/V_dc25e385-332e-4feb-8ea2-78a7036ac8d1:aux (deleted)
lvm       42248        0    3uR     REG               0,17             0     291072 /var/lock/lvm/V_dc25e385-332e-4feb-8ea2-78a7036ac8d1 (deleted)
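The R/W annotations on the fds above are flock() locks on the /var/lock/lvm files, which is what the blocked lvm processes are contending for. The conflict can be reproduced in miniature (illustrative temp path, not a real lvm lock file):

```python
import fcntl
import os
import tempfile

# Stand-in for one of the /var/lock/lvm lock files in the lsof output.
lock_path = os.path.join(tempfile.mkdtemp(), "P_global")

holder = open(lock_path, "w")
fcntl.flock(holder, fcntl.LOCK_EX)   # first command takes the exclusive lock

contender = open(lock_path, "w")
try:
    # A plain LOCK_EX here would block inside the kernel's flock path for
    # as long as the holder keeps the lock; LOCK_NB makes the conflict
    # visible immediately instead of hanging.
    fcntl.flock(contender, fcntl.LOCK_EX | fcntl.LOCK_NB)
    conflict = False
except BlockingIOError:
    conflict = True
```

flock() treats separately opened descriptors as independent lock owners, so the second open conflicts with the first even within one process.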


$ grep lvm process/ps_alxwww 

F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND

4     0  9547     1   0 -20  72072  2516 poll_s S<   ?          0:00 /usr/bin/sudo -n /sbin/lvm pvs --config  devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c514840001b|/dev/mapper/3514f0c514840001c|/dev/mapper/3514f0c514840001d|/dev/mapper/3514f0c514840001e|/dev/mapper/3514f0c5148400023|/dev/mapper/3514f0c5148400043|/dev/mapper/3514f0c51484000df|/dev/mapper/3514f0c51484000e0|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c51484000e3|/dev/mapper/3514f0c51484000e4|/dev/mapper/3514f0c51484000e5|/dev/mapper/3514f0c51484000e6|/dev/mapper/3514f0c5148400177|/dev/mapper/3514f0c5148400178|/dev/mapper/3514f0c5148400179|/dev/mapper/3514f0c514840017a|/dev/mapper/3514f0c51484001a0|/dev/mapper/3514f0c51484001a1|/dev/mapper/3514f0c51484001a2|/dev/mapper/3514f0c51484001a3|/dev/mapper/3514f0c51484001a4|/dev/mapper/3514f0c51484001a5|/dev/mapper/3514f0c51484001a6|/dev/mapper/3514f0c51484001a7|/dev/mapper/3514f0c51484001a8|/dev/mapper/3514f0c51484001a9|/dev/mapper/3514f0c51484001aa|/dev/mapper/3514f0c51484001bb|/dev/mapper/3514f0c51484001bc|/dev/mapper/3514f0c51484001bd|/dev/mapper/3514f0c51484001be|/dev/mapper/3514f0c51484001bf|/dev/mapper/3514f0c51484001c0|/dev/mapper/3514f0c51484001c1|/dev/mapper/3514f0c51484001c2|/dev/mapper/3514f0c51484001c3|/dev/mapper/3514f0c51484001c4|/dev/mapper/3514f0c51484001c5|', 'r|.*|' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 }  --noheadings --units b --nosuffix --separator | --ignoreskippedcluster -o uuid,name,size,vg_name,vg_uuid,pe_start,pe_count,pe_alloc_count,mda_count,dev_size

4     0  9548  9547   0 -20  44780 21816 flock_ S<   ?          0:01 /sbin/lvm pvs --config  devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c514840001b|/dev/mapper/3514f0c514840001c|/dev/mapper/3514f0c514840001d|/dev/mapper/3514f0c514840001e|/dev/mapper/3514f0c5148400023|/dev/mapper/3514f0c5148400043|/dev/mapper/3514f0c51484000df|/dev/mapper/3514f0c51484000e0|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c51484000e3|/dev/mapper/3514f0c51484000e4|/dev/mapper/3514f0c51484000e5|/dev/mapper/3514f0c51484000e6|/dev/mapper/3514f0c5148400177|/dev/mapper/3514f0c5148400178|/dev/mapper/3514f0c5148400179|/dev/mapper/3514f0c514840017a|/dev/mapper/3514f0c51484001a0|/dev/mapper/3514f0c51484001a1|/dev/mapper/3514f0c51484001a2|/dev/mapper/3514f0c51484001a3|/dev/mapper/3514f0c51484001a4|/dev/mapper/3514f0c51484001a5|/dev/mapper/3514f0c51484001a6|/dev/mapper/3514f0c51484001a7|/dev/mapper/3514f0c51484001a8|/dev/mapper/3514f0c51484001a9|/dev/mapper/3514f0c51484001aa|/dev/mapper/3514f0c51484001bb|/dev/mapper/3514f0c51484001bc|/dev/mapper/3514f0c51484001bd|/dev/mapper/3514f0c51484001be|/dev/mapper/3514f0c51484001bf|/dev/mapper/3514f0c51484001c0|/dev/mapper/3514f0c51484001c1|/dev/mapper/3514f0c51484001c2|/dev/mapper/3514f0c51484001c3|/dev/mapper/3514f0c51484001c4|/dev/mapper/3514f0c51484001c5|', 'r|.*|' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 }  --noheadings --units b --nosuffix --separator | --ignoreskippedcluster -o uuid,name,size,vg_name,vg_uuid,pe_start,pe_count,pe_alloc_count,mda_count,dev_size

4     0 10409     1   0 -20  72072  2512 poll_s S<   ?          0:00 /usr/bin/sudo -n /sbin/lvm lvextend --config  devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c5148400178|', 'r|.*|' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 }  --autobackup n --size 12288m dc25e385-332e-4feb-8ea2-78a7036ac8d1/62fa7c35-5be0-4498-b24f-42fca2629754

4     0 10410 10409   0 -20  39224 16236 flock_ S<   ?          0:00 /sbin/lvm lvextend --config  devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c5148400178|', 'r|.*|' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 }  --autobackup n --size 12288m dc25e385-332e-4feb-8ea2-78a7036ac8d1/62fa7c35-5be0-4498-b24f-42fca2629754

4     0 10834     1   0 -20  72072  2516 poll_s S<   ?          0:00 /usr/bin/sudo -n /sbin/lvm pvs --config  devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c514840001b|/dev/mapper/3514f0c514840001c|/dev/mapper/3514f0c514840001d|/dev/mapper/3514f0c514840001e|/dev/mapper/3514f0c5148400023|/dev/mapper/3514f0c5148400043|/dev/mapper/3514f0c51484000df|/dev/mapper/3514f0c51484000e0|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c51484000e3|/dev/mapper/3514f0c51484000e4|/dev/mapper/3514f0c51484000e5|/dev/mapper/3514f0c51484000e6|/dev/mapper/3514f0c5148400177|/dev/mapper/3514f0c5148400178|/dev/mapper/3514f0c5148400179|/dev/mapper/3514f0c514840017a|/dev/mapper/3514f0c51484001a0|/dev/mapper/3514f0c51484001a1|/dev/mapper/3514f0c51484001a2|/dev/mapper/3514f0c51484001a3|/dev/mapper/3514f0c51484001a4|/dev/mapper/3514f0c51484001a5|/dev/mapper/3514f0c51484001a6|/dev/mapper/3514f0c51484001a7|/dev/mapper/3514f0c51484001a8|/dev/mapper/3514f0c51484001a9|/dev/mapper/3514f0c51484001aa|/dev/mapper/3514f0c51484001bb|/dev/mapper/3514f0c51484001bc|/dev/mapper/3514f0c51484001bd|/dev/mapper/3514f0c51484001be|/dev/mapper/3514f0c51484001bf|/dev/mapper/3514f0c51484001c0|/dev/mapper/3514f0c51484001c1|/dev/mapper/3514f0c51484001c2|/dev/mapper/3514f0c51484001c3|/dev/mapper/3514f0c51484001c4|/dev/mapper/3514f0c51484001c5|', 'r|.*|' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 }  --noheadings --units b --nosuffix --separator | --ignoreskippedcluster -o uuid,name,size,vg_name,vg_uuid,pe_start,pe_count,pe_alloc_count,mda_count,dev_size /dev/mapper/3514f0c51484000e2

4     0 10839 10834   0 -20  44756 22052 flock_ S<   ?          0:00 /sbin/lvm pvs --config  devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c514840001b|/dev/mapper/3514f0c514840001c|/dev/mapper/3514f0c514840001d|/dev/mapper/3514f0c514840001e|/dev/mapper/3514f0c5148400023|/dev/mapper/3514f0c5148400043|/dev/mapper/3514f0c51484000df|/dev/mapper/3514f0c51484000e0|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c51484000e3|/dev/mapper/3514f0c51484000e4|/dev/mapper/3514f0c51484000e5|/dev/mapper/3514f0c51484000e6|/dev/mapper/3514f0c5148400177|/dev/mapper/3514f0c5148400178|/dev/mapper/3514f0c5148400179|/dev/mapper/3514f0c514840017a|/dev/mapper/3514f0c51484001a0|/dev/mapper/3514f0c51484001a1|/dev/mapper/3514f0c51484001a2|/dev/mapper/3514f0c51484001a3|/dev/mapper/3514f0c51484001a4|/dev/mapper/3514f0c51484001a5|/dev/mapper/3514f0c51484001a6|/dev/mapper/3514f0c51484001a7|/dev/mapper/3514f0c51484001a8|/dev/mapper/3514f0c51484001a9|/dev/mapper/3514f0c51484001aa|/dev/mapper/3514f0c51484001bb|/dev/mapper/3514f0c51484001bc|/dev/mapper/3514f0c51484001bd|/dev/mapper/3514f0c51484001be|/dev/mapper/3514f0c51484001bf|/dev/mapper/3514f0c51484001c0|/dev/mapper/3514f0c51484001c1|/dev/mapper/3514f0c51484001c2|/dev/mapper/3514f0c51484001c3|/dev/mapper/3514f0c51484001c4|/dev/mapper/3514f0c51484001c5|', 'r|.*|' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 }  --noheadings --units b --nosuffix --separator | --ignoreskippedcluster -o uuid,name,size,vg_name,vg_uuid,pe_start,pe_count,pe_alloc_count,mda_count,dev_size /dev/mapper/3514f0c51484000e2

0     0 12187     1  20   0   4124   672 wait   S    ?          0:00 timeout 300s /sbin/lvm vgs -v -o +tags

4     0 12188 12187  20   0  40964 18232 flock_ S    ?          0:00 /sbin/lvm vgs -v -o +tags

0     0 15427     1  20   0   4124   672 wait   S    ?          0:00 timeout 300s /sbin/lvm vgs -v -o +tags

4     0 15428 15427  20   0  40964 18232 flock_ S    ?          0:00 /sbin/lvm vgs -v -o +tags

4     0 19033     1  20   0  10724  1480 wait   Ss   ?          0:00 bash -c ?VERSION=`/bin/rpm -q --qf '[%{VERSION}]' sos | /bin/sed 's/\.//'`;?if [ $VERSION -ge 32 ]; then?    /usr/sbin/sosreport  --batch --all-logs         -o logs,libvirt,vdsm,general,networking,hardware,process,yum,filesys,devicemapper,selinux,kernel,memory,rpm,processor,pci,md,block,scsi,multipath,systemd,sanlock,lvm2,ovirt_hosted_engine?elif [ $VERSION -ge 30 ]; then?    /usr/sbin/sosreport  --batch -k logs.all_logs=True         -o logs,libvirt,vdsm,general,networking,hardware,process,yum,filesys,devicemapper,selinux,kernel,memory,rpm,processor,pci,md,block,scsi,multipath,systemd,sanlock,lvm2?elif [ $VERSION -ge 22 ]; then?    /usr/sbin/sosreport  --batch -k general.all_logs=True         -o libvirt,vdsm,general,networking,hardware,process,yum,filesys,devicemapper,selinux,kernel,memory,rpm?elif [ $VERSION -ge 17 ]; then?    /usr/sbin/sosreport  --no-progressbar -k general.all_logs=True         -o vdsm,general,networking,hardware,process,yum,filesys?else?    /bin/echo No valid version of sosreport found. 1>&2?    exit 1?fi?

4     0 19039 19033  20   0 245416 32748 pipe_w S    ?          0:13 /usr/bin/python /usr/sbin/sosreport --batch --all-logs -o logs,libvirt,vdsm,general,networking,hardware,process,yum,filesys,devicemapper,selinux,kernel,memory,rpm,processor,pci,md,block,scsi,multipath,systemd,sanlock,lvm2,ovirt_hosted_engine
0     0 30424 19039  20   0   4124   672 wait   S    ?          0:00 timeout 300s /sbin/lvm vgs -v -o +tags

4     0 30425 30424  20   0  40964 18300 flock_ S    ?          0:00 /sbin/lvm vgs -v -o +tags

4     0 34213 33208   0 -20  72068  2516 poll_s S<   ?          0:00 /usr/bin/sudo -n /sbin/lvm lvs --config  devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c51484001a2|', 'r|.*|' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 }  --noheadings --units b --nosuffix --separator | --ignoreskippedcluster -o uuid,name,vg_name,attr,size,seg_start_pe,devices,tags e93af66a-cdcb-46a4-b52c-cf888902d354

4     0 34217 34213   0 -20  28156  5088 -      R<   ?          0:00 /sbin/lvm lvs --config  devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c51484001a2|', 'r|.*|' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 }  --noheadings --units b --nosuffix --separator | --ignoreskippedcluster -o uuid,name,vg_name,attr,size,seg_start_pe,devices,tags e93af66a-cdcb-46a4-b52c-cf888902d354

0     0 37486     1  20   0   4124   672 wait   S    ?          0:00 timeout 300s /sbin/lvm vgs -v -o +tags

4     0 37487 37486  20   0  40964 18232 flock_ S    ?          0:00 /sbin/lvm vgs -v -o +tags

4     0 42247     1   0 -20  72072  2504 poll_s S<   ?          0:00 /usr/bin/sudo -n /sbin/lvm lvchange --config  devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c5148400178|', 'r|.*|' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 }  --autobackup n --available n dc25e385-332e-4feb-8ea2-78a7036ac8d1/26b7b6ba-20e3-449d-bbbb-179dbdb0d214 dc25e385-332e-4feb-8ea2-78a7036ac8d1/87c69927-a9a4-4a8d-9701-0203d9f7bb95 dc25e385-332e-4feb-8ea2-78a7036ac8d1/429951f3-50fd-43f4-ba77-1ee4a0a6f54f dc25e385-332e-4feb-8ea2-78a7036ac8d1/e374a525-6aed-4f23-9716-2531fa81b401 dc25e385-332e-4feb-8ea2-78a7036ac8d1/2e772c45-3af9-4553-bb19-e4ca1e154bb2 dc25e385-332e-4feb-8ea2-78a7036ac8d1/3e92d7eb-e3f2-45d7-a37b-5389ba872f55 dc25e385-332e-4feb-8ea2-78a7036ac8d1/47e12eda-a416-4fc7-8641-24d6cdcb1cd8 dc25e385-332e-4feb-8ea2-78a7036ac8d1/ed22a4f8-1698-4828-b200-48da3916fdc6 dc25e385-332e-4feb-8ea2-78a7036ac8d1/ecc56472-4271-4633-9e86-6f0074befb2e dc25e385-332e-4feb-8ea2-78a7036ac8d1/0010acf7-294b-4767-a470-929fff5f492b dc25e385-332e-4feb-8ea2-78a7036ac8d1/85165b4d-5ae4-4b0a-b052-f2bbdf02c983 dc25e385-332e-4feb-8ea2-78a7036ac8d1/c8968cb1-e454-4b41-ad5d-400b40f8125d dc25e385-332e-4feb-8ea2-78a7036ac8d1/eec01f1b-aa7b-46ec-adcf-727967b09bbb dc25e385-332e-4feb-8ea2-78a7036ac8d1/d5c5e1ce-15e0-4b98-8937-1b17f6b57cc3 dc25e385-332e-4feb-8ea2-78a7036ac8d1/aae4e6b4-79a4-47d1-9ef9-d1dddc5f923b dc25e385-332e-4feb-8ea2-78a7036ac8d1/f3cd6cd6-9b85-4f66-9fc9-c0e96f10d6ff dc25e385-332e-4feb-8ea2-78a7036ac8d1/f09dd0ae-7e4e-45bb-baf1-552513459e19 dc25e385-332e-4feb-8ea2-78a7036ac8d1/cd36bf4f-2754-4471-94c9-d286eab7433f dc25e385-332e-4feb-8ea2-78a7036ac8d1/528258e7-a51f-4025-8437-a19832f1c34b 
dc25e385-332e-4feb-8ea2-78a7036ac8d1/27148909-c5ed-4159-87f6-a817fb38073e dc25e385-332e-4feb-8ea2-78a7036ac8d1/8ef0917b-80f7-428b-8ab3-4fff25424e58 dc25e385-332e-4feb-8ea2-78a7036ac8d1/445c5052-6e5e-4b7d-b49c-7637578075c4 dc25e385-332e-4feb-8ea2-78a7036ac8d1/1285e1f4-fb70-4f23-9278-71c352f249d1 dc25e385-332e-4feb-8ea2-78a7036ac8d1/e233a812-9de7-4a32-b962-142539e3a4b7 dc25e385-332e-4feb-8ea2-78a7036ac8d1/faa9869c-6f31-441c-a601-2dc11ac2aa17 dc25e385-332e-4feb-8ea2-78a7036ac8d1/d34d117f-46e0-4141-85b6-b092381e064e dc25e385-332e-4feb-8ea2-78a7036ac8d1/01c5b0d3-042c-4215-8e8f-184519c5fcb5 dc25e385-332e-4feb-8ea2-78a7036ac8d1/f9598b6d-a0c0-4f41-9b02-ba344443665c dc25e385-332e-4feb-8ea2-78a7036ac8d1/1e9f5266-4298-4aa1-9c3a-608b8ddebf12 dc25e385-332e-4feb-8ea2-78a7036ac8d1/a904a94d-9f7d-428c-860f-ac13f731d36d dc25e385-332e-4feb-8ea2-78a7036ac8d1/932d8557-5182-4718-8f75-93b87c432537 dc25e385-332e-4feb-8ea2-78a7036ac8d1/fc21d80c-4ad4-4177-8c18-2802d72bc8d0 dc25e385-332e-4feb-8ea2-78a7036ac8d1/1673589a-0b47-47ad-8501-f1f23fd4ea61 dc25e385-332e-4feb-8ea2-78a7036ac8d1/ccb03021-50bc-48d1-ad1c-398a6c500d5e dc25e385-332e-4feb-8ea2-78a7036ac8d1/43dabf02-7ad6-445e-863e-c02c75e422b4 dc25e385-332e-4feb-8ea2-78a7036ac8d1/6e95dc39-1888-42b9-902f-98ca999f21dd dc25e385-332e-4feb-8ea2-78a7036ac8d1/41e422fa-c765-46f5-9d73-9db73b109213 dc25e385-332e-4feb-8ea2-78a7036ac8d1/4d690bff-7e78-4a64-b773-a40e7ecc7eff dc25e385-332e-4feb-8ea2-78a7036ac8d1/7d578e19-4b9a-4f57-8d09-384a607199ab dc25e385-332e-4feb-8ea2-78a7036ac8d1/b1f66a72-3332-4759-9ef4-9bbdc2112e69 dc25e385-332e-4feb-8ea2-78a7036ac8d1/62fa7c35-5be0-4498-b24f-42fca2629754 dc25e385-332e-4feb-8ea2-78a7036ac8d1/73f92179-a02e-4803-934d-8454c12e8695 dc25e385-332e-4feb-8ea2-78a7036ac8d1/33ee6c71-557b-4853-b873-b008afdbb69f dc25e385-332e-4feb-8ea2-78a7036ac8d1/8a186eba-1037-4438-9ee1-6b23f6c97a45 dc25e385-332e-4feb-8ea2-78a7036ac8d1/bafe8fc0-0098-4eb4-b801-81d801d7121c dc25e385-332e-4feb-8ea2-78a7036ac8d1/34e8045c-45a9-4089-bc0c-f7d752564c5e 
dc25e385-332e-4feb-8ea2-78a7036ac8d1/abda7a11-ca27-4694-8df6-77264f8eddf6 dc25e385-332e-4feb-8ea2-78a7036ac8d1/37df4990-acce-44a7-81ad-6ba340eef483 dc25e385-332e-4feb-8ea2-78a7036ac8d1/666295b9-36cd-4709-ba1d-2c2

4     0 42248 42247   0 -20  48404 27448 semtim S<   ?          0:01 /sbin/lvm lvchange --config  devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c5148400178|', 'r|.*|' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 }  --autobackup n --available n dc25e385-332e-4feb-8ea2-78a7036ac8d1/26b7b6ba-20e3-449d-bbbb-179dbdb0d214 dc25e385-332e-4feb-8ea2-78a7036ac8d1/87c69927-a9a4-4a8d-9701-0203d9f7bb95 dc25e385-332e-4feb-8ea2-78a7036ac8d1/429951f3-50fd-43f4-ba77-1ee4a0a6f54f dc25e385-332e-4feb-8ea2-78a7036ac8d1/e374a525-6aed-4f23-9716-2531fa81b401 dc25e385-332e-4feb-8ea2-78a7036ac8d1/2e772c45-3af9-4553-bb19-e4ca1e154bb2 dc25e385-332e-4feb-8ea2-78a7036ac8d1/3e92d7eb-e3f2-45d7-a37b-5389ba872f55 dc25e385-332e-4feb-8ea2-78a7036ac8d1/47e12eda-a416-4fc7-8641-24d6cdcb1cd8 dc25e385-332e-4feb-8ea2-78a7036ac8d1/ed22a4f8-1698-4828-b200-48da3916fdc6 dc25e385-332e-4feb-8ea2-78a7036ac8d1/ecc56472-4271-4633-9e86-6f0074befb2e dc25e385-332e-4feb-8ea2-78a7036ac8d1/0010acf7-294b-4767-a470-929fff5f492b dc25e385-332e-4feb-8ea2-78a7036ac8d1/85165b4d-5ae4-4b0a-b052-f2bbdf02c983 dc25e385-332e-4feb-8ea2-78a7036ac8d1/c8968cb1-e454-4b41-ad5d-400b40f8125d dc25e385-332e-4feb-8ea2-78a7036ac8d1/eec01f1b-aa7b-46ec-adcf-727967b09bbb dc25e385-332e-4feb-8ea2-78a7036ac8d1/d5c5e1ce-15e0-4b98-8937-1b17f6b57cc3 dc25e385-332e-4feb-8ea2-78a7036ac8d1/aae4e6b4-79a4-47d1-9ef9-d1dddc5f923b dc25e385-332e-4feb-8ea2-78a7036ac8d1/f3cd6cd6-9b85-4f66-9fc9-c0e96f10d6ff dc25e385-332e-4feb-8ea2-78a7036ac8d1/f09dd0ae-7e4e-45bb-baf1-552513459e19 dc25e385-332e-4feb-8ea2-78a7036ac8d1/cd36bf4f-2754-4471-94c9-d286eab7433f dc25e385-332e-4feb-8ea2-78a7036ac8d1/528258e7-a51f-4025-8437-a19832f1c34b 
dc25e385-332e-4feb-8ea2-78a7036ac8d1/27148909-c5ed-4159-87f6-a817fb38073e dc25e385-332e-4feb-8ea2-78a7036ac8d1/8ef0917b-80f7-428b-8ab3-4fff25424e58 dc25e385-332e-4feb-8ea2-78a7036ac8d1/445c5052-6e5e-4b7d-b49c-7637578075c4 dc25e385-332e-4feb-8ea2-78a7036ac8d1/1285e1f4-fb70-4f23-9278-71c352f249d1 dc25e385-332e-4feb-8ea2-78a7036ac8d1/e233a812-9de7-4a32-b962-142539e3a4b7 dc25e385-332e-4feb-8ea2-78a7036ac8d1/faa9869c-6f31-441c-a601-2dc11ac2aa17 dc25e385-332e-4feb-8ea2-78a7036ac8d1/d34d117f-46e0-4141-85b6-b092381e064e dc25e385-332e-4feb-8ea2-78a7036ac8d1/01c5b0d3-042c-4215-8e8f-184519c5fcb5 dc25e385-332e-4feb-8ea2-78a7036ac8d1/f9598b6d-a0c0-4f41-9b02-ba344443665c dc25e385-332e-4feb-8ea2-78a7036ac8d1/1e9f5266-4298-4aa1-9c3a-608b8ddebf12 dc25e385-332e-4feb-8ea2-78a7036ac8d1/a904a94d-9f7d-428c-860f-ac13f731d36d dc25e385-332e-4feb-8ea2-78a7036ac8d1/932d8557-5182-4718-8f75-93b87c432537 dc25e385-332e-4feb-8ea2-78a7036ac8d1/fc21d80c-4ad4-4177-8c18-2802d72bc8d0 dc25e385-332e-4feb-8ea2-78a7036ac8d1/1673589a-0b47-47ad-8501-f1f23fd4ea61 dc25e385-332e-4feb-8ea2-78a7036ac8d1/ccb03021-50bc-48d1-ad1c-398a6c500d5e dc25e385-332e-4feb-8ea2-78a7036ac8d1/43dabf02-7ad6-445e-863e-c02c75e422b4 dc25e385-332e-4feb-8ea2-78a7036ac8d1/6e95dc39-1888-42b9-902f-98ca999f21dd dc25e385-332e-4feb-8ea2-78a7036ac8d1/41e422fa-c765-46f5-9d73-9db73b109213 dc25e385-332e-4feb-8ea2-78a7036ac8d1/4d690bff-7e78-4a64-b773-a40e7ecc7eff dc25e385-332e-4feb-8ea2-78a7036ac8d1/7d578e19-4b9a-4f57-8d09-384a607199ab dc25e385-332e-4feb-8ea2-78a7036ac8d1/b1f66a72-3332-4759-9ef4-9bbdc2112e69 dc25e385-332e-4feb-8ea2-78a7036ac8d1/62fa7c35-5be0-4498-b24f-42fca2629754 dc25e385-332e-4feb-8ea2-78a7036ac8d1/73f92179-a02e-4803-934d-8454c12e8695 dc25e385-332e-4feb-8ea2-78a7036ac8d1/33ee6c71-557b-4853-b873-b008afdbb69f dc25e385-332e-4feb-8ea2-78a7036ac8d1/8a186eba-1037-4438-9ee1-6b23f6c97a45 dc25e385-332e-4feb-8ea2-78a7036ac8d1/bafe8fc0-0098-4eb4-b801-81d801d7121c dc25e385-332e-4feb-8ea2-78a7036ac8d1/34e8045c-45a9-4089-bc0c-f7d752564c5e 
dc25e385-332e-4feb-8ea2-78a7036ac8d1/abda7a11-ca27-4694-8df6-77264f8eddf6 dc25e385-332e-4feb-8ea2-78a7036ac8d1/37df4990-acce-44a7-81ad-6ba340eef483 dc25e385-332e-4feb-8ea2-78a7036ac8d1/666295b9-36cd-4709-ba1d-2c278bfc2053 dc25e38


If we match this with output from process/ps_auxwww:

$ awk '/lvm/ {print "pid="$2, "start="$9}' process/ps_auxwww
pid=9547 start=12:55
pid=9548 start=12:55
pid=10409 start=Feb24
pid=10410 start=Feb24
pid=10834 start=Feb24
pid=10839 start=Feb24
pid=12187 start=Feb24
pid=12188 start=Feb24
pid=15427 start=Feb24
pid=15428 start=Feb24
pid=19033 start=Feb24
pid=19039 start=Feb24
pid=30424 start=Feb24
pid=30425 start=Feb24
pid=37486 start=Feb24
pid=37487 start=Feb24
pid=42247 start=Feb19
pid=42248 start=Feb19
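
The awk one-liner above can be wrapped into a reusable check. A minimal sketch, assuming the standard `ps auxwww` column layout, where field 9 (START) shows HH:MM for processes started today and an abbreviated date (e.g. Feb19) for processes started on a previous day:

```shell
#!/bin/sh
# stale_lvm: read `ps auxwww`-style output on stdin and report lvm
# processes whose START field is a date rather than a time, i.e.
# processes started on a previous day.
stale_lvm() {
    awk '/lvm/ && $9 !~ /:/ {print "pid=" $2, "start=" $9}'
}

# Usage against a captured sosreport file:
#   stale_lvm < process/ps_auxwww
```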


These processes have been stuck since Feb 19:

$ egrep 42248.+lvm process/ps_alxwww 

4     0 42248 42247   0 -20  48404 27448 semtim S<   ?          0:01 /sbin/lvm lvchange --config  devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c5148400178|', 'r|.*|' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 }  --autobackup n --available n dc25e385-332e-4feb-8ea2-78a7036ac8d1/26b7b6ba-20e3-449d-bbbb-179dbdb0d214 dc25e385-332e-4feb-8ea2-78a7036ac8d1/87c69927-a9a4-4a8d-9701-0203d9f7bb95 dc25e385-332e-4feb-8ea2-78a7036ac8d1/429951f3-50fd-43f4-ba77-1ee4a0a6f54f dc25e385-332e-4feb-8ea2-78a7036ac8d1/e374a525-6aed-4f23-9716-2531fa81b401 dc25e385-332e-4feb-8ea2-78a7036ac8d1/2e772c45-3af9-4553-bb19-e4ca1e154bb2 dc25e385-332e-4feb-8ea2-78a7036ac8d1/3e92d7eb-e3f2-45d7-a37b-5389ba872f55 dc25e385-332e-4feb-8ea2-78a7036ac8d1/47e12eda-a416-4fc7-8641-24d6cdcb1cd8 dc25e385-332e-4feb-8ea2-78a7036ac8d1/ed22a4f8-1698-4828-b200-48da3916fdc6 dc25e385-332e-4feb-8ea2-78a7036ac8d1/ecc56472-4271-4633-9e86-6f0074befb2e dc25e385-332e-4feb-8ea2-78a7036ac8d1/0010acf7-294b-4767-a470-929fff5f492b dc25e385-332e-4feb-8ea2-78a7036ac8d1/85165b4d-5ae4-4b0a-b052-f2bbdf02c983 dc25e385-332e-4feb-8ea2-78a7036ac8d1/c8968cb1-e454-4b41-ad5d-400b40f8125d dc25e385-332e-4feb-8ea2-78a7036ac8d1/eec01f1b-aa7b-46ec-adcf-727967b09bbb dc25e385-332e-4feb-8ea2-78a7036ac8d1/d5c5e1ce-15e0-4b98-8937-1b17f6b57cc3 dc25e385-332e-4feb-8ea2-78a7036ac8d1/aae4e6b4-79a4-47d1-9ef9-d1dddc5f923b dc25e385-332e-4feb-8ea2-78a7036ac8d1/f3cd6cd6-9b85-4f66-9fc9-c0e96f10d6ff dc25e385-332e-4feb-8ea2-78a7036ac8d1/f09dd0ae-7e4e-45bb-baf1-552513459e19 dc25e385-332e-4feb-8ea2-78a7036ac8d1/cd36bf4f-2754-4471-94c9-d286eab7433f dc25e385-332e-4feb-8ea2-78a7036ac8d1/528258e7-a51f-4025-8437-a19832f1c34b 
dc25e385-332e-4feb-8ea2-78a7036ac8d1/27148909-c5ed-4159-87f6-a817fb38073e dc25e385-332e-4feb-8ea2-78a7036ac8d1/8ef0917b-80f7-428b-8ab3-4fff25424e58 dc25e385-332e-4feb-8ea2-78a7036ac8d1/445c5052-6e5e-4b7d-b49c-7637578075c4 dc25e385-332e-4feb-8ea2-78a7036ac8d1/1285e1f4-fb70-4f23-9278-71c352f249d1 dc25e385-332e-4feb-8ea2-78a7036ac8d1/e233a812-9de7-4a32-b962-142539e3a4b7 dc25e385-332e-4feb-8ea2-78a7036ac8d1/faa9869c-6f31-441c-a601-2dc11ac2aa17 dc25e385-332e-4feb-8ea2-78a7036ac8d1/d34d117f-46e0-4141-85b6-b092381e064e dc25e385-332e-4feb-8ea2-78a7036ac8d1/01c5b0d3-042c-4215-8e8f-184519c5fcb5 dc25e385-332e-4feb-8ea2-78a7036ac8d1/f9598b6d-a0c0-4f41-9b02-ba344443665c dc25e385-332e-4feb-8ea2-78a7036ac8d1/1e9f5266-4298-4aa1-9c3a-608b8ddebf12 dc25e385-332e-4feb-8ea2-78a7036ac8d1/a904a94d-9f7d-428c-860f-ac13f731d36d dc25e385-332e-4feb-8ea2-78a7036ac8d1/932d8557-5182-4718-8f75-93b87c432537 dc25e385-332e-4feb-8ea2-78a7036ac8d1/fc21d80c-4ad4-4177-8c18-2802d72bc8d0 dc25e385-332e-4feb-8ea2-78a7036ac8d1/1673589a-0b47-47ad-8501-f1f23fd4ea61 dc25e385-332e-4feb-8ea2-78a7036ac8d1/ccb03021-50bc-48d1-ad1c-398a6c500d5e dc25e385-332e-4feb-8ea2-78a7036ac8d1/43dabf02-7ad6-445e-863e-c02c75e422b4 dc25e385-332e-4feb-8ea2-78a7036ac8d1/6e95dc39-1888-42b9-902f-98ca999f21dd dc25e385-332e-4feb-8ea2-78a7036ac8d1/41e422fa-c765-46f5-9d73-9db73b109213 dc25e385-332e-4feb-8ea2-78a7036ac8d1/4d690bff-7e78-4a64-b773-a40e7ecc7eff dc25e385-332e-4feb-8ea2-78a7036ac8d1/7d578e19-4b9a-4f57-8d09-384a607199ab dc25e385-332e-4feb-8ea2-78a7036ac8d1/b1f66a72-3332-4759-9ef4-9bbdc2112e69 dc25e385-332e-4feb-8ea2-78a7036ac8d1/62fa7c35-5be0-4498-b24f-42fca2629754 dc25e385-332e-4feb-8ea2-78a7036ac8d1/73f92179-a02e-4803-934d-8454c12e8695 dc25e385-332e-4feb-8ea2-78a7036ac8d1/33ee6c71-557b-4853-b873-b008afdbb69f dc25e385-332e-4feb-8ea2-78a7036ac8d1/8a186eba-1037-4438-9ee1-6b23f6c97a45 dc25e385-332e-4feb-8ea2-78a7036ac8d1/bafe8fc0-0098-4eb4-b801-81d801d7121c dc25e385-332e-4feb-8ea2-78a7036ac8d1/34e8045c-45a9-4089-bc0c-f7d752564c5e 
dc25e385-332e-4feb-8ea2-78a7036ac8d1/abda7a11-ca27-4694-8df6-77264f8eddf6 dc25e385-332e-4feb-8ea2-78a7036ac8d1/37df4990-acce-44a7-81ad-6ba340eef483 dc25e385-332e-4feb-8ea2-78a7036ac8d1/666295b9-36cd-4709-ba1d-2c278bfc2053 dc25e38

$ egrep 42247.+lvm process/ps_alxwww 

4     0 42248 42247   0 -20  48404 27448 semtim S<   ?          0:01 /sbin/lvm lvchange --config  devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c5148400178|', 'r|.*|' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 }  --autobackup n --available n dc25e385-332e-4feb-8ea2-78a7036ac8d1/26b7b6ba-20e3-449d-bbbb-179dbdb0d214 dc25e385-332e-4feb-8ea2-78a7036ac8d1/87c69927-a9a4-4a8d-9701-0203d9f7bb95 dc25e385-332e-4feb-8ea2-78a7036ac8d1/429951f3-50fd-43f4-ba77-1ee4a0a6f54f dc25e385-332e-4feb-8ea2-78a7036ac8d1/e374a525-6aed-4f23-9716-2531fa81b401 dc25e385-332e-4feb-8ea2-78a7036ac8d1/2e772c45-3af9-4553-bb19-e4ca1e154bb2 dc25e385-332e-4feb-8ea2-78a7036ac8d1/3e92d7eb-e3f2-45d7-a37b-5389ba872f55 dc25e385-332e-4feb-8ea2-78a7036ac8d1/47e12eda-a416-4fc7-8641-24d6cdcb1cd8 dc25e385-332e-4feb-8ea2-78a7036ac8d1/ed22a4f8-1698-4828-b200-48da3916fdc6 dc25e385-332e-4feb-8ea2-78a7036ac8d1/ecc56472-4271-4633-9e86-6f0074befb2e dc25e385-332e-4feb-8ea2-78a7036ac8d1/0010acf7-294b-4767-a470-929fff5f492b dc25e385-332e-4feb-8ea2-78a7036ac8d1/85165b4d-5ae4-4b0a-b052-f2bbdf02c983 dc25e385-332e-4feb-8ea2-78a7036ac8d1/c8968cb1-e454-4b41-ad5d-400b40f8125d dc25e385-332e-4feb-8ea2-78a7036ac8d1/eec01f1b-aa7b-46ec-adcf-727967b09bbb dc25e385-332e-4feb-8ea2-78a7036ac8d1/d5c5e1ce-15e0-4b98-8937-1b17f6b57cc3 dc25e385-332e-4feb-8ea2-78a7036ac8d1/aae4e6b4-79a4-47d1-9ef9-d1dddc5f923b dc25e385-332e-4feb-8ea2-78a7036ac8d1/f3cd6cd6-9b85-4f66-9fc9-c0e96f10d6ff dc25e385-332e-4feb-8ea2-78a7036ac8d1/f09dd0ae-7e4e-45bb-baf1-552513459e19 dc25e385-332e-4feb-8ea2-78a7036ac8d1/cd36bf4f-2754-4471-94c9-d286eab7433f dc25e385-332e-4feb-8ea2-78a7036ac8d1/528258e7-a51f-4025-8437-a19832f1c34b 
dc25e385-332e-4feb-8ea2-78a7036ac8d1/27148909-c5ed-4159-87f6-a817fb38073e dc25e385-332e-4feb-8ea2-78a7036ac8d1/8ef0917b-80f7-428b-8ab3-4fff25424e58 dc25e385-332e-4feb-8ea2-78a7036ac8d1/445c5052-6e5e-4b7d-b49c-7637578075c4 dc25e385-332e-4feb-8ea2-78a7036ac8d1/1285e1f4-fb70-4f23-9278-71c352f249d1 dc25e385-332e-4feb-8ea2-78a7036ac8d1/e233a812-9de7-4a32-b962-142539e3a4b7 dc25e385-332e-4feb-8ea2-78a7036ac8d1/faa9869c-6f31-441c-a601-2dc11ac2aa17 dc25e385-332e-4feb-8ea2-78a7036ac8d1/d34d117f-46e0-4141-85b6-b092381e064e dc25e385-332e-4feb-8ea2-78a7036ac8d1/01c5b0d3-042c-4215-8e8f-184519c5fcb5 dc25e385-332e-4feb-8ea2-78a7036ac8d1/f9598b6d-a0c0-4f41-9b02-ba344443665c dc25e385-332e-4feb-8ea2-78a7036ac8d1/1e9f5266-4298-4aa1-9c3a-608b8ddebf12 dc25e385-332e-4feb-8ea2-78a7036ac8d1/a904a94d-9f7d-428c-860f-ac13f731d36d dc25e385-332e-4feb-8ea2-78a7036ac8d1/932d8557-5182-4718-8f75-93b87c432537 dc25e385-332e-4feb-8ea2-78a7036ac8d1/fc21d80c-4ad4-4177-8c18-2802d72bc8d0 dc25e385-332e-4feb-8ea2-78a7036ac8d1/1673589a-0b47-47ad-8501-f1f23fd4ea61 dc25e385-332e-4feb-8ea2-78a7036ac8d1/ccb03021-50bc-48d1-ad1c-398a6c500d5e dc25e385-332e-4feb-8ea2-78a7036ac8d1/43dabf02-7ad6-445e-863e-c02c75e422b4 dc25e385-332e-4feb-8ea2-78a7036ac8d1/6e95dc39-1888-42b9-902f-98ca999f21dd dc25e385-332e-4feb-8ea2-78a7036ac8d1/41e422fa-c765-46f5-9d73-9db73b109213 dc25e385-332e-4feb-8ea2-78a7036ac8d1/4d690bff-7e78-4a64-b773-a40e7ecc7eff dc25e385-332e-4feb-8ea2-78a7036ac8d1/7d578e19-4b9a-4f57-8d09-384a607199ab dc25e385-332e-4feb-8ea2-78a7036ac8d1/b1f66a72-3332-4759-9ef4-9bbdc2112e69 dc25e385-332e-4feb-8ea2-78a7036ac8d1/62fa7c35-5be0-4498-b24f-42fca2629754 dc25e385-332e-4feb-8ea2-78a7036ac8d1/73f92179-a02e-4803-934d-8454c12e8695 dc25e385-332e-4feb-8ea2-78a7036ac8d1/33ee6c71-557b-4853-b873-b008afdbb69f dc25e385-332e-4feb-8ea2-78a7036ac8d1/8a186eba-1037-4438-9ee1-6b23f6c97a45 dc25e385-332e-4feb-8ea2-78a7036ac8d1/bafe8fc0-0098-4eb4-b801-81d801d7121c dc25e385-332e-4feb-8ea2-78a7036ac8d1/34e8045c-45a9-4089-bc0c-f7d752564c5e 
dc25e385-332e-4feb-8ea2-78a7036ac8d1/abda7a11-ca27-4694-8df6-77264f8eddf6 dc25e385-332e-4feb-8ea2-78a7036ac8d1/37df4990-acce-44a7-81ad-6ba340eef483 dc25e385-332e-4feb-8ea2-78a7036ac8d1/666295b9-36cd-4709-ba1d-2c278bfc2053 dc25e38


These commands have been stuck since Feb 24:

$ egrep 10409.+lvm process/ps_alxwww 

4     0 10409     1   0 -20  72072  2512 poll_s S<   ?          0:00 /usr/bin/sudo -n /sbin/lvm lvextend --config  devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c5148400178|', 'r|.*|' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 }  --autobackup n --size 12288m dc25e385-332e-4feb-8ea2-78a7036ac8d1/62fa7c35-5be0-4498-b24f-42fca2629754

$ egrep 10410.+lvm process/ps_alxwww 

4     0 10410 10409   0 -20  39224 16236 flock_ S<   ?          0:00 /sbin/lvm lvextend --config  devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c5148400178|', 'r|.*|' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 }  --autobackup n --size 12288m dc25e385-332e-4feb-8ea2-78a7036ac8d1/62fa7c35-5be0-4498-b24f-42fca2629754

$ egrep 10834.+lvm process/ps_alxwww 

4     0 10834     1   0 -20  72072  2516 poll_s S<   ?          0:00 /usr/bin/sudo -n /sbin/lvm pvs --config  devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c514840001b|/dev/mapper/3514f0c514840001c|/dev/mapper/3514f0c514840001d|/dev/mapper/3514f0c514840001e|/dev/mapper/3514f0c5148400023|/dev/mapper/3514f0c5148400043|/dev/mapper/3514f0c51484000df|/dev/mapper/3514f0c51484000e0|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c51484000e3|/dev/mapper/3514f0c51484000e4|/dev/mapper/3514f0c51484000e5|/dev/mapper/3514f0c51484000e6|/dev/mapper/3514f0c5148400177|/dev/mapper/3514f0c5148400178|/dev/mapper/3514f0c5148400179|/dev/mapper/3514f0c514840017a|/dev/mapper/3514f0c51484001a0|/dev/mapper/3514f0c51484001a1|/dev/mapper/3514f0c51484001a2|/dev/mapper/3514f0c51484001a3|/dev/mapper/3514f0c51484001a4|/dev/mapper/3514f0c51484001a5|/dev/mapper/3514f0c51484001a6|/dev/mapper/3514f0c51484001a7|/dev/mapper/3514f0c51484001a8|/dev/mapper/3514f0c51484001a9|/dev/mapper/3514f0c51484001aa|/dev/mapper/3514f0c51484001bb|/dev/mapper/3514f0c51484001bc|/dev/mapper/3514f0c51484001bd|/dev/mapper/3514f0c51484001be|/dev/mapper/3514f0c51484001bf|/dev/mapper/3514f0c51484001c0|/dev/mapper/3514f0c51484001c1|/dev/mapper/3514f0c51484001c2|/dev/mapper/3514f0c51484001c3|/dev/mapper/3514f0c51484001c4|/dev/mapper/3514f0c51484001c5|', 'r|.*|' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 }  --noheadings --units b --nosuffix --separator | --ignoreskippedcluster -o uuid,name,size,vg_name,vg_uuid,pe_start,pe_count,pe_alloc_count,mda_count,dev_size /dev/mapper/3514f0c51484000e2

$ egrep 10839.+lvm process/ps_alxwww 
4     0 10839 10834   0 -20  44756 22052 flock_ S<   ?          0:00 /sbin/lvm pvs --config  devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ 'a|/dev/mapper/3514f0c514840001b|/dev/mapper/3514f0c514840001c|/dev/mapper/3514f0c514840001d|/dev/mapper/3514f0c514840001e|/dev/mapper/3514f0c5148400023|/dev/mapper/3514f0c5148400043|/dev/mapper/3514f0c51484000df|/dev/mapper/3514f0c51484000e0|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c51484000e3|/dev/mapper/3514f0c51484000e4|/dev/mapper/3514f0c51484000e5|/dev/mapper/3514f0c51484000e6|/dev/mapper/3514f0c5148400177|/dev/mapper/3514f0c5148400178|/dev/mapper/3514f0c5148400179|/dev/mapper/3514f0c514840017a|/dev/mapper/3514f0c51484001a0|/dev/mapper/3514f0c51484001a1|/dev/mapper/3514f0c51484001a2|/dev/mapper/3514f0c51484001a3|/dev/mapper/3514f0c51484001a4|/dev/mapper/3514f0c51484001a5|/dev/mapper/3514f0c51484001a6|/dev/mapper/3514f0c51484001a7|/dev/mapper/3514f0c51484001a8|/dev/mapper/3514f0c51484001a9|/dev/mapper/3514f0c51484001aa|/dev/mapper/3514f0c51484001bb|/dev/mapper/3514f0c51484001bc|/dev/mapper/3514f0c51484001bd|/dev/mapper/3514f0c51484001be|/dev/mapper/3514f0c51484001bf|/dev/mapper/3514f0c51484001c0|/dev/mapper/3514f0c51484001c1|/dev/mapper/3514f0c51484001c2|/dev/mapper/3514f0c51484001c3|/dev/mapper/3514f0c51484001c4|/dev/mapper/3514f0c51484001c5|', 'r|.*|' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 }  --noheadings --units b --nosuffix --separator | --ignoreskippedcluster -o uuid,name,size,vg_name,vg_uuid,pe_start,pe_count,pe_alloc_count,mda_count,dev_size /dev/mapper/3514f0c51484000e2

$ egrep 12187.+lvm process/ps_alxwww 

0     0 12187     1  20   0   4124   672 wait   S    ?          0:00 timeout 300s /sbin/lvm vgs -v -o +tags
4     0 12188 12187  20   0  40964 18232 flock_ S    ?          0:00 /sbin/lvm vgs -v -o +tags

This was run by support, trying to understand why lvm commands have been hanging since Feb 19.

$ egrep 15427.+lvm process/ps_alxwww 
0     0 15427     1  20   0   4124   672 wait   S    ?          0:00 timeout 300s /sbin/lvm vgs -v -o +tags
4     0 15428 15427  20   0  40964 18232 flock_ S    ?          0:00 /sbin/lvm vgs -v -o +tags

This was run by support, trying to understand why lvm commands have been hanging since Feb 19.


$ egrep 19033.+lvm process/ps_alxwww 

4     0 19033     1  20   0  10724  1480 wait   Ss   ?          0:00 bash -c ?VERSION=`/bin/rpm -q --qf '[%{VERSION}]' sos | /bin/sed 's/\.//'`;?if [ $VERSION -ge 32 ]; then?    /usr/sbin/sosreport  --batch --all-logs         -o logs,libvirt,vdsm,general,networking,hardware,process,yum,filesys,devicemapper,selinux,kernel,memory,rpm,processor,pci,md,block,scsi,multipath,systemd,sanlock,lvm2,ovirt_hosted_engine?elif [ $VERSION -ge 30 ]; then?    /usr/sbin/sosreport  --batch -k logs.all_logs=True         -o logs,libvirt,vdsm,general,networking,hardware,process,yum,filesys,devicemapper,selinux,kernel,memory,rpm,processor,pci,md,block,scsi,multipath,systemd,sanlock,lvm2?elif [ $VERSION -ge 22 ]; then?    /usr/sbin/sosreport  --batch -k general.all_logs=True         -o libvirt,vdsm,general,networking,hardware,process,yum,filesys,devicemapper,selinux,kernel,memory,rpm?elif [ $VERSION -ge 17 ]; then?    /usr/sbin/sosreport  --no-progressbar -k general.all_logs=True         -o vdsm,general,networking,hardware,process,yum,filesys?else?    /bin/echo No valid version of sosreport found. 1>&2?    exit 1?fi?
4     0 19039 19033  20   0 245416 32748 pipe_w S    ?          0:13 /usr/bin/python /usr/sbin/sosreport --batch --all-logs -o logs,libvirt,vdsm,general,networking,hardware,process,yum,filesys,devicemapper,selinux,kernel,memory,rpm,processor,pci,md,block,scsi,multipath,systemd,sanlock,lvm2,ovirt_hosted_engine

This is the sosreport command, stuck since Feb 24.

$ egrep 30424.+lvm process/ps_alxwww 
0     0 30424 19039  20   0   4124   672 wait   S    ?          0:00 timeout 300s /sbin/lvm vgs -v -o +tags
4     0 30425 30424  20   0  40964 18300 flock_ S    ?          0:00 /sbin/lvm vgs -v -o +tags

More commands run by support...

$ egrep 37486.+lvm process/ps_alxwww 
0     0 37486     1  20   0   4124   672 wait   S    ?          0:00 timeout 300s /sbin/lvm vgs -v -o +tags
4     0 37487 37486  20   0  40964 18232 flock_ S    ?          0:00 /sbin/lvm vgs -v -o +tags

So what I see here is:

- 2 lvm commands stuck since Feb 19
- more lvm commands stuck since Feb 24

Smells like a deadlock in lvm to me.

David, do we need more data to assign this bug to LVM?

Comment 4 Nir Soffer 2017-03-07 08:30:09 UTC
Greg, this host was booted on Feb 19.

We see two stuck commands since Feb 19 - these commands are run during vdsm startup
to deactivate logical volumes.

We see that the host was active on Feb 24 - it even became the SPM.

Now, if vdsm startup was stuck on Feb 19, how did the host become the SPM on Feb 24?
This means vdsm must have been restarted.

$ grep 'I am' vdsm.log*

vdsm.log:MainThread::INFO::2017-02-25 13:05:01,799::vdsm::132::vds::(run) (PID: 33208) I am the actual vdsm 4.16.32-1.el6ev gdcphx0027.statestr.com (2.6.32-573.12.1.el6.x86_64)

vdsm.log.1:MainThread::INFO::2017-02-25 12:55:04,061::vdsm::132::vds::(run) (PID: 9276) I am the actual vdsm 4.16.32-1.el6ev gdcphx0027.statestr.com (2.6.32-573.12.1.el6.x86_64)

vdsm.log.1:MainThread::INFO::2017-02-25 12:59:19,572::vdsm::132::vds::(run) (PID: 14184) I am the actual vdsm 4.16.32-1.el6ev gdcphx0027.statestr.com (2.6.32-573.12.1.el6.x86_64)

vdsm.log.86:MainThread::INFO::2017-02-19 15:04:10,928::vdsm::132::vds::(run) (PID: 45457) I am the actual vdsm 4.16.32-1.el6ev gdcphx0027.statestr.com (2.6.32-573.12.1.el6.x86_64)

vdsm.log.87:MainThread::INFO::2017-02-19 14:58:10,688::vdsm::132::vds::(run) (PID: 40842) I am the actual vdsm 4.16.32-1.el6ev gdcphx0027.statestr.com (2.6.32-573.12.1.el6.x86_64)

And indeed we see that vdsm was restarted twice on Feb 19.

In sanlock.log we see that sanlock was started on:

2017-02-19 14:57:42+0000 936 [35696]: sanlock daemon started 2.8 host 4567d320-02bd-469a-90e1-62ca27c20548.gdcphx0027

sanlock usually starts during boot, so I assume the machine was rebooted on Feb 19
at 14:57.

In vdsm.log.86 we see the lvchange command:

storageRefresh::DEBUG::2017-02-19 15:00:33,594::lvm::291::Storage.Misc.excCmd::(cmd) /usr/bin/sudo -n /sbin/lvm lvchange --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ '\''a|/dev/mapper/3514f0c51484000e1|/dev/mapper/3514f0c51484000e2|/dev/mapper/3514f0c5148400178|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --autobackup n --available n dc25e385-332e-4feb-8ea2-78a7036ac8d1/26b7b6ba-20e3-449d-bbbb-179dbdb0d214 dc25e385-332e-4feb-8ea2-78a7036ac8d1/87c69927-a9a4-4a8d-9701-0203d9f7bb95 dc25e385-332e-4feb-8ea2-78a7036ac8d1/429951f3-50fd-43f4-ba77-1ee4a0a6f54f dc25e385-332e-4feb-8ea2-78a7036ac8d1/e374a525-6aed-4f23-9716-2531fa81b401 dc25e385-332e-4feb-8ea2-78a7036ac8d1/2e772c45-3af9-4553-bb19-e4ca1e154bb2 dc25e385-332e-4feb-8ea2-78a7036ac8d1/3e92d7eb-e3f2-45d7-a37b-5389ba872f55 dc25e385-332e-4feb-8ea2-78a7036ac8d1/47e12eda-a416-4fc7-8641-24d6cdcb1cd8 dc25e385-332e-4feb-8ea2-78a7036ac8d1/ed22a4f8-1698-4828-b200-48da3916fdc6 dc25e385-332e-4feb-8ea2-78a7036ac8d1/ecc56472-4271-4633-9e86-6f0074befb2e dc25e385-332e-4feb-8ea2-78a7036ac8d1/0010acf7-294b-4767-a470-929fff5f492b dc25e385-332e-4feb-8ea2-78a7036ac8d1/85165b4d-5ae4-4b0a-b052-f2bbdf02c983 dc25e385-332e-4feb-8ea2-78a7036ac8d1/c8968cb1-e454-4b41-ad5d-400b40f8125d dc25e385-332e-4feb-8ea2-78a7036ac8d1/eec01f1b-aa7b-46ec-adcf-727967b09bbb dc25e385-332e-4feb-8ea2-78a7036ac8d1/d5c5e1ce-15e0-4b98-8937-1b17f6b57cc3 dc25e385-332e-4feb-8ea2-78a7036ac8d1/aae4e6b4-79a4-47d1-9ef9-d1dddc5f923b dc25e385-332e-4feb-8ea2-78a7036ac8d1/f3cd6cd6-9b85-4f66-9fc9-c0e96f10d6ff dc25e385-332e-4feb-8ea2-78a7036ac8d1/f09dd0ae-7e4e-45bb-baf1-552513459e19 dc25e385-332e-4feb-8ea2-78a7036ac8d1/cd36bf4f-2754-4471-94c9-d286eab7433f dc25e385-332e-4feb-8ea2-78a7036ac8d1/528258e7-a51f-4025-8437-a19832f1c34b 
dc25e385-332e-4feb-8ea2-78a7036ac8d1/27148909-c5ed-4159-87f6-a817fb38073e dc25e385-332e-4feb-8ea2-78a7036ac8d1/8ef0917b-80f7-428b-8ab3-4fff25424e58 dc25e385-332e-4feb-8ea2-78a7036ac8d1/445c5052-6e5e-4b7d-b49c-7637578075c4 dc25e385-332e-4feb-8ea2-78a7036ac8d1/1285e1f4-fb70-4f23-9278-71c352f249d1 dc25e385-332e-4feb-8ea2-78a7036ac8d1/e233a812-9de7-4a32-b962-142539e3a4b7 dc25e385-332e-4feb-8ea2-78a7036ac8d1/faa9869c-6f31-441c-a601-2dc11ac2aa17 dc25e385-332e-4feb-8ea2-78a7036ac8d1/d34d117f-46e0-4141-85b6-b092381e064e dc25e385-332e-4feb-8ea2-78a7036ac8d1/01c5b0d3-042c-4215-8e8f-184519c5fcb5 dc25e385-332e-4feb-8ea2-78a7036ac8d1/f9598b6d-a0c0-4f41-9b02-ba344443665c dc25e385-332e-4feb-8ea2-78a7036ac8d1/1e9f5266-4298-4aa1-9c3a-608b8ddebf12 dc25e385-332e-4feb-8ea2-78a7036ac8d1/a904a94d-9f7d-428c-860f-ac13f731d36d dc25e385-332e-4feb-8ea2-78a7036ac8d1/932d8557-5182-4718-8f75-93b87c432537 dc25e385-332e-4feb-8ea2-78a7036ac8d1/fc21d80c-4ad4-4177-8c18-2802d72bc8d0 dc25e385-332e-4feb-8ea2-78a7036ac8d1/1673589a-0b47-47ad-8501-f1f23fd4ea61 dc25e385-332e-4feb-8ea2-78a7036ac8d1/ccb03021-50bc-48d1-ad1c-398a6c500d5e dc25e385-332e-4feb-8ea2-78a7036ac8d1/43dabf02-7ad6-445e-863e-c02c75e422b4 dc25e385-332e-4feb-8ea2-78a7036ac8d1/6e95dc39-1888-42b9-902f-98ca999f21dd dc25e385-332e-4feb-8ea2-78a7036ac8d1/41e422fa-c765-46f5-9d73-9db73b109213 dc25e385-332e-4feb-8ea2-78a7036ac8d1/4d690bff-7e78-4a64-b773-a40e7ecc7eff dc25e385-332e-4feb-8ea2-78a7036ac8d1/7d578e19-4b9a-4f57-8d09-384a607199ab dc25e385-332e-4feb-8ea2-78a7036ac8d1/b1f66a72-3332-4759-9ef4-9bbdc2112e69 dc25e385-332e-4feb-8ea2-78a7036ac8d1/62fa7c35-5be0-4498-b24f-42fca2629754 dc25e385-332e-4feb-8ea2-78a7036ac8d1/73f92179-a02e-4803-934d-8454c12e8695 dc25e385-332e-4feb-8ea2-78a7036ac8d1/33ee6c71-557b-4853-b873-b008afdbb69f dc25e385-332e-4feb-8ea2-78a7036ac8d1/8a186eba-1037-4438-9ee1-6b23f6c97a45 dc25e385-332e-4feb-8ea2-78a7036ac8d1/bafe8fc0-0098-4eb4-b801-81d801d7121c dc25e385-332e-4feb-8ea2-78a7036ac8d1/34e8045c-45a9-4089-bc0c-f7d752564c5e 
dc25e385-332e-4feb-8ea2-78a7036ac8d1/abda7a11-ca27-4694-8df6-77264f8eddf6 dc25e385-332e-4feb-8ea2-78a7036ac8d1/37df4990-acce-44a7-81ad-6ba340eef483 dc25e385-332e-4feb-8ea2-78a7036ac8d1/666295b9-36cd-4709-ba1d-2c278bfc2053 dc25e385-332e-4feb-8ea2-78a7036ac8d1/2e38684c-3d70-4027-b10b-3b35d89b76ad dc25e385-332e-4feb-8ea2-78a7036ac8d1/4326c752-32d6-42ac-817f-2ddceb4137a5 dc25e385-332e-4feb-8ea2-78a7036ac8d1/51472f0e-54ee-4e3d-8419-3a17e0a32fcc dc25e385-332e-4feb-8ea2-78a7036ac8d1/648e0f35-820c-469e-900b-5ee8adcf32d8 dc25e385-332e-4feb-8ea2-78a7036ac8d1/d553debb-dfa0-4739-bddb-7abac9122bc2 dc25e385-332e-4feb-8ea2-78a7036ac8d1/58977c32-a861-4615-87b5-eebab5fb1c1b dc25e385-332e-4feb-8ea2-78a7036ac8d1/5a8bdf22-4590-4d0f-9736-f32e188ac898 dc25e385-332e-4feb-8ea2-78a7036ac8d1/381942ce-41af-42fe-a87b-9c7d785da54d dc25e385-332e-4feb-8ea2-78a7036ac8d1/0b4a02e8-bc93-492c-845f-02b6bda559eb dc25e385-332e-4feb-8ea2-78a7036ac8d1/0cd2cdba-b2d4-4e3a-8661-015afcbfa214 dc25e385-332e-4feb-8ea2-78a7036ac8d1/32374523-8ab9-4075-9375-68c537f6fa28 dc25e385-332e-4feb-8ea2-78a7036ac8d1/67cc7f87-5f71-4969-bfe4-65b3d93541b6 dc25e385-332e-4feb-8ea2-78a7036ac8d1/c640fd9e-3d32-49b0-9228-b6cc0fd4da53 dc25e385-332e-4feb-8ea2-78a7036ac8d1/b18964cd-d7c6-48cd-8beb-c41142694f5e dc25e385-332e-4feb-8ea2-78a7036ac8d1/4d5c054b-4b3a-406a-b772-37deb5ff515b dc25e385-332e-4feb-8ea2-78a7036ac8d1/a37bf01e-1c95-474c-ad4a-523c8b14dbe8 dc25e385-332e-4feb-8ea2-78a7036ac8d1/d2cc2f42-e24a-4aa2-8c0c-6b30d0832fe7 (cwd None)

This seems to be the same stuck command we saw in the previous comment.

In the next log, vdsm.log.86, we see that vdsm was restarted:

MainThread::DEBUG::2017-02-19 15:03:38,306::vdsm::59::vds::(sigtermHandler) Received signal 15

MainThread::INFO::2017-02-19 15:04:10,928::vdsm::132::vds::(run) (PID: 45457) I am the actual vdsm 4.16.32-1.el6ev gdcphx0027.statestr.com (2.6.32-573.12.1.el6.x86_64)

This time vdsm startup succeeds:

storageRefresh::DEBUG::2017-02-19 15:10:34,085::hsm::410::Storage.HSM::(storageRefresh) HSM is ready

But the stuck lvm commands are probably still running.

Greg, is it possible that on Feb 19 an operator deleted the lock files
held by the stuck lvchange commands and restarted vdsm? I know this bad practice
has been used at this site.

On RHEL 7, systemd kills any child process of vdsm using cgroups. On RHEL 6,
terminating vdsm may leave its child processes running: the init scripts
give vdsm only 10 seconds to terminate and then kill it, so if vdsm could not
terminate all of its child processes within those 10 seconds, some may still be
running after the vdsm service was stopped.
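
Such leftovers can be spotted because orphaned children get reparented to init. A rough sketch (the name match and the PPID=1 heuristic are simplifications; in practice the reparented process may be the sudo wrapper rather than lvm itself, and some legitimate daemons also have PPID 1):

```shell
#!/bin/sh
# orphaned_lvm: read `ps -eo pid,ppid,comm`-style output on stdin and
# print the pids of lvm/sudo processes that were reparented to init
# (ppid 1) after their original parent (e.g. vdsm) exited.
orphaned_lvm() {
    awk '$2 == 1 && $3 ~ /lvm|sudo/ {print $1}'
}

# Live usage:
#   ps -eo pid,ppid,comm | orphaned_lvm
```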

Comment 5 Greg Scott 2017-03-07 12:27:37 UTC
Remember the symptoms we saw at the time.  SPM hosts suddenly went non-operational for no apparent reason and then would not activate when we rebooted them.  And when we failed over to a different SPM host, *that* host also went non-operational.  We had datacenters that could not come back alive and 2000 users and lots of high level managers screaming.  So, yeah, we improvised.

As I recall, that host booted on 2/12. It left 3 files in /var/lock/lvm dated 2/12 and I'm looking for the case where I preserved that.  We tried restarting VDSM and the restart had a problem - then we got rid of those files in /var/lock/lvm and VDSM successfully started after that.  I *think* that was the sequence of events.  I'm digging through a bunch of cases to find where we documented it.

This is a challenge to put back together because we had multiple hosts failing in multiple ways at two different sites, and multiple cases opening and closing.

- Greg

Comment 6 Nir Soffer 2017-03-07 15:03:27 UTC
(In reply to Greg Scott from comment #5)
> Remember the symptoms we saw at the time.  SPM hosts suddenly went
> non-operational for no apparent reason and then would not activate when we
> rebooted them.  

This started with a bad firmware upgrade on the storage server; hosts were then
rebooted on Feb 19 - the sanlock log is full of errors on that date, and we
see a reboot after that.

> And when we failed over to a different SPM host, *that* host
> also went non-operational.  We had datacenters that could not come back
> alive and 2000 users and lots of high level managers screaming.  So, yeah,
> we improvised.

But since lvm commands got stuck during boot and were not terminated when vdsm
was restarted, we may have a time bomb on these machines. This means that the
SPM issue we see on this host may be related to the lvm commands that have been
stuck on it for 5 days.
 
> As I recall, that host booted on 2/12. It left 3 files in /var/lock/lvm
> dated 2/12 and I'm looking for the case where I preserved that.  We tried
> restarting VDSM and the restart had a problem - then we got rid of those
> files in /var/lock/lvm and VDSM successfully started after that.  I *think*
> that was the sequence of events.  I'm digging through a bunch of cases to
> find where we documented it.
> 
> This is a challenge to put back together because we had multiple hosts
> failing in multiple ways at two different sites, and multiple cases opening
> and closing.

I think we need to make sure that no hosts are running stuck lvm commands.

Can you ask the operators to get the output of:

    ps auxf

on all hosts? Let's check whether stuck lvm commands are running silently, and
since when.

Comment 7 David Teigland 2017-03-07 15:58:02 UTC
Sorry for continuing to repeat this, but it seems to me that all the lvm issues can likely be traced back to the well-known filter bug (causing all lvm commands to read all devices).

Comment 8 Greg Scott 2017-03-07 16:06:35 UTC
If we can trace the hanging LVM problem back to that filter bug and prove it's fixed with RHVH-7, you'll hear me yell for joy from half-way around the world.  How do we prove it?

Nir - I left a request for the customer to get the data you asked for.  I'll leave the needsinfo turned on until we get the data back.

- Greg

Comment 9 David Teigland 2017-03-07 16:08:35 UTC
And I was assuming that fixing the filter issue would not only avoid all of the unwanted scanning, but also avoid unwanted LV activations.  Activating a huge number of unwanted LVs on the host would also be a contributing factor and needs to be fixed.

Comment 10 Greg Scott 2017-03-07 16:45:03 UTC
So how do we test that assumption?

Let's say I do an LVM operation with devices that should have been filtered out.  Does it make sense for that LVM operation to hang forever?  Or should it return an error?  I vote for choice #2.

If that's our problem, which we don't know for sure yet.

Comment 11 Nir Soffer 2017-03-07 16:59:21 UTC
(In reply to David Teigland from comment #9)
> And I was assuming that fixing the filter issue will not only avoid all of
> the unwanted scanning, but also avoid unwanted LV activations.  Activating a
> huge number of unwanted LVs on the host would also be a contributing factor
> and needs to be fixed.

David, how will fixing the filter bug (bug 1428637) prevent unwanted activation?

Or do you mean setting up a proper lvm filter on the host, allowing access only to
the devices the host needs?

Comment 12 David Teigland 2017-03-07 17:03:12 UTC
Right, I mean that both issues (filter bug and using a filter to prevent unwanted lvs from being activated) probably contributed to the problems and both should be fixed.
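
For reference, the kind of host-level filter being discussed can be tried non-destructively with the same `--config` override style vdsm itself uses in the commands above. This is a config fragment only; the device paths are illustrative, not a recommendation for this host:

```shell
# Illustrative only: restrict LVM scanning to the multipath devices this
# host actually needs, rejecting everything else. Run a read-only command
# (pvs) first to confirm the filter behaves as expected before persisting
# the devices/filter stanza in /etc/lvm/lvm.conf.
lvm pvs --config 'devices {
    filter = [ "a|^/dev/mapper/3514f0c51484000e1$|",
               "a|^/dev/mapper/3514f0c51484000e2$|",
               "r|.*|" ]
}'
```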

Comment 13 Greg Scott 2017-03-07 20:45:39 UTC
Created attachment 1260964 [details]
ps auxf from Quincy; but missing key details.

I'm attaching the output from ps auxf from every host in the Quincy datacenter.  But we're missing task details because their terminal emulation session scrolled off to the right instead of wrapping.  I asked them to run it again, this time doing:

ps auxf | grep lvm
ls -al /var/lock/lvm

- Greg

Comment 14 Nir Soffer 2017-03-07 23:32:42 UTC
Thanks, Greg, there is a lot of good data there!

Please do not filter this through grep; it makes it harder to see the big
picture and to detect orphaned lvm processes.

We just need to redirect the output to a file and copy the file, instead of
copying the console output.

Here is a summary of stuck processes from attachment 1260964 [details]:

# jabphx0005

root      3498  0.0  0.0  72080  2620 ?        S<   Feb18   0:00 /usr/bin/sudo -n /sbin/lvm lvchange --config  devices { p
root      3499  0.0  0.0  50132 29176 ?        S<   Feb18   0:02  \_ /sbin/lvm lvchange --config  devices { preferred_name

# jabphx0008

root     19944  0.0  0.0  72076  2608 ?        S<   Feb12   0:00 /usr/bin/sudo -n /sbin/lvm lvchange --config  devices { p
root     19945  0.0  0.0  49300 28344 ?        S<   Feb12   0:02  \_ /sbin/lvm lvchange --config  devices { preferred_name

# jabphx0018

root      4646  0.0  0.0  72080  2624 ?        S<   Feb26   0:00 /usr/bin/sudo -n /sbin/lvm lvchange --config  devices { p
root      4647  0.0  0.0  51424 30468 ?        S<   Feb26   0:02  \_ /sbin/lvm lvchange --config  devices { preferred_name

# jabphx0022

root     33532  0.0  0.0  72080  2620 ?        S<   Feb12   0:00 /usr/bin/sudo -n /sbin/lvm lvchange --config  devices { p
root     33533  0.0  0.0  49368 28412 ?        S<   Feb12   0:01  \_ /sbin/lvm lvchange --config  devices { preferred_name

So we have processes stuck for almost one month.
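
The per-host counts above can be tallied mechanically instead of by eye. A sketch, assuming the `# hostname` section-header convention used in this comment:

```shell
#!/bin/sh
# count_stuck: read the summary format above ("# hostname" headers
# followed by ps lines) on stdin and print "host count" of stuck lvm
# entries per host, sorted by host name.
count_stuck() {
    awk '/^# / {host = $2; next}
         /lvm/ {n[host]++}
         END   {for (h in n) print h, n[h]}' | sort
}
```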

David, do you need more info?

Comment 15 Greg Scott 2017-03-08 01:20:26 UTC
If I were David, I'd want to know what UUIDs they're stuck on.  Now that we have it down to 5 nodes, why don't I have them do a ps auxf to a file on those, and an ls -al /var/lock/lvm.

Ya know - as long as I'm asking for data, why don't I get sosreports too?

- Greg

Comment 16 Greg Scott 2017-03-08 01:33:43 UTC
Here is what I asked for in the support case:

*****************************
For hosts 0005, 0008, 0018, and 0022

would you do this for us?

ps auxf > /tmp/$(hostname)-auxf.txt
ls -al /var/lock/lvm > /tmp/$(hostname)-ls.txt

Also do sosreports from those hosts.  Post the sosreports and outputs from the commands above to the case.  I know this is a hassle, but we're looking for stuck LVM processes and this data will help isolate them and figure out why they're stuck.

One more - which host is SPM?  If it's not one of these above, let's do an sosreport from that one too.  And flag it in the name that it's SPM.  

thanks
******************************

- Greg

Comment 17 David Teigland 2017-03-08 16:43:11 UTC
> root      3498  0.0  0.0  72080  2620 ?        S<   Feb18   0:00
> /usr/bin/sudo -n /sbin/lvm lvchange --config  devices { p
> root      3499  0.0  0.0  50132 29176 ?        S<   Feb18   0:02  \_
> /sbin/lvm lvchange --config  devices { preferred_name

Was this lvchange -an to deactivate LVs?  I think I saw that somewhere.  Didn't you tell me once that vdsm had this problem of trying to deactivate LVs that were in use?  I suspect this goes back to the issue of activating LVs that you shouldn't be.  Is vdsm running this lvchange?  How could vdsm run lvchange and have it block for days without noticing any problem?  There could also be an lvm bug related to running many lvchange -a commands concurrently.

Comment 18 Nir Soffer 2017-03-08 16:51:58 UTC
(In reply to David Teigland from comment #17)
> > root      3498  0.0  0.0  72080  2620 ?        S<   Feb18   0:00
> > /usr/bin/sudo -n /sbin/lvm lvchange --config  devices { p
> > root      3499  0.0  0.0  50132 29176 ?        S<   Feb18   0:02  \_
> > /sbin/lvm lvchange --config  devices { preferred_name
> 
> Was this lvchange -an to deactivate LVs?

Correct.

> Didn't you tell me once that vdsm had this problem of trying to deactivate
> LVs that were in use?  I suspect this goes back to the issue of activating
> LVs that you shouldn't be.

Correct, we added this 3 years ago as a workaround for auto activation during
boot with FC storage.

> Is vdsm running this lvchange?

Yes, when vdsm starts, it gets the status of all lvs on the system, and then
runs one lvchange command per vg, deactivating all the lvs that are active but
not open in that vg.

> How could vdsm
> run lvchange and have it block for days without noticing any problem?

What happened here is this:

1. vdsm started this command during vdsm startup
2. The command did not return
3. Operator restarted vdsm
4. Vdsm was probably killed by the init script because it could not terminate
   within 10 seconds
5. The lvm command was orphaned, and because it was stuck, continued to run for
   almost a month
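The orphaning in steps 3-5 is ordinary Unix behavior and easy to reproduce (a sleep stands in for the stuck lvchange):

```shell
# The parent shell backgrounds a child and exits (like vdsm being
# killed in step 4); the child is re-parented and keeps running
# (like the orphaned lvm command in step 5).
sh -c 'sleep 60 & echo $! > /tmp/orphan.pid'
sleep 1
kill -0 "$(cat /tmp/orphan.pid)" && echo "orphan still running"
```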

> There
> could also be an lvm bug related to running many lvchange -a commands
> concurrently.

Vdsm runs the commands serially during startup, one command per vg.

But later, during normal operation, vdsm deactivates lvs concurrently in many
flows: when shutting down a vm, when finishing storage operations on lvs, etc.
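The per-vg startup deactivation described above has roughly this shape (the vg and lv names are hypothetical, the --config payload is left elided as in the ps output, and the lvchange is echoed rather than executed here):

```shell
# One "lvchange -an" per vg, naming the lvs that are active but not
# open in that vg (names are made up for illustration).
for vg in vg0 vg1; do
    case $vg in
        vg0) lvs="vg0/data1 vg0/data2" ;;
        vg1) lvs="vg1/meta" ;;
    esac
    echo lvchange --config 'devices { ... }' -an $lvs
done | tee /tmp/deactivate-plan.txt
```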

Comment 19 Nir Soffer 2017-03-08 18:08:59 UTC
Moving to LVM after discussion with David.

Comment 21 David Teigland 2017-03-08 18:14:49 UTC
A core from the stuck lvchange commands might help us explain things in more detail, but the diagnosis and solution will probably remain the same as above:
apply the fix for the filters bug and avoid activating unwanted LVs (then vdsm should not even need the lvchange -an workaround).
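For reference, the filter David mentions is an lvm.conf setting; a sketch (the accept pattern must match the actual boot PV, which is the hard part with boot from SAN):

```
# /etc/lvm/lvm.conf sketch: accept only the known local PV and
# reject everything else ("/dev/sda4" is illustrative).
devices {
    filter = [ "a|^/dev/sda4$|", "r|.*|" ]
}
```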

Comment 22 Greg Scott 2017-03-08 20:01:13 UTC
Can we fix lvm such that it returns an error status instead of hanging forever?

Comment 23 Nir Soffer 2017-03-08 21:32:55 UTC
Greg, we need core dumps of the stuck lvm processes mentioned in comment 14.

Comment 24 Greg Scott 2017-03-08 22:53:00 UTC
Here is what I asked the customer to get:

On host jabphx0005

gcore -o /tmp/$(hostname) 3498
gcore -o /tmp/$(hostname) 3499
ls -al /var/lock/lvm > /tmp/$(hostname)-lvm.txt

On host jabphx0008

gcore -o /tmp/$(hostname) 19944
gcore -o /tmp/$(hostname) 19945
ls -al /var/lock/lvm > /tmp/$(hostname)-lvm.txt

On host jabphx0018

gcore -o /tmp/$(hostname) 4646
gcore -o /tmp/$(hostname) 4647
ls -al /var/lock/lvm > /tmp/$(hostname)-lvm.txt

On host jabphx0022

gcore -o /tmp/$(hostname) 33532
gcore -o /tmp/$(hostname) 33533
ls -al /var/lock/lvm > /tmp/$(hostname)-lvm.txt

Those commands will create two files in /tmp named {hostname}.{pid}, and one file named {hostname}-lvm.txt.  Post all of those to the case.

And I still need

ps auxf
ls -al /var/lock/lvm

from all hosts in the other site.

Comment 25 Greg Scott 2017-03-09 01:38:54 UTC
I took this from a test RHEVH-6.7 system here at my place that boots from a local disk. It has never connected to a manager. I don't have a Fibre Channel SAN here. My multipath.conf is virgin.  

As you can see below, all the names and all the entries in /dev/mapper look generic. Are the names different with boot from SAN?  I wonder if udev set up those symlinks in /dev/mapper, or if /dev/dm-0 through /dev/dm-4 always map the boot devices?

[root@rhevhtest admin]# mount
/dev/mapper/live-rw on / type ext2 (ro,noatime)
/dev/mapper/HostVG-Config on /config type ext4 (rw,noatime)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
/var/log on /tmp/early-logs type none (rw,bind)
tmpfs on /dev/shm type tmpfs (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
/data/images on /var/lib/libvirt/images type none (rw,bind)
/data/core on /var/log/core type none (rw,bind)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
[root@rhevhtest admin]#
[root@rhevhtest admin]#
[root@rhevhtest admin]# lvs
  LV      VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  Config  HostVG -wi-ao----  8.00m                                              
  Data    HostVG -wi-ao---- 43.38g                                              
  Logging HostVG -wi-ao----  2.00g                                              
  Swap    HostVG -wi-ao----  3.87g                                              
[root@rhevhtest admin]#
[root@rhevhtest admin]# vgs
  VG     #PV #LV #SN Attr   VSize  VFree
  HostVG   1   4   0 wz--n- 49.28g 28.00m
[root@rhevhtest admin]#
[root@rhevhtest admin]# pvs
  PV         VG     Fmt  Attr PSize  PFree
  /dev/sda4  HostVG lvm2 a--  49.28g 28.00m
[root@rhevhtest admin]#
[root@rhevhtest admin]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/live-rw   1.5G  527M  971M  36% /
/dev/mapper/HostVG-Config
                      6.8M  1.5M  5.0M  23% /config
tmpfs                 1.9G   52K  1.9G   1% /dev/shm
[root@rhevhtest admin]#
[root@rhevhtest admin]# more /etc/redhat-release
Red Hat Enterprise Virtualization Hypervisor release 6.7 (20160104.0.el6ev)
[root@rhevhtest admin]#
[root@rhevhtest admin]# ls /dev/mapper -al
total 0
drwxr-xr-x.  2 root root    160 2017-02-24 15:42 .
drwxr-xr-x. 20 root root   8820 2017-02-24 15:44 ..
crw-rw----.  1 root root 10, 58 2017-02-24 15:42 control
lrwxrwxrwx.  1 root root      7 2017-02-24 15:42 HostVG-Config -> ../dm-2
lrwxrwxrwx.  1 root root      7 2017-02-24 15:42 HostVG-Data -> ../dm-4
lrwxrwxrwx.  1 root root      7 2017-02-24 15:42 HostVG-Logging -> ../dm-3
lrwxrwxrwx.  1 root root      7 2017-02-24 15:42 HostVG-Swap -> ../dm-1
lrwxrwxrwx.  1 root root      7 2017-02-24 15:42 live-rw -> ../dm-0
[root@rhevhtest admin]#
[root@rhevhtest admin]#
[root@rhevhtest admin]# more /etc/multipath.conf

blacklist {
}

defaults {
        find_multipaths yes
getuid_callout "/lib/udev/scsi_id --replace-whitespace --whitelisted --device=/dev/%n"
}
[root@rhevhtest admin]#

Comment 26 Greg Scott 2017-03-09 12:00:11 UTC
Created attachment 1261518 [details]
ps auxf and ls -al /var/lock/lvm from hosts 0001 - 0015 in Grafton

I'm attaching ps auxf output from Grafton hosts 0001 through 0015. The operators tell me they will have the rest ready soon. Most of these have lvm lvchange commands that look like they've been sitting there a while.

Comment 27 Greg Scott 2017-03-12 20:49:32 UTC
My phone rang again today (Sunday). Here we go again. We have two hypervisors that won't activate after a reboot.  RHEV-H 6.7-2016-0104.

They have 23,000+ lsblk objects, hald is consuming 100 percent of the CPU, and we have a couple of processes trying a massive lvchange operation on each hypervisor.  So far, restarting vdsmd has been ineffective.

The uptime is 68 minutes.  I said to give it another hour and watch for hald in top to finish grinding.  Then try restarting vdsmd.  Hopefully I gave good advice.

By now, we've identified the problem. We know LVM filtering is a solution, but it won't be easy to do with boot from SAN and RHEVH-6.8.  And going to 6.8 with 34 hypervisors X 2 sites will take time - especially since every reboot will need at least an hour to grind through it all.

Here is what we need:

We need a workaround to make the best we can with what's in place right now.

We need an LVM filtering solution that works with boot from SAN.

We need a universal LVM filtering fix for RHEL 7.3.n, since other pure RHEL sites see a variation on this same theme.

We need that filtering to go into the soonest RHVH we can find after the RHEL one is built.  

And, of course, we need it yesterday.

Comment 28 Chris Williams 2017-06-13 19:02:52 UTC
Red Hat Enterprise Linux 6 transitioned to the Production 3 Phase on May 10, 2017.  During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.
 
The official life cycle policy can be reviewed here:
 
http://redhat.com/rhel/lifecycle
 
This issue does not appear to meet the inclusion criteria for the Production Phase 3 and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification.  Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:
 
https://access.redhat.com

