Created attachment 850531: lvm-messages (a tail of LVM messages from /var/log/messages)
Description of problem:
When running revolution_9 test scenario kill_secondary_and_log_synced_4_legs, clvmd crashes on all cluster nodes causing the cluster to reboot.
********* Mirror hash info for this scenario *********
* names: syncd_secondary_log_4legs_1 syncd_secondary_log_4legs_2
* sync: 1
* striped: 0
* leg devices: /dev/sdd1 /dev/sdf1 /dev/sdi1 /dev/sda1
* log devices: /dev/sde1
* no MDA devices:
* failpv(s): /dev/sdf1 /dev/sde1
* failnode(s): virt-122.cluster-qe.lab.eng.brq.redhat.com virt-123.cluster-qe.lab.eng.brq.redhat.com virt-124.cluster-qe.lab.eng.brq.redhat.com
* lvmetad: 0
* leg fault policy: allocate
* log fault policy: allocate
******************************************************
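For context, the "allocate" leg and log fault policies listed above correspond to the following lvm.conf options; this is only a sketch of the relevant activation-section settings that the test harness presumably configures:

    # /etc/lvm/lvm.conf, activation section (assumed equivalent of the
    # test's "allocate" policies). "allocate" tells LVM to replace a
    # failed mirror image or log with spare space on another PV,
    # rather than simply removing it ("remove").
    mirror_image_fault_policy = "allocate"
    mirror_log_fault_policy = "allocate"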
The test simply gets stuck after disabling the PVs, because the nodes get rebooted. With fencing turned off, it becomes clear that clvmd is the cause of these reboots.
Some errors are nevertheless visible on the nodes themselves:
Failed actions:
clvmd_monitor_60000 on virt-124.cluster-qe.lab.eng.brq.redhat.com 'unknown error' (1): call=19, status=Timed Out, last-rc-change='Wed Jan 15 14:40:51 2014', queued=0ms, exec=0ms
clvmd_monitor_60000 on virt-122.cluster-qe.lab.eng.brq.redhat.com 'unknown error' (1): call=21, status=Timed Out, last-rc-change='Wed Jan 15 14:40:53 2014', queued=0ms, exec=0ms
clvmd_monitor_60000 on virt-123.cluster-qe.lab.eng.brq.redhat.com 'unknown error' (1): call=19, status=Timed Out, last-rc-change='Wed Jan 15 14:40:49 2014', queued=0ms, exec=0ms
clvmd.service - LSB: This service is Clusterd LVM Daemon.
Loaded: loaded (/etc/rc.d/init.d/clvmd)
Active: inactive (dead) since Wed 2014-01-15 14:41:14 CET; 10min ago
Process: 6948 ExecStop=/etc/rc.d/init.d/clvmd stop (code=exited, status=5)
Process: 1256 ExecStart=/etc/rc.d/init.d/clvmd start (code=exited, status=5)
Main PID: 1261 (code=exited, status=0/SUCCESS)
CGroup: /system.slice/clvmd.service
Jan 15 14:25:17 virt-124.cluster-qe.lab.eng.brq.redhat.com clvmd[1256]: 0 logical volume(s) in volume group "helter_skelter" now active
Jan 15 14:25:17 virt-124.cluster-qe.lab.eng.brq.redhat.com clvmd[1256]: [FAILED]
Jan 15 14:25:17 virt-124.cluster-qe.lab.eng.brq.redhat.com systemd[1]: Started LSB: This service is Clusterd LVM Daemon..
Jan 15 14:41:07 virt-124.cluster-qe.lab.eng.brq.redhat.com systemd[1]: Stopping LSB: This service is Clusterd LVM Daemon....
Jan 15 14:41:12 virt-124.cluster-qe.lab.eng.brq.redhat.com clvmd[6948]: Deactivating clustered VG(s): Couldn't find device with uuid fhqnPF-ghGO-p5jJ-4dwN-Xqmp-swfl-8rlLXs.
Jan 15 14:41:12 virt-124.cluster-qe.lab.eng.brq.redhat.com clvmd[6948]: Couldn't find device with uuid pbDED5-dg4U-1u0r-ZqDa-jAEc-B9jP-wOI0VP.
Jan 15 14:41:14 virt-124.cluster-qe.lab.eng.brq.redhat.com clvmd[6948]: Logical volume helter_skelter/syncd_secondary_log_4legs_1 contains a filesystem in use.
Jan 15 14:41:14 virt-124.cluster-qe.lab.eng.brq.redhat.com clvmd[6948]: Can't deactivate volume group "helter_skelter" with 2 open logical volume(s)
Jan 15 14:41:14 virt-124.cluster-qe.lab.eng.brq.redhat.com clvmd[6948]: [FAILED]
Jan 15 14:41:14 virt-124.cluster-qe.lab.eng.brq.redhat.com systemd[1]: Stopped LSB: This service is Clusterd LVM Daemon..
I will attach a file with a snippet from /var/log/messages as well, to avoid spamming the comments here.
Version-Release number of selected component (if applicable):
lvm2-2.02.103-10.el7.x86_64
lvm2-cluster-2.02.103-10.el7.x86_64
kernel-3.10.0-64.el7.x86_64
kernel-3.10.0-67.el7.x86_64
device-mapper-1.02.82-10.el7.x86_64
cmirror-2.02.103-10.el7.x86_64
How reproducible:
95% of the time; most reliably by running the revolution_9 test.
Steps to Reproduce:
Run the revolution_9 test, scenario kill_secondary_and_log_synced_4_legs.
The scenario creates two mirrored LVs using the same ordering of PVs, creates a GFS2 filesystem on top, and, once the mirrors are synced, fails one leg PV and the log PV (see the sketch below).
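A minimal manual approximation of the scenario, for orientation only: it assumes the VG name "helter_skelter" and the device layout from the hash info above, and the real test harness may create the LVs and fail the devices differently.

    # Create a 4-legged mirror with a disk log, matching the scenario
    # layout (legs on sdd1/sdf1/sdi1/sda1, log on sde1; size illustrative).
    lvcreate --type mirror -m 3 --mirrorlog disk -L 500M \
        -n syncd_secondary_log_4legs_1 helter_skelter \
        /dev/sdd1 /dev/sdf1 /dev/sdi1 /dev/sda1 /dev/sde1

    # Put a clustered GFS2 filesystem on top (CLUSTERNAME is a placeholder)
    # and mount it on all nodes.
    mkfs.gfs2 -p lock_dlm -t CLUSTERNAME:syncd1 -j 3 \
        /dev/helter_skelter/syncd_secondary_log_4legs_1
    mount /dev/helter_skelter/syncd_secondary_log_4legs_1 /mnt/syncd1

    # Wait until the mirror reports 100% sync, then fail a secondary
    # leg and the log device (one common way to simulate PV failure).
    lvs -o name,copy_percent helter_skelter
    echo offline > /sys/block/sdf/device/state   # leg PV /dev/sdf1
    echo offline > /sys/block/sde/device/state   # log PV /dev/sde1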
Actual results:
The whole cluster reboots due to self-fencing.
Expected results:
clvmd should not fail so catastrophically as to reboot all of the cluster nodes.
Please try whether this is still reproducible with the latest lvm2-2.02.105-5.el7 and resource-agents-3.9.5-24.el7 (clvmd now needs to be configured as a Pacemaker cluster resource; the old init scripts are gone). A sketch of that setup follows.
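A sketch of the Pacemaker resource setup under the new scheme, assuming a dlm controld resource is configured first (as in the RHEL 7 High Availability documentation); resource names here are illustrative:

    # dlm must be running before clvmd can start.
    pcs resource create dlm ocf:pacemaker:controld \
        op monitor interval=30s on-fail=fence clone interleave=true ordered=true
    pcs resource create clvmd ocf:heartbeat:clvm \
        op monitor interval=30s on-fail=fence clone interleave=true ordered=true
    # Start clvmd after dlm and keep the two clones together on each node.
    pcs constraint order start dlm-clone then clvmd-clone
    pcs constraint colocation add clvmd-clone with dlm-clone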
This request was resolved in Red Hat Enterprise Linux 7.0.
Contact your manager or support representative in case you have further questions about the request.