Bug 1428444
| Summary: | clvmd segfault during HA cluster recovery | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Corey Marthaler <cmarthal> |
| Component: | lvm2 | Assignee: | Alasdair Kergon <agk> |
| lvm2 sub component: | Clustering / clvmd | QA Contact: | cluster-qe <cluster-qe> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | Priority: | unspecified |
| Version: | 7.3 | CC: | agk, cmarthal, heinzm, jbrassow, lmiksik, mcsontos, msnitzer, prajnoha, prockai, rbednar, zkabelac |
| Target Milestone: | rc | Target Release: | --- |
| Hardware: | x86_64 | OS: | Linux |
| Whiteboard: | | Fixed In Version: | lvm2-2.02.175-1.el7 |
| Doc Type: | If docs needed, set a value | Doc Text: | |
| Story Points: | --- | Clone Of: | |
| Environment: | | Last Closed: | 2018-04-10 15:18:32 UTC |
| Type: | Bug | Regression: | --- |
| Mount Type: | --- | Documentation: | --- |
| CRM: | | Verified Versions: | |
| Category: | --- | oVirt Team: | --- |
| RHEL 7.3 requirements from Atomic Host: | | Cloudforms Team: | --- |
| Target Upstream Version: | | Embargoed: | |
| Bug Depends On: | | Bug Blocks: | 1469559 |
| Attachments: | | | |
Created attachment 1259201 [details]
log from host-077
Created attachment 1259202 [details]
log from host-078
Now that the cluster is back up...
Storage:
Resource Group: HA_LVM1
lvm1 (ocf::heartbeat:LVM): Started host-076
fs1 (ocf::heartbeat:Filesystem): Started host-076
Resource Group: HA_LVM2
lvm2 (ocf::heartbeat:LVM): Started host-077
fs2 (ocf::heartbeat:Filesystem): Started host-077
Resource Group: HA_LVM3
lvm3 (ocf::heartbeat:LVM): Started host-078
fs3 (ocf::heartbeat:Filesystem): Started host-078
[root@host-076 ~]# lvs -a -o +devices
LV VG Attr LSize Log Cpy%Sync Devices
ha STSRHTS216251 Mwi-aom--- 8.00g [ha_mlog] 100.00 ha_mimage_0(0),ha_mimage_1(0)
[ha_mimage_0] STSRHTS216251 iwi-aom--- 8.00g /dev/sdg2(0)
[ha_mimage_1] STSRHTS216251 iwi-aom--- 8.00g /dev/sdg1(0)
[ha_mlog] STSRHTS216251 lwi-aom--- 4.00m /dev/sdc2(0)
ha STSRHTS216252 Mwi---m--- 8.00g [ha_mlog] ha_mimage_0(0),ha_mimage_1(0)
[ha_mimage_0] STSRHTS216252 Iwi---m--- 8.00g /dev/sdc1(0)
[ha_mimage_1] STSRHTS216252 Iwi---m--- 8.00g /dev/sdf2(0)
[ha_mlog] STSRHTS216252 lwi---m--- 4.00m /dev/sdf1(0)
ha STSRHTS216253 Mwi---m--- 8.00g [ha_mlog] ha_mimage_0(0),ha_mimage_1(0)
[ha_mimage_0] STSRHTS216253 Iwi---m--- 8.00g /dev/sdd2(0)
[ha_mimage_1] STSRHTS216253 Iwi---m--- 8.00g /dev/sdd1(0)
Config:
[root@host-076 ~]# pcs config
Cluster Name: STSRHTS21625
Corosync Nodes:
host-076 host-077 host-078
Pacemaker Nodes:
host-076 host-077 host-078
Resources:
Clone: dlm-clone
Meta Attrs: interleave=true ordered=true
Resource: dlm (class=ocf provider=pacemaker type=controld)
Operations: start interval=0s timeout=90 (dlm-start-interval-0s)
stop interval=0s timeout=100 (dlm-stop-interval-0s)
monitor interval=30s on-fail=fence (dlm-monitor-interval-30s)
Clone: clvmd-clone
Meta Attrs: interleave=true ordered=true
Resource: clvmd (class=ocf provider=heartbeat type=clvm)
Attributes: with_cmirrord=1
Operations: start interval=0s timeout=90 (clvmd-start-interval-0s)
stop interval=0s timeout=90 (clvmd-stop-interval-0s)
monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)
Group: HA_LVM1
Resource: lvm1 (class=ocf provider=heartbeat type=LVM)
Attributes: volgrpname=STSRHTS216251 exclusive=true
Operations: start interval=0s timeout=30 (lvm1-start-interval-0s)
stop interval=0s timeout=30 (lvm1-stop-interval-0s)
monitor interval=10 timeout=30 (lvm1-monitor-interval-10)
Resource: fs1 (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/STSRHTS216251/ha directory=/mnt/HA_fs1 fstype=xfs options=noatime
Operations: start interval=0s timeout=60 (fs1-start-interval-0s)
stop interval=0s timeout=60 (fs1-stop-interval-0s)
monitor interval=10s on-fail=fence (fs1-monitor-interval-10s)
Group: HA_LVM2
Resource: lvm2 (class=ocf provider=heartbeat type=LVM)
Attributes: volgrpname=STSRHTS216252 exclusive=true
Operations: start interval=0s timeout=30 (lvm2-start-interval-0s)
stop interval=0s timeout=30 (lvm2-stop-interval-0s)
monitor interval=10 timeout=30 (lvm2-monitor-interval-10)
Resource: fs2 (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/STSRHTS216252/ha directory=/mnt/HA_fs2 fstype=xfs options=noatime
Operations: start interval=0s timeout=60 (fs2-start-interval-0s)
stop interval=0s timeout=60 (fs2-stop-interval-0s)
monitor interval=10s on-fail=fence (fs2-monitor-interval-10s)
Group: HA_LVM3
Resource: lvm3 (class=ocf provider=heartbeat type=LVM)
Attributes: volgrpname=STSRHTS216253 exclusive=true
Operations: start interval=0s timeout=30 (lvm3-start-interval-0s)
stop interval=0s timeout=30 (lvm3-stop-interval-0s)
monitor interval=10 timeout=30 (lvm3-monitor-interval-10)
Resource: fs3 (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/STSRHTS216253/ha directory=/mnt/HA_fs3 fstype=xfs options=noatime
Operations: start interval=0s timeout=60 (fs3-start-interval-0s)
stop interval=0s timeout=60 (fs3-stop-interval-0s)
monitor interval=10s on-fail=fence (fs3-monitor-interval-10s)
Stonith Devices:
Resource: fence-host-076 (class=stonith type=fence_xvm)
Attributes: pcmk_host_check=static-list pcmk_host_list=host-076 pcmk_host_map=host-076:host-076.virt.lab.msp.redhat.com
Operations: monitor interval=60s (fence-host-076-monitor-interval-60s)
Resource: fence-host-077 (class=stonith type=fence_xvm)
Attributes: pcmk_host_check=static-list pcmk_host_list=host-077 pcmk_host_map=host-077:host-077.virt.lab.msp.redhat.com
Operations: monitor interval=60s (fence-host-077-monitor-interval-60s)
Resource: fence-host-078 (class=stonith type=fence_xvm)
Attributes: pcmk_host_check=static-list pcmk_host_list=host-078 pcmk_host_map=host-078:host-078.virt.lab.msp.redhat.com
Operations: monitor interval=60s (fence-host-078-monitor-interval-60s)
Fencing Levels:
Location Constraints:
Resource: lvm3
Enabled on: host-078 (score:INFINITY) (role: Started) (id:cli-prefer-lvm3)
Ordering Constraints:
start dlm-clone then start clvmd-clone (kind:Mandatory) (id:order-dlm-clone-clvmd-clone-mandatory)
start clvmd-clone then start HA_LVM1 (kind:Mandatory) (id:order-clvmd-clone-HA_LVM1-mandatory)
start clvmd-clone then start HA_LVM2 (kind:Mandatory) (id:order-clvmd-clone-HA_LVM2-mandatory)
start clvmd-clone then start HA_LVM3 (kind:Mandatory) (id:order-clvmd-clone-HA_LVM3-mandatory)
Colocation Constraints:
clvmd-clone with dlm-clone (score:INFINITY) (id:colocation-clvmd-clone-dlm-clone-INFINITY)
Ticket Constraints:
Alerts:
No alerts defined
Resources Defaults:
No defaults set
Operations Defaults:
No defaults set
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: STSRHTS21625
dc-version: 1.1.15-11.el7_3.2-e174ec8
have-watchdog: false
no-quorum-policy: freeze
Quorum:
Options:
This is reproducible and doesn't require an HA mirror volume.

Mar 2 14:09:56 host-078 pengine[23581]: warning: Node host-076 will be fenced because the node is no longer part of the cluster
Mar 2 14:09:56 host-078 pengine[23581]: warning: Node host-076 is unclean
Mar 2 14:09:56 host-078 pengine[23581]: warning: Action fence-host-076_stop_0 on host-076 is unrunnable (offline)
Mar 2 14:09:56 host-078 pengine[23581]: warning: Action dlm:2_stop_0 on host-076 is unrunnable (offline)
Mar 2 14:09:56 host-078 pengine[23581]: warning: Action dlm:2_stop_0 on host-076 is unrunnable (offline)
Mar 2 14:09:56 host-078 pengine[23581]: warning: Action clvmd:2_stop_0 on host-076 is unrunnable (offline)
Mar 2 14:09:56 host-078 pengine[23581]: warning: Action clvmd:2_stop_0 on host-076 is unrunnable (offline)
Mar 2 14:09:56 host-078 pengine[23581]: warning: Action lvm1_stop_0 on host-076 is unrunnable (offline)
Mar 2 14:09:56 host-078 pengine[23581]: warning: Action fs1_stop_0 on host-076 is unrunnable (offline)
Mar 2 14:09:56 host-078 pengine[23581]: warning: Scheduling Node host-076 for STONITH
Mar 2 14:09:56 host-078 pengine[23581]: notice: Move fence-host-076#011(Started host-076 -> host-077)
Mar 2 14:09:56 host-078 pengine[23581]: notice: Move fence-host-077#011(Started host-077 -> host-078)
Mar 2 14:09:56 host-078 pengine[23581]: notice: Stop dlm:2#011(host-076)
Mar 2 14:09:56 host-078 pengine[23581]: notice: Stop clvmd:2#011(host-076)
Mar 2 14:09:56 host-078 pengine[23581]: notice: Move lvm1#011(Started host-076 -> host-077)
Mar 2 14:09:56 host-078 pengine[23581]: notice: Move fs1#011(Started host-076 -> host-077)
Mar 2 14:09:56 host-078 pengine[23581]: notice: Move lvm2#011(Started host-077 -> host-078)
Mar 2 14:09:56 host-078 pengine[23581]: notice: Move fs2#011(Started host-077 -> host-078)
Mar 2 14:09:56 host-078 pengine[23581]: notice: Move lvm3#011(Started host-078 -> host-077)
Mar 2 14:09:56 host-078 pengine[23581]: notice: Move fs3#011(Started host-078 -> host-077)
Mar 2 14:09:56 host-078 pengine[23581]: warning: Calculated transition 71 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-6.bz2
Mar 2 14:09:56 host-078 crmd[23582]: notice: Initiating stop operation fence-host-077_stop_0 on host-077
Mar 2 14:09:56 host-078 crmd[23582]: notice: Initiating stop operation fs2_stop_0 on host-077
Mar 2 14:09:56 host-078 crmd[23582]: notice: Initiating stop operation fs3_stop_0 locally on host-078
Mar 2 14:09:56 host-078 crmd[23582]: notice: Requesting fencing (reboot) of node host-076
Mar 2 14:09:56 host-078 stonith-ng[23578]: notice: Client crmd.23582.39d9e3da wants to fence (reboot) 'host-076' with device '(any)'
Mar 2 14:09:56 host-078 stonith-ng[23578]: notice: Merging stonith action reboot for node host-076 originating from client crmd.23582.5ebb69de with identical request from stonith-api.20939 (72s)
Mar 2 14:09:56 host-078 crmd[23582]: notice: Initiating start operation fence-host-076_start_0 on host-077
Mar 2 14:09:56 host-078 crmd[23582]: notice: Initiating start operation fence-host-077_start_0 locally on host-078
Mar 2 14:09:56 host-078 crmd[23582]: notice: Initiating stop operation lvm2_stop_0 on host-077
Mar 2 14:09:56 host-078 Filesystem(fs3)[16988]: INFO: Running stop for /dev/STSRHTS216253/ha on /mnt/ha_fs3
Mar 2 14:09:56 host-078 Filesystem(fs3)[16988]: INFO: Trying to unmount /mnt/ha_fs3
Mar 2 14:09:56 host-078 kernel: XFS (dm-8): Unmounting Filesystem
Mar 2 14:09:56 host-078 Filesystem(fs3)[16988]: INFO: unmounted /mnt/ha_fs3 successfully
Mar 2 14:09:56 host-078 crmd[23582]: notice: Result of stop operation for fs3 on host-078: 0 (ok)
Mar 2 14:09:56 host-078 crmd[23582]: notice: Initiating stop operation lvm3_stop_0 locally on host-078
Mar 2 14:09:58 host-078 stonith-ng[23578]: notice: Operation reboot of host-076 by host-077 for stonith-api.20939: OK
Mar 2 14:09:58 host-078 stonith-ng[23578]: notice: Operation reboot of host-076 by host-077 for crmd.23582: OK
Mar 2 14:09:59 host-078 crmd[23582]: notice: Result of start operation for fence-host-077 on host-078: 0 (ok)
Mar 2 14:09:59 host-078 crmd[23582]: notice: Peer host-076 was terminated (reboot) by host-077 for host-077: OK (ref=c5f1ad20-1f4a-41ad-b61a-de109d88669e) by client stonith-api.20939
Mar 2 14:09:59 host-078 crmd[23582]: notice: Transition aborted: External Fencing Operation
Mar 2 14:09:59 host-078 crmd[23582]: notice: Stonith operation 4/69:71:0:8af5282b-96ba-45da-8e1e-23e9a3002c01: OK (0)
Mar 2 14:09:59 host-078 crmd[23582]: notice: Peer host-076 was terminated (reboot) by host-077 for host-078: OK (ref=5ebb69de-0af9-42cd-bde9-568d1f25dcd0) by client crmd.23582
Mar 2 14:09:59 host-078 dlm_controld[23739]: 17929 fence status 1 receive 0 from 2 walltime 1488485399 local 17929
Mar 2 14:09:59 host-078 kernel: clvmd[23906]: segfault at 1389 ip 00007f1454cd6d88 sp 00007ffc2761a030 error 4 in clvmd[7f1454cc0000+136000]

Core was generated by `/usr/sbin/clvmd -T90 -d0'.
Program terminated with signal 6, Aborted.
#0  0x00007f145322b1d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007f145322b1d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f145322c8c8 in __GI_abort () at abort.c:90
#2  0x00007f145326af07 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7f1453375b48 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00007f1453272503 in malloc_printerr (ar_ptr=0x7f14535b0760 <main_arena>, ptr=<optimized out>, str=0x7f1453375c48 "free(): corrupted unsorted chunks", action=3) at malloc.c:5013
#4  _int_free (av=0x7f14535b0760 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:3835
#5  0x00007f1454cd9909 in process_work_item (cmd=0x7f1455738160) at clvmd.c:2007
#6  lvm_thread_fn (arg=<optimized out>) at clvmd.c:2058
#7  0x00007f14539c9dc5 in start_thread (arg=0x7f1454b25700) at pthread_create.c:308
#8  0x00007f14532ed73d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

I have seen things like this when lvm is compiled with debugging, but I'm not sure how to suggest you check for that. I think that the latest version should show the configure line (including debug) from 'lvm version', so you can check whether that's enabled.

I've saved a core file that Corey reproduced. I did a little browsing in gdb on the core, and I think bad data is being passed to add_to_lvmqueue() from this loop, which I can't understand:
for (thisfd = &local_client_head; thisfd; thisfd = nextfd, nextfd = thisfd ? thisfd->next : NULL) {
/* frees certain entries, passes some entries to add_to_lvmqueue() */
}
That for loop appeared in commit be662439331abf6ccb16dd996bfe15eb613b4e53, so I'm hoping agk can shed some light on what might be wrong.
(gdb) info threads
Id Target Id Frame
4 Thread 0x7f3ef46ce700 (LWP 7569) pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
3 Thread 0x7f3eefe1c700 (LWP 5456) 0x00007f3ef35c970d in read () at ../sysdeps/unix/syscall-template.S:81
2 Thread 0x7f3ef4710700 (LWP 5457) pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
* 1 Thread 0x7f3ef487c880 (LWP 5450) 0x00007f3ef2e6a58d in _int_malloc (av=av@entry=0x7f3ef31a9760 <main_arena>, bytes=bytes@entry=48) at malloc.c:3571
(gdb) bt
#0 0x00007f3ef2e6a58d in _int_malloc (av=av@entry=0x7f3ef31a9760 <main_arena>, bytes=bytes@entry=48) at malloc.c:3571
#1 0x00007f3ef2e6d10c in __GI___libc_malloc (bytes=bytes@entry=48) at malloc.c:2897
#2 0x00007f3ef3e1fc6e in dm_malloc_aux (s=s@entry=48, file=file@entry=0x55b236ceba1d "clvmd.c", line=line@entry=2086) at mm/dbg_malloc.c:286
#3 0x00007f3ef3e1fce5 in dm_malloc_wrapper (s=s@entry=48, file=file@entry=0x55b236ceba1d "clvmd.c", line=line@entry=2086) at mm/dbg_malloc.c:340
#4 0x000055b236c242ff in add_to_lvmqueue (client=client@entry=0x7f3ef31a97b8 <main_arena+88>, msg=msg@entry=0x0, msglen=msglen@entry=0, csid=csid@entry=0x0) at clvmd.c:2086
#5 0x000055b236c21274 in main_loop (cmd_timeout=60) at clvmd.c:886
#6 main (argc=<optimized out>, argv=<optimized out>) at clvmd.c:627
(gdb) frame 4
#4 0x000055b236c242ff in add_to_lvmqueue (client=client@entry=0x7f3ef31a97b8 <main_arena+88>, msg=msg@entry=0x0, msglen=msglen@entry=0, csid=csid@entry=0x0) at clvmd.c:2086
2086 if (!(cmd = dm_malloc(sizeof(*cmd))))
(gdb) frame 5
#5 0x000055b236c21274 in main_loop (cmd_timeout=60) at clvmd.c:886
886 add_to_lvmqueue(free_fd, NULL, 0, NULL);
(gdb) p free_fd
$1 = (struct local_client *) 0x7f3ef31a97b8 <main_arena+88>
(gdb) p thisfd
$2 = (struct local_client *) 0x7f3ef31a97b8 <main_arena+88>
(gdb) p lastfd
$3 = <optimized out>
(gdb) p nextfd
$4 = (struct local_client *) 0x55b238cf9b50
(gdb) p *free_fd
$5 = {fd = 953138016, type = 21938, next = 0x55b238cf9b50, xid = 36240, callback = 0x7f3ef31a9df8 <main_arena+1688>, removeme = 200 '\310', bits = {localsock = {
replies = 0x7f3ef31a97c8 <main_arena+104>, num_replies = -216361000, expected_replies = 32574, sent_time = 139908343306200, in_progress = -216360984, sent_out = 32574,
private = 0x7f3ef31a97e8 <main_arena+136>, cmd = 0x7f3ef31a97f8 <main_arena+152>, cmd_len = -216360968, pipe = 32574, finished = 953128960, all_success = 21938,
cleanup_needed = 953128960, pipe_client = 0x7f3ef31a9818 <main_arena+184>, threadid = 139908343306264, state = (unknown: 4078606376), mutex = {__data = {
__lock = -216360920, __count = 32574, __owner = -216360904, __nusers = 32574, __kind = -216360904, __spins = 32574, __elision = 0, __list = {
__prev = 0x7f3ef31a9848 <main_arena+232>, __next = 0x7f3ef31a9848 <main_arena+232>}},
__size = "(\230\032\363>\177\000\000\070\230\032\363>\177\000\000\070\230\032\363>\177\000\000H\230\032\363>\177\000\000H\230\032\363>\177\000",
__align = 139908343306280}, cond = {__data = {__lock = -216360872, __futex = 32574, __total_seq = 139908343306328, __wakeup_seq = 139908343306344,
__woken_seq = 139908343306344, __mutex = 0x7f3ef31a9878 <main_arena+280>, __nwaiters = 4078606456, __broadcast_seq = 32574},
__size = "X\230\032\363>\177\000\000X\230\032\363>\177\000\000h\230\032\363>\177\000\000h\230\032\363>\177\000\000x\230\032\363>\177\000\000x\230\032\363>\177\000",
__align = 139908343306328}}, pipe = {client = 0x7f3ef31a97c8 <main_arena+104>, threadid = 139908343306200}, net = {private = 0x7f3ef31a97c8 <main_arena+104>,
flags = -216361000}}}
(gdb) p *nextfd
$6 = {fd = 0, type = CLUSTER_MAIN_SOCK, next = 0x14b1, xid = 38840, callback = 0x7f3ef31a9df8 <main_arena+1688>, removeme = 0 '\000', bits = {localsock = {replies = 0x0,
num_replies = 0, expected_replies = 0, sent_time = 0, in_progress = 0, sent_out = 0, private = 0x0, cmd = 0x0, cmd_len = 0, pipe = 0, finished = 0, all_success = 0,
cleanup_needed = 0, pipe_client = 0x0, threadid = 0, state = PRE_COMMAND, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0,
__elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0,
__wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, pipe = {client = 0x0,
threadid = 0}, net = {private = 0x0, flags = 0}}}
(gdb)
The management of this local_client_head list looks completely bogus, and I'm not sure how it's working at all. The list iteration itself looks broken enough, but it turns out that cleanup_zombie(), called within that list iteration, is itself also iterating through the list and modifying it. I don't know what any of this stuff is actually doing; I'm just looking at the mechanics of the code, and I think it's a miracle anything related to this list works at all.

I don't know what this code is doing, but it looks highly suspect, so I'm reassigning to agk, who added this loop fairly recently. It looks like add_to_lvmqueue() is segfaulting because it's passed a garbage (freed) pointer, which isn't surprising given what's happening to this list (see the excerpt below, and the sketch after it):
for (thisfd = &local_client_head; thisfd; thisfd = nextfd, nextfd = thisfd ? thisfd->next : NULL) {
/* in cleanup_zombie... */
for (lastfd = &local_client_head; (delfd = lastfd->next); lastfd = delfd) {
if (thisfd->bits.localsock.pipe_client == delfd) {
thisfd->bits.localsock.pipe_client = NULL;
lastfd->next = delfd->next;
dm_free(delfd);
break;
}
}
free_fd = thisfd;
add_to_lvmqueue(free_fd, NULL, 0, NULL);
... more stuff with list entries
}
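For illustration only -- this is not clvmd code, and the struct and field names are invented -- here is a minimal, compilable C sketch of the pattern described above: an outer walk that caches its next pointer up front while a nested helper unlinks and frees entries from the same list. If the nested free happens to hit the cached entry, the next iteration walks through freed memory, which is exactly the kind of garbage pointer that then gets handed to add_to_lvmqueue().

#include <stdlib.h>

struct node {                          /* hypothetical stand-in for struct local_client */
        struct node *next;
        int zombie;
};

/* Nested helper in the style of cleanup_zombie(): unlinks and frees one entry. */
static void cleanup(struct node *head)
{
        struct node *last, *del;

        for (last = head; (del = last->next); last = del) {
                if (del->zombie) {
                        last->next = del->next;
                        free(del);     /* may be the very entry the caller has cached */
                        break;
                }
        }
}

/* Outer walk in the style of main_loop(): the next pointer is cached each step. */
static void walk(struct node *head)
{
        struct node *thisfd, *nextfd = head->next;

        for (thisfd = head; thisfd;
             thisfd = nextfd, nextfd = thisfd ? thisfd->next : NULL) {
                cleanup(head);         /* if this frees the node nextfd points at ...   */
        }                              /* ... the next iteration reads freed memory     */
}

In clvmd the same effect would leave free_fd pointing into freed heap, which is consistent with the gdb output above showing free_fd and thisfd resolving to an address inside glibc's main_arena.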
Well, the code dct identifies hasn't changed since July 2015, so could this be a new test that wasn't run on previous versions, or are we looking for a regression or change in some other piece of code with this as a side-effect? If there's nested list walking, that was there before 2015 too, but perhaps the difference is that 'break' changed to 'continue'. I'm not seeing the logs yet. (I adjusted a lot of debug messages 2 years ago to try to make it clear what the code was doing.) Could you attach the clvmd debug logs, so that I can see how to reproduce it?

Created attachment 1282766 [details]
strace of clvmd -d1 -f -C on host-113
Finally got clvmd to segfault while also in debugging mode. It must be a timing issue, as it doesn't happen as easily. I was unable to get it to happen with just one machine going down. In this case, host-113 was the only cluster node remaining after host-115, and then later host-114, were taken down.
[root@host-113 ~]# clvmd -d1 -f -C > /tmp/clvmd.log 2>&1
*** buffer overflow detected ***: clvmd terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x7f92f72a5d87]
/lib64/libc.so.6(+0x10df40)[0x7f92f72a3f40]
/lib64/libc.so.6(+0x10fcf7)[0x7f92f72a5cf7]
clvmd(+0x1720b)[0x5604c8ad920b]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f92f71b7c05]
clvmd(+0x17ff0)[0x5604c8ad9ff0]
======= Memory map: ========
5604c8ac2000-5604c8c04000 r-xp 00000000 fd:00 9082764 /usr/sbin/clvmd
5604c8e04000-5604c8e0d000 r--p 00142000 fd:00 9082764 /usr/sbin/clvmd
5604c8e0d000-5604c8e14000 rw-p 0014b000 fd:00 9082764 /usr/sbin/clvmd
[...]
Created attachment 1282767 [details]
debug logging of clvmd on host-114
This node was down and clvmd was not running when the issue occurred on host-113.
Created attachment 1282768 [details]
debug logging of clvmd on host-115
This node was down and clvmd was not running when the issue occurred on host-113.
1) There's open-coded, non-standard list handling in this code. We should use or define something standard instead (a sketch of one such idiom follows this comment). Special properties of the first list entry are assumed, list walking is done in different ways in different places, etc.
2) The way the variables are used and named isn't intuitive and has to be worked out before you can understand the code. We should rename/add comments explaining it all better.
3) The original fix for the deadlock back in 2013 accidentally introduced this nested list editing (25bed9968155f43ef0b8832383ef711e7ae1685d). Most of the time it gets away with it, but if things happen in unusual sequences, you can see list corruption causing (I suspect) a variety of different failure modes.
4) We don't have a core dump for the most recent failure with debugging enabled, but the information suggests it is different from the one analysed higher up the bug.
5) Even better debug messages are needed - some existing messages are still missing key context information, and the client_count output is miscalculated - but there's just about enough to understand the likely sequence of events and see how it can lead to list corruption.
6) SIGUSR2 doesn't do very much, and sometimes it has to be sent several times before it has the desired effect of abandoning the processing thread. Processing goes into wait states multiple times and continues writing out its response (without error) even after the process knows the client has gone away, making the whole thing pointless.

(And that's not all the problems.)

On the testing side, it is very easy to obtain a variety of different crash symptoms from the list corruption just by starting concurrent basic lvm commands (such as 'lvs') that talk to clvmd, then killing them before they finish ('killall -9 lvs') and starting new commands. Instrumenting the code shows it does indeed sometimes continue for a little while after the initial corruption before actually crashing.
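To make point 1 concrete, here is a hedged sketch -- not proposed clvmd code, and the node/field names are invented -- of the usual "standard" way to edit a singly-linked list in place: walk via a pointer to the previous next field, so the head needs no special-casing and there is never a saved next pointer that a removal can leave stale.

#include <stdlib.h>

struct node {                          /* hypothetical stand-in for struct local_client */
        struct node *next;
        int removeme;
};

/*
 * Remove every entry flagged for deletion.  Because the cursor is a pointer
 * to the previous "next" field, unlinking works the same for the head as for
 * any other entry, and the successor is always re-read from the live list,
 * so there is no cached pointer left dangling after free().
 */
static void prune(struct node **head)
{
        struct node **pp = head;

        while (*pp) {
                struct node *entry = *pp;

                if (entry->removeme) {
                        *pp = entry->next;     /* unlink first ...            */
                        free(entry);           /* ... then free               */
                        continue;              /* *pp is already the successor */
                }
                pp = &entry->next;
        }
}

An equivalent effect could come from moving the code onto list helpers already used elsewhere in the tree (for instance dm_list); the design point is simply that removal happens in one well-defined place rather than from inside a nested walk of the same list.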
Improvements to logging and fixes to the list handling:
https://www.redhat.com/archives/lvm-devel/2017-July/msg00000.html
https://www.redhat.com/archives/lvm-devel/2017-July/msg00001.html
https://www.redhat.com/archives/lvm-devel/2017-July/msg00002.html
https://sourceware.org/git/?p=lvm2.git;a=commitdiff;h=17ed25409136162d091809d412596a45e0a30fd4
https://sourceware.org/git/?p=lvm2.git;a=commitdiff;h=af789fd6d0cba45cc4897772bae3ca73a3f15b38
https://sourceware.org/git/?p=lvm2.git;a=commitdiff;h=f2eda36cfa9baf3f7e132dbc3a3f34c908cb36d4

Marking verified in the latest rpms:
3.10.0-826.el7.x86_64
lvm2-2.02.176-5.el7 BUILT: Wed Dec 6 04:13:07 CST 2017
lvm2-libs-2.02.176-5.el7 BUILT: Wed Dec 6 04:13:07 CST 2017
lvm2-cluster-2.02.176-5.el7 BUILT: Wed Dec 6 04:13:07 CST 2017
lvm2-lockd-2.02.176-5.el7 BUILT: Wed Dec 6 04:13:07 CST 2017
lvm2-python-boom-0.8.1-5.el7 BUILT: Wed Dec 6 04:15:40 CST 2017
cmirror-2.02.176-5.el7 BUILT: Wed Dec 6 04:13:07 CST 2017
device-mapper-1.02.145-5.el7 BUILT: Wed Dec 6 04:13:07 CST 2017
device-mapper-libs-1.02.145-5.el7 BUILT: Wed Dec 6 04:13:07 CST 2017
device-mapper-event-1.02.145-5.el7 BUILT: Wed Dec 6 04:13:07 CST 2017
device-mapper-event-libs-1.02.145-5.el7 BUILT: Wed Dec 6 04:13:07 CST 2017
device-mapper-persistent-data-0.7.3-3.el7 BUILT: Tue Nov 14 05:07:18 CST 2017
No longer able to reproduce in 50+ iterations of node recovery.
Cluster name: MCKINLEY
Stack: corosync
Current DC: mckinley-02 (version 1.1.16-12.el7-94ff4df) - partition with quorum
Last updated: Wed Jan 17 17:21:34 2018
Last change: Tue Jan 16 14:11:06 2018 by root via cibadmin on mckinley-01
4 nodes configured
9 resources configured
Online: [ mckinley-01 mckinley-02 mckinley-03 mckinley-04 ]
Full list of resources:
mckinley-apc (stonith:fence_apc): Started mckinley-02
Clone Set: dlm-clone [dlm]
Started: [ mckinley-01 mckinley-02 mckinley-03 mckinley-04 ]
Clone Set: clvmd-clone [clvmd]
Started: [ mckinley-01 mckinley-02 mckinley-03 mckinley-04 ]
/dev/mapper/MCKINLEY-MCKINLEY0 on /mnt/MCKINLEY0 type gfs2 (rw,relatime,seclabel)
/dev/mapper/MCKINLEY-MCKINLEY1 on /mnt/MCKINLEY1 type gfs2 (rw,relatime,seclabel)
[root@mckinley-01 ~]# lvs -o +segtype
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert Type
MCKINLEY0 MCKINLEY Mwi-aom--- 250.00g [MCKINLEY0_mlog] 100.00 mirror
MCKINLEY1 MCKINLEY Mwi-aom--- 250.00g [MCKINLEY1_mlog] 100.00 mirror
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0853
Description of problem:
I had the following HA lvm mirror config set up and then killed host-076. During the recovery, clvmd on host-078 segfaulted, resulting in additional fence operations. I'll attach the actual logs from the two nodes (-077 and -078) dealing with -076 going down.

[root@host-076 ~]# pcs status
Cluster name: STSRHTS21625
Stack: corosync
Current DC: host-078 (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum
Last updated: Thu Mar 2 09:09:43 2017
Last change: Wed Mar 1 18:01:50 2017 by root via crm_resource on host-076
3 nodes and 15 resources configured
Online: [ host-076 host-077 host-078 ]
Full list of resources:
fence-host-076 (stonith:fence_xvm): Started host-076
fence-host-077 (stonith:fence_xvm): Started host-077
fence-host-078 (stonith:fence_xvm): Started host-078
Clone Set: dlm-clone [dlm]
    Started: [ host-076 host-077 host-078 ]
Clone Set: clvmd-clone [clvmd]
    Started: [ host-076 host-077 host-078 ]
Resource Group: HA_LVM1
    lvm1 (ocf::heartbeat:LVM): Started host-078
    fs1 (ocf::heartbeat:Filesystem): Started host-078
Resource Group: HA_LVM2
    lvm2 (ocf::heartbeat:LVM): Started host-076
    fs2 (ocf::heartbeat:Filesystem): Started host-076
Resource Group: HA_LVM3
    lvm3 (ocf::heartbeat:LVM): Started host-078
    fs3 (ocf::heartbeat:Filesystem): Started host-078
Daemon Status:
    corosync: active/disabled
    pacemaker: active/disabled
    pcsd: active/enabled

[root@host-076 ~]# reboot -fin

# On host-078
Mar 2 09:10:02 host-078 pengine[2667]: warning: Node host-076 will be fenced because the node is no longer part of the cluster
Mar 2 09:10:02 host-078 pengine[2667]: warning: Node host-076 is unclean
Mar 2 09:10:02 host-078 pengine[2667]: warning: Action fence-host-076_stop_0 on host-076 is unrunnable (offline)
Mar 2 09:10:02 host-078 pengine[2667]: warning: Action dlm:1_stop_0 on host-076 is unrunnable (offline)
Mar 2 09:10:02 host-078 pengine[2667]: warning: Action dlm:1_stop_0 on host-076 is unrunnable (offline)
Mar 2 09:10:02 host-078 pengine[2667]: warning: Action clvmd:1_stop_0 on host-076 is unrunnable (offline)
Mar 2 09:10:02 host-078 pengine[2667]: warning: Action clvmd:1_stop_0 on host-076 is unrunnable (offline)
Mar 2 09:10:02 host-078 pengine[2667]: warning: Action lvm2_stop_0 on host-076 is unrunnable (offline)
Mar 2 09:10:02 host-078 pengine[2667]: warning: Action fs2_stop_0 on host-076 is unrunnable (offline)
Mar 2 09:10:02 host-078 pengine[2667]: warning: Scheduling Node host-076 for STONITH
Mar 2 09:10:02 host-078 pengine[2667]: notice: Move fence-host-076#011(Started host-076 -> host-077)
Mar 2 09:10:02 host-078 pengine[2667]: notice: Move fence-host-077#011(Started host-077 -> host-078)
Mar 2 09:10:02 host-078 pengine[2667]: notice: Stop dlm:1#011(host-076)
Mar 2 09:10:02 host-078 pengine[2667]: notice: Stop clvmd:1#011(host-076)
Mar 2 09:10:02 host-078 pengine[2667]: notice: Move lvm1#011(Started host-078 -> host-077)
Mar 2 09:10:02 host-078 pengine[2667]: notice: Move fs1#011(Started host-078 -> host-077)
Mar 2 09:10:02 host-078 pengine[2667]: notice: Move lvm2#011(Started host-076 -> host-078)
Mar 2 09:10:02 host-078 pengine[2667]: notice: Move fs2#011(Started host-076 -> host-078)
Mar 2 09:10:02 host-078 pengine[2667]: warning: Calculated transition 489 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-2.bz2
Mar 2 09:10:02 host-078 crmd[2668]: notice: Initiating stop operation fence-host-077_stop_0 on host-077
Mar 2 09:10:02 host-078 crmd[2668]: notice: Initiating stop operation fs1_stop_0 locally on host-078
Mar 2 09:10:02 host-078 crmd[2668]: notice: Requesting fencing (reboot) of node host-076
Mar 2 09:10:02 host-078 stonith-ng[2664]: notice: Client crmd.2668.730f2278 wants to fence (reboot) 'host-076' with device '(any)'
Mar 2 09:10:02 host-078 stonith-ng[2664]: notice: Merging stonith action reboot for node host-076 originating from client crmd.2668.ab0ff2c2 with identical request from stonith-api.16516 (72s)
Mar 2 09:10:02 host-078 crmd[2668]: notice: Initiating start operation fence-host-076_start_0 on host-077
Mar 2 09:10:02 host-078 crmd[2668]: notice: Initiating start operation fence-host-077_start_0 locally on host-078
Mar 2 09:10:02 host-078 Filesystem(fs1)[4055]: INFO: Running stop for /dev/STSRHTS216251/ha on /mnt/HA_fs1
Mar 2 09:10:02 host-078 Filesystem(fs1)[4055]: INFO: Trying to unmount /mnt/HA_fs1
Mar 2 09:10:02 host-078 kernel: XFS (dm-6): Unmounting Filesystem
Mar 2 09:10:02 host-078 Filesystem(fs1)[4055]: INFO: unmounted /mnt/HA_fs1 successfully
Mar 2 09:10:02 host-078 crmd[2668]: notice: Result of stop operation for fs1 on host-078: 0 (ok)
Mar 2 09:10:02 host-078 crmd[2668]: notice: Initiating stop operation lvm1_stop_0 locally on host-078
Mar 2 09:10:05 host-078 stonith-ng[2664]: notice: Operation 'reboot' [4051] (call 2 from stonith-api.16516) for host 'host-076' with device 'fence-host-076' returned: 0 (OK)
Mar 2 09:10:05 host-078 stonith-ng[2664]: notice: Operation reboot of host-076 by host-078 for stonith-api.16516: OK
Mar 2 09:10:05 host-078 crmd[2668]: notice: Peer host-076 was terminated (reboot) by host-078 for host-077: OK (ref=2d735f7b-c28a-4aea-9ac1-50ad64b337dd) by client stonith-api.16516
Mar 2 09:10:05 host-078 crmd[2668]: notice: Transition aborted: External Fencing Operation
Mar 2 09:10:05 host-078 stonith-ng[2664]: notice: Operation reboot of host-076 by host-078 for crmd.2668: OK
Mar 2 09:10:05 host-078 crmd[2668]: notice: Stonith operation 2/67:489:0:45dc2b27-a4c4-4034-8f4d-5747c7e96c6d: OK (0)
Mar 2 09:10:06 host-078 dlm_controld[3054]: 63054 fence status 1 receive 0 from 2 walltime 1488467406 local 63054
Mar 2 09:10:06 host-078 crmd[2668]: notice: Result of start operation for fence-host-077 on host-078: 0 (ok)
Mar 2 09:10:06 host-078 crmd[2668]: notice: Peer host-076 was terminated (reboot) by host-078 for host-078: OK (ref=ab0ff2c2-2227-46e9-bd7b-0fedf5fc5224) by client crmd.2668
Mar 2 09:10:06 host-078 kernel: clvmd[4089]: segfault at f29 ip 00007f4c2fcf2d88 sp 00007ffefae74170 error 4 in clvmd[7f4c2fcdc000+136000]

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/clvmd -T90 -d0'.
Program terminated with signal 6, Aborted.
#0  0x00007f4c2e2471d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007f4c2e2471d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f4c2e2488c8 in __GI_abort () at abort.c:90
#2  0x00007f4c2e286f07 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7f4c2e391b48 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00007f4c2e28e503 in malloc_printerr (ar_ptr=0x7f4c2e5cc760 <main_arena>, ptr=<optimized out>, str=0x7f4c2e391c48 "free(): corrupted unsorted chunks", action=3) at malloc.c:5013
#4  _int_free (av=0x7f4c2e5cc760 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:3835
#5  0x00007f4c2fcf5909 in process_work_item (cmd=0x7f4c30342cc0) at clvmd.c:2007
#6  lvm_thread_fn (arg=<optimized out>) at clvmd.c:2058
#7  0x00007f4c2e9e5dc5 in start_thread (arg=0x7f4c2fb41700) at pthread_create.c:308
#8  0x00007f4c2e30973d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Version-Release number of selected component (if applicable):
3.10.0-514.6.2.el7.x86_64
lvm2-2.02.166-1.el7_3.2 BUILT: Wed Nov 16 04:11:32 CST 2016
lvm2-libs-2.02.166-1.el7_3.2 BUILT: Wed Nov 16 04:11:32 CST 2016
lvm2-cluster-2.02.166-1.el7_3.2 BUILT: Wed Nov 16 04:11:32 CST 2016
device-mapper-1.02.135-1.el7_3.2 BUILT: Wed Nov 16 04:11:32 CST 2016
device-mapper-libs-1.02.135-1.el7_3.2 BUILT: Wed Nov 16 04:11:32 CST 2016
device-mapper-event-1.02.135-1.el7_3.2 BUILT: Wed Nov 16 04:11:32 CST 2016
device-mapper-event-libs-1.02.135-1.el7_3.2 BUILT: Wed Nov 16 04:11:32 CST 2016
device-mapper-persistent-data-0.6.3-1.el7 BUILT: Fri Jul 22 05:29:13 CDT 2016
cmirror-2.02.166-1.el7_3.2 BUILT: Wed Nov 16 04:11:32 CST 2016