Bug 1428444
| Summary: | clvmd segfault during HA cluster recovery | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Corey Marthaler <cmarthal> |
| Component: | lvm2 | Assignee: | Alasdair Kergon <agk> |
| lvm2 sub component: | Clustering / clvmd | QA Contact: | cluster-qe <cluster-qe> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | Priority: | unspecified |
| Version: | 7.3 | CC: | agk, cmarthal, heinzm, jbrassow, lmiksik, mcsontos, msnitzer, prajnoha, prockai, rbednar, zkabelac |
| Target Milestone: | rc | Target Release: | --- |
| Hardware: | x86_64 | OS: | Linux |
| Whiteboard: | | Fixed In Version: | lvm2-2.02.175-1.el7 |
| Doc Type: | If docs needed, set a value | Doc Text: | |
| Story Points: | --- | Clone Of: | |
| Environment: | | Last Closed: | 2018-04-10 15:18:32 UTC |
| Type: | Bug | Regression: | --- |
| Mount Type: | --- | Documentation: | --- |
| CRM: | | Verified Versions: | |
| Category: | --- | oVirt Team: | --- |
| RHEL 7.3 requirements from Atomic Host: | | Cloudforms Team: | --- |
| Target Upstream Version: | | Embargoed: | |
| Bug Depends On: | | Bug Blocks: | 1469559 |
| Attachments: | | | |
Created attachment 1259201 [details]
log from host-077
Created attachment 1259202 [details]
log from host-078
Now that the cluster is back up...
Storage:
Resource Group: HA_LVM1
lvm1 (ocf::heartbeat:LVM): Started host-076
fs1 (ocf::heartbeat:Filesystem): Started host-076
Resource Group: HA_LVM2
lvm2 (ocf::heartbeat:LVM): Started host-077
fs2 (ocf::heartbeat:Filesystem): Started host-077
Resource Group: HA_LVM3
lvm3 (ocf::heartbeat:LVM): Started host-078
fs3 (ocf::heartbeat:Filesystem): Started host-078
[root@host-076 ~]# lvs -a -o +devices
LV VG Attr LSize Log Cpy%Sync Devices
ha STSRHTS216251 Mwi-aom--- 8.00g [ha_mlog] 100.00 ha_mimage_0(0),ha_mimage_1(0)
[ha_mimage_0] STSRHTS216251 iwi-aom--- 8.00g /dev/sdg2(0)
[ha_mimage_1] STSRHTS216251 iwi-aom--- 8.00g /dev/sdg1(0)
[ha_mlog] STSRHTS216251 lwi-aom--- 4.00m /dev/sdc2(0)
ha STSRHTS216252 Mwi---m--- 8.00g [ha_mlog] ha_mimage_0(0),ha_mimage_1(0)
[ha_mimage_0] STSRHTS216252 Iwi---m--- 8.00g /dev/sdc1(0)
[ha_mimage_1] STSRHTS216252 Iwi---m--- 8.00g /dev/sdf2(0)
[ha_mlog] STSRHTS216252 lwi---m--- 4.00m /dev/sdf1(0)
ha STSRHTS216253 Mwi---m--- 8.00g [ha_mlog] ha_mimage_0(0),ha_mimage_1(0)
[ha_mimage_0] STSRHTS216253 Iwi---m--- 8.00g /dev/sdd2(0)
[ha_mimage_1] STSRHTS216253 Iwi---m--- 8.00g /dev/sdd1(0)
Config:
[root@host-076 ~]# pcs config
Cluster Name: STSRHTS21625
Corosync Nodes:
host-076 host-077 host-078
Pacemaker Nodes:
host-076 host-077 host-078
Resources:
Clone: dlm-clone
Meta Attrs: interleave=true ordered=true
Resource: dlm (class=ocf provider=pacemaker type=controld)
Operations: start interval=0s timeout=90 (dlm-start-interval-0s)
stop interval=0s timeout=100 (dlm-stop-interval-0s)
monitor interval=30s on-fail=fence (dlm-monitor-interval-30s)
Clone: clvmd-clone
Meta Attrs: interleave=true ordered=true
Resource: clvmd (class=ocf provider=heartbeat type=clvm)
Attributes: with_cmirrord=1
Operations: start interval=0s timeout=90 (clvmd-start-interval-0s)
stop interval=0s timeout=90 (clvmd-stop-interval-0s)
monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)
Group: HA_LVM1
Resource: lvm1 (class=ocf provider=heartbeat type=LVM)
Attributes: volgrpname=STSRHTS216251 exclusive=true
Operations: start interval=0s timeout=30 (lvm1-start-interval-0s)
stop interval=0s timeout=30 (lvm1-stop-interval-0s)
monitor interval=10 timeout=30 (lvm1-monitor-interval-10)
Resource: fs1 (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/STSRHTS216251/ha directory=/mnt/HA_fs1 fstype=xfs options=noatime
Operations: start interval=0s timeout=60 (fs1-start-interval-0s)
stop interval=0s timeout=60 (fs1-stop-interval-0s)
monitor interval=10s on-fail=fence (fs1-monitor-interval-10s)
Group: HA_LVM2
Resource: lvm2 (class=ocf provider=heartbeat type=LVM)
Attributes: volgrpname=STSRHTS216252 exclusive=true
Operations: start interval=0s timeout=30 (lvm2-start-interval-0s)
stop interval=0s timeout=30 (lvm2-stop-interval-0s)
monitor interval=10 timeout=30 (lvm2-monitor-interval-10)
Resource: fs2 (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/STSRHTS216252/ha directory=/mnt/HA_fs2 fstype=xfs options=noatime
Operations: start interval=0s timeout=60 (fs2-start-interval-0s)
stop interval=0s timeout=60 (fs2-stop-interval-0s)
monitor interval=10s on-fail=fence (fs2-monitor-interval-10s)
Group: HA_LVM3
Resource: lvm3 (class=ocf provider=heartbeat type=LVM)
Attributes: volgrpname=STSRHTS216253 exclusive=true
Operations: start interval=0s timeout=30 (lvm3-start-interval-0s)
stop interval=0s timeout=30 (lvm3-stop-interval-0s)
monitor interval=10 timeout=30 (lvm3-monitor-interval-10)
Resource: fs3 (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/STSRHTS216253/ha directory=/mnt/HA_fs3 fstype=xfs options=noatime
Operations: start interval=0s timeout=60 (fs3-start-interval-0s)
stop interval=0s timeout=60 (fs3-stop-interval-0s)
monitor interval=10s on-fail=fence (fs3-monitor-interval-10s)
Stonith Devices:
Resource: fence-host-076 (class=stonith type=fence_xvm)
Attributes: pcmk_host_check=static-list pcmk_host_list=host-076 pcmk_host_map=host-076:host-076.virt.lab.msp.redhat.com
Operations: monitor interval=60s (fence-host-076-monitor-interval-60s)
Resource: fence-host-077 (class=stonith type=fence_xvm)
Attributes: pcmk_host_check=static-list pcmk_host_list=host-077 pcmk_host_map=host-077:host-077.virt.lab.msp.redhat.com
Operations: monitor interval=60s (fence-host-077-monitor-interval-60s)
Resource: fence-host-078 (class=stonith type=fence_xvm)
Attributes: pcmk_host_check=static-list pcmk_host_list=host-078 pcmk_host_map=host-078:host-078.virt.lab.msp.redhat.com
Operations: monitor interval=60s (fence-host-078-monitor-interval-60s)
Fencing Levels:
Location Constraints:
Resource: lvm3
Enabled on: host-078 (score:INFINITY) (role: Started) (id:cli-prefer-lvm3)
Ordering Constraints:
start dlm-clone then start clvmd-clone (kind:Mandatory) (id:order-dlm-clone-clvmd-clone-mandatory)
start clvmd-clone then start HA_LVM1 (kind:Mandatory) (id:order-clvmd-clone-HA_LVM1-mandatory)
start clvmd-clone then start HA_LVM2 (kind:Mandatory) (id:order-clvmd-clone-HA_LVM2-mandatory)
start clvmd-clone then start HA_LVM3 (kind:Mandatory) (id:order-clvmd-clone-HA_LVM3-mandatory)
Colocation Constraints:
clvmd-clone with dlm-clone (score:INFINITY) (id:colocation-clvmd-clone-dlm-clone-INFINITY)
Ticket Constraints:
Alerts:
No alerts defined
Resources Defaults:
No defaults set
Operations Defaults:
No defaults set
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: STSRHTS21625
dc-version: 1.1.15-11.el7_3.2-e174ec8
have-watchdog: false
no-quorum-policy: freeze
Quorum:
Options:
This is reproducible and doesn't require an HA mirror volume.

Mar 2 14:09:56 host-078 pengine[23581]: warning: Node host-076 will be fenced because the node is no longer part of the cluster
Mar 2 14:09:56 host-078 pengine[23581]: warning: Node host-076 is unclean
Mar 2 14:09:56 host-078 pengine[23581]: warning: Action fence-host-076_stop_0 on host-076 is unrunnable (offline)
Mar 2 14:09:56 host-078 pengine[23581]: warning: Action dlm:2_stop_0 on host-076 is unrunnable (offline)
Mar 2 14:09:56 host-078 pengine[23581]: warning: Action dlm:2_stop_0 on host-076 is unrunnable (offline)
Mar 2 14:09:56 host-078 pengine[23581]: warning: Action clvmd:2_stop_0 on host-076 is unrunnable (offline)
Mar 2 14:09:56 host-078 pengine[23581]: warning: Action clvmd:2_stop_0 on host-076 is unrunnable (offline)
Mar 2 14:09:56 host-078 pengine[23581]: warning: Action lvm1_stop_0 on host-076 is unrunnable (offline)
Mar 2 14:09:56 host-078 pengine[23581]: warning: Action fs1_stop_0 on host-076 is unrunnable (offline)
Mar 2 14:09:56 host-078 pengine[23581]: warning: Scheduling Node host-076 for STONITH
Mar 2 14:09:56 host-078 pengine[23581]: notice: Move fence-host-076#011(Started host-076 -> host-077)
Mar 2 14:09:56 host-078 pengine[23581]: notice: Move fence-host-077#011(Started host-077 -> host-078)
Mar 2 14:09:56 host-078 pengine[23581]: notice: Stop dlm:2#011(host-076)
Mar 2 14:09:56 host-078 pengine[23581]: notice: Stop clvmd:2#011(host-076)
Mar 2 14:09:56 host-078 pengine[23581]: notice: Move lvm1#011(Started host-076 -> host-077)
Mar 2 14:09:56 host-078 pengine[23581]: notice: Move fs1#011(Started host-076 -> host-077)
Mar 2 14:09:56 host-078 pengine[23581]: notice: Move lvm2#011(Started host-077 -> host-078)
Mar 2 14:09:56 host-078 pengine[23581]: notice: Move fs2#011(Started host-077 -> host-078)
Mar 2 14:09:56 host-078 pengine[23581]: notice: Move lvm3#011(Started host-078 -> host-077)
Mar 2 14:09:56 host-078 pengine[23581]: notice: Move fs3#011(Started host-078 -> host-077)
Mar 2 14:09:56 host-078 pengine[23581]: warning: Calculated transition 71 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-6.bz2
Mar 2 14:09:56 host-078 crmd[23582]: notice: Initiating stop operation fence-host-077_stop_0 on host-077
Mar 2 14:09:56 host-078 crmd[23582]: notice: Initiating stop operation fs2_stop_0 on host-077
Mar 2 14:09:56 host-078 crmd[23582]: notice: Initiating stop operation fs3_stop_0 locally on host-078
Mar 2 14:09:56 host-078 crmd[23582]: notice: Requesting fencing (reboot) of node host-076
Mar 2 14:09:56 host-078 stonith-ng[23578]: notice: Client crmd.23582.39d9e3da wants to fence (reboot) 'host-076' with device '(any)'
Mar 2 14:09:56 host-078 stonith-ng[23578]: notice: Merging stonith action reboot for node host-076 originating from client crmd.23582.5ebb69de with identical request from stonith-api.20939 (72s)
Mar 2 14:09:56 host-078 crmd[23582]: notice: Initiating start operation fence-host-076_start_0 on host-077
Mar 2 14:09:56 host-078 crmd[23582]: notice: Initiating start operation fence-host-077_start_0 locally on host-078
Mar 2 14:09:56 host-078 crmd[23582]: notice: Initiating stop operation lvm2_stop_0 on host-077
Mar 2 14:09:56 host-078 Filesystem(fs3)[16988]: INFO: Running stop for /dev/STSRHTS216253/ha on /mnt/ha_fs3
Mar 2 14:09:56 host-078 Filesystem(fs3)[16988]: INFO: Trying to unmount /mnt/ha_fs3
Mar 2 14:09:56 host-078 kernel: XFS (dm-8): Unmounting Filesystem
Mar 2 14:09:56 host-078 Filesystem(fs3)[16988]: INFO: unmounted /mnt/ha_fs3 successfully
Mar 2 14:09:56 host-078 crmd[23582]: notice: Result of stop operation for fs3 on host-078: 0 (ok)
Mar 2 14:09:56 host-078 crmd[23582]: notice: Initiating stop operation lvm3_stop_0 locally on host-078
Mar 2 14:09:58 host-078 stonith-ng[23578]: notice: Operation reboot of host-076 by host-077 for stonith-api.20939: OK
Mar 2 14:09:58 host-078 stonith-ng[23578]: notice: Operation reboot of host-076 by host-077 for crmd.23582: OK
Mar 2 14:09:59 host-078 crmd[23582]: notice: Result of start operation for fence-host-077 on host-078: 0 (ok)
Mar 2 14:09:59 host-078 crmd[23582]: notice: Peer host-076 was terminated (reboot) by host-077 for host-077: OK (ref=c5f1ad20-1f4a-41ad-b61a-de109d88669e) by client stonith-api.20939
Mar 2 14:09:59 host-078 crmd[23582]: notice: Transition aborted: External Fencing Operation
Mar 2 14:09:59 host-078 crmd[23582]: notice: Stonith operation 4/69:71:0:8af5282b-96ba-45da-8e1e-23e9a3002c01: OK (0)
Mar 2 14:09:59 host-078 crmd[23582]: notice: Peer host-076 was terminated (reboot) by host-077 for host-078: OK (ref=5ebb69de-0af9-42cd-bde9-568d1f25dcd0) by client crmd.23582
Mar 2 14:09:59 host-078 dlm_controld[23739]: 17929 fence status 1 receive 0 from 2 walltime 1488485399 local 17929
Mar 2 14:09:59 host-078 kernel: clvmd[23906]: segfault at 1389 ip 00007f1454cd6d88 sp 00007ffc2761a030 error 4 in clvmd[7f1454cc0000+136000]

Core was generated by `/usr/sbin/clvmd -T90 -d0'.
Program terminated with signal 6, Aborted.
#0  0x00007f145322b1d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007f145322b1d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f145322c8c8 in __GI_abort () at abort.c:90
#2  0x00007f145326af07 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7f1453375b48 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00007f1453272503 in malloc_printerr (ar_ptr=0x7f14535b0760 <main_arena>, ptr=<optimized out>, str=0x7f1453375c48 "free(): corrupted unsorted chunks", action=3) at malloc.c:5013
#4  _int_free (av=0x7f14535b0760 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:3835
#5  0x00007f1454cd9909 in process_work_item (cmd=0x7f1455738160) at clvmd.c:2007
#6  lvm_thread_fn (arg=<optimized out>) at clvmd.c:2058
#7  0x00007f14539c9dc5 in start_thread (arg=0x7f1454b25700) at pthread_create.c:308
#8  0x00007f14532ed73d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

I have seen things like this when lvm is compiled with debugging, but I'm not sure how to suggest you check for that. I think that the latest version should show the configure line (including debug) from 'lvm version', so you can check whether that's enabled.

I've saved a core file that Corey reproduced. I did a little browsing in gdb on the core, and I think bad data is being passed to add_to_lvmqueue() from this loop, which I can't understand:
for (thisfd = &local_client_head; thisfd; thisfd = nextfd, nextfd = thisfd ? thisfd->next : NULL) {
/* frees certain entries, passes some entries to add_to_lvmqueue() */
}
That for loop appeared in commit be662439331abf6ccb16dd996bfe15eb613b4e53, so I'm hoping agk can shed some light on what might be wrong.
(gdb) info threads
Id Target Id Frame
4 Thread 0x7f3ef46ce700 (LWP 7569) pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
3 Thread 0x7f3eefe1c700 (LWP 5456) 0x00007f3ef35c970d in read () at ../sysdeps/unix/syscall-template.S:81
2 Thread 0x7f3ef4710700 (LWP 5457) pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
* 1 Thread 0x7f3ef487c880 (LWP 5450) 0x00007f3ef2e6a58d in _int_malloc (av=av@entry=0x7f3ef31a9760 <main_arena>, bytes=bytes@entry=48) at malloc.c:3571
(gdb) bt
#0 0x00007f3ef2e6a58d in _int_malloc (av=av@entry=0x7f3ef31a9760 <main_arena>, bytes=bytes@entry=48) at malloc.c:3571
#1 0x00007f3ef2e6d10c in __GI___libc_malloc (bytes=bytes@entry=48) at malloc.c:2897
#2 0x00007f3ef3e1fc6e in dm_malloc_aux (s=s@entry=48, file=file@entry=0x55b236ceba1d "clvmd.c", line=line@entry=2086) at mm/dbg_malloc.c:286
#3 0x00007f3ef3e1fce5 in dm_malloc_wrapper (s=s@entry=48, file=file@entry=0x55b236ceba1d "clvmd.c", line=line@entry=2086) at mm/dbg_malloc.c:340
#4 0x000055b236c242ff in add_to_lvmqueue (client=client@entry=0x7f3ef31a97b8 <main_arena+88>, msg=msg@entry=0x0, msglen=msglen@entry=0, csid=csid@entry=0x0) at clvmd.c:2086
#5 0x000055b236c21274 in main_loop (cmd_timeout=60) at clvmd.c:886
#6 main (argc=<optimized out>, argv=<optimized out>) at clvmd.c:627
(gdb) frame 4
#4 0x000055b236c242ff in add_to_lvmqueue (client=client@entry=0x7f3ef31a97b8 <main_arena+88>, msg=msg@entry=0x0, msglen=msglen@entry=0, csid=csid@entry=0x0) at clvmd.c:2086
2086 if (!(cmd = dm_malloc(sizeof(*cmd))))
(gdb) frame 5
#5 0x000055b236c21274 in main_loop (cmd_timeout=60) at clvmd.c:886
886 add_to_lvmqueue(free_fd, NULL, 0, NULL);
(gdb) p free_fd
$1 = (struct local_client *) 0x7f3ef31a97b8 <main_arena+88>
(gdb) p thisfd
$2 = (struct local_client *) 0x7f3ef31a97b8 <main_arena+88>
(gdb) p lastfd
$3 = <optimized out>
(gdb) p nextfd
$4 = (struct local_client *) 0x55b238cf9b50
(gdb) p *free_fd
$5 = {fd = 953138016, type = 21938, next = 0x55b238cf9b50, xid = 36240, callback = 0x7f3ef31a9df8 <main_arena+1688>, removeme = 200 '\310', bits = {localsock = {
replies = 0x7f3ef31a97c8 <main_arena+104>, num_replies = -216361000, expected_replies = 32574, sent_time = 139908343306200, in_progress = -216360984, sent_out = 32574,
private = 0x7f3ef31a97e8 <main_arena+136>, cmd = 0x7f3ef31a97f8 <main_arena+152>, cmd_len = -216360968, pipe = 32574, finished = 953128960, all_success = 21938,
cleanup_needed = 953128960, pipe_client = 0x7f3ef31a9818 <main_arena+184>, threadid = 139908343306264, state = (unknown: 4078606376), mutex = {__data = {
__lock = -216360920, __count = 32574, __owner = -216360904, __nusers = 32574, __kind = -216360904, __spins = 32574, __elision = 0, __list = {
__prev = 0x7f3ef31a9848 <main_arena+232>, __next = 0x7f3ef31a9848 <main_arena+232>}},
__size = "(\230\032\363>\177\000\000\070\230\032\363>\177\000\000\070\230\032\363>\177\000\000H\230\032\363>\177\000\000H\230\032\363>\177\000",
__align = 139908343306280}, cond = {__data = {__lock = -216360872, __futex = 32574, __total_seq = 139908343306328, __wakeup_seq = 139908343306344,
__woken_seq = 139908343306344, __mutex = 0x7f3ef31a9878 <main_arena+280>, __nwaiters = 4078606456, __broadcast_seq = 32574},
__size = "X\230\032\363>\177\000\000X\230\032\363>\177\000\000h\230\032\363>\177\000\000h\230\032\363>\177\000\000x\230\032\363>\177\000\000x\230\032\363>\177\000",
__align = 139908343306328}}, pipe = {client = 0x7f3ef31a97c8 <main_arena+104>, threadid = 139908343306200}, net = {private = 0x7f3ef31a97c8 <main_arena+104>,
flags = -216361000}}}
(gdb) p *nextfd
$6 = {fd = 0, type = CLUSTER_MAIN_SOCK, next = 0x14b1, xid = 38840, callback = 0x7f3ef31a9df8 <main_arena+1688>, removeme = 0 '\000', bits = {localsock = {replies = 0x0,
num_replies = 0, expected_replies = 0, sent_time = 0, in_progress = 0, sent_out = 0, private = 0x0, cmd = 0x0, cmd_len = 0, pipe = 0, finished = 0, all_success = 0,
cleanup_needed = 0, pipe_client = 0x0, threadid = 0, state = PRE_COMMAND, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0,
__elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0,
__wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, pipe = {client = 0x0,
threadid = 0}, net = {private = 0x0, flags = 0}}}
(gdb)
The management of this local_client_head list looks completely bogus, and I'm not sure how it's working at all. The list iteration itself looks broken enough, but it turns out that cleanup_zombie(), called within that list iteration, is itself also iterating through the list and modifying it. I don't know what any of this stuff is actually doing; I'm just looking at the mechanics of the code, and I think it's a miracle anything related to this list works at all.

I don't know what this code is doing, but it looks highly suspect, so I'm reassigning to agk, who added this loop fairly recently. It looks like add_to_lvmqueue() is segfaulting because it's passed a garbage (freed) pointer, which isn't surprising given what's happening to this list (see the excerpt below, and the sketch after it):
for (thisfd = &local_client_head; thisfd; thisfd = nextfd, nextfd = thisfd ? thisfd->next : NULL) {
/* in cleanup_zombie... */
for (lastfd = &local_client_head; (delfd = lastfd->next); lastfd = delfd) {
if (thisfd->bits.localsock.pipe_client == delfd) {
thisfd->bits.localsock.pipe_client = NULL;
lastfd->next = delfd->next;
dm_free(delfd);
break;
}
}
free_fd = thisfd;
add_to_lvmqueue(free_fd, NULL, 0, NULL);
... more stuff with list entries
}
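For illustration only -- this is not clvmd code, and the struct and field names are invented -- here is a minimal, compilable C sketch of the pattern described above: an outer walk that caches its next pointer up front while a nested helper unlinks and frees entries from the same list. If the nested free happens to hit the cached entry, the next iteration walks through freed memory, which is exactly the kind of garbage pointer that then gets handed to add_to_lvmqueue().

#include <stdlib.h>

struct node {                          /* hypothetical stand-in for struct local_client */
        struct node *next;
        int zombie;
};

/* Nested helper in the style of cleanup_zombie(): unlinks and frees one entry. */
static void cleanup(struct node *head)
{
        struct node *last, *del;

        for (last = head; (del = last->next); last = del) {
                if (del->zombie) {
                        last->next = del->next;
                        free(del);     /* may be the very entry the caller has cached */
                        break;
                }
        }
}

/* Outer walk in the style of main_loop(): the next pointer is cached each step. */
static void walk(struct node *head)
{
        struct node *thisfd, *nextfd = head->next;

        for (thisfd = head; thisfd;
             thisfd = nextfd, nextfd = thisfd ? thisfd->next : NULL) {
                cleanup(head);         /* if this frees the node nextfd points at ...   */
        }                              /* ... the next iteration reads freed memory     */
}

In clvmd the same effect would leave free_fd pointing into freed heap, which is consistent with the gdb output above showing free_fd and thisfd resolving to an address inside glibc's main_arena.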
Well, the code dct identifies hasn't changed since July 2015, so could this be a new test that wasn't run on previous versions, or are we looking for a regression or change in some other piece of code with this as a side-effect? If there's nested list walking, that was there before 2015 too, but perhaps the difference is that 'break' changed to 'continue'. I'm not seeing the logs yet. (I adjusted a lot of debug messages 2 years ago to try to make it clear what the code was doing.) Could you attach the clvmd debug logs, so that I can see how to reproduce it?

Created attachment 1282766 [details]
strace of clvmd -d1 -f -C on host-113
Finally got clvmd to segfault while also in debugging mode. It must be a timing issue, as it doesn't happen as easily. I was unable to get it to happen with just one machine going down. In this case, host-113 was the only cluster node remaining after host-115, and then later host-114, were taken down.
[root@host-113 ~]# clvmd -d1 -f -C > /tmp/clvmd.log 2>&1
*** buffer overflow detected ***: clvmd terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x7f92f72a5d87]
/lib64/libc.so.6(+0x10df40)[0x7f92f72a3f40]
/lib64/libc.so.6(+0x10fcf7)[0x7f92f72a5cf7]
clvmd(+0x1720b)[0x5604c8ad920b]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f92f71b7c05]
clvmd(+0x17ff0)[0x5604c8ad9ff0]
======= Memory map: ========
5604c8ac2000-5604c8c04000 r-xp 00000000 fd:00 9082764 /usr/sbin/clvmd
5604c8e04000-5604c8e0d000 r--p 00142000 fd:00 9082764 /usr/sbin/clvmd
5604c8e0d000-5604c8e14000 rw-p 0014b000 fd:00 9082764 /usr/sbin/clvmd
[...]
Created attachment 1282767 [details]
debug logging of clvmd on host-114
This node was down and clvmd was not running when the issue occurred on host-113.
Created attachment 1282768 [details]
debug logging of clvmd on host-115
This node was down and clvmd was not running when the issue occurred on host-113.
1) There's open-coded, non-standard list handling in this code. We should use or define something standard instead (a sketch of one such idiom follows this comment). Special properties of the first list entry are assumed, list walking is done in different ways in different places, etc.
2) The way the variables are used and named isn't intuitive and has to be worked out before you can understand the code. We should rename/add comments explaining it all better.
3) The original fix for the deadlock back in 2013 accidentally introduced this nested list editing (25bed9968155f43ef0b8832383ef711e7ae1685d). Most of the time it gets away with it, but if things happen in unusual sequences, you can see list corruption causing (I suspect) a variety of different failure modes.
4) We don't have a core dump for the most recent failure with debugging enabled, but the information suggests it is different from the one analysed higher up the bug.
5) Even better debug messages are needed - some existing messages are still missing key context information, and the client_count output is miscalculated - but there's just about enough to understand the likely sequence of events and see how it can lead to list corruption.
6) SIGUSR2 doesn't do very much, and sometimes it has to be sent several times before it has the desired effect of abandoning the processing thread. Processing goes into wait states multiple times and continues writing out its response (without error) even after the process knows the client has gone away, making the whole thing pointless.

(And that's not all the problems.)

On the testing side, it is very easy to obtain a variety of different crash symptoms from the list corruption just by starting concurrent basic lvm commands (such as 'lvs') that talk to clvmd, then killing them before they finish ('killall -9 lvs') and starting new commands. Instrumenting the code shows it does indeed sometimes continue for a little while after the initial corruption before actually crashing.
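To make point 1 concrete, here is a hedged sketch -- not proposed clvmd code, and the node/field names are invented -- of the usual "standard" way to edit a singly-linked list in place: walk via a pointer to the previous next field, so the head needs no special-casing and there is never a saved next pointer that a removal can leave stale.

#include <stdlib.h>

struct node {                          /* hypothetical stand-in for struct local_client */
        struct node *next;
        int removeme;
};

/*
 * Remove every entry flagged for deletion.  Because the cursor is a pointer
 * to the previous "next" field, unlinking works the same for the head as for
 * any other entry, and the successor is always re-read from the live list,
 * so there is no cached pointer left dangling after free().
 */
static void prune(struct node **head)
{
        struct node **pp = head;

        while (*pp) {
                struct node *entry = *pp;

                if (entry->removeme) {
                        *pp = entry->next;     /* unlink first ...            */
                        free(entry);           /* ... then free               */
                        continue;              /* *pp is already the successor */
                }
                pp = &entry->next;
        }
}

An equivalent effect could come from moving the code onto list helpers already used elsewhere in the tree (for instance dm_list); the design point is simply that removal happens in one well-defined place rather than from inside a nested walk of the same list.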
Improvements to logging and fixes to the list handling:
https://www.redhat.com/archives/lvm-devel/2017-July/msg00000.html
https://www.redhat.com/archives/lvm-devel/2017-July/msg00001.html
https://www.redhat.com/archives/lvm-devel/2017-July/msg00002.html
https://sourceware.org/git/?p=lvm2.git;a=commitdiff;h=17ed25409136162d091809d412596a45e0a30fd4
https://sourceware.org/git/?p=lvm2.git;a=commitdiff;h=af789fd6d0cba45cc4897772bae3ca73a3f15b38
https://sourceware.org/git/?p=lvm2.git;a=commitdiff;h=f2eda36cfa9baf3f7e132dbc3a3f34c908cb36d4

Marking verified in the latest rpms:
3.10.0-826.el7.x86_64
lvm2-2.02.176-5.el7 BUILT: Wed Dec 6 04:13:07 CST 2017
lvm2-libs-2.02.176-5.el7 BUILT: Wed Dec 6 04:13:07 CST 2017
lvm2-cluster-2.02.176-5.el7 BUILT: Wed Dec 6 04:13:07 CST 2017
lvm2-lockd-2.02.176-5.el7 BUILT: Wed Dec 6 04:13:07 CST 2017
lvm2-python-boom-0.8.1-5.el7 BUILT: Wed Dec 6 04:15:40 CST 2017
cmirror-2.02.176-5.el7 BUILT: Wed Dec 6 04:13:07 CST 2017
device-mapper-1.02.145-5.el7 BUILT: Wed Dec 6 04:13:07 CST 2017
device-mapper-libs-1.02.145-5.el7 BUILT: Wed Dec 6 04:13:07 CST 2017
device-mapper-event-1.02.145-5.el7 BUILT: Wed Dec 6 04:13:07 CST 2017
device-mapper-event-libs-1.02.145-5.el7 BUILT: Wed Dec 6 04:13:07 CST 2017
device-mapper-persistent-data-0.7.3-3.el7 BUILT: Tue Nov 14 05:07:18 CST 2017
No longer able to reproduce in 50+ iterations of node recovery.
Cluster name: MCKINLEY
Stack: corosync
Current DC: mckinley-02 (version 1.1.16-12.el7-94ff4df) - partition with quorum
Last updated: Wed Jan 17 17:21:34 2018
Last change: Tue Jan 16 14:11:06 2018 by root via cibadmin on mckinley-01
4 nodes configured
9 resources configured
Online: [ mckinley-01 mckinley-02 mckinley-03 mckinley-04 ]
Full list of resources:
mckinley-apc (stonith:fence_apc): Started mckinley-02
Clone Set: dlm-clone [dlm]
Started: [ mckinley-01 mckinley-02 mckinley-03 mckinley-04 ]
Clone Set: clvmd-clone [clvmd]
Started: [ mckinley-01 mckinley-02 mckinley-03 mckinley-04 ]
/dev/mapper/MCKINLEY-MCKINLEY0 on /mnt/MCKINLEY0 type gfs2 (rw,relatime,seclabel)
/dev/mapper/MCKINLEY-MCKINLEY1 on /mnt/MCKINLEY1 type gfs2 (rw,relatime,seclabel)
[root@mckinley-01 ~]# lvs -o +segtype
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert Type
MCKINLEY0 MCKINLEY Mwi-aom--- 250.00g [MCKINLEY0_mlog] 100.00 mirror
MCKINLEY1 MCKINLEY Mwi-aom--- 250.00g [MCKINLEY1_mlog] 100.00 mirror
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0853
Description of problem:
I had the following HA lvm mirror config set up and then killed host-076. During the recovery, clvmd on host-078 segfaulted, resulting in additional fence operations. I'll attach the actual logs from the two nodes (-077 and -078) dealing with -076 going down.

[root@host-076 ~]# pcs status
Cluster name: STSRHTS21625
Stack: corosync
Current DC: host-078 (version 1.1.15-11.el7_3.2-e174ec8) - partition with quorum
Last updated: Thu Mar 2 09:09:43 2017
Last change: Wed Mar 1 18:01:50 2017 by root via crm_resource on host-076
3 nodes and 15 resources configured
Online: [ host-076 host-077 host-078 ]
Full list of resources:
fence-host-076 (stonith:fence_xvm): Started host-076
fence-host-077 (stonith:fence_xvm): Started host-077
fence-host-078 (stonith:fence_xvm): Started host-078
Clone Set: dlm-clone [dlm]
    Started: [ host-076 host-077 host-078 ]
Clone Set: clvmd-clone [clvmd]
    Started: [ host-076 host-077 host-078 ]
Resource Group: HA_LVM1
    lvm1 (ocf::heartbeat:LVM): Started host-078
    fs1 (ocf::heartbeat:Filesystem): Started host-078
Resource Group: HA_LVM2
    lvm2 (ocf::heartbeat:LVM): Started host-076
    fs2 (ocf::heartbeat:Filesystem): Started host-076
Resource Group: HA_LVM3
    lvm3 (ocf::heartbeat:LVM): Started host-078
    fs3 (ocf::heartbeat:Filesystem): Started host-078
Daemon Status:
    corosync: active/disabled
    pacemaker: active/disabled
    pcsd: active/enabled

[root@host-076 ~]# reboot -fin

# On host-078
Mar 2 09:10:02 host-078 pengine[2667]: warning: Node host-076 will be fenced because the node is no longer part of the cluster
Mar 2 09:10:02 host-078 pengine[2667]: warning: Node host-076 is unclean
Mar 2 09:10:02 host-078 pengine[2667]: warning: Action fence-host-076_stop_0 on host-076 is unrunnable (offline)
Mar 2 09:10:02 host-078 pengine[2667]: warning: Action dlm:1_stop_0 on host-076 is unrunnable (offline)
Mar 2 09:10:02 host-078 pengine[2667]: warning: Action dlm:1_stop_0 on host-076 is unrunnable (offline)
Mar 2 09:10:02 host-078 pengine[2667]: warning: Action clvmd:1_stop_0 on host-076 is unrunnable (offline)
Mar 2 09:10:02 host-078 pengine[2667]: warning: Action clvmd:1_stop_0 on host-076 is unrunnable (offline)
Mar 2 09:10:02 host-078 pengine[2667]: warning: Action lvm2_stop_0 on host-076 is unrunnable (offline)
Mar 2 09:10:02 host-078 pengine[2667]: warning: Action fs2_stop_0 on host-076 is unrunnable (offline)
Mar 2 09:10:02 host-078 pengine[2667]: warning: Scheduling Node host-076 for STONITH
Mar 2 09:10:02 host-078 pengine[2667]: notice: Move fence-host-076#011(Started host-076 -> host-077)
Mar 2 09:10:02 host-078 pengine[2667]: notice: Move fence-host-077#011(Started host-077 -> host-078)
Mar 2 09:10:02 host-078 pengine[2667]: notice: Stop dlm:1#011(host-076)
Mar 2 09:10:02 host-078 pengine[2667]: notice: Stop clvmd:1#011(host-076)
Mar 2 09:10:02 host-078 pengine[2667]: notice: Move lvm1#011(Started host-078 -> host-077)
Mar 2 09:10:02 host-078 pengine[2667]: notice: Move fs1#011(Started host-078 -> host-077)
Mar 2 09:10:02 host-078 pengine[2667]: notice: Move lvm2#011(Started host-076 -> host-078)
Mar 2 09:10:02 host-078 pengine[2667]: notice: Move fs2#011(Started host-076 -> host-078)
Mar 2 09:10:02 host-078 pengine[2667]: warning: Calculated transition 489 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-2.bz2
Mar 2 09:10:02 host-078 crmd[2668]: notice: Initiating stop operation fence-host-077_stop_0 on host-077
Mar 2 09:10:02 host-078 crmd[2668]: notice: Initiating stop operation fs1_stop_0 locally on host-078
Mar 2 09:10:02 host-078 crmd[2668]: notice: Requesting fencing (reboot) of node host-076
Mar 2 09:10:02 host-078 stonith-ng[2664]: notice: Client crmd.2668.730f2278 wants to fence (reboot) 'host-076' with device '(any)'
Mar 2 09:10:02 host-078 stonith-ng[2664]: notice: Merging stonith action reboot for node host-076 originating from client crmd.2668.ab0ff2c2 with identical request from stonith-api.16516 (72s)
Mar 2 09:10:02 host-078 crmd[2668]: notice: Initiating start operation fence-host-076_start_0 on host-077
Mar 2 09:10:02 host-078 crmd[2668]: notice: Initiating start operation fence-host-077_start_0 locally on host-078
Mar 2 09:10:02 host-078 Filesystem(fs1)[4055]: INFO: Running stop for /dev/STSRHTS216251/ha on /mnt/HA_fs1
Mar 2 09:10:02 host-078 Filesystem(fs1)[4055]: INFO: Trying to unmount /mnt/HA_fs1
Mar 2 09:10:02 host-078 kernel: XFS (dm-6): Unmounting Filesystem
Mar 2 09:10:02 host-078 Filesystem(fs1)[4055]: INFO: unmounted /mnt/HA_fs1 successfully
Mar 2 09:10:02 host-078 crmd[2668]: notice: Result of stop operation for fs1 on host-078: 0 (ok)
Mar 2 09:10:02 host-078 crmd[2668]: notice: Initiating stop operation lvm1_stop_0 locally on host-078
Mar 2 09:10:05 host-078 stonith-ng[2664]: notice: Operation 'reboot' [4051] (call 2 from stonith-api.16516) for host 'host-076' with device 'fence-host-076' returned: 0 (OK)
Mar 2 09:10:05 host-078 stonith-ng[2664]: notice: Operation reboot of host-076 by host-078 for stonith-api.16516: OK
Mar 2 09:10:05 host-078 crmd[2668]: notice: Peer host-076 was terminated (reboot) by host-078 for host-077: OK (ref=2d735f7b-c28a-4aea-9ac1-50ad64b337dd) by client stonith-api.16516
Mar 2 09:10:05 host-078 crmd[2668]: notice: Transition aborted: External Fencing Operation
Mar 2 09:10:05 host-078 stonith-ng[2664]: notice: Operation reboot of host-076 by host-078 for crmd.2668: OK
Mar 2 09:10:05 host-078 crmd[2668]: notice: Stonith operation 2/67:489:0:45dc2b27-a4c4-4034-8f4d-5747c7e96c6d: OK (0)
Mar 2 09:10:06 host-078 dlm_controld[3054]: 63054 fence status 1 receive 0 from 2 walltime 1488467406 local 63054
Mar 2 09:10:06 host-078 crmd[2668]: notice: Result of start operation for fence-host-077 on host-078: 0 (ok)
Mar 2 09:10:06 host-078 crmd[2668]: notice: Peer host-076 was terminated (reboot) by host-078 for host-078: OK (ref=ab0ff2c2-2227-46e9-bd7b-0fedf5fc5224) by client crmd.2668
Mar 2 09:10:06 host-078 kernel: clvmd[4089]: segfault at f29 ip 00007f4c2fcf2d88 sp 00007ffefae74170 error 4 in clvmd[7f4c2fcdc000+136000]

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/clvmd -T90 -d0'.
Program terminated with signal 6, Aborted.
#0  0x00007f4c2e2471d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007f4c2e2471d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f4c2e2488c8 in __GI_abort () at abort.c:90
#2  0x00007f4c2e286f07 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7f4c2e391b48 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00007f4c2e28e503 in malloc_printerr (ar_ptr=0x7f4c2e5cc760 <main_arena>, ptr=<optimized out>, str=0x7f4c2e391c48 "free(): corrupted unsorted chunks", action=3) at malloc.c:5013
#4  _int_free (av=0x7f4c2e5cc760 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:3835
#5  0x00007f4c2fcf5909 in process_work_item (cmd=0x7f4c30342cc0) at clvmd.c:2007
#6  lvm_thread_fn (arg=<optimized out>) at clvmd.c:2058
#7  0x00007f4c2e9e5dc5 in start_thread (arg=0x7f4c2fb41700) at pthread_create.c:308
#8  0x00007f4c2e30973d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Version-Release number of selected component (if applicable):
3.10.0-514.6.2.el7.x86_64
lvm2-2.02.166-1.el7_3.2 BUILT: Wed Nov 16 04:11:32 CST 2016
lvm2-libs-2.02.166-1.el7_3.2 BUILT: Wed Nov 16 04:11:32 CST 2016
lvm2-cluster-2.02.166-1.el7_3.2 BUILT: Wed Nov 16 04:11:32 CST 2016
device-mapper-1.02.135-1.el7_3.2 BUILT: Wed Nov 16 04:11:32 CST 2016
device-mapper-libs-1.02.135-1.el7_3.2 BUILT: Wed Nov 16 04:11:32 CST 2016
device-mapper-event-1.02.135-1.el7_3.2 BUILT: Wed Nov 16 04:11:32 CST 2016
device-mapper-event-libs-1.02.135-1.el7_3.2 BUILT: Wed Nov 16 04:11:32 CST 2016
device-mapper-persistent-data-0.6.3-1.el7 BUILT: Fri Jul 22 05:29:13 CDT 2016
cmirror-2.02.166-1.el7_3.2 BUILT: Wed Nov 16 04:11:32 CST 2016