Bug 623176
| Summary: | token loss during recovery can trigger abort |
|---|---|
| Product: | Red Hat Enterprise Linux 5 |
| Component: | openais |
| Version: | 5.5 |
| Status: | CLOSED ERRATA |
| Severity: | urgent |
| Priority: | urgent |
| Reporter: | Jaroslav Kortus <jkortus> |
| Assignee: | Jan Friesse <jfriesse> |
| QA Contact: | Cluster QE <mspqa-list> |
| CC: | ccaulfie, cluster-maint, djansa, edamato, jfriesse, jwest, ppecka, sdake |
| Target Milestone: | rc |
| Keywords: | ZStream |
| Hardware: | All |
| OS: | Linux |
| Fixed In Version: | openais-0.80.6-29.el5 |
| Doc Type: | Bug Fix |
| Clones: | 675783 (view as bug list) |
| Bug Blocks: | 675783, 694182 |
| Last Closed: | 2011-07-21 07:47:20 UTC |
Description
Jaroslav Kortus
2010-08-11 14:23:37 UTC
Created attachment 438199 [details]
core of aisexec from crashed node.
Core was generated by `aisexec'.
Program terminated with signal 6, Aborted.
#0 0xa000000000010620 in __kernel_syscall_via_break ()
(gdb) where
#0 0xa000000000010620 in __kernel_syscall_via_break ()
#1 0x20000000000f8b00 in raise () from /lib/libc.so.6.1
#2 0x20000000000fbe80 in abort () from /lib/libc.so.6.1
#3 0x20000000000e87b0 in __assert_fail () from /lib/libc.so.6.1
#4 0x400000000001a970 in memb_state_commit_token_update (
instance=0x20000000017d0010, commit_token=0x60000000001437a8)
at totemsrp.c:2681
#5 memb_state_commit_enter (instance=0x20000000017d0010,
commit_token=0x60000000001437a8) at totemsrp.c:1754
#6 0x4000000000022030 in message_handler_memb_commit_token (
instance=0x20000000017d0010, msg=0x60000000001437a8,
msg_len=<value optimized out>,
endian_conversion_needed=<value optimized out>) at totemsrp.c:4062
#7 0x4000000000017a20 in main_deliver_fn (context=0x20000000017d0010,
msg=0x60000000001437a8, msg_len=1325018) at totemsrp.c:4135
#8 0x4000000000011a10 in none_mcast_recv (rrp_instance=<value optimized out>,
iface_no=0, context=0x20000000017d0010, msg=0x60000000001437a8,
msg_len=312) at totemrrp.c:476
#9 0x4000000000011d10 in rrp_deliver_fn (context=0x60000000001181b0,
msg=0x60000000001437a8, msg_len=312) at totemrrp.c:1319
#10 0x400000000000dcc0 in net_deliver_fn (handle=0, fd=<value optimized out>,
revents=1, data=0xc000000000000712) at totemnet.c:695
#11 0x4000000000007290 in poll_run (handle=0) at aispoll.c:402
#12 0x4000000000039380 in main (argc=<value optimized out>,
argv=<value optimized out>) at main.c:628
(gdb)
(gdb) up
#1 0x20000000000f8b00 in raise () from /lib/libc.so.6.1
(gdb) up
#2 0x20000000000fbe80 in abort () from /lib/libc.so.6.1
(gdb) up
#3 0x20000000000e87b0 in __assert_fail () from /lib/libc.so.6.1
(gdb) up
#4 0x400000000001a970 in memb_state_commit_token_update (
instance=0x20000000017d0010, commit_token=0x60000000001437a8)
at totemsrp.c:2681
2681 assert (commit_token->memb_index <= commit_token->addr_entries);
(gdb) prnt commit_token
Undefined command: "prnt". Try "help".
(gdb) print *commit_token
Cannot access memory at address 0x0
(gdb) print commit_token
$1 = <value optimized out>
(gdb) up
#5 memb_state_commit_enter (instance=0x20000000017d0010,
commit_token=0x60000000001437a8) at totemsrp.c:1754
1754 memb_state_commit_token_update (instance, commit_token);
(gdb) print *commit_token
$2 = {header = {type = 4 '\004', encapsulated = 0 '\000',
endian_detector = 65314, nodeid = 1}, token_seq = 6, ring_id = {rep = {
nodeid = 1, family = 2,
addr = "\n\017Y\263\b\000\002\000\n\017Y\263\b\000\004"}, seq = 128},
retrans_flg = 0, memb_index = 4, addr_entries = 3,
end_of_commit_token = 0x60000000001437de "\001"}
(gdb) print instance->memb_state
$4 = MEMB_STATE_GATHER
(gdb) print instance->my_proc_list_entries
$6 = 3
(gdb) print instance->my_failed_list_entries
$7 = 0
(gdb) up
#6 0x4000000000022030 in message_handler_memb_commit_token (
instance=0x20000000017d0010, msg=0x60000000001437a8,
msg_len=<value optimized out>,
endian_conversion_needed=<value optimized out>) at totemsrp.c:4062
4062 memb_state_commit_enter (instance, memb_commit_token);
(gdb) print sub_entries
$8 = 3
(gdb) print sub[0]
$9 = {addr = {{nodeid = 1, family = 2,
addr = "\n\017Y\263\b\000\002\000\n\017Y\263\b\000\004"}, {nodeid = 0,
family = 0, addr = '\000' <repeats 15 times>}}}
(gdb) print sub[1]
$10 = {addr = {{nodeid = 2, family = 2,
addr = "\n\017Y\264\b\000\002\000\n\017Y\264\b\000\004"}, {nodeid = 0,
family = 0, addr = '\000' <repeats 15 times>}}}
(gdb) print sub[2]
$11 = {addr = {{nodeid = 3, family = 2,
addr = "\n\017Y\265\b\000\002\000\n\017Y\265\b\000\004"}, {nodeid = 0,
family = 0, addr = '\000' <repeats 15 times>}}}
(gdb) print my_proc_list[0]
No symbol "my_proc_list" in current context.
(gdb) print instance->my_proc_list[0]
$12 = {addr = {{nodeid = 1, family = 2,
addr = "\n\017Y\263\b\000\002\000\n\017Y\263\b\000\004"}, {nodeid = 0,
family = 0, addr = '\000' <repeats 15 times>}}}
(gdb) print instance->my_proc_list[1]
$13 = {addr = {{nodeid = 2, family = 2,
addr = "\n\017Y\264\b\000\002\000\n\017Y\264\b\000\004"}, {nodeid = 0,
family = 0, addr = '\000' <repeats 15 times>}}}
(gdb) print instance->my_proc_list[2]
$14 = {addr = {{nodeid = 3, family = 2,
addr = "\n\017Y\265\b\000\002\000\n\017Y\265\b\000\004"}, {nodeid = 0,
family = 0, addr = '\000' <repeats 15 times>}}}
(gdb) print instance->my_ring_id
$15 = {rep = {nodeid = 1, family = 2,
addr = "\n\017Y\263\b\000\002\000\n\017Y\263\b\000\004"}, seq = 124}
From the logs, we see the processor lost a token in the recovery state, then entered the gather state. From the core dump, we can see the processor attempted to enter the commit state from the gather state, but then failed the assertion at totemsrp.c:2681 and aborted.
It would be nice to have a reproducer, but I believe the issue is that the full ring state, including the ring sequence number, is restored both on entering the recovery state (ring_state_restore()) and again when a token timeout occurs during recovery. This corrupts the ring sequence number.
What happens then is as follows:
nodes 1,2,3 with seq = 4
commit token seq = 8
commit token sent from 1->2 with seq = 8
commit token sent from 2->3 with seq = 8
commit token sent from 3->1 (node 1 enters the recovery state)
commit token sent from 1->2 (node 2 drops the token for a long period of time)
failure to send commit token from 3->1 because of token loss
node 1 has a token timeout in the recovery state, which restores the ring sequence number to 4 (!!), and enters gather
finally that commit token makes it to node 1, and the commit token is accepted as a new commit token (rather than being rejected)
When that commit token makes it back around to node 1 in recovery, it should start the recovery process. But because node 1 is in gather, it treats it as a valid commit token (even though it is from a now-discarded ring configuration).
The code (the MEMB_STATE_GATHER case in message_handler_memb_commit_token(), totemsrp.c) is as follows:

case MEMB_STATE_GATHER:
	memb_set_subtract (sub, &sub_entries,
		instance->my_proc_list, instance->my_proc_list_entries,
		instance->my_failed_list, instance->my_failed_list_entries);

	/*
	 * With my_ring_id.seq rolled back by ring_state_restore(), the
	 * seq comparison below passes for the stale commit token.
	 */
	if (memb_set_equal (addr,
		memb_commit_token->addr_entries,
		sub,
		sub_entries) &&
		memb_commit_token->ring_id.seq > instance->my_ring_id.seq) {

		memb_state_commit_enter (instance, memb_commit_token);
	}
	break;
In this case, the commit token seq is 8 and node 1's seq is 4 (because of ring_state_restore()), so node 1 treats this as a valid commit token to form a new ring. Instead, node 1 should reject the commit token, which would trigger a token loss in the recovery state on all nodes, resulting in a new round of consensus. Had the sequence number not been restored, node 1's seq would still be 8, the comparison would fail, and the stale token would be rejected.
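To make that concrete, the comparison can be reproduced in a stand-alone sketch (a hypothetical harness, not the openais source; the seq values are the ones from the scenario above):

#include <stdio.h>

int main (void)
{
	const unsigned long long commit_token_seq = 8; /* token from the discarded ring */
	unsigned long long my_ring_id_seq;

	/* Without the erroneous restore, node 1 kept seq = 8 after
	 * accepting the commit token, so the stale token is rejected. */
	my_ring_id_seq = 8;
	printf ("no restore:    accepted = %d\n",
		commit_token_seq > my_ring_id_seq ? 1 : 0); /* 8 > 8 -> rejected */

	/* With ring_state_restore() having rolled seq back to the old
	 * ring's value, the same token now passes the check. */
	my_ring_id_seq = 4;
	printf ("after restore: accepted = %d\n",
		commit_token_seq > my_ring_id_seq ? 1 : 0); /* 8 > 4 -> accepted */
	return 0;
}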
The totem spec states, for the token loss timeout expiring in the recovery state:
"1. discard all new messages received on the new ring
2. empty retrans_message queue
3. determine current old ring aru
4. call shift_to_gather"
It appears the code currently takes step 3 and expands it into restoring all state about the current ring, which is wrong.
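For illustration, a handler that follows the four spec steps literally would look roughly like the sketch below. Every type and helper here is an illustrative stub rather than the openais implementation; the point is that step 3 is limited to determining the old ring aru, not restoring the full saved ring state.

/* Hypothetical sketch of the spec's token-loss-in-recovery handling. */
struct instance;                                         /* opaque protocol state */
void discard_new_ring_messages (struct instance *);      /* stub for step 1 */
void retrans_queue_empty (struct instance *);            /* stub for step 2 */
unsigned int determine_old_ring_aru (struct instance *); /* stub for step 3 */
void memb_state_gather_enter (struct instance *);        /* stub for step 4 */

void token_loss_in_recovery (struct instance *in)
{
	unsigned int old_ring_aru;

	discard_new_ring_messages (in);             /* 1. drop messages from the new ring */
	retrans_queue_empty (in);                   /* 2. empty the retrans_message queue */
	old_ring_aru = determine_old_ring_aru (in); /* 3. only the old ring aru; restoring
	                                               the full saved ring state here is
	                                               the bug described above */
	(void) old_ring_aru;                        /* would feed old-ring message recovery */
	memb_state_gather_enter (in);               /* 4. shift to gather */
}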
*** Bug 631835 has been marked as a duplicate of this bug. ***

Reproducer (in code) now exists for corosync: https://bugzilla.redhat.com/attachment.cgi?id=482098&action=diff

Brilliant work on comment #9. Honza, please backport the resolution from the corosync master patch. Thanks! -steve

Created attachment 484103 [details]
Backport from corosync master branch
Backport of Corosync 6aa47fde953bf2179f5bd2dd07815fc7d80f47bb
Patch sent to ML.

*** Bug 556804 has been marked as a duplicate of this bug. ***

According to Steve, this patch introduced a regression in the code, which was exposed during QE testing. I'm flipping the state.

The following patches need backports:
6aa47fde953bf2179f5bd2dd07815fc7d80f47bb
d99fba72e65545d8a3573b754525bd2ec8dcc540
336741ee96caf3ae435b609ee8a76788902c9edf
0eabeee63eca7a4cc1d907607057ac668fafbcae
0fcf760977a73ef78bacff47e61c30ba21929faa

Created attachment 488395 [details]
Backport of d99fba72e65545d8a3573b754525bd2ec8dcc540
Created attachment 488396 [details]
Backport of 336741ee96caf3ae435b609ee8a76788902c9edf
Created attachment 488400 [details]
Backport of Corosync 0eabeee63eca7a4cc1d907607057ac668fafbcae
Created attachment 488401 [details]
Backport of 0fcf760977a73ef78bacff47e61c30ba21929faa
Created attachment 488697 [details]
Backport of Corosync 0eabeee63eca7a4cc1d907607057ac668fafbcae
Fixed missing instance->old_ring_state_saved = 1;
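The fix note above points at a guard flag, old_ring_state_saved, around the save/restore pair. A minimal sketch of that pattern follows; the struct and harness are hypothetical, and only the flag name comes from the note.

#include <string.h>

struct ring_state {
	unsigned long long seq;
	/* ...other saved totem state... */
};

struct instance {
	struct ring_state current;          /* live ring state */
	struct ring_state old_ring_state;   /* snapshot taken before recovery */
	int old_ring_state_saved;           /* guard: restore only a valid snapshot */
};

static void old_ring_state_save (struct instance *in)
{
	if (in->old_ring_state_saved == 0) {
		in->old_ring_state_saved = 1; /* the line the backport was missing */
		memcpy (&in->old_ring_state, &in->current, sizeof (in->current));
	}
}

static void old_ring_state_restore (struct instance *in)
{
	/* Restoring without a prior save would copy stale or
	 * uninitialized state into the live ring state. */
	if (in->old_ring_state_saved) {
		memcpy (&in->current, &in->old_ring_state, sizeof (in->current));
	}
}

static void old_ring_state_reset (struct instance *in)
{
	/* New ring installed: the snapshot is no longer meaningful. */
	in->old_ring_state_saved = 0;
}

With the guard in place, a restore after the snapshot has been invalidated can no longer clobber the live ring state.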
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1012.html