Bug 590898 - corosync blocks on exit with debug: on enabled
Status: CLOSED UPSTREAM
Product: Fedora
Classification: Fedora
Component: corosync
Version: rawhide
Hardware: All Linux
Priority: urgent
Severity: urgent
Assigned To: Steven Dake
QA Contact: Fedora Extras Quality Assurance
Depends On:
Blocks:
Reported: 2010-05-10 17:58 EDT by Steven Dake
Modified: 2016-04-26 17:50 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-07-07 12:46:31 EDT

Description Steven Dake 2010-05-10 17:58:18 EDT
Description of problem:
corosync hangs during shutdown when debug: on is set in the configuration.

Version-Release number of selected component (if applicable):
corosync-1.2.1

How reproducible:
opensuse dependent

Steps to Reproduce:
See comment 2.
  
Actual results:
corosync locks up on exit and never terminates.

Expected results:
corosync exits cleanly without locking up.

Additional info:

A user attached a debugger to the hung process and captured this backtrace of all threads:

Thread 3 (Thread 0x7f679067e910 (LWP 19541)):
#0  0x00007f6792c41da6 in logsys_worker_thread (data=<value optimized out>) at logsys.c:766
#1  0x00007f679261865d in start_thread () from /lib64/libpthread.so.0
#2  0x00007f6792183e1d in clone () from /lib64/libc.so.6
#3  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f679317dfb0 (LWP 19542)):
#0  0x00007f679261d965 in ?? () from /lib64/libpthread.so.0
#1  0x00000000004091b8 in prioritized_timer_thread (data=<value optimized out>) at timer.c:135
#2  0x00007f679261865d in start_thread () from /lib64/libpthread.so.0
#3  0x00007f6792183e1d in clone () from /lib64/libc.so.6
#4  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f67932756f0 (LWP 19540)):
#0  0x00007f679261996d in pthread_join () from /lib64/libpthread.so.0
#1  0x0000000000407595 in _corosync_exit_error (err=AIS_DONE_EXIT, file=<value optimized out>, line=<value optimized out>) at util.c:97
#2  0x0000000000406d3b in unlink_all_completed () at main.c:160
#3  0x0000000000408aa3 in service_exit_schedwrk_handler (data=0x7f679067e9e0) at service.c:614
#4  0x000000000040c64b in schedwrk_do (type=<value optimized out>, context=<value optimized out>) at schedwrk.c:77
#5  0x00007f6792e5b561 in token_callbacks_execute (type=<value optimized out>, instance=<value optimized out>) at totemsrp.c:3209
#6  message_handler_orf_token (type=<value optimized out>, instance=<value optimized out>) at totemsrp.c:3601
#7  0x00007f6792e51cd3 in rrp_deliver_fn (context=0x63e790, msg=0x661cd8, msg_len=70) at totemrrp.c:1393
#8  0x00007f6792e50cf2 in net_deliver_fn (handle=<value optimized out>, fd=<value optimized out>, revents=<value optimized out>, data=<value optimized out>) at totemudp.c:1223
#9  0x00007f6792e4cdda in poll_run (handle=2240235047305084928) at coropoll.c:396
#10 0x0000000000405c44 in main (argc=4, argv=<value optimized out>) at main.c:1556
Comment 1 Steven Dake 2010-05-10 18:06:35 EDT
logsys.c:766 is
                        log_rec_idx = record_read (buf, log_rec_idx, &log_msg);

What if this function is spinning?

In that case logsys.c:785 would never call pthread_exit, and the pthread_join in the main thread would never collect the thread's exit status, so it would block indefinitely on exit.


a break statement that occurs when no messages are waiting for flushing
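
To illustrate the dependency described above, here is a minimal sketch of a worker thread that the main thread joins at shutdown. It is not the corosync logsys code: struct flush_queue, worker_thread and run_shutdown_demo are hypothetical names made up for this example. The point it shows is that pthread_join only returns if every pass of the worker loop either consumes a record or reaches the exit check. If the read step spun without making progress, the join would block just like Thread 1 in the backtrace.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a log flush queue; not taken from logsys.c. */
struct flush_queue {
	pthread_mutex_t lock;
	pthread_cond_t wake;
	int pending;      /* records waiting to be flushed */
	int flushed;      /* records the worker has written out */
	bool should_exit; /* set by the main thread at shutdown */
};

static void *worker_thread(void *data)
{
	struct flush_queue *q = data;

	pthread_mutex_lock(&q->lock);
	for (;;) {
		/* Sleep until there is work or an exit request. */
		while (q->pending == 0 && !q->should_exit) {
			pthread_cond_wait(&q->wake, &q->lock);
		}
		if (q->pending == 0) {
			/* Exit was requested and nothing is left to flush. */
			break;
		}
		/*
		 * Each pass must consume one record. A read step that never
		 * advances here would keep the thread alive forever.
		 */
		q->pending--;
		q->flushed++;
	}
	pthread_mutex_unlock(&q->lock);
	return NULL;
}

/* Queue n records, request exit, join the worker; returns records flushed. */
int run_shutdown_demo(int n)
{
	struct flush_queue q;
	pthread_t tid;
	int rc;
	int flushed;

	pthread_mutex_init(&q.lock, NULL);
	pthread_cond_init(&q.wake, NULL);
	q.pending = 0;
	q.flushed = 0;
	q.should_exit = false;

	rc = pthread_create(&tid, NULL, worker_thread, &q);
	assert(rc == 0);

	pthread_mutex_lock(&q.lock);
	q.pending += n;
	q.should_exit = true;
	pthread_cond_signal(&q.wake);
	pthread_mutex_unlock(&q.lock);

	/* Returns only once the worker has left its loop. */
	rc = pthread_join(tid, NULL);
	assert(rc == 0);

	flushed = q.flushed;
	pthread_cond_destroy(&q.wake);
	pthread_mutex_destroy(&q.lock);
	return flushed;
}
```

Calling run_shutdown_demo(5) from a test program should return 5 once the worker has drained the queue and been joined. The sketch breaks out of the loop only when the queue is empty, which is one plausible reading of the break statement mentioned above.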
Comment 2 Steven Dake 2010-05-10 18:20:59 EDT
Steps to reproduce:
1. Place debug: on in the config file.
2. service corosync start
3. test/cpgbench
4. Wait 10 seconds.
5. service corosync stop

This generates the exact stack trace above.
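
For step 1, a minimal sketch of the relevant config fragment, assuming the setting goes in the logging section of corosync.conf (typically /etc/corosync/corosync.conf; the path is an assumption, not stated in this report):

```
logging {
        debug: on
}
```

Any other logging options already present in that section can stay as they are; only the debug line matters for reproducing this hang.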
Comment 3 Jan Friesse 2010-05-11 04:46:28 EDT
From my debugging, this really is a problem in logsys (it is overwriting its own memory).

Because of "<sdake> about got logsys rewritten", reassigning back to Steve.
