Bug 1181203 - glusterd/libglusterfs: Various failures when multi threading epoll, due to racy state updates/maintenance
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: krishnan parthasarathi
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-01-12 15:12 UTC by Shyamsundar
Modified: 2015-11-03 23:06 UTC (History)
3 users

Fixed In Version: glusterfs-3.7.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-05-14 17:28:57 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Shyamsundar 2015-01-12 15:12:46 UTC
Description of problem:

Here are the 2 problems that were exchanged over mail on this:

<<<< 1 >>>>
Test Case:
./tests/basic/volume-snapshot.t

Failure:
Fails at step 30 in the last run reported by Gerrit (the test case has one additional step now, so it currently fails in one of the last 2 steps, i.e. at 30 or 31).

The above steps are (were),
TEST ! volume_exists $V0
TEST ! volume_exists $V1

The manner of failure suggests that the volumes are not deleted. However, removing the cleanup routine from the test case and checking manually shows that the volumes are actually absent, so the test case should not fail; this is a false failure report.

The real reason, however, seems to be that the command "/usr/local/sbin/gluster --glusterd-sock=/d/backends/1/glusterd/gd.sock volume info <patchy|patchy2>" produces no output at all, so the grep trick used to determine whether the volume has been deleted fails (the grep trick is in the function volume_exists in volume.rc, which is where this command was captured from).
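For context, the check in volume.rc works roughly like the sketch below. This is a reconstruction from the behaviour described here, not the verbatim helper; the grep pattern and names are assumptions. The key point is that it greps the CLI's output, so an empty output stream makes it misreport:

function volume_exists () {
        local volname=$1
        # The CLI is expected to print an error such as "Volume <name>
        # does not exist" when the volume is gone. If the cli process
        # exits before that message is flushed (the race analysed
        # below), grep matches nothing, we fall into the else branch,
        # and the volume is wrongly reported as still existing.
        if gluster volume info "$volname" 2>&1 | grep -q "does not exist"; then
                return 1        # volume is absent
        else
                return 0        # volume is (apparently) present
        fi
}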

Reason for failure:
From my debugging, the reason for the failure is that we reach the exit statement in input.c:37:cli_batch() before reaching the vfprintf(...) in _cli_err, which is invoked from gf_cli_get_volume_cbk when post-processing the response for the RPC and sending the broadcast that the RPC is done.

The thread that sends the request waits on the condition that the RPC response has been processed, and then does a series of returns that end in the exit in input.c, whereas the other thread is still processing the output and printing it to stderr.

If we hit the exit first, there is no stderr (or it could be closed), or the running threads could themselves be _terminated_ before completion.

Further proof of the above RCA:
1) Tried this command in gdb with a breakpoint at gf_cli_get_volume_cbk; it is invoked every time and produces err_str as required, but sometimes fails to print it.

2) Put a breakpoint at the exit() in input.c; sometimes this breakpoint is hit before the print to stderr from the _cli_err function (which is bad).

3) Put a sleep of 3 seconds just before the exit() call, and the problem never appeared.

4) Removed the sleep and this happens about 50% of the time.

Solution:
Here is where I need your advice/direction. My thought is obviously to broadcast the response only after post-processing the results and printing them; IOW, call cli_cmd_broadcast_response once the RPC post-processing is really done.

We may need to do this for other types of RPC callback handlers as well, so a different solution could be arrived at, where we wait before exit (somehow?).
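To make the suggested ordering concrete, here is a minimal, self-contained sketch of the race (assumed/simplified code, not the actual cli sources; cli_cmd_broadcast_response is stood in for by a plain condvar broadcast):

/* Worker signals completion only AFTER all user-visible output is
 * flushed; swapping the two steps reproduces the lost-output bug,
 * because main() calls exit() as soon as it is woken. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int done = 0;

static void *
rpc_callback (void *arg)
{
        /* 1. Finish the user-visible work first ... */
        fprintf (stderr, "volume does not exist\n");
        fflush (stderr);

        /* 2. ... then wake the waiter, which may exit() immediately. */
        pthread_mutex_lock (&lock);
        done = 1;
        pthread_cond_broadcast (&cond);
        pthread_mutex_unlock (&lock);
        return NULL;
}

int
main (void)
{
        pthread_t t;
        pthread_create (&t, NULL, rpc_callback, NULL);

        pthread_mutex_lock (&lock);
        while (!done)
                pthread_cond_wait (&cond, &lock);
        pthread_mutex_unlock (&lock);

        exit (0);  /* like cli_batch(): exits as soon as it is woken */
}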

Strangeness:
The one strange thing is that without the multi-threaded epoll patch this problem never appears (not once in 200 iterations). What is different about thread scheduling when we have more than one epoll thread, I am not sure, but I guess something there is kicking up this storm (i.e. the RPC request thread gets woken up and scheduled for execution while there is one more thread in the executable, or something like that).

Reproducible nature of the problem:
I cannot reproduce this on my machine; I had to borrow slave22 from the regression setup to get through this. Just FYI.

<<<< 1 >>>>

<<<< 2 >>>>
<I did not do much analysis on this one; it seemed straightforward>

Crash observed when 2 epoll threads conflict while updating xprt->list in glusterd.c:glusterd_rpcsvc_notify.

GDB session details follow (see the NOTE annotations inline):
(gdb) t a a bt

Thread 8 (Thread 0x7f7960ce5700 (LWP 29623)):
#0  0x00007f7969c9498e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f796a74f77f in syncenv_task (proc=0xcab700) at syncop.c:535
#2  0x00007f796a755c20 in syncenv_processor (thdata=0xcab700) at syncop.c:613
#3  0x00007f7969c909d1 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f79695fa9dd in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f7953fff700 (LWP 29750)):
#0  0x00007f7969c97264 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f7969c92508 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00007f7969c923d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f795bcea895 in glusterd_rpcsvc_notify (rpc=<value optimized out>, xl=<value optimized out>, event=<value optimized out>, data=0x7f7940006f00) at glusterd.c:324
#4  0x00007f796a4e4d14 in rpcsvc_program_notify (trans=<value optimized out>, mydata=<value optimized out>, event=<value optimized out>, data=0x7f7940006f00) at rpcsvc.c:327
#5  rpcsvc_accept (trans=<value optimized out>, mydata=<value optimized out>, event=<value optimized out>, data=0x7f7940006f00) at rpcsvc.c:350
#6  rpcsvc_notify (trans=<value optimized out>, mydata=<value optimized out>, event=<value optimized out>, data=0x7f7940006f00) at rpcsvc.c:775
#7  0x00007f796a4e63c8 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:516
#8  0x00007f795a9af07e in socket_server_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0xcbbb90, poll_in=<value optimized out>, poll_out=<value optimized out>, poll_err=<value optimized out>) at socket.c:2773
#9  0x00007f796a7698bc in event_dispatch_epoll_handler (data=0xc9c3f0) at event-epoll.c:550
#10 event_dispatch_epoll_worker (data=0xc9c3f0) at event-epoll.c:608
#11 0x00007f7969c909d1 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f79695fa9dd in clone () from /lib64/libc.so.6

NOTE: The above thread is waiting for the lock, after having run INIT_LIST_HEAD (&xprt->list), which executes outside the lock.

Thread 6 (Thread 0x7f79616e6700 (LWP 29622)):
#0  0x00007f7969c9498e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f796a74f77f in syncenv_task (proc=0xcab340) at syncop.c:535
#2  0x00007f796a755c20 in syncenv_processor (thdata=0xcab340) at syncop.c:613
#3  0x00007f7969c909d1 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f79695fa9dd in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f7958ce5700 (LWP 29749)):
#0  0x00007f7969c945bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f795bd8cfe3 in hooks_worker (args=<value optimized out>) at glusterd-hooks.c:502
#2  0x00007f7969c909d1 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f79695fa9dd in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f79620e7700 (LWP 29620)):
#0  0x00007f7969c97f3d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f796a7306bc in gf_timer_proc (ctx=0xc7e010) at timer.c:170
#2  0x00007f7969c909d1 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f79695fa9dd in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f79596e6700 (LWP 29621)):
#0  0x00007f7969c984b5 in sigwait () from /lib64/libpthread.so.0
#1  0x000000000040560b in glusterfs_sigwaiter (arg=<value optimized out>) at glusterfsd.c:1734
#2  0x00007f7969c909d1 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f79695fa9dd in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f796aba2700 (LWP 29619)):
#0  0x00007f7969c9122d in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f796a7694d0 in event_dispatch_epoll (event_pool=<value optimized out>) at event-epoll.c:632
#2  0x0000000000407e78 in main (argc=9, argv=0x7fff225292b8) at glusterfsd.c:2052

Thread 1 (Thread 0x7f79535fe700 (LWP 29751)):
#0  list_del (rpc=<value optimized out>, xl=0xcb3860, event=<value optimized out>, data=0x7f7940006f00) at ../../../../libglusterfs/src/list.h:76
#1  glusterd_rpcsvc_notify (rpc=<value optimized out>, xl=0xcb3860, event=<value optimized out>, data=0x7f7940006f00) at glusterd.c:332
#2  0x00007f796a4e32d5 in rpcsvc_handle_disconnect (svc=0xcbb5a0, trans=0x7f7940006f00) at rpcsvc.c:741
#3  0x00007f796a4e4e10 in rpcsvc_notify (trans=0x7f7940006f00, mydata=<value optimized out>, event=<value optimized out>, data=0x7f7940006f00) at rpcsvc.c:779
#4  0x00007f796a4e63c8 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:516
#5  0x00007f795a9b2031 in socket_event_poll_err (fd=<value optimized out>, idx=<value optimized out>, data=0x7f7940006f00, poll_in=<value optimized out>, poll_out=0, poll_err=0) at socket.c:1163
#6  socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x7f7940006f00, poll_in=<value optimized out>, poll_out=0, poll_err=0) at socket.c:2364
#7  0x00007f796a7698bc in event_dispatch_epoll_handler (data=0xc9c3f0) at event-epoll.c:550
#8  event_dispatch_epoll_worker (data=0xc9c3f0) at event-epoll.c:608
#9  0x00007f7969c909d1 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f79695fa9dd in clone () from /lib64/libc.so.6

NOTE: This thread holds the lock and is performing the list_del operation.

(gdb) list
71      }
72
73      static inline void
74      list_del (struct list_head *old)
75      {
76              old->prev->next = old->next;
77              old->next->prev = old->prev;
78
79              old->next = (void *)0xbabebabe;
80              old->prev = (void *)0xcafecafe;
(gdb) f 1
#1  glusterd_rpcsvc_notify (rpc=<value optimized out>, xl=0xcb3860, event=<value optimized out>, data=0x7f7940006f00) at glusterd.c:332
332                     list_del (&xprt->list);
(gdb) list
327                     break;
328             }
329             case RPCSVC_EVENT_DISCONNECT:
330             {
331                     pthread_mutex_lock (&priv->xprt_lock);
332                     list_del (&xprt->list);
333                     pthread_mutex_unlock (&priv->xprt_lock);
334                     pmap_registry_remove (this, 0, NULL, GF_PMAP_PORT_NONE, xprt);
335                     break;
336             }
(gdb) list -10
322                     INIT_LIST_HEAD (&xprt->list);
323
324                     pthread_mutex_lock (&priv->xprt_lock);
325                     list_add_tail (&xprt->list, &priv->xprt_list);
326                     pthread_mutex_unlock (&priv->xprt_lock);
327                     break;
328             }
329             case RPCSVC_EVENT_DISCONNECT:
330             {
331                     pthread_mutex_lock (&priv->xprt_lock);
(gdb)
332                     list_del (&xprt->list);
333                     pthread_mutex_unlock (&priv->xprt_lock);
334                     pmap_registry_remove (this, 0, NULL, GF_PMAP_PORT_NONE, xprt);
335                     break;
336             }
337
338             default:
339                     break;
340             }
341

Problem:
Initializing the xprt list outside the lock should be avoided; I assume that is the cause of the core, and with 2 threads now processing events, the race window is actually being hit.
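A sketch of what that would look like against the excerpt above (illustrative only, not the verbatim change; the fix that was eventually merged via review 9413 instead initialises the list once at transport creation, which closes the same window):

        case RPCSVC_EVENT_ACCEPT:
        {
                pthread_mutex_lock (&priv->xprt_lock);
                INIT_LIST_HEAD (&xprt->list);   /* moved inside the lock */
                list_add_tail (&xprt->list, &priv->xprt_list);
                pthread_mutex_unlock (&priv->xprt_lock);
                break;
        }
        case RPCSVC_EVENT_DISCONNECT:
        {
                /* list_del now cannot observe a half-initialised node
                 * being set up by an ACCEPT on another epoll thread. */
                pthread_mutex_lock (&priv->xprt_lock);
                list_del (&xprt->list);
                pthread_mutex_unlock (&priv->xprt_lock);
                pmap_registry_remove (this, 0, NULL, GF_PMAP_PORT_NONE, xprt);
                break;
        }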

<<<< 2 >>>>

Other information is captured above. I think I got the component wrong; please correct it as appropriate.

Comment 1 Anand Avati 2015-01-13 06:28:03 UTC
REVIEW: http://review.gluster.org/9413 (rpc: initialise transport's list on creation) posted (#3) for review on master by Krishnan Parthasarathi (kparthas)

Comment 2 Anand Avati 2015-01-14 03:48:36 UTC
REVIEW: http://review.gluster.org/9399 (cli: volume-info <VOLNAME>, peer-status to report failure before exit) posted (#4) for review on master by Krishnan Parthasarathi (kparthas)

Comment 3 Anand Avati 2015-01-15 08:06:40 UTC
COMMIT: http://review.gluster.org/9413 committed in master by Raghavendra G (rgowdapp) 
------
commit c4561b6fd90c52667eebec9eecf974b4c0786382
Author: Krishnan Parthasarathi <kparthas>
Date:   Thu Jan 8 09:43:50 2015 +0530

    rpc: initialise transport's list on creation
    
    Initialising the transport's list, meant to hold clients connected to
    it, on the first connection event is prone to race, especially with the
    introduction of multi-threaded event layer.
    
    BUG: 1181203
    Change-Id: I6a20686a2012c1f49a279cc9cd55a03b8c7615fc
    Signed-off-by: Krishnan Parthasarathi <kparthas>
    Reviewed-on: http://review.gluster.org/9413
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Raghavendra G <rgowdapp>
    Tested-by: Raghavendra G <rgowdapp>

Comment 4 Anand Avati 2015-01-21 17:00:57 UTC
COMMIT: http://review.gluster.org/9399 committed in master by Vijay Bellur (vbellur) 
------
commit 217253b692610c4f6ce5d8426832c1d69844966c
Author: Krishnan Parthasarathi <kparthas>
Date:   Wed Jan 7 09:53:52 2015 +0530

    cli: volume-info <VOLNAME>, peer-status to report failure before exit
    
    BUG: 1181203
    Change-Id: I84260d66135b1980f93b258d5de58c9e6b7a8402
    Signed-off-by: Krishnan Parthasarathi <kparthas>
    Reviewed-on: http://review.gluster.org/9399
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Shyamsundar Ranganathan <srangana>
    Reviewed-by: Atin Mukherjee <amukherj>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 5 Anand Avati 2015-01-23 04:19:43 UTC
REVIEW: http://review.gluster.org/9479 (glusterd: Set connection status to true only after peer handshake is completed) posted (#1) for review on master by Atin Mukherjee (amukherj)

Comment 6 Anand Avati 2015-01-27 15:46:33 UTC
REVIEW: http://review.gluster.org/9498 (test: Check till probe timeout, for conf change sync across glusterd) posted (#1) for review on master by Shyamsundar Ranganathan (srangana)

Comment 7 Anand Avati 2015-01-28 06:13:10 UTC
COMMIT: http://review.gluster.org/9498 committed in master by Krishnan Parthasarathi (kparthas) 
------
commit 2f640b3f3ce21c6790c661f53ff39474192b5a21
Author: Shyam <srangana>
Date:   Tue Jan 27 10:42:54 2015 -0500

    test: Check till probe timeout, for conf change sync across glusterd
    
    Currently the test case changed here, checks for the peer count to
    be 1 until probe timeout and then checks for the changed
    configuration, if it has been synced.
    
    The peer count is not a guarantee that the configuration is also
    in sync, hence changing this test case to check for the conf
    update till probe timeout, by which time it should be in sync
    (or at least that is our tolerance), and the test case deemed
    as passing.
    
    Change-Id: I4b1560979cfde3bd3bd691852d7d3a63e253bcf2
    BUG: 1181203
    Signed-off-by: Shyam <srangana>
    Reviewed-on: http://review.gluster.org/9498
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Atin Mukherjee <amukherj>
    Reviewed-by: Krishnan Parthasarathi <kparthas>
    Tested-by: Krishnan Parthasarathi <kparthas>

Comment 8 Niels de Vos 2015-05-14 17:28:57 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
