Bug 890472 - glusterfs client crashed due to stack overflow
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterfs
Priority: medium
Severity: high
Assigned To: Raghavendra G
QA Contact: Sachidananda Urs
Keywords: Reopened
Reported: 2012-12-27 02:43 EST by Raghavendra Bhat
Modified: 2013-09-23 18:29 EDT (History)
Fixed In Version: glusterfs-3.4.0.4rhs-1
Doc Type: Bug Fix
Last Closed: 2013-09-23 18:29:52 EDT
Type: Bug
Description Raghavendra Bhat 2012-12-27 02:43:54 EST
Description of problem:

Setup: 2 machines in a storage pool, with a replicate volume that has one brick on each machine. After starting the volume, mounted it on a 3rd machine via cifs (i.e. a samba re-export of the fuse mount) on 2 mount points, one from each of the 2 machines in the storage pool.

On the cifs mounts, ran the ping_pong test (./ping_pong -rw <filename> 4). When the 2nd instance of ping_pong was started on the 2nd cifs mount, no data increment was shown, and the number of locks per second soon dropped to zero. Took a statedump and found a blocked inodelk on the file on which ping_pong was being run.

After some time, terminated ping_pong on both mounts and unmounted the cifs mount points. Then stopped the volume.

Found a core with the below backtrace.

Core was generated by `/usr/sbin/glusterfs --volfile-id=mirror --volfile-server=localhost.localdomain'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007faebbdcd6eb in client_submit_request ()
   from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/protocol/client.so
Missing separate debuginfos, use: debuginfo-install glusterfs-3.3.0rhsvirt1-8.el6rhs.x86_64
(gdb) #0  0x00007faebbdcd6eb in client_submit_request ()
   from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/protocol/client.so
#1  0x00007faebbdd152a in client3_1_finodelk () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/protocol/client.so
#2  0x00007faebbdc944e in client_finodelk () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/protocol/client.so
#3  0x00007faebbb9587d in afr_nonblocking_inodelk ()
   from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#4  0x00007faebbb79900 in afr_lock_rec () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#5  0x00007faebbb7c8b5 in afr_transaction () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#6  0x00007faebbb75333 in afr_do_writev () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#7  0x00007faebbb735a4 in afr_open_fd_fix () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#8  0x00007faebbb741b0 in afr_writev () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#9  0x00007faebb94ac81 in wb_fulfill_head ()
   from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/performance/write-behind.so
#10 0x00007faebb94c64d in wb_process_queue ()
   from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/performance/write-behind.so
#11 0x00007faebb94c6bf in wb_fulfill_cbk ()
   from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/performance/write-behind.so
#12 0x00007faebbb6f9fd in afr_writev_unwind () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#13 0x00007faebbb747bb in afr_writev_done () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#14 0x00007faebbb7b6a9 in afr_post_blocking_inodelk_cbk ()
   from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#15 0x00007faebbb93b57 in ?? () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#16 0x00007faebbb9413d in afr_unlock () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#17 0x00007faebbb962bd in afr_lock_blocking () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#18 0x00007faebbb96a8e in ?? () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#19 0x00007faebbb96c75 in ?? () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#20 0x00007faebbddc664 in client3_1_finodelk_cbk ()
   from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/protocol/client.so
#21 0x0000003f7740fdd6 in rpc_clnt_submit () from /usr/lib64/libgfrpc.so.0
#22 0x00007faebbdcd8d3 in client_submit_request ()
   from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/protocol/client.so
#23 0x00007faebbdd152a in client3_1_finodelk () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/protocol/client.so
#24 0x00007faebbdc944e in client_finodelk () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/protocol/client.so
#25 0x00007faebbb95fbb in afr_lock_blocking () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#26 0x00007faebbb96984 in afr_blocking_lock () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#27 0x00007faebbb7b5b5 in afr_post_nonblocking_inodelk_cbk ()
   from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#28 0x00007faebbb93b57 in ?? () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#29 0x00007faebbb9413d in afr_unlock () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#30 0x00007faebbb94775 in afr_nonblocking_inodelk_cbk ()
   from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#31 0x00007faebbddc664 in client3_1_finodelk_cbk ()
   from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/protocol/client.so
#32 0x0000003f7740fdd6 in rpc_clnt_submit () from /usr/lib64/libgfrpc.so.0
#33 0x00007faebbdcd8d3 in client_submit_request ()
   from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/protocol/client.so
#34 0x00007faebbdd152a in client3_1_finodelk () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/protocol/client.so
#35 0x00007faebbdc944e in client_finodelk () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/protocol/client.so
#36 0x00007faebbb9587d in afr_nonblocking_inodelk ()
   from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#37 0x00007faebbb79900 in afr_lock_rec () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#38 0x00007faebbb7c8b5 in afr_transaction () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#39 0x00007faebbb75333 in afr_do_writev () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#40 0x00007faebbb735a4 in afr_open_fd_fix () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#41 0x00007faebbb741b0 in afr_writev () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#42 0x00007faebb94ac81 in wb_fulfill_head ()
   from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/performance/write-behind.so
#43 0x00007faebb94c64d in wb_process_queue ()
   from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/performance/write-behind.so
#44 0x00007faebb94c6bf in wb_fulfill_cbk ()
   from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/performance/write-behind.so
#45 0x00007faebbb6f9fd in afr_writev_unwind () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#46 0x00007faebbb747bb in afr_writev_done () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#47 0x00007faebbb7b6a9 in afr_post_blocking_inodelk_cbk ()
   from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#48 0x00007faebbb93b57 in ?? () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#49 0x00007faebbb9413d in afr_unlock () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#50 0x00007faebbb962bd in afr_lock_blocking () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#51 0x00007faebbb96a8e in ?? () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#52 0x00007faebbb96c75 in ?? () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#53 0x00007faebbddc664 in client3_1_finodelk_cbk ()
   from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/protocol/client.so
#54 0x0000003f7740fdd6 in rpc_clnt_submit () from /usr/lib64/libgfrpc.so.0
#55 0x00007faebbdcd8d3 in client_submit_request ()
   from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/protocol/client.so
#56 0x00007faebbdd152a in client3_1_finodelk () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/protocol/client.so
#57 0x00007faebbdc944e in client_finodelk () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/protocol/client.so
#58 0x00007faebbb95fbb in afr_lock_blocking () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#59 0x00007faebbb96984 in afr_blocking_lock () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#60 0x00007faebbb7b5b5 in afr_post_nonblocking_inodelk_cbk ()
   from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#61 0x00007faebbb93b57 in ?? () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#62 0x00007faebbb9413d in afr_unlock () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#63 0x00007faebbb94775 in afr_nonblocking_inodelk_cbk ()
   from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#64 0x00007faebbddc664 in client3_1_finodelk_cbk ()
   from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/protocol/client.so
#65 0x0000003f7740fdd6 in rpc_clnt_submit () from /usr/lib64/libgfrpc.so.0
#66 0x00007faebbdcd8d3 in client_submit_request ()
   from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/protocol/client.so
#67 0x00007faebbdd152a in client3_1_finodelk () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/protocol/client.so
#68 0x00007faebbdc944e in client_finodelk () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/protocol/client.so
#69 0x00007faebbb9587d in afr_nonblocking_inodelk ()
   from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#70 0x00007faebbb79900 in afr_lock_rec () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#71 0x00007faebbb7c8b5 in afr_transaction () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#72 0x00007faebbb75333 in afr_do_writev () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#73 0x00007faebbb735a4 in afr_open_fd_fix () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
#74 0x00007faebbb741b0 in afr_writev () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
.
.
.

---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) f 23580
#23580 0x00007faebbb96a8e in ?? () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
(gdb) up
#23581 0x00007faebbb96c75 in ?? () from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/cluster/replicate.so
(gdb) 
#23582 0x00007faebbddc664 in client3_1_finodelk_cbk ()
   from /usr/lib64/glusterfs/3.3.0rhsvirt1/xlator/protocol/client.so
(gdb) 
#23583 0x0000003f7740eea4 in saved_frames_unwind () from /usr/lib64/libgfrpc.so.0
(gdb) 
#23584 0x0000003f7740ef3e in saved_frames_destroy () from /usr/lib64/libgfrpc.so.0
(gdb) 
#23585 0x0000003f7740f4d0 in rpc_clnt_connection_cleanup () from /usr/lib64/libgfrpc.so.0
(gdb) 
#23586 0x0000003f7740f818 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
(gdb) 
#23587 0x0000003f7740b018 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
(gdb) 
#23588 0x00007faec0cb09e4 in socket_event_poll_err ()
   from /usr/lib64/glusterfs/3.3.0rhsvirt1/rpc-transport/socket.so
(gdb) 
#23589 0x00007faec0cb7a9e in socket_event_handler ()
   from /usr/lib64/glusterfs/3.3.0rhsvirt1/rpc-transport/socket.so
(gdb) 
#23590 0x0000003f7683ee34 in ?? () from /usr/lib64/libglusterfs.so.0
(gdb) 
#23591 0x0000000000407600 in main ()
(gdb) 
Initial frame selected; you cannot go up.
(gdb) 


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create and start a replicate volume, and re-export it via samba.
2. On another machine, mount the samba export on 2 mount points, each using a different machine of the pool as the server.
3. Start running ping_pong on both cifs mount points.
4. When ping_pong blocks (0 locks/sec and data increment = 1), kill ping_pong, unmount the cifs mount points, and stop the volume.
  
Actual results:
glusterfs client crashed.

Expected results:
glusterfs client should not crash

Additional info:
gluster v i
 
Volume Name: mirror
Type: Replicate
Volume ID: 1b3cbb41-df00-41f0-b869-00eb03cc330c
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.70.37.136:/export/mirror
Brick2: 10.70.37.147:/export/mirror
Comment 2 Amar Tumballi 2012-12-27 03:46:37 EST
Considering a lot of things are AFR related, assigning to Pranith.

Pranith, feel free to re-assign if you suspect 'wb_process_queue()' is the culprit. In the meantime, I have asked RaBhat to check whether this issue still happens after the write-behind cleanup (present in the latest 2.0.z branch and upstream). Once he confirms, let's get this bug resolved.
Comment 3 Pranith Kumar K 2013-01-07 06:58:30 EST
You are correct, Amar. wb_process_queue calls wb_fulfill, and wb_fulfill_cbk calls wb_process_queue again. Because of this, the C stack never gets a chance to shrink if the fop does not go on the wire: each write unwinds synchronously and the next one is issued from inside its callback. According to the afr frames in the bt, the finodelks are failing and the fop is unwinding. The client xlator frames in the bt show that the connections to the bricks were lost.

I saw that the same wb_process_queue call is present in master's fulfill_cbk, so the bug is most likely present on master as well.

Assigning it to Raghavendra G.
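
The recursion described in this comment can be illustrated with a minimal sketch. Everything below is hypothetical: struct wb_queue, recursive_process, guarded_process and the other names are stand-ins invented for illustration, not the GlusterFS structures or functions, and the guarded variant is one generic way to break such a cycle, not the patch from bug 765473. It assumes a single thread and a callback that always fires synchronously, which models the case where the fop never goes on the wire.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a write-behind queue (not GlusterFS code). */
struct wb_queue {
    size_t pending;    /* requests still waiting to be dispatched */
    size_t depth;      /* dispatches currently on the C stack */
    size_t max_depth;  /* deepest nesting observed */
    bool   draining;   /* a drain loop is already running */
    bool   rerun;      /* a callback asked for another pass */
};

static void note_depth(struct wb_queue *q)
{
    if (q->depth > q->max_depth)
        q->max_depth = q->depth;
}

static void recursive_process(struct wb_queue *q);

/* Callback that fires synchronously and immediately processes the queue. */
static void recursive_cbk(struct wb_queue *q)
{
    recursive_process(q);
}

/* Problem pattern: every request adds frames that are never popped
 * until the whole queue is empty. */
static void recursive_process(struct wb_queue *q)
{
    assert(q != NULL);
    if (q->pending == 0)
        return;
    q->depth++;
    note_depth(q);
    q->pending--;
    recursive_cbk(q);
    q->depth--;
}

static void guarded_process(struct wb_queue *q);

static void guarded_cbk(struct wb_queue *q)
{
    guarded_process(q);
}

/* Guarded variant: a re-entrant call only sets a flag and returns, so the
 * outermost call handles the queue in a loop at constant stack depth. */
static void guarded_process(struct wb_queue *q)
{
    assert(q != NULL);
    if (q->draining) {
        q->rerun = true;
        return;
    }
    q->draining = true;
    do {
        q->rerun = false;
        if (q->pending > 0) {
            q->depth++;
            note_depth(q);
            q->pending--;
            guarded_cbk(q);  /* returns at once after setting rerun */
            q->depth--;
        }
    } while (q->rerun);
    q->draining = false;
}

/* Helpers that report the deepest nesting reached for n queued requests. */
size_t max_depth_recursive(size_t n)
{
    struct wb_queue q = { .pending = n };
    recursive_process(&q);
    return q.max_depth;
}

size_t max_depth_guarded(size_t n)
{
    struct wb_queue q = { .pending = n };
    guarded_process(&q);
    return q.max_depth;
}
```

In this sketch the nesting of the recursive variant grows with the number of queued requests, while the guarded variant stays at one dispatch on the stack regardless of queue length. A real fix would also have to account for locking and for callbacks that arrive later from another thread, which this sketch ignores.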
Comment 4 Raghavendra G 2013-02-22 02:28:36 EST

*** This bug has been marked as a duplicate of bug 765473 ***
Comment 5 Amar Tumballi 2013-02-22 02:46:17 EST
Reopening, as it was marked as a duplicate of an upstream bug.
Comment 6 Amar Tumballi 2013-02-22 02:46:44 EST
On master, it's fixed with bug 765473's patch.
Comment 7 Scott Haines 2013-03-08 15:56:27 EST
Per 03/05 email exchange w/ PM, targeting for Arches.
Comment 8 Scott Haines 2013-04-11 12:36:53 EDT
Per 04-10-2013 Storage bug triage meeting, targeting for Big Bend.
Comment 9 Sachidananda Urs 2013-08-01 06:35:28 EDT
Verified on: glusterfs 3.4.0.14rhs built on Jul 30 2013 09:09:36

Unable to reproduce the bug. Marking verified.
Comment 10 Scott Haines 2013-09-23 18:29:52 EDT
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html
