Bug 1003665

Summary: smbd crashing when doing brick and volume operations on GlusterFS
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Lalatendu Mohanty <lmohanty>
Component: samba Assignee: Raghavendra Talur <rtalur>
Status: CLOSED ERRATA QA Contact: Lalatendu Mohanty <lmohanty>
Severity: unspecified Docs Contact:
Priority: urgent    
Version: 2.1 CC: amarts, sdharane, vagarwal, vkoppad
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.4.0.31rhs-1 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-09-23 22:32:17 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Lalatendu Mohanty 2013-09-02 15:33:40 UTC
Description of problem:

smbd crashes while brick and volume operations are being performed.

From /var/log/messages:

Sep  2 10:33:02 dhcp159-136 smbd[348]: [2013/09/02 10:33:02.696476,  0] lib/util.c:1117(smb_panic)
Sep  2 10:33:02 dhcp159-136 smbd[348]:   PANIC (pid 348): internal error
Sep  2 10:33:02 dhcp159-136 smbd[348]: [2013/09/02 10:33:02.739896,  0] lib/util.c:1221(log_stack_trace)
Sep  2 10:33:02 dhcp159-136 smbd[348]:   BACKTRACE: 18 stack frames:
Sep  2 10:33:02 dhcp159-136 smbd[348]:    #0 smbd(log_stack_trace+0x1a) [0x7fd4493d64fa]
Sep  2 10:33:02 dhcp159-136 smbd[348]:    #1 smbd(smb_panic+0x2b) [0x7fd4493d65cb]
Sep  2 10:33:02 dhcp159-136 smbd[348]:    #2 smbd(+0x41a054) [0x7fd4493c7054]
Sep  2 10:33:02 dhcp159-136 smbd[348]:    #3 /lib64/libc.so.6(+0x3ff1832960) [0x7fd44527f960]
Sep  2 10:33:02 dhcp159-136 smbd[348]:    #4 /lib64/libpthread.so.0(pthread_mutex_lock+0) [0x7fd4439ff220]
Sep  2 10:33:02 dhcp159-136 smbd[348]:    #5 /usr/lib64/libglusterfs.so.0(iobuf_get2+0x42) [0x7fd4468a5b32]
Sep  2 10:33:02 dhcp159-136 smbd[348]:    #6 /usr/lib64/libgfapi.so.0(mgmt_submit_request+0x14f) [0x7fd446aeda8f]
Sep  2 10:33:02 dhcp159-136 smbd[348]:    #7 /usr/lib64/libgfapi.so.0(glfs_volfile_fetch+0x113) [0x7fd446aedc43]
Sep  2 10:33:02 dhcp159-136 smbd[348]:    #8 /usr/lib64/libgfapi.so.0(mgmt_cbk_spec+0x10) [0x7fd446aede50]
Sep  2 10:33:02 dhcp159-136 smbd[348]:    #9 /usr/lib64/libgfrpc.so.0(rpc_clnt_handle_cbk+0x132) [0x7fd44665fc12]
Sep  2 10:33:02 dhcp159-136 smbd[348]:    #10 /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1b8) [0x7fd446660fa8]
Sep  2 10:33:02 dhcp159-136 smbd[348]:    #11 /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28) [0x7fd44665c838]
Sep  2 10:33:02 dhcp159-136 smbd[348]:    #12 /usr/lib64/glusterfs/3.4.0.30rhs/rpc-transport/socket.so(+0x8be6) [0x7fd43801ebe6]
Sep  2 10:33:02 dhcp159-136 smbd[348]:    #13 /usr/lib64/glusterfs/3.4.0.30rhs/rpc-transport/socket.so(+0xa4fd) [0x7fd4380204fd]
Sep  2 10:33:02 dhcp159-136 smbd[348]:    #14 /usr/lib64/libglusterfs.so.0(+0x3ff245e8c7) [0x7fd4468c58c7]
Sep  2 10:33:02 dhcp159-136 smbd[348]:    #15 /usr/lib64/libgfapi.so.0(+0x5834) [0x7fd446aec834]
Sep  2 10:33:02 dhcp159-136 smbd[348]:    #16 /lib64/libpthread.so.0(+0x3ff2007851) [0x7fd4439fd851]
Sep  2 10:33:02 dhcp159-136 smbd[348]:    #17 /lib64/libc.so.6(clone+0x6d) [0x7fd44533594d]
Sep  2 10:33:02 dhcp159-136 smbd[348]: [2013/09/02 10:33:02.740650,  0] lib/fault.c:372(dump_core)
Sep  2 10:33:02 dhcp159-136 smbd[348]:   dumping core in /var/log/core
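
To compare this trace with traces from other reports, the symbolized frames can be pulled out of the syslog lines. The following is a minimal sketch that assumes the line layout shown above. FRAME_RE and parse_frames are hypothetical names used for illustration and are not part of Samba or GlusterFS.

```python
import re

# Matches a frame such as:
#   "#5 /usr/lib64/libglusterfs.so.0(iobuf_get2+0x42) [0x7fd4468a5b32]"
# The symbol may be empty ("smbd(+0x41a054)") and the offset may lack
# the 0x prefix ("pthread_mutex_lock+0").
FRAME_RE = re.compile(
    r"#(\d+)\s+(\S+?)\((\w*)\+(?:0x)?[0-9a-f]+\)\s+\[0x[0-9a-f]+\]"
)


def parse_frames(lines):
    """Return (frame_number, module_basename, symbol_or_None) tuples."""
    frames = []
    for line in lines:
        match = FRAME_RE.search(line)
        if match is None:
            continue  # not a backtrace frame line
        index, module, symbol = match.groups()
        frames.append((int(index), module.rsplit("/", 1)[-1], symbol or None))
    return frames


if __name__ == "__main__":
    # Two frames copied from the trace above.
    sample = [
        "Sep  2 10:33:02 dhcp159-136 smbd[348]:    #4 /lib64/libpthread.so.0(pthread_mutex_lock+0) [0x7fd4439ff220]",
        "Sep  2 10:33:02 dhcp159-136 smbd[348]:    #5 /usr/lib64/libglusterfs.so.0(iobuf_get2+0x42) [0x7fd4468a5b32]",
    ]
    for frame in parse_frames(sample):
        print(frame)
```

Comparing only the (module, symbol) pairs ignores load addresses, which differ between hosts.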

Version-Release number of selected component (if applicable):

From /var/log/glusterfs/.cmd_log_history

[2013-09-02 14:28:42.276446]  : v remove-brick testvol1 10.16.159.136:/rhs/brick3/testvol1-b2 10.16.159.16:/rhs/brick3/testvol1-b2 status : SUCCESS
[2013-09-02 14:28:52.784146]  : v remove-brick testvol1 10.16.159.136:/rhs/brick3/testvol1-b2 10.16.159.16:/rhs/brick3/testvol1-b2 commit : SUCCESS
[2013-09-02 14:29:15.537714]  : v remove-brick testvol1 10.16.159.136:/rhs/brick3/testvol1-b1 10.16.159.16:/rhs/brick3/testvol1-b1 start : SUCCESS
[2013-09-02 14:29:32.078008]  : v remove-brick testvol1 10.16.159.136:/rhs/brick3/testvol1-b1 10.16.159.16:/rhs/brick3/testvol1-b1 status : SUCCESS
[2013-09-02 14:29:59.595928]  : v remove-brick testvol1 10.16.159.136:/rhs/brick3/testvol1-b1 10.16.159.16:/rhs/brick3/testvol1-b1 status : SUCCESS
[2013-09-02 14:30:02.720132]  : v remove-brick testvol1 10.16.159.136:/rhs/brick3/testvol1-b1 10.16.159.16:/rhs/brick3/testvol1-b1 status : SUCCESS
[2013-09-02 14:30:24.049064]  : v remove-brick testvol1 10.16.159.136:/rhs/brick3/testvol1-b1 10.16.159.16:/rhs/brick3/testvol1-b1 commit : SUCCESS
[2013-09-02 14:31:36.814017]  : v stop testvol1 : SUCCESS
[2013-09-02 14:31:46.613379]  : v delete testvol1 : SUCCESS
[2013-09-02 14:33:02.697326]  : v create testvol2 10.16.159.136:/rhs/brick1/testvol2-b1 10.16.159.16:/rhs/brick1/testvol2-b1 : SUCCESS
[2013-09-02 14:33:11.381025]  : v start testvol2 : SUCCESS
[2013-09-02 14:33:39.581957]  : v stop testvol2 : SUCCESS
[2013-09-02 14:33:49.287161]  : v delete testvol2 : SUCCESS
[2013-09-02 14:34:23.763115]  : v create testvol3 replica 2 10.16.159.136:/rhs/brick1/testvol3-b1 10.16.159.16:/rhs/brick1/testvol3-b1 : SUCCESS
[2013-09-02 14:34:41.278591]  : v start testvol3 : SUCCESS
[2013-09-02 14:42:13.905074]  : v add-brick testvol3 10.16.159.136:/rhs/brick3/testvol3-b2 10.16.159.16:/rhs/brick3/testvol3-b2 : SUCCESS
[2013-09-02 14:43:04.893445]  : v rebalance testvol3 start : SUCCESS

How reproducible:

Intermittent 

Steps to Reproduce:

I am not sure exactly which command caused the issue, but these are the operations I was performing on the volume:

1. Create a replica 2 volume, start it, and run I/O from a Windows client.
2. Do a couple of add-brick and rebalance operations while I/O is running (each add-brick should run only after the rebalance for the previous add-brick has finished).
3. Do a couple of remove-brick operations (start, wait until status shows finished, then commit) while I/O is running.
4. Stop the volume, then delete it.
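
The steps above can be sketched as a command sequence. This is a dry-run illustration only and prints the commands without executing them. The volume name, hosts, and brick paths are taken from the .cmd_log_history excerpt above. The full "gluster volume ..." spelling is assumed in place of the abbreviated "v" form in the log. The rebalance status call and the remove-brick calls on testvol3 are additions that do not appear in the log for that volume. The exact ordering that triggers the crash is not known.

```python
# Values below come from the .cmd_log_history excerpt in this report.
VOLUME = "testvol3"
HOSTS = ["10.16.159.136", "10.16.159.16"]


def brick_pair(brick_dir, suffix):
    """Build one replica pair of bricks, one brick per host."""
    return [f"{host}:/rhs/{brick_dir}/{VOLUME}-{suffix}" for host in HOSTS]


def reproduction_commands():
    """Return the gluster CLI calls for steps 1-4 as argument lists."""
    first = brick_pair("brick1", "b1")
    added = brick_pair("brick3", "b2")
    base = ["gluster", "volume"]
    return [
        # Step 1: create and start; I/O from the Windows client starts after this.
        base + ["create", VOLUME, "replica", "2"] + first,
        base + ["start", VOLUME],
        # Step 2: add-brick, then rebalance and wait for it to finish.
        base + ["add-brick", VOLUME] + added,
        base + ["rebalance", VOLUME, "start"],
        base + ["rebalance", VOLUME, "status"],
        # Step 3: remove-brick start, poll status until finished, commit.
        base + ["remove-brick", VOLUME] + added + ["start"],
        base + ["remove-brick", VOLUME] + added + ["status"],
        base + ["remove-brick", VOLUME] + added + ["commit"],
        # Step 4: stop and delete the volume.
        base + ["stop", VOLUME],
        base + ["delete", VOLUME],
    ]


if __name__ == "__main__":
    for command in reproduction_commands():
        print(" ".join(command))
```

Steps 2 and 3 are meant to be repeated a couple of times with new brick pairs while client I/O continues.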


Actual results:

smbd crashes and dumps core.

Expected results:

smbd should not crash.


Additional info:

[root@dhcp159-136 core]# rpm -qa | grep samba
samba-doc-3.6.9-160.3.el6rhs.x86_64
samba-debuginfo-3.6.9-160.3.el6rhs.x86_64
samba-winbind-3.6.9-160.3.el6rhs.x86_64
samba-glusterfs-3.6.9-160.3.el6rhs.x86_64
samba-swat-3.6.9-160.3.el6rhs.x86_64
samba-winbind-krb5-locator-3.6.9-160.3.el6rhs.x86_64
samba-domainjoin-gui-3.6.9-160.3.el6rhs.x86_64
samba-common-3.6.9-160.3.el6rhs.x86_64
samba-3.6.9-160.3.el6rhs.x86_64
samba-client-3.6.9-160.3.el6rhs.x86_64
samba-winbind-devel-3.6.9-160.3.el6rhs.x86_64
samba4-libs-4.0.0-55.el6.rc4.x86_64
samba-winbind-clients-3.6.9-160.3.el6rhs.x86_64


[root@dhcp159-136 core]# rpm -qa | grep glusterfs
glusterfs-geo-replication-3.4.0.30rhs-2.el6rhs.x86_64
samba-glusterfs-3.6.9-160.3.el6rhs.x86_64
glusterfs-libs-3.4.0.30rhs-2.el6rhs.x86_64
glusterfs-debuginfo-3.4.0.30rhs-2.el6rhs.x86_64
glusterfs-3.4.0.30rhs-2.el6rhs.x86_64
glusterfs-server-3.4.0.30rhs-2.el6rhs.x86_64
glusterfs-rdma-3.4.0.30rhs-2.el6rhs.x86_64
glusterfs-api-3.4.0.30rhs-2.el6rhs.x86_64
glusterfs-fuse-3.4.0.30rhs-2.el6rhs.x86_64

Comment 1 Lalatendu Mohanty 2013-09-02 15:44:30 UTC
The following error messages appeared in /var/log/glusterfs/etc-glusterfs-glusterd.vol.log:

[2013-09-02 14:33:37.486818] E [glusterd-utils.c:1337:glusterd_brick_unlink_socket_file] 0-management: Failed to remove /var/run/f0f5ead6df49d75409697344fc14d75b.socket error: No such file or directory
[2013-09-02 14:33:38.518344] E [glusterd-utils.c:3797:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/7d84c9af07428fda82993a87b9baed72.socket error: Permission denied

Comment 3 Vijaykumar Koppad 2013-09-03 14:22:44 UTC
I was also able to hit this issue.

From /var/log/messages:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Sep  3 18:27:27 redlemon smbd[12960]:   From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf
Sep  3 18:27:27 redlemon smbd[12960]: [2013/09/03 18:27:27.601037,  0] lib/fault.c:51(fault_report)
Sep  3 18:27:27 redlemon smbd[12960]:   ===============================================================
Sep  3 18:27:27 redlemon smbd[12960]: [2013/09/03 18:27:27.601146,  0] lib/util.c:1117(smb_panic)
Sep  3 18:27:27 redlemon smbd[12960]:   PANIC (pid 12960): internal error
Sep  3 18:27:27 redlemon smbd[12960]: [2013/09/03 18:27:27.603693,  0] lib/util.c:1221(log_stack_trace)
Sep  3 18:27:27 redlemon smbd[12960]:   BACKTRACE: 18 stack frames:
Sep  3 18:27:27 redlemon smbd[12960]:    #0 smbd(log_stack_trace+0x1a) [0x7f60edb3f4fa]
Sep  3 18:27:27 redlemon smbd[12960]:    #1 smbd(smb_panic+0x2b) [0x7f60edb3f5cb]
Sep  3 18:27:27 redlemon smbd[12960]:    #2 smbd(+0x41a054) [0x7f60edb30054]
Sep  3 18:27:27 redlemon smbd[12960]:    #3 /lib64/libc.so.6(+0x31cee32920) [0x7f60e99e8920]
Sep  3 18:27:27 redlemon smbd[12960]:    #4 /lib64/libpthread.so.0(pthread_mutex_lock+0) [0x7f60e8168220]
Sep  3 18:27:27 redlemon smbd[12960]:    #5 /usr/lib64/libglusterfs.so.0(iobuf_get2+0x42) [0x7f60eb00eb32]
Sep  3 18:27:27 redlemon smbd[12960]:    #6 /usr/lib64/libgfapi.so.0(mgmt_submit_request+0x14f) [0x7f60eb256a8f]
Sep  3 18:27:27 redlemon smbd[12960]:    #7 /usr/lib64/libgfapi.so.0(glfs_volfile_fetch+0x113) [0x7f60eb256c43]
Sep  3 18:27:27 redlemon smbd[12960]:    #8 /usr/lib64/libgfapi.so.0(mgmt_cbk_spec+0x10) [0x7f60eb256e50]
Sep  3 18:27:27 redlemon smbd[12960]:    #9 /usr/lib64/libgfrpc.so.0(rpc_clnt_handle_cbk+0x132) [0x7f60eadc8c12]
Sep  3 18:27:27 redlemon smbd[12960]:    #10 /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1b8) [0x7f60eadc9fa8]
Sep  3 18:27:27 redlemon smbd[12960]:    #11 /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28) [0x7f60eadc5838]
Sep  3 18:27:27 redlemon smbd[12960]:    #12 /usr/lib64/glusterfs/3.4.0.30rhs/rpc-transport/socket.so(+0x8be6) [0x7f60dc580be6]
Sep  3 18:27:27 redlemon smbd[12960]:    #13 /usr/lib64/glusterfs/3.4.0.30rhs/rpc-transport/socket.so(+0xa4fd) [0x7f60dc5824fd]
Sep  3 18:27:27 redlemon smbd[12960]:    #14 /usr/lib64/libglusterfs.so.0(+0x3cee45e8c7) [0x7f60eb02e8c7]
Sep  3 18:27:27 redlemon smbd[12960]:    #15 /usr/lib64/libgfapi.so.0(+0x5834) [0x7f60eb255834]
Sep  3 18:27:27 redlemon smbd[12960]:    #16 /lib64/libpthread.so.0(+0x31cf607851) [0x7f60e8166851]
Sep  3 18:27:27 redlemon smbd[12960]:    #17 /lib64/libc.so.6(clone+0x6d) [0x7f60e9a9e90d]
Sep  3 18:27:27 redlemon smbd[12960]: [2013/09/03 18:27:27.605142,  0] lib/fault.c:372(dump_core)
Sep  3 18:27:27 redlemon smbd[12960]:   dumping core in /var/log/core
Sep  3 18:27:27 redlemon smbd[12960]: 

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

From .cmd_log_history:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2013-09-03 12:34:04.839103]  : v geo master ssh://10.70.43.25::slave status detail : SUCCESS
[2013-09-03 12:56:15.250958]  : v geo master ssh://10.70.43.25::slave stop : SUCCESS
[2013-09-03 12:56:25.365787]  : v geo master ssh://10.70.43.25::slave delete : SUCCESS
[2013-09-03 12:56:35.878855]  : v stop master : SUCCESS
[2013-09-03 12:56:40.632577]  : v delet master : SUCCESS
[2013-09-03 12:57:27.723205]  : volume create master replica 2 10.70.43.13:/bricks/brick1 10.70.43.18:/bricks/brick2 10.70.43.22:/bricks/brick3 10.70.43.24:/bricks/brick4 : SUCCESS
[2013-09-03 12:57:29.637723]  : volume start master : SUCCESS
[2013-09-03 12:57:38.409823]  : volume set master rollover-time 20 : SUCCESS
[2013-09-03 12:57:40.433215]  : volume set master encoding ascii : SUCCESS
[2013-09-03 12:57:44.698822]  : volume set master fsync-interval 3 : SUCCESS
[2013-09-03 12:58:22.360688]  : v geo master ssh://10.70.43.25::slave create force : SUCCESS
[2013-09-03 13:16:55.020394]  : v geo stat : SUCCESS
[2013-09-03 13:17:25.361524]  : v geo master ssh://10.70.43.25::slave start : SUCCESS
[2013-09-03 13:48:40.418363]  : v geo master ssh://10.70.43.25::slave stop : SUCCESS
[2013-09-03 13:50:08.599613]  : v geo master ssh://10.70.43.25::slave start : SUCCESS
[2013-09-03 13:52:09.764168]  : v geo master ssh://10.70.43.25::slave stop : SUCCESS
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Comment 4 Raghavendra Talur 2013-09-05 11:19:29 UTC
Posted patch for review at
https://code.engineering.redhat.com/gerrit/#/c/12523/

Comment 5 Vivek Agarwal 2013-09-05 16:58:06 UTC
*** Bug 1004417 has been marked as a duplicate of this bug. ***

Comment 6 Lalatendu Mohanty 2013-09-06 09:32:28 UTC
I am no longer getting a core file, hence marking this as verified.

glusterfs-server-3.4.0.31rhs-1
samba-common-3.6.9-160.3

Comment 7 Scott Haines 2013-09-23 22:32:17 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html