Description of problem:
smbd crashes while doing brick and volume operations.

From /var/log/messages:
Sep 2 10:33:02 dhcp159-136 smbd[348]: [2013/09/02 10:33:02.696476, 0] lib/util.c:1117(smb_panic)
Sep 2 10:33:02 dhcp159-136 smbd[348]: PANIC (pid 348): internal error
Sep 2 10:33:02 dhcp159-136 smbd[348]: [2013/09/02 10:33:02.739896, 0] lib/util.c:1221(log_stack_trace)
Sep 2 10:33:02 dhcp159-136 smbd[348]: BACKTRACE: 18 stack frames:
Sep 2 10:33:02 dhcp159-136 smbd[348]: #0 smbd(log_stack_trace+0x1a) [0x7fd4493d64fa]
Sep 2 10:33:02 dhcp159-136 smbd[348]: #1 smbd(smb_panic+0x2b) [0x7fd4493d65cb]
Sep 2 10:33:02 dhcp159-136 smbd[348]: #2 smbd(+0x41a054) [0x7fd4493c7054]
Sep 2 10:33:02 dhcp159-136 smbd[348]: #3 /lib64/libc.so.6(+0x3ff1832960) [0x7fd44527f960]
Sep 2 10:33:02 dhcp159-136 smbd[348]: #4 /lib64/libpthread.so.0(pthread_mutex_lock+0) [0x7fd4439ff220]
Sep 2 10:33:02 dhcp159-136 smbd[348]: #5 /usr/lib64/libglusterfs.so.0(iobuf_get2+0x42) [0x7fd4468a5b32]
Sep 2 10:33:02 dhcp159-136 smbd[348]: #6 /usr/lib64/libgfapi.so.0(mgmt_submit_request+0x14f) [0x7fd446aeda8f]
Sep 2 10:33:02 dhcp159-136 smbd[348]: #7 /usr/lib64/libgfapi.so.0(glfs_volfile_fetch+0x113) [0x7fd446aedc43]
Sep 2 10:33:02 dhcp159-136 smbd[348]: #8 /usr/lib64/libgfapi.so.0(mgmt_cbk_spec+0x10) [0x7fd446aede50]
Sep 2 10:33:02 dhcp159-136 smbd[348]: #9 /usr/lib64/libgfrpc.so.0(rpc_clnt_handle_cbk+0x132) [0x7fd44665fc12]
Sep 2 10:33:02 dhcp159-136 smbd[348]: #10 /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1b8) [0x7fd446660fa8]
Sep 2 10:33:02 dhcp159-136 smbd[348]: #11 /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28) [0x7fd44665c838]
Sep 2 10:33:02 dhcp159-136 smbd[348]: #12 /usr/lib64/glusterfs/3.4.0.30rhs/rpc-transport/socket.so(+0x8be6) [0x7fd43801ebe6]
Sep 2 10:33:02 dhcp159-136 smbd[348]: #13 /usr/lib64/glusterfs/3.4.0.30rhs/rpc-transport/socket.so(+0xa4fd) [0x7fd4380204fd]
Sep 2 10:33:02 dhcp159-136 smbd[348]: #14 /usr/lib64/libglusterfs.so.0(+0x3ff245e8c7) [0x7fd4468c58c7]
Sep 2 10:33:02 dhcp159-136 smbd[348]: #15 /usr/lib64/libgfapi.so.0(+0x5834) [0x7fd446aec834]
Sep 2 10:33:02 dhcp159-136 smbd[348]: #16 /lib64/libpthread.so.0(+0x3ff2007851) [0x7fd4439fd851]
Sep 2 10:33:02 dhcp159-136 smbd[348]: #17 /lib64/libc.so.6(clone+0x6d) [0x7fd44533594d]
Sep 2 10:33:02 dhcp159-136 smbd[348]: [2013/09/02 10:33:02.740650, 0] lib/fault.c:372(dump_core)
Sep 2 10:33:02 dhcp159-136 smbd[348]: dumping core in /var/log/core

Version-Release number of selected component (if applicable):

From /var/log/glusterfs/.cmd_log_history:
[2013-09-02 14:28:42.276446] : v remove-brick testvol1 10.16.159.136:/rhs/brick3/testvol1-b2 10.16.159.16:/rhs/brick3/testvol1-b2 status : SUCCESS
[2013-09-02 14:28:52.784146] : v remove-brick testvol1 10.16.159.136:/rhs/brick3/testvol1-b2 10.16.159.16:/rhs/brick3/testvol1-b2 commit : SUCCESS
[2013-09-02 14:29:15.537714] : v remove-brick testvol1 10.16.159.136:/rhs/brick3/testvol1-b1 10.16.159.16:/rhs/brick3/testvol1-b1 start : SUCCESS
[2013-09-02 14:29:32.078008] : v remove-brick testvol1 10.16.159.136:/rhs/brick3/testvol1-b1 10.16.159.16:/rhs/brick3/testvol1-b1 status : SUCCESS
[2013-09-02 14:29:59.595928] : v remove-brick testvol1 10.16.159.136:/rhs/brick3/testvol1-b1 10.16.159.16:/rhs/brick3/testvol1-b1 status : SUCCESS
[2013-09-02 14:30:02.720132] : v remove-brick testvol1 10.16.159.136:/rhs/brick3/testvol1-b1 10.16.159.16:/rhs/brick3/testvol1-b1 status : SUCCESS
[2013-09-02 14:30:24.049064] : v remove-brick testvol1 10.16.159.136:/rhs/brick3/testvol1-b1 10.16.159.16:/rhs/brick3/testvol1-b1 commit : SUCCESS
[2013-09-02 14:31:36.814017] : v stop testvol1 : SUCCESS
[2013-09-02 14:31:46.613379] : v delete testvol1 : SUCCESS
[2013-09-02 14:33:02.697326] : v create testvol2 10.16.159.136:/rhs/brick1/testvol2-b1 10.16.159.16:/rhs/brick1/testvol2-b1 : SUCCESS
[2013-09-02 14:33:11.381025] : v start testvol2 : SUCCESS
[2013-09-02 14:33:39.581957] : v stop testvol2 : SUCCESS
[2013-09-02 14:33:49.287161] : v delete testvol2 : SUCCESS
[2013-09-02 14:34:23.763115] : v create testvol3 replica 2 10.16.159.136:/rhs/brick1/testvol3-b1 10.16.159.16:/rhs/brick1/testvol3-b1 : SUCCESS
[2013-09-02 14:34:41.278591] : v start testvol3 : SUCCESS
[2013-09-02 14:42:13.905074] : v add-brick testvol3 10.16.159.136:/rhs/brick3/testvol3-b2 10.16.159.16:/rhs/brick3/testvol3-b2 : SUCCESS
[2013-09-02 14:43:04.893445] : v rebalance testvol3 start : SUCCESS

How reproducible:
Intermittent

Steps to Reproduce:
I am not sure exactly which command caused the issue, but these are the operations I was performing on the volume:
1. Create a replica 2 volume, start it, and run IO from a Windows client.
2. Do a couple of add-brick and rebalance operations while IO is running (each add-brick should run only after the rebalance for the previous add-brick has finished).
3. Do a couple of remove-brick operations (start, status -> finished, commit) while IO is running.
4. Stop the volume, then delete it.

Actual results:
smbd crashes and dumps core.

Expected results:
smbd should not crash.

Additional info:
[root@dhcp159-136 core]# rpm -qa | grep samba
samba-doc-3.6.9-160.3.el6rhs.x86_64
samba-debuginfo-3.6.9-160.3.el6rhs.x86_64
samba-winbind-3.6.9-160.3.el6rhs.x86_64
samba-glusterfs-3.6.9-160.3.el6rhs.x86_64
samba-swat-3.6.9-160.3.el6rhs.x86_64
samba-winbind-krb5-locator-3.6.9-160.3.el6rhs.x86_64
samba-domainjoin-gui-3.6.9-160.3.el6rhs.x86_64
samba-common-3.6.9-160.3.el6rhs.x86_64
samba-3.6.9-160.3.el6rhs.x86_64
samba-client-3.6.9-160.3.el6rhs.x86_64
samba-winbind-devel-3.6.9-160.3.el6rhs.x86_64
samba4-libs-4.0.0-55.el6.rc4.x86_64
samba-winbind-clients-3.6.9-160.3.el6rhs.x86_64

[root@dhcp159-136 core]# rpm -qa | grep glusterfs
glusterfs-geo-replication-3.4.0.30rhs-2.el6rhs.x86_64
samba-glusterfs-3.6.9-160.3.el6rhs.x86_64
glusterfs-libs-3.4.0.30rhs-2.el6rhs.x86_64
glusterfs-debuginfo-3.4.0.30rhs-2.el6rhs.x86_64
glusterfs-3.4.0.30rhs-2.el6rhs.x86_64
glusterfs-server-3.4.0.30rhs-2.el6rhs.x86_64
glusterfs-rdma-3.4.0.30rhs-2.el6rhs.x86_64
glusterfs-api-3.4.0.30rhs-2.el6rhs.x86_64
glusterfs-fuse-3.4.0.30rhs-2.el6rhs.x86_64
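
Since the report is unsure which command triggered the crash, one way to narrow it down is to line up the smb_panic timestamp with the entries in .cmd_log_history. The sketch below does that for a few lines copied from this report. It rests on an assumption the report does not state: that /var/log/messages is in local time four hours behind the .cmd_log_history timestamps (taken here to be UTC). The names commands_near, utc_offset_hours and window_seconds are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as: [2013-09-02 14:33:11.381025] : v start testvol2 : SUCCESS
HISTORY_RE = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] : (.*?) : (\S+)"
)


def commands_near(crash_local, utc_offset_hours, history_lines, window_seconds=5.0):
    """Return (timestamp, command, result) for history entries near the crash.

    crash_local      -- crash time as read from syslog (naive datetime)
    utc_offset_hours -- ASSUMED offset of syslog time from the history log
    """
    crash_utc = crash_local - timedelta(hours=utc_offset_hours)
    hits = []
    for line in history_lines:
        m = HISTORY_RE.match(line.strip())
        if not m:
            continue  # not a command-history entry
        stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if abs((stamp - crash_utc).total_seconds()) <= window_seconds:
            hits.append((stamp, m.group(2), m.group(3)))
    return hits


# Entries copied from the .cmd_log_history excerpt in this report.
HISTORY = [
    "[2013-09-02 14:31:46.613379] : v delete testvol1 : SUCCESS",
    "[2013-09-02 14:33:02.697326] : v create testvol2 "
    "10.16.159.136:/rhs/brick1/testvol2-b1 10.16.159.16:/rhs/brick1/testvol2-b1 : SUCCESS",
    "[2013-09-02 14:33:11.381025] : v start testvol2 : SUCCESS",
]

# smb_panic timestamp from /var/log/messages in this report.
CRASH_LOCAL = datetime(2013, 9, 2, 10, 33, 2, 696476)

if __name__ == "__main__":
    for stamp, command, result in commands_near(CRASH_LOCAL, -4, HISTORY):
        print(stamp, command, result)
```

If the offset assumption is wrong, the match is meaningless, so the offset should be confirmed on the affected machine before drawing conclusions from the result.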
The following error messages appeared in /var/log/glusterfs/etc-glusterfs-glusterd.vol.log:
[2013-09-02 14:33:37.486818] E [glusterd-utils.c:1337:glusterd_brick_unlink_socket_file] 0-management: Failed to remove /var/run/f0f5ead6df49d75409697344fc14d75b.socket error: No such file or directory
[2013-09-02 14:33:38.518344] E [glusterd-utils.c:3797:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/7d84c9af07428fda82993a87b9baed72.socket error: Permission denied
I was also able to hit this issue.

From /var/log/messages:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Sep 3 18:27:27 redlemon smbd[12960]: From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf
Sep 3 18:27:27 redlemon smbd[12960]: [2013/09/03 18:27:27.601037, 0] lib/fault.c:51(fault_report)
Sep 3 18:27:27 redlemon smbd[12960]: ===============================================================
Sep 3 18:27:27 redlemon smbd[12960]: [2013/09/03 18:27:27.601146, 0] lib/util.c:1117(smb_panic)
Sep 3 18:27:27 redlemon smbd[12960]: PANIC (pid 12960): internal error
Sep 3 18:27:27 redlemon smbd[12960]: [2013/09/03 18:27:27.603693, 0] lib/util.c:1221(log_stack_trace)
Sep 3 18:27:27 redlemon smbd[12960]: BACKTRACE: 18 stack frames:
Sep 3 18:27:27 redlemon smbd[12960]: #0 smbd(log_stack_trace+0x1a) [0x7f60edb3f4fa]
Sep 3 18:27:27 redlemon smbd[12960]: #1 smbd(smb_panic+0x2b) [0x7f60edb3f5cb]
Sep 3 18:27:27 redlemon smbd[12960]: #2 smbd(+0x41a054) [0x7f60edb30054]
Sep 3 18:27:27 redlemon smbd[12960]: #3 /lib64/libc.so.6(+0x31cee32920) [0x7f60e99e8920]
Sep 3 18:27:27 redlemon smbd[12960]: #4 /lib64/libpthread.so.0(pthread_mutex_lock+0) [0x7f60e8168220]
Sep 3 18:27:27 redlemon smbd[12960]: #5 /usr/lib64/libglusterfs.so.0(iobuf_get2+0x42) [0x7f60eb00eb32]
Sep 3 18:27:27 redlemon smbd[12960]: #6 /usr/lib64/libgfapi.so.0(mgmt_submit_request+0x14f) [0x7f60eb256a8f]
Sep 3 18:27:27 redlemon smbd[12960]: #7 /usr/lib64/libgfapi.so.0(glfs_volfile_fetch+0x113) [0x7f60eb256c43]
Sep 3 18:27:27 redlemon smbd[12960]: #8 /usr/lib64/libgfapi.so.0(mgmt_cbk_spec+0x10) [0x7f60eb256e50]
Sep 3 18:27:27 redlemon smbd[12960]: #9 /usr/lib64/libgfrpc.so.0(rpc_clnt_handle_cbk+0x132) [0x7f60eadc8c12]
Sep 3 18:27:27 redlemon smbd[12960]: #10 /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1b8) [0x7f60eadc9fa8]
Sep 3 18:27:27 redlemon smbd[12960]: #11 /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28) [0x7f60eadc5838]
Sep 3 18:27:27 redlemon smbd[12960]: #12 /usr/lib64/glusterfs/3.4.0.30rhs/rpc-transport/socket.so(+0x8be6) [0x7f60dc580be6]
Sep 3 18:27:27 redlemon smbd[12960]: #13 /usr/lib64/glusterfs/3.4.0.30rhs/rpc-transport/socket.so(+0xa4fd) [0x7f60dc5824fd]
Sep 3 18:27:27 redlemon smbd[12960]: #14 /usr/lib64/libglusterfs.so.0(+0x3cee45e8c7) [0x7f60eb02e8c7]
Sep 3 18:27:27 redlemon smbd[12960]: #15 /usr/lib64/libgfapi.so.0(+0x5834) [0x7f60eb255834]
Sep 3 18:27:27 redlemon smbd[12960]: #16 /lib64/libpthread.so.0(+0x31cf607851) [0x7f60e8166851]
Sep 3 18:27:27 redlemon smbd[12960]: #17 /lib64/libc.so.6(clone+0x6d) [0x7f60e9a9e90d]
Sep 3 18:27:27 redlemon smbd[12960]: [2013/09/03 18:27:27.605142, 0] lib/fault.c:372(dump_core)
Sep 3 18:27:27 redlemon smbd[12960]: dumping core in /var/log/core
Sep 3 18:27:27 redlemon smbd[12960]:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

From cmd_history:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2013-09-03 12:34:04.839103] : v geo master ssh://10.70.43.25::slave status detail : SUCCESS
[2013-09-03 12:56:15.250958] : v geo master ssh://10.70.43.25::slave stop : SUCCESS
[2013-09-03 12:56:25.365787] : v geo master ssh://10.70.43.25::slave delete : SUCCESS
[2013-09-03 12:56:35.878855] : v stop master : SUCCESS
[2013-09-03 12:56:40.632577] : v delet master : SUCCESS
[2013-09-03 12:57:27.723205] : volume create master replica 2 10.70.43.13:/bricks/brick1 10.70.43.18:/bricks/brick2 10.70.43.22:/bricks/brick3 10.70.43.24:/bricks/brick4 : SUCCESS
[2013-09-03 12:57:29.637723] : volume start master : SUCCESS
[2013-09-03 12:57:38.409823] : volume set master rollover-time 20 : SUCCESS
[2013-09-03 12:57:40.433215] : volume set master encoding ascii : SUCCESS
[2013-09-03 12:57:44.698822] : volume set master fsync-interval 3 : SUCCESS
[2013-09-03 12:58:22.360688] : v geo master ssh://10.70.43.25::slave create force : SUCCESS
[2013-09-03 13:16:55.020394] : v geo stat : SUCCESS
[2013-09-03 13:17:25.361524] : v geo master ssh://10.70.43.25::slave start : SUCCESS
[2013-09-03 13:48:40.418363] : v geo master ssh://10.70.43.25::slave stop : SUCCESS
[2013-09-03 13:50:08.599613] : v geo master ssh://10.70.43.25::slave start : SUCCESS
[2013-09-03 13:52:09.764168] : v geo master ssh://10.70.43.25::slave stop : SUCCESS
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
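
To check that two pasted backtraces are the same crash, it can help to compare them by library and symbol name only, ignoring load addresses and offsets that differ between hosts. The following is a minimal sketch of that comparison using four frames copied from each of the two backtraces in this bug. FRAME_RE and frame_signature are illustrative names, and the pattern assumes the frame layout seen in these logs ("#N module(symbol+offset) [address]").

```python
import posixpath
import re

# "#5 /usr/lib64/libglusterfs.so.0(iobuf_get2+0x42) [0x7fd4468a5b32]"
# The symbol may be empty, e.g. "/lib64/libc.so.6(+0x3ff1832960)".
FRAME_RE = re.compile(
    r"#(\d+) ([^\s(]+)\((\w*)\+(?:0x)?[0-9a-f]+\) \[0x[0-9a-f]+\]"
)


def frame_signature(log_lines):
    """Return [(frame number, module basename, symbol)] for each frame line."""
    signature = []
    for line in log_lines:
        m = FRAME_RE.search(line)
        if m:
            module = posixpath.basename(m.group(2))
            signature.append((int(m.group(1)), module, m.group(3)))
    return signature


# Frames #3 to #6 from the first backtrace in this bug.
FIRST = [
    "Sep 2 10:33:02 dhcp159-136 smbd[348]: #3 /lib64/libc.so.6(+0x3ff1832960) [0x7fd44527f960]",
    "Sep 2 10:33:02 dhcp159-136 smbd[348]: #4 /lib64/libpthread.so.0(pthread_mutex_lock+0) [0x7fd4439ff220]",
    "Sep 2 10:33:02 dhcp159-136 smbd[348]: #5 /usr/lib64/libglusterfs.so.0(iobuf_get2+0x42) [0x7fd4468a5b32]",
    "Sep 2 10:33:02 dhcp159-136 smbd[348]: #6 /usr/lib64/libgfapi.so.0(mgmt_submit_request+0x14f) [0x7fd446aeda8f]",
]

# The same frames from the second backtrace.
SECOND = [
    "Sep 3 18:27:27 redlemon smbd[12960]: #3 /lib64/libc.so.6(+0x31cee32920) [0x7f60e99e8920]",
    "Sep 3 18:27:27 redlemon smbd[12960]: #4 /lib64/libpthread.so.0(pthread_mutex_lock+0) [0x7f60e8168220]",
    "Sep 3 18:27:27 redlemon smbd[12960]: #5 /usr/lib64/libglusterfs.so.0(iobuf_get2+0x42) [0x7f60eb00eb32]",
    "Sep 3 18:27:27 redlemon smbd[12960]: #6 /usr/lib64/libgfapi.so.0(mgmt_submit_request+0x14f) [0x7f60eb256a8f]",
]

if __name__ == "__main__":
    print(frame_signature(FIRST) == frame_signature(SECOND))
```

Frames without a resolved symbol compare equal on module name alone, so a match here is only suggestive; the core files would be needed to confirm it.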
Posted patch for review at https://code.engineering.redhat.com/gerrit/#/c/12523/
*** Bug 1004417 has been marked as a duplicate of this bug. ***
I am no longer getting a core file, so I am marking this as verified.
glusterfs-server-3.4.0.31rhs-1
samba-common-3.6.9-160.3
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html