Bug 1000545 - Add brick operation is causing one of the smbd process in server to crash
Summary: Add brick operation is causing one of the smbd process in server to crash
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: samba
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
: ---
Assignee: Raghavendra Talur
QA Contact: Lalatendu Mohanty
URL:
Whiteboard:
Depends On:
Blocks: 1002577
 
Reported: 2013-08-23 15:47 UTC by Lalatendu Mohanty
Modified: 2013-09-23 22:32 UTC (History)
6 users

Fixed In Version: glusterfs-3.4.0.29rhs-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1002577 (view as bug list)
Environment:
Last Closed: 2013-09-23 22:32:16 UTC
Embargoed:



Description Lalatendu Mohanty 2013-08-23 15:47:43 UTC
Description of problem:

Without a "service smb restart", the Samba share is not accessible from a Windows 7 client after an add-brick and rebalance.

However, while the share was inaccessible from the Win7 client, I could still access the volume from FUSE, NFS, and Linux CIFS mounts.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.22rhs-2.el6rhs.x86_64

How reproducible:

Always (I reproduced it twice)

Steps to Reproduce:
1. Create a 1x2 replicated volume.
2. Mount it on a Windows client.
3. Create some files on it.
4. Add two bricks:
   gluster v add-brick testvol 10.70.37.131:/rhs/brick1/testvol-b1 10.70.37.149:/rhs/brick1/testvol-b1
5. Start rebalance:
   gluster volume rebalance testvol start
6. Wait until it completes:
   gluster volume rebalance testvol status
7. Try to access the Samba share from Windows. <- it won't be accessible, but it should be
8. Now try to access it from a Linux CIFS mount. <- it will be accessible
9. Restart the smb service and check access again.
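The add-brick/rebalance sequence above can be driven from a small script. This is a hedged sketch: the gluster commands are taken verbatim from the steps (hosts and brick paths are the ones in this report), and by default it only prints the commands rather than executing them, since running them requires a live Gluster cluster.

```python
import subprocess

# Commands copied from the reproduction steps above; the hosts
# (10.70.37.131, 10.70.37.149) and brick paths are from this report.
REPRO_COMMANDS = [
    "gluster v add-brick testvol "
    "10.70.37.131:/rhs/brick1/testvol-b1 10.70.37.149:/rhs/brick1/testvol-b1",
    "gluster volume rebalance testvol start",
    "gluster volume rebalance testvol status",
]

def run_repro(commands=REPRO_COMMANDS, dry_run=True):
    """Print (or, if dry_run=False, execute) the add-brick/rebalance sequence."""
    executed = []
    for cmd in commands:
        if dry_run:
            print(cmd)                       # dry run: show what would execute
        else:
            subprocess.run(cmd.split(), check=True)
        executed.append(cmd)
    return executed
```

With dry_run=True this is safe to run anywhere; set dry_run=False only on a test cluster, since the add-brick step is what triggers the smbd crash described below.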



Actual results:

After the rebalance completed successfully, the share was not accessible from the Win7 client.

Expected results:

It should be accessible from Windows clients.

Additional info:
I only tested this with a replicated volume; it still needs to be checked with a plain DHT volume.

Comment 2 Lalatendu Mohanty 2013-08-27 09:48:41 UTC
On re-checking, we found that core files are generated just after the add-brick operation.

The log messages below, copied from /var/log/messages, were recorded when the core files were generated.

Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]:   From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]: [2013/08/27 14:40:15.207660,  0] lib/fault.c:51(fault_report)
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]:   ===============================================================
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]: [2013/08/27 14:40:15.208137,  0] lib/util.c:1117(smb_panic)
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]:   PANIC (pid 8629): internal error
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]: [2013/08/27 14:40:15.212885,  0] lib/util.c:1221(log_stack_trace)
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]:   BACKTRACE: 26 stack frames:
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]:    #0 smbd(log_stack_trace+0x1a) [0x7f622b7de58a]
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]:    #1 smbd(smb_panic+0x2b) [0x7f622b7de65b]
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]:    #2 smbd(+0x41a0e4) [0x7f622b7cf0e4]
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]:    #3 /lib64/libc.so.6(+0x39dda32920) [0x7f6227688920]
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]:    #4 /usr/lib64/libgfapi.so.0(glfs_refresh_inode_safe+0x59) [0x7f6228efc609]
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]:    #5 /usr/lib64/libgfapi.so.0(glfs_migrate_fd_safe+0x80) [0x7f6228efc7c0]
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]:    #6 /usr/lib64/libgfapi.so.0(__glfs_migrate_fd+0x46) [0x7f6228efcc06]
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]:    #7 /usr/lib64/libgfapi.so.0(__glfs_resolve_fd+0x58) [0x7f6228efcd78]
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]:    #8 /usr/lib64/libgfapi.so.0(glfs_resolve_fd+0x73) [0x7f6228efd3e3]
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]:    #9 /usr/lib64/libgfapi.so.0(glfs_close+0x62) [0x7f6228efb4e2]
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]:    #10 smbd(fd_close+0x43) [0x7f622b50f383]
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]:    #11 smbd(+0x15f1f2) [0x7f622b5141f2]
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]:    #12 smbd(close_file+0x44d) [0x7f622b51579d]
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]:    #13 smbd(reply_close+0x8b) [0x7f622b4e0a8b]
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]:    #14 smbd(+0x178414) [0x7f622b52d414]
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]:    #15 smbd(+0x17882b) [0x7f622b52d82b]
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]:    #16 smbd(+0x178c45) [0x7f622b52dc45]
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]:    #17 smbd(run_events_poll+0x377) [0x7f622b7ed907]
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]:    #18 smbd(smbd_process+0x86d) [0x7f622b52b78d]
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]:    #19 smbd(+0x69502f) [0x7f622ba4a02f]
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]:    #20 smbd(run_events_poll+0x377) [0x7f622b7ed907]
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]:    #21 smbd(+0x438dbf) [0x7f622b7eddbf]
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]:    #22 /usr/lib64/libtevent.so.0(_tevent_loop_once+0x9d) [0x7f622802149d]
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]:    #23 smbd(main+0xf3b) [0x7f622ba4b32b]
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]:    #24 /lib64/libc.so.6(__libc_start_main+0xfd) [0x7f6227674cdd]
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]:    #25 smbd(+0xf4b99) [0x7f622b4a9b99]
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]: [2013/08/27 14:40:15.215018,  0] lib/fault.c:372(dump_core)
Aug 27 14:40:15 bvt-rhs1 GlusterFS[8629]:   dumping core in /var/log/core
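When triaging several of these cores, it is handy to pull the frames out of the syslog lines programmatically. This is a small helper sketched against the exact format of the backtrace above (syslog prefix, then "#N module(symbol+offset) [addr]"); the regex is an assumption derived from these lines, not from any Samba tooling.

```python
import re

# Matches the frame lines in the smbd backtrace above, e.g.:
#   #4 /usr/lib64/libgfapi.so.0(glfs_refresh_inode_safe+0x59) [0x7f6228efc609]
FRAME_RE = re.compile(
    r"#(?P<num>\d+)\s+(?P<module>\S+?)"
    r"(?:\((?P<symbol>[^)+]*)(?:\+(?P<offset>0x[0-9a-f]+))?\))?"
    r"\s+\[(?P<addr>0x[0-9a-f]+)\]"
)

def parse_backtrace(lines):
    """Extract (frame number, module, symbol-or-None, address) from syslog lines."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m:
            frames.append((int(m.group("num")), m.group("module"),
                           m.group("symbol") or None, m.group("addr")))
    return frames
```

Applied to the log above, this makes it easy to see that frames #4-#9 all sit inside libgfapi, which is the basis of the analysis in comment 5.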

Comment 3 Lalatendu Mohanty 2013-08-27 10:01:37 UTC
Renaming the bug summary to "Add brick operation is causing smbd process to crash". Below is more information in addition to my last comment.

The following operations reproduce the issue.

Create the volume

1. gluster volume create test-vol-3 replica 2 10.70.37.93:/rhs/brick1/test-vol-3-b1 10.70.37.136:/rhs/brick1/testvol-3-b1
2. gluster v start test-vol-3

Steps 3 to 6 below are for AD integration:

3. mount -t glusterfs -o acl 10.70.37.93:/test-vol-3 /mnt/testvol/
4. mkdir /mnt/testvol/rhs-smb-test-vol-3
5. chgrp "domain users" /mnt/testvol/rhs-smb-test-vol-3
6. chmod 770 /mnt/testvol/rhs-smb-test-vol-3


Set "stat-prefetch off", as it is the recommended setting:
7. gluster volume set test-vol-3 stat-prefetch off

8. Create files on the share using win7Client

Add Brick

9. gluster v add-brick test-vol-3 10.70.37.131:/rhs/brick1/test-vol-3-b1 10.70.37.149:/rhs/brick1/test-vol-3-b1

Step 9 causes one of the smbd processes on the Samba server to crash and creates a core file.

This in turn makes the Samba share inaccessible from the Win7 client.
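Since the panic log in comment 2 shows smbd "dumping core in /var/log/core", a quick way to confirm the crash while reproducing is to diff the set of core files before and after the add-brick. A minimal sketch (the /var/log/core path is the one from the log; the helper names are mine):

```python
import os

def snapshot_cores(core_dir="/var/log/core"):
    """Return the set of core-file names currently present (empty if dir absent)."""
    try:
        return set(os.listdir(core_dir))
    except FileNotFoundError:
        return set()

def new_cores(before, after):
    """Core files that appeared between two snapshots, sorted by name."""
    return sorted(after - before)
```

Take one snapshot before step 9, one after, and any names reported by new_cores() are cores produced by the add-brick.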

Comment 4 surabhi 2013-08-27 11:10:35 UTC
Saw the same issue on a Linux CIFS client.
To reproduce the issue:
Create a distributed-replicate volume.
Do a CIFS mount on a Linux client.
Create files on the mount point.
Do an add-brick on the volume.
A core will be generated by the add-brick operation.

Comment 5 Christopher R. Hertel 2013-08-28 02:37:45 UTC
A work-around is available, so this is not likely a blocker.

Note that much of the stack trace shown above is heavily invested in libgfapi.  I don't know the internals of libgfapi, but based on the stack trace it would appear that this particular crash occurred when the vfs module called glfs_close(), which then needs to traverse a set of functions to resolve the Gluster file descriptor to "real" file descriptors in the underlying XFS file system(s).  My guess is that file descriptor resolution is producing bad values.

 #4 /usr/lib64/libgfapi.so.0(glfs_refresh_inode_safe+0x59) [0x7f6228efc609]
 #5 /usr/lib64/libgfapi.so.0(glfs_migrate_fd_safe+0x80) [0x7f6228efc7c0]
 #6 /usr/lib64/libgfapi.so.0(__glfs_migrate_fd+0x46) [0x7f6228efcc06]
 #7 /usr/lib64/libgfapi.so.0(__glfs_resolve_fd+0x58) [0x7f6228efcd78]
 #8 /usr/lib64/libgfapi.so.0(glfs_resolve_fd+0x73) [0x7f6228efd3e3]
 #9 /usr/lib64/libgfapi.so.0(glfs_close+0x62) [0x7f6228efb4e2]
#10 smbd(fd_close+0x43) [0x7f622b50f383]

Samba (smbd) is trying to close a file handle, and the VFS module passes the request to glfs_close(), which needs to resolve the fd... but (at a guess) the resolution is getting corrupted somewhere along the way.

Note that the failure sequence is the same as given in Bug #1001614, which may be a duplicate.

Comment 6 Raghavendra Talur 2013-08-29 20:16:06 UTC
Patch posted for review at 
https://code.engineering.redhat.com/gerrit/#/c/12214/

Comment 7 Lalatendu Mohanty 2013-09-03 10:06:47 UTC
I am no longer seeing this crash with the latest gluster packages, i.e., glusterfs-server-3.4.0.30rhs-2.el6rhs.x86_64.

Hence, marking this bug as VERIFIED.

Comment 8 Scott Haines 2013-09-23 22:32:16 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html

