Bug 1335584 - SAMBA CRASH : Multiple smbd crash while performing VSS functionality in windows client
Summary: SAMBA CRASH : Multiple smbd crash while performing VSS functionality in windo...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: samba
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: RHGS 3.1.3
Assignee: rjoseph
QA Contact: Vivek Das
URL:
Whiteboard:
Depends On:
Blocks: 1311817
TreeView+ depends on / blocked
 
Reported: 2016-05-12 14:45 UTC by Vivek Das
Modified: 2016-06-23 05:37 UTC (History)
6 users (show)

Fixed In Version: samba-4.4.3-4.el7rhgs
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-06-23 05:37:08 UTC
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:1245 0 normal SHIPPED_LIVE gluster-smb bug fix and enhancement update 2016-06-23 09:13:06 UTC
Samba Project 11844 0 None None None 2016-05-17 12:31:51 UTC

Description Vivek Das 2016-05-12 14:45:54 UTC
Description of problem:

With all the vss plugin enabled in Samba and all uss related volume options set.Took a snapshot while playing a video file in windows client mount. Over the windows mount stopped the video file and deleted the video file from the mount. Now again i copied back the video file to the mount and started playing it.
Now while the video was playing i tried to do a restore to which windows responded that cant restore as the file is in use.
On the server i found multiple smbd core dump files in "/var/log/samba/cores/smbd/ " (9 files) all within a minutes time.

Version-Release number of selected component (if applicable):
samba-4.4.3-2.el7rhgs.x86_64
glusterfs-3.7.9-4.el7rhgs.x86_64
WINDOWS 8.1

How reproducible:
Reporting on the first encounter

Steps to Reproduce:
1.vss plugin enabled in Samba and all uss related volume options set
2.Copy a video file (say V1.mp4) to the mount point in windows client and start playing
3.Take a snapshot
4.On the client stop the video file (V1.mp4) and delete the file.
5.Copy back V1.mp4 to the mount and start playing
6.Right click over the mount and try to restore
7.Back in server check for core dump

Actual results:
smbd crashed with multiple cores and with different messages

Expected results:
Should never crash

Additional info:

#5  0x00007f9baf82ff98 in smbd_smb2_oplock_break_send (in_oplock_level=1 '\001', fsp=0x7f9bb0f9c3c0, 
    smb2req=0x7f9bb10009d0, ev=0x7f9bb0c44df0, mem_ctx=0x7f9bb10009d0) at ../source3/smbd/smb2_break.c:209
#6  smbd_smb2_request_process_break (req=req@entry=0x7f9bb10009d0) at ../source3/smbd/smb2_break.c:86
#7  0x00007f9baf81823c in smbd_smb2_request_dispatch (req=req@entry=0x7f9bb10009d0)
    at ../source3/smbd/smb2_server.c:2545
#8  0x00007f9baf819702 in smbd_smb2_io_handler (fde_flags=<optimized out>, xconn=0x7f9bb0c69df0)
    at ../source3/smbd/smb2_server.c:3698
#9  smbd_smb2_connection_handler (ev=<optimized out>, fde=<optimized out>, flags=<optimized out>, 
    private_data=<optimized out>) at ../source3/smbd/smb2_server.c:3736
#10 0x00007f9bad77639c in run_events_poll (ev=0x7f9bb0c44df0, pollrtn=<optimized out>, pfds=0x7f9bb0c695a0, 
    num_pfds=6) at ../source3/lib/events.c:257
#11 0x00007f9bad7765f0 in s3_event_loop_once (ev=0x7f9bb0c44df0, location=<optimized out>)
    at ../source3/lib/events.c:326
#12 0x00007f9bac19d40d in _tevent_loop_once () from /lib64/libtevent.so.0
#13 0x00007f9bac19d5ab in tevent_common_loop_wait () from /lib64/libtevent.so.0
#14 0x00007f9baf806671 in smbd_process (ev_ctx=ev_ctx@entry=0x7f9bb0c44df0, 
    msg_ctx=msg_ctx@entry=0x7f9bb0c44ee0, sock_fd=sock_fd@entry=39, interactive=interactive@entry=false)
    at ../source3/smbd/process.c:4117
#15 0x00007f9bb02f0304 in smbd_accept_connection (ev=0x7f9bb0c44df0, fde=<optimized out>, 
    flags=<optimized out>, private_data=<optimized out>) at ../source3/smbd/server.c:762
#16 0x00007f9bad77639c in run_events_poll (ev=0x7f9bb0c44df0, pollrtn=<optimized out>, pfds=0x7f9bb0c695a0, 
    num_pfds=7) at ../source3/lib/events.c:257
#17 0x00007f9bad7765f0 in s3_event_loop_once (ev=0x7f9bb0c44df0, location=<optimized out>)
    at ../source3/lib/events.c:326
#18 0x00007f9bac19d40d in _tevent_loop_once () from /lib64/libtevent.so.0
#19 0x00007f9bac19d5ab in tevent_common_loop_wait () from /lib64/libtevent.so.0
#20 0x00007f9bb02ebad4 in smbd_parent_loop (parent=<optimized out>, ev_ctx=0x7f9bb0c44df0)
    at ../source3/smbd/server.c:1127
#21 main (argc=<optimized out>, argv=<optimized out>) at ../source3/smbd/server.c:1780


================================

#5  0x00007f9baf7c1a98 in reply_lockingX (req=req@entry=0x7f9bb0d105a0) at ../source3/smbd/reply.c:8261
#6  0x00007f9baf801e90 in switch_message (type=<optimized out>, req=req@entry=0x7f9bb0d105a0)
    at ../source3/smbd/process.c:1726
#7  0x00007f9baf803c30 in construct_reply (deferred_pcd=0x0, encrypted=false, seqnum=0, unread_bytes=0, 
    size=55, inbuf=0x0, xconn=0x7f9bb0c62f80) at ../source3/smbd/process.c:1762
#8  process_smb (xconn=xconn@entry=0x7f9bb0c69df0, inbuf=<optimized out>, nread=55, unread_bytes=0, seqnum=0, 
    encrypted=<optimized out>, deferred_pcd=deferred_pcd@entry=0x0) at ../source3/smbd/process.c:2008
#9  0x00007f9baf805321 in smbd_server_connection_read_handler (xconn=0x7f9bb0c69df0, fd=39)
    at ../source3/smbd/process.c:2607
#10 0x00007f9bad77639c in run_events_poll (ev=0x7f9bb0c44df0, pollrtn=<optimized out>, pfds=0x7f9bb0c695a0, 
    num_pfds=5) at ../source3/lib/events.c:257
#11 0x00007f9bad7765f0 in s3_event_loop_once (ev=0x7f9bb0c44df0, location=<optimized out>)
    at ../source3/lib/events.c:326
#12 0x00007f9bac19d40d in _tevent_loop_once () from /lib64/libtevent.so.0
#13 0x00007f9bac19d5ab in tevent_common_loop_wait () from /lib64/libtevent.so.0
#14 0x00007f9baf806671 in smbd_process (ev_ctx=ev_ctx@entry=0x7f9bb0c44df0, 
    msg_ctx=msg_ctx@entry=0x7f9bb0c44ee0, sock_fd=sock_fd@entry=39, interactive=interactive@entry=false)
    at ../source3/smbd/process.c:4117
#15 0x00007f9bb02f0304 in smbd_accept_connection (ev=0x7f9bb0c44df0, fde=<optimized out>, 
    flags=<optimized out>, private_data=<optimized out>) at ../source3/smbd/server.c:762
#16 0x00007f9bad77639c in run_events_poll (ev=0x7f9bb0c44df0, pollrtn=<optimized out>, pfds=0x7f9bb0c695a0, 
    num_pfds=7) at ../source3/lib/events.c:257
#17 0x00007f9bad7765f0 in s3_event_loop_once (ev=0x7f9bb0c44df0, location=<optimized out>)
    at ../source3/lib/events.c:326
#18 0x00007f9bac19d40d in _tevent_loop_once () from /lib64/libtevent.so.0
#19 0x00007f9bac19d5ab in tevent_common_loop_wait () from /lib64/libtevent.so.0
#20 0x00007f9bb02ebad4 in smbd_parent_loop (parent=<optimized out>, ev_ctx=0x7f9bb0c44df0)
    at ../source3/smbd/server.c:1127
#21 main (argc=<optimized out>, argv=<optimized out>) at ../source3/smbd/server.c:1780

Comment 3 Vivek Das 2016-05-12 15:22:42 UTC
Time of test execution
-----------------------
Thu May 12 19:16:48 UTC 2016

Comment 4 rjoseph 2016-05-13 13:12:31 UTC
Analyzed core.15119 and core.9719 both looks similar. 

core.15119 

In function "smbd_smb2_oplock_break_send" call to "downgrade_oplock" is failing. After the op lock was downgraded "update_num_read_oplocks" function is called which in-turn calls "brl_get_locks_readonly" function. This function is failing with following log msg:

"Could not parse byte range lock record: NT_STATUS_OBJECT_NAME_NOT_FOUND"

Failure of "downgrade_oplock" leads to smb_panic.


core.9719

Following is the bt from this crash:
#0  0x00007f9babe0d5f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f9babe0ece8 in __GI_abort () at abort.c:90
#2  0x00007f9bad76dbeb in dump_core () at ../source3/lib/dumpcore.c:322
#3  0x00007f9bad760fe7 in smb_panic_s3 (why=<optimized out>) at ../source3/lib/util.c:814
#4  0x00007f9bafc5257f in smb_panic (why=why@entry=0x7f9baf970be0 "assert failed: d->num_share_modes == 1") at ../lib/util/fault.c:166
#5  0x00007f9baf841489 in update_num_read_oplocks (fsp=fsp@entry=0x7f9bb0d0a380, lck=lck@entry=0x7f9bb0c6c0c0) at ../source3/smbd/oplock.c:193
#6  0x00007f9baf7e63c6 in grant_fsp_oplock_type (lease=0x0, oplock_request=2, lck=0x7f9bb0c6c0c0, fsp=0x7f9bb0d0a380, req=0xc0000017) at ../source3/smbd/open.c:1868
#7  open_file_ntcreate (conn=conn@entry=0x7f9bb0c769c0, req=req@entry=0x7f9bb0c6f7a0, access_mask=access_mask@entry=1180063, share_access=share_access@entry=1, create_disposition=create_disposition@entry=1, 
    create_options=create_options@entry=64, new_dos_attributes=new_dos_attributes@entry=0, oplock_request=oplock_request@entry=2, lease=lease@entry=0x0, private_flags=private_flags@entry=0, 
    pinfo=pinfo@entry=0x7fff6e103d88, fsp=fsp@entry=0x7f9bb0d0a380) at ../source3/smbd/open.c:3147
#8  0x00007f9baf7e8e29 in create_file_unixpath (conn=conn@entry=0x7f9bb0c769c0, req=req@entry=0x7f9bb0c6f7a0, smb_fname=smb_fname@entry=0x7f9bb0d0ce50, access_mask=access_mask@entry=1180063, 
    share_access=share_access@entry=1, create_disposition=create_disposition@entry=1, create_options=create_options@entry=64, file_attributes=file_attributes@entry=0, oplock_request=oplock_request@entry=2, 
    lease=lease@entry=0x0, allocation_size=allocation_size@entry=0, private_flags=private_flags@entry=0, sd=sd@entry=0x0, ea_list=ea_list@entry=0x0, result=result@entry=0x7fff6e103ec0, 
    pinfo=pinfo@entry=0x7fff6e103ebc) at ../source3/smbd/open.c:4654
#9  0x00007f9baf7e9e7f in create_file_default (conn=0x7f9bb0c769c0, req=0x7f9bb0c6f7a0, root_dir_fid=<optimized out>, smb_fname=0x7f9bb0d0ce50, access_mask=1180063, share_access=1, 
    create_disposition=create_disposition@entry=1, create_options=create_options@entry=64, file_attributes=file_attributes@entry=0, oplock_request=oplock_request@entry=2, lease=lease@entry=0x0, 
    allocation_size=allocation_size@entry=0, private_flags=private_flags@entry=0, sd=sd@entry=0x0, ea_list=ea_list@entry=0x0, result=result@entry=0x7fff6e104168, pinfo=pinfo@entry=0x7fff6e10414c, 
    in_context_blobs=in_context_blobs@entry=0x7fff6e104190, out_context_blobs=out_context_blobs@entry=0x7f9bb0c6f6b0) at ../source3/smbd/open.c:5063
#10 0x00007f9baf8cac1e in vfswrap_create_file (handle=<optimized out>, req=<optimized out>, root_dir_fid=<optimized out>, smb_fname=<optimized out>, access_mask=<optimized out>, share_access=<optimized out>, 
    create_disposition=1, create_options=64, file_attributes=0, oplock_request=2, lease=0x0, allocation_size=0, private_flags=0, sd=0x0, ea_list=0x0, result=0x7fff6e104168, pinfo=0x7fff6e10414c, 
    in_context_blobs=0x7fff6e104190, out_context_blobs=0x7f9bb0c6f6b0) at ../source3/modules/vfs_default.c:570
#11 0x00007f9baf7f0b28 in smb_vfs_call_create_file (handle=<optimized out>, req=req@entry=0x7f9bb0c6f7a0, root_dir_fid=root_dir_fid@entry=0, smb_fname=<optimized out>, access_mask=access_mask@entry=1180063, 
    share_access=share_access@entry=1, create_disposition=create_disposition@entry=1, create_options=create_options@entry=64, file_attributes=file_attributes@entry=0, oplock_request=2, lease=lease@entry=0x0, 
    allocation_size=allocation_size@entry=0, private_flags=private_flags@entry=0, sd=sd@entry=0x0, ea_list=ea_list@entry=0x0, result=result@entry=0x7fff6e104168, pinfo=pinfo@entry=0x7fff6e10414c, 
    in_context_blobs=in_context_blobs@entry=0x7fff6e104190, out_context_blobs=out_context_blobs@entry=0x7f9bb0c6f6b0) at ../source3/smbd/vfs.c:1556
#12 0x00007f9baf821eb1 in smbd_smb2_create_send (in_context_blobs=..., in_name=0x7f9bb0c6ed60 "OyeOye\\Courage The Cowardly Dog - King Ramses Curse, Episode 7 The Clutching Foot I Full HD 1080p.mp4", 
    in_create_options=64, in_create_disposition=<optimized out>, in_share_access=1, in_file_attributes=0, in_desired_access=1180063, in_impersonation_level=2, in_oplock_level=9 '\t', smb2req=0x7f9bb0c6e890, 
    ev=0x7f9bb0c44df0, mem_ctx=0x7f9bb0c6e890) at ../source3/smbd/smb2_create.c:1145
#13 smbd_smb2_request_process_create (smb2req=smb2req@entry=0x7f9bb0c6e890) at ../source3/smbd/smb2_create.c:227
#14 0x00007f9baf817f5c in smbd_smb2_request_dispatch (req=req@entry=0x7f9bb0c6e890) at ../source3/smbd/smb2_server.c:2456
#15 0x00007f9baf819702 in smbd_smb2_io_handler (fde_flags=<optimized out>, xconn=0x7f9bb0c6b520) at ../source3/smbd/smb2_server.c:3698
#16 smbd_smb2_connection_handler (ev=<optimized out>, fde=<optimized out>, flags=<optimized out>, private_data=<optimized out>) at ../source3/smbd/smb2_server.c:3736
#17 0x00007f9bad77639c in run_events_poll (ev=0x7f9bb0c44df0, pollrtn=<optimized out>, pfds=0x7f9bb0c695a0, num_pfds=5) at ../source3/lib/events.c:257
#18 0x00007f9bad7765f0 in s3_event_loop_once (ev=0x7f9bb0c44df0, location=<optimized out>) at ../source3/lib/events.c:326
#19 0x00007f9bac19d40d in _tevent_loop_once () from /lib64/libtevent.so.0
#20 0x00007f9bac19d5ab in tevent_common_loop_wait () from /lib64/libtevent.so.0
#21 0x00007f9baf806671 in smbd_process (ev_ctx=ev_ctx@entry=0x7f9bb0c44df0, msg_ctx=msg_ctx@entry=0x7f9bb0c44ee0, sock_fd=sock_fd@entry=39, interactive=interactive@entry=false) at ../source3/smbd/process.c:4117
#22 0x00007f9bb02f0304 in smbd_accept_connection (ev=0x7f9bb0c44df0, fde=<optimized out>, flags=<optimized out>, private_data=<optimized out>) at ../source3/smbd/server.c:762
#23 0x00007f9bad77639c in run_events_poll (ev=0x7f9bb0c44df0, pollrtn=<optimized out>, pfds=0x7f9bb0c695a0, num_pfds=7) at ../source3/lib/events.c:257
#24 0x00007f9bad7765f0 in s3_event_loop_once (ev=0x7f9bb0c44df0, location=<optimized out>) at ../source3/lib/events.c:326
#25 0x00007f9bac19d40d in _tevent_loop_once () from /lib64/libtevent.so.0
#26 0x00007f9bac19d5ab in tevent_common_loop_wait () from /lib64/libtevent.so.0
#27 0x00007f9bb02ebad4 in smbd_parent_loop (parent=<optimized out>, ev_ctx=0x7f9bb0c44df0) at ../source3/smbd/server.c:1127
#28 main (argc=<optimized out>, argv=<optimized out>) at ../source3/smbd/server.c:1780

This function also fails in "update_num_read_oplocks" function. Its an assertion failure at the following line:
SMB_ASSERT(d->num_share_modes == 1);

where d->num_share_modes was 2. This crash happened because as mentioned in analysis of core.15119 "downgrade_oplock" function downgraded the lock but could not clear the entry from the share_modes list.

Comment 5 rjoseph 2016-05-13 13:12:59 UTC
This bug looks similar to following upstream samba bug:
https://bugzilla.samba.org/show_bug.cgi?id=11844

Comment 6 Vivek Das 2016-05-13 13:36:58 UTC
This might add to the present scenario.

While running a dd command, performed graph change once and found below error messages : 

[2016/05/13 18:57:07.562408,  0] ../source3/smbd/oplock.c:260(remove_oplock)
  remove_oplock: update_num_read_oplocks failed for file file.dd, fnum 25238, 4e743b9d:8f38f3db5eefe080:0
[2016/05/13 18:57:07.563145,  1] ../source3/lib/dbwrap/dbwrap_watch.c:349(dbwrap_watch_record_stored)
  dbwrap_watch_record_stored: dbwrap_parse_record failed: NT_STATUS_OBJECT_NAME_NOT_FOUND
[2016/05/13 18:57:07.563199,  2] ../source3/smbd/close.c:478(close_remove_share_mode)
  close_remove_share_mode: removing kernel flock for file.dd failed: Function not implemented

Comment 7 Michael Adam 2016-05-15 21:22:49 UTC
(In reply to rjoseph from comment #5)
> This bug looks similar to following upstream samba bug:
> https://bugzilla.samba.org/show_bug.cgi?id=11844

Thanks for the analysis, Rajesh. This seems to be spot-on.
We need the patch from samba 11844 anyways. It is critical
and likely to fix this issue!

It has already recently been pushed to samba 4.4 upstream.

Comment 8 Michael Adam 2016-05-15 21:27:28 UTC
(In reply to Vivek Das from comment #6)
> This might add to the present scenario.
> 
> While running a dd command, performed graph change once and found below
> error messages : 
> 
> [2016/05/13 18:57:07.562408,  0] ../source3/smbd/oplock.c:260(remove_oplock)
>   remove_oplock: update_num_read_oplocks failed for file file.dd, fnum
> 25238, 4e743b9d:8f38f3db5eefe080:0
> [2016/05/13 18:57:07.563145,  1]
> ../source3/lib/dbwrap/dbwrap_watch.c:349(dbwrap_watch_record_stored)
>   dbwrap_watch_record_stored: dbwrap_parse_record failed:
> NT_STATUS_OBJECT_NAME_NOT_FOUND

Yeah, this is the one from Rajesh's analysis.
Thanks for quoting these important lines here!

> [2016/05/13 18:57:07.563199,  2]
> ../source3/smbd/close.c:478(close_remove_share_mode)
>   close_remove_share_mode: removing kernel flock for file.dd failed:
> Function not implemented

This is also an interesting observation, Vivek!
Not sure this is critical (at least for this bug),
but I am preparing an upstream patch to avoid this.
This is also apparently not new in the code.

Comment 9 Michael Adam 2016-05-15 21:36:54 UTC
(In reply to Michael Adam from comment #8)
> (In reply to Vivek Das from comment #6)
> > 
> > [2016/05/13 18:57:07.563199,  2]
> > ../source3/smbd/close.c:478(close_remove_share_mode)
> >   close_remove_share_mode: removing kernel flock for file.dd failed:
> > Function not implemented
> 
> This is also an interesting observation, Vivek!
> Not sure this is critical (at least for this bug),
> but I am preparing an upstream patch to avoid this.
> This is also apparently not new in the code.

Created upstream bug for this:

https://bugzilla.samba.org/show_bug.cgi?id=11919

Comment 11 rjoseph 2016-05-17 08:50:42 UTC
As per comment 7 moving the bug to POST

Comment 15 Vivek Das 2016-05-24 11:09:40 UTC
Verified with the above steps i.e
Steps to Reproduce:
1.vss plugin enabled in Samba and all uss related volume options set
2.Copy a video file (say V1.mp4) to the mount point in windows client and start playing
3.Take a snapshot
4.On the client stop the video file (V1.mp4) and delete the file.
5.Copy back V1.mp4 to the mount and start playing
6.Right click over the mount and try to restore
7.Back in server check for core dump

Multiple client operations tests where also ran over the build version 
glusterfs-cli-3.7.9-5.el7rhgs.x86_64
samba-client-4.4.3-4.el7rhgs.x86_64

Did not see any crash , hence marking it as verified.

Comment 17 errata-xmlrpc 2016-06-23 05:37:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1245


Note You need to log in before you can comment on or make changes to this bug.