Bug 1716440 - SMBD thread panics when connected to from OS X machine
Summary: SMBD thread panics when connected to from OS X machine
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: GlusterFS
Classification: Community
Component: gluster-smb
Version: 6
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Anoop C S
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-06-03 14:06 UTC by ryan
Modified: 2019-07-25 04:09 UTC
CC List: 5 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2019-07-25 04:09:17 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
Debug level 10 log of client connection when panic occurs (1.46 MB, text/plain)
2019-06-03 14:06 UTC, ryan
Debug level 10 log of issue after adding streams_xattr and fruit (8.14 MB, text/plain)
2019-06-04 06:56 UTC, ryan
Client panic logs (8.17 MB, text/plain)
2019-06-25 11:16 UTC, ryan

Description ryan 2019-06-03 14:06:06 UTC
Created attachment 1576680 [details]
Debug level 10 log of client connection when panic occurs

Description of problem:
When connecting to a share, the SMB thread for that client panics and constantly restarts. This was tested from a machine running OS X 10.14.4. I've not been able to test from a Windows machine yet.

Version-Release number of selected component (if applicable):
Gluster = 6.1
Samba = 4.9.6

How reproducible:
Every time

SMB configuration:
[global]
security = user
netbios name = NAS01
clustering = no
server signing = no

max log size = 10000
log file = /var/log/samba/log-%M-test.smbd
logging = file
log level = 10

passdb backend = tdbsam
guest account = nobody
map to guest = bad user

force directory mode = 0777
force create mode = 0777
create mask = 0777
directory mask = 0777

store dos attributes = yes

load printers = no
printing = bsd
printcap name = /dev/null
disable spoolss = yes

glusterfs:volfile_server = localhost

kernel share modes = No

[VFS]
vfs objects = glusterfs
glusterfs:volume = mcv02
path = /
read only = no
guest ok = yes

Steps to Reproduce:
1. Use provided SMB configuration
2. Restart SMB service
3. Connect to share from client using guest user
4. Tail client logs on server to see panics
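For reference, roughly what steps 2 and 4 look like on our boxes (the 'smb' service name is the systemd unit on our systems, and CLIENT stands for the connecting machine's NetBIOS name, as expanded from %M in the 'log file' setting above):

systemctl restart smb
tail -f /var/log/samba/log-CLIENT-test.smbd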

Actual results:
SMB thread panics and restarts


Expected results:
Client connects and SMB thread doesn't panic

Additional info:
Tested without the Gluster VFS module, using the FUSE mount point instead, and the system did not panic.
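For comparison, the FUSE-based test pointed the share at a kernel mount of the same volume instead of using vfs objects = glusterfs, along these lines (mount point path and share name are illustrative):

mount -t glusterfs localhost:/mcv02 /mnt/mcv02

[VFS-fuse]
path = /mnt/mcv02
read only = no
guest ok = yes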

Comment 1 Anoop C S 2019-06-04 05:30:09 UTC
(In reply to ryan from comment #0)
> [VFS]
> vfs objects = glusterfs

The 'fruit' and 'streams_xattr' VFS modules are recommended when SMB shares exported by Samba are accessed from Mac OS X clients. Can you re-try connecting to the shares with the following additional settings:

vfs objects = fruit streams_xattr glusterfs
fruit:encoding = native

Also please add the following in [global] section:
ea support = yes
fruit:aapl = yes
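Once those settings are in, you can confirm they are picked up and re-test the guest connection, for example (share and server names taken from your smb.conf above):

testparm -s | grep -i 'vfs objects'
smbclient //NAS01/VFS -N -c 'ls'

Then check the client-specific log for any panic messages.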

Comment 2 ryan 2019-06-04 06:55:25 UTC
Hi Anoop,

Thanks for getting back to me.
I've tried your suggestion but unfortunately the issue still remains. Here is my updated smb.conf:

[global]
security = user
netbios name = NAS01
clustering = no
server signing = no

max log size = 10000
log file = /var/log/samba/log-%M-test.smbd
logging = file
log level = 10

passdb backend = tdbsam
guest account = nobody
map to guest = bad user

force directory mode = 0777
force create mode = 0777
create mask = 0777
directory mask = 0777

store dos attributes = yes

load printers = no
printing = bsd
printcap name = /dev/null
disable spoolss = yes

glusterfs:volfile_server = localhost
ea support = yes
fruit:aapl = yes
kernel share modes = No

[VFS]
vfs objects = fruit streams_xattr glusterfs
fruit:encoding = native
glusterfs:volume = mcv02
path = /
read only = no
guest ok = yes



This time, when creating a new folder at the root of the share, it is created and then disappears, sometimes coming back and sometimes not.
When I was able to traverse into a sub-folder, the same error occurred.

I will attach the debug level 10 logs to the bug.

Many thanks for your help,
Ryan

Comment 3 ryan 2019-06-04 06:56:23 UTC
Created attachment 1576920 [details]
Debug level 10 log of issue after adding streams_xattr and fruit

Comment 4 Anoop C S 2019-06-12 05:46:52 UTC
(In reply to ryan from comment #2)
> Hi Anoop,
> 
> Thanks for getting back to me.
> I've tried your suggestion but unfortunately the issue still remains. Here
> is my updated smb.conf:
> 
> [global]
> security = user
> netbios name = NAS01
> clustering = no

How many nodes are there in your cluster? For a Samba cluster with more than one node, it is recommended to run CTDB with the 'clustering' parameter set to 'yes'.

> server signing = no
> 
> max log size = 10000
> log file = /var/log/samba/log-%M-test.smbd
> logging = file
> log level = 10
> 
> passdb backend = tdbsam
> guest account = nobody
> map to guest = bad user
> 
> force directory mode = 0777
> force create mode = 0777
> create mask = 0777
> directory mask = 0777
> 
> store dos attributes = yes
> 
> load printers = no
> printing = bsd
> printcap name = /dev/null
> disable spoolss = yes
> 
> glusterfs:volfile_server = localhost
> ea support = yes
> fruit:aapl = yes
> kernel share modes = No
> 
> [VFS]
> vfs objects = fruit streams_xattr glusterfs
> fruit:encoding = native
> glusterfs:volume = mcv02
> path = /
> read only = no
> guest ok = yes
> 
> This time when creating a new folder at the root of the share, it creates,
> then disappears, sometimes coming back, sometimes not.
> When I was able to traverse into a sub-folder, the same error is received.

Can you re-check after restarting the services with the 'posix locking' parameter set to 'no' in the [global] section of smb.conf?
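i.e. something like this in smb.conf, followed by a restart of the smb service:

[global]
...
posix locking = no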

Comment 5 ryan 2019-06-12 11:44:27 UTC
Hi Anoop,

Usually we have two nodes in our development cluster; however, for testing I stopped the CTDB services on one node and performed the test.
I didn't stop the services on the second node, however.

After repeating the testing with both stopped, I can't re-create the issue.
Was there something in the logs about CTDB?

Best,
Ryan

Comment 6 Anoop C S 2019-06-12 12:53:29 UTC
(In reply to ryan from comment #5)
> Hi Anoop,
> 
> Usually we have 2 in our development cluster, however for testing I stopped
> the CTDB services on one node and performed the test.

May I ask why?

Also, in that case, how were you accessing the server from the Mac client machine? Using the public IP available on the node where CTDB is running, or the node's own IP directly?

> I didn't stop the services on the second node however.
> After repeating the testing with both stopped, I can't re-create the issue.

Now that CTDB is stopped on both nodes, you must have accessed the shares using the node IP.

> Was there something in the logs about CTDB?

You won't see CTDB logging in the client-specific smbd log file; CTDB writes its entries to /var/log/log.ctdb.

My gut feeling is that the behaviour you are facing is due to a lack of synchronized TDBs across the nodes in the cluster, which is one of the reasons we run CTDB in a cluster. Therefore I would suggest you run CTDB on both nodes and access the cluster using the public IPs, after making sure that the cluster is in a HEALTHY state.
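You can verify the cluster health and public IP assignment with the ctdb tool before re-testing, e.g.:

ctdb status   # all nodes should show OK
ctdb ip       # lists the public IPs and which node currently serves each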

Comment 7 ryan 2019-06-14 08:10:40 UTC
Hi Anoop,

Usually when investigating an issue, we try to reduce as many variables as possible whilst still being able to reproduce it.
For these tests we use the node's IP address, as CTDB is usually disabled when we carry out the testing.

The issue was discovered when using the cluster in its usual configuration, which is using CTDB, Winbind and Samba, and then connecting via the CTDB IP addresses.

I can share our usual configuration with you if that would help.

Please let me know if I can gather any more info for you.
Best,
Ryan

Comment 8 ryan 2019-06-25 11:16:21 UTC
Hi Anoop,

Unfortunately the issue seems to have re-occurred.
I've attached the latest log files from the client.
Would you mind having a look to see if there's anything obvious?

It seems to be panicking, and it hangs the entire Finder. I was connecting to the 'QC' share.
Here is the current smb.conf:

[global]
security = ADS
workgroup = MAGENTA
realm = MAGENTA.LOCAL
netbios name = MAGENTANAS01
max protocol = SMB3
min protocol = SMB2
ea support = yes
clustering = yes
server signing = no
max log size = 10000
glusterfs:loglevel = 5
log file = /var/log/samba/log-%M.smbd
logging = file
log level = 3
template shell = /sbin/nologin
winbind offline logon = false
winbind refresh tickets = yes
winbind enum users = Yes
winbind enum groups = Yes
allow trusted domains = yes
passdb backend = tdbsam
idmap cache time = 604800
idmap negative cache time = 300
winbind cache time = 604800
idmap config magenta:backend = rid
idmap config magenta:range = 10000-999999
idmap config * : backend = tdb
idmap config * : range = 3000-7999
guest account = nobody
map to guest = bad user
force directory mode = 0777
force create mode = 0777
create mask = 0777
directory mask = 0777
hide unreadable = no
store dos attributes = no
unix extensions = no
load printers = no
printing = bsd
printcap name = /dev/null
disable spoolss = yes
glusterfs:volfile_server = localhost
kernel share modes = No
strict locking = auto
oplocks = yes
durable handles = yes
kernel oplocks = no
posix locking = no
level2 oplocks = no
readdir_attr:aapl_rsize = yes
readdir_attr:aapl_finder_info = no
readdir_attr:aapl_max_access = no

[Grading]
read only = no
guest ok = yes
vfs objects = catia fruit streams_xattr glusterfs
glusterfs:volume = mcv01
path = "/data"
recycle:keeptree = yes
recycle:directory_mode = 0770
recycle:versions = yes
recycle:repository = .recycle
recycle:subdir_mode = 0777
valid users = "nobody" @"audio" "MAGENTA\r.launchbury"

[Edits]
read only = no
guest ok = yes
vfs objects = catia fruit streams_xattr glusterfs_fuse
fruit:resource = xattr
fruit:metadata = netatalk
fruit:locking = netatalk
fruit:encoding = native
path = "/mnt/mcv01/data"
valid users = "nobody" @"Editors"
recycle:repository = .recycle
recycle:keeptree = yes
recycle:versions = yes
recycle:directory_mode = 0770
recycle:subdir_mode = 0777

[Ingest]
guest ok = no
read only = no
vfs objects = glusterfs
glusterfs:volume = mcv01
path = "/data/ingest_only"
valid users = @"Ingests"
recycle:repository = .recycle
recycle:keeptree = yes
recycle:versions = yes
recycle:directory_mode = 0770
recycle:subdir_mode = 0777

[QC]
guest ok = no
read only = no
vfs objects = glusterfs
glusterfs:volume = mcv01
path = "/data/qc_only"
valid users = @"QC_ops"
recycle:repository = .recycle
recycle:keeptree = yes
recycle:versions = yes
recycle:directory_mode = 0770
recycle:subdir_mode = 0777

Comment 9 ryan 2019-06-25 11:16:50 UTC
Created attachment 1584283 [details]
Client panic logs

Comment 10 ryan 2019-06-25 11:27:54 UTC
Please ignore my last comment.
That share wasn't using fruit or streams_xattr.

Is there a way to get normal shares to work with OS X?
Previous Samba versions have been able to work with OS X without these VFS objects, albeit with reduced OS X performance.
It seems like there has been some stability regression with OS X in newer gluster_vfs versions.

Best,
Ryan

Comment 11 Anoop C S 2019-07-04 13:00:38 UTC
(In reply to ryan from comment #10)
> Please ignore the last bug.
> That share wans't using fruit or streams_xattr.
> 
> Is there a way to get normal shares to work with OS X?

Sorry, can you explain a bit more?

> Previous Samba versions have been able to work with OS X without these VFS
> objects, albit with reduced OS X performance.

Those VFS modules are required to handle the extra metadata operations coming from Mac OS X.

> It seems like there has been some stability regression with OS X with newer
> gluster_vfs versions.

What versions and configurations are being compared here?

Comment 12 ryan 2019-07-24 09:12:40 UTC
Hi Anoop,

On previous builds, OS X machines could still use a share without issue when the fruit VFS module was not enabled on it. The share would not benefit from OS X metadata handling, but wouldn't panic.
It seems that on newer builds the module is required; otherwise we get panics in the SMBD threads.

Unfortunately, I don't have exact version numbers, as we weren't monitoring those systems for this particular issue at that time.
I believe we would have been using Gluster 3.12.x and Samba 4.6.x.

Best,
Ryan

Comment 13 Anoop C S 2019-07-24 13:41:46 UTC
(In reply to ryan from comment #12)
> Hi Anoop,
> 
> On previous builds, OS X machines could still use the share without issue
> when the Fruit VFS item was not used on the share. The share would not
> benefit from OS X metadata handling, but wouldn't panic.
> It seems on newer builds this is required, otherwise we get panics in SMBD
> threads.

Ok. I understand.
At the same time, I don't think it is worth spending time debugging failures from
Mac clients accessing shares configured without the fruit and streams_xattr VFS
modules, given that these (recommended) modules handle Mac clients in their
entirety.

What do you think?

> Unfortunately, I don't have exact version numbers as we were monitoring
> those systems at that time for this particular issue.
> I believe we would have been using Gluster 3.12.x and Samba 4.6.x.

Samba 4.6.x has reached EOL, as has Gluster 3.12.x. Therefore I would suggest
keeping your current setup with those VFS modules and reporting back in case
of any issues.
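For reference, a minimal sketch of a share combining the settings recommended earlier in this bug (volume name and path taken from your original smb.conf; adjust to your setup):

[global]
ea support = yes
fruit:aapl = yes
glusterfs:volfile_server = localhost
kernel share modes = no

[VFS]
vfs objects = fruit streams_xattr glusterfs
fruit:encoding = native
glusterfs:volume = mcv02
path = /
read only = no
guest ok = yes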

Comment 14 ryan 2019-07-24 13:45:51 UTC
Hi Anoop,

I agree, feel free to close this.
The only major issue we're seeing currently is bug 1728183 with the VFS module.

Many thanks for the help.

Best,
Ryan

Comment 15 Anoop C S 2019-07-25 04:09:17 UTC
Closing the bug report as per comment #14.

