Created attachment 1518442 [details]
Python 3 script to replicate issue

---------------------------------------------------------------------------
Description of problem:

If the glusterfs VFS module is used with Samba, and the global option "store dos attributes = yes" is set, the smbd RSS memory usage balloons.

If a FUSE mount is used with Samba, and the global option "store dos attributes = yes" is set, the Gluster FUSE mount process RSS memory usage balloons.

---------------------------------------------------------------------------
Version-Release number of selected component (if applicable):
Samba 4.9.4
Gluster 4.1

How reproducible:
Can reproduce every time with the attached Python script

---------------------------------------------------------------------------
Gluster volume options:

Volume Name: mcv02
Type: Distribute
Volume ID: 5debe2f4-16c4-457c-8496-fcf32b298ccf
Status: Started
Snapshot Count: 0
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: mcn01:/mnt/h1a/test_data
Brick2: mcn02:/mnt/h1b/test_data
Brick3: mcn01:/mnt/h2a/test_data
Brick4: mcn02:/mnt/h2b/test_data
Options Reconfigured:
network.ping-timeout: 5
storage.batch-fsync-delay-usec: 0
performance.cache-size: 1000MB
performance.stat-prefetch: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.cache-invalidation: on
performance.cache-samba-metadata: on
performance.md-cache-timeout: 600
performance.io-thread-count: 32
performance.parallel-readdir: on
performance.nl-cache: on
performance.nl-cache-timeout: 600
cluster.lookup-optimize: on
performance.write-behind-window-size: 1MB
performance.client-io-threads: on
client.event-threads: 4
server.event-threads: 4
auth.allow: 172.30.30.*
transport.address-family: inet
features.quota: on
features.inode-quota: on
nfs.disable: on
features.quota-deem-statfs: on
cluster.brick-multiplex: off
cluster.server-quorum-ratio: 50%

---------------------------------------------------------------------------
smb.conf file:

[global]
security = user
netbios name = NAS01
clustering = no
server signing = no
max log size = 10000
log file = /var/log/samba/log-%M-test.smbd
logging = file@1
log level = 1
passdb backend = tdbsam
guest account = nobody
map to guest = bad user
force directory mode = 0777
force create mode = 0777
create mask = 0777
directory mask = 0777
store dos attributes = yes
load printers = no
printing = bsd
printcap name = /dev/null
disable spoolss = yes
glusterfs:volfile_server = localhost
kernel share modes = No

[VFS]
vfs objects = glusterfs
glusterfs:volume = mcv02
path = /
read only = no
guest ok = yes
valid users = "nobody"

[FUSE]
read only = no
guest ok = yes
path = "/mnt/mcv02"
valid users = "nobody"

-------------------------------------------------------------------------
Steps to Reproduce:
1. Install/compile Samba (tested with 4.8.4, 4.8.6, 4.9.4). Install htop.
2. Add 'store dos attributes = yes' to the [global] section of /etc/samba/smb.conf.
3. Restart the SMB service.
4. Map the share to a drive in Windows.
5. Download the attached Python script and change line 41 to the mapped drive letter.
6. Run the attached Python script from a Windows client (tested with Windows 10 and Python 3.7.1); a sketch of this kind of workload follows this comment.
7. Run 'htop' or otherwise watch the RSS memory usage of the smbd process.

Actual results:
smbd (VFS case) or the FUSE mount process balloons to 2-4GB of RSS, and the usage does not decrease even when IO has finished.

Expected results:
smbd and FUSE memory increases slightly but then stabilises, rarely going over 200MB.

Additional info:
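[Editor's note: the actual reproduction script is attachment 1518442 and is not reproduced here. Below is a minimal, hypothetical sketch of the kind of workload the steps describe — repeated file create/write/stat cycles against the mapped share, which exercises the DOS-attribute metadata path on every file when "store dos attributes = yes" is set. The drive letter and file counts are assumptions, not values from the attachment.]

# repro_sketch.py (hypothetical stand-in for attachment 1518442)
import os

SHARE = r"Z:\leak-test"   # assumption: Z: is the Windows drive mapped to the share

def churn(iterations=100_000, payload=b"x" * 4096):
    """Create and re-stat files in a loop to generate sustained SMB metadata traffic."""
    os.makedirs(SHARE, exist_ok=True)
    for i in range(iterations):
        path = os.path.join(SHARE, f"file_{i % 1000:04d}.dat")
        with open(path, "wb") as f:
            f.write(payload)
        os.stat(path)  # metadata round-trip; with 'store dos attributes = yes' this also hits the xattr path
        if i % 1000 == 0:
            print(f"{i} files written; watch smbd RSS with htop", flush=True)

if __name__ == "__main__":
    churn()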
I can confirm this issue also affects OS X clients connecting to the system.
Samba 4.9 has 'store dos attributes' enabled by default now, so it's very likely others will encounter this issue.

Please let me know if I can assist or provide more data.

Many thanks,
Ryan
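[Editor's note: for readers verifying what the option actually does on disk — Samba stores DOS attributes in a user.DOSATTRIB extended attribute on each file, so every stat/open from a client adds getxattr/setxattr traffic that flows through Gluster's metadata path. A quick server-side check (Linux only; run against a file on the FUSE mount or a brick; the path is an example):

# check_dosattrib.py
import os
import sys

path = sys.argv[1]  # e.g. /mnt/mcv02/somefile.dat
try:
    print(os.getxattr(path, "user.DOSATTRIB"))
except OSError as e:
    print(f"no user.DOSATTRIB on {path}: {e}")
]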
*** Bug 1654642 has been marked as a duplicate of this bug. ***
Hello,

Could you advise if there is any update on this?
(In reply to ryan from comment #0)
> Created attachment 1518442 [details]
> Python 3 script to replicate issue
>
> Description of problem:
> If the glusterfs VFS module is used with Samba, and the global option
> "store dos attributes = yes" is set, the smbd RSS memory usage balloons.
>
> If a FUSE mount is used with Samba, and the global option "store dos
> attributes = yes" is set, the Gluster FUSE mount process RSS memory usage
> balloons.

How did you manage to find out that it is the "store dos attributes" parameter that makes the RSS memory shoot up to GBs?

The following is the GlusterFS volume configuration on which I tried running the attached script to reproduce the issue. I could not reproduce it: the RSS value only went up to ~110 MB.

Volume Name: vol
Type: Distributed-Replicate
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Options Reconfigured:
performance.readdir-ahead: on
performance.parallel-readdir: on
performance.nl-cache-timeout: 600
performance.nl-cache: on
performance.cache-samba-metadata: on
network.inode-lru-limit: 200000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
user.smb: enable
diagnostics.brick-log-level: INFO
performance.stat-prefetch: on
transport.address-family: inet
nfs.disable: on
user.cifs: enable
cluster.enable-shared-storage: disable

smb.conf global parameters
--------------------------
# Global parameters
[global]
clustering = Yes
dns proxy = No
kernel change notify = No
log file = /usr/local/var/log/samba/log.%m
security = USER
server string = Samba Server
fruit:aapl = yes
idmap config * : backend = tdb
include = /usr/local/etc/samba/smb-ext.conf
kernel share modes = No
posix locking = No

Versions used
-------------
Fairly recent mainline source for Samba and GlusterFS

I can see a couple more volume set options in the bug description, which leaves us with more configurations to try unless we are sure it is the "store dos attributes" parameter causing the high memory consumption.
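[Editor's note: to isolate the effect of the parameter, a simple A/B comparison helps — run the reproduction script once with 'store dos attributes = yes' and once with 'no', and record the smbd RSS curve in each case. A minimal sketch, assuming Linux /proc and that the PID of the smbd serving the client is known (e.g. from `smbstatus`):

# watch_rss.py
import sys
import time

def rss_kib(pid: int) -> int:
    """Read VmRSS (in KiB) from /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError(f"VmRSS not found for pid {pid}")

if __name__ == "__main__":
    pid = int(sys.argv[1])
    while True:
        print(f"{time.strftime('%H:%M:%S')}  smbd[{pid}] RSS = {rss_kib(pid)} KiB", flush=True)
        time.sleep(10)
]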
Hi Anoop,

Sorry for the delay. I've tried to re-test, however we're now using Gluster 6.1 and Samba 4.9.6.
Another issue has come up which is preventing me from testing this one. I've raised a bug for it here: https://bugzilla.redhat.com/show_bug.cgi?id=1716440

Once I'm able to re-test I will update this ticket.

Best,
Ryan
We're seeing this issue on nearly all of our clusters in production. One common factor is the type of application using the shares: Media Asset Management tools which either walk the filesystem or listen for file system notifications, and then process the files.

We are also seeing the issue on systems that have 'store dos attributes = no' set, although the memory usage pattern is very different. With 'store dos attributes = yes', the issue will cause a system with 64GB of memory to OOM within 24hrs. With 'store dos attributes = no', the same system will not OOM for months; the memory growth is slow and gradual, but we still end up with multiple smbd processes each using over 6GB of RSS.

The SerNet/Samba team has assisted us in tracing this back through the stack and has confirmed the issue appears to be within the gluster VFS module.

Please let me know if I can provide any more data, logs etc. to progress this issue.

Many thanks,
Ryan
Hello,

I'm trying to gather more information on this memory ballooning by following the steps documented here:
https://github.com/gluster/glusterfs/blob/master/doc/debugging/statedump.md#mempools

Is there a way to find the PID of a Gluster VFS client based on the Samba PID that is ballooning/showing the memory leak?

Many thanks,
Ryan
(In reply to ryan from comment #7)
> Hello,
>
> I'm trying to gather more information on this memory ballooning by
> following the steps documented here:
> https://github.com/gluster/glusterfs/blob/master/doc/debugging/statedump.md#mempools
>
> Is there a way to find the PID of a Gluster VFS client based on the Samba
> PID that is ballooning/showing the memory leak?

The Gluster client stack runs inside smbd itself, so the PID you need is that of the smbd process serving your SMB client, which you can find in the output of `smbstatus`. For taking the statedump, though, I think it is better to execute the command on the node where the affected smbd is running, using localhost as the hostname.
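[Editor's note: a minimal sketch of this workflow, to run on the node hosting the ballooning smbd. It assumes the gfapi statedump command documented in the statedump doc linked above (`gluster volume statedump <volname> client <hostname>:<pid>`), takes the smbd PIDs from the first column of `smbstatus -p`, and uses the volume name from the bug description; dumps land under the configured statedump path (/var/run/gluster/ by default):

# request_statedumps.py
import subprocess

VOLUME = "mcv02"  # assumption: volume name from the bug description

def smbd_pids():
    """Return PIDs from the data rows of `smbstatus -p` (first column)."""
    out = subprocess.run(["smbstatus", "-p"], capture_output=True,
                         text=True, check=True).stdout
    pids = []
    for line in out.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            pids.append(int(fields[0]))
    return pids

if __name__ == "__main__":
    for pid in smbd_pids():
        print(f"requesting statedump for smbd pid {pid}")
        subprocess.run(["gluster", "volume", "statedump", VOLUME,
                        "client", f"localhost:{pid}"], check=True)
]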
Can you please update on the current status of this bug report?
Hi Anoop,

Unfortunately we were not able to get a statedump, as the customer could not provide us a downtime window.
We've since migrated all customers to 6.6 and cannot re-create the issue.

Best,
Ryan
(In reply to ryan from comment #10)
> Hi Anoop,
>
> Unfortunately we were not able to get a statedump, as the customer could
> not provide us a downtime window.
> We've since migrated all customers to 6.6 and cannot re-create the issue.

Great. Given that v4.x is EOL and we are unable to reproduce the issue, shall we go ahead and close the bug report (as INSUFFICIENT_DATA)?
Please feel free to close.

Best,
Ryan
v4.1 has reached EOL and the issue is not reproducible on the current stable release (as per comment #10). Therefore, closing the bug report.