Bug 1619416 - memory grows until swap is 100% utilized and some brick daemons crash while creating a large number of small files
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: protocol
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: RHGS 3.4.0
Assignee: Amar Tumballi
QA Contact: Karan Sandha
URL:
Whiteboard:
Depends On:
Blocks: 1503137
 
Reported: 2018-08-20 19:19 UTC by Martin Bukatovic
Modified: 2019-01-09 14:52 UTC

Fixed In Version: glusterfs-3.12.2-17
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-04 06:52:17 UTC


Attachments (Terms of Use)
screenshot usm1 (affected cluster): host dashboard with crash and growing memory utilization visible (270.38 KB, image/png)
2018-08-20 19:31 UTC, Martin Bukatovic
screenshot of host dashboard from cluster using older builds, which doesn't crash during same workload (287.62 KB, image/png)
2018-08-20 19:58 UTC, Martin Bukatovic


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:2607 None None None 2018-09-04 06:53:50 UTC

Description Martin Bukatovic 2018-08-20 19:19:26 UTC
Description of problem
======================

For a workload that consists of creating a lot of small files (see details below),
memory consumption on the gluster storage machines grows to the point where swap
is 100% utilized and some brick daemons crash.

This has been observed during testing of RHGS WA, so the reproducer (and evidence) includes the installation of RHGS WA.

While this has been observed on storage machines with only 2 GB of RAM (which is
way below expected production values), the fact that gluster filled up the swap
to 100% and then some bricks crashed is recent behavior, not seen in gluster
builds from a few weeks ago (exact versions available below).

Version-Release number of selected component
============================================

glusterfs-3.12.2-16.el7rhgs.x86_64                                                              
glusterfs-api-3.12.2-16.el7rhgs.x86_64
glusterfs-cli-3.12.2-16.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-16.el7rhgs.x86_64
glusterfs-events-3.12.2-16.el7rhgs.x86_64
glusterfs-fuse-3.12.2-16.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-16.el7rhgs.x86_64
glusterfs-libs-3.12.2-16.el7rhgs.x86_64
glusterfs-rdma-3.12.2-16.el7rhgs.x86_64
glusterfs-server-3.12.2-16.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
libvirt-daemon-driver-storage-gluster-3.9.0-14.el7_5.6.x86_64
python2-gluster-3.12.2-16.el7rhgs.x86_64
tendrl-gluster-integration-1.6.3-9.el7rhgs.noarch
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch

How reproducible
================

1/1

Steps to Reproduce
==================

1. Install Gluster trusted storage pool via gdeploy on 6 virtual machines
   with 2 GB ram, using gdeploy config files[1]
2. Install dedicated client machine, and mount all gluster volumes there
3. Install RHGS WA on the dedicated machine and on the just-created trusted storage pool
4. Import the trusted pool into WA, with gluster volume profiling enabled
5. On client machine, prepare wikipedia tarball[2]
6. On client, cd to the mountpoint of beta volume and run:

   # bzcat /tmp/enwiki-latest-pages-articles.xml.bz2 | wiki-export-split.py --noredir --filenames=sha1 --sha1sum=wikipages.sha1 --max-files=1000000

This will start extracting wikipedia articles into individual pages.

7. Wait a few hours.

[1] https://github.com/usmqe/usmqe-setup/blob/master/gdeploy_config/volume_beta_arbiter_2_plus_1x2.create.conf https://github.com/usmqe/usmqe-setup/blob/master/gdeploy_config/volume_gama_disperse_4_plus_2x2.create.conf
[2] https://github.com/usmqe/usmqe-setup/blob/master/test_setup.wiki_tarball.yml
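The small-file pattern produced by wiki-export-split.py in step 6 can be approximated with a minimal standalone sketch (hypothetical, not the actual reproducer): write many tiny files, each named by the SHA-1 of its content, mirroring the --filenames=sha1 behavior. Pointing `outdir` at a gluster mountpoint would generate a comparable workload.

```python
import hashlib
import os
import tempfile

# Minimal sketch of the small-file workload: many tiny files, each named by
# the SHA-1 hex digest of its content (as with --filenames=sha1). Hypothetical;
# the real reproducer streams articles out of the wikipedia XML dump.
outdir = tempfile.mkdtemp(prefix="smallfiles-")
for i in range(1000):
    content = ("article %d\n" % i).encode("utf-8")
    name = hashlib.sha1(content).hexdigest()
    with open(os.path.join(outdir, name), "wb") as f:
        f.write(content)
print("%d files in %s" % (len(os.listdir(outdir)), outdir))
```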

Actual results
==============

The storage machines run out of free memory and swap, some gluster bricks there
crash, switching the volume into read-only mode and stopping the workload before
completion:

```
[root@mbukatov-usm1-client volume_beta_arbiter_2_plus_1x2]# bzcat /tmp/enwiki-latest-pages-articles.xml.bz2 | wiki-export-split.py --noredir --filenames=sha1 --sha1sum=wikipages.sha1 --max-files=1000000
Traceback (most recent call last):                                              
  File "/usr/local/bin/wiki-export-split.py", line 222, in <module>             
    sys.exit(main())                                                            
  File "/usr/local/bin/wiki-export-split.py", line 216, in main                 
    return process_xml(sys.stdin, opts)                                         
  File "/usr/local/bin/wiki-export-split.py", line 200, in process_xml          
    parser.parse(xml_file)                                                      
  File "/usr/lib64/python2.7/xml/sax/expatreader.py", line 107, in parse        
    xmlreader.IncrementalParser.parse(self, source)                             
  File "/usr/lib64/python2.7/xml/sax/xmlreader.py", line 123, in parse          
    self.feed(buffer)                                                           
  File "/usr/lib64/python2.7/xml/sax/expatreader.py", line 210, in feed         
    self._parser.Parse(data, isFinal)                                           
  File "/usr/lib64/python2.7/xml/sax/expatreader.py", line 307, in end_element  
    self._cont_handler.endElement(name)                                         
  File "/usr/local/bin/wiki-export-split.py", line 150, in endElement           
    self.page.end()                                                             
  File "/usr/local/bin/wiki-export-split.py", line 97, in end                   
    self._file.close()                                                          
  File "/usr/lib64/python2.7/tempfile.py", line 412, in close                   
    self.file.close()                                                           
IOError: [Errno 107] Transport endpoint is not connected                        
close failed in file object destructor:                                         
IOError: [Errno 30] Read-only file system                                       
[root@mbukatov-usm1-client volume_beta_arbiter_2_plus_1x2] 
```
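For reference, the two errno values in the traceback can be decoded with the standard library: errno 107 (ENOTCONN, "Transport endpoint is not connected") is what a FUSE mount typically returns once its connection is gone, and errno 30 (EROFS) reflects the volume having gone read-only. A small sketch (values are the Linux ones):

```python
import errno
import os

# Decode the two errno values seen in the client traceback.
for code in (107, 30):
    print(code, errno.errorcode[code], os.strerror(code))
```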

See attached usm1 screenshots.

Expected results
================

Gluster bricks don't crash, and the workload finishes after some time.

Additional info
===============

I retried this on an older setup (which I used a few weeks ago without noticing
the problem), setting the RAM to 2 GB there and enabling profiling, to match the
environment.

I could see some growing memory utilization, but nothing crashed and the
workload finished successfully.

The working version:

glusterfs-3.12.2-14.el7rhgs.x86_64                                                     
glusterfs-api-3.12.2-14.el7rhgs.x86_64
glusterfs-cli-3.12.2-14.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-14.el7rhgs.x86_64
glusterfs-events-3.12.2-14.el7rhgs.x86_64
glusterfs-fuse-3.12.2-14.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-14.el7rhgs.x86_64
glusterfs-libs-3.12.2-14.el7rhgs.x86_64
glusterfs-rdma-3.12.2-14.el7rhgs.x86_64
glusterfs-server-3.12.2-14.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
libvirt-daemon-driver-storage-gluster-3.9.0-14.el7_5.6.x86_64
python2-gluster-3.12.2-14.el7rhgs.x86_64
tendrl-gluster-integration-1.6.3-7.el7rhgs.noarch
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch

Comment 1 Martin Bukatovic 2018-08-20 19:31:42 UTC
Created attachment 1477364 [details]
screenshot usm1 (affected cluster): host dashboard with crash and growing memory utilization visible

See 2 events in Host dashboard of affected storage machine:

1st the memory runs out (the Swap Utilization chart peaks at 100%), a brick
crashes (see the Brick Down counter) and copying stops (the Total Brick
Utilization chart stops growing, the Brick IOPS chart goes down to zero).

Then the workload is stopped (no IOPS reported), but the memory keeps growing
until it reaches 100% of swap utilization again, and then a 2nd brick daemon on
the machine crashes.
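The per-brick memory growth visible in the dashboard can also be checked directly on a storage node. A sketch (Linux-only; the helper names are hypothetical) that reads VmRSS for the glusterfsd brick processes from /proc:

```python
import os

def rss_kib(pid):
    """Return the resident set size (VmRSS, in KiB) of a process from /proc."""
    with open("/proc/%d/status" % pid) as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

def brick_pids():
    """PIDs whose command line mentions glusterfsd (the brick daemons)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open("/proc/%s/cmdline" % entry, "rb") as cmdline:
                if b"glusterfsd" in cmdline.read():
                    pids.append(int(entry))
        except IOError:
            continue  # process exited while we were scanning
    return pids

# On a storage node one would log rss_kib(pid) for each brick PID periodically
# to see the growth; here, demonstrate the helper on our own process.
print(rss_kib(os.getpid()))
```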

Comment 3 Martin Bukatovic 2018-08-20 19:58:42 UTC
Created attachment 1477367 [details]
screenshot of host dashboard from cluster using older builds, which doesn't crash during same workload

Screenshot from another cluster (usm1), which uses older gluster and WA builds,
where gluster doesn't crash during the same workload.

Comment 11 errata-xmlrpc 2018-09-04 06:52:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607

