Bug 1381822
Summary: | glusterd.log is flooded with socket.management: accept on 11 failed (Too many open files) and glusterd service stops | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | RamaKasturi <knarra> |
Component: | glusterfs | Assignee: | Mohit Agrawal <moagrawa> |
Status: | CLOSED ERRATA | QA Contact: | RamaKasturi <knarra> |
Severity: | high | Docs Contact: | |
Priority: | urgent | ||
Version: | rhgs-3.2 | CC: | abhishku, amukherj, asoman, bsrirama, kaushal, knarra, moagrawa, rabhat, rhinduja, rhs-bugs, sasundar, storage-qa-internal, vbellur, vdas |
Target Milestone: | --- | Keywords: | TestBlocker |
Target Release: | RHGS 3.2.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | glusterfs-3.8.4-3 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-03-23 06:08:03 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1336371, 1387960 | ||
Bug Blocks: | 1351528 |
Description
RamaKasturi
2016-10-05 06:43:53 UTC
I have also copied the glusterd.log file which has grown up to 14G to the link mentioned in comment 2 proposing this bug as blocker as glusterd.log gets flooded with these messages and all glusterd commands are hung on my system. Additional information: There are lots of error saying that - "error in polling loop", which should not happen once the setup is up and running with SSL/TLS configured properly. There are about 1580 such error messages in 1 day and it keeps ticking. # grep -i 'error in polling loop' /var/log/glusterfs/glusterd.log | wc -l 1580 This considerably makes glusterd log grow. # ls -lsah /var/log/glusterfs/glusterd.log 8.1M -rw-------. 1 root root 8.1M Oct 7 16:51 /var/log/glusterfs/glusterd.log (In reply to RamaKasturi from comment #4) > proposing this bug as blocker as glusterd.log gets flooded with these > messages and all glusterd commands are hung on my system. Please do not use blocker flag till dev freeze. TestBlocker keyword should suffice. Hi, I am facing similar kind of issue. With glusterd logs are flooded with messages, resulting in root disk space getting full and blocks the ongoing work. I did very limited (say twice) removing and creating the secure-access file from the management and trying the mount & umount in the client along with glusterd restart. But with in a span of 2 days huge number of data is created (5GB) in the glusterd log file. # df -h / #Filesystem Size Used Avail Use% Mounted on #/dev/mapper/rhgs_dhcp47--12-root 7.6G 7.6G 20K 100% / # du -sh /var/log/glusterfs #4.6G /var/log/glusterfs # du -sh /var/log/glusterfs/glusterd.log #4.1G /var/log/glusterfs/glusterd.log I am raising this as a highest of priority as within a week's time it will overload to 100GB of data resulting to a block for the on going work. Atin, apologies for the delay. I will update this by EOD. I have been continuously hitting other RHV-H problems due to which i could not test this. This is happening because the fix for #1336339 that was merged with http://review.gluster.org/14891 is only partial. With the change pipes are being only closed when spawning own-thread (socket_spawn) fails. They are still being left open when a working connection disconnects. So the fix is only partial. It still requires http://review.gluster.org/14356 which closes pipes after an SSL connection disconnects. I'll refresh my change upstream and get it merged. It can then be backported. Hi Kasturi, This is updated test rpm( this is updated link https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11924412 <magrawal> it have both fixes) after merged patch is shared by kaushal. You can test it and share the result. Regards Mohit Agrawal Hi Kasturi, There was some problem in last build,I have done some minor update. I have tried some basic testing on below build specific to SSL ,it is working fine. https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11928955 Please do install this and share the result. Regards Mohit Agrawal Hi Mohith, I installed the latest build which you provided in comment 13 and i am able to mount the volume. Monitoring the system, will update the status in the bug. Thanks kasturi Hi Kasturi, Thank you very much for sharing the result. Regards Mohit Agrawal Has not seen the logs flooded with Too many open files message. I will leave the setup over night monitor and update the bug. upstream mainline : http://review.gluster.org/14356 upstream 3.8 : http://review.gluster.org/#/c/15703 (not merged) upstream 3.9 : http://review.gluster.org/#/c/15702 (not merged) downstream patch :https://code.engineering.redhat.com/gerrit/87981 Tested with the private build given. Do not see that glusterd.log files is getting flooded with too many open file errors. Verified and works fine with build glusterfs-3.8.4-3.el7rhgs.x86_64. I do not see the log files getting flooded with "Too many files open issue". Hi Abhishek/Atin, The bugzilla was raised to resolve the error "Too many open files" due to fd leak issue in the socket_poller code. We can provide the patch only to resolve this issue(fd leak) but in later version(after this version) we had done multiple changes in socket code,after this patch you will not get fd leak issue but you can face other issue. Regards Mohit Agrawal Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html |