Description of problem:
glusterd.log is flooded with the messages below; the log file grows huge within minutes and glusterd just stops.

[2016-10-05 06:20:46.491886] W [socket.c:2645:socket_server_event_handler] 0-socket.management: accept on 11 failed (Too many open files)
[2016-10-05 06:20:46.491895] W [socket.c:2645:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files)
[2016-10-05 06:20:46.491904] W [socket.c:2645:socket_server_event_handler] 0-socket.management: accept on 11 failed (Too many open files)
[2016-10-05 06:20:46.491913] W [socket.c:2645:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files)
[2016-10-05 06:20:46.491922] W [socket.c:2645:socket_server_event_handler] 0-socket.management: accept on 11 failed (Too many open files)

Version-Release number of selected component (if applicable):
glusterfs-3.8.4-2.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Installed the HC stack on RHV-H and arbiter.
2. Enabled SSL on the volumes.
3. Left the setup overnight.

Actual results:
All of a sudden glusterd.log gets flooded with the messages reported in the description, the log file grows huge (up to 14 GB), and the glusterd service stops.

Expected results:
glusterd.log should not be flooded with these messages and the glusterd service should not stop.

Additional info:
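For illustration only, here is a minimal sketch (hypothetical names, not the actual socket_server_event_handler code) of why the warning repeats so quickly: once glusterd has exhausted its fd limit, every poll wakeup on the listening socket makes accept() fail with EMFILE, and each failure logs one warning, so the file grows as fast as clients keep retrying.

#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical event-loop callback; illustrative only, not socket.c code. */
static void on_listen_event(int listen_fd)
{
    int conn = accept(listen_fd, NULL, NULL);
    if (conn < 0) {
        /* When the process is out of descriptors, accept() fails with
         * EMFILE/ENFILE on every wakeup and a warning is logged each time,
         * which is what floods glusterd.log. */
        if (errno == EMFILE || errno == ENFILE)
            fprintf(stderr, "W: accept on %d failed (Too many open files)\n",
                    listen_fd);
        return;
    }
    /* ... hand the accepted connection to a handler ... */
    close(conn);
}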
I have also copied the glusterd.log file, which has grown to 14 GB, to the link mentioned in comment 2.
Proposing this bug as a blocker, as glusterd.log gets flooded with these messages and all glusterd commands hang on my system.
Additional information:
There are lots of errors saying "error in polling loop", which should not happen once the setup is up and running with SSL/TLS configured properly. There are about 1580 such error messages in one day, and the count keeps growing.

# grep -i 'error in polling loop' /var/log/glusterfs/glusterd.log | wc -l
1580

This makes the glusterd log grow considerably.

# ls -lsah /var/log/glusterfs/glusterd.log
8.1M -rw-------. 1 root root 8.1M Oct 7 16:51 /var/log/glusterfs/glusterd.log
(In reply to RamaKasturi from comment #4)
> proposing this bug as blocker as glusterd.log gets flooded with these
> messages and all glusterd commands are hung on my system.

Please do not use the blocker flag till dev freeze. The TestBlocker keyword should suffice.
Hi,

I am facing a similar kind of issue. The glusterd logs are flooded with messages, resulting in the root disk filling up and blocking the ongoing work.

I did very limited testing (say twice): removing and creating the secure-access file on the management node and trying mount and umount on the client, along with a glusterd restart. But within a span of 2 days a huge amount of data (5 GB) was written to the glusterd log file.

# df -h /
#Filesystem                        Size  Used Avail Use% Mounted on
#/dev/mapper/rhgs_dhcp47--12-root  7.6G  7.6G   20K 100% /

# du -sh /var/log/glusterfs
#4.6G /var/log/glusterfs

# du -sh /var/log/glusterfs/glusterd.log
#4.1G /var/log/glusterfs/glusterd.log

I am raising this as the highest priority, as within a week's time it will grow to 100 GB of data, blocking the ongoing work.
Atin, apologies for the delay. I will update this by EOD. I have been continuously hitting other RHV-H problems, due to which I could not test this.
This is happening because the fix for #1336339 that was merged with http://review.gluster.org/14891 is only partial: with that change, pipes are closed only when spawning the own-thread (socket_spawn) fails. They are still left open when a working connection disconnects. It still requires http://review.gluster.org/14356, which closes the pipes after an SSL connection disconnects. I'll refresh my change upstream and get it merged; it can then be backported.
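To make the leak mechanism concrete, here is a minimal sketch under assumed names (conn_t, ssl_poller, spawn_poller, and the pipe field are hypothetical, not the actual socket.c symbols): the per-connection own-thread owns a wakeup pipe, and unless both pipe ends are also closed on the disconnect path, every SSL connect/disconnect cycle leaves two descriptors behind until accept() starts failing with EMFILE.

#include <pthread.h>
#include <unistd.h>

/* Hypothetical types/names for illustration; not the real socket.c code. */
typedef struct {
    int sock;          /* SSL connection socket */
    int pipe_fds[2];   /* pipe used to wake/stop the per-connection poller */
} conn_t;

static void *ssl_poller(void *arg)
{
    conn_t *c = arg;

    /* ... poll c->sock and c->pipe_fds[0] until the peer disconnects ... */

    /* The partial fix closed the pipe only when spawning this thread failed.
     * When the thread does run and the connection later disconnects, the
     * pipe ends must be closed here too, otherwise each connection leaks
     * two descriptors. */
    close(c->pipe_fds[0]);
    close(c->pipe_fds[1]);
    close(c->sock);
    return NULL;
}

static int spawn_poller(conn_t *c)
{
    pthread_t tid;

    if (pipe(c->pipe_fds) < 0)
        return -1;
    if (pthread_create(&tid, NULL, ssl_poller, c) != 0) {
        /* Spawn failed: this is the only path where the partial fix
         * already closed the pipe. */
        close(c->pipe_fds[0]);
        close(c->pipe_fds[1]);
        return -1;
    }
    pthread_detach(tid);
    return 0;
}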
Hi Kasturi,

This is the updated test rpm, built after the merged patch shared by Kaushal (updated link: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11924412; it has both fixes). You can test it and share the result.

Regards,
Mohit Agrawal
Hi Kasturi,

There was some problem with the last build; I have made a minor update. I have tried some basic testing specific to SSL on the build below, and it is working fine.

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11928955

Please install this and share the result.

Regards,
Mohit Agrawal
Hi Mohit,

I installed the latest build you provided in comment 13 and I am able to mount the volume. I am monitoring the system and will update the status in the bug.

Thanks,
Kasturi
Hi Kasturi,

Thank you very much for sharing the result.

Regards,
Mohit Agrawal
I have not seen the logs flooded with the "Too many open files" message. I will leave the setup overnight, monitor it, and update the bug.
upstream mainline: http://review.gluster.org/14356
upstream 3.8: http://review.gluster.org/#/c/15703 (not merged)
upstream 3.9: http://review.gluster.org/#/c/15702 (not merged)
downstream patch: https://code.engineering.redhat.com/gerrit/87981
Tested with the private build provided. I do not see glusterd.log getting flooded with "Too many open files" errors.
Verified and works fine with build glusterfs-3.8.4-3.el7rhgs.x86_64. I do not see the log files getting flooded with the "Too many open files" error.
Hi Abhishek/Atin,

This bugzilla was raised to resolve the "Too many open files" error caused by an fd leak in the socket_poller code. We can provide a patch only to resolve this issue (the fd leak), but in later versions (after this one) we made multiple changes to the socket code; with this patch you will not hit the fd leak issue, but you may face other issues.

Regards,
Mohit Agrawal
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html