+++ This bug was initially created as a clone of Bug #1336371 +++
+++ This bug was initially created as a clone of Bug #1336369 +++
+++ This bug was initially created as a clone of Bug #1336339 +++

Description of problem:
=======================
I ran 400 sequential volume start and stop operations; after some time they started failing with CLI errors such as:

"""
volume start: rep: success
volume stop: rep: success
volume start: rep: success
volume stop: rep: failed: Commit failed on localhost. Please check the log file for more details.
volume start: rep: failed: Pre Validation failed on dhcp43-215.lab.eng.blr.redhat.com. Volume rep already started
volume stop: rep: failed: Volume rep is not in the started state
volume start: rep: failed: Pre-validation failed on localhost. Please check log file for details
volume stop: rep: failed: Volume rep is not in the started state
volume start: rep: failed: Pre-validation failed on localhost. Please check lo
"""

glusterd logs are posted in the comments.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.9-4

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Create a 16 x 2 = 32 volume using two nodes.
2. FUSE mount and unmount the volume 300 times.
3. Start and stop the volume 400 times; it fails after some number of start/stop operations (a scripted loop for this step is sketched below).
4. Check the glusterd logs for error messages and warnings.

Actual results:
===============
Sequential volume start and stop does not work on an SSL-enabled setup.

Expected results:
=================
Sequential volume start and stop should work on an SSL-enabled setup.

Additional info:

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-05-16 13:58:53 IST ---

This bug is automatically being proposed for the current z-stream release of Red Hat Gluster Storage 3 by setting the release flag 'rhgs-3.1.z' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.
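For reference, the start/stop cycling in step 3 can be scripted roughly as below. This is only a sketch, not part of the original report: it assumes the volume is named "rep" (as in the CLI output above) and uses the gluster CLI's --mode=script option to skip the interactive stop confirmation.

# Hypothetical reproduction loop for step 3: cycle the volume 400 times
# and stop at the first failure so the glusterd logs can be inspected.
for i in $(seq 1 400); do
    gluster --mode=script volume start rep || { echo "start failed at iteration $i"; break; }
    gluster --mode=script volume stop rep  || { echo "stop failed at iteration $i"; break; }
done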
--- Additional comment from Byreddy on 2016-05-16 13:59:59 IST --- [2016-05-16 06:52:45.028639] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028646] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028654] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028661] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028674] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028679] I [socket.c:459:ssl_setup_connection] 0-socket.management: peer CN = dhcp43-215.lab.eng.blr.redhat.com [2016-05-16 06:52:45.028689] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028750] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028760] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028768] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028776] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028784] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028792] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028806] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028815] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028823] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028831] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028838] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028864] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028872] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028880] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028888] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028895] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028903] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028911] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028918] W 
[socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028926] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028934] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028941] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028949] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028956] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028965] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028973] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.029132] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.029152] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.029168] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.029183] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.029191] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.029198] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.029206] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.029213] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.029221] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.029228] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.029236] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.029244] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.029251] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.029258] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.030909] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.031075] E [socket.c:2868:socket_server_event_handler] 0-socket.management: could not create pipe [2016-05-16 06:52:45.041646] I [socket.c:459:ssl_setup_connection] 0-socket.management: peer CN = dhcp43-215.lab.eng.blr.redhat.com [2016-05-16 06:52:45.043605] E [socket.c:2616:socket_poller] 0-socket.management: error in polling loop The message "W [MSGID: 101088] 
[common-utils.c:3662:gf_backtrace_save] 0-management: Failed to save the backtrace." repeated 2 times between [2016-05-16 06:52:44.061901] and [2016-05-16 06:52:47.161342] [2016-05-16 06:52:47.163334] E [MSGID: 106280] [glusterd-volume-ops.c:1653:glusterd_op_stage_stop_volume] 0-management: Volume rep is not in the started state [2016-05-16 06:52:47.163376] E [MSGID: 106301] [glusterd-syncop.c:1281:gd_stage_op_phase] 0-management: Staging of operation 'Volume Stop' failed on localhost : Volume rep is not in the started state [2016-05-16 06:52:47.163551] W [MSGID: 101088] [common-utils.c:3662:gf_backtrace_save] 0-management: Failed to save the backtrace. The message "W [MSGID: 101088] [common-utils.c:3662:gf_backtrace_save] 0-management: Failed to save the backtrace." repeated 2 times between [2016-05-16 06:52:47.163551] and [2016-05-16 06:52:47.252524] [2016-05-16 06:52:47.254692] E [MSGID: 106408] [glusterd-peer-utils.c:120:glusterd_peerinfo_find_by_hostname] 0-management: error in getaddrinfo: System error [Unknown error -11] [2016-05-16 06:52:47.254771] E [MSGID: 101075] [common-utils.c:3137:gf_is_local_addr] 0-management: error in getaddrinfo: System error [2016-05-16 06:52:47.254784] E [MSGID: 106187] [glusterd-volume-ops.c:1486:glusterd_op_stage_start_volume] 0-management: Could not find peer on which brick dhcp43-214.lab.eng.blr.redhat.com:/bricks/brick0/brr0 resides [2016-05-16 06:52:47.254793] W [MSGID: 106122] [glusterd-mgmt.c:198:gd_mgmt_v3_pre_validate_fn] 0-management: Volume start prevalidation failed. [2016-05-16 06:52:47.254800] E [MSGID: 106122] [glusterd-mgmt.c:879:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Start on local node [2016-05-16 06:52:47.254809] E [MSGID: 106122] [glusterd-mgmt.c:1991:glusterd_mgmt_v3_initiate_all_phases] 0-management: Pre Validation Failed [2016-05-16 06:52:47.255009] W [MSGID: 101088] [common-utils.c:3662:gf_backtrace_save] 0-management: Failed to save the backtrace. The message "W [MSGID: 101088] [common-utils.c:3662:gf_backtrace_save] 0-management: Failed to save the backtrace." repeated 2 times between [2016-05-16 06:52:47.255009] and [2016-05-16 06:52:50.350609] [2016-05-16 06:52:50.352683] E [MSGID: 106280] [glusterd-volume-ops.c:1653:glusterd_op_stage_stop_volume] 0-management: Volume rep is not in the started state [2016-05-16 06:52:50.352756] E [MSGID: 106301] [glusterd-syncop.c:1281:gd_stage_op_phase] 0-management: Staging of operation 'Volume Stop' failed on localhost : Volume rep is not in the started state [2016-05-16 06:52:50.352909] W [MSGID: 101088] [common-utils.c:3662:gf_backtrace_save] 0-management: Failed to save the backtrace. The message "W [MSGID: 101088] [common-utils.c:3662:gf_backtrace_save] 0-management: Failed to save the backtrace." 
repeated 2 times between [2016-05-16 06:52:50.352909] and [2016-05-16 06:52:50.448995] [2016-05-16 06:52:50.451250] E [MSGID: 106408] [glusterd-peer-utils.c:120:glusterd_peerinfo_find_by_hostname] 0-management: error in getaddrinfo: System error [Unknown error -11] [2016-05-16 06:52:50.451338] E [MSGID: 101075] [common-utils.c:3137:gf_is_local_addr] 0-management: error in getaddrinfo: System error [2016-05-16 06:52:50.451353] E [MSGID: 106187] [glusterd-volume-ops.c:1486:glusterd_op_stage_start_volume] 0-management: Could not find peer on which brick dhcp43-214.lab.eng.blr.redhat.com:/bricks/brick0/brr0 resides [2016-05-16 06:52:50.451363] W [MSGID: 106122] [glusterd-mgmt.c:198:gd_mgmt_v3_pre_validate_fn] 0-management: Volume start prevalidation failed. [2016-05-16 06:52:50.451371] E [MSGID: 106122] [glusterd-mgmt.c:879:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Start on local node [2016-05-16 06:52:50.451381] E [MSGID: 106122] [glusterd-mgmt.c:1991:glusterd_mgmt_v3_initiate_all_phases] 0-management: Pre Validation Failed [2016-05-16 06:52:50.451505] W [MSGID: 101088] [common-utils.c:3662:gf_backtrace_save] 0-management: Failed to save the backtrace. The message "W [MSGID: 101088] [common-utils.c:3662:gf_backtrace_save] 0-management: Failed to save the backtrace." repeated 2 times between [2016-05-16 06:52:50.451505] and [2016-05-16 06:52:53.542390] [2016-05-16 06:52:53.544425] E [MSGID: 106280] [glusterd-volume-ops.c:1653:glusterd_op_stage_stop_volume] 0-management: Volume rep is not in the started state [2016-05-16 06:52:53.544467] E [MSGID: 106301] [glusterd-syncop.c:1281:gd_stage_op_phase] 0-management: Staging of operation 'Volume Stop' failed on localhost : Volume rep is not in the star: [2016-05-16 06:53:00.020201] E [MSGID: 101075] [common-utils.c:3137:gf_is_local_addr] 0-management: error in getaddrinfo: System error [2016-05-16 06:53:00.020216] E [MSGID: 106187] [glusterd-volume-ops.c:1486:glusterd_op_stage_start_volume] 0-management: Could not find peer on which brick dhcp43-214.lab.eng.blr.redhat.com:/bricks/brick0/brr0 resides [2016-05-16 06:53:00.020225] W [MSGID: 106122] [glusterd-mgmt.c:198:gd_mgmt_v3_pre_validate_fn] 0-management: Volume start prevalidation failed. [2016-05-16 06:53:00.020233] E [MSGID: 106122] [glusterd-mgmt.c:879:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Start on local node [2016-05-16 06:53:00.020243] E [MSGID: 106122] [glusterd-mgmt.c:1991:glusterd_mgmt_v3_initiate_all_phases] 0-management: Pre Validation Failed [2016-05-16 06:53:00.020359] W [MSGID: 101088] [common-utils.c:3662:gf_backtrace_save] 0-management: Failed to save the backtrace. 
[2016-05-16 06:56:30.764560] I [MSGID: 106487] [glusterd-handler.c:1472:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req [2016-05-16 07:27:02.476621] E [MSGID: 101012] [logging.c:376:gf_log_rotate] 0-logrotate: failed to open logfile [Too many open files] [2016-05-16 07:27:02.476615] W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/cluster/distribute.so: cannot open shared object file: Too many open files [2016-05-16 07:27:02.476753] W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/cluster/replicate.so: cannot open shared object file: Too many open files [2016-05-16 07:27:02.476902] W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/cluster/disperse.so: cannot open shared object file: Too many open files [2016-05-16 07:27:02.476983] W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/cluster/stripe.so: cannot open shared object file: Too many open files [2016-05-16 07:27:02.477002] W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/cluster/stripe.so: cannot open shared object file: Too many open files [2016-05-16 07:27:02.477010] W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/debug/io-stats.so: cannot open shared object file: Too many open files The message "W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/debug/io-stats.so: cannot open shared object file: Too many open files" repeated 13 times between [2016-05-16 07:27:02.477010] and [2016-05-16 07:27:02.477115] [2016-05-16 07:27:02.477124] W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/performance/io-cache.so: cannot open shared object file: Too many open files The message "W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/performance/io-cache.so: cannot open shared object file: Too many open files" repeated 4 times between [2016-05-16 07:27:02.477124] and [2016-05-16 07:27:02.477170] [2016-05-16 07:27:02.477177] W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/performance/io-threads.so: cannot open shared object file: Too many open files The message "W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/performance/io-threads.so: cannot open shared object file: Too many open files" repeated 6 times between [2016-05-16 07:27:02.477177] and [2016-05-16 07:27:02.477235] [2016-05-16 07:27:02.477242] W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/performance/write-behind.so: cannot open shared object file: Too many open files The message "W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/performance/write-behind.so: cannot open shared object file: Too many open files" repeated 7 times between [2016-05-16 07:27:02.477242] and [2016-05-16 07:27:02.477312] [2016-05-16 07:27:02.477319] W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/performance/open-behind.so: cannot open shared object file: Too many open files [2016-05-16 07:27:02.477343] W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: 
/usr/lib64/glusterfs/3.7.9/xlator/performance/open-behind.so: cannot open shared object file: Too many open files [2016-05-16 07:27:02.477350] W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/performance/read-ahead.so: cannot open shared object file: Too many open files [2016-05-16 07:27:02.477371] W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/performance/md-cache.so: cannot open shared object file: Too many open files [2016-05-16 07:27:02.477390] W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/encryption/crypt.so: cannot open shared object file: Too many open files The message "W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/encryption/crypt.so: cannot open shared object file: Too many open files" repeated 2 times between [2016-05-16 07:27:02.477390] and [2016-05-16 07:27:02.477412] The message "W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/cluster/distribute.so: cannot open shared object file: Too many open files" repeated 9 times between [2016-05-16 07:27:02.476615] and [2016-05-16 07:27:02.478514] The message "W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/cluster/replicate.so: cannot open shared object file: Too many open files" repeated 2 times between [2016-05-16 07:27:02.478527] and [2016-05-16 07:27:02.478556] [2016-05-16 08:04:49.794095] W [MSGID: 101088] [common-utils.c:3662:gf_backtrace_save] 0-management: Failed to save the backtrace. [2016-05-16 08:04:49.849469] I [MSGID: 106499] [glusterd-handler.c:4330:__glusterd_handle_status_volume] 0-management: Received status volume req for volume rep [2016-05-16 08:04:49.851449] E [MSGID: 106301] [glusterd-syncop.c:1281:gd_stage_op_phase] 0-management: Staging of operation 'Volume Status' failed on localhost : Volume rep is not started [2016-05-16 08:04:55.999928] E [MSGID: 106408] [glusterd-peer-utils.c:120:glusterd_peerinfo_find_by_hostname] 0-management: error in getaddrinfo: System error [Unknown error -11] [2016-05-16 08:04:56.000034] E [MSGID: 101075] [common-utils.c:3137:gf_is_local_addr] 0-management: error in getaddrinfo: System error [2016-05-16 08:04:56.000059] E [MSGID: 106187] [glusterd-volume-ops.c:1486:glusterd_op_stage_start_volume] 0-management: Could not find peer on which brick dhcp43-214.lab.eng.blr.redhat.com:/bricks/brick0/brr0 resides [2016-05-16 08:04:56.000096] W [MSGID: 106122] [glusterd-mgmt.c:198:gd_mgmt_v3_pre_validate_fn] 0-management: Volume start prevalidation failed. 
[2016-05-16 08:04:56.000107] E [MSGID: 106122] [glusterd-mgmt.c:879:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Start on local node [2016-05-16 08:04:56.000117] E [MSGID: 106122] [glusterd-mgmt.c:1991:glusterd_mgmt_v3_initiate_all_phases] 0-management: Pre Validation Failed [2016-05-16 08:05:27.990060] E [MSGID: 106408] [glusterd-peer-utils.c:120:glusterd_peerinfo_find_by_hostname] 0-management: error in getaddrinfo: System error [Unknown error -11] [2016-05-16 08:05:27.990090] E [MSGID: 101075] [common-utils.c:3137:gf_is_local_addr] 0-management: error in getaddrinfo: System error [2016-05-16 08:05:27.990157] E [MSGID: 106187] [glusterd-volume-ops.c:1486:glusterd_op_stage_start_volume] 0-management: Could not find peer on which brick dhcp43-214.lab.eng.blr.redhat.com:/bricks/brick0/brr0 resides [2016-05-16 08:05:27.990177] W [MSGID: 106122] [glusterd-mgmt.c:198:gd_mgmt_v3_pre_validate_fn] 0-management: Volume start prevalidation failed. [2016-05-16 08:05:27.990246] E [MSGID: 106122] [glusterd-mgmt.c:879:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Start on local node [2016-05-16 08:05:27.990272] E [MSGID: 106122] [glusterd-mgmt.c:1991:glusterd_mgmt_v3_initiate_all_phases] 0-management: Pre Validation Failed The message "W [MSGID: 101088] [common-utils.c:3662:gf_backtrace_save] 0-management: Failed to save the backtrace." repeated 10 times between [2016-05-16 08:04:49.794095] and [2016-05-16 08:05:27.990523] [2016-05-16 08:13:05.184001] W [MSGID: 101088] [common-utils.c:3662:gf_backtrace_save] 0-management: Failed to save the backtrace. [2016-05-16 08:13:05.186580] E [MSGID: 106408] [glusterd-peer-utils.c:120:glusterd_peerinfo_find_by_hostname] 0-management: error in getaddrinfo: System error [Unknown error -11] [2016-05-16 08:13:05.186639] E [MSGID: 101075] [common-utils.c:3137:gf_is_local_addr] 0-management: error in getaddrinfo: System error [2016-05-16 08:13:05.186654] E [MSGID: 106187] [glusterd-volume-ops.c:1486:glusterd_op_stage_start_volume] 0-management: Could not find peer on which brick dhcp43-214.lab.eng.blr.redhat.com:/bricks/brick0/brr0 resides [2016-05-16 08:13:05.186665] W [MSGID: 106122] [glusterd-mgmt.c:198:gd_mgmt_v3_pre_validate_fn] 0-management: Volume start prevalidation failed. [2016-05-16 08:13:05.186675] E [MSGID: 106122] [glusterd-mgmt.c:879:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Start on local node [2016-05-16 08:13:05.186690] E [MSGID: 106122] [glusterd-mgmt.c:1991:glusterd_mgmt_v3_initiate_all_phases] 0-management: Pre Validation Failed [2016-05-16 08:13:06.581765] E [MSGID: 106187] [glusterd-volume-ops.c:1486:glusterd_op_stage_start_volume] 0-management: Could not find peer on which brick dhcp43-214.lab.eng.blr.redhat.com:/bricks/brick0/brr0 resides [2016-05-16 08:13:06.581767] W [MSGID: 106122] [glusterd-mgmt.c:198:gd_mgmt_v3_pre_validate_fn] 0-management: Volume start prevalidation failed. 
[2016-05-16 08:13:06.581833] E [MSGID: 106122] [glusterd-mgmt.c:879:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Start on local node
[2016-05-16 08:13:06.581722] E [MSGID: 106408] [glusterd-peer-utils.c:120:glusterd_peerinfo_find_by_hostname] 0-management: error in getaddrinfo: System error [Unknown error -11]
[2016-05-16 08:13:06.581761] E [MSGID: 101075] [common-utils.c:3137:gf_is_local_addr] 0-management: error in getaddrinfo: System error
[2016-05-16 08:13:06.581873] E [MSGID: 106122] [glusterd-mgmt.c:1991:glusterd_mgmt_v3_initiate_all_phases] 0-management: Pre Validation Failed
The message "W [MSGID: 101088] [common-utils.c:3662:gf_backtrace_save] 0-management: Failed to save the backtrace." repeated 5 times between [2016-05-16 08:13:05.184001] and [2016-05-16 08:13:06.581980]
[2016-05-16 08:19:40.223288] I [MSGID: 106488] [glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2016-05-16 08:19:40.239294] I [MSGID: 106488] [glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
(END)

--- Additional comment from Kaushal on 2016-05-16 15:12:08 IST ---

Each encrypted connection creates a pipe (which uses 2 fds), and these are not closed after disconnection. This is an fd leak, which can lead to resource starvation and cause other operations to fail afterwards.

--- Additional comment from Vijay Bellur on 2016-05-16 15:18:56 IST ---

REVIEW: http://review.gluster.org/14356 (rpc/socket: Close pipe on disconnection) posted (#1) for review on master by Kaushal M (kaushal)

--- Additional comment from Vijay Bellur on 2016-05-16 15:53:41 IST ---

REVIEW: http://review.gluster.org/14356 (rpc/socket: Close pipe on disconnection) posted (#2) for review on master by Kaushal M (kaushal)

--- Additional comment from Vijay Bellur on 2016-05-18 15:58:03 IST ---

REVIEW: http://review.gluster.org/14356 (rpc/socket: Close pipe on disconnection) posted (#3) for review on master by Kaushal M (kaushal)

--- Additional comment from Vijay Bellur on 2016-06-15 13:36:26 IST ---

REVIEW: http://review.gluster.org/14356 (rpc/socket: Close pipe on disconnection) posted (#4) for review on master by Kaushal M (kaushal)

--- Additional comment from Atin Mukherjee on 2016-08-10 12:34:15 IST ---

http://review.gluster.org/#/c/14694/ actually fixes this problem.

--- Additional comment from Kaushal on 2016-10-20 14:13:19 IST ---

http://review.gluster.org/#/c/14694/ doesn't fix the issue completely. With that change, pipes are only closed when spawning the own-thread (socket_spawn) fails. They are still left open when a working connection disconnects, so the fix is only partial. It still requires http://review.gluster.org/14356, which closes pipes after an SSL connection disconnects.
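One way to observe the leak Kaushal describes is to watch the file descriptors held by glusterd across start/stop cycles. This is only an illustrative check, not part of the original report; it assumes a single glusterd process on the node:

# Hypothetical check: count total open fds and open pipe fds of glusterd.
ls /proc/$(pidof glusterd)/fd | wc -l
ls -l /proc/$(pidof glusterd)/fd | grep -c pipe

On a build without the fix, the pipe count would be expected to climb with each SSL connection teardown until the process reaches its fd limit and the "Too many open files" errors shown above start appearing.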
--- Additional comment from Worker Ant on 2016-10-20 14:15:18 IST ---

REVIEW: http://review.gluster.org/14356 (rpc/socket: Close pipe on disconnection) posted (#5) for review on master by Kaushal M (kaushal)

--- Additional comment from Worker Ant on 2016-10-20 20:08:23 IST ---

REVIEW: http://review.gluster.org/14356 (rpc/socket: Close pipe on disconnection) posted (#6) for review on master by Kaushal M (kaushal)

--- Additional comment from Worker Ant on 2016-10-20 20:22:14 IST ---

REVIEW: http://review.gluster.org/14356 (rpc/socket: Close pipe on disconnection) posted (#7) for review on master by Kaushal M (kaushal)

--- Additional comment from Worker Ant on 2016-10-21 11:34:59 IST ---

REVIEW: http://review.gluster.org/14356 (rpc/socket: Close pipe on disconnection) posted (#8) for review on master by Kaushal M (kaushal)

--- Additional comment from Worker Ant on 2016-10-24 09:40:54 IST ---

COMMIT: http://review.gluster.org/14356 committed in master by Raghavendra G (rgowdapp)
------
commit d4b993610c1ee64ab7c2f41cf457e63a3da7f521
Author: Kaushal M <kaushal>
Date:   Mon May 16 15:15:49 2016 +0530

    rpc/socket: Close pipe on disconnection

    Encrypted connections create a pipe, which isn't closed when the
    connection disconnects. This leaks fds, and gluster eventually ends up
    in a situation with fd starvation which leads to operation failures.

    Change-Id: I144e1f767cec8c6fc1aa46b00cd234129d2a4adc
    BUG: 1336371
    Signed-off-by: Kaushal M <kaushal>
    Reviewed-on: http://review.gluster.org/14356
    Tested-by: MOHIT AGRAWAL <moagrawa>
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Raghavendra G <rgowdapp>
REVIEW: http://review.gluster.org/15702 (rpc/socket: Close pipe on disconnection) posted (#1) for review on release-3.9 by Kaushal M (kaushal)
COMMIT: http://review.gluster.org/15702 committed in release-3.9 by Raghavendra G (rgowdapp)
------
commit 4d59c16154e794677bae61842e32fa8aef6d419f
Author: Kaushal M <kaushal>
Date:   Mon May 16 15:15:49 2016 +0530

    rpc/socket: Close pipe on disconnection

    Encrypted connections create a pipe, which isn't closed when the
    connection disconnects. This leaks fds, and gluster eventually ends up
    in a situation with fd starvation which leads to operation failures.

    > Change-Id: I144e1f767cec8c6fc1aa46b00cd234129d2a4adc
    > BUG: 1336371
    > Signed-off-by: Kaushal M <kaushal>
    > Reviewed-on: http://review.gluster.org/14356
    > Tested-by: MOHIT AGRAWAL <moagrawa>
    > Smoke: Gluster Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > Reviewed-by: Raghavendra G <rgowdapp>
    > Signed-off-by: Kaushal M <kaushal>

    Change-Id: I144e1f767cec8c6fc1aa46b00cd234129d2a4adc
    BUG: 1387960
    Signed-off-by: Kaushal M <kaushal>
    Reviewed-on: http://review.gluster.org/15702
    Tested-by: Atin Mukherjee <amukherj>
    CentOS-regression: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Atin Mukherjee <amukherj>
    Reviewed-by: Raghavendra G <rgowdapp>
This bug is being closed because GlusterFS-3.9 has reached its end-of-life [1].

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS. If this bug still exists in newer GlusterFS releases, please open a new bug against the newer release.

[1]: https://www.gluster.org/community/release-schedule/