Description of problem:
=======================
I ran 400 sequential volume start/stop iterations; after some time they began failing with CLI errors:
"""
volume start: rep: success
volume stop: rep: success
volume start: rep: success
volume stop: rep: failed: Commit failed on localhost. Please check the log file for more details.
volume start: rep: failed: Pre Validation failed on dhcp43-215.lab.eng.blr.redhat.com. Volume rep already started
volume stop: rep: failed: Volume rep is not in the started state
volume start: rep: failed: Pre-validation failed on localhost. Please check log file for details
volume stop: rep: failed: Volume rep is not in the started state
volume start: rep: failed: Pre-validation failed on localhost. Please check lo
"""
glusterd logs are posted in the comments below.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.9-4.

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Create a 16 x 2 = 32 volume using two nodes.
2. FUSE mount and unmount the volume 300 times.
3. Run volume start/stop 400 times; it fails after some number of start/stop operations.
4. Check the glusterd logs for error messages and warnings.

Actual results:
===============
Sequential volume start and stop does not work on an SSL-enabled setup.

Expected results:
=================
Sequential volume start and stop should work on an SSL-enabled setup.

Additional info:
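A minimal way to confirm the leak while running the loop above is to watch glusterd's open fd count from /proc. This is a hedged sketch, not part of the original report; the helper name and the glusterd pid-file path shown in the comment are assumptions.

```python
import os

def count_open_fds(pid):
    """Count the open file descriptors of a process via /proc (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))

# Hypothetical usage against a running glusterd, inside the start/stop loop:
#   pid = int(open("/var/run/glusterd.pid").read())   # path is an assumption
#   print(count_open_fds(pid))
# A count that grows steadily across iterations and never comes back down
# points at an fd leak like the one reported here.
```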
[2016-05-16 06:52:45.028639] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028646] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028654] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028661] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028674] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028679] I [socket.c:459:ssl_setup_connection] 0-socket.management: peer CN = dhcp43-215.lab.eng.blr.redhat.com [2016-05-16 06:52:45.028689] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028750] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028760] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028768] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028776] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028784] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028792] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028806] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028815] W 
[socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028823] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028831] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028838] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028864] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028872] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028880] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028888] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028895] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028903] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028911] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028918] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028926] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028934] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028941] W [socket.c:2719:socket_server_event_handler] 0-socket.management: 
accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028949] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028956] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028965] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.028973] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.029132] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.029152] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.029168] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.029183] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.029191] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.029198] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.029206] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.029213] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.029221] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.029228] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 
06:52:45.029236] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.029244] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.029251] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.029258] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.030909] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files) [2016-05-16 06:52:45.031075] E [socket.c:2868:socket_server_event_handler] 0-socket.management: could not create pipe [2016-05-16 06:52:45.041646] I [socket.c:459:ssl_setup_connection] 0-socket.management: peer CN = dhcp43-215.lab.eng.blr.redhat.com [2016-05-16 06:52:45.043605] E [socket.c:2616:socket_poller] 0-socket.management: error in polling loop The message "W [MSGID: 101088] [common-utils.c:3662:gf_backtrace_save] 0-management: Failed to save the backtrace." repeated 2 times between [2016-05-16 06:52:44.061901] and [2016-05-16 06:52:47.161342] [2016-05-16 06:52:47.163334] E [MSGID: 106280] [glusterd-volume-ops.c:1653:glusterd_op_stage_stop_volume] 0-management: Volume rep is not in the started state [2016-05-16 06:52:47.163376] E [MSGID: 106301] [glusterd-syncop.c:1281:gd_stage_op_phase] 0-management: Staging of operation 'Volume Stop' failed on localhost : Volume rep is not in the started state [2016-05-16 06:52:47.163551] W [MSGID: 101088] [common-utils.c:3662:gf_backtrace_save] 0-management: Failed to save the backtrace. The message "W [MSGID: 101088] [common-utils.c:3662:gf_backtrace_save] 0-management: Failed to save the backtrace." 
repeated 2 times between [2016-05-16 06:52:47.163551] and [2016-05-16 06:52:47.252524] [2016-05-16 06:52:47.254692] E [MSGID: 106408] [glusterd-peer-utils.c:120:glusterd_peerinfo_find_by_hostname] 0-management: error in getaddrinfo: System error [Unknown error -11] [2016-05-16 06:52:47.254771] E [MSGID: 101075] [common-utils.c:3137:gf_is_local_addr] 0-management: error in getaddrinfo: System error [2016-05-16 06:52:47.254784] E [MSGID: 106187] [glusterd-volume-ops.c:1486:glusterd_op_stage_start_volume] 0-management: Could not find peer on which brick dhcp43-214.lab.eng.blr.redhat.com:/bricks/brick0/brr0 resides [2016-05-16 06:52:47.254793] W [MSGID: 106122] [glusterd-mgmt.c:198:gd_mgmt_v3_pre_validate_fn] 0-management: Volume start prevalidation failed. [2016-05-16 06:52:47.254800] E [MSGID: 106122] [glusterd-mgmt.c:879:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Start on local node [2016-05-16 06:52:47.254809] E [MSGID: 106122] [glusterd-mgmt.c:1991:glusterd_mgmt_v3_initiate_all_phases] 0-management: Pre Validation Failed [2016-05-16 06:52:47.255009] W [MSGID: 101088] [common-utils.c:3662:gf_backtrace_save] 0-management: Failed to save the backtrace. The message "W [MSGID: 101088] [common-utils.c:3662:gf_backtrace_save] 0-management: Failed to save the backtrace." repeated 2 times between [2016-05-16 06:52:47.255009] and [2016-05-16 06:52:50.350609] [2016-05-16 06:52:50.352683] E [MSGID: 106280] [glusterd-volume-ops.c:1653:glusterd_op_stage_stop_volume] 0-management: Volume rep is not in the started state [2016-05-16 06:52:50.352756] E [MSGID: 106301] [glusterd-syncop.c:1281:gd_stage_op_phase] 0-management: Staging of operation 'Volume Stop' failed on localhost : Volume rep is not in the started state [2016-05-16 06:52:50.352909] W [MSGID: 101088] [common-utils.c:3662:gf_backtrace_save] 0-management: Failed to save the backtrace. 
The message "W [MSGID: 101088] [common-utils.c:3662:gf_backtrace_save] 0-management: Failed to save the backtrace." repeated 2 times between [2016-05-16 06:52:50.352909] and [2016-05-16 06:52:50.448995] [2016-05-16 06:52:50.451250] E [MSGID: 106408] [glusterd-peer-utils.c:120:glusterd_peerinfo_find_by_hostname] 0-management: error in getaddrinfo: System error [Unknown error -11] [2016-05-16 06:52:50.451338] E [MSGID: 101075] [common-utils.c:3137:gf_is_local_addr] 0-management: error in getaddrinfo: System error [2016-05-16 06:52:50.451353] E [MSGID: 106187] [glusterd-volume-ops.c:1486:glusterd_op_stage_start_volume] 0-management: Could not find peer on which brick dhcp43-214.lab.eng.blr.redhat.com:/bricks/brick0/brr0 resides [2016-05-16 06:52:50.451363] W [MSGID: 106122] [glusterd-mgmt.c:198:gd_mgmt_v3_pre_validate_fn] 0-management: Volume start prevalidation failed. [2016-05-16 06:52:50.451371] E [MSGID: 106122] [glusterd-mgmt.c:879:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Start on local node [2016-05-16 06:52:50.451381] E [MSGID: 106122] [glusterd-mgmt.c:1991:glusterd_mgmt_v3_initiate_all_phases] 0-management: Pre Validation Failed [2016-05-16 06:52:50.451505] W [MSGID: 101088] [common-utils.c:3662:gf_backtrace_save] 0-management: Failed to save the backtrace. The message "W [MSGID: 101088] [common-utils.c:3662:gf_backtrace_save] 0-management: Failed to save the backtrace." 
repeated 2 times between [2016-05-16 06:52:50.451505] and [2016-05-16 06:52:53.542390] [2016-05-16 06:52:53.544425] E [MSGID: 106280] [glusterd-volume-ops.c:1653:glusterd_op_stage_stop_volume] 0-management: Volume rep is not in the started state [2016-05-16 06:52:53.544467] E [MSGID: 106301] [glusterd-syncop.c:1281:gd_stage_op_phase] 0-management: Staging of operation 'Volume Stop' failed on localhost : Volume rep is not in the star: [2016-05-16 06:53:00.020201] E [MSGID: 101075] [common-utils.c:3137:gf_is_local_addr] 0-management: error in getaddrinfo: System error [2016-05-16 06:53:00.020216] E [MSGID: 106187] [glusterd-volume-ops.c:1486:glusterd_op_stage_start_volume] 0-management: Could not find peer on which brick dhcp43-214.lab.eng.blr.redhat.com:/bricks/brick0/brr0 resides [2016-05-16 06:53:00.020225] W [MSGID: 106122] [glusterd-mgmt.c:198:gd_mgmt_v3_pre_validate_fn] 0-management: Volume start prevalidation failed. [2016-05-16 06:53:00.020233] E [MSGID: 106122] [glusterd-mgmt.c:879:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Start on local node [2016-05-16 06:53:00.020243] E [MSGID: 106122] [glusterd-mgmt.c:1991:glusterd_mgmt_v3_initiate_all_phases] 0-management: Pre Validation Failed [2016-05-16 06:53:00.020359] W [MSGID: 101088] [common-utils.c:3662:gf_backtrace_save] 0-management: Failed to save the backtrace. 
[2016-05-16 06:56:30.764560] I [MSGID: 106487] [glusterd-handler.c:1472:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req [2016-05-16 07:27:02.476621] E [MSGID: 101012] [logging.c:376:gf_log_rotate] 0-logrotate: failed to open logfile [Too many open files] [2016-05-16 07:27:02.476615] W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/cluster/distribute.so: cannot open shared object file: Too many open files [2016-05-16 07:27:02.476753] W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/cluster/replicate.so: cannot open shared object file: Too many open files [2016-05-16 07:27:02.476902] W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/cluster/disperse.so: cannot open shared object file: Too many open files [2016-05-16 07:27:02.476983] W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/cluster/stripe.so: cannot open shared object file: Too many open files [2016-05-16 07:27:02.477002] W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/cluster/stripe.so: cannot open shared object file: Too many open files [2016-05-16 07:27:02.477010] W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/debug/io-stats.so: cannot open shared object file: Too many open files The message "W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/debug/io-stats.so: cannot open shared object file: Too many open files" repeated 13 times between [2016-05-16 07:27:02.477010] and [2016-05-16 07:27:02.477115] [2016-05-16 07:27:02.477124] W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/performance/io-cache.so: cannot open shared object file: Too many open files The message "W [MSGID: 101095] 
[xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/performance/io-cache.so: cannot open shared object file: Too many open files" repeated 4 times between [2016-05-16 07:27:02.477124] and [2016-05-16 07:27:02.477170] [2016-05-16 07:27:02.477177] W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/performance/io-threads.so: cannot open shared object file: Too many open files The message "W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/performance/io-threads.so: cannot open shared object file: Too many open files" repeated 6 times between [2016-05-16 07:27:02.477177] and [2016-05-16 07:27:02.477235] [2016-05-16 07:27:02.477242] W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/performance/write-behind.so: cannot open shared object file: Too many open files The message "W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/performance/write-behind.so: cannot open shared object file: Too many open files" repeated 7 times between [2016-05-16 07:27:02.477242] and [2016-05-16 07:27:02.477312] [2016-05-16 07:27:02.477319] W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/performance/open-behind.so: cannot open shared object file: Too many open files [2016-05-16 07:27:02.477343] W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/performance/open-behind.so: cannot open shared object file: Too many open files [2016-05-16 07:27:02.477350] W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/performance/read-ahead.so: cannot open shared object file: Too many open files [2016-05-16 07:27:02.477371] W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/performance/md-cache.so: 
cannot open shared object file: Too many open files [2016-05-16 07:27:02.477390] W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/encryption/crypt.so: cannot open shared object file: Too many open files The message "W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/encryption/crypt.so: cannot open shared object file: Too many open files" repeated 2 times between [2016-05-16 07:27:02.477390] and [2016-05-16 07:27:02.477412] The message "W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/cluster/distribute.so: cannot open shared object file: Too many open files" repeated 9 times between [2016-05-16 07:27:02.476615] and [2016-05-16 07:27:02.478514] The message "W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/cluster/replicate.so: cannot open shared object file: Too many open files" repeated 2 times between [2016-05-16 07:27:02.478527] and [2016-05-16 07:27:02.478556] [2016-05-16 08:04:49.794095] W [MSGID: 101088] [common-utils.c:3662:gf_backtrace_save] 0-management: Failed to save the backtrace. 
[2016-05-16 08:04:49.849469] I [MSGID: 106499] [glusterd-handler.c:4330:__glusterd_handle_status_volume] 0-management: Received status volume req for volume rep [2016-05-16 08:04:49.851449] E [MSGID: 106301] [glusterd-syncop.c:1281:gd_stage_op_phase] 0-management: Staging of operation 'Volume Status' failed on localhost : Volume rep is not started [2016-05-16 08:04:55.999928] E [MSGID: 106408] [glusterd-peer-utils.c:120:glusterd_peerinfo_find_by_hostname] 0-management: error in getaddrinfo: System error [Unknown error -11] [2016-05-16 08:04:56.000034] E [MSGID: 101075] [common-utils.c:3137:gf_is_local_addr] 0-management: error in getaddrinfo: System error [2016-05-16 08:04:56.000059] E [MSGID: 106187] [glusterd-volume-ops.c:1486:glusterd_op_stage_start_volume] 0-management: Could not find peer on which brick dhcp43-214.lab.eng.blr.redhat.com:/bricks/brick0/brr0 resides [2016-05-16 08:04:56.000096] W [MSGID: 106122] [glusterd-mgmt.c:198:gd_mgmt_v3_pre_validate_fn] 0-management: Volume start prevalidation failed. 
[2016-05-16 08:04:56.000107] E [MSGID: 106122] [glusterd-mgmt.c:879:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Start on local node [2016-05-16 08:04:56.000117] E [MSGID: 106122] [glusterd-mgmt.c:1991:glusterd_mgmt_v3_initiate_all_phases] 0-management: Pre Validation Failed [2016-05-16 08:05:27.990060] E [MSGID: 106408] [glusterd-peer-utils.c:120:glusterd_peerinfo_find_by_hostname] 0-management: error in getaddrinfo: System error [Unknown error -11] [2016-05-16 08:05:27.990090] E [MSGID: 101075] [common-utils.c:3137:gf_is_local_addr] 0-management: error in getaddrinfo: System error [2016-05-16 08:05:27.990157] E [MSGID: 106187] [glusterd-volume-ops.c:1486:glusterd_op_stage_start_volume] 0-management: Could not find peer on which brick dhcp43-214.lab.eng.blr.redhat.com:/bricks/brick0/brr0 resides [2016-05-16 08:05:27.990177] W [MSGID: 106122] [glusterd-mgmt.c:198:gd_mgmt_v3_pre_validate_fn] 0-management: Volume start prevalidation failed. [2016-05-16 08:05:27.990246] E [MSGID: 106122] [glusterd-mgmt.c:879:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Start on local node [2016-05-16 08:05:27.990272] E [MSGID: 106122] [glusterd-mgmt.c:1991:glusterd_mgmt_v3_initiate_all_phases] 0-management: Pre Validation Failed The message "W [MSGID: 101088] [common-utils.c:3662:gf_backtrace_save] 0-management: Failed to save the backtrace." repeated 10 times between [2016-05-16 08:04:49.794095] and [2016-05-16 08:05:27.990523] [2016-05-16 08:13:05.184001] W [MSGID: 101088] [common-utils.c:3662:gf_backtrace_save] 0-management: Failed to save the backtrace. 
[2016-05-16 08:13:05.186580] E [MSGID: 106408] [glusterd-peer-utils.c:120:glusterd_peerinfo_find_by_hostname] 0-management: error in getaddrinfo: System error [Unknown error -11] [2016-05-16 08:13:05.186639] E [MSGID: 101075] [common-utils.c:3137:gf_is_local_addr] 0-management: error in getaddrinfo: System error [2016-05-16 08:13:05.186654] E [MSGID: 106187] [glusterd-volume-ops.c:1486:glusterd_op_stage_start_volume] 0-management: Could not find peer on which brick dhcp43-214.lab.eng.blr.redhat.com:/bricks/brick0/brr0 resides [2016-05-16 08:13:05.186665] W [MSGID: 106122] [glusterd-mgmt.c:198:gd_mgmt_v3_pre_validate_fn] 0-management: Volume start prevalidation failed. [2016-05-16 08:13:05.186675] E [MSGID: 106122] [glusterd-mgmt.c:879:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Start on local node [2016-05-16 08:13:05.186690] E [MSGID: 106122] [glusterd-mgmt.c:1991:glusterd_mgmt_v3_initiate_all_phases] 0-management: Pre Validation Failed [2016-05-16 08:13:06.581765] E [MSGID: 106187] [glusterd-volume-ops.c:1486:glusterd_op_stage_start_volume] 0-management: Could not find peer on which brick dhcp43-214.lab.eng.blr.redhat.com:/bricks/brick0/brr0 resides [2016-05-16 08:13:06.581767] W [MSGID: 106122] [glusterd-mgmt.c:198:gd_mgmt_v3_pre_validate_fn] 0-management: Volume start prevalidation failed. 
[2016-05-16 08:13:06.581833] E [MSGID: 106122] [glusterd-mgmt.c:879:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Start on local node [2016-05-16 08:13:06.581722] E [MSGID: 106408] [glusterd-peer-utils.c:120:glusterd_peerinfo_find_by_hostname] 0-management: error in getaddrinfo: System error [Unknown error -11] [2016-05-16 08:13:06.581761] E [MSGID: 101075] [common-utils.c:3137:gf_is_local_addr] 0-management: error in getaddrinfo: System error [2016-05-16 08:13:06.581873] E [MSGID: 106122] [glusterd-mgmt.c:1991:glusterd_mgmt_v3_initiate_all_phases] 0-management: Pre Validation Failed The message "W [MSGID: 101088] [common-utils.c:3662:gf_backtrace_save] 0-management: Failed to save the backtrace." repeated 5 times between [2016-05-16 08:13:05.184001] and [2016-05-16 08:13:06.581980] [2016-05-16 08:19:40.223288] I [MSGID: 106488] [glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req [2016-05-16 08:19:40.239294] I [MSGID: 106488] [glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
Each encrypted connection creates a pipe (which uses 2 fds) that is not closed on disconnect. This is an fd leak, which can lead to resource starvation and cause subsequent operations to fail.
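The mechanism can be simulated in a few lines of Python. This is an illustration of the failure mode, not GlusterFS code: lower the process fd limit, then "leak" one pipe per connection until the kernel refuses with EMFILE, the "Too many open files" error flooding the logs above.

```python
import errno
import os
import resource

# Lower the soft fd limit so the leak bites quickly (illustration only).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (64, hard))

leaked = []
try:
    while True:
        # One pipe (2 fds) per "encrypted connection", never closed
        # on disconnect -- the leak described in this bug.
        leaked.append(os.pipe())
except OSError as e:
    assert e.errno == errno.EMFILE  # "Too many open files"
    print(f"fd table exhausted after {len(leaked)} leaked pipes")
finally:
    # The fix amounts to doing this cleanup when the connection goes away:
    for r, w in leaked:
        os.close(r)
        os.close(w)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
```

Once the fd table is full, every later open/accept/dlopen in the same process fails the same way, which explains why unrelated operations (volume stop staging, getaddrinfo, loading xlator .so files) all start erroring out together.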
Upstream patch http://review.gluster.org/14356 has been posted for review. I'd prefer to mark this as a known issue, since restarting glusterd can serve as a workaround for the problem.
Kaushal, could you fill in the doc text, marking it as a known issue? ~Atin
It is not just volume start/stop: I see this log message whenever I do a replace-brick, add-brick, enable quota, etc.: [socket.c:2618:socket_poller] 0-tcp.testvol-server: error in polling loop
Clearing the needinfo from Kaushal, as the doc text is already provided.
http://review.gluster.org/14891 has been merged in the 3.8 branch.
Upstream mainline: http://review.gluster.org/14694 Upstream 3.8: http://review.gluster.org/14891 The fix is available in rhgs-3.2.0 as part of the rebase to GlusterFS 3.8.4.
This bug was verified using the build glusterfs-3.8.4-2. Repeated the reproduction steps mentioned above; the issue no longer occurs. Moving to verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html