Bug 1336339
Summary: Sequential volume start & stop is failing with an SSL-enabled setup.

Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Byreddy <bsrirama>
Component: glusterd
Assignee: Kaushal <kaushal>
Status: CLOSED ERRATA
QA Contact: Byreddy <bsrirama>
Severity: high
Priority: unspecified
Docs Contact:
Version: rhgs-3.1
CC: amukherj, asoman, bmohanra, kaushal, rcyriac, rhinduja, rhs-bugs, storage-qa-internal, vbellur
Target Milestone: ---
Target Release: RHGS 3.2.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.8.4-1
Doc Type: Bug Fix
Doc Text:
Each encrypted connection to GlusterD creates a pipe, and these pipes are not closed after disconnection. This leads to resource starvation and causes subsequent operations to fail.
Workaround: Restart the GlusterD service.
Story Points: ---
Clone Of:
Clones: 1336369 (view as bug list)
Environment:
Last Closed: 2017-03-23 05:30:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1336371, 1387960
Bug Blocks: 1311843, 1351522
Description (Byreddy, 2016-05-16 08:28:50 UTC)
glusterd log excerpt (identical entries collapsed, using the log's own "repeated N times" convention):

[2016-05-16 06:52:45.028639] W [socket.c:2719:socket_server_event_handler] 0-socket.management: accept on 10 failed (Too many open files)
[2016-05-16 06:52:45.028679] I [socket.c:459:ssl_setup_connection] 0-socket.management: peer CN = dhcp43-215.lab.eng.blr.redhat.com
The "accept on 10 failed (Too many open files)" warning repeated continuously between [2016-05-16 06:52:45.028639] and [2016-05-16 06:52:45.030909].
[2016-05-16 06:52:45.031075] E [socket.c:2868:socket_server_event_handler] 0-socket.management: could not create pipe
[2016-05-16 06:52:45.041646] I [socket.c:459:ssl_setup_connection] 0-socket.management: peer CN = dhcp43-215.lab.eng.blr.redhat.com
[2016-05-16 06:52:45.043605] E [socket.c:2616:socket_poller] 0-socket.management: error in polling loop
The message "W [MSGID: 101088] [common-utils.c:3662:gf_backtrace_save] 0-management: Failed to save the backtrace." repeated 2 times between [2016-05-16 06:52:44.061901] and [2016-05-16 06:52:47.161342]
[2016-05-16 06:52:47.163334] E [MSGID: 106280] [glusterd-volume-ops.c:1653:glusterd_op_stage_stop_volume] 0-management: Volume rep is not in the started state
[2016-05-16 06:52:47.163376] E [MSGID: 106301] [glusterd-syncop.c:1281:gd_stage_op_phase] 0-management: Staging of operation 'Volume Stop' failed on localhost : Volume rep is not in the started state
[2016-05-16 06:52:47.254692] E [MSGID: 106408] [glusterd-peer-utils.c:120:glusterd_peerinfo_find_by_hostname] 0-management: error in getaddrinfo: System error [Unknown error -11]
[2016-05-16 06:52:47.254771] E [MSGID: 101075] [common-utils.c:3137:gf_is_local_addr] 0-management: error in getaddrinfo: System error
[2016-05-16 06:52:47.254784] E [MSGID: 106187] [glusterd-volume-ops.c:1486:glusterd_op_stage_start_volume] 0-management: Could not find peer on which brick dhcp43-214.lab.eng.blr.redhat.com:/bricks/brick0/brr0 resides
[2016-05-16 06:52:47.254793] W [MSGID: 106122] [glusterd-mgmt.c:198:gd_mgmt_v3_pre_validate_fn] 0-management: Volume start prevalidation failed.
[2016-05-16 06:52:47.254800] E [MSGID: 106122] [glusterd-mgmt.c:879:glusterd_mgmt_v3_pre_validate] 0-management: Pre Validation failed for operation Start on local node
[2016-05-16 06:52:47.254809] E [MSGID: 106122] [glusterd-mgmt.c:1991:glusterd_mgmt_v3_initiate_all_phases] 0-management: Pre Validation Failed
The same 'Volume Stop' staging failures and volume-start prevalidation failures recur at 06:52:50, 06:52:53, 06:53:00, 08:04:49 through 08:05:27, and 08:13:05 through 08:13:06, interleaved with repeated "Failed to save the backtrace." warnings.
[2016-05-16 06:56:30.764560] I [MSGID: 106487] [glusterd-handler.c:1472:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2016-05-16 07:27:02.476621] E [MSGID: 101012] [logging.c:376:gf_log_rotate] 0-logrotate: failed to open logfile [Too many open files]
[2016-05-16 07:27:02.476615] W [MSGID: 101095] [xlator.c:148:xlator_volopt_dynload] 0-xlator: /usr/lib64/glusterfs/3.7.9/xlator/cluster/distribute.so: cannot open shared object file: Too many open files
The same dynload warning repeated for replicate.so, disperse.so, stripe.so, io-stats.so, io-cache.so, io-threads.so, write-behind.so, open-behind.so, read-ahead.so, md-cache.so, and crypt.so, each failing with "Too many open files".
[2016-05-16 08:04:49.849469] I [MSGID: 106499] [glusterd-handler.c:4330:__glusterd_handle_status_volume] 0-management: Received status volume req for volume rep
[2016-05-16 08:04:49.851449] E [MSGID: 106301] [glusterd-syncop.c:1281:gd_stage_op_phase] 0-management: Staging of operation 'Volume Status' failed on localhost : Volume rep is not started
[2016-05-16 08:19:40.223288] I [MSGID: 106488] [glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req

Each encrypted connection creates a pipe (which uses 2 fds), and these pipes are not closed after disconnection. This is an fd leak, which can lead to resource starvation and cause other operations to fail afterwards. Upstream patch http://review.gluster.org/14356 posted for review.

I'd prefer to mark it as a known issue, as restarting glusterd could be the workaround to get rid of this problem. Kaushal, could you fill in the doc text marking it as a known issue? ~Atin

Not just volume restart; I am seeing this log message whenever I do a replace-brick, add-brick, enable quota, etc.: [socket.c:2618:socket_poller] 0-tcp.testvol-server: error in polling loop

Clearing the needinfo from Kaushal as the doc text is already provided.

http://review.gluster.org/14891 in the 3.8 branch is merged.
Upstream mainline: http://review.gluster.org/14694
Upstream 3.8: http://review.gluster.org/14891
The fix is available in rhgs-3.2.0 as part of the rebase to GlusterFS 3.8.4.

This bug was verified using the build glusterfs-3.8.4-2. Repeated the reproduction steps mentioned above; the issue no longer exists. Moving to verified state.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html