Bug 1463108
Summary: | Regression: Heal info takes longer time when a brick is down | |||
---|---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Nag Pavan Chilakam <nchilaka> | |
Component: | disperse | Assignee: | Ashish Pandey <aspandey> | |
Status: | CLOSED ERRATA | QA Contact: | Nag Pavan Chilakam <nchilaka> | |
Severity: | high | Docs Contact: | ||
Priority: | unspecified | |||
Version: | rhgs-3.3 | CC: | amukherj, rhinduja, rhs-bugs, storage-qa-internal, ubansal | |
Target Milestone: | --- | Keywords: | Regression | |
Target Release: | RHGS 3.3.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | 3.3.0-devel-freeze-exception | |||
Fixed In Version: | glusterfs-3.8.4-32 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1464091 (view as bug list) | Environment: | ||
Last Closed: | 2017-09-21 04:59:42 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1417151, 1464091, 1465854 |
Description
Nag Pavan Chilakam
2017-06-20 07:11:10 UTC
check on 3.2 build and i don't notice the problem Hence marking as regression caLast login: Tue Jun 20 14:40:40 2017 from dhcp35-77.lab.eng.blr.redhat.com ca[root@dhcp41-212 ~]# cat /etc/redhat-* cat: /etc/redhat-access-insights: Is a directory Red Hat Enterprise Linux Server release 7.3 (Maipo) Red Hat Gluster Storage Server 3.2.0 [root@dhcp41-212 ~]# rpm -qa|grep gluster gluster-nagios-addons-0.2.8-1.el7rhgs.x86_64 glusterfs-ganesha-3.8.4-18.4.el7rhgs.1.HOTFIX.CASE01604226.BZ1428936.x86_64 glusterfs-fuse-3.8.4-18.4.el7rhgs.1.HOTFIX.CASE01604226.BZ1428936.x86_64 glusterfs-3.8.4-18.4.el7rhgs.1.HOTFIX.CASE01604226.BZ1428936.x86_64 nfs-ganesha-gluster-2.4.1-9.el7rhgs.x86_64 glusterfs-cli-3.8.4-18.4.el7rhgs.1.HOTFIX.CASE01604226.BZ1428936.x86_64 glusterfs-rdma-3.8.4-18.4.el7rhgs.1.HOTFIX.CASE01604226.BZ1428936.x86_64 glusterfs-client-xlators-3.8.4-18.4.el7rhgs.1.HOTFIX.CASE01604226.BZ1428936.x86_64 glusterfs-server-3.8.4-18.4.el7rhgs.1.HOTFIX.CASE01604226.BZ1428936.x86_64 python-gluster-3.8.4-18.4.el7rhgs.1.HOTFIX.CASE01604226.BZ1428936.noarch gluster-nagios-common-0.2.4-1.el7rhgs.noarch glusterfs-api-3.8.4-18.4.el7rhgs.1.HOTFIX.CASE01604226.BZ1428936.x86_64 glusterfs-debuginfo-3.8.4-18.4.el7rhgs.1.HOTFIX.CASE01604226.BZ1428936.x86_64 vdsm-gluster-4.17.33-1.1.el7rhgs.noarch samba-vfs-glusterfs-4.4.6-5.el7rhgs.x86_64 glusterfs-libs-3.8.4-18.4.el7rhgs.1.HOTFIX.CASE01604226.BZ1428936.x86_64 glusterfs-geo-replication-3.8.4-18.4.el7rhgs.1.HOTFIX.CASE01604226.BZ1428936.x86_64 With 2 bricks down and one entry to be healed, the o/p takes almost 2 min to respond and complete. This is too much of a drop [root@dhcp35-45 ecv]# time gluster v heal ecv info Brick 10.70.35.45:/rhs/brick1/ecv Status: Transport endpoint is not connected Number of entries: - Brick 10.70.35.23:/rhs/brick1/ecv /dir1 Status: Connected Number of entries: 1 Brick 10.70.35.122:/rhs/brick1/ecv Status: Transport endpoint is not connected Number of entries: - Brick 10.70.35.45:/rhs/brick2/ecv /dir1 Status: Connected Number of entries: 1 Brick 10.70.35.23:/rhs/brick2/ecv /dir1 Status: Connected Number of entries: 1 Brick 10.70.35.122:/rhs/brick2/ecv /dir1 Status: Connected Number of entries: 1 real 2m2.552s user 0m36.088s sys 0m33.221s [root@dhcp35-45 ecv]# Following is glfs log o/p [2017-06-22 14:28:39.901976] W [MSGID: 101174] [graph.c:360:_log_if_unknown_option] 0-ecv-readdir-ahead: option 'parallel-readdir' is not recognized [2017-06-22 14:28:39.902111] I [MSGID: 104045] [glfs-master.c:91:notify] 0-gfapi: New graph 64686370-3335-2d34-352e-6c61622e656e (0) coming up [2017-06-22 14:28:39.902136] I [MSGID: 114020] [client.c:2356:notify] 0-ecv-client-0: parent translators are ready, attempting connect on transport [2017-06-22 14:28:39.905465] I [MSGID: 114020] [client.c:2356:notify] 0-ecv-client-1: parent translators are ready, attempting connect on transport [2017-06-22 14:28:39.906050] I [rpc-clnt.c:2001:rpc_clnt_reconfig] 0-ecv-client-0: changing port to 49160 (from 0) [2017-06-22 14:28:39.908135] I [MSGID: 114020] [client.c:2356:notify] 0-ecv-client-2: parent translators are ready, attempting connect on transport [2017-06-22 14:28:39.910916] E [socket.c:2360:socket_connect_finish] 0-ecv-client-0: connection to 10.70.35.45:49160 failed (Connection refused); disconnecting socket [2017-06-22 14:28:39.911327] I [rpc-clnt.c:2001:rpc_clnt_reconfig] 0-ecv-client-1: changing port to 49155 (from 0) [2017-06-22 14:28:39.913509] I [MSGID: 114020] [client.c:2356:notify] 0-ecv-client-3: parent translators are ready, attempting connect on transport [2017-06-22 14:28:39.916339] I [MSGID: 114057] [client-handshake.c:1450:select_server_supported_programs] 0-ecv-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2017-06-22 14:28:39.916479] I [rpc-clnt.c:2001:rpc_clnt_reconfig] 0-ecv-client-2: changing port to 49152 (from 0) [2017-06-22 14:28:39.918835] I [MSGID: 114020] [client.c:2356:notify] 0-ecv-client-4: parent translators are ready, attempting connect on transport [2017-06-22 14:28:39.921243] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-ecv-client-1: Connected to ecv-client-1, attached to remote volume '/rhs/brick1/ecv'. [2017-06-22 14:28:39.921286] I [MSGID: 114047] [client-handshake.c:1226:client_setvolume_cbk] 0-ecv-client-1: Server and Client lk-version numbers are not same, reopening the fds [2017-06-22 14:28:39.921520] E [socket.c:2360:socket_connect_finish] 0-ecv-client-2: connection to 10.70.35.122:49152 failed (Connection refused); disconnecting socket [2017-06-22 14:28:39.921596] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 0-ecv-client-1: Server lk version = 1 [2017-06-22 14:28:39.921830] I [rpc-clnt.c:2001:rpc_clnt_reconfig] 0-ecv-client-3: changing port to 49158 (from 0) [2017-06-22 14:28:39.924133] I [MSGID: 114020] [client.c:2356:notify] 0-ecv-client-5: parent translators are ready, attempting connect on transport [2017-06-22 14:28:39.926850] I [MSGID: 114057] [client-handshake.c:1450:select_server_supported_programs] 0-ecv-client-3: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2017-06-22 14:28:39.927535] I [rpc-clnt.c:2001:rpc_clnt_reconfig] 0-ecv-client-4: changing port to 49153 (from 0) [2017-06-22 14:28:39.927630] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-ecv-client-3: Connected to ecv-client-3, attached to remote volume '/rhs/brick2/ecv'. [2017-06-22 14:28:39.927660] I [MSGID: 114047] [client-handshake.c:1226:client_setvolume_cbk] 0-ecv-client-3: Server and Client lk-version numbers are not same, reopening the fds [2017-06-22 14:28:39.929419] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 0-ecv-client-3: Server lk version = 1 [2017-06-22 14:28:39.929816] I [rpc-clnt.c:2001:rpc_clnt_reconfig] 0-ecv-client-5: changing port to 49154 (from 0) [2017-06-22 14:28:39.932243] I [MSGID: 114057] [client-handshake.c:1450:select_server_supported_programs] 0-ecv-client-4: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2017-06-22 14:28:39.933224] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-ecv-client-4: Connected to ecv-client-4, attached to remote volume '/rhs/brick2/ecv'. [2017-06-22 14:28:39.933259] I [MSGID: 114047] [client-handshake.c:1226:client_setvolume_cbk] 0-ecv-client-4: Server and Client lk-version numbers are not same, reopening the fds [2017-06-22 14:28:39.933557] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 0-ecv-client-4: Server lk version = 1 [2017-06-22 14:28:39.935075] I [MSGID: 114057] [client-handshake.c:1450:select_server_supported_programs] 0-ecv-client-5: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2017-06-22 14:28:39.935870] I [MSGID: 114046] [client-handshake.c:1215:client_setvolume_cbk] 0-ecv-client-5: Connected to ecv-client-5, attached to remote volume '/rhs/brick2/ecv'. [2017-06-22 14:28:39.935901] I [MSGID: 114047] [client-handshake.c:1226:client_setvolume_cbk] 0-ecv-client-5: Server and Client lk-version numbers are not same, reopening the fds [2017-06-22 14:28:39.936145] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 0-ecv-client-5: Server lk version = 1 [2017-06-22 14:28:43.893978] I [rpc-clnt.c:2001:rpc_clnt_reconfig] 0-ecv-client-0: changing port to 49160 (from 0) [2017-06-22 14:28:43.896933] I [rpc-clnt.c:2001:rpc_clnt_reconfig] 0-ecv-client-2: changing port to 49152 (from 0) [2017-06-22 14:28:43.898922] E [socket.c:2360:socket_connect_finish] 0-ecv-client-0: connection to 10.70.35.45:49160 failed (Connection refused); disconnecting socket [2017-06-22 14:28:43.901701] E [socket.c:2360:socket_connect_finish] 0-ecv-client-2: connection to 10.70.35.122:49152 failed (Connection refused); disconnecting socket [2017-06-22 14:28:47.899547] I [rpc-clnt.c:2001:rpc_clnt_reconfig] 0-ecv-client-0: changing port to 49160 (from 0) [2017-06-22 14:28:47.903300] I [rpc-clnt.c:2001:rpc_clnt_reconfig] 0-ecv-client-2: changing port to 49152 (from 0) [2017-06-22 14:28:47.904508] E [socket.c:2360:socket_connect_finish] 0-ecv-client-0: connection to 10.70.35.45:49160 failed (Connection refused); disconnecting socket [2017-06-22 14:28:47.907162] E [socket.c:2360:socket_connect_finish] 0-ecv-client-2: connection to 10.70.35.122:49152 failed (Connection refused); disconnecting socket [2017-06-22 14:28:50.902197] I [MSGID: 122061] [ec.c:323:ec_up] 0-ecv-disperse-0: Going UP [2017-06-22 14:28:50.905125] W [MSGID: 122035] [ec-common.c:464:ec_child_select] 0-ecv-disperse-0: Executing operation with some subvolumes unavailable (5) The message "W [MSGID: 122035] [ec-common.c:464:ec_child_select] 0-ecv-disperse-0: Executing operation with some subvolumes unavailable (5)" repeated 2 times between [2017-06-22 14:28:50.905125] and [2017-06-22 14:28:50.907589] [2017-06-22 14:28:50.908376] I [MSGID: 104041] [glfs-resolve.c:914:__glfs_active_subvol] 0-ecv: switched to graph 64686370-3335-2d34-352e-6c61622e656e (0) [2017-06-22 14:28:50.911613] W [MSGID: 122035] [ec-common.c:464:ec_child_select] 0-ecv-disperse-0: Executing operation with some subvolumes unavailable (5) [2017-06-22 14:28:51.909942] I [rpc-clnt.c:2001:rpc_clnt_reconfig] 0-ecv-client-0: changing port to 49160 (from 0) [2017-06-22 14:28:51.913016] I [rpc-clnt.c:2001:rpc_clnt_reconfig] 0-ecv-client-2: changing port to 49152 (from 0) [2017-06-22 14:28:51.915213] E [socket.c:2360:socket_connect_finish] 0-ecv-client-0: connection to 10.70.35.45:49160 failed (Connection refused); disconnecting socket [2017-06-22 14:28:51.917594] E [socket.c:2360:socket_connect_finish] 0-ecv-client-2: connection to 10.70.35.122:49152 failed (Connection refused); disconnecting socket upstream patch : https://review.gluster.org/17606 have retested the testcase and saw that the heal time now is reporting in about a second. hence moving to verified testversion:3.8.4-32 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2774 |