Bug 1615911
Summary: | [geo-rep]: No such file or directory when a node is shut down and brought back | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Rochelle <rallan> |
Component: | geo-replication | Assignee: | Kotresh HR <khiremat> |
Status: | CLOSED ERRATA | QA Contact: | Rochelle <rallan> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | rhgs-3.4 | CC: | amukherj, apaladug, avishwan, csaba, rallan, rhs-bugs, sankarshan, storage-qa-internal, vdas |
Target Milestone: | --- | Keywords: | Regression |
Target Release: | RHGS 3.4.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | glusterfs-3.12.2-16 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2018-09-04 06:52:17 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1503137 |
Description
Rochelle
2018-08-14 14:08:04 UTC
Analysis: When the node is brought up back, the geo-rep worker is started but could not connect to brick. Hence the Faulty state. Looking to libgfchangelog logs, rpc is failed for some reason. ====================================================== [2018-08-14 13:32:53.146652] I [MSGID: 132019] [gf-history-changelog.c:935:gf_history_changelog] 0-gfchangelog: FINAL: from: 1534253526, to: 1534253571, changes: 4 [2018-08-14 13:33:08.859846] I [MSGID: 132019] [gf-history-changelog.c:767:gf_changelog_extract_min_max] 0-gfchangelog: MIN: 1534252536, MAX: 1534253586, TOTAL CHANGELOGS: 70 [2018-08-14 13:33:08.860060] I [MSGID: 132019] [gf-history-changelog.c:935:gf_history_changelog] 0-gfchangelog: FINAL: from: 1534253526, to: 1534253586, changes: 5 [2018-08-14 13:54:03.635946] I [MSGID: 132028] [gf-changelog.c:560:gf_changelog_register_generic] 0-gfchangelog: Registering brick: /rhs/brick1/b3 [notify filter: 1] [2018-08-14 13:54:03.637715] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 [2018-08-14 13:54:03.637800] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 3 [2018-08-14 13:54:03.637966] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2 [2018-08-14 13:54:03.639023] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 4 [2018-08-14 13:54:03.653980] I [socket.c:2474:socket_event_handler] 0-transport: EPOLLERR - disconnecting now [2018-08-14 13:54:05.654436] I [socket.c:3599:socket_submit_request] 0-gfchangelog: not connected (priv->connected = 0) [2018-08-14 13:54:05.654469] W [rpc-clnt.c:1679:rpc_clnt_submit] 0-gfchangelog: failed to submit rpc-request (XID: 0x2 Program: LIBGFCHANGELOG, ProgVers: 1, Proc: 1) to rpc-transport (gfchangelog) [2018-08-14 13:54:05.654492] E [MSGID: 132023] [gf-changelog.c:261:gf_changelog_setup_rpc] 0-gfchangelog: Could not initiate probe RPC, bailing out!!! [2018-08-14 13:54:05.654560] I [socket.c:2474:socket_event_handler] 0-transport: EPOLLERR - disconnecting now [2018-08-14 13:54:05.654891] E [MSGID: 132022] [gf-changelog.c:566:gf_changelog_register_generic] 0-gfchangelog: Error registering with changelog xlator [2018-08-14 13:54:22.711301] I [MSGID: 132028] [gf-changelog.c:560:gf_changelog_register_generic] 0-gfchangelog: Registering brick: /rhs/brick1/b3 [notify filter: 1] [2018-08-14 13:54:22.718923] I [socket.c:2474:socket_event_handler] 0-transport: EPOLLERR - disconnecting now [2018-08-14 13:54:24.719451] I [socket.c:3599:socket_submit_request] 0-gfchangelog: not connected (priv->connected = 0) [2018-08-14 13:54:24.719543] W [rpc-clnt.c:1679:rpc_clnt_submit] 0-gfchangelog: failed to submit rpc-request (XID: 0x2 Program: LIBGFCHANGELOG, ProgVers: 1, Proc: 1) to rpc-transport (gfchangelog) ========================================================= From brick logs, looks like the brick didn't come up after node is back Brick went down around 2018-08-14 13:32 The Faulty error started coming error at around 2018-08-14 13:54 So assuming the node came back around 13:54, so there is no brick log around that with graph loaded. So brick didn't come up ======================== master-client-2-0-0 [2018-08-14 13:32:49.923031] I [addr.c:55:compare_addr_and_update] 0-/rhs/brick1/b3: allowed = "*", received addr = "10.70.41.253" [2018-08-14 13:32:49.923099] I [login.c:111:gf_auth] 0-auth/login: allowed user names: 5710b280-df15-4de3-a609-c93c418b2f5c [2018-08-14 13:32:49.923140] I [MSGID: 115029] [server-handshake.c:817:server_setvolume] 0-master-server: accepted client from dhcp41-253.lab.eng.blr.redhat.com-30249-2018/08/14-13:32:49:818533-master-client-2-0-0 (version: 3.12.2) [2018-08-14 13:32:50.000634] I [addr.c:55:compare_addr_and_update] 0-/rhs/brick1/b3: allowed = "*", received addr = "10.70.41.253" [2018-08-14 13:32:50.000703] I [login.c:111:gf_auth] 0-auth/login: allowed user names: 5710b280-df15-4de3-a609-c93c418b2f5c [2018-08-14 13:32:50.000743] I [MSGID: 115029] [server-handshake.c:817:server_setvolume] 0-master-server: accepted client from dhcp41-253.lab.eng.blr.redhat.com-30252-2018/08/14-13:32:49:880584-master-client-2-0-0 (version: 3.12.2) [2018-08-14 13:32:50.520496] I [addr.c:55:compare_addr_and_update] 0-/rhs/brick1/b3: allowed = "*", received addr = "10.70.43.163" [2018-08-14 13:32:50.520636] I [login.c:111:gf_auth] 0-auth/login: allowed user names: 5710b280-df15-4de3-a609-c93c418b2f5c [2018-08-14 13:32:50.520677] I [MSGID: 115029] [server-handshake.c:817:server_setvolume] 0-master-server: accepted client from dhcp43-163.lab.eng.blr.redhat.com-31883-2018/08/14-13:32:50:413587-master-client-2-0-0 (version: 3.12.2) [2018-08-14 13:32:50.639809] I [addr.c:55:compare_addr_and_update] 0-/rhs/brick1/b3: allowed = "*", received addr = "10.70.42.60" [2018-08-14 13:32:50.639878] I [login.c:111:gf_auth] 0-auth/login: allowed user names: 5710b280-df15-4de3-a609-c93c418b2f5c [2018-08-14 13:32:50.639915] I [MSGID: 115029] [server-handshake.c:817:server_setvolume] 0-master-server: accepted client from dhcp42-60.lab.eng.blr.redhat.com-20854-2018/08/14-13:32:50:531903-master-client-2-0-0 (version: 3.12.2) [2018-08-14 13:32:50.705107] I [addr.c:55:compare_addr_and_update] 0-/rhs/brick1/b3: allowed = "*", received addr = "10.70.43.163" [2018-08-14 13:32:50.705235] I [login.c:111:gf_auth] 0-auth/login: allowed user names: 5710b280-df15-4de3-a609-c93c418b2f5c [2018-08-14 13:32:50.705291] I [MSGID: 115029] [server-handshake.c:817:server_setvolume] 0-master-server: accepted client from dhcp43-163.lab.eng.blr.redhat.com-31895-2018/08/14-13:32:50:595040-master-client-2-0-0 (version: 3.12.2) [2018-08-14 13:32:50.764690] I [addr.c:55:compare_addr_and_update] 0-/rhs/brick1/b3: allowed = "*", received addr = "10.70.42.60" [2018-08-14 13:32:50.764778] I [login.c:111:gf_auth] 0-auth/login: allowed user names: 5710b280-df15-4de3-a609-c93c418b2f5c [2018-08-14 13:32:50.764839] I [MSGID: 115029] [server-handshake.c:817:server_setvolume] 0-master-server: accepted client from dhcp42-60.lab.eng.blr.redhat.com-20857-2018/08/14-13:32:50:640529-master-client-2-0-0 (version: 3.12.2) [2018-08-14 13:44:58.612896] I [addr.c:55:compare_addr_and_update] 0-/rhs/brick1/b3: allowed = "*", received addr = "10.70.42.41" [2018-08-14 13:44:58.613025] I [MSGID: 115029] [server-handshake.c:817:server_setvolume] 0-master-server: accepted client from dhcp42-41.lab.eng.blr.redhat.com-8160-2018/08/14-13:44:58:502057-master-client-2-0-0 (version: 3.12.2) [2018-08-14 13:45:55.989063] W [socket.c:593:__socket_rwv] 0-glusterfs: readv on 10.70.41.253:24007 failed (No data available) [2018-08-14 13:45:55.989163] I [glusterfsd-mgmt.c:2349:mgmt_rpc_notify] 0-glusterfsd-mgmt: disconnected from remote-host: 10.70.41.253 [2018-08-14 13:45:55.989193] I [glusterfsd-mgmt.c:2370:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers [2018-08-14 13:46:00.003174] W [glusterfsd.c:1367:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fd657e6fdd5] -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x557d5e50c575] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x557d5e50c3eb] ) 0-: received signum (15), shutting down [2018-08-14 13:46:00.055398] W [socket.c:593:__socket_rwv] 0-master-changelog: readv on /var/run/gluster/.2bb646d408433506236fffd6edfc4b4830209.sock failed (No data available) [2018-08-14 13:46:00.055612] I [timer.c:89:gf_timer_call_cancel] (-->/usr/lib64/glusterfs/3.12.2/xlator/features/changelog.so(+0x191fb) [0x7fd64a5e91fb] -->/lib64/libgfrpc.so.0(rpc_clnt_disable+0x4b) [0x7fd658dd943b] -->/lib64/libglusterfs.so.0(gf_timer_call_cancel+0x179) [0x7fd65901efc9] ) 0-timer: ctx cleanup started [2018-08-14 13:46:00.805317] E [socket.c:3289:socket_connect] 0-glusterfs: connection attempt on 10.70.41.253:24007 failed, (Network is unreachable) [2018-08-14 13:46:00.805471] I [socket.c:3599:socket_submit_request] 0-glusterfs: not connected (priv->connected = -1) [2018-08-14 13:46:00.805489] W [rpc-clnt.c:1679:rpc_clnt_submit] 0-glusterfs: failed to submit rpc-request (XID: 0x6 Program: Gluster Portmap, ProgVers: 1, Proc: 5) to rpc-transport (glusterfs) ========================================================================= So geo-rep's behaviour is expected to be Faulty until the brick is completely up. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2607 |