Bug 1394219
| Summary: | Better logging when reporting failures of the kind "<file-path> Failing MKNOD as quorum is not met" | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Krutika Dhananjay <kdhananj> |
| Component: | replicate | Assignee: | Krutika Dhananjay <kdhananj> |
| Status: | CLOSED ERRATA | QA Contact: | Nag Pavan Chilakam <nchilaka> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.2 | CC: | bugs, rhinduja, rhs-bugs, storage-qa-internal |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | RHGS 3.2.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.8.4-6 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1392865 | Environment: | |
| Last Closed: | 2017-03-23 06:18:30 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1392865 | | |
| Bug Blocks: | 1351528, 1393629, 1393630, 1393631 | | |
Description (Krutika Dhananjay, 2016-11-11 12:33:23 UTC)
The error string in this particular log message, had it been logged, would have been much more helpful in debugging https://bugzilla.redhat.com/show_bug.cgi?id=1370350. This bug was raised just to fix that: the quorum-loss log message should also print the error string, for better debuggability. Also note that the occurrence of this new information in the log can easily be verified as part of the verification of https://bugzilla.redhat.com/show_bug.cgi?id=1370350.

-Krutika

QA validation: This is what I see when quorum is lost and the fuse client discovers it (confirmed with dev on the expected output, which is as below; hence moving to VERIFIED).

BEFORE:

```
[2017-01-10 12:17:28.684316] W [MSGID: 108001] [afr-transaction.c:804:afr_handle_quorum] 0-rep3-replicate-0: /dir1/linux-4.9.2/Documentation/devicetree/bindings/display/arm,pl11x.txt: Failing SETATTR as quorum is not met
```

AFTER:

```
[2017-01-10 14:21:07.187875] W [MSGID: 108001] [afr-transaction.c:806:afr_handle_quorum] 0-rep3-replicate-0: 0621b9e8-f9c2-435e-b679-267dfba99ef2: Failing WRITE as quorum is not met [Transport endpoint is not connected]
```
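Going by the two messages above, the mechanism behind the improvement is that the failing fop's errno is now fed into the log call, so the logging framework can append the matching strerror() text in square brackets. Below is a minimal, standalone C illustration of that pattern. It is not the actual glusterfs patch; log_msg() is a hypothetical stand-in for glusterfs's gf_msg(), and the path passed in the "before" call is illustrative.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for glusterfs's gf_msg(): when a non-zero
 * errnum is passed, the human-readable error string is appended to
 * the formatted message in square brackets. */
static void
log_msg (const char *domain, int errnum, const char *fmt, ...)
{
        va_list ap;

        fprintf (stderr, "W [MSGID: 108001] 0-%s: ", domain);
        va_start (ap, fmt);
        vfprintf (stderr, fmt, ap);
        va_end (ap);
        if (errnum != 0)
                fprintf (stderr, " [%s]", strerror (errnum));
        fputc ('\n', stderr);
}

int
main (void)
{
        /* Before the fix: errnum 0, so no error string is appended
         * and the reader cannot tell why the fop failed. */
        log_msg ("rep3-replicate-0", 0,
                 "%s: Failing %s as quorum is not met",
                 "/dir1/some/file.txt", "SETATTR");

        /* After the fix: the fop's errno (here ENOTCONN) is passed,
         * so "[Transport endpoint is not connected]" shows up in the
         * log, as in the AFTER message above. */
        log_msg ("rep3-replicate-0", ENOTCONN,
                 "%s: Failing %s as quorum is not met",
                 "0621b9e8-f9c2-435e-b679-267dfba99ef2", "WRITE");

        return 0;
}
```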
Test version: 3.8.4-11

===> test version: glusterfs-libs-3.8.4-3.el7rhgs.x86_64

```
[2017-01-11 06:21:14.112038] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f381bb905b2] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f381b95675e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f381b95686e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f381b957fc4] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x7f381b9588a0] ))))) 0-rep3-client-0: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2017-01-11 06:21:14.102905 (xid=0x2b87ea)
[2017-01-11 06:21:14.116366] W [MSGID: 114031] [client-rpc-fops.c:855:client3_3_writev_cbk] 0-rep3-client-0: remote operation failed [Transport endpoint is not connected]
[2017-01-11 06:21:14.119704] I [socket.c:3401:socket_submit_request] 0-rep3-client-0: not connected (priv->connected = 0)
[2017-01-11 06:21:14.119751] W [rpc-clnt.c:1640:rpc_clnt_submit] 0-rep3-client-0: failed to submit rpc-request (XID: 0x2b87eb Program: GlusterFS 3.3, ProgVers: 330, Proc: 16) to rpc-transport (rep3-client-0)
[2017-01-11 06:21:14.119770] W [MSGID: 114031] [client-rpc-fops.c:974:client3_3_fsync_cbk] 0-rep3-client-0: remote operation failed [Transport endpoint is not connected]
[2017-01-11 06:21:14.119794] W [MSGID: 108035] [afr-transaction.c:2150:afr_changelog_fsync_cbk] 0-rep3-replicate-0: fsync(83c34e00-0e26-4089-ba4a-1f54bcb14213) failed on subvolume rep3-client-0. Transaction was WRITE [Transport endpoint is not connected]
[2017-01-11 06:21:14.119948] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-rep3-client-0: disconnected from rep3-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-11 06:21:14.120191] E [MSGID: 114058] [client-handshake.c:1533:client_query_portmap_cbk] 0-rep3-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2017-01-11 06:21:14.409300] W [MSGID: 114031] [client-rpc-fops.c:1830:client3_3_fxattrop_cbk] 0-rep3-client-0: remote operation failed
[2017-01-11 06:21:14.410305] E [MSGID: 114031] [client-rpc-fops.c:1601:client3_3_finodelk_cbk] 0-rep3-client-0: remote operation failed [Transport endpoint is not connected]
[2017-01-11 06:21:24.704657] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-rep3-client-0: disconnected from rep3-client-0. Client process will keep trying to connect to glusterd until brick's port is available
```

=====================> test version: glusterfs-api-3.8.4-11.el7rhgs.x86_64

```
[2017-01-11 06:51:28.108980] W [socket.c:590:__socket_rwv] 0-rep3-client-1: readv on 10.70.47.105:49154 failed (No data available)
[2017-01-11 06:51:28.111944] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7ff70a0b2602] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7ff709e7876e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7ff709e7887e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7ff709e79fe4] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x94)[0x7ff709e7a874] ))))) 0-rep3-client-1: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2017-01-11 06:51:28.108777 (xid=0xd20c8)
[2017-01-11 06:51:28.111991] W [MSGID: 114031] [client-rpc-fops.c:855:client3_3_writev_cbk] 0-rep3-client-1: remote operation failed [Transport endpoint is not connected]
[2017-01-11 06:51:28.112031] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-rep3-client-1: disconnected from rep3-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-11 06:51:28.112217] W [MSGID: 114061] [client-common.c:460:client_pre_fsync] 0-rep3-client-1: (dce618ba-01fa-453d-b8ed-67af2f77d1aa) remote_fd is -1. EBADFD [File descriptor in bad state]
[2017-01-11 06:51:28.112267] W [MSGID: 108035] [afr-transaction.c:2221:afr_changelog_fsync_cbk] 0-rep3-replicate-0: fsync(dce618ba-01fa-453d-b8ed-67af2f77d1aa) failed on subvolume rep3-client-1. Transaction was WRITE [File descriptor in bad state]
[2017-01-11 06:51:33.425876] W [MSGID: 114031] [client-rpc-fops.c:1830:client3_3_fxattrop_cbk] 0-rep3-client-1: remote operation failed
[2017-01-11 06:51:33.430676] E [MSGID: 114031] [client-rpc-fops.c:1601:client3_3_finodelk_cbk] 0-rep3-client-1: remote operation failed [Transport endpoint is not connected]
[2017-01-11 06:51:38.485138] E [MSGID: 114058] [client-handshake.c:1533:client_query_portmap_cbk] 0-rep3-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2017-01-11 06:51:38.485228] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-rep3-client-1: disconnected from rep3-client-1. Client process will keep trying to connect to glusterd until brick's port is available
```

Steps:

1. Create a 3-node cluster.
2. Create a 1x3 volume.
3. Enable server quorum by setting cluster.server-quorum-type to server.
4. Mount the volume on a client and do a dd of 5 GB.
5. While the file write is happening, bring down glusterd on n2 and n3.

The above logs will be observed on the fuse client (a command-line sketch of these steps follows this list).
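For reference, the five steps map to roughly the command sequence below. This is a hypothetical sketch, not taken verbatim from the report: the volume name rep3, the brick paths, and the hostnames n1-n3 are assumptions.

```sh
# On one of the servers (after peer-probing n2 and n3):
gluster volume create rep3 replica 3 n1:/bricks/b1 n2:/bricks/b2 n3:/bricks/b3
gluster volume set rep3 cluster.server-quorum-type server
gluster volume start rep3

# On the client: mount the volume and start a 5 GB write.
mount -t glusterfs n1:/rep3 /mnt/rep3
dd if=/dev/zero of=/mnt/rep3/bigfile bs=1M count=5120 &

# While the dd is still running, break server quorum by stopping
# glusterd on n2 and n3, then watch the fuse client log
# (/var/log/glusterfs/mnt-rep3.log) for the quorum messages.
ssh n2 systemctl stop glusterd
ssh n3 systemctl stop glusterd
```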
Below is the difference before and after the fix:

BEFORE:

```
[2017-01-10 12:17:28.684316] W [MSGID: 108001] [afr-transaction.c:804:afr_handle_quorum] 0-rep3-replicate-0: /dir1/linux-4.9.2/Documentation/devicetree/bindings/display/arm,pl11x.txt: Failing SETATTR as quorum is not met
```

AFTER:

```
[2017-01-10 14:21:07.187875] W [MSGID: 108001] [afr-transaction.c:806:afr_handle_quorum] 0-rep3-replicate-0: 0621b9e8-f9c2-435e-b679-267dfba99ef2: Failing WRITE as quorum is not met [Transport endpoint is not connected]
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html