Bugzilla will be upgraded to version 5.0. The upgrade date is tentatively scheduled for 2 December 2018, pending final testing and feedback.
Bug 1394219 - Better logging when reporting failures of the kind "<file-path> Failing MKNOD as quorum is not met"
Better logging when reporting failures of the kind "<file-path> Failing MKNOD...
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: replicate (Show other bugs)
3.2
Unspecified Unspecified
unspecified Severity unspecified
: ---
: RHGS 3.2.0
Assigned To: Krutika Dhananjay
nchilaka
: Triaged
Depends On: 1392865
Blocks: 1351528 1393629 1393630 1393631
  Show dependency treegraph
 
Reported: 2016-11-11 07:33 EST by Krutika Dhananjay
Modified: 2017-03-23 02:18 EDT (History)
4 users (show)

See Also:
Fixed In Version: glusterfs-3.8.4-6
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1392865
Environment:
Last Closed: 2017-03-23 02:18:30 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:0486 normal SHIPPED_LIVE Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update 2017-03-23 05:18:45 EDT

  None (edit)
Description Krutika Dhananjay 2016-11-11 07:33:23 EST
+++ This bug was initially created as a clone of Bug #1392865 +++

Description of problem:

Often when debugging quorum failures, it helps to know the exact error code because of which the fop is being failed.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

--- Additional comment from Worker Ant on 2016-11-08 06:56:35 EST ---

REVIEW: http://review.gluster.org/15800 (cluster/afr: When failing fop due to lack of quorum, also log error string) posted (#1) for review on master by Krutika Dhananjay (kdhananj@redhat.com)

--- Additional comment from Worker Ant on 2016-11-09 05:27:50 EST ---

REVIEW: http://review.gluster.org/15800 (cluster/afr: When failing fop due to lack of quorum, also log error string) posted (#2) for review on master by Krutika Dhananjay (kdhananj@redhat.com)

--- Additional comment from Worker Ant on 2016-11-09 10:35:02 EST ---

REVIEW: http://review.gluster.org/15800 (cluster/afr: When failing fop due to lack of quorum, also log error string) posted (#3) for review on master by Krutika Dhananjay (kdhananj@redhat.com)

--- Additional comment from Worker Ant on 2016-11-09 19:45:19 EST ---

COMMIT: http://review.gluster.org/15800 committed in master by Pranith Kumar Karampuri (pkarampu@redhat.com) 
------
commit f2e6cf9d9c578c58cdd348ebcd14dcbfdfbafadd
Author: Krutika Dhananjay <kdhananj@redhat.com>
Date:   Tue Nov 8 17:24:41 2016 +0530

    cluster/afr: When failing fop due to lack of quorum, also log error string
    
    Change-Id: I39de2bcfc660f23a4229a885a0e0420ca949ffc9
    BUG: 1392865
    Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
    Reviewed-on: http://review.gluster.org/15800
    Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
    Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Comment 2 Krutika Dhananjay 2016-11-11 07:41:39 EST
The error string in this particular log message, had it been logged would have been much more helpful in debugging https://bugzilla.redhat.com/show_bug.cgi?id=1370350.

This bug was raised just to fix that - quorum loss log messaging also printing the error string for better debuggability.

Also note that the occurrence of this new information in the log can be easily verified as part of verification of https://bugzilla.redhat.com/show_bug.cgi?id=1370350

-Krutika
Comment 4 Krutika Dhananjay 2016-11-14 05:22:45 EST
https://code.engineering.redhat.com/gerrit/#/c/90066/
Comment 7 nchilaka 2017-01-11 03:40:16 EST
qa_validation:
This is what i see when quorum is lost and the fuse client discovers
(confirmed with dev on the expected o/p, which is as below hence moving to verified

BEFORE:

[2017-01-10 12:17:28.684316] W [MSGID: 108001] [afr-transaction.c:804:afr_handle_quorum] 0-rep3-replicate-0: /dir1/linux-4.9.2/Documentation/devicetree/bindings/display/arm,pl11x.txt: Failing SETATTR as quorum is not met

AFTER:

[2017-01-10 14:21:07.187875] W [MSGID: 108001] [afr-transaction.c:806:afr_handle_quorum] 0-rep3-replicate-0: 0621b9e8-f9c2-435e-b679-267dfba99ef2: Failing WRITE as quorum is not met [Transport endpoint is not connected]


test version:3.8.4-11




===>test version:glusterfs-libs-3.8.4-3.el7rhgs.x86_64
 
[2017-01-11 06:21:14.112038] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f381bb905b2] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f381b95675e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f381b95686e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f381b957fc4] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x7f381b9588a0] ))))) 0-rep3-client-0: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2017-01-11 06:21:14.102905 (xid=0x2b87ea)
[2017-01-11 06:21:14.116366] W [MSGID: 114031] [client-rpc-fops.c:855:client3_3_writev_cbk] 0-rep3-client-0: remote operation failed [Transport endpoint is not connected]
[2017-01-11 06:21:14.119704] I [socket.c:3401:socket_submit_request] 0-rep3-client-0: not connected (priv->connected = 0)
[2017-01-11 06:21:14.119751] W [rpc-clnt.c:1640:rpc_clnt_submit] 0-rep3-client-0: failed to submit rpc-request (XID: 0x2b87eb Program: GlusterFS 3.3, ProgVers: 330, Proc: 16) to rpc-transport (rep3-client-0)
[2017-01-11 06:21:14.119770] W [MSGID: 114031] [client-rpc-fops.c:974:client3_3_fsync_cbk] 0-rep3-client-0: remote operation failed [Transport endpoint is not connected]
[2017-01-11 06:21:14.119794] W [MSGID: 108035] [afr-transaction.c:2150:afr_changelog_fsync_cbk] 0-rep3-replicate-0: fsync(83c34e00-0e26-4089-ba4a-1f54bcb14213) failed on subvolume rep3-client-0. Transaction was WRITE [Transport endpoint is not connected]
[2017-01-11 06:21:14.119948] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-rep3-client-0: disconnected from rep3-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-11 06:21:14.120191] E [MSGID: 114058] [client-handshake.c:1533:client_query_portmap_cbk] 0-rep3-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2017-01-11 06:21:14.409300] W [MSGID: 114031] [client-rpc-fops.c:1830:client3_3_fxattrop_cbk] 0-rep3-client-0: remote operation failed
[2017-01-11 06:21:14.410305] E [MSGID: 114031] [client-rpc-fops.c:1601:client3_3_finodelk_cbk] 0-rep3-client-0: remote operation failed [Transport endpoint is not connected]
[2017-01-11 06:21:24.704657] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-rep3-client-0: disconnected from rep3-client-0. Client process will keep trying to connect to glusterd until brick's port is available


=====================>
test version:glusterfs-api-3.8.4-11.el7rhgs.x86_64




[2017-01-11 06:51:28.108980] W [socket.c:590:__socket_rwv] 0-rep3-client-1: readv on 10.70.47.105:49154 failed (No data available)
[2017-01-11 06:51:28.111944] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7ff70a0b2602] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7ff709e7876e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7ff709e7887e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7ff709e79fe4] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x94)[0x7ff709e7a874] ))))) 0-rep3-client-1: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2017-01-11 06:51:28.108777 (xid=0xd20c8)
[2017-01-11 06:51:28.111991] W [MSGID: 114031] [client-rpc-fops.c:855:client3_3_writev_cbk] 0-rep3-client-1: remote operation failed [Transport endpoint is not connected]
[2017-01-11 06:51:28.112031] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-rep3-client-1: disconnected from rep3-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2017-01-11 06:51:28.112217] W [MSGID: 114061] [client-common.c:460:client_pre_fsync] 0-rep3-client-1:  (dce618ba-01fa-453d-b8ed-67af2f77d1aa) remote_fd is -1. EBADFD [File descriptor in bad state]
[2017-01-11 06:51:28.112267] W [MSGID: 108035] [afr-transaction.c:2221:afr_changelog_fsync_cbk] 0-rep3-replicate-0: fsync(dce618ba-01fa-453d-b8ed-67af2f77d1aa) failed on subvolume rep3-client-1. Transaction was WRITE [File descriptor in bad state]
[2017-01-11 06:51:33.425876] W [MSGID: 114031] [client-rpc-fops.c:1830:client3_3_fxattrop_cbk] 0-rep3-client-1: remote operation failed
[2017-01-11 06:51:33.430676] E [MSGID: 114031] [client-rpc-fops.c:1601:client3_3_finodelk_cbk] 0-rep3-client-1: remote operation failed [Transport endpoint is not connected]
[2017-01-11 06:51:38.485138] E [MSGID: 114058] [client-handshake.c:1533:client_query_portmap_cbk] 0-rep3-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2017-01-11 06:51:38.485228] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-rep3-client-1: disconnected from rep3-client-1. Client process will keep trying to connect to glusterd until brick's port is available





steps:
1)create a 3 node cluster
2)create a 1x3 volume
3)enable server quorum by setting cluster.server-quorum-type to server
4) mount vol on a client a do a dd of 5 gb
5)while file write is happening bring down glusterd of n2,n3
the above logs will be observed on fuse

and below is the difference before and after fix


BEFORE:

[2017-01-10 12:17:28.684316] W [MSGID: 108001] [afr-transaction.c:804:afr_handle_quorum] 0-rep3-replicate-0: /dir1/linux-4.9.2/Documentation/devicetree/bindings/display/arm,pl11x.txt: Failing SETATTR as quorum is not met

AFTER:

[2017-01-10 14:21:07.187875] W [MSGID: 108001] [afr-transaction.c:806:afr_handle_quorum] 0-rep3-replicate-0: 0621b9e8-f9c2-435e-b679-267dfba99ef2: Failing WRITE as quorum is not met [Transport endpoint is not connected]
Comment 9 errata-xmlrpc 2017-03-23 02:18:30 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

Note You need to log in before you can comment on or make changes to this bug.