Bug 1787294 - Improve logging in EC, client and lock xlator
Summary: Improve logging in EC, client and lock xlator
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: disperse
Version: unspecified
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: RHGS 3.5.z Batch Update 3
Assignee: Ashish Pandey
QA Contact: Pranav Prakash
URL:
Whiteboard:
Depends On: 1779760 1906420
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-01-02 09:01 UTC by Ashish Pandey
Modified: 2020-12-17 04:51 UTC (History)
8 users (show)

Fixed In Version: glusterfs-6.0-38
Doc Type: No Doc Update
Doc Text:
Clone Of: 1779760
Environment:
Last Closed: 2020-12-17 04:51:13 UTC
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:5603 0 None None None 2020-12-17 04:51:38 UTC

Comment 9 Pranav Prakash 2020-11-05 11:09:51 UTC
The following scenario was performed to verify the logging improvement.

1. Create a dispersed volume
2. Mount to 2 or more clients
3. Perfom IO on the clients
4. Kill few bricks
5. Bring the bricks up
6. Verify the logs


[root@dhcp43-237 ~]# gluster v status
Status of volume: testvol_dispersed
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.43.188:/gluster/bricks/brick1/t
estvol_dispersed_brick0                     49152     0          Y       107025
Brick 10.70.43.5:/gluster/bricks/brick1/tes
tvol_dispersed_brick1                       49152     0          Y       67553
Brick 10.70.43.245:/gluster/bricks/brick1/t
estvol_dispersed_brick2                     49152     0          Y       67689
Brick 10.70.41.159:/gluster/bricks/brick1/t
estvol_dispersed_brick3                     49152     0          Y       68209
Brick 10.70.43.237:/gluster/bricks/brick1/t
estvol_dispersed_brick4                     49152     0          Y       67796
Brick 10.70.43.224:/gluster/bricks/brick1/t
estvol_dispersed_brick5                     49152     0          Y       67553
Self-heal Daemon on localhost               N/A       N/A        Y       67813
Self-heal Daemon on 10.70.43.245            N/A       N/A        Y       67706
Self-heal Daemon on 10.70.43.224            N/A       N/A        Y       67571
Self-heal Daemon on 10.70.41.159            N/A       N/A        Y       68226
Self-heal Daemon on 10.70.43.5              N/A       N/A        Y       67570
Self-heal Daemon on dhcp43-188.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       107042
 
Task Status of Volume testvol_dispersed
------------------------------------------------------------------------------
There are no active volume tasks
 

Performed IO on clients, brought bricks down on couple of nodes and then up. 

----------
OBSERVATIONS:

The logs, for e.g 'remote option failed' log in the previous versions, are now providing additional information, thus improving the logging. 
Below is one eg. 

```
[2020-11-03 05:35:32.171859] W [MSGID: 114031] [client-rpc-fops_v2.c:2635:client4_0_lookup_cbk] 0-testvol_dispersed-client-2: remote operation failed. Path: /user1/testfile1.txt (00000000-0000-0000-0000-000000000000) [Transport endpoint is not connected]

[2020-11-03 05:37:34.346233] W [MSGID: 114031] [client-rpc-fops_v2.c:2116:client4_0_create_cbk] 3-testvol_dispersed-client-2: remote operation failed. Path: /file114 [Input/output error]

```

Now additional information about which operation and why it is failed can be identified from the logs.

Similarly,
```
[2020-11-03 05:37:34.737417] W [MSGID: 122053] [ec-common.c:329:ec_check_status] 3-testvol_dispersed-disperse-0: Operation failed on 1 of 6 subvolumes.(up=111111, mask=111111, remaining=000000, good=111011, bad=000100, FOP : 'CREATE' failed on '/file116' with gfid 00000000-0000-0000-0000-000000000000)
[2020-11-03 05:37:34.741320] W [MSGID: 122053] [ec-common.c:329:ec_check_status] 3-testvol_dispersed-disperse-0: Operation failed on 1 of 6 subvolumes.(up=111111, mask=111011, remaining=000000, good=111011, bad=000100, FOP : 'FLUSH' failed on gfid 09c1992b-3190-45b0-9ab8-b39e27e12e32)
[2020-11-03 05:37:34.761148] W [MSGID: 122053] [ec-common.c:329:ec_check_status] 3-testvol_dispersed-disperse-0: Operation failed on 1 of 6 subvolumes.(up=111111, mask=111011, remaining=000000, good=111011, bad=000100, FOP : 'WRITE' failed on gfid 09c1992b-3190-45b0-9ab8-b39e27e12e32)

[2020-11-03 05:38:00.132336] W [MSGID: 122053] [ec-common.c:329:ec_check_status] 3-testvol_dispersed-disperse-0: Operation failed on 1 of 6 subvolumes.(up=101111, mask=101111, remaining=000000, good=101011, bad=000100, FOP : 'LOOKUP' failed on '/user38' with gfid 585b50e9-303c-47f4-a67c-75db7fa73878)

2020-11-03 05:38:02.168378] W [MSGID: 122033] [ec-common.c:1914:ec_locked] 3-testvol_dispersed-disperse-0: Failed to complete preop lock [Stale file handle]


```


The above observations are verified in the following glusterfs version : 
```
glusterfs-6.0-46.el8rhgs.x86_64

```
------------------------------------
Marking it as Verifed.

Comment 11 errata-xmlrpc 2020-12-17 04:51:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (glusterfs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5603


Note You need to log in before you can comment on or make changes to this bug.