Bug 1597648 - "gluster vol heal <volname> info" is locked forever
Summary: "gluster vol heal <volname> info" is locked forever
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.3.1 Async
Assignee: Pranith Kumar K
QA Contact: Vijay Avuthu
URL:
Whiteboard:
Depends On:
Blocks: 1597654 1598189
 
Reported: 2018-07-03 10:56 UTC by Vijay Avuthu
Modified: 2018-07-19 07:38 UTC
CC List: 8 users

Fixed In Version: glusterfs-3.8.4-54.14
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1597654 1598189
Environment:
Last Closed: 2018-07-19 06:00:10 UTC
Embargoed:




Links
System: Red Hat Product Errata, ID: RHBA-2018:2222, Last Updated: 2018-07-19 06:00:49 UTC

Description Vijay Avuthu 2018-07-03 10:56:56 UTC
Description of problem:

While running the automation cases, found that the "gluster vol heal <volname> info" command does not return its output, even though there is not much pending heal.
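
A minimal way to observe the hang from the CLI (a sketch; the 300-second timeout is an arbitrary choice for detecting the hang, and "testvol" is simply the volume used in this run):

timeout 300 gluster volume heal testvol info > /tmp/heal_info   # a hung heal info never returns on its own
echo $?                                                         # 0 = completed normally; 124 = killed by the timeout, i.e. heal info hung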

Version-Release number of selected component (if applicable):

glusterfs-3.8.4-54.13.el7rhgs.x86_64

How reproducible:

Always

Steps to Reproduce:

Automation test ID : test_self_heal_while_running_dbench

subcase :

EXECUTING IO ON MOUNT: dbench_case8
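
The exact steps of the automation case are not reproduced here; the scenario it exercises is roughly the following (a hand-written sketch inferred from the test name only, not the actual test code; volume name, mount point and dbench options are placeholders):

cd /mnt/testvol && dbench 10 &          # keep dbench I/O running on a client mount of the replicate volume
kill -9 <brick-pid>                     # take one brick down (pid from 'gluster volume status testvol') so writes accumulate pending heals
gluster volume start testvol force      # bring the brick back so self-heal starts
gluster volume heal testvol info        # with dbench and self-heal both active, this is the command that hung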


Actual results:

# date
Tue Jul  3 06:43:34 EDT 2018
#
# tail -f distaf_tests_afr_glusterfs_9.log

2018-07-02 12:48:40,271 INFO run Executing gluster volume heal testvol info > /tmp/heal_info on rhsauto056.lab.eng.blr.redhat.com

(the last line in the log is the heal info command being launched at 2018-07-02 12:48 UTC; it had still not returned when the "date" command above was run)


Expected results:

"gluster vol heal <volname> info" should return within a reasonable time.

Additional info:

From state dump:

[root@rhsauto047 gluster]# less bricks-brick0-testvol_brick3.15456.dump.1530610309

inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 27761, owner=0000000000000000, client=0x7ff9380bb370, connection-id=rhsauto028.lab.eng.blr.redhat.com-27361-2018/07/02-15:06:17:567933-testvol-client-3-0-8, granted at 2018-07-02 16:53:38
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 27761, owner=40080fd8ea7f0000, client=0x7ff9380bb370, connection-id=rhsauto028.lab.eng.blr.redhat.com-27361-2018/07/02-15:06:17:567933-testvol-client-3-0-8, blocked at 2018-07-03 09:22:03
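
For reference, the same evidence can be re-collected on an affected brick node roughly as follows (a sketch; /var/run/gluster is the stock statedump directory, adjust if server.statedump-path is set on the volume):

gluster volume statedump testvol                                    # dumps the state of every brick process of the volume
grep -E 'inodelk.*\((ACTIVE|BLOCKED)\)' /var/run/gluster/*.dump.*   # a long-lived ACTIVE lock with BLOCKED entries queued behind it, as above, points at the stuck lock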

brick log:

[2018-07-02 16:53:38.542862] E [inodelk.c:304:__inode_unlock_lock] 0-testvol-locks:  Matching lock not found for unlock 0-9223372036854775807, by 40080fd8ea7f0000 on 0x7ff9380bb370
[2018-07-02 16:53:38.542899] E [MSGID: 115053] [server-rpc-fops.c:325:server_finodelk_cbk] 0-testvol-server: 4274661: FINODELK 4 (eac1b574-bbfd-4bad-a358-96727d37d822) ==> (Invalid argument) [Invalid argument]
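
The failed unlock above carries a different lock owner (40080fd8ea7f0000) than the granted lock in the statedump (all zeros), which appears to be why the ACTIVE lock is never released and the heal info FINODELK stays BLOCKED. To look for the same signature on other setups, the brick logs can be searched (a sketch; /var/log/glusterfs/bricks is the stock brick log directory):

grep 'Matching lock not found for unlock' /var/log/glusterfs/bricks/*.log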

Comment 3 Ravishankar N 2018-07-03 11:05:27 UTC
Karthik tells me that Pranith found what the issue is and is working on the fix, hence assigning it to him.

Comment 5 Pranith Kumar K 2018-07-03 16:28:00 UTC
Hi Vijay,
    As per the code, I could find only one possible cause of the issue we saw. As discussed, I have created test RPMs which I believe should fix the issue. Please let me know how this build performs in that test case.

Link to test-rpm:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16982140

Pranith

Comment 7 Vijay Avuthu 2018-07-04 16:02:41 UTC
Update:
========

I have run the same test case on the build below (taken from comment 5):

glusterfs-3.8.4-54.13.3.git8f9ead154.el7rhgs.x86_64

The test case passed, and there is no lag in getting volume heal info.

2018-07-04 11:13:11,495 INFO wrapper Ending test: test_self_heal_while_running_dbench
2018-07-04 11:13:11,496 INFO wrapper Test execution time for test_self_heal_while_running_dbench: 5:41:22
2018-07-04 11:13:11,497 INFO <module> Total Tests Executed : 1
2018-07-04 11:13:11,497 INFO <module> Total Tests Passed : 1
2018-07-04 11:13:11,497 INFO <module> Total Tests Failed : 0
2018-07-04 11:13:11,498 INFO <module> List of Failed tests: [""]


> Logs with the new test-rpm showing "heal info" completing (this is the point where heal info got stuck last time):

2018-07-04 08:29:09,501 INFO run root.eng.blr.redhat.com (cp): gluster volume heal testvol info > /tmp/heal_info
2018-07-04 08:29:22,928 INFO run RETCODE: 0
2018-07-04 08:29:22,929 INFO get_heal_info Successfully got heal info on the volume testvol
2018-07-04 08:29:22,929 INFO run root.eng.blr.redhat.com (cp): gluster volume info testvol --xml
2018-07-04 08:29:23,135 INFO run RETCODE: 0

Comment 9 Vijay Avuthu 2018-07-10 11:27:31 UTC
Update:
========

glusterfs-3.8.4-54.14.el7rhgs.x86_64


> Ran the same test on the above build and the test case passed without any errors.

 INFO wrapper Testcase test_self_heal_while_running_dbench passed
 INFO wrapper Ending test: test_self_heal_while_running_dbench
 INFO wrapper Test execution time for test_self_heal_while_running_dbench: 3:18:08


Hence moving to Verified

Comment 11 errata-xmlrpc 2018-07-19 06:00:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2222

