Description of problem:
While running the automation cases, we found that "gluster vol heal <volname> info" hangs without returning any output, even though there is very little heal pending.

Version-Release number of selected component (if applicable):
glusterfs-3.8.4-54.13.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
Automation test ID: test_self_heal_while_running_dbench
Subcase: EXECUTING IO ON MOUNT: dbench_case8

Actual results:
# date
Tue Jul 3 06:43:34 EDT 2018
#
# tail -f distaf_tests_afr_glusterfs_9.log
2018-07-02 12:48:40,271 INFO run Executing gluster volume heal testvol info > /tmp/heal_info on rhsauto056.lab.eng.blr.redhat.com

Expected results:
"gluster volume heal testvol info" should return the heal info output promptly.

Additional info:

From statedump:

[root@rhsauto047 gluster]# less bricks-brick0-testvol_brick3.15456.dump.1530610309
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 27761, owner=0000000000000000, client=0x7ff9380bb370, connection-id=rhsauto028.lab.eng.blr.redhat.com-27361-2018/07/02-15:06:17:567933-testvol-client-3-0-8, granted at 2018-07-02 16:53:38
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 27761, owner=40080fd8ea7f0000, client=0x7ff9380bb370, connection-id=rhsauto028.lab.eng.blr.redhat.com-27361-2018/07/02-15:06:17:567933-testvol-client-3-0-8, blocked at 2018-07-03 09:22:03

Brick log:

[2018-07-02 16:53:38.542862] E [inodelk.c:304:__inode_unlock_lock] 0-testvol-locks: Matching lock not found for unlock 0-9223372036854775807, by 40080fd8ea7f0000 on 0x7ff9380bb370
[2018-07-02 16:53:38.542899] E [MSGID: 115053] [server-rpc-fops.c:325:server_finodelk_cbk] 0-testvol-server: 4274661: FINODELK 4 (eac1b574-bbfd-4bad-a358-96727d37d822) ==> (Invalid argument) [Invalid argument]
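For reference, a minimal way to capture and inspect the lock state shown above; "testvol" is the volume from this bug, and the default statedump directory is assumed:

# gluster volume statedump testvol                      # dump the state of every brick process of the volume
# grep -B1 'BLOCKED' /var/run/gluster/bricks-*.dump.*   # print each blocked inodelk with the granted lock above it

A long-granted ACTIVE inodelk with BLOCKED waiters queued behind it, as in the dump above, is what keeps heal info waiting.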
Karthik tells me that Pranith has found the root cause and is working on the fix, hence assigning it to him.
SOS Reports : http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/vavuthu/bug_1597648_lock_issue/
Hi Vijay,

Going through the code, I could find only one possible cause for the issue we saw. As discussed, I created test rpms with what I believe is the fix for it. Please let me know how this build performs in that test case.

Link to test rpm: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16982140

Pranith
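For anyone wanting to try such a scratch build, a rough sketch of applying it on the affected nodes; the exact rpm set and restart steps here are assumptions, not verified instructions:

# yum update ./glusterfs*.rpm    # the test rpms downloaded from the brew task above
# systemctl restart glusterd     # brick processes may also need a restart to pick up the fixed binaries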
Update:
========

I ran the same test case on the build below (taken from comment 5):

glusterfs-3.8.4-54.13.3.git8f9ead154.el7rhgs.x86_64

The test case passed, and there is no lag in getting volume heal info.

2018-07-04 11:13:11,495 INFO wrapper Ending test: test_self_heal_while_running_dbench
2018-07-04 11:13:11,496 INFO wrapper Test execution time for test_self_heal_while_running_dbench: 5:41:22
2018-07-04 11:13:11,497 INFO <module> Total Tests Executed : 1
2018-07-04 11:13:11,497 INFO <module> Total Tests Passed : 1
2018-07-04 11:13:11,497 INFO <module> Total Tests Failed : 0
2018-07-04 11:13:11,498 INFO <module> List of Failed tests: [""]

> Logs with the new test rpm showing "heal info" returning (this is the point where heal info got stuck last time):

2018-07-04 08:29:09,501 INFO run root.eng.blr.redhat.com (cp): gluster volume heal testvol info > /tmp/heal_info
2018-07-04 08:29:22,928 INFO run RETCODE: 0
2018-07-04 08:29:22,929 INFO get_heal_info Successfully got heal info on the volume testvol
2018-07-04 08:29:22,929 INFO run root.eng.blr.redhat.com (cp): gluster volume info testvol --xml
2018-07-04 08:29:23,135 INFO run RETCODE: 0
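A quick manual spot-check equivalent to what the automation log above shows; "testvol" is the volume used by the test:

# time gluster volume heal testvol info > /tmp/heal_info   # should complete in seconds rather than hang
# echo $?                                                  # expect 0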
Update:
========

glusterfs-3.8.4-54.14.el7rhgs.x86_64

> Ran the same test on the above build, and the test case passed without any errors.

INFO wrapper Testcase test_self_heal_while_running_dbench passed
INFO wrapper Ending test: test_self_heal_while_running_dbench
INFO wrapper Test execution time for test_self_heal_while_running_dbench: 3:18:08

Hence moving to Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2222