Bug 1412136 - [RHV-RHGS]: Rebalance process hung infinitely, which triggered after adding bricks.
Summary: [RHV-RHGS]: Rebalance process hung infinitely, which triggered after adding bricks.
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: distribute
Version: rhgs-3.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Raghavendra G
QA Contact: Prasad Desala
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-01-11 10:43 UTC by Byreddy
Modified: 2019-12-31 07:23 UTC (History)
7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-22 08:24:13 UTC
Embargoed:


Attachments (Terms of Use)

Description Byreddy 2017-01-11 10:43:20 UTC
Description of problem:
========================
The rebalance process, triggered after adding bricks, hung indefinitely.

I was testing a VM use case: I converted a 1x3 volume to a 2x3 volume and triggered rebalance from the RHEV engine.

The rebalance operation never completed.

This issue looks like it happened because of a stale lock: the frame status remains the same across volume statedumps taken at a 15-minute interval.

Version-Release number of selected component (if applicable):
============================================================
glusterfs-3.8.4-11

How reproducible:
=================
One time


Steps to Reproduce:
===================
1. Have an RHV-RHGS setup with 3 RHGS nodes and 2 clients (hosts).
2. Create a 1x3 volume.
3. Create an application VM using the storage created in step 2.
4. Convert the volume to a 2x3 volume and trigger rebalance.
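The steps above map roughly to the following gluster CLI sequence (a sketch; the hostnames and brick paths are placeholders, and the volume name is taken from the rebalance status output later in this bug):

```shell
# Step 2: create a 1x3 (replica 3) volume -- hosts and brick paths are placeholders
gluster volume create Dis-Rep1 replica 3 \
    node1:/bricks/brick1/br1 node2:/bricks/brick1/br1 node3:/bricks/brick1/br1
gluster volume start Dis-Rep1

# Step 4: add a second replica set, converting the volume to 2x3, then rebalance
gluster volume add-brick Dis-Rep1 replica 3 \
    node1:/bricks/brick2/br1 node2:/bricks/brick2/br1 node3:/bricks/brick2/br1
gluster volume rebalance Dis-Rep1 start
```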


Actual results:
===============
The rebalance process, triggered after adding bricks, hung indefinitely.


Expected results:
=================
The rebalance process should not hang.

Additional info:

Comment 3 Byreddy 2017-01-11 10:49:49 UTC
The frame status remains the same across volume statedumps taken at a 15-minute interval.

[root@dhcp42-35 gluster]# grep -A2 -B2 -E "ACTIVE|BLOCK" ./rhs-brick2-br1.16325.dump.1484126221
lock-dump.domain.domain=Dis-Rep1-replicate-1:metadata
lock-dump.domain.domain=dht.layout.heal
inodelk.inodelk[0](ACTIVE)=type=READ, whence=0, start=0, len=0, pid = 22549, owner=8084ba1bd57f0000, client=0x7f0fdc0081e0, connection-id=rhs-client30.lab.eng.blr.redhat.com-11057-2017/01/10-06:35:34:507910-Dis-Rep1-client-3-15-0, granted at 2017-01-11 05:49:20
inodelk.inodelk[1](ACTIVE)=type=READ, whence=0, start=0, len=0, pid = 22544, owner=78abb91bd57f0000, client=0x7f0fdc0081e0, connection-id=rhs-client30.lab.eng.blr.redhat.com-11057-2017/01/10-06:35:34:507910-Dis-Rep1-client-3-15-0, granted at 2017-01-11 05:49:16
inodelk.inodelk[2](ACTIVE)=type=READ, whence=0, start=0, len=0, pid = 22527, owner=9c58b51bd57f0000, client=0x7f0fdc0081e0, connection-id=rhs-client30.lab.eng.blr.redhat.com-11057-2017/01/10-06:35:34:507910-Dis-Rep1-client-3-15-0, granted at 2017-01-11 05:49:16
inodelk.inodelk[3](ACTIVE)=type=READ, whence=0, start=0, len=0, pid = 22527, owner=5cccb51bd57f0000, client=0x7f0fdc0081e0, connection-id=rhs-client30.lab.eng.blr.redhat.com-11057-2017/01/10-06:35:34:507910-Dis-Rep1-client-3-15-0, granted at 2017-01-11 05:49:15
inodelk.inodelk[4](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551613, owner=a8a7410e827f0000, client=0x7f0fe410c710, connection-id=dhcp43-245.lab.eng.blr.redhat.com-23190-2017/01/11-05:57:18:82599-Dis-Rep1-client-3-0-0, blocked at 2017-01-11 05:57:23
inodelk.inodelk[5](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551613, owner=9854e5d9147f0000, client=0x7f0fdc000cb0, connection-id=dhcp42-105.lab.eng.blr.redhat.com-7831-2017/01/11-05:57:18:96386-Dis-Rep1-client-3-0-0, blocked at 2017-01-11 05:57:23
inodelk.inodelk[6](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551613, owner=20e66ccb027f0000, client=0x7f0fe40f13c0, connection-id=dhcp42-35.lab.eng.blr.redhat.com-24794-2017/01/11-05:57:13:32093-Dis-Rep1-client-3-0-0, blocked at 2017-01-11 05:57:24
lock-dump.domain.domain=Dis-Rep1-replicate-1

[root@dhcp42-35 gluster]# 
[root@dhcp42-35 gluster]# grep -A2 -B2 -E "ACTIVE|BLOCK"  ./rhs-brick2-br1.16325.dump.1484129309
lock-dump.domain.domain=Dis-Rep1-replicate-1:metadata
lock-dump.domain.domain=dht.layout.heal
inodelk.inodelk[0](ACTIVE)=type=READ, whence=0, start=0, len=0, pid = 22549, owner=8084ba1bd57f0000, client=0x7f0fdc0081e0, connection-id=rhs-client30.lab.eng.blr.redhat.com-11057-2017/01/10-06:35:34:507910-Dis-Rep1-client-3-15-0, granted at 2017-01-11 05:49:20
inodelk.inodelk[1](ACTIVE)=type=READ, whence=0, start=0, len=0, pid = 22544, owner=78abb91bd57f0000, client=0x7f0fdc0081e0, connection-id=rhs-client30.lab.eng.blr.redhat.com-11057-2017/01/10-06:35:34:507910-Dis-Rep1-client-3-15-0, granted at 2017-01-11 05:49:16
inodelk.inodelk[2](ACTIVE)=type=READ, whence=0, start=0, len=0, pid = 22527, owner=9c58b51bd57f0000, client=0x7f0fdc0081e0, connection-id=rhs-client30.lab.eng.blr.redhat.com-11057-2017/01/10-06:35:34:507910-Dis-Rep1-client-3-15-0, granted at 2017-01-11 05:49:16
inodelk.inodelk[3](ACTIVE)=type=READ, whence=0, start=0, len=0, pid = 22527, owner=5cccb51bd57f0000, client=0x7f0fdc0081e0, connection-id=rhs-client30.lab.eng.blr.redhat.com-11057-2017/01/10-06:35:34:507910-Dis-Rep1-client-3-15-0, granted at 2017-01-11 05:49:15
inodelk.inodelk[4](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551613, owner=a8a7410e827f0000, client=0x7f0fe410c710, connection-id=dhcp43-245.lab.eng.blr.redhat.com-23190-2017/01/11-05:57:18:82599-Dis-Rep1-client-3-0-0, blocked at 2017-01-11 05:57:23
inodelk.inodelk[5](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551613, owner=9854e5d9147f0000, client=0x7f0fdc000cb0, connection-id=dhcp42-105.lab.eng.blr.redhat.com-7831-2017/01/11-05:57:18:96386-Dis-Rep1-client-3-0-0, blocked at 2017-01-11 05:57:23
inodelk.inodelk[6](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551613, owner=20e66ccb027f0000, client=0x7f0fe40f13c0, connection-id=dhcp42-35.lab.eng.blr.redhat.com-24794-2017/01/11-05:57:13:32093-Dis-Rep1-client-3-0-0, blocked at 2017-01-11 05:57:24
lock-dump.domain.domain=Dis-Rep1-replicate-1

[root@dhcp42-35 gluster]#
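One quick way to confirm the locks are stale, as done above by eyeballing the two dumps, is to summarize the inodelk lines from each dump and compare; a minimal sketch against the statedump format shown above:

```shell
# summarize_inodelks: read brick statedump text on stdin and count
# granted (ACTIVE) vs queued (BLOCKED) inode locks. If the counts (and the
# owner/granted-at fields) are identical across dumps taken many minutes
# apart, the granted locks are likely stale and are starving the waiters.
summarize_inodelks() {
  awk '/inodelk\.inodelk\[[0-9]+\]\(ACTIVE\)/  { active++ }
       /inodelk\.inodelk\[[0-9]+\]\(BLOCKED\)/ { blocked++ }
       END { printf "active=%d blocked=%d\n", active + 0, blocked + 0 }'
}

# Example usage:
#   summarize_inodelks < rhs-brick2-br1.16325.dump.1484126221
#   summarize_inodelks < rhs-brick2-br1.16325.dump.1484129309
```

Both dumps in this bug would report active=4 blocked=3, matching the unchanged frame status noted in comment 3.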

Comment 4 Nithya Balachandran 2017-01-11 10:53:23 UTC
Assigning this to Raghavendra G as he has already looked at the setup.

Comment 6 Byreddy 2017-01-11 11:41:46 UTC
rebalance cli cmd status:
=========================
 ~]# gluster volume rebalance Dis-Rep1 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                6       260.5MB            18             0             0          in progress        5:43:34
       dhcp43-245.lab.eng.blr.redhat.com                0        0Bytes             0             0             0          in progress        5:43:34
       dhcp42-105.lab.eng.blr.redhat.com                0        0Bytes             0             0             0          in progress        5:43:34
volume rebalance: Dis-Rep1: success
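The bug does not record how the hang was eventually cleared. For a confirmed stale lock, one standard gluster tool is `clear-locks`; a hedged sketch (the file path is a placeholder, and clearing locks on a live volume should be done with care, only after a fresh statedump confirms the lock is stuck):

```shell
# Take a fresh statedump to confirm the inode lock is still held
gluster volume statedump Dis-Rep1

# Release the blocked inode locks on the affected file (path is a placeholder)
gluster volume clear-locks Dis-Rep1 /path/to/affected-file kind blocked inode
```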

Comment 10 Byreddy 2017-01-16 09:41:27 UTC
I hit this issue only one time in one week of rhv-rhgs testing.

