Bug 1024643 - quota+rebalance: bricks are down and rebalance results in stopped status [NEEDINFO]
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: quota
Hardware: x86_64  OS: Linux
Priority: low  Severity: high
Assigned To: Vijaikumar Mallikarjuna
Depends On:
Reported: 2013-10-30 03:13 EDT by Saurabh
Modified: 2016-09-17 08:38 EDT
CC List: 7 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2015-11-17 03:51:28 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
vmallika: needinfo? (saujain)

Attachments: None
Description Saurabh 2013-10-30 03:13:05 EDT
Description of problem:

In this case I had a volume with quota enabled, a limit set on the root of the volume and on the directories underneath, and some I/O going on.
The I/O was happening over an NFS mount, in two different directories.

Invoked an add-brick and a rebalance following it.

Since the volume held quite a lot of data, roughly 2.4 TB, the rebalance kept going for a long time.

But the rebalance ended with status "stopped" on all nodes of the cluster.

The cause of the rebalance stopping was the bricks going down.
But I could not find the reason why the bricks went down; this remains an open question.

Version-Release number of selected component (if applicable):

How reproducible:
Seen on one RHS cluster of 4 nodes.

Steps to Reproduce:
Rather than exact steps, I am providing the scenario in which I saw this issue:
1. A quota-enabled volume with a limit set on "/" and on the directories underneath.
2. Data already present inside the directories.
3. In two more directories, keep creating data via two different script executions running in parallel.

4. add-brick + rebalance (a minimal sketch of the whole scenario follows below).
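A minimal shell sketch of this scenario, assuming the volume name dist-rep (as in the output below) and hypothetical server names, brick paths, and test directories; the gluster commands are standard CLI, everything else is illustrative:

# Enable quota and set a limit on the volume root; per-directory limits
# are set the same way (the exact limits used are listed under
# "Additional info" below).
gluster volume quota dist-rep enable
gluster volume quota dist-rep limit-usage / 2.9TB 80%

# Mount over NFS (Gluster's built-in NFS server speaks NFSv3) and keep
# two writers running in parallel in two different directories.
mount -t nfs -o vers=3 server1:/dist-rep /mnt/dist-rep
for d in dir-a dir-b; do
    ( i=0
      while true; do
          dd if=/dev/zero of=/mnt/dist-rep/$d/file.$((i++)) bs=1M count=100
      done ) &
done

# While the writers are running, add a brick pair and start a rebalance.
gluster volume add-brick dist-rep server5:/bricks/b1 server6:/bricks/b1
gluster volume rebalance dist-rep start
gluster volume rebalance dist-rep status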

Actual results:
[root@quota5 ~]# gluster volume rebalance dist-rep status
                                    Node Rebalanced-files          size       scanned      failures       skipped         status run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------   ------------   --------------
                               localhost           149234         1.6GB        477247             5           534        stopped         62895.00
                              87293       977.8MB        735462            20        184122        stopped         62894.00
                                447         8.7MB        712639             5           235        stopped         62894.00
                                  0        0Bytes        712494             5             0        stopped         62892.00
       rhsauto004.lab.eng.blr.redhat.com                0        0Bytes        713601             5             0        stopped         62893.00

Because the bricks went down:
[root@quota6 ~]# gluster volume status
Status of volume: dist-rep
Gluster process						Port	Online	Pid
Brick			N/A	N	9376
Brick			N/A	N	9157
Brick			49152	Y	9151
Brick			49152	Y	9148
Brick			49153	Y	9387
Brick			49153	Y	9168
Brick			49153	Y	9162
Brick			49153	Y	9159
Brick			N/A	N	9398
Brick			49154	Y	9179
Brick			49154	Y	9173
Brick			49154	Y	9170
Brick			49155	Y	11217
Brick			49155	Y	10092
NFS Server on localhost					2049	Y	10104
Self-heal Daemon on localhost				N/A	Y	10111
Quota Daemon on localhost				N/A	Y	10118
NFS Server on				2049	Y	10191
Self-heal Daemon on			N/A	Y	10200
Quota Daemon on				N/A	Y	10205
NFS Server on				2049	Y	10086
Self-heal Daemon on			N/A	Y	10095
Quota Daemon on				N/A	Y	10100
NFS Server on rhsauto004.lab.eng.blr.redhat.com		2049	Y	15138
Self-heal Daemon on rhsauto004.lab.eng.blr.redhat.com	N/A	Y	15145
Quota Daemon on rhsauto004.lab.eng.blr.redhat.com	N/A	Y	15153
NFS Server on				2049	Y	11236
Self-heal Daemon on			N/A	Y	11243
Quota Daemon on				N/A	Y	11250
           Task                                      ID         Status
           ----                                      --         ------
      Rebalance    719dd624-b733-47c9-a487-296ec18544c7              2

But from the logs I could not make out why the bricks went down.
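A minimal sketch of where one could look on each node whose bricks show Online = N, assuming the default GlusterFS log locations (brick logs under /var/log/glusterfs/bricks/); the grep patterns are only illustrative:

# Scan the brick logs for crashes or errors around the time of the stop.
grep -iE "signal received|crash|error" /var/log/glusterfs/bricks/*.log | tail -n 50

# Check whether the kernel OOM killer terminated the brick processes.
dmesg | grep -i -e "out of memory" -e oom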

Expected results:
Bricks should not go down, and the rebalance should finish with a healthy status rather than "stopped". Is this happening because the quota limit had already been reached on some of the directories?

Additional info:

[root@quota6 ~]# gluster volume quota dist-rep list 
                  Path                   Hard-limit Soft-limit   Used  Available
/                                          2.9TB       80%       1.3TB   1.6TB
/qa1                                     512.0GB       80%     421.5GB  90.5GB
/qa2                                     512.0GB       80%     399.7GB 112.3GB
/qa3                                     100.0GB       80%      83.4GB  16.6GB
/qa4                                     100.0GB       80%      83.3GB  16.7GB
/qa1/dir1                                500.0GB       80%     337.8GB 162.2GB
/qa2/dir1                                500.0GB       80%     316.4GB 183.6GB
/qa5                                     500.0GB       80%     361.7GB 138.3GB
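For reference, limits like these would be set with gluster volume quota <VOLNAME> limit-usage <path> <hard-limit> [<soft-limit-percent>]; a sketch matching the first few entries above:

gluster volume quota dist-rep limit-usage /         2.9TB 80%
gluster volume quota dist-rep limit-usage /qa1      512GB 80%
gluster volume quota dist-rep limit-usage /qa1/dir1 500GB 80%
# ...and likewise for /qa2 through /qa5.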
Comment 5 Vijaikumar Mallikarjuna 2015-11-17 03:51:28 EST
Please file a new bug if this issue is still seen in 3.1.x.
