Bug 963541

Summary: glusterd : 'gluster volume status <vol_name>' some times do not show active task and sometimes it shows tasks which are not active.
Product: [Community] GlusterFS Reporter: Kaushal <kaushal>
Component: glusterd Assignee: Kaushal <kaushal>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: high    
Version: mainline CC: gluster-bugs, kirantpatil, nsathyan, racpatel, rhs-bugs, sasundar, vbellur
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.4.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 961608 Environment:
Last Closed: 2013-07-24 17:40:31 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 961608    

Description Kaushal 2013-05-16 06:23:12 UTC
+++ This bug was initially created as a clone of Bug #961608 +++

Description of problem:
glusterd: 'gluster volume status <vol_name>' sometimes does not show an active task, and sometimes shows tasks that are no longer active.

It is not clear which tasks should be listed and when a task should be removed from the list.

Version-Release number of selected component (if applicable):
3.4.0.4rhs-1.el6rhs.x86_64

How reproducible:
always

Steps to Reproduce:
Observation:
1. The rebalance task is still listed after the rebalance has completed (there is no active task any more). It is removed only when another task becomes active.

2.
a) If you run 'remove-brick start', the task is listed under active tasks until you commit it.

b) But if you run rebalance before the commit, the remove-brick task is removed from the active task list even though the user has not committed it.

3. Run 'remove-brick start' for one brick and, before committing, run 'remove-brick start' for another brick: the first task is removed from the active task list.

e.g.

1.
[root@cutlass ~]# gluster volume rebalance task start force
volume rebalance: task: success: Starting rebalance on volume task has been successful.
ID: decd28ea-d714-4be4-b3cc-0d3ef6faaa86
[root@cutlass ~]# gluster v status task
Status of volume: task
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick cutlass.lab.eng.blr.redhat.com:/rhs/brick1/ta	49155	Y	25148
Brick mia.lab.eng.blr.redhat.com:/rhs/brick1/ta		49166	Y	5156
Brick fan.lab.eng.blr.redhat.com:/rhs/brick1/ta		49171	Y	26280
Brick fred.lab.eng.blr.redhat.com:/rhs/brick1/ta	49176	Y	26342
NFS Server on localhost					2049	Y	25199
NFS Server on 81a2750a-79f6-47f1-ae9b-961aed998238	2049	Y	26352
NFS Server on 94dda48c-1c98-4d56-b8b0-59c88e299af5	2049	Y	26322
NFS Server on 4b6d57e1-7de6-40e0-b53e-ea5331aa39cc	2049	Y	5199
 
           Task                                      ID         Status
           ----                                      --         ------
      Rebalance    decd28ea-d714-4be4-b3cc-0d3ef6faaa86              3
[root@cutlass ~]# gluster volume rebalance task status
                                    Node Rebalanced-files          size       scanned      failures         status run time in secs
                               ---------      -----------   -----------   -----------   -----------   ------------   --------------
                               localhost              104       400.0KB           559             0      completed             7.00
             fred.lab.eng.blr.redhat.com                0        0Bytes           449             0      completed             6.00
              fan.lab.eng.blr.redhat.com              110      1000.0KB           550             0      completed             6.00
              mia.lab.eng.blr.redhat.com              106       100.0KB           553             0      completed             6.00
volume rebalance: task: success: 
[root@cutlass ~]# gluster v status task
Status of volume: task
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick cutlass.lab.eng.blr.redhat.com:/rhs/brick1/ta	49155	Y	25148
Brick mia.lab.eng.blr.redhat.com:/rhs/brick1/ta		49166	Y	5156
Brick fan.lab.eng.blr.redhat.com:/rhs/brick1/ta		49171	Y	26280
Brick fred.lab.eng.blr.redhat.com:/rhs/brick1/ta	49176	Y	26342
NFS Server on localhost					2049	Y	25199
NFS Server on 81a2750a-79f6-47f1-ae9b-961aed998238	2049	Y	26352
NFS Server on 4b6d57e1-7de6-40e0-b53e-ea5331aa39cc	2049	Y	5199
NFS Server on 94dda48c-1c98-4d56-b8b0-59c88e299af5	2049	Y	26322
 
           Task                                      ID         Status
           ----                                      --         ------
      Rebalance    decd28ea-d714-4be4-b3cc-0d3ef6faaa86              3


2.
 a)
[root@cutlass ~]# gluster volume remove-brick task cutlass.lab.eng.blr.redhat.com:/rhs/brick1/ta start
volume remove-brick start: success
ID: 4aa2ac47-5070-40dc-b2e8-87fddf38e7cf
[root@cutlass ~]# gluster v status task
Status of volume: task
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick cutlass.lab.eng.blr.redhat.com:/rhs/brick1/ta	49155	Y	25148
Brick mia.lab.eng.blr.redhat.com:/rhs/brick1/ta		49166	Y	5156
Brick fan.lab.eng.blr.redhat.com:/rhs/brick1/ta		49171	Y	26280
Brick fred.lab.eng.blr.redhat.com:/rhs/brick1/ta	49176	Y	26342
NFS Server on localhost					2049	Y	25199
NFS Server on 81a2750a-79f6-47f1-ae9b-961aed998238	2049	Y	26386
NFS Server on 4b6d57e1-7de6-40e0-b53e-ea5331aa39cc	2049	Y	5232
NFS Server on 94dda48c-1c98-4d56-b8b0-59c88e299af5	2049	Y	26355
 
           Task                                      ID         Status
           ----                                      --         ------
   Remove brick    4aa2ac47-5070-40dc-b2e8-87fddf38e7cf              3
[root@cutlass ~]# gluster volume remove-brick task cutlass.lab.eng.blr.redhat.com:/rhs/brick1/ta status
                                    Node Rebalanced-files          size       scanned      failures         status run-time in secs
                               ---------      -----------   -----------   -----------   -----------   ------------   --------------
                               localhost               97       700.0KB           537             0      completed             4.00
             fred.lab.eng.blr.redhat.com                0        0Bytes             0             0    not started             0.00
              fan.lab.eng.blr.redhat.com                0        0Bytes             0             0    not started             0.00
              mia.lab.eng.blr.redhat.com                0        0Bytes             0             0    not started             0.00
[root@cutlass ~]# gluster v status task
Status of volume: task
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick cutlass.lab.eng.blr.redhat.com:/rhs/brick1/ta	49155	Y	25148
Brick mia.lab.eng.blr.redhat.com:/rhs/brick1/ta		49166	Y	5156
Brick fan.lab.eng.blr.redhat.com:/rhs/brick1/ta		49171	Y	26280
Brick fred.lab.eng.blr.redhat.com:/rhs/brick1/ta	49176	Y	26342
NFS Server on localhost					2049	Y	25199
NFS Server on 81a2750a-79f6-47f1-ae9b-961aed998238	2049	Y	26386
NFS Server on 4b6d57e1-7de6-40e0-b53e-ea5331aa39cc	2049	Y	5232
NFS Server on 94dda48c-1c98-4d56-b8b0-59c88e299af5	2049	Y	26355
 
           Task                                      ID         Status
           ----                                      --         ------
   Remove brick    4aa2ac47-5070-40dc-b2e8-87fddf38e7cf              3
[root@cutlass ~]# gluster volume remove-brick task cutlass.lab.eng.blr.redhat.com:/rhs/brick1/ta commit
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit: success
[root@cutlass ~]# gluster v status task
Status of volume: task
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick mia.lab.eng.blr.redhat.com:/rhs/brick1/ta		49166	Y	5156
Brick fan.lab.eng.blr.redhat.com:/rhs/brick1/ta		49171	Y	26280
Brick fred.lab.eng.blr.redhat.com:/rhs/brick1/ta	49176	Y	26342
NFS Server on localhost					2049	Y	25293
NFS Server on 81a2750a-79f6-47f1-ae9b-961aed998238	2049	Y	26410
NFS Server on 4b6d57e1-7de6-40e0-b53e-ea5331aa39cc	N/A	N	N/A
NFS Server on 94dda48c-1c98-4d56-b8b0-59c88e299af5	2049	Y	26372
 
There are no active volume tasks

b)
[root@cutlass ~]# gluster volume remove-brick task mia.lab.eng.blr.redhat.com:/rhs/brick1/ta start
volume remove-brick start: success
ID: 38eeda20-4238-4d11-8166-bd910d0c800a
[root@cutlass ~]# gluster v status task
Status of volume: task
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick mia.lab.eng.blr.redhat.com:/rhs/brick1/ta		49166	Y	5156
Brick fan.lab.eng.blr.redhat.com:/rhs/brick1/ta		49171	Y	26280
Brick fred.lab.eng.blr.redhat.com:/rhs/brick1/ta	49176	Y	26342
NFS Server on localhost					2049	Y	25318
NFS Server on 81a2750a-79f6-47f1-ae9b-961aed998238	2049	Y	26427
NFS Server on 4b6d57e1-7de6-40e0-b53e-ea5331aa39cc	2049	Y	5249
NFS Server on 94dda48c-1c98-4d56-b8b0-59c88e299af5	2049	Y	26396
 
           Task                                      ID         Status
           ----                                      --         ------
   Remove brick    38eeda20-4238-4d11-8166-bd910d0c800a              0
[root@cutlass ~]# gluster volume rebalance task start force
volume rebalance: task: success: Starting rebalance on volume task has been successful.
ID: d722c113-dc8e-41e5-8a4c-d6897cc4d0e1
[root@cutlass ~]# gluster v status task
Status of volume: task
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick mia.lab.eng.blr.redhat.com:/rhs/brick1/ta		49166	Y	5156
Brick fan.lab.eng.blr.redhat.com:/rhs/brick1/ta		49171	Y	26280
Brick fred.lab.eng.blr.redhat.com:/rhs/brick1/ta	49176	Y	26342
NFS Server on localhost					2049	Y	25318
NFS Server on 81a2750a-79f6-47f1-ae9b-961aed998238	2049	Y	26427
NFS Server on 4b6d57e1-7de6-40e0-b53e-ea5331aa39cc	2049	Y	5249
NFS Server on 94dda48c-1c98-4d56-b8b0-59c88e299af5	2049	Y	26396
 
           Task                                      ID         Status
           ----                                      --         ------
      Rebalance    d722c113-dc8e-41e5-8a4c-d6897cc4d0e1              3



3.
[root@cutlass ~]# gluster volume remove-brick task mia.lab.eng.blr.redhat.com:/rhs/brick1/ta start
volume remove-brick start: success
ID: 07053960-a720-4220-85db-bbd130296f56
[root@cutlass ~]# gluster v status task
Status of volume: task
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick mia.lab.eng.blr.redhat.com:/rhs/brick1/ta		49166	Y	5156
Brick fan.lab.eng.blr.redhat.com:/rhs/brick1/ta		49171	Y	26280
Brick fred.lab.eng.blr.redhat.com:/rhs/brick1/ta	49176	Y	26342
NFS Server on localhost					2049	Y	25367
NFS Server on 4b6d57e1-7de6-40e0-b53e-ea5331aa39cc	2049	Y	5249
NFS Server on 81a2750a-79f6-47f1-ae9b-961aed998238	2049	Y	26462
NFS Server on 94dda48c-1c98-4d56-b8b0-59c88e299af5	2049	Y	26430
 
           Task                                      ID         Status
           ----                                      --         ------
      Rebalance    07053960-a720-4220-85db-bbd130296f56              0
[root@cutlass ~]# gluster v status task
Status of volume: task
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick mia.lab.eng.blr.redhat.com:/rhs/brick1/ta		49166	Y	5156
Brick fan.lab.eng.blr.redhat.com:/rhs/brick1/ta		49171	Y	26280
Brick fred.lab.eng.blr.redhat.com:/rhs/brick1/ta	49176	Y	26342
NFS Server on localhost					2049	Y	25367
NFS Server on 81a2750a-79f6-47f1-ae9b-961aed998238	2049	Y	26462
NFS Server on 4b6d57e1-7de6-40e0-b53e-ea5331aa39cc	2049	Y	5249
NFS Server on 94dda48c-1c98-4d56-b8b0-59c88e299af5	2049	Y	26430
 
           Task                                      ID         Status
           ----                                      --         ------
      Rebalance    07053960-a720-4220-85db-bbd130296f56              0
[root@cutlass ~]# gluster volume remove-brick task mia.lab.eng.blr.redhat.com:/rhs/brick1/ta status
                                    Node Rebalanced-files          size       scanned      failures         status run-time in secs
                               ---------      -----------   -----------   -----------   -----------   ------------   --------------
                               localhost                0        0Bytes             0             0    not started             0.00
             fred.lab.eng.blr.redhat.com                0        0Bytes             0             0    not started             0.00
              fan.lab.eng.blr.redhat.com                0        0Bytes             0             0    not started             0.00
              mia.lab.eng.blr.redhat.com                0        0Bytes           440             0      completed             1.00
[root@cutlass ~]# gluster volume remove-brick task fan.lab.eng.blr.redhat.com:/rhs/brick1/ta start
volume remove-brick start: success
ID: 17d1a925-c38e-4b16-84a2-ecf127325ded
[root@cutlass ~]# gluster v status task
Status of volume: task
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick mia.lab.eng.blr.redhat.com:/rhs/brick1/ta		49166	Y	5156
Brick fan.lab.eng.blr.redhat.com:/rhs/brick1/ta		49171	Y	26280
Brick fred.lab.eng.blr.redhat.com:/rhs/brick1/ta	49176	Y	26342
NFS Server on localhost					2049	Y	25432
NFS Server on 81a2750a-79f6-47f1-ae9b-961aed998238	2049	Y	26511
NFS Server on 94dda48c-1c98-4d56-b8b0-59c88e299af5	2049	Y	26430
NFS Server on 4b6d57e1-7de6-40e0-b53e-ea5331aa39cc	2049	Y	5348
 
           Task                                      ID         Status
           ----                                      --         ------
      Rebalance    17d1a925-c38e-4b16-84a2-ecf127325ded              0

Actual results:
'gluster volume status <vol_name>' sometimes does not show an active task, and sometimes shows tasks that are no longer active.

Expected results:
Behaviour should be consistent for all tasks that are listed under active tasks (rebalance, remove-brick, etc.).

Additional info:

--- Additional comment from Kaushal on 2013-05-16 11:45:33 IST ---

Issues 2 and 3 happen because glusterd keeps track of only one rebalance/remove-brick operation on a volume at a time. Starting another rebalance/remove-brick before committing an earlier remove-brick causes glusterd to stop tracking the earlier task and start tracking the new one. Starting a new remove-brick/rebalance task on the volume shouldn't be allowed before an earlier remove-brick task has been committed. I will make the necessary changes and post a patch for review.
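
The tracking behaviour described in this comment can be illustrated with a small model. This is only a sketch, not glusterd code: the `Volume` class, its single `task` slot and the `start` method are hypothetical names, chosen to show how a second operation replaces the record of an uncommitted remove-brick.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    kind: str      # "rebalance" or "remove-brick"
    task_id: str


class Volume:
    """Toy model: a volume remembers at most one rebalance/remove-brick task."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.task: Optional[Task] = None  # single tracking slot (assumption of this model)

    def start(self, kind: str) -> Task:
        # No check for a pending remove-brick: the new task simply
        # replaces whatever was being tracked before.
        self.task = Task(kind, str(uuid.uuid4()))
        return self.task


vol = Volume("task")
first = vol.start("remove-brick")   # started, never committed
second = vol.start("rebalance")     # overwrites the slot
print(vol.task.kind)                       # rebalance
print(vol.task.task_id == first.task_id)   # False
```

In this model the first remove-brick ID is no longer reachable from the volume once the rebalance starts, which matches what the status output in cases 2(b) and 3 shows.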

Comment 1 Anand Avati 2013-05-16 10:41:19 UTC
REVIEW: http://review.gluster.org/5019 (glusterd: More checks before starting rebalance/remove-brick) posted (#1) for review on master by Kaushal M (kaushal)

Comment 2 Anand Avati 2013-05-16 11:11:03 UTC
REVIEW: http://review.gluster.org/5019 (glusterd: More checks before starting rebalance/remove-brick) posted (#2) for review on master by Kaushal M (kaushal)

Comment 3 Anand Avati 2013-06-19 11:01:45 UTC
REVIEW: http://review.gluster.org/5019 (glusterd: More checks before starting rebalance/remove-brick) posted (#3) for review on master by Kaushal M (kaushal)

Comment 4 Anand Avati 2013-07-03 02:19:01 UTC
COMMIT: http://review.gluster.org/5019 committed in master by Vijay Bellur (vbellur) 
------
commit 7fd38981278c8a51587f1db5b59f8cfeed5c6e5a
Author: Kaushal M <kaushal>
Date:   Thu May 16 16:03:52 2013 +0530

    glusterd: More checks before starting rebalance/remove-brick
    
    Check if a previous remove-brick operation has been committed before
    starting a new rebalance/remove-brick task.
    
    Change-Id: I553e5ba64a6a352ca91032ab1a17997051a4494e
    BUG: 963541
    Signed-off-by: Kaushal M <kaushal>
    Reviewed-on: http://review.gluster.org/5019
    Reviewed-by: Vijay Bellur <vbellur>
    Tested-by: Gluster Build System <jenkins.com>
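
A minimal sketch of the kind of pre-check this commit message describes. It is not the code from the patch: `VolumeState`, `pending_remove_brick`, `check_can_start` and the other function names are hypothetical, and the error text is modelled on the message glusterd prints in the test log in Comment 5.

```python
from dataclasses import dataclass


class TaskStartError(Exception):
    """Raised when a new task may not be started (hypothetical name)."""


@dataclass
class VolumeState:
    name: str
    pending_remove_brick: bool = False  # started but not yet committed or stopped


def check_can_start(vol: VolumeState) -> None:
    # Refuse any new rebalance/remove-brick while an earlier
    # remove-brick is still uncommitted.
    if vol.pending_remove_brick:
        raise TaskStartError(
            f"A remove-brick task on volume {vol.name} is not yet committed. "
            "Either commit or stop the remove-brick task."
        )


def start_remove_brick(vol: VolumeState) -> None:
    check_can_start(vol)
    vol.pending_remove_brick = True


def commit_remove_brick(vol: VolumeState) -> None:
    vol.pending_remove_brick = False


def start_rebalance(vol: VolumeState) -> None:
    check_can_start(vol)


vol = VolumeState("patchy")
start_remove_brick(vol)
try:
    start_rebalance(vol)          # rejected: remove-brick not committed
except TaskStartError as err:
    print(err)
commit_remove_brick(vol)
start_rebalance(vol)              # allowed after the commit
```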

Comment 5 Kiran 2014-08-27 09:21:12 UTC
I ran the bug-963541.t test case from the Gluster regression test suite and it fails.
I have also raised a test case bug for this, bug 1132496.

Version-Release number of selected component (if applicable):
gluster v3.5.2

How reproducible:
Always

Steps to Reproduce:
1. Install the gluster v3.5.2 RPMs on CentOS 6.4
2. Clone gluster from GitHub and check out v3.5.2
3. Run: DEBUG=1 prove tests/bugs/bug-963541.t

[root@fractal-0e6e glusterfs]# DEBUG=1 prove tests/bugs/bug-963541.t
tests/bugs/bug-963541.t .. =========================
TEST 1 (line 7): glusterd
tests/bugs/bug-963541.t .. 1/13 RESULT 1: 0
=========================
TEST 2 (line 8): pidof glusterd
RESULT 2: 0
=========================
TEST 3 (line 10): gluster --mode=script volume create patchy fractal-0e6e:/d/backends/patchy1 fractal-0e6e:/d/backends/patchy2 fractal-0e6e:/d/backends/patchy3
tests/bugs/bug-963541.t .. 3/13 RESULT 3: 0
=========================
TEST 4 (line 11): gluster --mode=script volume start patchy
tests/bugs/bug-963541.t .. 4/13 RESULT 4: 0
=========================
TEST 5 (line 14): gluster --mode=script volume remove-brick patchy fractal-0e6e:/d/backends/patchy1 start
tests/bugs/bug-963541.t .. 5/13 RESULT 5: 0
=========================
TEST 6 (line 16): ! gluster --mode=script volume rebalance patchy start
RESULT 6: 0
=========================
TEST 7 (line 17): ! gluster --mode=script volume remove-brick patchy fractal-0e6e:/d/backends/patchy2 start
RESULT 7: 0
=========================
TEST 8 (line 20): gluster --mode=script volume remove-brick patchy fractal-0e6e:/d/backends/patchy1 commit
volume remove-brick commit: failed: use 'force' option as migration is in progress
RESULT 8: 1
=========================
TEST 9 (line 24): gluster --mode=script volume rebalance patchy start
volume rebalance: patchy: failed: A remove-brick task on volume patchy is not yet committed. Either commit or stop the remove-brick task.
RESULT 9: 1
=========================
TEST 10 (line 25): gluster --mode=script volume rebalance patchy stop
tests/bugs/bug-963541.t .. 10/13 RESULT 10: 0
=========================
TEST 11 (line 27): gluster --mode=script volume remove-brick patchy fractal-0e6e:/d/backends/patchy2 start
tests/bugs/bug-963541.t .. 11/13 RESULT 11: 0
=========================
TEST 12 (line 28): gluster --mode=script volume remove-brick patchy fractal-0e6e:/d/backends/patchy2 stop
RESULT 12: 0
=========================
TEST 13 (line 30): gluster --mode=script volume stop patchy
volume stop: patchy: failed: rebalance session is in progress for the volume 'patchy'
RESULT 13: 1
tests/bugs/bug-963541.t .. Failed 3/13 subtests 

Test Summary Report
-------------------
tests/bugs/bug-963541.t (Wstat: 0 Tests: 13 Failed: 3)
  Failed tests:  8-9, 13
Files=1, Tests=13, 17 wallclock secs ( 0.09 usr  0.02 sys +  0.26 cusr  0.50 csys =  0.87 CPU)
Result: FAIL
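
To pull the failing steps out of a DEBUG run like the one above, the `TEST n (line m): command` and `RESULT n: code` lines can be paired up. This is a rough sketch that assumes the output format shown in this log; the function name and the regular expressions are illustrative and not part of the test framework.

```python
import re

TEST_RE = re.compile(r"TEST (\d+) \(line (\d+)\): (.*)")
RESULT_RE = re.compile(r"RESULT (\d+): (\d+)")


def failed_tests(log: str):
    """Return (test number, script line, command) for every non-zero RESULT."""
    commands = {}
    failures = []
    for line in log.splitlines():
        m = TEST_RE.search(line)
        if m:
            # Remember the script line and command for this test number.
            commands[int(m.group(1))] = (int(m.group(2)), m.group(3).strip())
            continue
        m = RESULT_RE.search(line)
        if m and int(m.group(2)) != 0:
            num = int(m.group(1))
            script_line, cmd = commands.get(num, (None, ""))
            failures.append((num, script_line, cmd))
    return failures


sample = """\
TEST 7 (line 17): ! gluster --mode=script volume remove-brick patchy fractal-0e6e:/d/backends/patchy2 start
RESULT 7: 0
=========================
TEST 8 (line 20): gluster --mode=script volume remove-brick patchy fractal-0e6e:/d/backends/patchy1 commit
volume remove-brick commit: failed: use 'force' option as migration is in progress
RESULT 8: 1
"""
for num, script_line, cmd in failed_tests(sample):
    print(num, script_line, cmd)
```

The regular expressions use `search` rather than `match` because `prove` sometimes prefixes a RESULT line with its own progress text, as in `tests/bugs/bug-963541.t .. 3/13 RESULT 3: 0`.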