Bug 1437318 - device info doesn't list all the underlying bricks after concurrent operations such as device remove and volume create are run in parallel
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: heketi
Version: cns-3.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: CNS 3.5
Assignee: Raghavendra Talur
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Depends On:
Blocks: 1415600
 
Reported: 2017-03-30 04:29 UTC by krishnaram Karthick
Modified: 2017-04-20 18:38 UTC (History)
7 users

Fixed In Version: heketi-4.0.0-6.el7rhgs
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-04-20 18:38:34 UTC
Embargoed:


Attachments (Terms of Use)
topology info, heketi logs, gluster vol info outputs (18.08 KB, application/zip)
2017-03-30 04:36 UTC, krishnaram Karthick


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2017:1111 0 normal SHIPPED_LIVE heketi bug fix and enhancement update 2017-04-20 22:37:02 UTC

Description krishnaram Karthick 2017-03-30 04:29:38 UTC
Description of problem:

On a CNS setup, when volume provisioning and device remove operations are run in parallel, all the operations succeed, but device info doesn't list all the bricks; i.e., only 6 out of 8 bricks are listed in device info.

Running device remove on such a device with inconsistent device info could lead to a situation where bricks are not cleaned up completely.

[root@dhcp46-202 ~]# heketi-cli device info 11b2fd1b238afb0b62dbd4a7d3d42263
Device Id: 11b2fd1b238afb0b62dbd4a7d3d42263
Name: /dev/sde
State: online
Size (GiB): 299
Used (GiB): 10
Free (GiB): 289
Bricks:
Id:0092b14fbdf176ead4339881527a07a8   Size (GiB):2       Path: /var/lib/heketi/mounts/vg_11b2fd1b238afb0b62dbd4a7d3d42263/brick_0092b14fbdf176ead4339881527a07a8/brick
Id:b66a520fddb08b8632c22aabe9319e24   Size (GiB):2       Path: /var/lib/heketi/mounts/vg_11b2fd1b238afb0b62dbd4a7d3d42263/brick_b66a520fddb08b8632c22aabe9319e24/brick
Id:bb1c1eb34ed64c8894adb2392711e9ec   Size (GiB):2       Path: /var/lib/heketi/mounts/vg_11b2fd1b238afb0b62dbd4a7d3d42263/brick_bb1c1eb34ed64c8894adb2392711e9ec/brick
Id:cf16d487542871c69b0abc4a6adf6bd3   Size (GiB):1       Path: /var/lib/heketi/mounts/vg_11b2fd1b238afb0b62dbd4a7d3d42263/brick_cf16d487542871c69b0abc4a6adf6bd3/brick
Id:e7d45280d809f5fa41cb47e042d3eb19   Size (GiB):1       Path: /var/lib/heketi/mounts/vg_11b2fd1b238afb0b62dbd4a7d3d42263/brick_e7d45280d809f5fa41cb47e042d3eb19/brick
Id:fb7950d913ab25f360709183741fabc0   Size (GiB):2       Path: /var/lib/heketi/mounts/vg_11b2fd1b238afb0b62dbd4a7d3d42263/brick_fb7950d913ab25f360709183741fabc0/brick

[please refer to the attached topology info for detailed output]


[root@dhcp47-78 ~]# pvs
  PV         VG                                  Fmt  Attr PSize   PFree  
  /dev/sda2  rhel_dhcp47-183                     lvm2 a--  100.00g      0 
  /dev/sdd   vg_0ab42999bff09fb8519983c08747a5ae lvm2 a--  299.87g 299.87g
  /dev/sde   vg_11b2fd1b238afb0b62dbd4a7d3d42263 lvm2 a--  299.87g 287.78g
  /dev/sdf   vg_a0d0eb24659c776d974d7f9ade26c425 lvm2 a--   99.87g  99.87g

[root@dhcp47-78 ~]# lvs
  LV                                     VG                                  Attr       LSize  Pool                                Origin Data%  Meta%  Move Log Cpy%Sync Convert
  home                                   rhel_dhcp47-183                     -wi-ao---- 50.00g                                                                                   
  root                                   rhel_dhcp47-183                     -wi-ao---- 50.00g                                                                                   
  brick_0092b14fbdf176ead4339881527a07a8 vg_11b2fd1b238afb0b62dbd4a7d3d42263 Vwi-aotz--  2.00g tp_0092b14fbdf176ead4339881527a07a8        0.74                                   
  brick_59712ce42fe1412fb2259be73e164e56 vg_11b2fd1b238afb0b62dbd4a7d3d42263 Vwi-aotz--  1.00g tp_59712ce42fe1412fb2259be73e164e56        1.12                                   
  brick_b66a520fddb08b8632c22aabe9319e24 vg_11b2fd1b238afb0b62dbd4a7d3d42263 Vwi-aotz--  2.00g tp_b66a520fddb08b8632c22aabe9319e24        10.47                                  
  brick_bb1c1eb34ed64c8894adb2392711e9ec vg_11b2fd1b238afb0b62dbd4a7d3d42263 Vwi-aotz--  2.00g tp_bb1c1eb34ed64c8894adb2392711e9ec        10.47                                  
  brick_bd202a73e2e54924bd2dfc94927569ad vg_11b2fd1b238afb0b62dbd4a7d3d42263 Vwi-aotz--  1.00g tp_bd202a73e2e54924bd2dfc94927569ad        1.12                                   
  brick_cf16d487542871c69b0abc4a6adf6bd3 vg_11b2fd1b238afb0b62dbd4a7d3d42263 Vwi-aotz--  1.00g tp_cf16d487542871c69b0abc4a6adf6bd3        20.65                                  
  brick_e7d45280d809f5fa41cb47e042d3eb19 vg_11b2fd1b238afb0b62dbd4a7d3d42263 Vwi-aotz--  1.00g tp_e7d45280d809f5fa41cb47e042d3eb19        1.15                                   
  brick_fb7950d913ab25f360709183741fabc0 vg_11b2fd1b238afb0b62dbd4a7d3d42263 Vwi-aotz--  2.00g tp_fb7950d913ab25f360709183741fabc0        10.56                                  
  tp_0092b14fbdf176ead4339881527a07a8    vg_11b2fd1b238afb0b62dbd4a7d3d42263 twi-aotz--  2.00g                                            0.74   0.33                            
  tp_59712ce42fe1412fb2259be73e164e56    vg_11b2fd1b238afb0b62dbd4a7d3d42263 twi-aotz--  1.00g                                            1.12   0.49                            
  tp_b66a520fddb08b8632c22aabe9319e24    vg_11b2fd1b238afb0b62dbd4a7d3d42263 twi-aotz--  2.00g                                            10.47  0.52                            
  tp_bb1c1eb34ed64c8894adb2392711e9ec    vg_11b2fd1b238afb0b62dbd4a7d3d42263 twi-aotz--  2.00g                                            10.47  0.52                            
  tp_bd202a73e2e54924bd2dfc94927569ad    vg_11b2fd1b238afb0b62dbd4a7d3d42263 twi-aotz--  1.00g                                            1.12   0.49                            
  tp_cf16d487542871c69b0abc4a6adf6bd3    vg_11b2fd1b238afb0b62dbd4a7d3d42263 twi-aotz--  1.00g                                            20.65  0.78                            
  tp_e7d45280d809f5fa41cb47e042d3eb19    vg_11b2fd1b238afb0b62dbd4a7d3d42263 twi-aotz--  1.00g                                            1.15   0.49                            
  tp_fb7950d913ab25f360709183741fabc0    vg_11b2fd1b238afb0b62dbd4a7d3d42263 twi-aotz--  2.00g                                            10.56  0.52                            


10.70.47.78:/var/lib/heketi/mounts/vg_11b2fd1b238afb0b62dbd4a7d3d42263/brick_fb7950d913ab25f360709183741fabc0/brick
10.70.47.78:/var/lib/heketi/mounts/vg_11b2fd1b238afb0b62dbd4a7d3d42263/brick_cf16d487542871c69b0abc4a6adf6bd3/brick
10.70.47.78:/var/lib/heketi/mounts/vg_11b2fd1b238afb0b62dbd4a7d3d42263/brick_b66a520fddb08b8632c22aabe9319e24/brick
10.70.47.78:/var/lib/heketi/mounts/vg_11b2fd1b238afb0b62dbd4a7d3d42263/brick_59712ce42fe1412fb2259be73e164e56/brick    --> Missing
10.70.47.78:/var/lib/heketi/mounts/vg_11b2fd1b238afb0b62dbd4a7d3d42263/brick_0092b14fbdf176ead4339881527a07a8/brick
10.70.47.78:/var/lib/heketi/mounts/vg_11b2fd1b238afb0b62dbd4a7d3d42263/brick_e7d45280d809f5fa41cb47e042d3eb19/brick
10.70.47.78:/var/lib/heketi/mounts/vg_11b2fd1b238afb0b62dbd4a7d3d42263/brick_bb1c1eb34ed64c8894adb2392711e9ec/brick
10.70.47.78:/var/lib/heketi/mounts/vg_11b2fd1b238afb0b62dbd4a7d3d42263/brick_bd202a73e2e54924bd2dfc94927569ad/brick  --> Missing


Brick information for the following two bricks is missing under device info:

10.70.47.78:/var/lib/heketi/mounts/vg_11b2fd1b238afb0b62dbd4a7d3d42263/brick_59712ce42fe1412fb2259be73e164e56/brick
10.70.47.78:/var/lib/heketi/mounts/vg_11b2fd1b238afb0b62dbd4a7d3d42263/brick_bd202a73e2e54924bd2dfc94927569ad/brick

However, all volumes were created successfully [please refer to the attached gluster vol info output].
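One way to spot this kind of mismatch is to diff the sorted brick paths known to gluster against those reported by heketi. A minimal sketch, assuming the two lists have been saved to hypothetical files gluster_bricks.txt and heketi_bricks.txt (one brick path per line, extracted from the gluster vol info and heketi-cli device info outputs):

```shell
# Sort both (hypothetical) brick-path lists in place so comm can compare them.
sort -o gluster_bricks.txt gluster_bricks.txt
sort -o heketi_bricks.txt heketi_bricks.txt

# Print paths present in gluster's list but absent from heketi device info.
comm -23 gluster_bricks.txt heketi_bricks.txt
```

In this report, that diff would surface exactly the two missing brick paths listed above.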

heketi-cli node info 65deb4410d8147343febf3bb5643e176
Node Id: 65deb4410d8147343febf3bb5643e176
State: online
Cluster Id: 27ac988a3e7bf097c3c289f402c8cf24
Zone: 2
Management Hostname: dhcp47-78.lab.eng.blr.redhat.com
Storage Hostname: 10.70.47.78
Devices:
Id:11b2fd1b238afb0b62dbd4a7d3d42263   Name:/dev/sde            State:online    Size (GiB):299     Used (GiB):10      Free (GiB):289     
Id:a0d0eb24659c776d974d7f9ade26c425   Name:/dev/sdf            State:failed    Size (GiB):99      Used (GiB):0       Free (GiB):99      

Corresponding volumes and PVC details:

vol_f767bea3158ccc5f4f80a1d27cc93b22 --> glusterfs-dynamic-mongodb-19
vol_a1f30eda3cb826ab34c3d8cad35cfa23 --> glusterfs-dynamic-mongodb-6


Version-Release number of selected component (if applicable):
heketi-client-4.0.0-4.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
1. Have a CNS setup with node-{1,2,3}, each with device-{1,2}
2. Keep creating volumes until the test is over
3. Run device disable and device remove on node-1, device-{1,2} one by one; device remove proceeds for device-1 while device-2 is offline, and device remove on device-2 fails
4. Once device remove on node-1, device-1 completes, stop volume creation
5. Check whether the brick count on node-1, device-2 corresponds to the number of volumes created in the background
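The steps above can be sketched as a rough shell script. This is a sketch only, not a verbatim reproduction: the device IDs are placeholders, and the heketi-cli invocations assume the heketi 4.x subcommands used elsewhere in this report.

```shell
#!/bin/sh
# Sketch only: DEVICE1/DEVICE2 are placeholder heketi device IDs on node-1.
DEVICE1=<device-1-id>
DEVICE2=<device-2-id>

# Step 2: keep creating small volumes in the background until stopped.
while heketi-cli volume create --size=1; do :; done &
CREATE_PID=$!

# Step 3: disable and remove node-1's devices one by one.
heketi-cli device disable "$DEVICE1"
heketi-cli device remove "$DEVICE1"
heketi-cli device disable "$DEVICE2"
heketi-cli device remove "$DEVICE2"   # expected to fail per step 3

# Step 4: stop volume creation once device-1 removal completes.
kill "$CREATE_PID"

# Step 5: inspect device-2's brick list and compare it with the number
# of volumes created in the background.
heketi-cli device info "$DEVICE2"
```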

Actual results:
Brick information for 2 volumes is missing

Expected results:
Brick information for all the volumes should be present

Additional info:

All logs and CLI outputs will be attached shortly.

oc get pods -o wide
NAME                             READY     STATUS    RESTARTS   AGE       IP             NODE
glusterfs-1rksf                  1/1       Running   1          1d        10.70.47.180   dhcp47-180.lab.eng.blr.redhat.com
glusterfs-3t02m                  1/1       Running   4          1d        10.70.47.51    dhcp47-51.lab.eng.blr.redhat.com
glusterfs-ks6zl                  1/1       Running   1          1d        10.70.47.65    dhcp47-65.lab.eng.blr.redhat.com
glusterfs-nh11g                  1/1       Running   1          1d        10.70.47.21    dhcp47-21.lab.eng.blr.redhat.com
glusterfs-qzrdm                  1/1       Running   1          1d        10.70.47.78    dhcp47-78.lab.eng.blr.redhat.com
glusterfs-z89sm                  1/1       Running   1          1d        10.70.46.165   dhcp46-165.lab.eng.blr.redhat.com
heketi-1-j6b9n                   1/1       Running   1          1d        10.130.2.10    dhcp46-165.lab.eng.blr.redhat.com
mongodb-1-1-tmrc5                1/1       Running   1          1d        10.129.2.5     dhcp47-65.lab.eng.blr.redhat.com
mongodb-19-1-tgpd1               1/1       Running   0          56m       10.130.2.12    dhcp46-165.lab.eng.blr.redhat.com
mongodb-2-1-cgcqv                1/1       Running   1          1d        10.128.2.12    dhcp47-21.lab.eng.blr.redhat.com
mongodb-20-1-0l4b8               1/1       Running   0          57m       10.129.2.7     dhcp47-65.lab.eng.blr.redhat.com
mongodb-3-1-0518g                1/1       Running   2          1h        10.129.0.8     dhcp47-51.lab.eng.blr.redhat.com
mongodb-4-1-wlprf                1/1       Running   0          1h        10.128.2.13    dhcp47-21.lab.eng.blr.redhat.com
mongodb-5-1-cxf1h                1/1       Running   0          57m       10.130.0.6     dhcp47-78.lab.eng.blr.redhat.com
mongodb-6-1-p7nw1                1/1       Running   0          56m       10.131.0.7     dhcp47-180.lab.eng.blr.redhat.com
storage-project-router-1-l68vj   1/1       Running   1          2d        10.70.47.78    dhcp47-78.lab.eng.blr.redhat.com

Comment 2 krishnaram Karthick 2017-03-30 04:36:03 UTC
Created attachment 1267412 [details]
topology info, heketi logs, gluster vol info outputs

Comment 6 Michael Adam 2017-04-05 12:20:36 UTC
We have an RCA and a patch is underway.
This is very critical to fix, even though it is not a regression.
Hence giving devel-ack.

Comment 10 Raghavendra Talur 2017-04-07 15:04:29 UTC
Patch posted and merged upstream https://github.com/heketi/heketi/pull/736

Comment 11 krishnaram Karthick 2017-04-11 10:24:44 UTC
The issue reported is no longer seen with heketi-4.0.0-6.el7rhgs. Ran the following tests:

1) volume create + device remove - 3 iterations
2) volume delete + device remove

Moving the bug to verified.

Comment 12 errata-xmlrpc 2017-04-20 18:38:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1111

