Bug 1437798 - when remove device fails in the middle of migration, retrying device remove on the same device fails with error 'Id not found'
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: heketi
Version: cns-3.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: CNS 3.9
Assignee: Raghavendra Talur
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Depends On:
Blocks: 1526414
Reported: 2017-03-31 08:35 UTC by krishnaram Karthick
Modified: 2019-04-22 22:46 UTC
CC List: 9 users

Fixed In Version: heketi-6.0.0-7.el7rhgs
Doc Type: Bug Fix
Doc Text:
Earlier, it was possible to run multiple device remove operations in parallel on the same device. This led to race conditions and database inconsistencies. With this fix, an error is returned while another device remove operation on the same device is already in progress.
Clone Of:
Environment:
Last Closed: 2018-04-05 03:08:10 UTC
Embargoed:


Attachments
heketi_logs_March8 (1.48 MB, text/plain)
2018-03-08 03:39 UTC, krishnaram Karthick


Links
System: Red Hat Product Errata
ID: RHEA-2018:0638
Private: 0
Priority: None
Status: None
Summary: None
Last Updated: 2018-04-05 03:09:27 UTC

Description krishnaram Karthick 2017-03-31 08:35:23 UTC
Description of problem:
On a device hosting bricks for 100 volumes, device remove was run and failed after migrating 25 bricks (BZ#1437747). Re-running device remove on the same device now fails with the error 'Id not found'.

                Id:03279ea02e78439382a985db0c4d92ed   Name:/dev/sdh            State:online    Size (GiB):299     Used (GiB):0       Free (GiB):299
                Id:2da562dbb930fc8d90f78f70075f8c2a   Name:/dev/sdg            State:offline   Size (GiB):299     Used (GiB):100     Free (GiB):199
                Id:693eea662e30520ea007f830db992fc3   Name:/dev/sde            State:offline   Size (GiB):99      Used (GiB):2       Free (GiB):97
                Id:8799415839ed9364e1e62187695f82f0   Name:/dev/sdf            State:offline   Size (GiB):99      Used (GiB):0       Free (GiB):99
                Id:3f107588b1dabdb108131d801db3786a   Name:/dev/sdg            State:online    Size (GiB):299     Used (GiB):0       Free (GiB):299
                Id:71049fe1ea71ce342fa4110c04cc7f98   Name:/dev/sdh            State:offline   Size (GiB):299     Used (GiB):100     Free (GiB):199
                Id:8cbd06ebe6af08c39c7aa6ddc47cd09b   Name:/dev/sde            State:offline   Size (GiB):99      Used (GiB):2       Free (GiB):97
                Id:bec377642422514ad42d4d0eaa8553a1   Name:/dev/sdf            State:offline   Size (GiB):99      Used (GiB):0       Free (GiB):99
                Id:749041ba2b997ff9fab0f7b238e12ae9   Name:/dev/sdg            State:offline   Size (GiB):99      Used (GiB):0       Free (GiB):99
                Id:775747b65771ac369114ce29bf62b12c   Name:/dev/sdh            State:online    Size (GiB):299     Used (GiB):100     Free (GiB):199
                Id:c2f7deb4b600944bf6ea63719b0111b1   Name:/dev/sdd            State:offline   Size (GiB):15      Used (GiB):0       Free (GiB):15
                Id:db9a3b71d84274c7a951c726a2c992f2   Name:/dev/sdf            State:offline   Size (GiB):99      Used (GiB):2       Free (GiB):97
[root@dhcp46-202 ~]# heketi-cli device remove 2da562dbb930fc8d90f78f70075f8c2a


Error: Failed to remove device, error: Unable to replace brick 10.70.46.165:/var/lib/heketi/mounts/vg_2da562dbb930fc8d90f78f70075f8c2a/brick_5d8102dff119938e2154099b5c8f9ead/brick with 10.70.46.165:/var/lib/heketi/mounts/vg_03279ea02e78439382a985db0c4d92ed/brick_b29a30ee75969d5fa633cbd4235674d3/brick for volume vol_8f34efd3674e66fb5369d214ab2b4578
[root@dhcp46-202 ~]#

                Id:03279ea02e78439382a985db0c4d92ed   Name:/dev/sdh            State:online    Size (GiB):299     Used (GiB):25      Free (GiB):274
                Id:2da562dbb930fc8d90f78f70075f8c2a   Name:/dev/sdg            State:offline   Size (GiB):299     Used (GiB):75      Free (GiB):224
                Id:693eea662e30520ea007f830db992fc3   Name:/dev/sde            State:offline   Size (GiB):99      Used (GiB):2       Free (GiB):97
                Id:8799415839ed9364e1e62187695f82f0   Name:/dev/sdf            State:offline   Size (GiB):99      Used (GiB):0       Free (GiB):99
                Id:3f107588b1dabdb108131d801db3786a   Name:/dev/sdg            State:online    Size (GiB):299     Used (GiB):0       Free (GiB):299
                Id:71049fe1ea71ce342fa4110c04cc7f98   Name:/dev/sdh            State:offline   Size (GiB):299     Used (GiB):100     Free (GiB):199
                Id:8cbd06ebe6af08c39c7aa6ddc47cd09b   Name:/dev/sde            State:offline   Size (GiB):99      Used (GiB):2       Free (GiB):97
                Id:bec377642422514ad42d4d0eaa8553a1   Name:/dev/sdf            State:offline   Size (GiB):99      Used (GiB):0       Free (GiB):99
                Id:749041ba2b997ff9fab0f7b238e12ae9   Name:/dev/sdg            State:offline   Size (GiB):99      Used (GiB):0       Free (GiB):99
                Id:775747b65771ac369114ce29bf62b12c   Name:/dev/sdh            State:online    Size (GiB):299     Used (GiB):100     Free (GiB):199
                Id:c2f7deb4b600944bf6ea63719b0111b1   Name:/dev/sdd            State:offline   Size (GiB):15      Used (GiB):0       Free (GiB):15
                Id:db9a3b71d84274c7a951c726a2c992f2   Name:/dev/sdf            State:offline   Size (GiB):99      Used (GiB):2       Free (GiB):97

[root@dhcp46-202 ~]# heketi-cli device remove 2da562dbb930fc8d90f78f70075f8c2a
Error: Failed to remove device, error: Id not found

Snippet of the heketi log:

[sshexec] DEBUG 2017/03/31 08:17:36 /src/github.com/heketi/heketi/executors/sshexec/volume.go:258: {OpRet:0 OpErrno:0 OpErrStr: VolInfo:{XMLName:{Space: Local:volInfo} Volumes:{XMLName:{Space: Local:volumes} Count:1 VolumeList:[{XMLName:{Space: Local:volume} VolumeName:vol_8f34efd3674e66fb5369d214ab2b4578 ID:186d66fd-9d86-4954-a3e6-4878ddeb4693 Status:1 StatusStr:Started BrickCount:3 DistCount:3 StripeCount:1 ReplicaCount:3 ArbiterCount:0 DisperseCount:0 RedundancyCount:0 Type:2 TypeStr:Replicate Transport:0 Bricks:{XMLName:{Space: Local:bricks} BrickList:[{UUID:8947622f-a43d-462b-9e06-f4f8f77e2e4c Name:10.70.46.165:/var/lib/heketi/mounts/vg_03279ea02e78439382a985db0c4d92ed/brick_b29a30ee75969d5fa633cbd4235674d3/brick HostUUID:8947622f-a43d-462b-9e06-f4f8f77e2e4c IsArbiter:0} {UUID:6d101c9b-b039-4f7b-a47a-363dd4b0b84a Name:10.70.47.21:/var/lib/heketi/mounts/vg_775747b65771ac369114ce29bf62b12c/brick_a4d7887a8b268467970339899fc92591/brick HostUUID:6d101c9b-b039-4f7b-a47a-363dd4b0b84a IsArbiter:0} {UUID:325abd36-aacf-498c-b8ec-6ede7776f8a3 Name:10.70.47.51:/var/lib/heketi/mounts/vg_71049fe1ea71ce342fa4110c04cc7f98/brick_69fe7f9f5885d53ae15ca4da4cf8457c/brick HostUUID:325abd36-aacf-498c-b8ec-6ede7776f8a3 IsArbiter:0}]} OptCount:3 Options:{XMLName:{Space: Local:options} OptionList:[{Name:transport.address-family Value:inet} {Name:performance.readdir-ahead Value:on} {Name:nfs.disable Value:on}]}}]}}}
[asynchttp] INFO 2017/03/31 08:17:36 Completed job 5cd014a9e5af9ff946db1fe118ec17c8 in 265.711397ms
[heketi] ERROR 2017/03/31 08:17:36 /src/github.com/heketi/heketi/apps/glusterfs/volume_entry_allocate.go:148: Unable to create brick entry using brick name:10.70.46.165:/var/lib/heketi/mounts/vg_03279ea02e78439382a985db0c4d92ed/brick_b29a30ee75969d5fa633cbd4235674d3/brick, error: Id not found
[heketi] ERROR 2017/03/31 08:17:36 /src/github.com/heketi/heketi/apps/glusterfs/device_entry.go:479: Failed to remove device, error: Id not found
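One plausible reading of this log: the gluster volume info captured during the retry already lists the replacement brick on device 03279ea02e78439382a985db0c4d92ed, so the first attempt's brick replacement appears to have taken effect in Gluster even though heketi reported a failure, while heketi's database presumably still records the original brick on device 2da562dbb930fc8d90f78f70075f8c2a. On the retry, heketi cannot map that Gluster brick name to a database entry (volume_entry_allocate.go:148) and returns 'Id not found'. The Go sketch below illustrates that mismatch by comparing the two brick lists. The brick names come from this report; the function `unknownBricks` and the in-memory set standing in for heketi's database are hypothetical, and the database contents are an assumption.

```go
package main

import "fmt"

// Brick names copied from the error message and gluster volume info in this report.
const (
	oldBrick   = "10.70.46.165:/var/lib/heketi/mounts/vg_2da562dbb930fc8d90f78f70075f8c2a/brick_5d8102dff119938e2154099b5c8f9ead/brick"
	newBrick   = "10.70.46.165:/var/lib/heketi/mounts/vg_03279ea02e78439382a985db0c4d92ed/brick_b29a30ee75969d5fa633cbd4235674d3/brick"
	peerBrick1 = "10.70.47.21:/var/lib/heketi/mounts/vg_775747b65771ac369114ce29bf62b12c/brick_a4d7887a8b268467970339899fc92591/brick"
	peerBrick2 = "10.70.47.51:/var/lib/heketi/mounts/vg_71049fe1ea71ce342fa4110c04cc7f98/brick_69fe7f9f5885d53ae15ca4da4cf8457c/brick"
)

// unknownBricks returns the brick names Gluster reports for a volume that have
// no entry in known (a hypothetical stand-in for heketi's brick records).
func unknownBricks(known map[string]bool, glusterBricks []string) []string {
	var missing []string
	for _, name := range glusterBricks {
		if !known[name] {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Assumed database state: still records the original brick on device 2da562...
	known := map[string]bool{oldBrick: true, peerBrick1: true, peerBrick2: true}
	// Gluster state per the log: the replacement brick is already part of the volume.
	gluster := []string{newBrick, peerBrick1, peerBrick2}

	for _, name := range unknownBricks(known, gluster) {
		// A by-name lookup for this brick is where an "Id not found" error would surface.
		fmt.Println("no DB entry for:", name)
	}
}
```

Running it prints a single line for the vg_03279ea02e78439382a985db0c4d92ed brick, the same brick named in the heketi error.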

Version-Release number of selected component (if applicable):
heketi-client-4.0.0-4.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
NA

Actual results:
The device remove command fails with the error 'Id not found'.

Expected results:
The device should not be left in such a state; currently there is no way to remove this device.

Additional info:
sosreports, heketi logs, and topology info will be attached.

Comment 11 krishnaram Karthick 2018-03-08 02:43:57 UTC
This issue is still seen.

Steps to reproduce:
1) Create a few volumes so that a device has some bricks on it.
2) Disable the device, then remove it.
3) Press Ctrl+C while device remove is in progress.
4) Re-run device remove; it fails with the error 'Id not found'.


# oc rsh heketi-storage-1-xznlb
sh-4.2# rpm -qa | grep 'heketi'
heketi-client-6.0.0-5.el7rhgs.x86_64
python-heketi-6.0.0-5.el7rhgs.x86_64
heketi-6.0.0-5.el7rhgs.x86_64


# heketi-cli device remove 31c79d220a53d3cce0285fcdb7a750f1
^C
# heketi-cli device remove 31c79d220a53d3cce0285fcdb7a750f1
Error: Failed to remove device, error: Id not found
# heketi-cli node info 3c341cb212e7b06f4325dfeef2910ada
Node Id: 3c341cb212e7b06f4325dfeef2910ada
State: online
Cluster Id: e2276d294579a9b7b3b2bf15c2e95df0
Zone: 1
Management Hostname: dhcp46-210.lab.eng.blr.redhat.com
Storage Hostname: 10.70.46.210
Devices:
Id:2aa31ca463f4908a5ce83caa847850c2   Name:/dev/sdf            State:online    Size (GiB):99      Used (GiB):3       Free (GiB):96      
Id:31c79d220a53d3cce0285fcdb7a750f1   Name:/dev/sdd            State:offline   Size (GiB):99      Used (GiB):43      Free (GiB):56      
Id:df33255a3e01febf574d218d9da6a093   Name:/dev/sde            State:online    Size (GiB):599     Used (GiB):161     Free (GiB):437

Comment 12 krishnaram Karthick 2018-03-08 03:39:51 UTC
Created attachment 1405689
heketi_logs_March8

Comment 13 krishnaram Karthick 2018-03-14 08:04:12 UTC
The issue reported in this bug is no longer seen in build heketi-6.0.0-7.el7rhgs.x86_64.

Test1:

1) Run the device remove command.
2) Press Ctrl+C.
3) Re-run the device remove command on the same device.


Test2:

1) Run the device remove command.
2) From another terminal, run the device remove command for the same device.

In both tests, the second attempt failed with the error message: Error: The target exists, contains other items, or is in use.

Moving the bug to verified.

Comment 17 errata-xmlrpc 2018-04-05 03:08:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0638

