Bug 1449312 - [GSS] Id not found when removing a device
Summary: [GSS] Id not found when removing a device
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: heketi
Version: cns-3.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: CNS 3.9
Assignee: Raghavendra Talur
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Depends On:
Blocks: 1526414
 
Reported: 2017-05-09 15:06 UTC by Sergi Jimenez Romero
Modified: 2021-06-10 12:18 UTC
CC List: 17 users

Fixed In Version: heketi-6.0.0-7.el7rhgs
Doc Type: Bug Fix
Doc Text:
Earlier, it was possible to run multiple device remove operations in parallel on the same device. This led to race conditions and database inconsistencies. With this fix, an error is returned while another device remove operation on the same device is already in progress.
Clone Of:
Environment:
Last Closed: 2018-04-05 03:08:10 UTC
Embargoed:


Attachments


Links
System: Red Hat Product Errata    ID: RHEA-2018:0638    Last Updated: 2018-04-05 03:09:27 UTC

Description Sergi Jimenez Romero 2017-05-09 15:06:28 UTC
Description of problem:

After upgrading to cns-3.5, some nodes were scheduled to be replaced. To prepare for that, we started removing devices, first disabling each device and then removing it.

While removing the last device, we hit the following issue:

[05eadm@siy05ez1:~]$ HEKETI_CLI_SERVER=http://172.30.136.226:8080 heketi-cli device disable eeab4f4d36cce3c081cd59874e807aa1
Device eeab4f4d36cce3c081cd59874e807aa1 is now offline

[05eadm@siy05ez1:~]$ HEKETI_CLI_SERVER=http://172.30.136.226:8080 heketi-cli device remove eeab4f4d36cce3c081cd59874e807aa1
Error: Failed to remove device, error: Id not found


Version-Release number of selected component (if applicable):
cns-3.5

How reproducible:
not clear

Steps to Reproduce:
1. Start disabling and removing devices, one after another.
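A rough sketch of that sequence, assuming HEKETI_CLI_SERVER is exported and DEVICE_IDS holds the IDs of the devices being evacuated (both are placeholders, not values from this report):

# Illustrative only: take each device offline, then ask heketi to migrate its bricks away
for dev in $DEVICE_IDS; do
    heketi-cli device disable "$dev"
    heketi-cli device remove "$dev"
done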


Actual results:
Error: Failed to remove device, error: Id not found

Expected results:
The device is removed successfully.

Additional info:
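For context, the state heketi has recorded for the device can be inspected before retrying the removal. A minimal check, reusing the server URL and device ID from the transcript above (added here for illustration, not part of the original report):

$ export HEKETI_CLI_SERVER=http://172.30.136.226:8080
# Show the device as heketi sees it, including the bricks still recorded on it
$ heketi-cli device info eeab4f4d36cce3c081cd59874e807aa1
# Dump the whole topology to see which volumes those bricks belong to
$ heketi-cli topology info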

Comment 5 Sergi Jimenez Romero 2017-05-09 15:13:51 UTC
Created attachment 1277404 [details]
Topology info

Comment 6 Raghavendra Talur 2017-05-09 16:28:21 UTC
According to the heketi db, there should be 13 bricks on this device:

Id:9f1b011439310e343a17628df16317c5 220.128.135.192:/var/lib/heketi/mounts/vg_eeab4f4d36cce3c081cd59874e807aa1/brick_9f1b011439310e343a17628df16317c5/brick
Id:fbf8e9b74041637296dc927ac655faaa 220.128.135.192:/var/lib/heketi/mounts/vg_eeab4f4d36cce3c081cd59874e807aa1/brick_fbf8e9b74041637296dc927ac655faaa/brick
Id:fa4ad379d8001e1613a728b1e6313d27 220.128.135.192:/var/lib/heketi/mounts/vg_eeab4f4d36cce3c081cd59874e807aa1/brick_fa4ad379d8001e1613a728b1e6313d27/brick
Id:a2cf9656a4d470177d0eea27e1cae704 220.128.135.192:/var/lib/heketi/mounts/vg_eeab4f4d36cce3c081cd59874e807aa1/brick_a2cf9656a4d470177d0eea27e1cae704/brick
Id:ce76b390ad27d3eb6d530f38b9d1a6bc 220.128.135.192:/var/lib/heketi/mounts/vg_eeab4f4d36cce3c081cd59874e807aa1/brick_ce76b390ad27d3eb6d530f38b9d1a6bc/brick
Id:c30818309b6019eafb46130c390d5859 220.128.135.192:/var/lib/heketi/mounts/vg_eeab4f4d36cce3c081cd59874e807aa1/brick_c30818309b6019eafb46130c390d5859/brick
Id:db20ada0b933db739385a433d4fe4fa6 220.128.135.192:/var/lib/heketi/mounts/vg_eeab4f4d36cce3c081cd59874e807aa1/brick_db20ada0b933db739385a433d4fe4fa6/brick
Id:efdf0f44de5868f8c97b16ef138302e3 220.128.135.192:/var/lib/heketi/mounts/vg_eeab4f4d36cce3c081cd59874e807aa1/brick_efdf0f44de5868f8c97b16ef138302e3/brick
Id:f6e0f415a0d47a1bf311905e4b4a64c7 220.128.135.192:/var/lib/heketi/mounts/vg_eeab4f4d36cce3c081cd59874e807aa1/brick_f6e0f415a0d47a1bf311905e4b4a64c7/brick
Id:c2f5562d676b0e40ad0c03edb7c37237 220.128.135.192:/var/lib/heketi/mounts/vg_eeab4f4d36cce3c081cd59874e807aa1/brick_c2f5562d676b0e40ad0c03edb7c37237/brick
Id:dbf63bfc1c6191da6f2c6395bb287ac5 220.128.135.192:/var/lib/heketi/mounts/vg_eeab4f4d36cce3c081cd59874e807aa1/brick_dbf63bfc1c6191da6f2c6395bb287ac5/brick
220.128.135.192:/var/lib/heketi/mounts/vg_eeab4f4d36cce3c081cd59874e807aa1/brick_ed051c820ad75a3dc8449ed838b08274/brick


The 12 bricks above were found in the volume info.
The brick below seems to be present in the heketi db but not in the Gluster volume status/info output.

Id:96793dd732a5cf5436f6ffa3317725c9

By correlation, we have identified that this brick belongs to the
"heketidbstorage" volume.

The replaced brick for heketidbstorage is already at 
/var/lib/heketi/mounts/vg_c03ec6f522977a44ff70ca38ff1e329a/brick_579e6dd6703b3dd823d6664c9d8c719e/brick on 220.4.104.66
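One way to make the discrepancy visible is to compare the brick paths Gluster reports for the volume with the brick paths heketi has recorded under the device's VG. A rough sketch, assuming the commands are run where both gluster and heketi-cli are available:

# Bricks Gluster knows about for the volume
$ gluster volume info heketidbstorage | grep '^Brick[0-9]*:'
# Bricks heketi has recorded on the affected device's VG
$ heketi-cli topology info | grep vg_eeab4f4d36cce3c081cd59874e807aa1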


Please provide us with:
1. The output of gluster volume status heketidbstorage.
2. If the heketi pod hasn't been restarted, the output of "oc logs heketi-<name>", which would help us determine why the heketi db state differs from the Gluster volume state.

Based on this information, we will be able to root-cause the issue and suggest fixes. (A rough command sketch follows below.)
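The two items above can be collected roughly as follows; the heketi pod name below is a placeholder to be filled in from "oc get pods":

# 1. Volume status as seen by Gluster
$ gluster volume status heketidbstorage
# 2. heketi server logs (only useful if the pod has not been restarted)
$ oc get pods | grep heketi              # find the exact heketi pod name
$ oc logs heketi-1-xxxxx > heketi.log    # placeholder pod name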

Comment 7 Sergi Jimenez Romero 2017-05-10 07:34:41 UTC
(In reply to Raghavendra Talur from comment #6)
> [...]
> Please provide us with:
> 1. The output of gluster volume status heketidbstorage.
> 2. If the heketi pod hasn't been restarted, the output of "oc logs
> heketi-<name>", which would help us determine why the heketi db state
> differs from the Gluster volume state.
> 
> Based on this information, we will be able to root-cause the issue and
> suggest fixes.


# gluster volume status heketidbstorage
Status of volume: heketidbstorage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 220.4.104.66:/var/lib/heketi/mounts/v
g_c03ec6f522977a44ff70ca38ff1e329a/brick_57
9e6dd6703b3dd823d6664c9d8c719e/brick        49162     0          Y       61437
Brick 220.4.104.71:/var/lib/heketi/mounts/v
g_894d2e335da72e46011f341cad122579/brick_c3
0b4781ca663cf4618fc7d088382fa1/brick        49162     0          Y       31497
Brick 220.4.104.65:/var/lib/heketi/mounts/v
g_639d570fc243fb4feb134b2dce6b5545/brick_c1
a07a440a8a7cd83b5e9f9837a1b459/brick        49152     0          Y       30351
NFS Server on localhost                     N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        Y       95368
NFS Server on 220.4.104.71                  N/A       N/A        N       N/A
Self-heal Daemon on 220.4.104.71            N/A       N/A        Y       79317
NFS Server on 220.4.104.66                  N/A       N/A        N       N/A
Self-heal Daemon on 220.4.104.66            N/A       N/A        Y       72048
NFS Server on 220.128.135.192               N/A       N/A        N       N/A
Self-heal Daemon on 220.128.135.192         N/A       N/A        Y       3108

Task Status of Volume heketidbstorage
------------------------------------------------------------------------------
There are no active volume tasks


Attaching the heketi pod logs.

Comment 49 krishnaram Karthick 2018-03-14 08:03:41 UTC
The issue reported in this bug is no longer seen in the following build: heketi-6.0.0-7.el7rhgs.x86_64

Test1:

1) Run the device remove command.
2) Hit Ctrl+C.
3) Re-run the device remove command on the same device.


Test2:

1) Run the device remove command.
2) From another terminal, run the device remove command for the same device.

In both tests, the second attempt failed with this error message: Error: The target exists, contains other items, or is in use.
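Test 2 can be scripted roughly as follows; the device ID is a placeholder, and the second invocation is expected to fail while the first is still in progress:

# Start one remove in the background, then immediately attempt a second one on the same device
$ heketi-cli device remove <device-id> &
$ heketi-cli device remove <device-id>
Error: The target exists, contains other items, or is in use.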

Moving the bug to verified.

Comment 53 errata-xmlrpc 2018-04-05 03:08:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0638

