Description of problem:
After upgrading to cns-3.5, some nodes were to be replaced. For that we started removing devices, first disabling each one and then removing it. While removing the last device, we hit the following issue:

[05eadm@siy05ez1:~]$ HEKETI_CLI_SERVER=http://172.30.136.226:8080 heketi-cli device disable eeab4f4d36cce3c081cd59874e807aa1
Device eeab4f4d36cce3c081cd59874e807aa1 is now offline

[05eadm@siy05ez1:~]$ HEKETI_CLI_SERVER=http://172.30.136.226:8080 heketi-cli device remove eeab4f4d36cce3c081cd59874e807aa1
Error: Failed to remove device, error: Id not found

Version-Release number of selected component (if applicable):
cns-3.5

How reproducible:
Not clear

Steps to Reproduce:
1. Disable and remove devices, one after another.

Actual results:
Error: Failed to remove device, error: Id not found

Expected results:
The device is removed properly.

Additional info:
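For reference, the disable-and-remove workflow we were following looks roughly like the sketch below (device ID shown as a placeholder; assuming the standard heketi-cli device subcommands):

  # point heketi-cli at the heketi service
  export HEKETI_CLI_SERVER=http://172.30.136.226:8080
  # take the device offline so no new bricks are placed on it
  heketi-cli device disable <device-id>
  # optionally inspect which bricks still live on the device
  heketi-cli device info <device-id>
  # migrate the remaining bricks away and remove the device
  heketi-cli device remove <device-id>
  # once the device holds no bricks, delete it from the topology
  heketi-cli device delete <device-id>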
Created attachment 1277404 [details] Topology info
As per the heketi db, we must have 13 bricks on this device:

Id:9f1b011439310e343a17628df16317c5
220.128.135.192:/var/lib/heketi/mounts/vg_eeab4f4d36cce3c081cd59874e807aa1/brick_9f1b011439310e343a17628df16317c5/brick
Id:fbf8e9b74041637296dc927ac655faaa
220.128.135.192:/var/lib/heketi/mounts/vg_eeab4f4d36cce3c081cd59874e807aa1/brick_fbf8e9b74041637296dc927ac655faaa/brick
Id:fa4ad379d8001e1613a728b1e6313d27
220.128.135.192:/var/lib/heketi/mounts/vg_eeab4f4d36cce3c081cd59874e807aa1/brick_fa4ad379d8001e1613a728b1e6313d27/brick
Id:a2cf9656a4d470177d0eea27e1cae704
220.128.135.192:/var/lib/heketi/mounts/vg_eeab4f4d36cce3c081cd59874e807aa1/brick_a2cf9656a4d470177d0eea27e1cae704/brick
Id:ce76b390ad27d3eb6d530f38b9d1a6bc
220.128.135.192:/var/lib/heketi/mounts/vg_eeab4f4d36cce3c081cd59874e807aa1/brick_ce76b390ad27d3eb6d530f38b9d1a6bc/brick
Id:c30818309b6019eafb46130c390d5859
220.128.135.192:/var/lib/heketi/mounts/vg_eeab4f4d36cce3c081cd59874e807aa1/brick_c30818309b6019eafb46130c390d5859/brick
Id:db20ada0b933db739385a433d4fe4fa6
220.128.135.192:/var/lib/heketi/mounts/vg_eeab4f4d36cce3c081cd59874e807aa1/brick_db20ada0b933db739385a433d4fe4fa6/brick
Id:efdf0f44de5868f8c97b16ef138302e3
220.128.135.192:/var/lib/heketi/mounts/vg_eeab4f4d36cce3c081cd59874e807aa1/brick_efdf0f44de5868f8c97b16ef138302e3/brick
Id:f6e0f415a0d47a1bf311905e4b4a64c7
220.128.135.192:/var/lib/heketi/mounts/vg_eeab4f4d36cce3c081cd59874e807aa1/brick_f6e0f415a0d47a1bf311905e4b4a64c7/brick
Id:c2f5562d676b0e40ad0c03edb7c37237
220.128.135.192:/var/lib/heketi/mounts/vg_eeab4f4d36cce3c081cd59874e807aa1/brick_c2f5562d676b0e40ad0c03edb7c37237/brick
Id:dbf63bfc1c6191da6f2c6395bb287ac5
220.128.135.192:/var/lib/heketi/mounts/vg_eeab4f4d36cce3c081cd59874e807aa1/brick_dbf63bfc1c6191da6f2c6395bb287ac5/brick
220.128.135.192:/var/lib/heketi/mounts/vg_eeab4f4d36cce3c081cd59874e807aa1/brick_ed051c820ad75a3dc8449ed838b08274/brick

Above are the 12 bricks that were found in volume info. The brick below seems to be in the heketi db but not in the Gluster volume status/info output:

Id:96793dd732a5cf5436f6ffa3317725c9

By correlation, we have identified that these bricks belong to the "heketidbstorage" volume. The replaced brick for heketidbstorage is already at /var/lib/heketi/mounts/vg_c03ec6f522977a44ff70ca38ff1e329a/brick_579e6dd6703b3dd823d6664c9d8c719e/brick on 220.4.104.66.

Please provide us:
1. gluster volume status heketidbstorage
2. If the heketi pod hasn't been restarted, "oc logs heketi-<name>" would help us determine why the heketi db state differs from the Gluster volume state.

Based on that info, we will be able to root cause and suggest fixes.
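One rough way to cross-check heketi's view against Gluster's is sketched below (volume name, device ID, and VG name taken from this report; only standard gluster and heketi-cli commands are assumed):

  # bricks Gluster currently knows about on the affected volume group
  gluster volume info | grep vg_eeab4f4d36cce3c081cd59874e807aa1
  # bricks of the heketidbstorage volume specifically
  gluster volume info heketidbstorage | grep '^Brick'
  # bricks the heketi db records for the device being removed
  heketi-cli device info eeab4f4d36cce3c081cd59874e807aa1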
(In reply to Raghavendra Talur from comment #6)
> Please provide us:
> 1. gluster volume status heketidbstorage
> 2. If the heketi pod hasn't been restarted, "oc logs heketi-<name>" would
> help us determine why the heketi db state differs from the Gluster volume
> state.
# gluster volume status heketidbstorage
Status of volume: heketidbstorage
Gluster process                                               TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 220.4.104.66:/var/lib/heketi/mounts/vg_c03ec6f522977a44ff70ca38ff1e329a/brick_579e6dd6703b3dd823d6664c9d8c719e/brick  49162  0  Y  61437
Brick 220.4.104.71:/var/lib/heketi/mounts/vg_894d2e335da72e46011f341cad122579/brick_c30b4781ca663cf4618fc7d088382fa1/brick  49162  0  Y  31497
Brick 220.4.104.65:/var/lib/heketi/mounts/vg_639d570fc243fb4feb134b2dce6b5545/brick_c1a07a440a8a7cd83b5e9f9837a1b459/brick  49152  0  Y  30351
NFS Server on localhost                   N/A  N/A  N  N/A
Self-heal Daemon on localhost             N/A  N/A  Y  95368
NFS Server on 220.4.104.71                N/A  N/A  N  N/A
Self-heal Daemon on 220.4.104.71          N/A  N/A  Y  79317
NFS Server on 220.4.104.66                N/A  N/A  N  N/A
Self-heal Daemon on 220.4.104.66          N/A  N/A  Y  72048
NFS Server on 220.128.135.192             N/A  N/A  N  N/A
Self-heal Daemon on 220.128.135.192       N/A  N/A  Y  3108

Task Status of Volume heketidbstorage
------------------------------------------------------------------------------
There are no active volume tasks

Attaching the heketi pod logs.
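For completeness, the heketi pod logs can be collected with something like the sketch below (the pod name is whatever "oc get pods" reports for the heketi deployment; shown here as a placeholder):

  # find the heketi pod in the project running CNS
  oc get pods | grep heketi
  # capture its logs for attachment (pod name is a placeholder)
  oc logs <heketi-pod-name> > heketi-pod.log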
The issue reported in this bug is no longer seen with the following build: heketi-6.0.0-7.el7rhgs.x86_64

Test 1:
1) Run the device remove command.
2) Hit Ctrl+C.
3) Re-run the device remove command on the same device.

Test 2:
1) Run the device remove command.
2) From another terminal, run the device remove command for the same device.

In both tests, the second attempt failed with this error message:
Error: The target exists, contains other items, or is in use.

Moving the bug to verified.
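For reference, Test 2 can be reproduced with something along these lines (device ID is a placeholder; the error shown is the one observed on the fixed build):

  # terminal 1: start the removal; brick migration keeps the device busy for a while
  heketi-cli device remove <device-id>
  # terminal 2: while the first command is still running, retry the same removal;
  # on heketi-6.0.0-7.el7rhgs it fails with:
  #   Error: The target exists, contains other items, or is in use.
  heketi-cli device remove <device-id>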
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:0638