Bug 1562581 - Node removal fails when run concurrently with volume deletion
Summary: Node removal fails when run concurrently with volume deletion
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: heketi
Version: cns-3.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Michael Adam
QA Contact: Rachael
URL:
Whiteboard:
Depends On:
Blocks: OCS-3.11.1-devel-triage-done
Reported: 2018-04-01 06:08 UTC by Rachael
Modified: 2019-03-12 20:04 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-12 20:04:32 UTC
Embargoed:


Attachments
heketi_logs (3.49 MB, text/plain)
2018-04-01 06:30 UTC, krishnaram Karthick
topology info (6.80 KB, text/plain)
2018-04-01 06:31 UTC, krishnaram Karthick

Comment 3 krishnaram Karthick 2018-04-01 06:30:33 UTC
Created attachment 1415730 [details]
heketi_logs

Comment 4 krishnaram Karthick 2018-04-01 06:31:12 UTC
Created attachment 1415731 [details]
topology info

Comment 5 Michael Adam 2018-05-08 06:23:41 UTC
Let me say, this is actually good behavior.
We could have a nicer error message.
But it is correct to not proceed with node removal while the volume delete is operating on the node.

Also thanks for confirming that the db stayed consistent.

Not sure what to make out of this BZ:
Is it a request for a better error message?
Or to block the CLI until the volume delete is done and only afterwards remove the node?

Comment 6 krishnaram Karthick 2018-05-15 05:55:15 UTC
(In reply to Michael Adam from comment #5)
> Let me say, this is actually good behavior.
> We could have a nicer error message.
> But it is correct to not proceed with node removal while the volume delete
> is operating on the node.

The expectation from this bug is a seamless node removal operation. I believe the error occurs because the brick replace operation fails: the existing brick has already been deleted as part of the volume delete. It would be great if the node removal process handled this gracefully.
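
To illustrate what "handled gracefully" could mean here, a minimal sketch follows. It is not heketi code: the names BrickGone, remove_node and fake_replace are hypothetical, and the sketch assumes the brick replace step can signal that its source brick no longer exists. Under that assumption, node removal treats an already-deleted brick as nothing left to migrate instead of as a failure.

```python
class BrickGone(Exception):
    """Hypothetical error: the brick to be replaced no longer exists."""


def remove_node(bricks, replace_brick):
    """Replace every brick on a node, skipping bricks that vanished meanwhile.

    bricks: iterable of brick ids on the node being removed.
    replace_brick: callable that moves one brick to another node and raises
    BrickGone if a concurrent volume delete has already removed it.
    Returns a (replaced, skipped) pair of lists.
    """
    replaced, skipped = [], []
    for brick in bricks:
        try:
            replace_brick(brick)
            replaced.append(brick)
        except BrickGone:
            # The volume delete got there first; nothing left to migrate.
            skipped.append(brick)
    return replaced, skipped


# Stand-in for the real replace step: brick "b2" was deleted concurrently.
deleted = {"b2"}


def fake_replace(brick):
    if brick in deleted:
        raise BrickGone(brick)


result = remove_node(["b1", "b2", "b3"], fake_replace)
print(result)
```

Any other error from the replace step still propagates, so real failures are not hidden by the skip.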

> 
> Also thanks for confirming that the db stayed consistent.
> 
> Not sure what to make out of this BZ:
> Is it a request for a better error message?
> Or to block the CLI until the volume delete is done and only afterwards
> remove the node?

As mentioned above, the expectation from this bug is that node removal is handled gracefully. Node removal cannot fail every time a volume delete operation involving that node is running. At scale, the admin will have to run the node removal command several times, which kills the user experience.
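
The alternative raised in comment #5, waiting for the volume delete to finish before removing the node, could be sketched roughly as below. This is again only an illustration under assumptions: node_busy and do_remove are hypothetical callables, not heketi APIs, and the attempt count and delay are arbitrary.

```python
import time


def remove_node_when_idle(node_busy, do_remove, attempts=5, delay=0.0):
    """Wait until no operation is pending on the node, then remove it.

    node_busy: hypothetical callable, True while a volume delete (or other
    operation) is still working on the node.
    do_remove: hypothetical callable that performs the actual node removal.
    Returns True if the node was removed, False if it stayed busy.
    """
    for _ in range(attempts):
        if not node_busy():
            do_remove()
            return True
        time.sleep(delay)  # back off before checking again
    return False


# Stand-in state: the node is busy for the first two checks only.
checks = iter([True, True, False])
removed = []

ok = remove_node_when_idle(lambda: next(checks), lambda: removed.append("node1"))
print(ok, removed)
```

With this shape the admin issues the command once, and the waiting happens inside the tool rather than through repeated manual runs.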

