Bug 1562581

Summary: Node removal fails when run concurrently with volume deletion
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Rachael <rgeorge>
Component: heketi
Assignee: Michael Adam <madam>
Status: CLOSED WONTFIX
QA Contact: Rachael <rgeorge>
Severity: high
Docs Contact:
Priority: unspecified
Version: cns-3.9
CC: hchiramm, jmulligan, kramdoss, rgeorge, rhs-bugs, rtalur, storage-qa-internal
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-03-12 20:04:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1641915    
Attachments:
Description      Flags
heketi_logs      none
topology info    none

Comment 3 krishnaram Karthick 2018-04-01 06:30:33 UTC
Created attachment 1415730 [details]
heketi_logs

Comment 4 krishnaram Karthick 2018-04-01 06:31:12 UTC
Created attachment 1415731 [details]
topology info

Comment 5 Michael Adam 2018-05-08 06:23:41 UTC
Let me say, this is actually good behavior.
We could have a nicer error message.
But it is correct not to proceed with node removal while the volume delete is operating on the node.

Also thanks for confirming that the db stayed consistent.

Not sure what to make of this BZ:
Is it a request for a better error message?
Or to block the CLI until the volume delete is done and only afterwards remove the node?

Comment 6 krishnaram Karthick 2018-05-15 05:55:15 UTC
(In reply to Michael Adam from comment #5)
> Let me say, this is actually good behavior.
> We could have a nicer error message.
> But it is correct to not proceed with node removal while the volume delete
> is operating on the node.

The expectation from this bug is a seamless node removal operation. I believe the error occurs because the brick replace operation fails: the existing brick has already been deleted as part of the volume delete. It would be great if the node removal process handled this gracefully.

> 
> Also thanks for confirming that the db stayed consistent.
> 
> Not sure what to make out of this BZ:
> Is it a request for a better error message?
> Or to block the CLI until the volume delete is done and only afterwards
> remove the node?

As mentioned above, the expectation from this bug is that the node removal process handles this case gracefully. Node removal should not fail every time a volume delete that touches the same node runs concurrently. At scale, the admin would have to rerun the node removal command several times, which kills the user experience.
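
The graceful handling requested above could look roughly like the sketch below. It is not heketi code: `BrickGone`, `remove_node`, and the `replace_brick` callable are hypothetical names used only for illustration. It assumes the brick replace step can signal that the brick no longer exists because a concurrent volume delete removed it. In that case the brick is skipped rather than failing the whole node removal.

```python
class BrickGone(Exception):
    """Hypothetical error: the brick no longer exists (its volume was deleted)."""


def remove_node(bricks, replace_brick):
    """Replace every brick on a node, tolerating bricks deleted concurrently.

    `bricks` is an iterable of brick ids. `replace_brick` is a hypothetical
    callable that migrates one brick and raises BrickGone if the brick has
    already been removed by a volume delete.
    Returns a (replaced, skipped) pair of lists.
    """
    replaced, skipped = [], []
    for brick in bricks:
        try:
            replace_brick(brick)
            replaced.append(brick)
        except BrickGone:
            # Nothing left to migrate for this brick, so do not fail the
            # whole node removal; record it and continue with the rest.
            skipped.append(brick)
    return replaced, skipped
```

Any other error raised by `replace_brick` still propagates, so real failures are not hidden. Only the "brick already deleted" case is treated as success.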