1562581 – Node removal fails when run concurrently with volume deletion

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	heketi
Sub Component:
Version:	cns-3.9
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Michael Adam
QA Contact:	Rachael
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	OCS-3.11.1-devel-triage-done

Reported:	2018-04-01 06:08 UTC by Rachael
Modified:	2019-03-12 20:04 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-03-12 20:04:32 UTC
Embargoed:
Dependent Products:

Attachments
heketi_logs (3.49 MB, text/plain) 2018-04-01 06:30 UTC, krishnaram Karthick
topology info (6.80 KB, text/plain) 2018-04-01 06:31 UTC, krishnaram Karthick

Comment 3 krishnaram Karthick 2018-04-01 06:30:33 UTC

Created attachment 1415730
heketi_logs

Comment 4 krishnaram Karthick 2018-04-01 06:31:12 UTC

Created attachment 1415731
topology info

Comment 5 Michael Adam 2018-05-08 06:23:41 UTC

Let me say, this is actually good behavior.
We could have a nicer error message.
But it is correct to not proceed with node removal while the volume delete is operating on the node.

Also thanks for confirming that the db stayed consistent.

Not sure what to make out of this BZ:
Is it a request for a better error message?
Or to block the CLI until the volume delete is done and only afterwards remove the node?
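
For illustration, the second option (wait for the volume delete to finish, then remove the node) could be sketched with one lock per node. This is only a sketch under assumptions: it does not show how heketi actually tracks operations, and `NodeOperations`, `delete_volume`, `remove_node`, and the `work` callback are hypothetical names.

```python
import threading
from collections import defaultdict


class NodeOperations:
    """Hypothetical coordinator: one lock per node serializes operations."""

    def __init__(self):
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects the lock table itself
        self.log = []                   # completed operations, in order

    def _lock_for(self, node_id):
        with self._guard:
            return self._locks[node_id]

    def delete_volume(self, volume_id, node_ids, work=lambda: None):
        # Take the locks in sorted order so two multi-node operations
        # cannot deadlock; set() avoids locking the same node twice.
        locks = [self._lock_for(n) for n in sorted(set(node_ids))]
        for lock in locks:
            lock.acquire()
        try:
            work()  # stand-in for the real brick cleanup
            self.log.append(("volume-delete", volume_id))
        finally:
            for lock in reversed(locks):
                lock.release()

    def remove_node(self, node_id):
        # Blocks here until any volume delete touching this node is done.
        with self._lock_for(node_id):
            self.log.append(("node-remove", node_id))
```

With this shape the node removal call does not fail; it waits until the volume delete has released the node and then proceeds.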

Comment 6 krishnaram Karthick 2018-05-15 05:55:15 UTC

(In reply to Michael Adam from comment #5)
> Let me say, this is actually good behavior.
> We could have a nicer error message.
> But it is correct to not proceed with node removal while the volume delete
> is operating on the node.

The expectation from this bug is a seamless node removal operation. I believe the error occurs because the brick replace operation fails: the existing brick has already been deleted as part of the volume delete. It would be great if the node removal process handled this gracefully.

> 
> Also thanks for confirming that the db stayed consistent.
> 
> Not sure what to make out of this BZ:
> Is it a request for a better error message?
> Or to block the CLI until the volume delete is done and only afterwards
> remove the node?

As mentioned above, the expectation from this bug is that the node removal process is handled gracefully. We cannot have node removal fail each time a volume delete operation involving the node being removed is running. At scale, the admin will have to run the node removal command several times, which kills the user experience.
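
A minimal sketch of the graceful handling requested here: if a brick has already disappeared because its volume was deleted, node removal skips it instead of failing. This is not heketi code; `BrickNotFound`, `remove_node_bricks`, and the `replace_brick` callback are hypothetical names used only for illustration.

```python
class BrickNotFound(Exception):
    """Hypothetical error: the brick was already deleted elsewhere."""


def remove_node_bricks(bricks, replace_brick):
    """Move every brick off a node, tolerating bricks that vanished.

    `bricks` is a list of brick ids on the node. `replace_brick` is a
    callable that relocates one brick and raises BrickNotFound when the
    brick no longer exists (for example, its volume was deleted
    concurrently). Returns (replaced, skipped) lists of brick ids.
    """
    replaced, skipped = [], []
    for brick_id in bricks:
        try:
            replace_brick(brick_id)
            replaced.append(brick_id)
        except BrickNotFound:
            # Nothing left to move: for this brick the goal (no bricks
            # remaining on the node) is already met, so do not fail.
            skipped.append(brick_id)
    return replaced, skipped
```

Any other exception still propagates, so real failures are not hidden; only the "brick already gone" case is treated as success.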
