Bug 1270732 - [engine-backend] Undesired handling with a UnsupportedGlusterVolumeReplicaCountError from vdsm
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 3.6.0
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ovirt-3.6.1
Target Release: 3.6.1
Assignee: Ala Hino
QA Contact: Elad
URL:
Whiteboard: storage
Depends On:
Blocks:
Reported: 2015-10-12 09:28 UTC by Elad
Modified: 2016-02-10 17:10 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-12-16 12:17:47 UTC
oVirt Team: Storage
Embargoed:
amureini: ovirt-3.6.z?
ebenahar: planning_ack?
amureini: devel_ack+
rule-engine: testing_ack+


Attachments
engine.log and vdsm.log (1.17 MB, application/x-gzip)
2015-10-12 09:28 UTC, Elad


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 47215 0 master MERGED core: Change error code according to vdsm Never
oVirt gerrit 47256 0 ovirt-engine-3.6 MERGED core: Change error code according to vdsm Never

Description Elad 2015-10-12 09:28:15 UTC
Created attachment 1081929
engine.log and vdsm.log

Description of problem:
Engine fails with a NullPointerException when VDSM responds with UnsupportedGlusterVolumeReplicaCountError while trying to create a Gluster domain that resides on a volume that does not have 3 bricks (tried with 2 bricks).

Version-Release number of selected component (if applicable):
rhevm-3.6.0-0.18.el6.noarch
vdsm-4.17.8-1.el7ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create a Gluster storage domain that resides on a replicate volume with 2 bricks


Actual results:
VDSM fails to create the Gluster domain, which is the desired behaviour. Engine receives the VDSM response but does not know how to handle it:

2015-10-12 11:20:21,437 ERROR [org.ovirt.engine.core.bll.storage.AddStorageServerConnectionCommand] (ajp-/127.0.0.1:8702-2) [5427132] Command 'org.ovirt.engine.core.bll.storage.AddStorageServerConnectionCommand' failed: null
2015-10-12 11:20:21,437 ERROR [org.ovirt.engine.core.bll.storage.AddStorageServerConnectionCommand] (ajp-/127.0.0.1:8702-2) [5427132] Exception: java.lang.NullPointerException
        at org.ovirt.engine.core.common.errors.EngineException.<init>(EngineException.java:24) [common.jar:]
        at org.ovirt.engine.core.bll.storage.AddStorageServerConnectionCommand.executeCommand(AddStorageServerConnectionCommand.java:51) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1211) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1355) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1979) [bll.jar:]


Expected results:
Engine should handle the failure reported by VDSM instead of throwing a NullPointerException.

Additional info:
engine.log and vdsm.log

Comment 1 Ala Hino 2015-10-12 09:47:04 UTC
Root cause is an inconsistency between the error code sent by vdsm (480) and the error code expected by the engine (4710).
Changing the error code in the engine to 480.
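
To illustrate the mismatch described above, here is a minimal sketch. It is not the engine's actual code: the class, the lookup tables, and both methods are hypothetical, and it assumes the NullPointerException came from dereferencing the result of a failed code lookup, which the stack trace suggests but does not confirm. Only the two numeric codes (480 and 4710) come from this report.

```java
import java.util.Map;

public class ErrorCodeLookupSketch {

    // Hypothetical name used for illustration; not the engine's real identifier.
    static final String REPLICA_COUNT_ERROR = "UNSUPPORTED_GLUSTER_VOLUME_REPLICA_COUNT";

    // Before the fix: the engine side registers the error under 4710.
    static final Map<Integer, String> BEFORE_FIX = Map.of(4710, REPLICA_COUNT_ERROR);

    // After the fix: the engine side registers it under 480, the code vdsm sends.
    static final Map<Integer, String> AFTER_FIX = Map.of(480, REPLICA_COUNT_ERROR);

    // Naive lookup: uses the result without checking whether the code is known.
    static String describeUnchecked(Map<Integer, String> table, int code) {
        String name = table.get(code);   // null when the code is not registered
        return name.toLowerCase();       // throws NullPointerException on null
    }

    // Defensive lookup: falls back to a placeholder for unknown codes.
    static String describeChecked(Map<Integer, String> table, int code) {
        String name = table.get(code);
        return name != null ? name : "UNKNOWN_ERROR_CODE_" + code;
    }

    public static void main(String[] args) {
        int codeFromVdsm = 480;

        // With the corrected table the lookup succeeds.
        System.out.println(describeUnchecked(AFTER_FIX, codeFromVdsm));

        // With the old table the defensive lookup degrades gracefully.
        System.out.println(describeChecked(BEFORE_FIX, codeFromVdsm));

        // With the old table the naive lookup fails.
        try {
            describeUnchecked(BEFORE_FIX, codeFromVdsm);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException for unmapped code " + codeFromVdsm);
        }
    }
}
```

The sketch shows two ways to avoid the failure: aligning the code on both sides, as the patch does, or making the lookup tolerate unknown codes. The second is a general hardening option and is not something this report says was implemented.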

Comment 2 Allon Mureinik 2015-10-12 10:16:28 UTC
Ala, since the BZ is in POST, I assume that a patch exists. Can you add a reference to it?

Comment 3 Ala Hino 2015-10-12 11:06:03 UTC
(In reply to Allon Mureinik from comment #2)
> Ala, since the BZ is in POST, I assume that a patch exists. Can you add a
> reference to it?

Done

Comment 4 Yaniv Lavi 2015-10-29 12:40:30 UTC
In oVirt, testing is done on a single release by default. Therefore I'm removing the 4.0 flag. If you think this bug must be tested in 4.0 as well, please re-add the flag. Please note that we might not have testing resources to handle the 4.0 clone.

Comment 5 Elad 2015-11-30 13:28:15 UTC
Engine now handles an unsupported Gluster volume replica count correctly:

2015-11-30 13:26:41,208 ERROR [org.ovirt.engine.core.bll.storage.BaseFsStorageHelper] (ajp-/127.0.0.1:8702-1) [1e6a8626] The connection with details '10.35.65.25:/elad1' failed because of error code '480' and error message is: unsupported gluster volume replica count


Verified using:
rhevm-3.6.1-0.2.el6.noarch
vdsm-4.17.11-0.el7ev.noarch

Comment 6 Sandro Bonazzola 2015-12-16 12:17:47 UTC
According to verification status and target milestone, this issue should be fixed in oVirt 3.6.1. Closing as current release.

