Bug 1270732 - [engine-backend] Undesired handling with a UnsupportedGlusterVolumeReplicaCountError from vdsm
Status: CLOSED CURRENTRELEASE
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 3.6.0
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ovirt-3.6.1
Target Release: 3.6.1
Assigned To: Ala Hino
QA Contact: Elad
Whiteboard: storage
Depends On:
Blocks:
Reported: 2015-10-12 05:28 EDT by Elad
Modified: 2016-02-10 12:10 EST
CC List: 3 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-12-16 07:17:47 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
amureini: ovirt‑3.6.z?
ebenahar: planning_ack?
amureini: devel_ack+
rule-engine: testing_ack+


Attachments
engine.log and vdsm.log (1.17 MB, application/x-gzip)
2015-10-12 05:28 EDT, Elad


External Trackers
Tracker       ID     Priority          Status  Summary                                    Last Updated
oVirt gerrit  47215  master            MERGED  core: Change error code according to vdsm  Never
oVirt gerrit  47256  ovirt-engine-3.6  MERGED  core: Change error code according to vdsm  Never

Description Elad 2015-10-12 05:28:15 EDT
Created attachment 1081929 [details]
engine.log and vdsm.log

Description of problem:
The engine fails with a NullPointerException when VDSM responds with UnsupportedGlusterVolumeReplicaCountError while trying to create a Gluster domain that resides on a volume that does not have 3 bricks (tried with 2 bricks).

Version-Release number of selected component (if applicable):
rhevm-3.6.0-0.18.el6.noarch
vdsm-4.17.8-1.el7ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create a Gluster storage domain that resides on a replicate volume with 2 bricks


Actual results:
VDSM fails to create the Gluster domain, which is the desired behaviour. The engine receives the VDSM response and does not know how to handle it:

2015-10-12 11:20:21,437 ERROR [org.ovirt.engine.core.bll.storage.AddStorageServerConnectionCommand] (ajp-/127.0.0.1:8702-2) [5427132] Command 'org.ovirt.engine.core.bll.storage.AddStorageServerConnectionCommand' failed: null
2015-10-12 11:20:21,437 ERROR [org.ovirt.engine.core.bll.storage.AddStorageServerConnectionCommand] (ajp-/127.0.0.1:8702-2) [5427132] Exception: java.lang.NullPointerException
        at org.ovirt.engine.core.common.errors.EngineException.<init>(EngineException.java:24) [common.jar:]
        at org.ovirt.engine.core.bll.storage.AddStorageServerConnectionCommand.executeCommand(AddStorageServerConnectionCommand.java:51) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1211) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1355) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1979) [bll.jar:]


Expected results:
The engine should recognize the failure reported by VDSM and handle it without throwing an exception.

Additional info:
engine.log and vdsm.log
Comment 1 Ala Hino 2015-10-12 05:47:04 EDT
The root cause is an inconsistency between the error code sent by vdsm (480) and the error code expected by the engine (4710).
Changing the error code in the engine to 480.
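
To illustrate why a code mismatch like this can surface as a NullPointerException, here is a minimal Java sketch. It is not the engine's actual code: the ErrorTable class, its methods, and the "unexpected" fallback name are hypothetical, and only the two numeric codes (480 and 4710) and the error name come from this bug. It assumes the engine maps the numeric code from vdsm to a known error through a lookup that yields null for unregistered codes.

```java
import java.util.HashMap;
import java.util.Map;

public class ErrorCodeMismatchSketch {

    /** Hypothetical lookup table from numeric vdsm error codes to engine-side error names. */
    static final class ErrorTable {
        private final Map<Integer, String> byCode = new HashMap<>();

        ErrorTable register(int code, String name) {
            byCode.put(code, name);
            return this;
        }

        /** Returns null for an unregistered code; callers that dereference the result may throw an NPE. */
        String lookup(int code) {
            return byCode.get(code);
        }

        /** Defensive variant (an assumption, not part of the fix): never returns null. */
        String lookupOrUnexpected(int code) {
            return byCode.getOrDefault(code, "unexpected");
        }
    }

    static final String NAME = "UnsupportedGlusterVolumeReplicaCountError";

    public static void main(String[] args) {
        int codeFromVdsm = 480; // code sent by vdsm, per comment 1

        // Before the fix: the error is registered under 4710, so 480 is unknown.
        ErrorTable before = new ErrorTable().register(4710, NAME);
        // After the fix: the error is registered under 480, matching vdsm.
        ErrorTable after = new ErrorTable().register(480, NAME);

        System.out.println(before.lookup(codeFromVdsm));             // prints "null"
        System.out.println(after.lookup(codeFromVdsm));              // prints the error name
        System.out.println(before.lookupOrUnexpected(codeFromVdsm)); // prints "unexpected"
    }
}
```

In this sketch, aligning the registered code with the one vdsm sends is enough for the lookup to succeed. The defensive variant only shows one way an unknown code could be reported without a null result.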
Comment 2 Allon Mureinik 2015-10-12 06:16:28 EDT
Ala, since the BZ is in POST, I assume that a patch exists. Can you add a reference to it?
Comment 3 Ala Hino 2015-10-12 07:06:03 EDT
(In reply to Allon Mureinik from comment #2)
> Ala, since the BZ is in POST, I assume that a patch exists. Can you add a
> reference to it?

Done
Comment 4 Yaniv Lavi (Dary) 2015-10-29 08:40:30 EDT
In oVirt, testing is done on a single release by default. Therefore I'm removing the 4.0 flag. If you think this bug must be tested in 4.0 as well, please re-add the flag. Please note that we might not have the testing resources to handle the 4.0 clone.
Comment 5 Elad 2015-11-30 08:28:15 EST
The engine now handles an unsupported Gluster volume replica count correctly:

2015-11-30 13:26:41,208 ERROR [org.ovirt.engine.core.bll.storage.BaseFsStorageHelper] (ajp-/127.0.0.1:8702-1) [1e6a8626] The connection with details '10.35.65.25:/elad1' failed because of error code '480' and error message is: unsupported gluster volume replica count


Verified using:
rhevm-3.6.1-0.2.el6.noarch
vdsm-4.17.11-0.el7ev.noarch
Comment 6 Sandro Bonazzola 2015-12-16 07:17:47 EST
According to verification status and target milestone this issue should be fixed in oVirt 3.6.1. Closing current release.
