Bug 1474656 - Silent Hosted-Engine Auto-Import failure
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.0.7
Hardware: x86_64 Linux
Priority: high, Severity: high
Assigned To: Andrej Krejcir
Nikolai Sednev
Depends On:
Blocks:
Reported: 2017-07-25 02:41 EDT by Germano Veit Michel
Modified: 2017-08-28 10:52 EDT (History)

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-11 00:52:45 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Germano Veit Michel 2017-07-25 02:41:57 EDT
Description of problem:

After the Hosted-Engine Storage Domain auto Import is triggered, the engine does a GetImagesListVDSCommand and then a series of GetImageInfoVDSCommand on the Images of the HE SD.

These GetImageInfo commands all fail on the engine side with:

2017-07-04 17:54:13,133 WARN  [org.ovirt.engine.core.bll.storage.disk.image.GetUnregisteredDiskQuery] (org.ovirt.thread.pool-6-thread-37) [385bd3ae] Exception while parsing JSON for disk. Exception: '{}': org.codehaus.jackson.JsonParseException: Unexpected character ('h' (code 104)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: java.io.StringReader@383484c1; line: 1, column: 2]
        at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1433) [jackson-core-asl.jar:1.9.13.redhat-3]
        at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:521) [jackson-core-asl.jar:1.9.13.redhat-3]
        at org.codehaus.jackson.impl.JsonParserMinimalBase._reportUnexpectedChar(JsonParserMinimalBase.java:442) [jackson-core-asl.jar:1.9.13.redhat-3]
        at org.codehaus.jackson.impl.ReaderBasedParser._handleUnexpectedValue(ReaderBasedParser.java:1198) [jackson-core-asl.jar:1.9.13.redhat-3]
        at org.codehaus.jackson.impl.ReaderBasedParser.nextToken(ReaderBasedParser.java:485) [jackson-core-asl.jar:1.9.13.redhat-3]
        at org.codehaus.jackson.map.ObjectMapper._initForReading(ObjectMapper.java:2770) [jackson-mapper-asl.jar:1.9.13.redhat-3]
        at org.codehaus.jackson.map.ObjectMapper._readMapAndClose(ObjectMapper.java:2718) [jackson-mapper-asl.jar:1.9.13.redhat-3]
        at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1877) [jackson-mapper-asl.jar:1.9.13.redhat-3]
        at org.ovirt.engine.core.utils.JsonHelper.jsonToMap(JsonHelper.java:41) [utils.jar:]
        at org.ovirt.engine.core.bll.storage.disk.image.MetadataDiskDescriptionHandler.enrichDiskByJsonDescription(MetadataDiskDescriptionHandler.java:247) [bll.jar:]
        at org.ovirt.engine.core.bll.storage.disk.image.GetUnregisteredDiskQuery.executeQueryCommand(GetUnregisteredDiskQuery.java:89) [bll.jar:]
        at org.ovirt.engine.core.bll.QueriesCommandBase.executeCommand(QueriesCommandBase.java:103) [bll.jar:]
        at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33) [dal.jar:]
        at org.ovirt.engine.core.bll.Backend.runQueryImpl(Backend.java:558) [bll.jar:]
        at org.ovirt.engine.core.bll.Backend.runInternalQuery(Backend.java:524) [bll.jar:]

But the HostedEngine VM is imported into the environment, and so is the hosted_storage. No errors are shown in the Administration Portal. I've seen this happen at two customers today. Both show symptoms even though everything apparently went fine from the user's perspective.

1) 4.0.7 engine with 4.19 vdsm
Symptom: Can't deploy additional HE Host because configs passed to host on Hosted-Engine->Deploy are empty, similar to BZ #1414696.

2) 3.6 engine with 4.19 vdsm (customer doing upgrade to RHV 4.0)
Symptom: HE SD not attached to the Storage Pool (VG tag MDT_POOL_UUID= is empty), so any operation on the HE SD fails with "ResourceAcqusitionFailed: Could not acquire resource. Probably resource factory threw an exception.: ()" due to dom.getPools() returning an empty list.
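
A minimal sketch of the check described in symptom 2, to make the failure mode concrete. This is not vdsm's code: the function name, the comma-separated input format and the sample tag strings are assumptions made for illustration. Only the MDT_POOL_UUID tag name and the "empty value means no pools" behaviour come from the report above.

```python
from typing import List


def pool_uuids_from_vg_tags(vg_tags: str) -> List[str]:
    """Return the pool UUID recorded in the MDT_POOL_UUID tag, if any.

    `vg_tags` is assumed to be a comma-separated list of KEY=VALUE tags
    (hypothetical format chosen for this sketch).
    """
    for tag in vg_tags.split(","):
        key, sep, value = tag.strip().partition("=")
        if key == "MDT_POOL_UUID" and sep:
            # An empty value means the domain is not attached to any pool.
            return [value] if value else []
    return []


# Illustrative tag strings, not taken from a real system.
detached = "MDT_CLASS=Data,MDT_POOL_UUID="
attached = "MDT_CLASS=Data,MDT_POOL_UUID=00000000-0000-0000-0000-000000000001"

print(pool_uuids_from_vg_tags(detached))  # empty list: SD not attached
print(pool_uuids_from_vg_tags(attached))
```

With an empty list of pools, any operation that needs the owning pool has nothing to acquire, which matches the reported error.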

Unfortunately, the vdsm logs have already been rotated at both customers. I'm trying to reproduce it.

I'm afraid this might bite in other places like upgrades or disaster recovery.

How reproducible:
Trying to...
Comment 3 Germano Veit Michel 2017-07-25 03:10:23 EDT
Is vdsm sending disk info that the engine fails to parse?

Could it be due to the higher vdsm version (4.19) vs the engines (3.6 and 4.0)?
Comment 4 Germano Veit Michel 2017-07-25 03:26:29 EDT
So...

The Storage Domain Metadata for the disks on customer 2 contains this:

DESCRIPTION=HostedEngineConfigurationImage
DESCRIPTION=hosted-engine.lockspace
DESCRIPTION=hosted-engine.metadata
DESCRIPTION=Hosted Engine Image

The engine was expecting it in JSON format, like the OVF one, right? Similar to this:
DESCRIPTION={"Updated":true,"Size":20480,"Last Updated":"Thu Jun 15 09:17:02 CEST 2017","Storage Domains":[{"uuid":"12166789-fa51-4639-8dc7-91ed4f94dfb7"}],"Disk Description":"OVF_STORE"}

But instead of an '{' it got an 'h'?
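
A quick way to check this guess outside the engine, as a sketch only. It uses Python's standard json module rather than the Jackson parser from the stack trace, so the error text differs, but the outcome is the same: the four plain-string descriptions are rejected at their first character, while the OVF_STORE one parses as an object. The helper name is hypothetical.

```python
import json

# DESCRIPTION values quoted in this comment.
descriptions = [
    "HostedEngineConfigurationImage",
    "hosted-engine.lockspace",
    "hosted-engine.metadata",
    "Hosted Engine Image",
    '{"Updated":true,"Size":20480,'
    '"Last Updated":"Thu Jun 15 09:17:02 CEST 2017",'
    '"Storage Domains":[{"uuid":"12166789-fa51-4639-8dc7-91ed4f94dfb7"}],'
    '"Disk Description":"OVF_STORE"}',
]


def is_json_object(text):
    """Return True only if `text` parses as a JSON object."""
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False


results = {d: is_json_object(d) for d in descriptions}
for description, ok in results.items():
    print("JSON object" if ok else "plain string", "->", description[:40])
```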
Comment 5 Andrej Krejcir 2017-08-02 09:07:05 EDT
It seems that the warning message (with a lot of stack trace) is not related to the problems.

The disk description field can be either a json object or a string. The json description is used when the disk is created by the engine. But the hosted engine VM disk is created by the hosted-engine-setup and it sets the description to a plain string instead of json.
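
The "either JSON object or plain string" handling can be sketched as below. This is a Python illustration under stated assumptions, not the engine's Java code in MetadataDiskDescriptionHandler; the function name and return shape are hypothetical, and only the "Disk Description" key is taken from the OVF_STORE example in comment 4.

```python
import json


def parse_disk_description(raw):
    """Return (description, extra_fields) for a DESCRIPTION value.

    JSON objects (disks created by the engine) yield their
    "Disk Description" entry plus the full parsed mapping.
    Anything else is treated as a plain-string description,
    as written by hosted-engine-setup.
    """
    try:
        parsed = json.loads(raw)
    except ValueError:
        # Not JSON at all: fall back to the raw string.
        return raw, {}
    if isinstance(parsed, dict):
        return parsed.get("Disk Description", ""), parsed
    # Valid JSON but not an object (e.g. a bare number): keep the raw text.
    return raw, {}


print(parse_disk_description("hosted-engine.lockspace"))
print(parse_disk_description('{"Disk Description":"OVF_STORE","Size":20480}'))
```

In this shape the plain-string case is an expected branch rather than an exception, so a parse failure of this kind would not need to be logged with a full stack trace.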

This behavior has probably not changed since 3.6.
I'm getting the same warning message on master and a clean HE deployment.

Looking at the engine logs, a lot of the errors come directly from rpc calls to vdsm, so without the vdsm logs, it is hard to know why.
Comment 6 Germano Veit Michel 2017-08-02 22:47:35 EDT
Hi Andrej,

I find it quite intriguing that both cases have problems related to the Hosted-Engine storage domain. These are problems that I have never seen before. I can't make any sense of this.

(In reply to Andrej Krejcir from comment #5)
> I'm getting the same warning message on master and a clean HE deployment.

Are you using ovirt-engine from master or vdsm from master, or both? I can try too.
