Bug 1562019

Summary: Node 0 deployment failed over iSCSI storage via cockpit.
Product: [oVirt] cockpit-ovirt Reporter: Nikolai Sednev <nsednev>
Component: Hosted Engine Assignee: Phillip Bailey <phbailey>
Status: CLOSED WORKSFORME QA Contact: Yihui Zhao <yzhao>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 0.11.19 CC: bugs, cshao, nsednev, stirabos, ycui, ylavi, yzhao
Target Milestone: ovirt-4.2.3 Flags: sbonazzo: ovirt-4.2?
sbonazzo: blocker?
nsednev: planning_ack?
nsednev: devel_ack?
cshao: testing_ack+
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-03-29 14:25:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Integration RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description                  Flags
sosreport from the engine    none
sosreport from alma04        none

Description Nikolai Sednev 2018-03-29 11:49:52 UTC
Description of problem:
[ INFO ] ok: [localhost]
[ INFO ] TASK [Trigger hosted engine OVF update]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Wait until OVF update finishes]
[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_storage_domains": [{"available": 3221225472, "backup": false, "committed": 65498251264, "critical_space_action_blocker": 5, "data_centers": [{"href": "/ovirt-engine/api/datacenters/4c78136c-3344-11e8-a34a-00163e7bb853", "id": "4c78136c-3344-11e8-a34a-00163e7bb853"}], "discard_after_delete": false, "disk_profiles": [{"id": ["72088a9a-edb2-4111-92cb-b8f4cc0f9ab4"], "name": "hosted_storage"}], "disk_snapshots": [], "disks": [{"id": ["2f64edb7-4278-498f-b9fa-29b01e5bf9d4"], "image_id": "b55586fd-e237-459a-861b-670d75483902", "name": "HostedEngineConfigurationImage"}, {"id": ["3999cfa2-60d8-4b67-b3f9-549460b3d049"], "image_id": "a1ce92c9-f44c-4bf0-89b4-2f7415ef431a", "name": "he_virtio_disk"}, {"id": ["5f40d10d-80a8-4b2b-b5c7-b7ae96d17c9f"], "image_id": "b0d302ab-4f45-4e70-95a4-5777f7fa8efd", "name": "he_metadata"}, {"id": ["f4552177-ab8c-4fed-b706-ff0c56d37ed5"], "image_id": "f6ac0563-720c-4e65-af42-becb0ce9940c", "name": "he_sanlock"}], "external_status": "ok", "href": "/ovirt-engine/api/storagedomains/c88c6b61-1e45-44fd-b1fd-25c78a46defd", "id": "c88c6b61-1e45-44fd-b1fd-25c78a46defd", "master": true, "name": "hosted_storage", "permissions": [{"id": ["58ca605c-010d-0307-0224-0000000001a9"]}, {"id": ["80f8b9b6-3344-11e8-8b64-00163e7bb853"]}], "storage": {"type": "iscsi", "volume_group": {"id": "Blzr2A-845n-WNnx-M0HD-s2gh-L3L0-wNRsj8", "logical_units": [{"address": "10.35.146.129", "discard_max_size": 8388608, "discard_zeroes_data": false, "id": "3514f0c5a516016ff", "lun_mapping": 1, "paths": 0, "port": 3260, "portal": "10.35.146.129:3260,1", "product_id": "XtremApp", "serial": "SXtremIO_XtremApp_XIO00153500071", "size": 75161927680, "storage_domain_id": "c88c6b61-1e45-44fd-b1fd-25c78a46defd", "target": "iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c00", "vendor_id": "XtremIO", "volume_group_id": "Blzr2A-845n-WNnx-M0HD-s2gh-L3L0-wNRsj8"}]}}, "storage_connections": [{"id": ["da632763-5747-4303-8124-c8b2060d2372"]}], "storage_format": "v4", "supports_discard": true, "supports_discard_zeroes_data": false, "templates": [], "type": "data", "used": 70866960384, "vms": [], "warning_low_space_indicator": 10, "wipe_after_delete": false}]}, "attempts": 12, "changed": false}

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-ha-2.2.9-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.15-1.el7ev.noarch
cockpit-ovirt-dashboard-0.11.19-1.el7ev.noarch
rhvm-appliance-4.2-20180202.0.el7.noarch
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)


How reproducible:
100%

Steps to Reproduce:
1. Deploy Node 0 over iSCSI from Cockpit.

Actual results:
Deployment fails.

Expected results:
Deployment should succeed.

Additional info:
See the attached sosreports from the host and the engine.

Comment 1 Yihui Zhao 2018-03-29 12:01:09 UTC
*** Bug 1562022 has been marked as a duplicate of this bug. ***

Comment 2 Nikolai Sednev 2018-03-29 12:01:27 UTC
Created attachment 1414730 [details]
sosreport from the engine

Comment 3 Nikolai Sednev 2018-03-29 12:02:30 UTC
Created attachment 1414731 [details]
sosreport from alma04

Comment 4 Ryan Barry 2018-03-29 12:04:52 UTC
Does this work over the CLI?

Comment 5 Nikolai Sednev 2018-03-29 12:26:39 UTC
(In reply to Ryan Barry from comment #4)
> Does this work over the CLI?

Yes

Comment 9 Simone Tiraboschi 2018-03-29 13:22:35 UTC
The issue is here: the OVF_STORE disks could not be created on the HE storage domain due to lack of space (ACTION_TYPE_FAILED_DISK_SPACE_LOW_ON_STORAGE_DOMAIN):


2018-03-29 14:39:50,921+03 WARN  [org.ovirt.engine.core.bll.storage.disk.AddDiskCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-6) [49c911e4] Validation of action 'AddDisk' failed for user SYSTEM. Reasons: VAR__ACTION__ADD,VAR__TYPE__DISK,ACTION_TYPE_FAILED_DISK_SPACE_LOW_ON_STORAGE_DOMAIN,$storageName hosted_storage
2018-03-29 14:39:51,054+03 INFO  [org.ovirt.engine.core.bll.storage.ovfstore.CreateOvfVolumeForStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-6) [49c911e4] Running command: CreateOvfVolumeForStorageDomainCommand internal: true. Entities affected :  ID: c88c6b61-1e45-44fd-b1fd-25c78a46defd Type: StorageAction group MANIPULATE_STORAGE_DOMAIN with role type ADMIN
2018-03-29 14:39:51,098+03 WARN  [org.ovirt.engine.core.bll.storage.disk.AddDiskCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-6) [49c911e4] Validation of action 'AddDisk' failed for user SYSTEM. Reasons: VAR__ACTION__ADD,VAR__TYPE__DISK,ACTION_TYPE_FAILED_DISK_SPACE_LOW_ON_STORAGE_DOMAIN,$storageName hosted_storage
2018-03-29 14:39:51,200+03 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-6) [49c911e4] EVENT_ID: UPDATE_OVF_FOR_STORAGE_DOMAIN_FAILED(190), Failed to update VMs/Templates OVF data for Storage Domain hosted_storage in Data Center Default.
2018-03-29 14:39:51,209+03 INFO  [org.ovirt.engine.core.bll.storage.ovfstore.ProcessOvfUpdateForStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-6) [49c911e4] Lock freed to object 'EngineLock:{exclusiveLocks='[c88c6b61-1e45-44fd-b1fd-25c78a46defd=STORAGE]', sharedLocks='[4c78136c-3344-11e8-a34a-00163e7bb853=OVF_UPDATE]'}'
2018-03-29 14:39:51,393+03 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-8) [49c911e4] Command 'CreateOvfVolumeForStorageDomain' id: '49a11484-3e59-425f-a107-885644530e81' child commands '[d6f5b581-7c75-4530-8593-1ef077c64f09]' executions were completed, status 'FAILED'
2018-03-29 14:39:51,422+03 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-8) [49c911e4] Command 'CreateOvfVolumeForStorageDomain' id: '6785e23c-a6bd-4424-a9ee-889ae93b2bce' child commands '[c5452744-8791-4297-a444-cf83d7f5855a]' executions were completed, status 'FAILED'
2018-03-29 14:39:51,440+03 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-8) [49c911e4] Command 'ProcessOvfUpdateForStorageDomain' (id: 'abbbe75d-15f5-482f-98d6-220ed58b7491') waiting on child command id: '49a11484-3e59-425f-a107-885644530e81' type:'CreateOvfVolumeForStorageDomain' to complete
2018-03-29 14:39:52,448+03 ERROR [org.ovirt.engine.core.bll.storage.ovfstore.CreateOvfVolumeForStorageDomainCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-57) [49c911e4] Ending command 'org.ovirt.engine.core.bll.storage.ovfstore.CreateOvfVolumeForStorageDomainCommand' with failure.


We explicitly validate the LUN size, also accounting for the space needed by the OVF_STORE disks (as per https://bugzilla.redhat.com/show_bug.cgi?id=1522737), but that is probably still not enough.
The engine has a protection on free space that prevents disk creation if less than a certain configurable percentage (5% or 10% by default, not sure) of the space is free. I think we are simply hitting that constraint.

Nikolai, could you please retry on a bigger LUN (at least 100 GB, please!) and, if it works there, close this bug and reopen https://bugzilla.redhat.com/show_bug.cgi?id=1522737?
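The fatal Ansible output in the description is consistent with this: none of the four disks listed on hosted_storage is an OVF_STORE volume, which fits the "Wait until OVF update finishes" task giving up after 12 attempts. A minimal Python sketch that pulls the disk names out of that output (trimmed to the fields used); `disk_names` and `has_ovf_store` are hypothetical helpers, and the idea that the wait task is effectively polling for an OVF_STORE disk is an assumption, not taken from the hosted-engine setup playbook.

```python
import json
from typing import List

# Trimmed copy of the ovirt_storage_domains fact from the fatal
# "Wait until OVF update finishes" output (only the disk names are kept).
FACT = json.loads("""
{"ovirt_storage_domains": [{"name": "hosted_storage", "type": "data",
  "disks": [
    {"name": "HostedEngineConfigurationImage"},
    {"name": "he_virtio_disk"},
    {"name": "he_metadata"},
    {"name": "he_sanlock"}
  ]}]}
""")


def disk_names(fact: dict, domain: str) -> List[str]:
    """Return the names of all disks on the given storage domain."""
    for sd in fact["ovirt_storage_domains"]:
        if sd["name"] == domain:
            return [d["name"] for d in sd["disks"]]
    raise KeyError(domain)


def has_ovf_store(fact: dict, domain: str) -> bool:
    """Hypothetical check: True if any disk on the domain is named OVF_STORE."""
    return "OVF_STORE" in disk_names(fact, domain)


if __name__ == "__main__":
    print(disk_names(FACT, "hosted_storage"))
    print("OVF_STORE present:", has_ovf_store(FACT, "hosted_storage"))
```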

Comment 10 Nikolai Sednev 2018-03-29 13:35:54 UTC
(In reply to Simone Tiraboschi from comment #9)
> Nikolai could you please retry on a bigger LUN (please at least 100GB!) and
> if it will be fine there close this and reopen
> https://bugzilla.redhat.com/show_bug.cgi?id=1522737 ?

The CLI works just fine with a 70 GB LUN; there is no reason for Cockpit to fail with the same size.

Comment 11 Simone Tiraboschi 2018-03-29 13:54:50 UTC
(In reply to Nikolai Sednev from comment #10)
> > Nikolai could you please retry on a bigger LUN (please at least 100GB!) and
> > if it will be fine there close this and reopen
> > https://bugzilla.redhat.com/show_bug.cgi?id=1522737 ?
> 
> CLI working just fine with 70GB, no reason for Cockpit to fail with the same
> size.

I just successfully deployed from Cockpit on a 100 GB LUN with the same version.

From the CLI we propose 50 GiB as the default appliance disk size, while Cockpit proposes 58 GiB.

In your error message I see:

{"ovirt_storage_domains": [{"available": 3221225472, "backup": false, "committed": 65498251264, "critical_space_action_blocker": 5, "data_centers": [{"href": "/ovirt-engine/api/datacenters/4c78136c-3344-11e8-a34a-00163e7bb853", "id": "4c78136c-3344-11e8-a34a-00163e7bb853"}]

So you have 3 GiB free and 61 GiB committed (58 GiB for the engine VM disk plus 1 GiB each for the metadata, lockspace and configuration volumes), and the engine now refuses to create the OVF_STORE volume since 3 GiB < 5 GiB ("critical_space_action_blocker").
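The arithmetic above can be checked against the byte counts in the fatal output. A minimal Python sketch; `blocks_new_disks` is a hypothetical helper that assumes the engine refuses new disks when free space is strictly below critical_space_action_blocker (expressed in GiB), which is how the comment reads it, not code taken from ovirt-engine.

```python
# Values copied from the ovirt_storage_domains fact in the fatal output
# (bytes, except critical_space_action_blocker, which is in GiB).
GIB = 1024 ** 3

available = 3221225472
committed = 65498251264
critical_space_action_blocker = 5


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return n_bytes / GIB


def blocks_new_disks(available_bytes: int, blocker_gib: int) -> bool:
    """Assumed rule: refuse new disks when free space is below the blocker."""
    return to_gib(available_bytes) < blocker_gib


if __name__ == "__main__":
    print(f"free:      {to_gib(available):.0f} GiB")
    print(f"committed: {to_gib(committed):.0f} GiB  (58 + 3 x 1 GiB)")
    print("OVF_STORE creation blocked:",
          blocks_new_disks(available, critical_space_action_blocker))
```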

Comment 12 Nikolai Sednev 2018-03-29 13:59:15 UTC
(In reply to Simone Tiraboschi from comment #11)
> (In reply to Nikolai Sednev from comment #10)
> > > Nikolai could you please retry on a bigger LUN (please at least 100GB!) and
> > > if it will be fine there close this and reopen
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1522737 ?
> > 
> > CLI working just fine with 70GB, no reason for Cockpit to fail with the same
> > size.
> 
> I just successfully deployed from cockpit on a 100 G LUN with the same
> version.
> 
> From CLI we are proposing 50 GiB as the default for the appliance while
> cockpit is proposing 58 GiB.
> 
> In your error message I see:
> 
> {"ovirt_storage_domains": [{"available": 3221225472, "backup": false,
> "committed": 65498251264, "critical_space_action_blocker": 5,
> "data_centers": [{"href":
> "/ovirt-engine/api/datacenters/4c78136c-3344-11e8-a34a-00163e7bb853", "id":
> "4c78136c-3344-11e8-a34a-00163e7bb853"}]
> 
> So you have 3 GiB free and 61 committed (58 for the engine VM disk plus 1
> GiB each for metadata, lockspace and configuration volumes) and the engine
> refuses now to create the the OVF_STORE volume since 3 GiB is < 5 Gib
> ("critical_space_action_blocker").

So the first issue here is the inconsistency between the Cockpit and CLI minimum requirements. They have to be aligned and identical.

Comment 13 Simone Tiraboschi 2018-03-29 14:11:37 UTC
(In reply to Nikolai Sednev from comment #12)
> So first issue here is inconsistency between Cockpit minimum requirements
> vs. CLI. They both have to be aligned and the same.

https://bugzilla.redhat.com/show_bug.cgi?id=1561888

Comment 14 Nikolai Sednev 2018-03-29 14:25:30 UTC
The CLI and Cockpit have different minimum LUN sizes, so a 70 GB LUN is sufficient for the CLI but not for Cockpit.
Cockpit uses 58 GB as the default minimum requirement.
The CLI uses 50 GB as the default minimum requirement.
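A back-of-the-envelope Python sketch of why the same 70 GiB LUN (75161927680 bytes in the fatal output) passes from the CLI and fails from Cockpit. It assumes that the space which is neither committed nor available (6 GiB in this report) is fixed storage-domain overhead that does not change with the appliance disk size, and that deployment is blocked when free space is below the 5 GiB critical_space_action_blocker; both are assumptions, so the figures are estimates, not measurements.

```python
# Rough what-if for the 70 GiB LUN from this report. Assumption: everything on
# the LUN that is neither "committed" nor "available" is constant overhead,
# whatever appliance disk size is chosen.
GIB = 1024 ** 3

LUN_SIZE = 75161927680       # "size" of the logical unit in the fatal output
COMMITTED = 65498251264      # 58 GiB engine disk + 3 x 1 GiB HE volumes
AVAILABLE = 3221225472
HE_VOLUMES_GIB = 3           # metadata, lockspace, configuration (1 GiB each)
BLOCKER_GIB = 5              # critical_space_action_blocker

OVERHEAD_GIB = (LUN_SIZE - COMMITTED - AVAILABLE) // GIB


def estimated_free_gib(lun_gib: int, appliance_disk_gib: int) -> int:
    """Estimated free space after deployment, under the constant-overhead assumption."""
    return lun_gib - appliance_disk_gib - HE_VOLUMES_GIB - OVERHEAD_GIB


if __name__ == "__main__":
    lun_gib = LUN_SIZE // GIB
    for label, disk_gib in (("CLI", 50), ("Cockpit", 58)):
        free = estimated_free_gib(lun_gib, disk_gib)
        verdict = "OK" if free >= BLOCKER_GIB else "blocked"
        print(f"{label}: {disk_gib} GiB disk -> ~{free} GiB free ({verdict})")
```

With the 58 GiB Cockpit default the estimate reproduces the 3 GiB actually seen in the fatal output; with the 50 GiB CLI default it leaves about 11 GiB, above the 5 GiB blocker, which fits the CLI deployment succeeding on the same LUN size.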

I've just tried again to deploy over a 100 GB LUN, and it succeeded.

I've reopened https://bugzilla.redhat.com/show_bug.cgi?id=1522737, and the inconsistency between the Cockpit and CLI minimum LUN space requirements is already covered here:
https://bugzilla.redhat.com/show_bug.cgi?id=1561888.

Closing this bug, as it works fine with a larger LUN.