Bug 1592916

Summary: [blocked on platform bug 1690511] Support device block size of 4096 bytes for file based storage domains
Product: [oVirt] vdsm Reporter: Sahina Bose <sabose>
Component: Core Assignee: Nir Soffer <nsoffer>
Status: CLOSED CURRENTRELEASE QA Contact: SATHEESARAN <sasundar>
Severity: high Docs Contact:
Priority: high    
Version: 4.30.0 CC: aefrat, bkunal, budic, bugs, dfediuck, ebenahar, fgarciad, godas, guillaume.pavese, gveitmic, jcall, jortialc, marceloltmm, mwest, nico.kruger, nsoffer, okok102928, rcyriac, rhs-bugs, sasundar, sbonazzo, shalygin.k, stirabos, szmadej, tnisan, ykaul
Target Milestone: ovirt-4.3.8 Flags: rule-engine: ovirt-4.3+
Target Release: 4.30.26   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: vdsm-4.30.26 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1591293
Clones: 1721020 1734429 (view as bug list) Environment:
Last Closed: 2020-02-26 16:31:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1411103, 1690511, 1728953, 1729809, 1737256, 1737268, 1738657, 1738721, 1739147    
Bug Blocks: 1364869, 1591293, 1720923, 1721020, 1734429    

Description Sahina Bose 2018-06-19 15:20:52 UTC
Description of problem:
RHV fails to create a Storage Domain when storage with native 4k sectors are present.  For RHHI in particular, the Gluster volume is successfully created, but RHVM reports a non-specific failure when adding the Storage Domain.  The 512-byte emulation workaround is not available with new 4kN (e.g. 4k Native) drives.
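
For reference, a quick way to tell whether a drive is 4k native is to compare its logical and physical sector sizes with the standard util-linux tool (a sketch; /dev/sdX is a placeholder):

    # blockdev --getss /dev/sdX     # logical sector size: 4096 on 4kN, 512 on 512e
    # blockdev --getpbsz /dev/sdX   # physical sector size: 4096 on both 4kN and 512e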


Version-Release number of selected component (if applicable):
4.2

Comment 2 Tal Nisan 2018-11-05 09:36:36 UTC
*** Bug 1643326 has been marked as a duplicate of this bug. ***

Comment 3 Sandro Bonazzola 2019-01-28 09:37:01 UTC
This bug has not been marked as blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 4 Yaniv Kaul 2019-03-05 08:39:25 UTC
I believe this cannot be tested without an updated sanlock?

Comment 5 Nir Soffer 2019-03-05 09:08:21 UTC
We are far from QA; I don't know why this was moved to MODIFIED.

Comment 6 Nir Soffer 2019-03-08 19:46:25 UTC
This bug is not ready yet. Looks like the CI scripts are confused.

Comment 7 Nir Soffer 2019-03-08 19:47:20 UTC
This cannot be ready for 4.3.2, moving to 4.3.3.

Comment 8 Tal Nisan 2019-03-24 14:32:29 UTC
*** Bug 1364869 has been marked as a duplicate of this bug. ***

Comment 9 Avihai 2019-03-26 08:37:16 UTC
Hi Tal,

Looks like the automation bot moved this bug to MODIFIED a few days ago, but this bug has many patches still in POST.
Please move it back to POST and retarget to the 4.3 z-stream (currently targeted to 4.3.3).

Comment 10 Avihai 2019-04-03 11:33:37 UTC
Hi Tal,

Once more, the automation bot moved this bug to MODIFIED (the bug has many patches still in POST).

Please move it back to POST.

Comment 12 Szymon Madej 2019-05-28 19:42:33 UTC
Hi,

Commit https://github.com/oVirt/ovirt-engine/commit/e2e04e82351a6aea9c160b0da967de7ed39e5f24#diff-6db200f01d6157a55a94e3797f9724c7,
which changed the default Storage Domain version to V5 in oVirt v4.3+, has caused the reappearance of bug https://bugzilla.redhat.com/show_bug.cgi?id=1669606.

Currently I'm not able to migrate the Hosted Engine to a new oVirt host with the backup-restore procedure:

# hosted-engine --deploy --restore-from-file=/root/HE_BCK_2019.05.28.bck

At storage selection I set up an NFS share from the company's NAS storage. Each try has failed:

          Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs)[nfs]: nfs
          Please specify the nfs version you would like to use (auto, v3, v4, v4_1)[auto]: v4
          Please specify the full shared storage connection path to use (example: host:/path): mynfsserver:/he_nas
          If needed, specify additional mount options for the connection to the hosted-engine storagedomain (example: rsize=32768,wsize=32768) []:
[ INFO  ] Creating Storage Domain
[ INFO  ] TASK [ovirt.hosted_engine_setup : Execute just a specific set of steps]
...
[ INFO  ] TASK [ovirt.hosted_engine_setup : Activate storage domain]
[ ERROR ] Error: Fault reason is "Operation Failed". Fault detail is "[Storage format is unsupported]". HTTP response code is 400.
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Storage format is unsupported]\". HTTP response code is 400."}
          Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs)[nfs]:

In the Hosted Engine VM, /var/log/ovirt-engine/engine.log shows these errors:
2019-05-28 19:25:13,357+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-1) [cba462c2-1d73-4978-89fd-f6db40162601] EVENT_ID: IRS_BROKER_COMMAND_FAILURE(10,803), VDSM command AttachStorageDomainVDS failed: Domain version `5` is unsupported by this version of VDSM: ''
2019-05-28 19:25:13,357+02 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.AttachStorageDomainVDSCommand] (default task-1) [cba462c2-1d73-4978-89fd-f6db40162601] Command 'AttachStorageDomainVDSCommand( AttachStorageDomainVDSCommandParameters:{storagePoolId='4669438a-3f99-11e9-aef4-00163e3912a0', ignoreFailoverLimit='false', storageDomainId='51ef1a02-271e-4e27-96e8-dfa1886917c7'})' execution failed: IRSGenericException: IRSErrorException: Domain version `5` is unsupported by this version of VDSM: ''
2019-05-28 19:25:13,358+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.AttachStorageDomainVDSCommand] (default task-1) [cba462c2-1d73-4978-89fd-f6db40162601] FINISH, AttachStorageDomainVDSCommand, return: , log id: 580e7243
2019-05-28 19:25:13,358+02 ERROR [org.ovirt.engine.core.bll.storage.domain.AttachStorageDomainToPoolCommand] (default task-1) [cba462c2-1d73-4978-89fd-f6db40162601] Command 'org.ovirt.engine.core.bll.storage.domain.AttachStorageDomainToPoolCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.irsbroker.IrsOperationFailedNoFailoverException: IRSGenericException: IRSErrorException: Domain version `5` is unsupported by this version of VDSM: '' (Failed with error UnsupportedDomainVersion and code 394)
2019-05-28 19:25:13,367+02 INFO  [org.ovirt.engine.core.bll.CommandCompensator] (default task-1) [cba462c2-1d73-4978-89fd-f6db40162601] Command [id=0e8d4f68-9e59-4944-a3d1-89b591ff811c]: Compensating NEW_ENTITY_ID of org.ovirt.engine.core.common.businessentities.StoragePoolIsoMap; snapshot: StoragePoolIsoMapId:{storagePoolId='4669438a-3f99-11e9-aef4-00163e3912a0', storageId='51ef1a02-271e-4e27-96e8-dfa1886917c7'}.
2019-05-28 19:25:13,397+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-1) [cba462c2-1d73-4978-89fd-f6db40162601] EVENT_ID: USER_ATTACH_STORAGE_DOMAIN_TO_POOL_FAILED(963), Failed to attach Storage Domain hosted_storage to Data Center *******. (User: admin@internal-authz)
2019-05-28 19:25:13,407+02 INFO  [org.ovirt.engine.core.bll.storage.domain.AttachStorageDomainToPoolCommand] (default task-1) [cba462c2-1d73-4978-89fd-f6db40162601] Lock freed to object 'EngineLock:{exclusiveLocks='[51ef1a02-271e-4e27-96e8-dfa1886917c7=STORAGE]', sharedLocks=''}'
2019-05-28 19:25:13,407+02 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-1) [] Operation Failed: [Storage format is unsupported]

All packages are at their most recent versions.

Comment 13 Nir Soffer 2019-05-28 20:14:23 UTC
(In reply to Szymon Madej from comment #12)
Please file a new bug, or reopen bug 1669606 if you think that this is the same bug.

This is an RFE for supporting 4k storage, not the place for tracking bugs, even if they
are related to the 4k code.

Comment 15 Sahina Bose 2019-06-13 10:26:43 UTC
Since sanlock packages are now available per bug 1690511, is this feature ready to be tested?

Comment 16 Gobinda Das 2019-06-17 07:22:22 UTC
*** Bug 1720923 has been marked as a duplicate of this bug. ***

Comment 17 Sahina Bose 2019-06-17 07:25:09 UTC
*** Bug 1711054 has been marked as a duplicate of this bug. ***

Comment 18 Nir Soffer 2019-07-03 19:19:34 UTC
(In reply to Sahina Bose from comment #15)
> Since sanlock packages are now available per bug 1690511, is this
> feature ready to be tested?

Not yet. sanlock is required, but we still need to add 4k support in:
- vdsm (in progress, mostly done)
- engine (mostly done)
- ovirt-imageio (work not started yet)
- hosted engine (status unknown)

When all parts are done, we can start testing RHHI.

Comment 19 Nir Soffer 2019-07-11 18:42:38 UTC
We depend on ioprocess 1.2.1. It should be released for 4.3.6.

Comment 20 Nir Soffer 2019-07-15 00:27:12 UTC
We also depend on 4k support in ovirt-imageio (bug 1729809).

Comment 21 RHV bug bot 2019-08-02 17:21:26 UTC
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Open patch attached]

For more info please contact: infra

Comment 22 Nir Soffer 2019-08-04 17:59:55 UTC
Testing on Fedora 29 hosts with gluster storage shows that qemu does not detect
the block size of the underlying storage, so provisioning VMs on gluster 4k
storage fails. The issue is tracked in bug 1737256.
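
The constraint can also be seen from the host side with plain direct I/O on the mount: a 512-byte buffer is rejected on 4k storage, which is why tools that assume 512-byte sectors fail (a sketch; the mount path is a placeholder):

    # dd if=/dev/zero of=/rhev/data-center/mnt/glusterSD/server:_vol/probe bs=512 count=1 oflag=direct
    (fails with "Invalid argument" on 4k-native storage)
    # dd if=/dev/zero of=/rhev/data-center/mnt/glusterSD/server:_vol/probe bs=4096 count=1 oflag=direct
    (succeeds; the buffer size matches the storage block size)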

Comment 23 Nir Soffer 2019-08-04 21:52:00 UTC
We can work around the issue described in bug 1737256 by adding a <blockio> element
to the libvirt XML (see https://gerrit.ovirt.org/c/102308/). With this we can provision
a VM, but it will not boot. The issue is tracked in bug 1737268.
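
For illustration, the <blockio> element sets the logical and physical block sizes the guest sees for a disk; on 4k storage it would look roughly like this (a sketch, not the exact patch content):

    <disk type='file' device='disk'>
      ...
      <blockio logical_block_size='4096' physical_block_size='4096'/>
    </disk>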

Comment 24 Nir Soffer 2019-08-04 21:54:57 UTC
To test with gluster 4k storage, you need to enable gluster 4k support in vdsm.

Create a drop-in configuration file:
    
$ cat /etc/vdsm/vdsm.conf.d/gluster.conf
[gluster]
enable_4k_storage = true
    
And restart vdsm to load the new configuration.
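
For example, on systemd hosts (the vdsm service is vdsmd):

    # systemctl restart vdsmd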

Comment 25 Nir Soffer 2019-08-06 11:53:18 UTC
With tiny qemu patch:
https://lists.nongnu.org/archive/html/qemu-block/2019-08/msg00133.html

I can provision VMs on gluster 4k storage when using fuse mount.

I don't think it will fix libgfapi, need to test.

Comment 26 Yaniv Kaul 2019-08-06 12:01:06 UTC
(In reply to Nir Soffer from comment #25)
> With tiny qemu patch:
> https://lists.nongnu.org/archive/html/qemu-block/2019-08/msg00133.html
> 
> I can provision VMs on gluster 4k storage when using fuse mount.
> 
> I don't think it will fix libgfapi, need to test.

We don't need libgfapi support - don't bother.

Comment 27 Nir Soffer 2019-08-06 13:04:33 UTC
(In reply to Yaniv Kaul from comment #26)
> We don't need libgfapi support - don't bother.

RHHI does not use libgfapi?

Comment 28 Yaniv Kaul 2019-08-06 21:13:30 UTC
(In reply to Nir Soffer from comment #27)
> (In reply to Yaniv Kaul from comment #26)
> > We don't need libgfapi support - don't bother.
> 
> RHHI does not use libgfapi?

Never, because not all features worked well with libgfapi (live migration? don't remember right now).

Comment 29 Nir Soffer 2019-08-07 18:43:27 UTC
Another issue with qemu is copying disks. qemu-img convert fails to read from
gluster 4k storage because of an alignment issue. With the same fix as for bug 1737256,
the read issue is fixed. However, there is another issue with detecting the block
size of the target image, causing writes to gluster 4k storage to fail.
The issue is tracked in bug 1738657.

We can work around this issue in oVirt by using the -n option to qemu-img convert.
We will provide a workaround in the next 4.3.6 build.
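
For illustration, -n tells qemu-img convert to skip creating the target volume, so the target must be created beforehand (a sketch with placeholder paths):

    $ qemu-img create -f raw /target/disk.img 10G
    $ qemu-img convert -n -f qcow2 -O raw /source/disk.qcow2 /target/disk.img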

Comment 30 Nir Soffer 2019-08-07 23:13:53 UTC
(In reply to Nir Soffer from comment #29)
> We can work around this issue in oVirt by using the -n option to qemu-img convert.
> We will provide a workaround in the next 4.3.6 build.

But if we use the -n option, image preallocation during convert will be broken.
The issue is tracked in bug 1738721.

Comment 32 SATHEESARAN 2020-01-28 07:59:22 UTC
Tested with RHV 4.3.8 + RHGS 3.5.1 (glusterfs-6.0-29.el7rhgs).

1. All the RHHI-V core cases were run with 4K disks as well as with 4K VDO devices.
2. Deployment and functionality work well without any issues.

Comment 34 Sahina Bose 2020-02-06 11:01:56 UTC
*** Bug 1574681 has been marked as a duplicate of this bug. ***

Comment 35 Sandro Bonazzola 2020-02-26 16:31:51 UTC
This bug is included in the oVirt 4.3.8 release, published on January 27th 2020.

Since the problem described in this bug report should be
resolved in the oVirt 4.3.8 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.