Created attachment 1369520 [details]
sosreport from purple-vds1

Description of problem:
SHE deployment fails over FC.

2017-12-18 15:27:21,752+0200 INFO  (jsonrpc/0) [vdsm.api] START createVG(vgname='f3f15c92-9d84-4c5e-a65c-fb5212181069', devlist=['3514f0c5a51601092'], force=True, options=None) from=::1,49016, task_id=5b5cab39-060b-4a1d-9b24-911fbf741264 (api:46)
2017-12-18 15:27:21,890+0200 ERROR (jsonrpc/0) [storage.LVM] pvcreate failed with rc=5 (lvm:754)
2017-12-18 15:27:21,890+0200 ERROR (jsonrpc/0) [storage.LVM] [], ['  Device /dev/mapper/3514f0c5a51601092 not found (or ignored by filtering).'] (lvm:755)
2017-12-18 15:27:21,891+0200 INFO  (jsonrpc/0) [vdsm.api] FINISH createVG error=Failed to initialize physical device: ("['/dev/mapper/3514f0c5a51601092']",) from=::1,49016, task_id=5b5cab39-060b-4a1d-9b24-911fbf741264 (api:50)
2017-12-18 15:27:21,891+0200 ERROR (jsonrpc/0) [storage.TaskManager.Task] (Task='5b5cab39-060b-4a1d-9b24-911fbf741264') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in createVG
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2129, in createVG
    force=force)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 983, in createVG
    _initpvs(pvs, metadataSize, force)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 756, in _initpvs
    raise se.PhysDevInitializationError(str(devices))
PhysDevInitializationError: Failed to initialize physical device: ("['/dev/mapper/3514f0c5a51601092']",)
2017-12-18 15:27:21,951+0200 INFO  (jsonrpc/0) [storage.TaskManager.Task] (Task='5b5cab39-060b-4a1d-9b24-911fbf741264') aborting: Task is aborted: 'Failed to initialize physical device: ("[\'/dev/mapper/3514f0c5a51601092\']",)' - code 601 (task:1181)
2017-12-18 15:27:21,951+0200 ERROR (jsonrpc/0) [storage.Dispatcher] FINISH createVG error=Failed to initialize physical device: ("['/dev/mapper/3514f0c5a51601092']",) (dispatcher:82)

3514f0c5a51601092 dm-1 XtremIO ,XtremApp
size=60G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  |- 0:0:0:1 sda 8:0  active ready running
  `- 0:0:1:1 sdb 8:16 active ready running

Version-Release number of selected component (if applicable):
ovirt-imageio-daemon-1.2.0-0.el7ev.noarch
ovirt-hosted-engine-ha-2.2.1-1.el7ev.noarch
ovirt-imageio-common-1.2.0-0.el7ev.noarch
mom-0.5.11-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-host-deploy-1.7.0-1.el7ev.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.1-1.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-setup-lib-1.1.4-1.el7ev.noarch
qemu-kvm-rhev-2.9.0-16.el7_4.13.x86_64
vdsm-4.20.9.2-1.el7ev.x86_64
ovirt-host-4.2.0-1.el7ev.x86_64
ovirt-provider-ovn-driver-1.2.2-1.el7ev.noarch
sanlock-3.5.0-1.el7.x86_64
ovirt-host-dependencies-4.2.0-1.el7ev.x86_64
libvirt-client-3.2.0-14.el7_4.5.x86_64

Linux version 3.10.0-693.15.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Thu Dec 14 05:13:32 EST 2017
Linux purple-vds1.qa.lab.tlv.redhat.com 3.10.0-693.15.1.el7.x86_64 #1 SMP Thu Dec 14 05:13:32 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.4 (Maipo)

How reproducible:
100% on purple-vds1.

Steps to Reproduce:
1. Deploy SHE over FC.
Actual results:
Deployment fails.

Expected results:
Deployment should succeed.

Additional info:
Sosreport from host.
Deployment details: http://pastebin.test.redhat.com/541628
This could be caused by a "dirty" LUN. See here for how to resolve it: https://bugzilla.redhat.com/show_bug.cgi?id=1343043
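For reference, a minimal sketch of how one might check whether a LUN is "dirty" before deploying. This is not part of any oVirt tool; the helper name is hypothetical, and the device path is the multipath device from this report:

import subprocess

def list_signatures(device):
    # Running wipefs with no options is read-only: it only reports the
    # signatures (partition tables, old LVM PV labels, filesystems) still
    # present on the device; it does not erase anything.
    result = subprocess.run(
        ["wipefs", device],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout

# Empty output means a clean LUN; any listed signature means pvcreate
# may refuse the device.
print(list_signatures("/dev/mapper/3514f0c5a51601092"))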
I suspect that the target device was dirty. Nikolai, could you please clean that device and retry?
Yes, the target seems to be dirty, and using the "Force" option failed. I'll now retry with a clean target as well, in order to verify.
(In reply to Simone Tiraboschi from comment #2)
> I suspect that the target device was dirty.
> Nikolai, could you please clean that device and retry?

Further to our latest chat, this is what happened during deployment:
https://bugzilla.redhat.com/show_bug.cgi?id=1215427#c4
I've had this during deployment:

"Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs3, nfs4)[nfs3]: fc
The following luns have been found on the requested target:
	[1]	3514f0c5a51601092	60GiB	XtremIO	XtremApp
		status: used, paths: 2 active
Please select the destination LUN (1) [1]:
The selected device is already used.
To create a vg on this device, you must use Force.
WARNING: This will destroy existing data on the device. (Force, Abort)[Abort]? Force"

It seems that "Force" is not working properly, and that is the issue here.
Further to comment #5, the "Force" option is not working. Shouldn't we change the whole message during deployment, as it looks a bit misleading?
This failure happened because the deployment procedure is unable to "Force"-clean a dirty LUN. I tried on a clean LUN, and the failure was not reproduced.
(In reply to Nikolai Sednev from comment #6)
> Further to comment #5, the "Force" option is not working. Shouldn't we
> change the whole message during deployment, as it looks a bit misleading?

What do you mean by "not working"? Maybe it is doing something other than what you are expecting...

I am not involved in the SHE setup, but I guess it passes "Force" to LVM.
LVM will fail even with "Force" if there is a partition table on the device.

It seems it is doing what is expected.

BTW, in oVirt we also use "force", but it will fail the same way if there is a partition table. The user needs to clean the LUN manually.
(In reply to Fred Rolland from comment #8)
> (In reply to Nikolai Sednev from comment #6)
> > Further to comment #5, the "Force" option is not working. Shouldn't we
> > change the whole message during deployment, as it looks a bit misleading?
>
> What do you mean by "not working"? Maybe it is doing something other than
> what you are expecting...
>
> I am not involved in the SHE setup, but I guess it passes "Force" to LVM.
> LVM will fail even with "Force" if there is a partition table on the device.
>
> It seems it is doing what is expected.
>
> BTW, in oVirt we also use "force", but it will fail the same way if there
> is a partition table. The user needs to clean the LUN manually.

The message is confusing: the customer assumes that the Force option will resolve the dirty-LUN issue on its own, but it does not, so the deployment fails with no reasonable explanation. We should consider replacing that sentence and either providing a link to the documentation, explaining that the LUN must be cleaned manually, or both. If I remember correctly, Force once worked fine on NFS and deployment could proceed correctly. Another option would be to run dd from the deployment itself for dirty-LUN cases in order to clean the device up; that could be very beneficial for automated deployments at large scale (see the sketch below).
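To illustrate, a hedged sketch of what such an automated cleanup could look like. This is not part of hosted-engine-setup; the helper name is hypothetical and the device path is the one from this report:

import subprocess

def zero_lun_header(device, mib=10):
    # DESTRUCTIVE: overwrite the first `mib` MiB of the device, where MBR/GPT
    # partition tables and old LVM PV labels normally live, so that a later
    # pvcreate no longer sees stale metadata.
    subprocess.run(
        ["dd", "if=/dev/zero", "of=" + device,
         "bs=1M", "count=" + str(mib), "oflag=direct"],
        check=True,
    )

if __name__ == "__main__":
    # Note: GPT keeps a backup header at the end of the disk, so a full
    # "wipefs -a" (which knows all signature offsets) is safer than zeroing
    # only the start of the device.
    zero_lun_header("/dev/mapper/3514f0c5a51601092")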
(In reply to Fred Rolland from comment #8)
> (In reply to Nikolai Sednev from comment #6)
> > Further to comment #5, the "Force" option is not working. Shouldn't we
> > change the whole message during deployment, as it looks a bit misleading?
>
> What do you mean by "not working"? Maybe it is doing something other than
> what you are expecting...
>
> I am not involved in the SHE setup, but I guess it passes "Force" to LVM.
> LVM will fail even with "Force" if there is a partition table on the device.

Which is why -ff was invented ("will forcibly create a PV, overriding checks that normally prevent it"). I suspect that if we wish to enable force, we need to use it.

> It seems it is doing what is expected.
>
> BTW, in oVirt we also use "force", but it will fail the same way if there
> is a partition table. The user needs to clean the LUN manually.
(In reply to Yaniv Kaul from comment #10)
> (In reply to Fred Rolland from comment #8)
> > (In reply to Nikolai Sednev from comment #6)
> > > Further to comment #5, the "Force" option is not working. Shouldn't we
> > > change the whole message during deployment, as it looks a bit misleading?
> >
> > What do you mean by "not working"? Maybe it is doing something other than
> > what you are expecting...
> >
> > I am not involved in the SHE setup, but I guess it passes "Force" to LVM.
> > LVM will fail even with "Force" if there is a partition table on the device.
>
> Which is why -ff was invented ("will forcibly create a PV, overriding
> checks that normally prevent it"). I suspect that if we wish to enable
> force, we need to use it.

It will fail with -ff as well; we are already using it:
https://github.com/oVirt/vdsm/blob/master/lib/vdsm/storage/lvm.py#L746

Check comment 4:
https://bugzilla.redhat.com/show_bug.cgi?id=1215427#c4

> It seems it is doing what is expected.
>
> BTW, in oVirt we also use "force", but it will fail the same way if there
> is a partition table. The user needs to clean the LUN manually.
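For context, a simplified approximation of a forced pvcreate call, in the spirit of the vdsm code linked above; this is NOT the actual vdsm implementation, and the device path is the one from this report. Even with -ff, the call fails here, because LVM's device filter rejects the partitioned device before -ff gets a chance to apply:

import subprocess

def init_pv(device, force=False):
    cmd = ["pvcreate"]
    if force:
        # -ff forcibly creates the PV, overriding checks that normally
        # prevent it; -y auto-answers the confirmation prompt.
        cmd.extend(["-ff", "-y"])
    cmd.append(device)
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # On this LUN the failure is rc=5 with "Device ... not found
        # (or ignored by filtering)", exactly as seen in the vdsm log.
        raise RuntimeError(result.stderr)

init_pv("/dev/mapper/3514f0c5a51601092", force=True)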
Does this reproduce consistently? Can you try with a different storage?
Can you try with a different FC storage?*
(In reply to Yaniv Lavi from comment #13)
> Can you try with a different FC storage?*

No, I have only a single FC storage in QE.
It will work on a clean FC volume, but cleaning the volume by using the Force option won't succeed.
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
*** Bug 1542377 has been marked as a duplicate of this bug. ***
Please file or reference the RHEL bug and move this ticket out as blocked, once it's attached.
(In reply to Yaniv Lavi from comment #18)
> Please file or reference the RHEL bug and move this ticket out as blocked,
> once it's attached.

Done: BZ#1545627
Needs to be tested again with EL8 hosts
Nikolay, can you re-test this with this week's 4.4 compose?
Usage of wipefs -a or dd works fine for this; moving to verified.

[ INFO ] ok: [localhost]
         The following luns have been found on the requested target:
		[1]	360002ac0000000000000131000021f6b	120.0GiB	3PARdata	VV
			status: used, paths: 4 active
         Please select the destination LUN (1) [1]:
[ INFO ] FC discard after delete is enabled
[ INFO ] Creating Storage Domain
[ INFO ] TASK [ovirt.hosted_engine_setup : Execute just a specific set of steps]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Force facts gathering]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Check local VM dir stat]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Enforce local VM dir existence]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Obtain SSO token using username/password credentials]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Fetch host facts]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Fetch cluster ID]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Fetch cluster facts]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Fetch Datacenter facts]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Fetch Datacenter ID]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Fetch Datacenter name]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Add NFS storage domain]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Add glusterfs storage domain]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Add iSCSI storage domain]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [ovirt.hosted_engine_setup : Add Fibre Channel storage domain]
[ ERROR ] ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Physical device initialization failed. Please check that the device is empty and accessible by the host.]". HTTP response code is 400.
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Physical device initialization failed. Please check that the device is empty and accessible by the host.]\". HTTP response code is 400."}
         Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs)[nfs]:

Tested on latest 4.4 components:
ovirt-hosted-engine-ha-2.4.2-1.el8ev.noarch
ovirt-hosted-engine-setup-2.4.4-1.el8ev.noarch
rhvm-appliance.x86_64 2:4.4-20200403.0.el8ev
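For completeness, the verified workaround expressed as a minimal sketch; the helper name is hypothetical, and the device path is the 3PAR LUN from the log above:

import subprocess

def clean_lun(device):
    # DESTRUCTIVE: erase every signature wipefs knows about on the device,
    # including the GPT backup header at the end of the disk.
    subprocess.run(["wipefs", "-a", device], check=True)
    # A blunter alternative that also worked here:
    #   dd if=/dev/zero of=<device> bs=1M count=10 oflag=direct

if __name__ == "__main__":
    clean_lun("/dev/mapper/360002ac0000000000000131000021f6b")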
This bugzilla is included in oVirt 4.4.0 release, published on May 20th 2020. Since the problem described in this bug report should be resolved in oVirt 4.4.0 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.