Bug 1527077 - [BLOCKED] SHE deployment fails over FC when selecting 'Force' to deploy on a dirty LUN
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.HostedEngine
Version: 4.2.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ovirt-4.4.0
Assignee: Simone Tiraboschi
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Duplicates: 1542377
Depends On: 1545627 1795672
Blocks:
 
Reported: 2017-12-18 13:51 UTC by Nikolai Sednev
Modified: 2020-05-20 20:03 UTC
11 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2020-05-20 20:03:19 UTC
oVirt Team: Integration
Embargoed:
nsednev: needinfo+
nsednev: needinfo-
rule-engine: ovirt-4.4+
ylavi: blocker-
ylavi: exception+


Attachments
sosreport from purple-vds1 (9.33 MB, application/x-xz)
2017-12-18 13:51 UTC, Nikolai Sednev

Description Nikolai Sednev 2017-12-18 13:51:53 UTC
Created attachment 1369520 [details]
sosreport from purple-vds1

Description of problem:
SHE deployment fails over FC.
2017-12-18 15:27:21,752+0200 INFO  (jsonrpc/0) [vdsm.api] START createVG(vgname='f3f15c92-9d84-4c5e-a65c-fb5212181069', devlist=['3514f0c5a51601092'], force=True, options=None) from=::1,49016, task_id=5b5cab39-060b-4a1d-9b24-911fbf741264 (api:46)
2017-12-18 15:27:21,890+0200 ERROR (jsonrpc/0) [storage.LVM] pvcreate failed with rc=5 (lvm:754)
2017-12-18 15:27:21,890+0200 ERROR (jsonrpc/0) [storage.LVM] [], ['  Device /dev/mapper/3514f0c5a51601092 not found (or ignored by filtering).'] (lvm:755)
2017-12-18 15:27:21,891+0200 INFO  (jsonrpc/0) [vdsm.api] FINISH createVG error=Failed to initialize physical device: ("['/dev/mapper/3514f0c5a51601092']",) from=::1,49016, task_id=5b5cab39-060b-4a1d-9b24-911fbf741264 (api:50)
2017-12-18 15:27:21,891+0200 ERROR (jsonrpc/0) [storage.TaskManager.Task] (Task='5b5cab39-060b-4a1d-9b24-911fbf741264') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "<string>", line 2, in createVG
  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2129, in createVG
    force=force)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 983, in createVG
    _initpvs(pvs, metadataSize, force)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 756, in _initpvs
    raise se.PhysDevInitializationError(str(devices))
PhysDevInitializationError: Failed to initialize physical device: ("['/dev/mapper/3514f0c5a51601092']",)
2017-12-18 15:27:21,951+0200 INFO  (jsonrpc/0) [storage.TaskManager.Task] (Task='5b5cab39-060b-4a1d-9b24-911fbf741264') aborting: Task is aborted: 'Failed to initialize physical device: ("[\'/dev/mapper/3514f0c5a51601092\']",)' - code 601 (task:1181)
2017-12-18 15:27:21,951+0200 ERROR (jsonrpc/0) [storage.Dispatcher] FINISH createVG error=Failed to initialize physical device: ("['/dev/mapper/3514f0c5a51601092']",) (dispatcher:82)


3514f0c5a51601092 dm-1 XtremIO ,XtremApp        
size=60G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  |- 0:0:0:1 sda 8:0  active ready running
  `- 0:0:1:1 sdb 8:16 active ready running


Version-Release number of selected component (if applicable):
ovirt-imageio-daemon-1.2.0-0.el7ev.noarch
ovirt-hosted-engine-ha-2.2.1-1.el7ev.noarch
ovirt-imageio-common-1.2.0-0.el7ev.noarch
mom-0.5.11-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-host-deploy-1.7.0-1.el7ev.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.1-1.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-setup-lib-1.1.4-1.el7ev.noarch
qemu-kvm-rhev-2.9.0-16.el7_4.13.x86_64
vdsm-4.20.9.2-1.el7ev.x86_64
ovirt-host-4.2.0-1.el7ev.x86_64
ovirt-provider-ovn-driver-1.2.2-1.el7ev.noarch
sanlock-3.5.0-1.el7.x86_64
ovirt-host-dependencies-4.2.0-1.el7ev.x86_64
libvirt-client-3.2.0-14.el7_4.5.x86_64
Linux version 3.10.0-693.15.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Thu Dec 14 05:13:32 EST 2017
Linux purple-vds1.qa.lab.tlv.redhat.com 3.10.0-693.15.1.el7.x86_64 #1 SMP Thu Dec 14 05:13:32 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.4 (Maipo)

How reproducible:
100% on purple-vds1.

Steps to Reproduce:
1.Deploy SHE over FC.

Actual results:
Deployment fails.

Expected results:
Deployment should succeed.

Additional info:
Sosreport from host.
Deployment details http://pastebin.test.redhat.com/541628.

Comment 1 Fred Rolland 2017-12-18 14:11:12 UTC
This could be caused due to a "dirty" LUN.
Check here how to solve this:

https://bugzilla.redhat.com/show_bug.cgi?id=1343043

Comment 2 Simone Tiraboschi 2017-12-18 14:11:38 UTC
I suspect that the target device was dirty.
Nikolai, could you please clean that device and retry?

Comment 3 Nikolai Sednev 2017-12-18 14:14:01 UTC
Yes, the target seems to be dirty, and using the "Force" option has failed.
I'll retry now with a clean target in order to verify.

Comment 4 Nikolai Sednev 2017-12-18 14:19:34 UTC
(In reply to Simone Tiraboschi from comment #2)
> I suspect that the target device was dirty.
> Nikolai, could you please clean that device and retry?

Further to our latest chat, this is what happened during deployment:
https://bugzilla.redhat.com/show_bug.cgi?id=1215427#c4

Comment 5 Nikolai Sednev 2017-12-18 14:24:10 UTC
I've had this during deployment:
         "Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs3, nfs4)[nfs3]: fc
          The following luns have been found on the requested target:
                [1]     3514f0c5a51601092       60GiB   XtremIO XtremApp
                        status: used, paths: 2 active
         
          Please select the destination LUN (1) [1]: 
          The selected device is already used.
          To create a vg on this device, you must use Force.
          WARNING: This will destroy existing data on the device.
          (Force, Abort)[Abort]? Force"

It seems that "Force" is not working properly and that is the issue here.

Comment 6 Nikolai Sednev 2017-12-18 14:46:31 UTC
Further to comment #5, the "Force" option is not working. Shouldn't we change the whole message during deployment, as it looks a bit misleading?

Comment 7 Nikolai Sednev 2017-12-18 16:12:05 UTC
This failure happened because the deployment procedure cannot handle "Force" cleaning of a dirty LUN. I tried on a clean LUN and the failure was not reproduced.

Comment 8 Fred Rolland 2017-12-19 10:14:04 UTC
(In reply to Nikolai Sednev from comment #6)
> Forth to comment #5, "Force" option is not working. Shouldn't we change the
> whole message during deployment as it looks like a bit misleading?

What do you mean by "not working"? Maybe it is doing something other than what you are expecting...

I am not involved in the SHE setup, but I guess it will pass the "Force" down to LVM.
LVM will also fail with "Force" if there is a partition table on the device.

It seems it is doing what is expected.

BTW in oVirt we are also using "force", but it will fail the same way if there is a partition. The user needs to clean the LUN manually.
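Cleaning the LUN manually here means wiping the stale signatures from the start of the device. A minimal sketch of the idea, run against a throwaway image file rather than the real /dev/mapper/3514f0c5a51601092 device from this report (against a real LUN the same dd invocation is destructive by design):

```shell
# Scratch image standing in for the dirty LUN -- never a real device.
LUN_IMG=$(mktemp)

# Plant a stale LVM PV label at sector 1, where pvcreate would find it.
printf 'LABELONE' | dd of="$LUN_IMG" bs=512 seek=1 conv=notrunc 2>/dev/null

# Zero the first 1 MiB: enough to clear partition tables and PV labels,
# which is what pvcreate's device filtering trips over.
dd if=/dev/zero of="$LUN_IMG" bs=1M count=1 conv=notrunc 2>/dev/null

# Any non-zero byte left in the image means the wipe failed.
if [ "$(tr -d '\0' < "$LUN_IMG" | wc -c)" -eq 0 ]; then
    echo "image is clean"
else
    echo "image is still dirty"
fi
rm -f "$LUN_IMG"
```

Against the real device the equivalent would be something like `dd if=/dev/zero of=/dev/mapper/<wwid> bs=1M count=1 oflag=direct`; triple-check the WWID first, since the write is irreversible.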

Comment 9 Nikolai Sednev 2017-12-19 10:25:09 UTC
(In reply to Fred Rolland from comment #8)
> (In reply to Nikolai Sednev from comment #6)
> > Forth to comment #5, "Force" option is not working. Shouldn't we change the
> > whole message during deployment as it looks like a bit misleading?
> 
> What do you mean "not working"? Maybe it is doing something else from what
> you are expecting...
> 
> I am not involved in the SHE setup, but I guess it will pass the "Force" to
> LVM.
> LVM will fail also with "Force" if there is a partition table on the device.
> 
> It seems it is doing what is expected.
> 
> BTW in oVirt we are also using "force", but it will fail the same if there
> is a partition. The user need to clean the LUN manually.

The message is confusing: the customer thinks that using the Force option will resolve the dirty LUN issue on its own, but it does not, so deployment fails with no reasonable explanation. We should think about replacing that sentence and either provide a link to the documentation, or explain that the LUN must be cleaned manually, or both.

If I remember correctly, on NFS Force once worked fine and deployment could proceed correctly. 

There could also be an option of running dd from the deployment itself for dirty LUN cases in order to clean them up; that could be very beneficial for large-scale automated deployments.

Comment 10 Yaniv Kaul 2017-12-19 11:11:35 UTC
(In reply to Fred Rolland from comment #8)
> (In reply to Nikolai Sednev from comment #6)
> > Forth to comment #5, "Force" option is not working. Shouldn't we change the
> > whole message during deployment as it looks like a bit misleading?
> 
> What do you mean "not working"? Maybe it is doing something else from what
> you are expecting...
> 
> I am not involved in the SHE setup, but I guess it will pass the "Force" to
> LVM.
> LVM will fail also with "Force" if there is a partition table on the device.

Which is why -ff was invented ('will forcibly create a PV, overriding checks that normally prevent it'). I suspect if we wish to enable force, we need to use it.

> 
> It seems it is doing what is expected.
> 
> BTW in oVirt we are also using "force", but it will fail the same if there
> is a partition. The user need to clean the LUN manually.

Comment 11 Fred Rolland 2017-12-19 11:44:02 UTC
(In reply to Yaniv Kaul from comment #10)
> (In reply to Fred Rolland from comment #8)
> > (In reply to Nikolai Sednev from comment #6)
> > > Forth to comment #5, "Force" option is not working. Shouldn't we change the
> > > whole message during deployment as it looks like a bit misleading?
> > 
> > What do you mean "not working"? Maybe it is doing something else from what
> > you are expecting...
> > 
> > I am not involved in the SHE setup, but I guess it will pass the "Force" to
> > LVM.
> > LVM will fail also with "Force" if there is a partition table on the device.
> 
> Which is why -ff was invented ('will forcibly create a PV, overriding checks
> that normally prevent it'). I suspect if we wish to enable force, we need to
> use it.

It will also fail with -ff; we are already using it:
https://github.com/oVirt/vdsm/blob/master/lib/vdsm/storage/lvm.py#L746

Check comment4:
https://bugzilla.redhat.com/show_bug.cgi?id=1215427#c4
> 
> > 
> > It seems it is doing what is expected.
> > 
> > BTW in oVirt we are also using "force", but it will fail the same if there
> > is a partition. The user need to clean the LUN manually.

Comment 12 Yaniv Lavi 2018-01-08 08:43:00 UTC
Does this reproduce consistently?
Can you try with a different storage?

Comment 13 Yaniv Lavi 2018-01-08 08:43:20 UTC
Can you try with a different FC storage?*

Comment 14 Nikolai Sednev 2018-01-08 09:29:03 UTC
(In reply to Yaniv Lavi from comment #13)
> Can you try with a different FC storage?*

No, I have only single FC storage in QE.

Comment 15 Nikolai Sednev 2018-01-08 09:30:01 UTC
It will work on a clean FC volume, but cleaning the volume with the "Force" option won't succeed.

Comment 16 Red Hat Bugzilla Rules Engine 2018-01-15 08:24:31 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 17 Simone Tiraboschi 2018-02-06 09:12:44 UTC
*** Bug 1542377 has been marked as a duplicate of this bug. ***

Comment 18 Yaniv Lavi 2018-02-14 13:28:34 UTC
Please file or reference the RHEL bug and move this ticket out as blocked, once it's attached.

Comment 19 Simone Tiraboschi 2018-02-15 11:33:42 UTC
(In reply to Yaniv Lavi from comment #18)
> Please file or reference the RHEL bug and move this ticket out as blocked,
> once it's attached.

Done: BZ#1545627

Comment 21 Sandro Bonazzola 2019-10-23 07:11:52 UTC
Needs to be tested again with EL8 hosts

Comment 22 Sandro Bonazzola 2019-11-13 08:27:33 UTC
Nikolai, can you re-test this with this week's 4.4 compose?

Comment 29 Nikolai Sednev 2020-04-06 17:35:10 UTC
Usage of wipefs -a or dd works fine for this; moving to verified.

[ INFO  ] ok: [localhost]
          The following luns have been found on the requested target:
                [1]     360002ac0000000000000131000021f6b       120.0GiB        3PARdata        VV
                        status: used, paths: 4 active
         
          Please select the destination LUN (1) [1]: 
[ INFO  ] FC discard after delete is enabled
[ INFO  ] Creating Storage Domain
[ INFO  ] TASK [ovirt.hosted_engine_setup : Execute just a specific set of steps]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Force facts gathering]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Check local VM dir stat]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Enforce local VM dir existence]
[ INFO  ] skipping: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Obtain SSO token using username/password credentials]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Fetch host facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Fetch cluster ID]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Fetch cluster facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Fetch Datacenter facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Fetch Datacenter ID]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Fetch Datacenter name]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Add NFS storage domain]
[ INFO  ] skipping: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Add glusterfs storage domain]
[ INFO  ] skipping: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Add iSCSI storage domain]
[ INFO  ] skipping: [localhost]
[ INFO  ] TASK [ovirt.hosted_engine_setup : Add Fibre Channel storage domain]
[ ERROR ] ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Physical device initialization failed. Please check that the device is empty and accessible by the host.]". HTTP response code is 400.
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Physical device initialization failed. Please check that the device is empty and accessible by the host.]\". HTTP response code is 400."}
          Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs)[nfs]: 

Tested on latest 4.4 components:
ovirt-hosted-engine-ha-2.4.2-1.el8ev.noarch
ovirt-hosted-engine-setup-2.4.4-1.el8ev.noarch
rhvm-appliance.x86_64 2:4.4-20200403.0.el8ev
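The wipefs route mentioned above removes known signatures one by one instead of zeroing blindly. A sketch of that behavior, again on a scratch image file rather than a live device (wipefs, from util-linux, accepts plain files too); here a DOS/MBR boot signature is planted and then erased:

```shell
# Scratch image standing in for the LUN; never run wipefs -a on a real
# device unless you intend to destroy what is on it.
IMG=$(mktemp)
truncate -s 1M "$IMG"

# Plant a DOS/MBR boot signature: 0x55 0xAA at offset 510.
printf '\125\252' | dd of="$IMG" bs=1 seek=510 conv=notrunc 2>/dev/null

# Show what signature probing finds, then erase every signature.
wipefs "$IMG"
wipefs --all "$IMG"

# A clean image produces no output here.
wipefs "$IMG"
rm -f "$IMG"
```

Unlike a blind dd wipe, this only touches the bytes of the signatures it recognizes, which makes it the safer default for clearing a dirty LUN before deployment.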

Comment 30 Sandro Bonazzola 2020-05-20 20:03:19 UTC
This bugzilla is included in the oVirt 4.4.0 release, published on May 20th 2020.

Since the problem described in this bug report should be resolved in the oVirt 4.4.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

