Bug 1946199 - Cannot create Logical Volume ... Failed to wipe signatures
Summary: Cannot create Logical Volume ... Failed to wipe signatures
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.4.6
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ovirt-4.4.6
Target Release: 4.4.6
Assignee: Nir Soffer
QA Contact: Roni
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-04-05 10:52 UTC by Roni
Modified: 2022-04-07 08:52 UTC
CC: 19 users

Fixed In Version: vdsm-4.40.60.5
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-06-01 13:21:12 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments
succeeded_to_add_iscsi_sd (406.89 KB, application/zip) - 2021-04-07 10:58 UTC, Shir Fishbain
LVM configuration on host reproducing the issue (25.81 KB, application/gzip) - 2021-04-07 20:01 UTC, Nir Soffer


Links
Red Hat Product Errata RHBA-2021:2178 - 2021-06-01 13:21:18 UTC
oVirt gerrit 114120 (master, MERGED): lvm: Do not prompt when wiping signatures - 2021-04-07 12:34:46 UTC
oVirt gerrit 114170 (master, MERGED): lvm: Disable wiping signatures - 2021-04-08 07:45:11 UTC
oVirt gerrit 114183 (master, MERGED): tests: Test lvcreate with leftover file system - 2021-04-08 07:45:11 UTC

Description Roni 2021-04-05 10:52:53 UTC
Created attachment 1769242 [details]
engine logs

Description of problem:
Failed to add iSCSI storage

Version-Release number of selected component (if applicable):
4.4.6.1-0.11.el8ev

How reproducible:
100%

Steps to Reproduce:
1. Provision host with rhel-8.4
2. Deploy oVirt standalone
3. Add storage using 'ovirt-ansible-collection' 'storages' role

Actual results:
Failed to add the storage domain

Expected results:
Should succeed

Additional info:
2021-04-05 12:39:51,174+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStorageDomainVDSCommand] (default task-13) [26cd6445] START, CreateStorageDomainVDSCommand(HostName = host_mixed_1, CreateStorageDomainVDSCommandParameters:{hostId='e9bac068-3c18-4598-b770-006668898a3c', storageDomain='StorageDomainStatic:{name='iscsi_0', id='9ea2871d-844b-4d9e-8ab5-2657681dc7ca'}', args='grXCdu-qaca-5KQS-PIGu-06w8-czv0-jqWfc3'}), log id: 5a475ff8
2021-04-05 12:39:52,994+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStorageDomainVDSCommand] (default task-13) [26cd6445] Failed in 'CreateStorageDomainVDS' method
2021-04-05 12:39:53,000+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-13) [26cd6445] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_1 command CreateStorageDomainVDS failed: Cannot create Logical Volume: "vgname=9ea2871d-844b-4d9e-8ab5-2657681dc7ca lvname=master err=['WARNING: ext3 signature detected on /dev/9ea2871d-844b-4d9e-8ab5-2657681dc7ca/master at offset 1080. Wipe it? [y/n]: [n]', '  Aborted wiping of ext3.', '  1 existing signature left on the device.', '  Failed to wipe signatures on logical volume 9ea2871d-844b-4d9e-8ab5-2657681dc7ca/master.', '  Aborting. Failed to wipe start of new LV.']"
2021-04-05 12:39:53,001+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStorageDomainVDSCommand] (default task-13) [26cd6445] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStorageDomainVDSCommand' return value 'StatusOnlyReturn [status=Status [code=550, message=Cannot create Logical Volume: "vgname=9ea2871d-844b-4d9e-8ab5-2657681dc7ca lvname=master err=['WARNING: ext3 signature detected on /dev/9ea2871d-844b-4d9e-8ab5-2657681dc7ca/master at offset 1080. Wipe it? [y/n]: [n]', '  Aborted wiping of ext3.', '  1 existing signature left on the device.', '  Failed to wipe signatures on logical volume 9ea2871d-844b-4d9e-8ab5-2657681dc7ca/master.', '  Aborting. Failed to wipe start of new LV.']"]]'

Comment 1 Benny Zlotnik 2021-04-05 11:27:52 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1894692
Could be related. What version of lvm is installed?

Comment 2 Nir Soffer 2021-04-05 12:12:52 UTC
(In reply to Benny Zlotnik from comment #1)
> https://bugzilla.redhat.com/show_bug.cgi?id=1894692

This seems to be the root cause of this failure.

Since the master lv is owned by vdsm, I think we can safely override lvm to always
wipe the signature, but I'm not sure we have a good way to do this.

We can add "wipefs -a /dev/vg-name/master" before using lvm to avoid future regressions
in lvm.

The master storage domain contains only tasks. We already handle the case when the master
storage domain is broken by reconstructing it, so wiping the master lv is safe.
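
For illustration, a minimal sketch of that idea (not the fix that was eventually
merged): wipefs -a removes all known signatures from the LV before it is reused.
The VG name below is taken from the log above, and the master LV is assumed to
already exist and be active:

# wipefs -a /dev/9ea2871d-844b-4d9e-8ab5-2657681dc7ca/master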

Comment 3 Nir Soffer 2021-04-05 12:15:15 UTC
According to https://bugzilla.redhat.com/1894692#c15 we can add --yes to the lvm command
to make it succeed in this case.
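
For illustration, a hand-run lvcreate with --yes, which auto-confirms the
"Wipe it? [y/n]" prompt so leftover signatures are wiped instead of aborting the
command. The size and names here are placeholders, not the exact vdsm command line:

# lvcreate --yes --size 1024m --name master 9ea2871d-844b-4d9e-8ab5-2657681dc7ca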

Comment 4 Richard W.M. Jones 2021-04-05 12:24:07 UTC
This was the solution in libguestfs:
https://github.com/libguestfs/libguestfs/commit/21cd97732c4973db835b8b6540c8ad582ebd2bda

ISTR there's no equivalent lvm.conf option to force wiping of signatures.  The
only current option disables wiping signatures, which is different (sketched below).

As I said in the other bug, the ideal situation is that LVM wouldn't let
the old content leak into new LVs at all, by background zeroing / zero-on-first-read.
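
For reference, the setting referred to above lives in the allocation section of
lvm.conf; it can only disable the signature check, not force wiping (a sketch,
default value shown):

allocation {
    # 1 (default): look for signatures when zeroing a new LV and prompt to wipe them
    # 0: skip the signature check entirely
    wipe_signatures_when_zeroing_new_lvs = 1
}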

Comment 5 Nir Soffer 2021-04-05 12:49:33 UTC
Since we cannot guarantee the contents of a logical volume in all cases
(only when wipe-after-delete is enabled), we need to add --yes to lvcreate
so leftovers from previous usage of the logical volume do not fail the
operation.

See comment 4 for the libguestfs patch fixing the same issue.

Comment 6 Roni 2021-04-05 13:36:14 UTC
(In reply to Benny Zlotnik from comment #1)
> https://bugzilla.redhat.com/show_bug.cgi?id=1894692
> could be related, what is the version of lvm?

[root@lynx09 ~]# lvm version
  LVM version:     2.03.11(2)-RHEL8 (2021-01-28)
  Library version: 1.02.175-RHEL8 (2021-01-28)
  Driver version:  4.43.0

Comment 7 Eyal Shenitzky 2021-04-05 13:54:07 UTC
Thanks Nir and Benny for the analysis.
Roman, let's submit a fix as soon as we can.

Comment 8 Eyal Shenitzky 2021-04-05 13:55:42 UTC
Roni, can you please provide all the detailed steps that were done in
"Add storage using 'ovirt-ansible-collection' 'storages' role"?

Comment 10 Nir Soffer 2021-04-05 15:02:56 UTC
Roni, can you test if this vdsm build fixes the issue?
https://jenkins.ovirt.org/job/vdsm_standard-check-patch/27281/artifact/build-artifacts.build-py3.el8.x86_64/

Hopefully you can use this vdsm build instead of the release build
in your deployment.

Comment 12 Roni 2021-04-05 21:56:30 UTC
(In reply to Eyal Shenitzky from comment #8)
> Roni, can you please provide all the detailed steps that done in -
> "Add storage using 'ovirt-ansible-collection' 'storages' role" ?

Sure, please see the attachment "ovirt-ansible-collection storages roles console output" above.

Comment 14 Roni 2021-04-05 23:47:32 UTC
(In reply to Nir Soffer from comment #10)
> Roni, can  you test if this vdsm build fixes the issue?
> https://jenkins.ovirt.org/job/vdsm_standard-check-patch/27281/artifact/build-
> artifacts.build-py3.el8.x86_64/
> 
> Hopefully you can use this vdsm build instead of the release build
> in your deployment.

Nir, not fixed! Please see the attached "engine logs (vdsm upgrade to v4.40.60.2-10)"

2021-04-06 02:07:31,919+03 ERROR [org.ovirt.engine.core.bll.storage.connection.ISCSIStorageHelper] (default task-11) [0d44f274-d216-43a5-b825-f082c0605c45] The connection with details '00000000-0000-0000-0000-000000000000' failed because of error code '465' and error message is: failed to setup iscsi subsystem

Comment 15 Vojtech Juranek 2021-04-06 06:54:12 UTC
(In reply to Roni from comment #14)
> (In reply to Nir Soffer from comment #10)
> > Roni, can  you test if this vdsm build fixes the issue?
> > https://jenkins.ovirt.org/job/vdsm_standard-check-patch/27281/artifact/build-
> > artifacts.build-py3.el8.x86_64/
> > 
> > Hopefully you can use this vdsm build instead of the release build
> > in your deployment.
> 
> Nir, not fixed! please see attached "engine logs (vdsm upgrade to
> v4.40.60.2-10)"
> 
> 2021-04-06 02:07:31,919+03 ERROR
> [org.ovirt.engine.core.bll.storage.connection.ISCSIStorageHelper] (default
> task-11) [0d44f274-d216-43a5-b825-f082c0605c45] The connection with details
> '00000000-0000-0000-0000-000000000000' failed because of error code '465'
> and error message is: failed to setup iscsi subsystem

This looks like BZ #1788631. Could you please confirm you are not hitting that issue (i.e. mixing an IP address and a DNS name)?

Comment 23 Nir Soffer 2021-04-06 13:59:01 UTC
The error:

vdsm.storage.iscsiadm.IscsiNodeError: (15,
Logging in to [iface: default, target: iqn.1992-08.com.netapp:vserver-rhv-qe, portal: 10.46.16.10,3260]
Logging in to [iface: default, target: iqn.1992-08.com.netapp:vserver-rhv-qe, portal: 10.46.16.10,3260]
Login to [iface: default, target: iqn.1992-08.com.netapp:vserver-rhv-qe, portal: 10.46.16.10,3260] successful.
iscsiadm: Could not login to [iface: default, target: iqn.1992-08.com.netapp:vserver-rhv-qe, portal: 10.46.16.10,3260].
iscsiadm: initiator reported error (15 - session exists)\niscsiadm: Could not log into all portals

This means we try to log in twice to the same portal.

This happens when mixing the server IP address and DNS name, and in general when
not using the addresses returned by oVirt when discovering the targets.

Correct use of the oVirt API is:
- discover targets
- extract the targets from the response
- use the extracted targets when connecting to the targets

Using hard-coded addresses as described in comment 17 is unlikely to work,
unless the underlying ansible code uses the APIs according to these rules.
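
To illustrate at the iscsiadm level (which is what vdsm drives under the hood),
the portal used for login should be exactly the one returned by discovery. The
address and target below are taken from the error above; this is a sketch of the
flow, not the vdsm command line:

# iscsiadm -m discovery -t sendtargets -p 10.46.16.10:3260
# iscsiadm -m node -T iqn.1992-08.com.netapp:vserver-rhv-qe -p 10.46.16.10:3260 --login

Logging in with a hard-coded DNS name for a portal that was discovered by IP (or
vice versa) creates a second node record for the same target, which matches the
"session exists" error above.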

Comment 34 Shir Fishbain 2021-04-07 10:58:12 UTC
Created attachment 1769813 [details]
succeeded_to_add_iscsi_sd

Comment 36 Nir Soffer 2021-04-07 11:29:46 UTC
I agree with Vojta about opening a new bug for the iSCSI login issue.

Regarding the iSCSI login issue - the normal real-world flow is that the host has
no iSCSI nodes and RHV handles iSCSI discovery and login.

Handling a host with arbitrary iSCSI nodes using automatic login can be tested
separately as a negative test, but it should not be the normal flow.

Do not keep iSCSI nodes from the cleanup step on the hosts!

Comment 38 Nir Soffer 2021-04-07 13:09:24 UTC
The patch was merged and should be available in the next compose.

Comment 40 Nir Soffer 2021-04-07 19:51:38 UTC
I reproduced this on RHEL 8.4.

1. Manage the storage domain and uncheck the "Discard after delete" advanced option.

The bug does not affect users using discard after delete, since in this case
the previous contents of the disk are zeroed before a disk is deleted.

2. Create new preallocated disk

3. Create file system on the disk

This can be done by attaching the disk to a VM, or by activating the
LV on the host and running mkfs.xfs (see the sketch at the end of this comment).

4. Delete the disk

5. Create new preallocated disk with same size

When creating the new disk, the operation fails with:

2021-04-07 22:36:38,416+0300 ERROR (tasks/0) [storage.TaskManager.Task] (Task='bfaf87f4-e34e-447e-a512-fc92622bebfb') Unexpected error (task:880)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 887, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 350, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 1921, in createVolume
    initial_size=initialSize, add_bitmaps=addBitmaps)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sd.py", line 1081, in createVolume
    initial_size=initial_size, add_bitmaps=add_bitmaps)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/volume.py", line 1253, in create
    add_bitmaps=add_bitmaps)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/blockVolume.py", line 508, in _create
    initialTags=(sc.TAG_VOL_UNINIT,))
  File "/usr/lib/python3.6/site-packages/vdsm/storage/lvm.py", line 1582, in createLV
    raise se.CannotCreateLogicalVolume(vgName, lvName, err)
vdsm.storage.exception.CannotCreateLogicalVolume: Cannot create Logical Volume: "vgname=31e3675e-302a-4a22-ad9a-402c0c1e9765 lvname=d94aec7f-0f95-4741-8c11-7434bab09889 err=['WARNING: xfs signature detected on /dev/31e3675e-302a-4a22-ad9a-402c0c1e9765/d94aec7f-0f95-4741-8c11-7434bab09889 at offset 0. Wipe it? [y/n]: [n]', '  Aborted wiping of xfs.', '  1 existing signature left on the device.', '  Failed to wipe signatures on logical volume 31e3675e-302a-4a22-ad9a-402c0c1e9765/d94aec7f-0f95-4741-8c11-7434bab09889.', '  Aborting. Failed to wipe start of new LV.']"

Why it failed:

The deleted disk had an XFS file system. The new disk was created using the same area
on storage. After LVM zeroes the start of the disk, it looks for file system signatures
and tries to wipe the XFS signature.

This issue affects any lvcreate operation, and wiping LUNs before using them will
not help. Each time a raw disk is deleted, the next disk created may fail.
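
For step 3, a sketch of the host-side variant (VG and LV names are placeholders;
take the real ones from lvs on the host):

# lvchange -ay <vg-uuid>/<lv-uuid>
# mkfs.xfs /dev/<vg-uuid>/<lv-uuid>
# lvchange -an <vg-uuid>/<lv-uuid>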

Comment 41 Nir Soffer 2021-04-07 19:57:15 UTC
This is the failing lvm command:

2021-04-07 22:36:37,960+0300 WARN  (tasks/0) [storage.LVM] Command with specific filter failed or returned no data, retrying with a wider filter, cmd=['/sbin/lvm', 'lvcreate', '--config', 'devices {  preferred_names=["^/dev/mapper/"]  ignore_suspended_devices=1  write_cache_state=0  disable_after_error_count=3  filter=["a|^/dev/mapper/36001405e78e5d0feeb043ffa0aea5010$|", "r|.*|"]  hints="none"  obtain_device_list_from_udev=0 } global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 } backup {  retain_min=50  retain_days=0 }', '--autobackup', 'n', '--contiguous', 'n', '--size', '2048m', '--addtag', 'OVIRT_VOL_INITIALIZING', '--name', 'd94aec7f-0f95-4741-8c11-7434bab09889', '31e3675e-302a-4a22-ad9a-402c0c1e9765'] rc=5 out=[] err=['WARNING: xfs signature detected on /dev/31e3675e-302a-4a22-ad9a-402c0c1e9765/d94aec7f-0f95-4741-8c11-7434bab09889 at offset 0. Wipe it? [y/n]: [n]', '  Aborted wiping of xfs.', '  1 existing signature left on the device.', '  Failed to wipe signatures on logical volume 31e3675e-302a-4a22-ad9a-402c0c1e9765/d94aec7f-0f95-4741-8c11-7434bab09889.', '  Aborting. Failed to wipe start of new LV.'] (lvm:511)

Zdenek, how did LVM find a signature at offset 0, if it zeroes the LV before wiping?

Comment 42 Nir Soffer 2021-04-07 20:01:04 UTC
Created attachment 1770018 [details]
LVM configuration on host reproducing the issue

Comment 44 Nir Soffer 2021-04-07 20:20:25 UTC
(In reply to Nir Soffer from comment #41)
Correction for the LVM flow (from lvmguy):

When using wipe_signatures_when_zeroing_new_lvs=1, lvm wipes
signatures before zeroing, so detecting the signature at offset 0
is expected.

Comment 45 Nir Soffer 2021-04-07 20:40:44 UTC
This should be fixed in vdsm 4.40.60.3.

However, we want to fix this in a safer way: disabling wiping of signatures
instead of using a global --yes, which may have unwanted results in future
lvm versions.
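
A sketch of what the safer variant looks like on a plain lvcreate command line
(the actual vdsm change is in the linked gerrit patch; the flags are standard
LVM options, and the names are placeholders):

# lvcreate --wipesignatures n --zero y --size 1024m --name <lv-name> <vg-name>

This skips the signature check (so no prompt and no failure on leftover file
systems) while still zeroing the start of the new LV.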

Comment 46 Nir Soffer 2021-04-07 21:04:06 UTC
An easier way to reproduce and verify:

1. Start a vm 

2. Add new raw preallocated disk on block storage domain that does not
   use discard-after-delete

# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0    6G  0 disk 
├─sda1   8:1    0    1M  0 part 
├─sda2   8:2    0    1G  0 part /boot
├─sda3   8:3    0  615M  0 part [SWAP]
└─sda4   8:4    0  4.4G  0 part /
sr0     11:0    1 1024M  0 rom  
vda    252:0    0    2G  0 disk 

3. In the guest, create a file system on the new disk

# mkfs.xfs /dev/vda
meta-data=/dev/vda               isize=512    agcount=4, agsize=131072 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=524288, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Discarding blocks...Done.

4. Deactivate the disk and remove it (Remove permanently)

5. Immediately add a new disk with the same format/size to the VM

On a vdsm version without the fix, the operation will fail when lvm
fails to wipe the signatures.

With the fixed version, the operation will succeed.

6. In the guest, verify that the first 4k of the disk are zeroed:

# dd if=/dev/vda bs=4K count=1 status=none | hexdump 
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0001000

Comment 47 Zdenek Kabelac 2021-04-08 08:46:39 UTC
(In reply to Nir Soffer from comment #41)
> This is the failing lvm command:
> 'd94aec7f-0f95-4741-8c11-7434bab09889',
> '31e3675e-302a-4a22-ad9a-402c0c1e9765'] rc=5 o
> ut=[] err=['WARNING: xfs signature detected on
> /dev/31e3675e-302a-4a22-ad9a-402c0c1e9765/d94aec7f-0f95-4741-8c11-
> 7434bab09889 
> at offset 0. Wipe it? [y/n]: [n]', '  Aborted wiping of xfs.', '  1 existing
> signature left on the device.', '  Failed to wipe
>  signatures on logical volume
> 31e3675e-302a-4a22-ad9a-402c0c1e9765/d94aec7f-0f95-4741-8c11-7434bab09889.',
> '  Aborting. Failed
>  to wipe start of new LV.'] (lvm:511)
> 
> Zdenek, how LVM found a signature at offset 0, if it zero the lv before
> wiping?

Wiping is an extension of the original plain zeroing.
So the order of operations is:

When wiping is Y -> lvm first checks for signatures; if a signature is found -> prompt for wiping.

If answered 'n' -> the command is aborted, as the user likely does not want to damage the content of the disk.
(The user can use -Wn to avoid wiping, which comes with an implicit -Zn.)

If answered 'y' -> lvm2 wipes all signatures and also zeroes the first 8k.

When wiping is N -> no signature checking, and according to -Z y|n the first 8k is cleared (the default is y).
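
Condensed into command form (a quick reference based on the description above,
not taken from the bug; vg1/lv1 are placeholders):

# lvcreate -Wy -L 1G -n lv1 vg1      (checks for signatures; prompts before wiping, 'n' aborts)
# lvcreate -Wn -Zy -L 1G -n lv1 vg1  (no signature check; the first 8k is still zeroed)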

Comment 49 Sandro Bonazzola 2021-04-15 07:27:45 UTC
All referenced patches are merged; can you please update this bug's status?

Comment 50 Nir Soffer 2021-04-16 19:42:31 UTC
Fixed in:

commit e8b7f23a82aa5cbc6e5921c31d0c64674d1c2c6b

    lvm: Disable wiping signatures

$ git describe e8b7f23a82aa5cbc6e5921c31d0c64674d1c2c6b
v4.40.60.3-3-ge8b7f23a8

Comment 54 Roni 2021-04-25 14:55:53 UTC
Verified on: 4.4.6.5-0.17.el8ev

Comment 58 errata-xmlrpc 2021-06-01 13:21:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHV RHEL Host (ovirt-host) [ovirt-4.4.6]), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2178

