Bug 1973540

Summary: NBD+tls repeat installation failed with rewriting storage
Product: Red Hat Enterprise Linux 9
Reporter: zixchen
Component: qemu-kvm
Assignee: Eric Blake <eblake>
qemu-kvm sub component: NBD
QA Contact: zixchen
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
CC: coli, jinzhao, kkiwi, rjones, virt-maint
Version: 9.0
Keywords: Triaged
Target Milestone: beta
Target Release: ---
Hardware: x86_64
OS: Linux
Last Closed: 2021-07-09 05:16:50 UTC
Attachments:
  storage log (flags: none)

Description zixchen 2021-06-18 06:28:01 UTC
Description of problem:
Repeatedly installing a guest on NBD+TLS storage fails after the first installation. The error message is:
_ped.IOException: Partition(s) 2 on /dev/sda have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use.  As a result, the old partition(s) will remain in use.  You should reboot now before making further changes.

Version-Release number of selected component (if applicable):
kernel-5.13.0-0.rc4.33.el9.x86_64
qemu-kvm-6.0.0-3.el9.x86_64

How reproducible:
50%

Steps to Reproduce:
1. Install a guest on NBD+TLS storage three times in a row, reusing the same export image:
    -blockdev node-name=nbd_image1,driver=nbd,auto-read-only=on,discard=unmap,server.type=inet,server.host=$localhost_name,server.port=10809,tls-creds=image1_access,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=nbd_image1 \
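
The `tls-creds=image1_access` option above refers to a TLS credentials object that the report does not show. As a rough sketch only, assuming x509 credentials in a hypothetical directory (`TLS_DIR`), a hypothetical image path (`IMAGE`), and a server-side credentials id `tls0` chosen for illustration, the matching client-side object and a `qemu-nbd` export might look like the following. The script only prints the options; it does not start QEMU or qemu-nbd.

```shell
#!/bin/sh
# Sketch only: TLS_DIR, IMAGE and NBD_HOST are assumptions, not values from the report.
TLS_DIR="${TLS_DIR:-/etc/pki/qemu}"        # hypothetical certificate directory
IMAGE="${IMAGE:-/var/tmp/rhel9.qcow2}"     # hypothetical export image
NBD_HOST="${NBD_HOST:-localhost}"          # stands in for $localhost_name

# Client side: credentials object whose id matches tls-creds=image1_access.
CLIENT_OBJECT="tls-creds-x509,id=image1_access,endpoint=client,dir=${TLS_DIR}"

# Client side: the NBD node from the report, with the host filled in.
CLIENT_BLOCKDEV="node-name=nbd_image1,driver=nbd,auto-read-only=on,discard=unmap,server.type=inet,server.host=${NBD_HOST},server.port=10809,tls-creds=image1_access,cache.direct=on,cache.no-flush=off"

# Server side: export the image file as raw bytes over TLS on port 10809,
# leaving the qcow2 format to be interpreted by the client's qcow2 driver.
SERVER_CMD="qemu-nbd --persistent --format=raw --port=10809 --object tls-creds-x509,id=tls0,endpoint=server,dir=${TLS_DIR} --tls-creds tls0 ${IMAGE}"

printf '%s\n' "-object ${CLIENT_OBJECT}" "-blockdev ${CLIENT_BLOCKDEV}" "${SERVER_CMD}"
```

The server exports with `--format=raw` here because the client stacks `driver=qcow2` on top of the NBD node; whether the original test setup did the same is not stated in the report.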


Actual results:
After the first installation, about 50% of the subsequent installations fail.
Error log:
_ped.IOException: Partition(s) 2 on /dev/sda have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use.  As a result, the old partition(s) will remain in use.  You should reboot now before making further changes.

Expected results:
Every installation succeeds.

Additional info:
This issue reproduces only on RHEL 9, not on RHEL 8.5.
Please see the attached storage log for the full details.

Comment 1 zixchen 2021-06-18 06:32:27 UTC
Created attachment 1791976 [details]
storage log

Comment 2 Klaus Heinrich Kiwi 2021-07-01 21:23:58 UTC
Another one assigned to Eric - there is a flurry of NBD+tls bugs apparently. Hopefully there's some overlap between them, or at least a setup that can be reused. I'll keep this one in high priority since it's a functional issue.

Comment 4 Richard W.M. Jones 2021-07-02 07:37:52 UTC
I can't find the kickstart file in the Polarion case, but it would be very
useful to see it.  I think in any case this is likely to be a bug/regression
in Anaconda, since really it ought to be wiping signatures on the disk
before attempting to install.  (This bug would affect all installs over
an existing install, not just for NBD)

Comment 5 zixchen 2021-07-02 08:25:13 UTC
Sorry, I lost the kickstart file. I will reproduce the issue with the latest RHEL 9 and attach the kickstart file.

The kickstart file is generated by automation and is the same for every test run.

Comment 6 zixchen 2021-07-09 05:15:00 UTC
I've tested with qemu-kvm-6.0.0-7.el9.x86_64 and the issue no longer occurs. I have attached the kickstart file.

Version:
qemu-kvm-6.0.0-7.el9.x86_64
kernel-5.13.0-0.rc7.51.el9.x86_64
anaconda-core-34.25.0.9-1.el9.x86_64

Steps:
1. Repeat the installation on the NBD+TLS image three times.

Actual Results:
Repeated installation succeeds.

As the issue no longer exists, closing as CURRENTRELEASE.