Created attachment 1470788 [details]
libvirtd log

Description of problem:
Guest disk image ownership is changed to root:root after a second round of migration with the src qemu killed at the Finish phase

Version-Release number of selected component (if applicable):
libvirt-4.5.0-4.el7.x86_64
qemu-kvm-rhev-2.12.0-8.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start a guest on the source host with the guest disk image located on NFS
2. Attach gdb to libvirtd on the destination host and set a breakpoint on qemuMigrationDstFinish
3. Migrate the guest to the target with or without --p2p
# virsh migrate rhel7-min qemu+ssh://$target/system --live --verbose
4. Wait until gdb hits the breakpoint
5. Kill the QEMU process on the source host
6. Run the "continue" command in gdb
7. After migration succeeds, migrate the guest back to the source host with or without --p2p:
# virsh migrate rhel7-min qemu+ssh://10.66.5.190/system --live --verbose --migrateuri tcp://10.66.5.190
8. Redo steps 3-6
9. After migration succeeds, check the guest image ownership:
# ll /nfs/RHEL-7.5-x86_64-latest.qcow2
-rw-r--r--. 1 root root 1403650048 Jul 26 09:46 RHEL-7.5-x86_64-latest.qcow2

Actual results:
As in step 9, the guest image ownership is changed to root:root.

Expected results:
The guest image ownership should be restored during migration.

Additional info:
In libvirtd.log, I see migrated=0, which is not correct:

2018-07-26 03:49:41.020+0000: 16709: debug : virSecurityDACRestoreAllLabel:1560 : Restoring security label on rhel7-min migrated=0
2018-07-26 03:49:41.020+0000: 16709: info : virSecurityDACRestoreFileLabelInternal:665 : Restoring DAC user and group on '/nfs/RHEL-7.5-x86_64-latest.qcow2'
2018-07-26 03:49:41.020+0000: 16709: info : virSecurityDACSetOwnershipInternal:567 : Setting DAC user and group on '/nfs/RHEL-7.5-x86_64-latest.qcow2' to '0:0'
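For context on why migrated=0 matters above: the DAC security driver only skips restoring ownership of images on shared storage when the caller passes migrated=1. A paraphrased sketch of the relevant check (based on virSecurityDACRestoreImageLabelInt() in src/security/security_dac.c; the exact code differs between releases):

/* Paraphrased from libvirt's DAC driver; not the verbatim code. With
 * migrated == true, an image on a shared FS such as NFS is skipped,
 * because the destination host has taken over its labels. With
 * migrated == false, as in the log above, libvirt falls through and
 * chowns the image back to root:root (0:0). */
if (migrated) {
    int rc = virFileIsSharedFS(src->path);
    if (rc < 0)
        return -1;
    if (rc == 1) {
        VIR_DEBUG("Skipping image label restore on %s because FS is shared",
                  src->path);
        return 0;
    }
}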
I'm not quite sure why this would happen only after the second migration, but the problem is that the monitor EOF handler, which is called when you kill the QEMU process on the source, does not take the migration into account and just resets everything. This is usually fine because a killed domain results in a failed migration most of the time, but it doesn't work in this corner case where the QEMU process gets killed just after the migration actually finished (i.e., at the point libvirtd itself would kill the process). We could perhaps check the current phase of the migration in the EOF handler so that it can pass migrated=1 when appropriate.
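A minimal sketch of that idea, relying on the qemu driver's existing job bookkeeping (the phase and flag identifiers below are taken from libvirt's sources, but this is an illustration, not an actual patch):

/* Hypothetical check in the monitor EOF handler (e.g. in
 * processMonitorEOFEvent() in src/qemu/qemu_driver.c). If the source
 * QEMU died after an outgoing migration already completed the Perform
 * phase, pass VIR_QEMU_PROCESS_STOP_MIGRATED so that qemuProcessStop()
 * restores security labels with migrated=1 and leaves the disk images,
 * now owned by the destination, untouched. */
qemuDomainObjPrivatePtr priv = vm->privateData;
unsigned int stopFlags = 0;

if (priv->job.asyncJob == QEMU_ASYNC_JOB_MIGRATION_OUT &&
    priv->job.phase == QEMU_MIGRATION_PHASE_PERFORM3_DONE)
    stopFlags |= VIR_QEMU_PROCESS_STOP_MIGRATED;

qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_DESTROYED,
                QEMU_ASYNC_JOB_NONE, stopFlags);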
It might be caused by the filesystem being different on each host, i.e. a local ext4 FS on one host, then exported as NFS to the second host.
(In reply to Daniel Berrange from comment #3)
> It might be caused by the filesystem being different on each host, i.e. a
> local ext4 FS on one host, then exported as NFS to the second host.

Hi Daniel,

I checked my test env; both hosts are configured correctly:

Target:
# mount | grep nfs
10.66.4.124:/nfs on /nfs type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=10,retrans=2,sec=sys,clientaddr=10.73.131.69,local_lock=none,addr=10.66.4.124)

Source:
10.66.4.124:/nfs on /nfs type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,port=0,timeo=15,retrans=1,sec=sys,clientaddr=10.66.5.190,local_lock=none,addr=10.66.4.124)

If it were "a local ext4 FS on one host, then exported as NFS to the second host", the guest image ownership would be changed to root:root after the FIRST migration.
Test with libvirt-4.5.0-8.virtcov.el7.x86_64

Now only one round of migration is needed; the migration FAILS after running "continue" in gdb.

1. Start a guest on the source host with the guest disk image located on NFS
# ll /mnt/nfs/lizhu/images/rhel7.6-GUI.img
-rw-------. 1 qemu qemu 10739318784 Sep 4 05:31 /mnt/nfs/lizhu/images/rhel7.6-GUI.img
2. Attach gdb to libvirtd on the destination host and set a breakpoint on qemuMigrationDstFinish
3. Migrate the guest to the target
# virsh migrate avocado-vt-vm1 qemu+ssh://10.73.73.112/system --verbose --live
4. Wait until gdb hits the breakpoint
5. Kill the QEMU process on the source host
6. Run the "continue" command in gdb
7. Check the migration result
# virsh migrate avocado-vt-vm1 qemu+ssh://10.73.73.112/system --verbose --live
Migration: [100 %]2018-09-04 09:32:46.794+0000: 14588: info : libvirt version: 4.5.0, package: 8.virtcov.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2018-09-03-10:48:21, x86-034.build.eng.bos.redhat.com)
2018-09-04 09:32:46.794+0000: 14588: info : hostname: ***
2018-09-04 09:32:46.794+0000: 14588: warning : virDomainMigrateVersion3Full:3249 : Guest avocado-vt-vm1 probably left in 'paused' state on source
error: internal error: unable to execute QEMU command 'cont': Could not reopen qcow2 layer: Could not read qcow2 header: Permission denied
8. Check the guest image ownership
# ll /mnt/nfs/lizhu/images/rhel7.6-GUI.img
-rw-------. 1 root root 10739318784 Sep 4 05:32 /mnt/nfs/lizhu/images/rhel7.6-GUI.img
This matches what I described in comment #2. We need to enhance the monitor EOF handler a bit.
This bug is going to be addressed in the next major release.
Can you please try to reproduce it with the current rhel-av build? Thanks.
Test with libvirt-6.0.0-6.virtcov.el8.x86_64 and qemu-kvm-4.2.0-11.module

Same steps as comment 5; it can still be reproduced:

# virsh -k0 migrate rhev qemu+ssh://xxxxx/system --live --verbose
Migration: [100 %]error: internal error: unable to execute QEMU command 'cont': Could not reopen file: Permission denied
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.