Bug 2037218
| Summary: | VirtualDomain move fails | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | lejeczek <peljasz> |
| Component: | pcs | Assignee: | Ondrej Mular <omular> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 9.0 | CC: | agk, bstinson, cluster-maint, fdinitto, idevat, jwboyer, mlisik, mmazoure, mpospisi, mprivozn, omular, tojeline |
| Target Milestone: | rc | Keywords: | Triaged |
| Target Release: | 9.1 | Flags: | pm-rhel: mirror+ |
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-06-08 11:45:25 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
lejeczek
2022-01-05 09:39:33 UTC
Just in case I left it a bit vague: this is about live move/migration, which is broken, while it still works in previous versions on CentOS 8 Stream.

Can you find the exact error reported in the libvirtd log? That might shed more light on why libvirt is denying the migration.

I'm looking at something else very strange. I see:
-> $ pcs constraint config | less
...
Resource: c8kubermaster2
Enabled on:
Node: whale (score:INFINITY)
Even though I run 'clear' and 'cleanup', that constraint remains there until I delete and re-create the resource. After that I can 'move' the resource again, although not as a 'live' migration.
Also, 'setenforce' seems to make no difference (unless there are some silent denials).
In the new C9 (CentOS Stream 9) a number of vir* services replace libvirtd.service. Looking at virtqemud.service I see:
..
2022-01-05 16:58:16.399+0000: 644190: warning : virSecurityValidateTimestamp:206 : Invalid XATTR timestamp detected on /VMs3/c8kubermaster2.qcow2 secdriver=dac
internal error: unable to execute QEMU command 'cont': Failed to get "write" lock
This 'locking' problem also affects things outside of PCS, such as backups and snapshots of VMs, now that FUSE is the only available access method (unless there is some way to FUSE-mount GlusterFS that does the trick).
thanks, L.
-> $ pcs resource config c8kubermaster2
Resource: c8kubermaster2 (class=ocf provider=heartbeat type=VirtualDomain)
Attributes: config=/var/lib/pacemaker/conf.d/c8kubermaster2.xml hypervisor=qemu:///system migrate_options=--unsafe migration_transport=ssh
Meta Attrs: allow-migrate=true failure-timeout=30s
Operations: migrate_from interval=0s timeout=90s (c8kubermaster2-migrate_from-interval-0s)
migrate_to interval=0s timeout=90s (c8kubermaster2-migrate_to-interval-0s)
monitor interval=10s timeout=30s (c8kubermaster2-monitor-interval-10s)
start interval=0s timeout=60s (c8kubermaster2-start-interval-0s)
stop interval=0s timeout=60s (c8kubermaster2-stop-interval-0s)
swir.direct:/VMs3 on /VMs3 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,allow_other,max_read=131072)
Thank you for reporting this issue. After looking into it in more detail, I'm pretty sure I know what is causing this. There is a bug in the new implementation of `pcs resource move` introduced in pcs-0.11 (see the pcs man page, section "changes in pcs-0.11", for more details) which in some cases does not move the resource. However, the old implementation of the move command is still available as `pcs resource move-with-constraint`, which can be used as a workaround for now. Another option is to run `pcs resource clear <resource> <node>` just before `pcs resource move`.

Yes, but the issue I really care about in this BZ is LIVE migration/move of VirtualDomain, even if it is not really a bug on the PCS side, and possible ways for PCS to fix or improve that. With QEMU/libvirt versions that still have 'libgfapi' support, LIVE migration works smoothly. With the new versions, where 'libgfapi' has been removed, the only way is to FUSE-mount the GlusterFS volumes, and then it's broken: the LIVE move falls back to shutdown/start, which is, well, what it is.

From the log:
... internal error: unable to execute QEMU command 'cont': Failed to get "write" lock ...

thanks, L.
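The second workaround suggested in the reply above (running `pcs resource clear <resource> <node>` immediately before `pcs resource move`) can be wrapped in a small POSIX shell function. This is only a sketch: `move_resource` and the `PCS` override variable are hypothetical names, not part of pcs, and defaulting the node passed to `clear` to the destination node is an assumption. Pass the node named in the stale constraint (for example `whale` in the `pcs constraint config` output above) as the third argument if it differs.

```shell
#!/bin/sh
# Hypothetical helper (not part of pcs): clear stale move/ban constraints
# for a resource, then move it, following the pcs-0.11 workaround.
#   $1 = resource id
#   $2 = destination node
#   $3 = node passed to "pcs resource clear" (ASSUMPTION: defaults to $2)
# PCS may name another command (e.g. a stub for a dry run); default "pcs".
move_resource() {
    if [ $# -lt 2 ]; then
        echo "usage: move_resource RESOURCE NODE [CLEAR_NODE]" >&2
        return 2
    fi
    mr_resource=$1
    mr_target=$2
    mr_clear_node=${3:-$2}
    # Remove constraints left behind by earlier "move"/"ban" calls.
    ${PCS:-pcs} resource clear "$mr_resource" "$mr_clear_node" || return 1
    # Only reached if "clear" succeeded: ask pcs to move the resource.
    ${PCS:-pcs} resource move "$mr_resource" "$mr_target"
}
```

For example, `move_resource c8kubermaster2 whale` runs `pcs resource clear c8kubermaster2 whale` followed by `pcs resource move c8kubermaster2 whale`. This only works around the pcs-0.11 move bug; it does not by itself make the move a live migration, which in this report fails on the QEMU "write" lock.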