Bug 2037218
Summary: | VirtualDomain move fails | |
---|---|---|---
Product: | Red Hat Enterprise Linux 9 | Reporter: | lejeczek <peljasz>
Component: | pcs | Assignee: | Ondrej Mular <omular>
Status: | CLOSED CURRENTRELEASE | QA Contact: | cluster-qe <cluster-qe>
Severity: | urgent | Docs Contact: |
Priority: | urgent | |
Version: | 9.0 | CC: | agk, bstinson, cluster-maint, fdinitto, idevat, jwboyer, mlisik, mmazoure, mpospisi, mprivozn, omular, tojeline
Target Milestone: | rc | Keywords: | Triaged
Target Release: | 9.1 | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2022-06-08 11:45:25 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
lejeczek
2022-01-05 09:39:33 UTC
Just in case I left it a bit vague - this is about live move/migration, which is broken here but still works in previous versions on CentOS 8 Stream.

Can you find the exact error reported in the libvirtd log? That might shed more light on why libvirt is denying the migration.

I'm also looking at something else very strange. I see:

    -> $ pcs constraint config | less
    ...
      Resource: c8kubermaster2
        Enabled on:
          Node: whale (score:INFINITY)

and even though I do 'clear' & 'cleanup', that constraint remains there until I delete the resource & re-create it; then I can 'move' the resource again, albeit not as a 'live' migration. Also 'setenforce' seems to make no difference (unless there are some silent denials).

In new C9 there is a number of vir* services which replace libvirtd.service - looking at virtqemud.service I see:

    ...
    2022-01-05 16:58:16.399+0000: 644190: warning : virSecurityValidateTimestamp:206 : Invalid XATTR timestamp detected on /VMs3/c8kubermaster2.qcow2 secdriver=dac
    internal error: unable to execute QEMU command 'cont': Failed to get "write" lock

The 'locking' problem affects other bits outside of pcs too - backups and snapshots of VMs - now that fuse is the only access method (unless there is some way to fuse-mount GlusterFS that does the trick).
thanks, L.

    -> $ pcs resource config c8kubermaster2
    Resource: c8kubermaster2 (class=ocf provider=heartbeat type=VirtualDomain)
      Attributes: config=/var/lib/pacemaker/conf.d/c8kubermaster2.xml hypervisor=qemu:///system migrate_options=--unsafe migration_transport=ssh
      Meta Attrs: allow-migrate=true failure-timeout=30s
      Operations: migrate_from interval=0s timeout=90s (c8kubermaster2-migrate_from-interval-0s)
                  migrate_to interval=0s timeout=90s (c8kubermaster2-migrate_to-interval-0s)
                  monitor interval=10s timeout=30s (c8kubermaster2-monitor-interval-10s)
                  start interval=0s timeout=60s (c8kubermaster2-start-interval-0s)
                  stop interval=0s timeout=60s (c8kubermaster2-stop-interval-0s)

    swir.direct:/VMs3 on /VMs3 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,allow_other,max_read=131072)

Thank you for reporting this issue. After looking into it in more detail, I'm pretty sure I know what is causing it. There is a bug in the new implementation of `pcs resource move` introduced in pcs-0.11 (see the pcs man page, section "Changes in pcs-0.11", for more details) which in some cases will not move the resource. However, the old implementation of the move command is still available as `pcs resource move-with-constraint`, which can be used as a workaround for now. Another option is to run `pcs resource clear <resource> <node>` just before `pcs resource move`.

Yes, but the issue I really care about in reporting this BZ is LIVE migration/move of VirtualDomain - even if it's not really a BUG on pcs's part - and possible ways for pcs to fix/improve it. With Qemu/Libvirt versions that still have 'libgfapi' support, LIVE migration works smoothly, but with the new versions where 'libgfapi' is removed and the only option is to fuse-mount GlusterFS volumes, it's broken: a LIVE move falls back to shutdown/start - which is, well, what it is.

From the log:

    ...
    internal error: unable to execute QEMU command 'cont': Failed to get "write" lock
    ...

thanks, L.
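A minimal sketch of the two workarounds described in the comment above, using the resource name from this report and the node name whale that appears in the constraint output (substitute your own resource and node names; this is illustrative, not a command sequence confirmed by the developers):

    # Option 1: clear the leftover location constraint right before moving
    pcs resource clear c8kubermaster2 whale
    pcs resource move c8kubermaster2 whale

    # Option 2: fall back to the pre-0.11 move implementation
    pcs resource move-with-constraint c8kubermaster2 whale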
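For context, a hedged sketch of how a VirtualDomain resource with the attributes shown in the `pcs resource config` output above might be created; the values are copied from that output, but treat this as an assumption about the original setup rather than the reporter's exact command:

    # create the VirtualDomain resource with live migration enabled
    pcs resource create c8kubermaster2 ocf:heartbeat:VirtualDomain \
        config=/var/lib/pacemaker/conf.d/c8kubermaster2.xml \
        hypervisor="qemu:///system" \
        migration_transport=ssh migrate_options=--unsafe \
        meta allow-migrate=true failure-timeout=30s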