Bug 1154351 - VMs not migrating after moving a host to maintenance.
Summary: VMs not migrating after moving a host to maintenance.
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Jiri Denemark
QA Contact: Ilanit Stein
URL:
Whiteboard: virt
Depends On: 1151953
Blocks:
 
Reported: 2014-10-19 07:07 UTC by Ilanit Stein
Modified: 2014-11-10 13:07 UTC
CC List: 16 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 1151953
Environment:
Last Closed: 2014-11-10 13:07:30 UTC
Target Upstream Version:
Embargoed:


Attachments
test logs (1.22 MB, application/x-bzip), 2014-10-19 07:10 UTC, Ilanit Stein

Description Ilanit Stein 2014-10-19 07:07:04 UTC
+++ This bug was initially created as a clone of Bug #1151953 +++

Description of problem:
The automatic migration test (run via the REST API) fails to migrate VMs after moving the host to maintenance.

Test link:
http://jenkins.qa.lab.tlv.redhat.com:8080/view/Compute/view/3.5-git/view/Dashboard/job/3.5-git-compute-virt-vm_migration-iscsi/29/

Test flow:
Have 5 VMs installed with RHEL and the guest agent.
Move the host to maintenance (see the illustrative REST sketch below).
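
To make the test flow concrete, here is a minimal, hypothetical sketch of the REST call such a test might issue to move the host to maintenance (oVirt/RHEV 3.5-era v3 API). The engine URL, credentials, and host UUID are placeholders, not values taken from this bug:

# Hypothetical sketch: deactivate a host via the oVirt v3 REST API.
# ENGINE, AUTH, and HOST_ID are placeholders for illustration only.
import requests

ENGINE = "https://engine.example.com/api"
AUTH = ("admin@internal", "password")
HOST_ID = "00000000-0000-0000-0000-000000000000"

# Asking the engine to deactivate a host triggers live migration of all
# running VMs off that host before the maintenance transition completes.
resp = requests.post(
    "%s/hosts/%s/deactivate" % (ENGINE, HOST_ID),
    data="<action/>",
    headers={"Content-Type": "application/xml"},
    auth=AUTH,
    verify=False,  # test labs commonly use self-signed certificates
)
resp.raise_for_status()

It is this engine-driven mass migration that fails in the runs described below.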

Version-Release number of selected component (if applicable):
engine: vt5 
hosts: rhel-7, vdsm-4.16.6-1.el7.x86_64

How reproducible:
Failed for several consecutive runs on vt5.

Expected results:
The VMs should migrate successfully, and the host should move to maintenance.

Additional info:
Same test PASSED on: 
engine: vt4
hosts: vdsm-4.16.5-2.el7.x86_64

--- Additional comment from Ilanit Stein on 2014-10-13 03:46:29 EDT ---



--- Additional comment from Francesco Romani on 2014-10-14 08:46:45 EDT ---

Hi,

I scanned the logs and found these failed migrations, which most likely caused the issue:

Thread-1322::WARNING::2014-10-13 03:02:02,352::libvirtconnection::135::root::(wrapper) connection to libvirt broken. ecode: 1 edom: 7
Thread-1309::WARNING::2014-10-13 03:02:02,352::libvirtconnection::135::root::(wrapper) connection to libvirt broken. ecode: 1 edom: 7
Thread-1322::CRITICAL::2014-10-13 03:02:02,352::libvirtconnection::137::root::(wrapper) taking calling process down.
Thread-1309::CRITICAL::2014-10-13 03:02:02,352::libvirtconnection::137::root::(wrapper) taking calling process down.
Thread-1322::DEBUG::2014-10-13 03:02:02,353::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 1 edom: 7 level: 2 message: internal error: client socket is closed
Thread-1309::DEBUG::2014-10-13 03:02:02,353::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 1 edom: 7 level: 2 message: internal error: client socket is closed
Thread-1322::ERROR::2014-10-13 03:02:02,353::migration::259::vm.Vm::(run) vmId=`3b32c9f3-b02f-4497-b176-664a71eeab19`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/migration.py", line 245, in run
    self._startUnderlyingMigration(time.time())
  File "/usr/share/vdsm/virt/migration.py", line 324, in _startUnderlyingMigration
    None, maxBandwidth)
  File "/usr/share/vdsm/virt/vm.py", line 670, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/vdsm/libvirtconnection.py", line 129, in wrapper
    __connections.get(id(target)).pingLibvirt()
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3642, in getLibVersion
    if ret == -1: raise libvirtError ('virConnectGetLibVersion() failed', conn=self)
libvirtError: internal error: client socket is closed
Thread-1309::ERROR::2014-10-13 03:02:02,354::migration::259::vm.Vm::(run) vmId=`db2e563c-f0fe-43d7-9b35-0913d5e7e693`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/migration.py", line 245, in run
    self._startUnderlyingMigration(time.time())
  File "/usr/share/vdsm/virt/migration.py", line 324, in _startUnderlyingMigration
    None, maxBandwidth)
  File "/usr/share/vdsm/virt/vm.py", line 670, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/vdsm/libvirtconnection.py", line 129, in wrapper
    __connections.get(id(target)).pingLibvirt()
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3642, in getLibVersion
    if ret == -1: raise libvirtError ('virConnectGetLibVersion() failed', conn=self)
libvirtError: internal error: client socket is closed
VM Channels Listener::DEBUG::2014-10-13 03:02:02,425::vmchannels::133::vds::(_handle_unconnected) Trying to connect fileno 49.


which in turn seems to be caused by a libvirtd crash:

Oct 13 03:02:01 puma20 systemd-machined: Machine qemu-virt-vm-3 terminated.
Oct 13 03:02:01 puma20 libvirtd: Caught Segmentation violation dumping internal log buffer:
Oct 13 03:02:01 puma20 libvirtd: ====== start of log =====
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1520: debug : virEventPollMakePollFDs:391 : Prepare n=8 w=9, f=16 e=1 d=0
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1520: debug : virEventPollMakePollFDs:391 : Prepare n=9 w=10, f=18 e=1 d=0
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1520: debug : virEventPollMakePollFDs:391 : Prepare n=10 w=15, f=22 e=1 d=0
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1520: debug : virEventPollMakePollFDs:391 : Prepare n=11 w=16, f=23 e=1 d=0
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1520: debug : virEventPollMakePollFDs:391 : Prepare n=12 w=25, f=28 e=25 d=0
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1520: debug : virEventPollMakePollFDs:391 : Prepare n=13 w=26, f=27 e=25 d=0
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1520: debug : virEventPollMakePollFDs:391 : Prepare n=14 w=36, f=25 e=1 d=0
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1520: debug : virEventPollMakePollFDs:391 : Prepare n=15 w=37, f=29 e=1 d=0
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1520: debug : virEventPollCalculateTimeout:332 : Calculate expiry of 5 timers
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1520: debug : virEventPollCalculateTimeout:340 : Got a timeout scheduled for 1413183726235
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1520: debug : virEventPollCalculateTimeout:340 : Got a timeout scheduled for 1413183725078
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1526: debug : virFileMakePathHelper:2201 : path=/var/run/libvirt/qemu mode=0777
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1520: debug : virEventPollCalculateTimeout:353 : Schedule timeout then=1413183725078 now=1413183721486
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1520: debug : virEventPollCalculateTimeout:361 : Timeout at 1413183725078 due in 3592 ms
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1520: debug : virEventPollRunOnce:627 : EVENT_POLL_RUN: nhandles=15 timeout=3592
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1526: debug : virFileClose:90 : Closed fd 24
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1526: debug : virObjectUnref:256 : OBJECT_UNREF: obj=0x7f8f1c162680
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1526: debug : virObjectUnref:256 : OBJECT_UNREF: obj=0x7f8f0800cba0
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1526: debug : virConnectIsAlive:21373 : conn=0x7f8f08012c50
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1526: debug : virObjectRef:293 : OBJECT_REF: obj=0x7f8f1c162680
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1526: debug : virObjectRef:293 : OBJECT_REF: obj=0x7f8f0800cba0
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1526: debug : qemuDomainObjBeginJobInternal:1038 : Starting job: async nested (async=migration out)
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1526: debug : virObjectUnref:256 : OBJECT_UNREF: obj=0x7f8f1c162680
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1526: debug : virObjectRef:293 : OBJECT_REF: obj=0x7f8f080129f0
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1526: debug : qemuMonitorGetMigrationStatus:2089 : mon=0x7f8f080129f0
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1526: debug : virJSONValueToString:1100 : object=0x7f8f08006120
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1526: debug : virJSONValueToStringOne:1031 : object=0x7f8f08006120 type=0 gen=0x7f8f0800b840
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1526: debug : virJSONValueToStringOne:1031 : object=0x7f8f0800c2b0 type=2 gen=0x7f8f0800b840
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1526: debug : virJSONValueToStringOne:1031 : object=0x7f8f08005320 type=2 gen=0x7f8f0800b840
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1526: debug : virJSONValueToString:1133 : result={"execute":"query-migrate","id":"libvirt-391"}
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1526: debug : qemuMonitorJSONCommandWithFd:264 : Send command '{"execute":"query-migrate","id":"libvirt-391"}' for write with FD -1
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1526: debug : virEventPollUpdateHandle:147 : EVENT_POLL_UPDATE_HANDLE: watch=25 events=15
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1526: debug : virEventPollInterruptLocked:714 : Interrupting
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1526: debug : qemuMonitorSend:955 : QEMU_MONITOR_SEND_MSG: mon=0x7f8f080129f0 msg={"execute":"query-migrate","id":"libvirt-391"}
Oct 13 03:02:01 puma20 libvirtd: fd=-1
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1520: debug : virEventPollRunOnce:638 : Poll got 1 event(s)
Oct 13 03:02:01 puma20 journal: vdsm root WARNING connection to libvirt broken. ecode: 1 edom: 7
Oct 13 03:02:01 puma20 journal: vdsm root WARNING connection to libvirt broken. ecode: 1 edom: 7
Oct 13 03:02:01 puma20 journal: vdsm root WARNING connection to libvirt broken. ecode: 1 edom: 7
Oct 13 03:02:01 puma20 journal: vdsm root WARNING connection to libvirt broken. ecode: 1 edom: 7
Oct 13 03:02:01 puma20 journal: vdsm root CRITICAL taking calling process down.
Oct 13 03:02:01 puma20 journal: vdsm root CRITICAL taking calling process down.
Oct 13 03:02:01 puma20 journal: vdsm root CRITICAL taking calling process down.
Oct 13 03:02:01 puma20 journal: vdsm root CRITICAL taking calling process down.
Oct 13 03:02:01 puma20 systemd: libvirtd.service: main process exited, code=killed, status=11/SEGV
Oct 13 03:02:01 puma20 systemd: Unit libvirtd.service entered failed state.

To be honest, I don't know whether this is the only cause of the misbehaviour, but it is certainly one significant cause, so we need to move the bug against libvirt and block this one on it.
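
For context on the traceback above: vdsm's connection wrapper detects a dead libvirtd by issuing a cheap call (getLibVersion, the pingLibvirt seen in the stack) and treating the resulting libvirtError as a broken connection. A minimal illustrative sketch of that pattern, not vdsm's actual code:

# Illustrative sketch of the liveness check visible in the traceback;
# the function name and handling here are ours, not vdsm's.
import libvirt

def connection_is_alive(conn):
    """Return True if libvirtd still answers on this connection."""
    try:
        conn.getLibVersion()  # cheap round-trip to the daemon
        return True
    except libvirt.libvirtError:
        # When libvirtd segfaults, the client socket closes and any call
        # raises libvirtError ("client socket is closed"), as logged above.
        return False

conn = libvirt.open("qemu:///system")  # assumes a running libvirtd
if not connection_is_alive(conn):
    print("connection to libvirt broken")  # vdsm instead takes itself down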

--- Additional comment from Francesco Romani on 2014-10-15 03:12:11 EDT ---

Apparently the same as https://bugzilla.redhat.com/show_bug.cgi?id=1083485.

Either we manage to fetch a core dump, or we have to close this one as well, since it seems there is nothing we can do.

Comment 1 Ilanit Stein 2014-10-19 07:10:17 UTC
Created attachment 948203 [details]
test logs

Comment 4 Michal Skrivanek 2014-11-03 13:52:54 UTC
The parent bug has been closed, as we were unable to get a core dump and the bug stopped reproducing.

Comment 5 Jiri Denemark 2014-11-10 13:07:30 UTC
Unfortunately, there's not much we can do without a reproducer or backtrace. I looked at the migration code around the places mentioned in the log generated by the SIGSEGV handler, but I didn't spot anything suspicious. Thus, I'm closing this bug to match the original vdsm bug. Please file a new bug with a backtrace attached if this issue reappears.

