Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1154351

Summary: VMs not migrating following move host to maintenance.
Product: Red Hat Enterprise Linux 7 Reporter: Ilanit Stein <istein>
Component: libvirtAssignee: Jiri Denemark <jdenemar>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Ilanit Stein <istein>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 7.0CC: ajia, dyuan, ecohen, fromani, gklein, iheim, istein, lpeer, lsurette, michal.skrivanek, ofrenkel, rbalakri, Rhev-m-bugs, shyu, yeylon, zpeng
Target Milestone: rcKeywords: AutomationBlocker, Regression
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: virt
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1151953 Environment:
Last Closed: 2014-11-10 13:07:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1151953    
Bug Blocks:    
Attachments:
Description Flags
test logs none

Description Ilanit Stein 2014-10-19 07:07:04 UTC
+++ This bug was initially created as a clone of Bug #1151953 +++

Description of problem:
Migration automatic test (run vi rest api) fail on migration of vms, following move host to maintenance.

Test link:
http://jenkins.qa.lab.tlv.redhat.com:8080/view/Compute/view/3.5-git/view/Dashboard/job/3.5-git-compute-virt-vm_migration-iscsi/29/

Test flow:
Have 5 vms, installed with rhel+guest agent
Move host to maintenance.

Version-Release number of selected component (if applicable):
engine: vt5 
hosts: rhel-7, vdsm-4.16.6-1.el7.x86_64

How reproducible:
Failed for several consecutive runs on vt5.

Expected results:
vms should succeed migration, and host should move to maintenance.

Additional info:
Same test PASSED on: 
engine: vt4
hosts: vdsm-4.16.5-2.el7.x86_64

--- Additional comment from Ilanit Stein on 2014-10-13 03:46:29 EDT ---



--- Additional comment from Francesco Romani on 2014-10-14 08:46:45 EDT ---

Hi,

I scanned the logs and found these migrations failed, which most likely caused the issue:

Thread-1322::WARNING::2014-10-13 03:02:02,352::libvirtconnection::135::root::(wrapper) connection to libvirt broken. ecode: 1 edom: 7
Thread-1309::WARNING::2014-10-13 03:02:02,352::libvirtconnection::135::root::(wrapper) connection to libvirt broken. ecode: 1 edom: 7
Thread-1322::CRITICAL::2014-10-13 03:02:02,352::libvirtconnection::137::root::(wrapper) taking calling process down.
Thread-1309::CRITICAL::2014-10-13 03:02:02,352::libvirtconnection::137::root::(wrapper) taking calling process down.
Thread-1322::DEBUG::2014-10-13 03:02:02,353::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 1 edom: 7 level: 2 message: internal error: client socket is closed
Thread-1309::DEBUG::2014-10-13 03:02:02,353::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 1 edom: 7 level: 2 message: internal error: client socket is closed
Thread-1322::ERROR::2014-10-13 03:02:02,353::migration::259::vm.Vm::(run) vmId=`3b32c9f3-b02f-4497-b176-664a71eeab19`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/migration.py", line 245, in run
    self._startUnderlyingMigration(time.time())
  File "/usr/share/vdsm/virt/migration.py", line 324, in _startUnderlyingMigration
    None, maxBandwidth)
  File "/usr/share/vdsm/virt/vm.py", line 670, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/vdsm/libvirtconnection.py", line 129, in wrapper
    __connections.get(id(target)).pingLibvirt()
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3642, in getLibVersion
    if ret == -1: raise libvirtError ('virConnectGetLibVersion() failed', conn=self)
libvirtError: internal error: client socket is closed
Thread-1309::ERROR::2014-10-13 03:02:02,354::migration::259::vm.Vm::(run) vmId=`db2e563c-f0fe-43d7-9b35-0913d5e7e693`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/migration.py", line 245, in run
    self._startUnderlyingMigration(time.time())
  File "/usr/share/vdsm/virt/migration.py", line 324, in _startUnderlyingMigration
    None, maxBandwidth)
  File "/usr/share/vdsm/virt/vm.py", line 670, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/vdsm/libvirtconnection.py", line 129, in wrapper
    __connections.get(id(target)).pingLibvirt()
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3642, in getLibVersion
    if ret == -1: raise libvirtError ('virConnectGetLibVersion() failed', conn=self)
libvirtError: internal error: client socket is closed
VM Channels Listener::DEBUG::2014-10-13 03:02:02,425::vmchannels::133::vds::(_handle_unconnected) Trying to connect fileno 49.


which in turn seems to be caused by a libvirtd crash:

Oct 13 03:02:01 puma20 systemd-machined: Machine qemu-virt-vm-3 terminated.
Oct 13 03:02:01 puma20 libvirtd: Caught Segmentation violation dumping internal log buffer:
Oct 13 03:02:01 puma20 libvirtd: ====== start of log =====
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1520: debug : virEventPollMakePollFDs:391 : Prepare n=8 w=9, f=16 e=1 d=0
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1520: debug : virEventPollMakePollFDs:391 : Prepare n=9 w=10, f=18 e=1 d=0
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1520: debug : virEventPollMakePollFDs:391 : Prepare n=10 w=15, f=22 e=1 d=0
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1520: debug : virEventPollMakePollFDs:391 : Prepare n=11 w=16, f=23 e=1 d=0
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1520: debug : virEventPollMakePollFDs:391 : Prepare n=12 w=25, f=28 e=25 d=0
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1520: debug : virEventPollMakePollFDs:391 : Prepare n=13 w=26, f=27 e=25 d=0
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1520: debug : virEventPollMakePollFDs:391 : Prepare n=14 w=36, f=25 e=1 d=0
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1520: debug : virEventPollMakePollFDs:391 : Prepare n=15 w=37, f=29 e=1 d=0
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1520: debug : virEventPollCalculateTimeout:332 : Calculate expiry of 5 timers
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1520: debug : virEventPollCalculateTimeout:340 : Got a timeout scheduled for 1413183726235
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1520: debug : virEventPollCalculateTimeout:340 : Got a timeout scheduled for 1413183725078
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1526: debug : virFileMakePathHelper:2201 : path=/var/run/libvirt/qemu mode=0777
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1520: debug : virEventPollCalculateTimeout:353 : Schedule timeout then=1413183725078 now=1413183721486
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1520: debug : virEventPollCalculateTimeout:361 : Timeout at 1413183725078 due in 3592 ms
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1520: debug : virEventPollRunOnce:627 : EVENT_POLL_RUN: nhandles=15 timeout=3592
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1526: debug : virFileClose:90 : Closed fd 24
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1526: debug : virObjectUnref:256 : OBJECT_UNREF: obj=0x7f8f1c162680
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1526: debug : virObjectUnref:256 : OBJECT_UNREF: obj=0x7f8f0800cba0
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.486+0000: 1526: debug : virConnectIsAlive:21373 : conn=0x7f8f08012c50
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1526: debug : virObjectRef:293 : OBJECT_REF: obj=0x7f8f1c162680
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1526: debug : virObjectRef:293 : OBJECT_REF: obj=0x7f8f0800cba0
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1526: debug : qemuDomainObjBeginJobInternal:1038 : Starting job: async nested (async=migration out)
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1526: debug : virObjectUnref:256 : OBJECT_UNREF: obj=0x7f8f1c162680
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1526: debug : virObjectRef:293 : OBJECT_REF: obj=0x7f8f080129f0
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1526: debug : qemuMonitorGetMigrationStatus:2089 : mon=0x7f8f080129f0
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1526: debug : virJSONValueToString:1100 : object=0x7f8f08006120
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1526: debug : virJSONValueToStringOne:1031 : object=0x7f8f08006120 type=0 gen=0x7f8f0800b840
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1526: debug : virJSONValueToStringOne:1031 : object=0x7f8f0800c2b0 type=2 gen=0x7f8f0800b840
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1526: debug : virJSONValueToStringOne:1031 : object=0x7f8f08005320 type=2 gen=0x7f8f0800b840
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1526: debug : virJSONValueToString:1133 : result={"execute":"query-migrate","id":"libvirt-391"}
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1526: debug : qemuMonitorJSONCommandWithFd:264 : Send command '{"execute":"query-migrate","id":"libvirt-391"}' for write with FD -1
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1526: debug : virEventPollUpdateHandle:147 : EVENT_POLL_UPDATE_HANDLE: watch=25 events=15
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1526: debug : virEventPollInterruptLocked:714 : Interrupting
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1526: debug : qemuMonitorSend:955 : QEMU_MONITOR_SEND_MSG: mon=0x7f8f080129f0 msg={"execute":"query-migrate","id":"libvirt-391"}
Oct 13 03:02:01 puma20 libvirtd: fd=-1
Oct 13 03:02:01 puma20 libvirtd: 2014-10-13 07:02:01.536+0000: 1520: debug : virEventPollRunOnce:638 : Poll got 1 event(s)
Oct 13 03:02:01 puma20 journal: vdsm root WARNING connection to libvirt broken. ecode: 1 edom: 7
Oct 13 03:02:01 puma20 journal: vdsm root WARNING connection to libvirt broken. ecode: 1 edom: 7
Oct 13 03:02:01 puma20 journal: vdsm root WARNING connection to libvirt broken. ecode: 1 edom: 7
Oct 13 03:02:01 puma20 journal: vdsm root WARNING connection to libvirt broken. ecode: 1 edom: 7
Oct 13 03:02:01 puma20 journal: vdsm root CRITICAL taking calling process down.
Oct 13 03:02:01 puma20 journal: vdsm root CRITICAL taking calling process down.
Oct 13 03:02:01 puma20 journal: vdsm root CRITICAL taking calling process down.
Oct 13 03:02:01 puma20 journal: vdsm root CRITICAL taking calling process down.
Oct 13 03:02:01 puma20 systemd: libvirtd.service: main process exited, code=killed, status=11/SEGV
Oct 13 03:02:01 puma20 systemd: Unit libvirtd.service entered failed state.

To be honest, I don't know if this is the only cause of misbehaviour, but for sure it is one cause and quite big, so we need to move the bug against libvirt and block this one on it.

--- Additional comment from Francesco Romani on 2014-10-15 03:12:11 EDT ---

same as https://bugzilla.redhat.com/show_bug.cgi?id=1083485 apparently.

Either we manage to fetch a coredump or we have to close this as well since it seems  there is nothing we can do

Comment 1 Ilanit Stein 2014-10-19 07:10:17 UTC
Created attachment 948203 [details]
test logs

Comment 4 Michal Skrivanek 2014-11-03 13:52:54 UTC
parent bug has been closed as we were unable to get a core dump and the bug stopped reproducing

Comment 5 Jiri Denemark 2014-11-10 13:07:30 UTC
Unfortunately, there's not much we can do without a reproducer or backtrace. I looked at the migration code around the places mentioned in the log generated by SIGSEGV handler but I didn't spot anything suspicious. That said, I'm closing this bug to match the original vdsm bug. Please, file a new bug with a backtrace attached in case this issue reappears.