Created attachment 928396 [details]
GA log, vdsm.log

Description of problem:
The guest agent (GA) stopped communicating after messing with vdsm on the host. I stopped supervdsmd/vdsmd on the host; even though the host was later set back to Up by SSH soft fencing (BZ1131545), the GA never resumed communicating.

[root@localhost ~]# ps aux | grep ovirt
ovirtag+   966  0.2  2.2 533232 22548 ?  Ssl  15:47  0:07 /usr/bin/python /usr/share/ovirt-guest-agent/ovirt-guest-agent.py
root      2807  0.0  0.0 112640   980 pts/0  R+  16:48  0:00 grep --color=auto ovirt

[root@localhost ~]# tail /var/log/ovirt-guest-agent/ovirt-guest-agent.log
Dummy-1::DEBUG::2014-08-19 16:01:13,846::OVirtAgentLogic::334::root::AgentLogicBase::sendUserInfo - cur_user = 'None'
Dummy-1::DEBUG::2014-08-19 16:01:13,870::GuestAgentLinux2::81::root::PkgMgr: list_pkgs returns [['kernel-3.10.0-123.6.3.el7', 'kernel-3.10.0-123.el7', 'ovirt-guest-agent-common-1.0.10-1.el7', 'ksh-20120801-19.el7']]
Dummy-1::DEBUG::2014-08-19 16:01:23,887::OVirtAgentLogic::334::root::AgentLogicBase::sendUserInfo - cur_user = 'None'
Dummy-1::DEBUG::2014-08-19 16:01:23,911::GuestAgentLinux2::81::root::PkgMgr: list_pkgs returns [['kernel-3.10.0-123.6.3.el7', 'kernel-3.10.0-123.el7', 'ovirt-guest-agent-common-1.0.10-1.el7', 'ksh-20120801-19.el7']]
Dummy-1::DEBUG::2014-08-19 16:01:33,928::OVirtAgentLogic::334::root::AgentLogicBase::sendUserInfo - cur_user = 'None'
Dummy-1::DEBUG::2014-08-19 16:01:33,953::GuestAgentLinux2::81::root::PkgMgr: list_pkgs returns [['kernel-3.10.0-123.6.3.el7', 'kernel-3.10.0-123.el7', 'ovirt-guest-agent-common-1.0.10-1.el7', 'ksh-20120801-19.el7']]
Dummy-1::DEBUG::2014-08-19 16:01:43,970::OVirtAgentLogic::334::root::AgentLogicBase::sendUserInfo - cur_user = 'None'
Dummy-1::DEBUG::2014-08-19 16:01:43,995::GuestAgentLinux2::81::root::PkgMgr: list_pkgs returns [['kernel-3.10.0-123.6.3.el7', 'kernel-3.10.0-123.el7', 'ovirt-guest-agent-common-1.0.10-1.el7', 'ksh-20120801-19.el7']]
Dummy-1::DEBUG::2014-08-19 16:01:54,012::OVirtAgentLogic::334::root::AgentLogicBase::sendUserInfo - cur_user = 'None'
Dummy-1::DEBUG::2014-08-19 16:01:54,036::GuestAgentLinux2::81::root::PkgMgr: list_pkgs returns [['kernel-3.10.0-123.6.3.el7', 'kernel-3.10.0-123.el7', 'ovirt-guest-agent-common-1.0.10-1.el7', 'ksh-20120801-19.el7']]

[root@localhost ~]# date
Tue Aug 19 16:48:32 CEST 2014

So the GA itself is still running and logging, but nothing has reached the engine since ~16:01.

Engine events for the host (non-responsive, then fenced back Up):

2014-Aug-19, 16:02  Host dell-r210ii-04 is non responsive.
2014-Aug-19, 16:03  Status of host dell-r210ii-04 was set to Up.

I saw the following WARNING lines in vdsm.log (the -chardev line is included just to show the channel id):

-chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/f8262f1c-71d3-4b0a-b14d-3bb060d89c61.com.redhat.rhevm.vdsm,server,nowait

Thread-13::WARNING::2014-08-19 16:03:01,350::vm::2020::vm.Vm::(buildConfDevices) vmId=`f8262f1c-71d3-4b0a-b14d-3bb060d89c61`::Unknown type found, device: '{'device': 'unix', 'alias': 'channel0', 'type': 'channel', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '1'}}' found
clientIFinit::WARNING::2014-08-19 16:03:07,359::vm::2020::vm.Vm::(buildConfDevices) vmId=`f8262f1c-71d3-4b0a-b14d-3bb060d89c61`::Unknown type found, device: '{'device': 'unix', 'alias': 'channel0', 'type': 'channel', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '1'}}' found

Version-Release number of selected component (if applicable):
vdsm-4.16.1-6.gita4a4614.el6.x86_64
ovirt-guest-agent-common-1.0.10-1.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Stop supervdsmd/vdsmd while the GA is running inside a RHEL7 VM.
2. Wait.
3. The VM is incorrectly displayed as 'Paused'; wait until the host is Up again, then set the VM status back to 'Up' (e.g. via the 'Run' icon).
4. Wait.

Actual results:
GA stops communicating after messing with vdsm, even though both the VM and the GA are still running.

Expected results:
GA should continue working.

Additional info:
- Did not try with RHEL6 (only tested RHEL7 for now).
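The -chardev line above shows the host-side unix socket QEMU exposes for the GA channel. A quick way to tell whether that socket is still there and accepting connections is the minimal sketch below (my own diagnostic, not part of vdsm or the guest agent; the path is the one from this report, adjust the vmId for other VMs):

# check_ga_channel_socket.py - run on the host as root; minimal sketch,
# not vdsm/GA code. Checks that QEMU still exposes the channel socket from
# the -chardev line above and that it accepts a connection.
import socket
import sys

CHANNEL = ('/var/lib/libvirt/qemu/channels/'
           'f8262f1c-71d3-4b0a-b14d-3bb060d89c61.com.redhat.rhevm.vdsm')

s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
try:
    s.connect(CHANNEL)
    print('channel socket is present and accepting connections')
except socket.error as e:
    print('cannot connect to %s: %s' % (CHANNEL, e))
    sys.exit(1)
finally:
    s.close()

Note this only proves QEMU is still serving the socket; whether vdsm actually re-attached to it after recovery is a separate question (something like `ss -xp | grep com.redhat.rhevm.vdsm` on the host should show which process, if any, is connected).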
Restarting the GA doesn't make GA data visible in vdsClient getVmStats:

# vdsClient -s 0 getVmStats f8262f1c-71d3-4b0a-b14d-3bb060d89c61

f8262f1c-71d3-4b0a-b14d-3bb060d89c61
	Status = Up
	displayInfo = [{'tlsPort': '5901', 'ipAddress': '10.34.63.223', 'port': '5900', 'type': 'spice'}]
	hash = -1084240403787653570
	network = {}
	acpiEnable = true
	vmType = kvm
	displayIp = 10.34.63.223
	disks = {}
	pid = 2937
	monitorResponse = 0
	timeOffset = 0
	elapsedTime = 4340
	displaySecurePort = 5901
	displayPort = 5900
	kvmEnable = true
	clientIp =
	displayType = qxl
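For completeness, the guest side can be checked too. Below is a minimal guest-side sketch (again mine, not from the GA sources) that only verifies the virtio-serial port the agent writes to is still present and usable; it assumes the default channel name, matching the name in the -chardev line above. In this report the guest side looked healthy (the GA log keeps updating), which points at the host side not reading the channel after vdsm recovery.

# check_ga_port.py - run inside the guest; minimal sketch, not GA code.
# Assumes the default virtio-serial port name com.redhat.rhevm.vdsm.
import os
import sys

PORT = '/dev/virtio-ports/com.redhat.rhevm.vdsm'

if not os.path.exists(PORT):
    sys.exit('%s is missing - the channel device is gone' % PORT)
if not os.access(PORT, os.R_OK | os.W_OK):
    sys.exit('%s exists but is not readable/writable' % PORT)
print('%s looks usable; if the engine still shows no GA data, the host '
      'side (vdsm) is probably not reading the channel' % PORT)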
The VM recovery failed:

Thread-13::DEBUG::2014-08-19 15:34:23,100::vm::1357::vm.Vm::(blockDev) vmId=`f8262f1c-71d3-4b0a-b14d-3bb060d89c61`::Unable to determine if the path '/rhev/data-center/00000002-0002-0002-0002-000000000038/f3298d58-7f41-4d71-8a06-560a69451ab0/images/8e03de77-e237-49ca-b8ac-46cf283e78b7/c6ca3708-8adb-4d7e-8f88-a81761539dcf' is a block device
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 1354, in blockDev
    self._blockDev = utils.isBlockDevice(self.path)
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 104, in isBlockDevice
    return stat.S_ISBLK(os.stat(path).st_mode)
OSError: [Errno 2] No such file or directory: '/rhev/data-center/00000002-0002-0002-0002-000000000038/f3298d58-7f41-4d71-8a06-560a69451ab0/images/8e03de77-e237-49ca-b8ac-46cf283e78b7/c6ca3708-8adb-4d7e-8f88-a81761539dcf'

Thread-13::INFO::2014-08-19 15:34:23,101::vm::2251::vm.Vm::(_startUnderlyingVm) vmId=`f8262f1c-71d3-4b0a-b14d-3bb060d89c61`::Skipping errors on recovery
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 2235, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 3298, in _run
    self._domDependentInit()
  File "/usr/share/vdsm/virt/vm.py", line 3189, in _domDependentInit
    self._syncVolumeChain(drive)
  File "/usr/share/vdsm/virt/vm.py", line 5660, in _syncVolumeChain
    volumes = self._driveGetActualVolumeChain(drive)
  File "/usr/share/vdsm/virt/vm.py", line 5639, in _driveGetActualVolumeChain
    sourceAttr = ('file', 'dev')[drive.blockDev]
TypeError: tuple indices must be integers, not NoneType
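The last frame is the interesting one: isBlockDevice() raised OSError because the volume path was gone, so drive.blockDev was left as None, and ('file', 'dev')[drive.blockDev] then fails because a tuple cannot be indexed with None. A stand-alone sketch of that failure mode (my illustration, not vdsm code; the path below is just a placeholder):

# Illustration only - why recovery blows up when the stat fails.
import os
import stat


def is_block_device(path):
    # same check as vdsm.utils.isBlockDevice; raises OSError if the path is gone
    return stat.S_ISBLK(os.stat(path).st_mode)


block_dev = None              # stays None because the stat below raises
try:
    block_dev = is_block_device('/rhev/data-center/some-missing-volume')
except OSError:
    pass                      # mirrors "Skipping errors on recovery"

try:
    source_attr = ('file', 'dev')[block_dev]
except TypeError as e:
    print('recovery would crash here: %s' % e)
    # -> tuple indices must be integers, not NoneType

Guarding the None case (or re-resolving the path before the lookup) would avoid this particular crash, but the real fix presumably belongs to bug 1131548, which this report was closed as a duplicate of.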
*** This bug has been marked as a duplicate of bug 1131548 ***