Bug 1131567 - [ovirt-guest-agent] [RHEL7] GA stopped communicating after messing with vdsm on host
Summary: [ovirt-guest-agent] [RHEL7] GA stopped communicating after messing with vdsm ...
Keywords:
Status: CLOSED DUPLICATE of bug 1131548
Alias: None
Product: oVirt
Classification: Retired
Component: vdsm
Version: 3.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Vinzenz Feenstra [evilissimo]
QA Contact: Gil Klein
URL:
Whiteboard: virt
Depends On:
Blocks:
 
Reported: 2014-08-19 14:57 UTC by Jiri Belka
Modified: 2014-08-20 12:35 UTC (History)
10 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2014-08-20 12:35:09 UTC
oVirt Team: ---
Embargoed:


Attachments (Terms of Use)
GA log, vdsm.log (860.00 KB, application/x-tar)
2014-08-19 14:57 UTC, Jiri Belka

Description Jiri Belka 2014-08-19 14:57:31 UTC
Created attachment 928396 [details]
GA log, vdsm.log

Description of problem:
The guest agent (GA) stopped communicating after messing with vdsm on the host. I stopped supervdsmd/vdsmd on a host, and even though the host was later set to Up by SSH soft fencing (BZ1131545), the GA stopped communicating.

[root@localhost ~]# ps aux | grep ovirt
ovirtag+    966  0.2  2.2 533232 22548 ?        Ssl  15:47   0:07 /usr/bin/python /usr/share/ovirt-guest-agent/ovirt-guest-agent.py
root       2807  0.0  0.0 112640   980 pts/0    R+   16:48   0:00 grep --color=auto ovirt
[root@localhost ~]# tail /var/log/ovirt-guest-agent/ovirt-guest-agent.log 
Dummy-1::DEBUG::2014-08-19 16:01:13,846::OVirtAgentLogic::334::root::AgentLogicBase::sendUserInfo - cur_user = 'None'
Dummy-1::DEBUG::2014-08-19 16:01:13,870::GuestAgentLinux2::81::root::PkgMgr: list_pkgs returns [['kernel-3.10.0-123.6.3.el7', 'kernel-3.10.0-123.el7', 'ovirt-guest-agent-common-1.0.10-1.el7', 'ksh-20120801-19.el7']]
Dummy-1::DEBUG::2014-08-19 16:01:23,887::OVirtAgentLogic::334::root::AgentLogicBase::sendUserInfo - cur_user = 'None'
Dummy-1::DEBUG::2014-08-19 16:01:23,911::GuestAgentLinux2::81::root::PkgMgr: list_pkgs returns [['kernel-3.10.0-123.6.3.el7', 'kernel-3.10.0-123.el7', 'ovirt-guest-agent-common-1.0.10-1.el7', 'ksh-20120801-19.el7']]
Dummy-1::DEBUG::2014-08-19 16:01:33,928::OVirtAgentLogic::334::root::AgentLogicBase::sendUserInfo - cur_user = 'None'
Dummy-1::DEBUG::2014-08-19 16:01:33,953::GuestAgentLinux2::81::root::PkgMgr: list_pkgs returns [['kernel-3.10.0-123.6.3.el7', 'kernel-3.10.0-123.el7', 'ovirt-guest-agent-common-1.0.10-1.el7', 'ksh-20120801-19.el7']]
Dummy-1::DEBUG::2014-08-19 16:01:43,970::OVirtAgentLogic::334::root::AgentLogicBase::sendUserInfo - cur_user = 'None'
Dummy-1::DEBUG::2014-08-19 16:01:43,995::GuestAgentLinux2::81::root::PkgMgr: list_pkgs returns [['kernel-3.10.0-123.6.3.el7', 'kernel-3.10.0-123.el7', 'ovirt-guest-agent-common-1.0.10-1.el7', 'ksh-20120801-19.el7']]
Dummy-1::DEBUG::2014-08-19 16:01:54,012::OVirtAgentLogic::334::root::AgentLogicBase::sendUserInfo - cur_user = 'None'
Dummy-1::DEBUG::2014-08-19 16:01:54,036::GuestAgentLinux2::81::root::PkgMgr: list_pkgs returns [['kernel-3.10.0-123.6.3.el7', 'kernel-3.10.0-123.el7', 'ovirt-guest-agent-common-1.0.10-1.el7', 'ksh-20120801-19.el7']]
[root@localhost ~]# date
Tue Aug 19 16:48:32 CEST 2014

the host was "tuned"...

2014-Aug-19, 16:02
Host dell-r210ii-04 is non responsive.

and Up again...

2014-Aug-19, 16:03
Status of host dell-r210ii-04 was set to Up.

I saw the following 'WARNING' lines in vdsm.log...

(the -chardev line is included just to identify the channel 'id')
-chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/f8262f1c-71d3-4b0a-b14d-3bb060d89c61.com.redhat.rhevm.vdsm,server,nowait

Thread-13::WARNING::2014-08-19 16:03:01,350::vm::2020::vm.Vm::(buildConfDevices) vmId=`f8262f1c-71d3-4b0a-b14d-3bb060d89c61`::Unknown type found, device: '{'device': 'unix', 'alias': 'channel0', 'type': 'channel', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '1'}}' found
clientIFinit::WARNING::2014-08-19 16:03:07,359::vm::2020::vm.Vm::(buildConfDevices) vmId=`f8262f1c-71d3-4b0a-b14d-3bb060d89c61`::Unknown type found, device: '{'device': 'unix', 'alias': 'channel0', 'type': 'channel', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '1'}}' found


Version-Release number of selected component (if applicable):
vdsm-4.16.1-6.gita4a4614.el6.x86_64
ovirt-guest-agent-common-1.0.10-1.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Stop supervdsmd/vdsmd on the host while the GA is running inside a RHEL7 VM.
2. Wait.
3. The VM will incorrectly be displayed as 'Paused'; wait until the host is Up again, then set the VM status back to 'Up' (e.g. via the 'Run' icon).
4. Wait.

Actual results:
The GA stops communicating after messing with vdsm, even though both the VM and the GA are still running.

Expected results:
The GA should continue communicating after vdsm recovers.

Additional info:
- Did not try with RHEL6 (I was testing RHEL7 at the time).

Comment 1 Jiri Belka 2014-08-19 14:59:29 UTC
Restarting the GA doesn't make GA data visible in 'vdsClient getVmStats'...

# vdsClient -s 0 getVmStats f8262f1c-71d3-4b0a-b14d-3bb060d89c61

f8262f1c-71d3-4b0a-b14d-3bb060d89c61
        Status = Up
        displayInfo = [{'tlsPort': '5901', 'ipAddress': '10.34.63.223', 'port': '5900', 'type': 'spice'}]
        hash = -1084240403787653570
        network = {}
        acpiEnable = true
        vmType = kvm
        displayIp = 10.34.63.223
        disks = {}
        pid = 2937
        monitorResponse = 0
        timeOffset = 0
        elapsedTime = 4340
        displaySecurePort = 5901
        displayPort = 5900
        kvmEnable = true
        clientIp = 
        displayType = qxl

Comment 2 Vinzenz Feenstra [evilissimo] 2014-08-20 12:17:54 UTC
The VM recovery failed:

Thread-13::DEBUG::2014-08-19 15:34:23,100::vm::1357::vm.Vm::(blockDev) vmId=`f8262f1c-71d3-4b0a-b14d-3bb060d89c61`::Unable to determine if the path '/rhev/data-center/00000002-0002-0002-0002-000000000038/f3298d58-7f41-4d71-8a06-560a69451ab0/images/8e03de77-e237-49ca-b8ac-46cf283e78b7/c6ca3708-8adb-4d7e-8f88-a81761539dcf' is a block device
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 1354, in blockDev
    self._blockDev = utils.isBlockDevice(self.path)
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 104, in isBlockDevice
    return stat.S_ISBLK(os.stat(path).st_mode)
OSError: [Errno 2] No such file or directory: '/rhev/data-center/00000002-0002-0002-0002-000000000038/f3298d58-7f41-4d71-8a06-560a69451ab0/images/8e03de77-e237-49ca-b8ac-46cf283e78b7/c6ca3708-8adb-4d7e-8f88-a81761539dcf'
Thread-13::INFO::2014-08-19 15:34:23,101::vm::2251::vm.Vm::(_startUnderlyingVm) vmId=`f8262f1c-71d3-4b0a-b14d-3bb060d89c61`::Skipping errors on recovery
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 2235, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 3298, in _run
    self._domDependentInit()
  File "/usr/share/vdsm/virt/vm.py", line 3189, in _domDependentInit
    self._syncVolumeChain(drive)
  File "/usr/share/vdsm/virt/vm.py", line 5660, in _syncVolumeChain
    volumes = self._driveGetActualVolumeChain(drive)
  File "/usr/share/vdsm/virt/vm.py", line 5639, in _driveGetActualVolumeChain
    sourceAttr = ('file', 'dev')[drive.blockDev]
TypeError: tuple indices must be integers, not NoneType
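For illustration, the failure chain above can be reproduced in isolation: os.stat raises OSError on the missing volume path, so the block-device flag is left as None instead of True/False, and indexing the ('file', 'dev') tuple with None raises the same TypeError. The following is a simplified sketch, not the actual vdsm implementation; the function names only mirror those in the traceback:

```python
import os
import stat


def is_block_device(path):
    # Like vdsm's utils.isBlockDevice: os.stat raises OSError when the
    # path no longer exists (e.g. after storage was torn down under the VM).
    return stat.S_ISBLK(os.stat(path).st_mode)


def block_dev(path):
    # Simplified stand-in for vm.Vm.blockDev: on OSError it logs
    # "Unable to determine if the path ... is a block device" and the
    # cached flag stays None rather than becoming True or False.
    try:
        return is_block_device(path)
    except OSError:
        return None


drive_block_dev = block_dev("/rhev/data-center/missing-volume")
try:
    source_attr = ("file", "dev")[drive_block_dev]
except TypeError as exc:
    # Same TypeError as raised in _driveGetActualVolumeChain in the
    # traceback above: a None index aborts the VM recovery.
    print(exc)
```

This is why recovery aborts at _syncVolumeChain: the earlier OSError is swallowed, and the None value only blows up later, far from its cause.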

Comment 3 Vinzenz Feenstra [evilissimo] 2014-08-20 12:35:09 UTC

*** This bug has been marked as a duplicate of bug 1131548 ***

