Description of problem:
A VM which is moved to an ovirt-node-iso based host goes down after a short period of time.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Created attachment 827209 [details]
ovirt_engine_3_3_1-2_log - failed to start VM on local DC

This is the engine log, where you can see that test4vm fails to start.
Hello,

> 2013-11-21 15:34:31,015 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-6-thread-49) [6af2b540] Correlation ID: 6af2b540, Job ID: 6b4fd772-49fe-435f-b1a9-5eb6f0a9b41b, Call Stack: null, Custom Event ID: -1, Message: VM test4vm was started by admin@internal (Host: test4).
> 2013-11-21 15:34:31,775 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-53) Command GetCapabilitiesVDS execution failed. Exception: VDSNetworkException: java.net.SocketTimeoutException: connect timed out
> 2013-11-21 15:34:31,776 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-53) Failed to refresh VDS, vds = 9b4bbf22-0655-4b0a-b792-3cdca91d5b6b : test3, VDS Network Error, continuing. java.net.SocketTimeoutException: connect timed out
> 2013-11-21 15:34:32,952 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-52) IrsBroker::Failed::GetStoragePoolInfoVDS due to: SocketTimeoutException: connect timed out

I can see some connection timeouts in the log. On the oVirt Node, can you please check whether VDSM is running, and whether there is any firewall/network problem? (Press F2 on the oVirt Node to get a console.)

What does the command

  vdsClient -s 0 getVdsCaps

return?

Thanks!
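To narrow down the SocketTimeoutException from the engine side, a quick TCP reachability probe can help. This is only a sketch: VDSM listens on TCP port 54321 by default, the `check_vdsm_port` helper name is made up, and `test3` stands in for the affected node's hostname.

```shell
# Probe whether the VDSM port on a node accepts TCP connections.
# Assumes bash is available (the /dev/tcp pseudo-device is a bashism).
check_vdsm_port() {
    # $1 = host, $2 = port; succeeds only if a TCP connect completes in time
    timeout 3 bash -c "echo > /dev/tcp/$1/$2" 2>/dev/null
}

# "test3" is a placeholder for the node named in the log above
if check_vdsm_port test3 54321; then
    echo "VDSM port reachable"
else
    echo "connect failed - check the vdsmd service and the node firewall"
fi
```

If the connect fails here while the node is up, a host firewall rule dropping port 54321 is a likely cause of the GetCapabilitiesVDS timeouts.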
Output from the ovirt-compute node:

# vdsClient -s 0 getVdsCaps
HBAInventory = {'FC': [], 'iSCSI': [{'InitiatorName': 'iqn.1994-05.com.redhat:f2531eeaefb4'}]}
ISCSIInitiatorName = 'iqn.1994-05.com.redhat:f2531eeaefb4'
bondings = {}
bridges = {'ovirtmgmt': {'addr': '10.0.1.4', 'cfg': {'BOOTPROTO': 'dhcp', 'DEFROUTE': 'yes', 'DELAY': '0', 'DEVICE': 'ovirtmgmt', 'NM_CONTROLLED': 'no', 'ONBOOT': 'yes', 'PEERNTP': 'yes', 'TYPE': 'Bridge'}, 'gateway': '10.0.1.11', 'ipv6addrs': ['fe80::225:90ff:fe32:e489/64'], 'ipv6gateway': '::', 'mtu': '1500', 'netmask': '255.255.255.0', 'ports': ['enp4s0'], 'stp': 'off'}}
clusterLevels = ['3.0', '3.1', '3.2', '3.3']
cpuCores = '4'
cpuFlags = 'fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,ss,ht,tm,pbe,syscall,nx,rdtscp,lm,constant_tsc,arch_perfmon,pebs,bts,rep_good,nopl,xtopology,nonstop_tsc,aperfmperf,pni,dtes64,monitor,ds_cpl,vmx,smx,est,tm2,ssse3,cx16,xtpr,pdcm,sse4_1,sse4_2,popcnt,lahf_lm,dtherm,tpr_shadow,vnmi,flexpriority,ept,vpid,model_Nehalem,model_Conroe,model_coreduo,model_core2duo,model_Penryn,model_n270'
cpuModel = 'Intel(R) Xeon(R) CPU X3430 @ 2.40GHz'
cpuSockets = '1'
cpuSpeed = '2400.075'
cpuThreads = '4'
emulatedMachines = ['pc', 'pc-q35-1.4', 'pc-q35-1.5', 'q35', 'isapc', 'pc-0.10', 'pc-0.11', 'pc-0.12', 'pc-0.13', 'pc-0.14', 'pc-0.15', 'pc-1.0', 'pc-1.1', 'pc-1.2', 'pc-1.3', 'pc-i440fx-1.4', 'pc-i440fx-1.5', 'none']
guestOverhead = '65'
hooks = {'after_vm_destroy': {'50_vhostmd': {'md5': '92498ed80c0219749829ecce2813fc7c'}}, 'before_vm_dehibernate': {'50_vhostmd': {'md5': 'f3ee6dbf6fbd01333bd3e32afec4fbba'}}, 'before_vm_migrate_destination': {'50_vhostmd': {'md5': 'f3ee6dbf6fbd01333bd3e32afec4fbba'}}, 'before_vm_start': {'50_vhostmd': {'md5': 'f3ee6dbf6fbd01333bd3e32afec4fbba'}}}
kvmEnabled = 'true'
lastClient = '10.0.1.11'
lastClientIface = 'ovirtmgmt'
management_ip = '0.0.0.0'
memSize = '5833'
netConfigDirty = 'False'
networks = {'ovirtmgmt': {'addr': '10.0.1.4', 'bridged': True, 'cfg': {'BOOTPROTO': 'dhcp', 'DEFROUTE': 'yes', 'DELAY': '0', 'DEVICE': 'ovirtmgmt', 'NM_CONTROLLED': 'no', 'ONBOOT': 'yes', 'PEERNTP': 'yes', 'TYPE': 'Bridge'}, 'gateway': '10.0.1.11', 'iface': 'ovirtmgmt', 'ipv6addrs': ['fe80::225:90ff:fe32:e489/64'], 'ipv6gateway': '::', 'mtu': '1500', 'netmask': '255.255.255.0', 'ports': ['enp4s0'], 'qosInbound': '', 'qosOutbound': '', 'stp': 'off'}}
nics = {'enp3s0': {'addr': '', 'cfg': {}, 'hwaddr': '00:25:90:32:e4:88', 'ipv6addrs': [], 'mtu': '1500', 'netmask': '', 'speed': 0}, 'enp4s0': {'addr': '', 'cfg': {'BRIDGE': 'ovirtmgmt', 'DEVICE': 'enp4s0', 'HWADDR': '00:25:90:32:e4:89', 'NM_CONTROLLED': 'no', 'ONBOOT': 'yes', 'PEERNTP': 'yes'}, 'hwaddr': '00:25:90:32:e4:89', 'ipv6addrs': ['fe80::225:90ff:fe32:e489/64'], 'mtu': '1500', 'netmask': '', 'speed': 1000}, 'enp5s0': {'addr': '', 'cfg': {}, 'hwaddr': '00:25:90:32:e4:8a', 'ipv6addrs': [], 'mtu': '1500', 'netmask': '', 'speed': 0}, 'enp6s0': {'addr': '', 'cfg': {}, 'hwaddr': '00:25:90:32:e4:8b', 'ipv6addrs': [], 'mtu': '1500', 'netmask': '', 'speed': 0}}
operatingSystem = {'name': 'oVirt Node', 'release': '1.0.0.fc19', 'version': '3.0.2'}
packages2 = {'glusterfs': {'buildtime': 1380287454, 'release': '1.fc19', 'version': '3.4.1'}, 'glusterfs-fuse': {'buildtime': 1380287454, 'release': '1.fc19', 'version': '3.4.1'}, 'glusterfs-rdma': {'buildtime': 1380287454, 'release': '1.fc19', 'version': '3.4.1'}, 'glusterfs-server': {'buildtime': 1380287454, 'release': '1.fc19', 'version': '3.4.1'}, 'kernel': {'buildtime': 1383574143.0, 'release': '200.fc19.x86_64', 'version': '3.11.7'}, 'libvirt': {'buildtime': 1384763408, 'release': '2.fc19', 'version': '1.1.3.1'}, 'mom': {'buildtime': 1375215820, 'release': '3.fc19', 'version': '0.3.2'}, 'qemu-img': {'buildtime': 1383725921, 'release': '1.fc19', 'version': '1.6.1'}, 'qemu-kvm': {'buildtime': 1383725921, 'release': '1.fc19', 'version': '1.6.1'}, 'spice-server': {'buildtime': 1383130020, 'release': '3.fc19', 'version': '0.12.4'}, 'vdsm': {'buildtime': 1384274283, 'release': '11.fc19', 'version': '4.13.0'}}
reservedMem = '321'
software_revision = '11'
software_version = '4.13'
supportedENGINEs = ['3.0', '3.1', '3.2', '3.3']
supportedProtocols = ['2.2', '2.3']
uuid = '49434D53-0200-9032-2500-329025008BE4'
version_name = 'Snow Man'
vlans = {}
vmTypes = ['kvm']
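For scripting against output like this, the simple `key = 'value'` lines can be filtered with standard tools. A minimal sketch, with the caveat that the `vds_cap` helper name is made up here and a captured sample stands in for a live `vdsClient -s 0 getVdsCaps` call:

```shell
# Hypothetical helper: pull one capability out of getVdsCaps-style
# "key = 'value'" lines read from stdin. Handles simple scalar values
# only; nested dicts like 'bridges' would need a real parser.
vds_cap() {
    awk -v key="$1" -F" = " '$1 == key { print $2 }' | tr -d "'"
}

# Captured sample standing in for: vdsClient -s 0 getVdsCaps
sample="cpuCores = '4'
kvmEnabled = 'true'
software_version = '4.13'"

printf '%s\n' "$sample" | vds_cap kvmEnabled
# prints: true
```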
Please re-run the whole thing, and before registering to Engine please run

  # setenforce 0

on the shell. After registration and launching a VM, please attach /var/log/audit/audit.log.
Douglas, could it be that these SELinux problems exist because of the install-time SELinux problems we saw with vdsm? (I haven't got a link right now.)
Created attachment 827738 [details] audit.log
Hey Fabian,

(In reply to Fabian Deutsch from comment #4)
> Please re-run the whole thing, and before registering to Engine please run
> # setenforce 0
> on the shell. After registration and launching a VM, please attach
> /var/log/audit/audit.log

I have attached the audit.log. As we discussed, setting SELinux to permissive on the oVirt Node makes the VM start. Otherwise I see an "Input/output error" in the engine logs:

2013-11-22 07:47:42,084 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-86) Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM test is down. Exit message: Child quit during startup handshake: Input/output error.

The vdsm patch that we discussed is http://gerrit.ovirt.org/#/c/21432/ but I don't think it's related.
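For hunting these failures down across a large engine.log, the exit message can be pulled out with sed. A small sketch against the log line quoted above (the `exit_msg` helper name is illustrative, not an oVirt tool):

```shell
# Print the text after "Exit message:" from "VM ... is down" engine.log lines.
exit_msg() {
    sed -n 's/.*Exit message: \(.*\)$/\1/p'
}

# Sample line, abridged from the engine.log excerpt in this comment
line='Message: VM test is down. Exit message: Child quit during startup handshake: Input/output error.'
printf '%s\n' "$line" | exit_msg
# prints: Child quit during startup handshake: Input/output error.
```

On a live engine host the same filter would typically be fed with something like `grep "is down" /var/log/ovirt-engine/engine.log | exit_msg`.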
$ audit2allow -i ~/Downloads/audit.log

#============= collectd_t ==============
allow collectd_t virtd_t:unix_stream_socket connectto;

#============= initrc_t ==============
allow initrc_t sshd_net_t:process dyntransition;
allow initrc_t unconfined_t:process dyntransition;

#============= local_login_t ==============
allow local_login_t var_log_t:file open;

#============= logrotate_t ==============
allow logrotate_t virt_cache_t:dir read;

#============= svirt_t ==============
allow svirt_t initrc_t:unix_stream_socket connectto;

#============= tuned_t ==============
allow tuned_t ovirt_t:dbus send_msg;

A proposed fix is here: http://gerrit.ovirt.org/#/c/21540/
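The raw AVC records behind these allow rules can also be inspected directly in audit.log with grep. A sketch, run here against an illustrative record shaped like the svirt_t connectto denial above (the pid, comm, and timestamp values are invented for the example):

```shell
# Show the source/target contexts and object class of AVC denial records.
avc_fields() {
    grep 'avc:.*denied' | grep -oE '(scontext|tcontext|tclass)=[^ ]+'
}

# Illustrative AVC record matching the svirt_t rule above (values invented)
avc_sample='type=AVC msg=audit(1385106000.123:456): avc:  denied  { connectto } for  pid=1234 comm="qemu-kvm" scontext=system_u:system_r:svirt_t:s0 tcontext=system_u:system_r:initrc_t:s0 tclass=unix_stream_socket'

printf '%s\n' "$avc_sample" | avc_fields
# prints:
# scontext=system_u:system_r:svirt_t:s0
# tcontext=system_u:system_r:initrc_t:s0
# tclass=unix_stream_socket
```

On a node showing this bug, `avc_fields < /var/log/audit/audit.log` would list which domains are being blocked, which is what audit2allow summarizes into the rules above.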