Bug 1033064

Summary: VM can not be started on ovirt-node
Product: [Retired] oVirt
Reporter: Fabian Deutsch <fdeutsch>
Component: ovirt-node
Assignee: Fabian Deutsch <fdeutsch>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: urgent
Docs Contact:
Priority: urgent
Version: 3.3
CC: acathrow, bazulay, danken, dougsland, jboggs, mburns, mgoldboi, mtayer, ovirt-bugs, ovirt-maint, s.kieske, yeylon
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ovirt-node-3.0.3
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-11-28 12:07:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1033287
Attachments:
  ovirt_engine_3_3_1-2_log failed to start vm on local dc (Flags: none)
  audit.log (Flags: none)

Description Fabian Deutsch 2013-11-21 13:54:35 UTC
Description of problem:
A VM which is moved to an ovirt-node-iso based host goes down after a short period of time.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Sven Kieske 2013-11-21 13:57:56 UTC
Created attachment 827209 [details]
ovirt_engine_3_3_1-2_log failed to start vm on local dc

This is the engine log, where you can see that test4vm fails to start.

Comment 2 Douglas Schilling Landgraf 2013-11-21 14:12:53 UTC
Hello,

> 2013-11-21 15:34:31,015 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-6-thread-49) [6af2b540] Correlation ID: 6af2b540, Job ID: 6b4fd772-49fe-435f-b1a9-5eb6f0a9b41b, Call Stack: null, Custom Event ID: -1, Message: VM test4vm was started by admin@internal (Host: test4).
> 2013-11-21 15:34:31,775 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-53) Command GetCapabilitiesVDS execution failed. Exception: VDSNetworkException: java.net.SocketTimeoutException: connect timed out
> 2013-11-21 15:34:31,776 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-53) Failed to refresh VDS , vds = 9b4bbf22-0655-4b0a-b792-3cdca91d5b6b : test3, VDS Network Error, continuing.
> java.net.SocketTimeoutException: connect timed out
> 2013-11-21 15:34:32,952 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler_Worker-52) IrsBroker::Failed::GetStoragePoolInfoVDS due to: SocketTimeoutException: connect timed out

I can see some connection 'timed out' errors in the log. On the oVirt Node, can you please check whether VDSM is running or whether there is a firewall/network problem? (Press F2 on the oVirt Node to get to the console.)

What does the command "vdsClient -s 0 getVdsCaps" return?
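
(For reference, a quick way to check both points on the node; this is only a sketch and assumes a systemd-based node image and the default VDSM port 54321:)

systemctl status vdsmd
iptables -L -n | grep 54321

The first command shows whether vdsmd is running, the second whether any firewall rule mentions the VDSM port.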

Thanks!

Comment 3 Sven Kieske 2013-11-21 14:17:05 UTC
Output from the ovirt-compute node:

vdsClient -s 0 getVdsCaps
	HBAInventory = {'FC': [], 'iSCSI': [{'InitiatorName': 'iqn.1994-05.com.redhat:f2531eeaefb4'}]}
	ISCSIInitiatorName = 'iqn.1994-05.com.redhat:f2531eeaefb4'
	bondings = {}
	bridges = {'ovirtmgmt': {'addr': '10.0.1.4',
	                         'cfg': {'BOOTPROTO': 'dhcp',
	                                 'DEFROUTE': 'yes',
	                                 'DELAY': '0',
	                                 'DEVICE': 'ovirtmgmt',
	                                 'NM_CONTROLLED': 'no',
	                                 'ONBOOT': 'yes',
	                                 'PEERNTP': 'yes',
	                                 'TYPE': 'Bridge'},
	                         'gateway': '10.0.1.11',
	                         'ipv6addrs': ['fe80::225:90ff:fe32:e489/64'],
	                         'ipv6gateway': '::',
	                         'mtu': '1500',
	                         'netmask': '255.255.255.0',
	                         'ports': ['enp4s0'],
	                         'stp': 'off'}}
	clusterLevels = ['3.0', '3.1', '3.2', '3.3']
	cpuCores = '4'
	cpuFlags = 'fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,ss,ht,tm,pbe,syscall,nx,rdtscp,lm,constant_tsc,arch_perfmon,pebs,bts,rep_good,nopl,xtopology,nonstop_tsc,aperfmperf,pni,dtes64,monitor,ds_cpl,vmx,smx,est,tm2,ssse3,cx16,xtpr,pdcm,sse4_1,sse4_2,popcnt,lahf_lm,dtherm,tpr_shadow,vnmi,flexpriority,ept,vpid,model_Nehalem,model_Conroe,model_coreduo,model_core2duo,model_Penryn,model_n270'
	cpuModel = 'Intel(R) Xeon(R) CPU           X3430  @ 2.40GHz'
	cpuSockets = '1'
	cpuSpeed = '2400.075'
	cpuThreads = '4'
	emulatedMachines = ['pc',
	                    'pc-q35-1.4',
	                    'pc-q35-1.5',
	                    'q35',
	                    'isapc',
	                    'pc-0.10',
	                    'pc-0.11',
	                    'pc-0.12',
	                    'pc-0.13',
	                    'pc-0.14',
	                    'pc-0.15',
	                    'pc-1.0',
	                    'pc-1.1',
	                    'pc-1.2',
	                    'pc-1.3',
	                    'pc-i440fx-1.4',
	                    'pc-i440fx-1.5',
	                    'none']
	guestOverhead = '65'
	hooks = {'after_vm_destroy': {'50_vhostmd': {'md5': '92498ed80c0219749829ecce2813fc7c'}},
	         'before_vm_dehibernate': {'50_vhostmd': {'md5': 'f3ee6dbf6fbd01333bd3e32afec4fbba'}},
	         'before_vm_migrate_destination': {'50_vhostmd': {'md5': 'f3ee6dbf6fbd01333bd3e32afec4fbba'}},
	         'before_vm_start': {'50_vhostmd': {'md5': 'f3ee6dbf6fbd01333bd3e32afec4fbba'}}}
	kvmEnabled = 'true'
	lastClient = '10.0.1.11'
	lastClientIface = 'ovirtmgmt'
	management_ip = '0.0.0.0'
	memSize = '5833'
	netConfigDirty = 'False'
	networks = {'ovirtmgmt': {'addr': '10.0.1.4',
	                          'bridged': True,
	                          'cfg': {'BOOTPROTO': 'dhcp',
	                                  'DEFROUTE': 'yes',
	                                  'DELAY': '0',
	                                  'DEVICE': 'ovirtmgmt',
	                                  'NM_CONTROLLED': 'no',
	                                  'ONBOOT': 'yes',
	                                  'PEERNTP': 'yes',
	                                  'TYPE': 'Bridge'},
	                          'gateway': '10.0.1.11',
	                          'iface': 'ovirtmgmt',
	                          'ipv6addrs': ['fe80::225:90ff:fe32:e489/64'],
	                          'ipv6gateway': '::',
	                          'mtu': '1500',
	                          'netmask': '255.255.255.0',
	                          'ports': ['enp4s0'],
	                          'qosInbound': '',
	                          'qosOutbound': '',
	                          'stp': 'off'}}
	nics = {'enp3s0': {'addr': '',
	                   'cfg': {},
	                   'hwaddr': '00:25:90:32:e4:88',
	                   'ipv6addrs': [],
	                   'mtu': '1500',
	                   'netmask': '',
	                   'speed': 0},
	        'enp4s0': {'addr': '',
	                   'cfg': {'BRIDGE': 'ovirtmgmt',
	                           'DEVICE': 'enp4s0',
	                           'HWADDR': '00:25:90:32:e4:89',
	                           'NM_CONTROLLED': 'no',
	                           'ONBOOT': 'yes',
	                           'PEERNTP': 'yes'},
	                   'hwaddr': '00:25:90:32:e4:89',
	                   'ipv6addrs': ['fe80::225:90ff:fe32:e489/64'],
	                   'mtu': '1500',
	                   'netmask': '',
	                   'speed': 1000},
	        'enp5s0': {'addr': '',
	                   'cfg': {},
	                   'hwaddr': '00:25:90:32:e4:8a',
	                   'ipv6addrs': [],
	                   'mtu': '1500',
	                   'netmask': '',
	                   'speed': 0},
	        'enp6s0': {'addr': '',
	                   'cfg': {},
	                   'hwaddr': '00:25:90:32:e4:8b',
	                   'ipv6addrs': [],
	                   'mtu': '1500',
	                   'netmask': '',
	                   'speed': 0}}
	operatingSystem = {'name': 'oVirt Node', 'release': '1.0.0.fc19', 'version': '3.0.2'}
	packages2 = {'glusterfs': {'buildtime': 1380287454,
	                           'release': '1.fc19',
	                           'version': '3.4.1'},
	             'glusterfs-fuse': {'buildtime': 1380287454,
	                                'release': '1.fc19',
	                                'version': '3.4.1'},
	             'glusterfs-rdma': {'buildtime': 1380287454,
	                                'release': '1.fc19',
	                                'version': '3.4.1'},
	             'glusterfs-server': {'buildtime': 1380287454,
	                                  'release': '1.fc19',
	                                  'version': '3.4.1'},
	             'kernel': {'buildtime': 1383574143.0,
	                        'release': '200.fc19.x86_64',
	                        'version': '3.11.7'},
	             'libvirt': {'buildtime': 1384763408,
	                         'release': '2.fc19',
	                         'version': '1.1.3.1'},
	             'mom': {'buildtime': 1375215820, 'release': '3.fc19', 'version': '0.3.2'},
	             'qemu-img': {'buildtime': 1383725921,
	                          'release': '1.fc19',
	                          'version': '1.6.1'},
	             'qemu-kvm': {'buildtime': 1383725921,
	                          'release': '1.fc19',
	                          'version': '1.6.1'},
	             'spice-server': {'buildtime': 1383130020,
	                              'release': '3.fc19',
	                              'version': '0.12.4'},
	             'vdsm': {'buildtime': 1384274283, 'release': '11.fc19', 'version': '4.13.0'}}
	reservedMem = '321'
	software_revision = '11'
	software_version = '4.13'
	supportedENGINEs = ['3.0', '3.1', '3.2', '3.3']
	supportedProtocols = ['2.2', '2.3']
	uuid = '49434D53-0200-9032-2500-329025008BE4'
	version_name = 'Snow Man'
	vlans = {}
	vmTypes = ['kvm']

Comment 4 Fabian Deutsch 2013-11-21 16:58:06 UTC
Please re-run the whole thing, and before registering to Engine please run
# setenforce 0
on the shell. After registration and after launching a VM, please attach
/var/log/audit/audit.log
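
(Once audit.log has been collected, the node can be put back; a minimal sketch, assuming it normally runs in enforcing mode:)
# getenforce
# setenforce 1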

Comment 5 Fabian Deutsch 2013-11-21 20:07:56 UTC
Douglas,

Could it be that these SELinux problems exist because of the install-time SELinux problems we saw with vdsm? (I haven't got a link right now.)

Comment 6 Douglas Schilling Landgraf 2013-11-22 12:50:38 UTC
Created attachment 827738 [details]
audit.log

Comment 7 Douglas Schilling Landgraf 2013-11-22 12:55:32 UTC
Hey Fabian,

(In reply to Fabian Deutsch from comment #4)
> Please re-run the whole thing, and before registering to Engine please run
> # setenforce 0
> on the shell. After registration and after launching a VM, please attach
> /var/log/audit/audit.log

I have attached the audit.log. As we discussed, setting SELinux to permissive on ovirt-node makes the VM start. Otherwise the VM fails to start and the engine log shows an "Input/output error", for example:

2013-11-22 07:47:42,084 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-86) Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM test is down. Exit message: Child quit during startup handshake: Input/output error.

The vdsm patch that we discussed is http://gerrit.ovirt.org/#/c/21432/, but I don't think it's related.
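
(A quick way to confirm that the SELinux denials line up with the failed startup handshake; a sketch, assuming the audit tools are available on the node:)
# ausearch -m avc -ts recent | audit2why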

Comment 8 Fabian Deutsch 2013-11-22 15:25:11 UTC
$ audit2allow -i ~/Downloads/audit.log 


#============= collectd_t ==============
allow collectd_t virtd_t:unix_stream_socket connectto;

#============= initrc_t ==============
allow initrc_t sshd_net_t:process dyntransition;
allow initrc_t unconfined_t:process dyntransition;

#============= local_login_t ==============
allow local_login_t var_log_t:file open;

#============= logrotate_t ==============
allow logrotate_t virt_cache_t:dir read;

#============= svirt_t ==============
allow svirt_t initrc_t:unix_stream_socket connectto;

#============= tuned_t ==============
allow tuned_t ovirt_t:dbus send_msg;
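
(For completeness, the same audit log could also be turned into a local policy module as a stop-gap; this is only a sketch with a hypothetical module name, not the proposed fix linked below:)

$ audit2allow -i ~/Downloads/audit.log -M ovirtnode_local
$ semodule -i ovirtnode_local.pp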


A proposed fix is here:
http://gerrit.ovirt.org/#/c/21540/