Created attachment 938391 [details]
Logs

Description of problem:
Trying to deploy the hosted-engine via iSCSI doesn't work. It results in a working engine that pauses whenever the first node (i.e. the first machine the deploy process is run on) is added to the engine.

Version-Release number of selected component (if applicable):

How reproducible:
Very (tried four times, failing at the same exact moment)

Steps to Reproduce:
1.) Install a fresh copy of CentOS 7
2.) Install the nightly snapshot repos for oVirt
3.) Install ovirt-hosted-engine-setup
4.) Run vdsm-tool configure --force (because of the known cert issue)
5.) Run hosted-engine --deploy
6.) Install the VM with ovirt-engine and deploy it (successful)
7.) Let the hosted-engine --deploy process add the node to the engine

Actual results:
The host starts to be added (becomes visible) in the engine, but then the engine VM pauses, resulting in a broken install on the node side.

Expected results:
The host should be added (become visible) in the engine, the VM shouldn't stop, and the host should come up and run.

Additional info:
Logs from both the engine and the node are provided as an attachment.
On vmnode4 the hosted-engine setup log ovirt-hosted-engine-setup-20140917090611-vs45gs.log shows:

2014-09-17 09:57:01 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:SEND Hosted Engine successfully set up

Before reporting that the setup was successful, it waited for the VM to be down and then destroyed it:

2014-09-17 09:57:00 DEBUG otopi.ovirt_hosted_engine_setup.tasks tasks.wait:97 Done
2014-09-17 09:57:00 DEBUG otopi.plugins.ovirt_hosted_engine_setup.ha.ha_services plugin.executeRaw:785 execute: ['/bin/vdsClient', '-s', 'localhost', 'destroy', '8411bfbb-f35c-45bf-8266-b0a6c69b3101'], executable='None', cwd='None', env=None

Then it started the HA daemons:

2014-09-17 09:57:00 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:785 execute: ('/bin/systemctl', 'enable', u'ovirt-ha-agent.service'), executable='None', cwd='None', env=None
2014-09-17 09:57:00 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:785 execute: ('/bin/systemctl', 'start', 'ovirt-ha-agent.service'), executable='None', cwd='None', env=None
2014-09-17 09:57:01 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:785 execute: ('/bin/systemctl', 'enable', u'ovirt-ha-broker.service'), executable='None', cwd='None', env=None
2014-09-17 09:57:01 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:785 execute: ('/bin/systemctl', 'start', 'ovirt-ha-broker.service'), executable='None', cwd='None', env=None

So from a setup point of view, everything went well.
The hosted-engine HA daemon successfully started the VM:

MainThread::INFO::2014-09-17 09:58:30,694::hosted_engine::877::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) Starting vm using `/usr/sbin/hosted-engine --vm-start`
MainThread::INFO::2014-09-17 09:58:30,968::hosted_engine::883::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_start_engine_vm) stdout: 8411bfbb-f35c-45bf-8266-b0a6c69b3101
    Status = WaitForLaunch
    nicModel = rtl8139,pv
    emulatedMachine = pc
    pid = 0
    devices = [{'index': '2', 'iface': 'ide', 'specParams': {}, 'readonly': 'true', 'deviceId': '8f72938c-4831-4af6-9d6d-2ec507ce5a68', 'address': {'bus': '1', 'controller': '0', 'type': 'drive', 'target': '0', 'unit': '0'}, 'device': 'cdrom', 'shared': 'false', 'path': '/Fedora-20-x86_64-netinst.iso', 'type': 'disk'}, {'index': '0', 'iface': 'virtio', 'reqsize': '0', 'format': 'raw', 'bootOrder': '1', 'address': {'slot': '0x06', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x0'}, 'volumeID': 'd2dbd275-1ba8-4018-a242-75ce5f2e5314', 'imageID': '64acee54-cffd-461e-b27e-4efa9afb350b', 'specParams': {}, 'readonly': 'false', 'domainID': 'c316df3b-2309-4196-8d43-6b54e446daed', 'deviceId': '64acee54-cffd-461e-b27e-4efa9afb350b', 'type': 'disk', 'poolID': '00000000-0000-0000-0000-000000000000', 'device': 'disk', 'shared': 'exclusive', 'propagateErrors': 'off', 'optional': 'false'}, {'device': 'scsi', 'model': 'virtio-scsi', 'type': 'controller'}, {'nicModel': 'pv', 'macAddr': '22:33:aa:bb:33:22', 'linkActive': 'true', 'network': 'ovirtmgmt', 'filter': 'vdsm-no-mac-spoofing', 'specParams': {}, 'deviceId': 'a7627cfc-60ed-4582-8cb7-dc250ec6dea9', 'address': {'slot': '0x03', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x0'}, 'device': 'bridge', 'type': 'interface'}, {'device': 'console', 'specParams': {}, 'type': 'console', 'deviceId': 'd4015181-8d3d-4ff9-8425-44b058c7ea72', 'alias': 'console0'}]
    smp = 4
    vmType = kvm
    displayIp = 0
    displaySecurePort = -1
    memSize = 8192
    displayPort = -1
    cpuType = SandyBridge
    spiceSecureChannels = smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir
    clientIp =
    vmName = HostedEngine
    display = vnc

And reported a good state:

MainThread::INFO::2014-09-17 10:17:37,868::state_machine::168::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Local (id 1): {'engine-health': {'health': 'good', 'vm': 'up', 'detail': 'up'}, 'bridge': True, 'mem-free': 61853.0, 'maintenance': False, 'cpu-load': 0.0167, 'gateway': True}

I'm not a VDSM expert, but it seems to me that the following line says the VM is up and running:

Thread-1679::DEBUG::2014-09-17 10:18:57,669::BindingXMLRPC::1139::vds::(wrapper) return vmGetStats with {'status': {'message': 'Done', 'code': 0}, 'statsList': [{'displayInfo': [{'tlsPort': '-1', 'ipAddress': '0', 'type': 'vnc', 'port': u'5900'}], 'memUsage': '0', 'acpiEnable': 'true', 'guestFQDN': '', 'pid': '22214', 'session': 'Unknown', 'vmType': 'kvm', 'timeOffset': '0', 'balloonInfo': {'balloon_max': '8388608', 'balloon_target': '0', 'balloon_cur': '8388608', 'balloon_min': '0'}, 'pauseCode': 'NOERR', 'network': {u'vnet0': {'macAddr': '22:33:aa:bb:33:22', 'rxDropped': '0', 'rxErrors': '0', 'txDropped': '0', 'txRate': '0.0', 'rxRate': '0.0', 'txErrors': '0', 'state': 'unknown', 'speed': '1000', 'name': u'vnet0'}}, 'vmJobs': {}, 'cpuUser': '1.67', 'elapsedTime': '1226', 'displayType': 'vnc', 'cpuSys': '0.33', 'appsList': [], 'vcpuCount': '4', 'clientIp': '', 'hash': '-6500834245358214803', 'vmId': '8411bfbb-f35c-45bf-8266-b0a6c69b3101', 'displayIp': '0', 'vcpuPeriod': 100000L, 'displayPort': u'5900', 'vcpuQuota': '-1', 'kvmEnable': 'true', 'disks': {u'vda': {'readLatency': '2161970', 'apparentsize': '26843545600', 'writeLatency': '3532596', 'imageID': '64acee54-cffd-461e-b27e-4efa9afb350b', 'flushLatency': '55917', 'readRate': '300793.44', 'truesize': '26843545600', 'writeRate': '154931.46'}, u'hdc': {'readLatency': '0', 'apparentsize': '0', 'writeLatency': '0', 'flushLatency': '0', 'readRate': '0.00', 'truesize': '0', 'writeRate': '0.00'}}, 'monitorResponse': '0', 'statsAge': '0.35', 'username': 'Unknown', 'status': 'Up', 'guestCPUCount': -1, 'guestIPs': '', 'displaySecurePort': '-1'}]}

So on the hosted-engine and VDSM side everything looks well. On the other end, the deployment seems to fail:

2014-09-17 09:42:59,360 INFO [org.ovirt.engine.core.bll.InstallerMessages] (VdsDeploy) Installation 192.168.1.5: Stage: Initializing
2014-09-17 09:42:59,369 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) Correlation ID: 1bbe00d9, Call Stack: null, Custom Event ID: -1, Message: Installing Host xxx-vmnode4. Stage: Initializing.
2014-09-17 09:42:59,385 INFO [org.ovirt.engine.core.bll.InstallerMessages] (VdsDeploy) Installation 192.168.1.5: Stage: Environment setup
2014-09-17 09:42:59,390 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) Correlation ID: 1bbe00d9, Call Stack: null, Custom Event ID: -1, Message: Installing Host xxx-vmnode4. Stage: Environment setup.
2014-09-17 09:42:59,400 INFO [org.ovirt.engine.core.bll.InstallerMessages] (VdsDeploy) Installation 192.168.1.5: Stage: Environment packages setup
2014-09-17 09:42:59,404 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) Correlation ID: 1bbe00d9, Call Stack: null, Custom Event ID: -1, Message: Installing Host xxx-vmnode4. Stage: Environment packages setup.
2014-09-17 09:43:01,298 INFO [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-24) Initializing Host: xxx-vmnode4
2014-09-17 09:43:13,070 INFO [org.ovirt.engine.core.bll.InstallerMessages] (VdsDeploy) Installation 192.168.1.5: Stage: Programs detection
2014-09-17 09:43:13,083 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) Correlation ID: 1bbe00d9, Call Stack: null, Custom Event ID: -1, Message: Installing Host xxx-vmnode4. Stage: Programs detection.
2014-09-17 09:43:13,097 INFO [org.ovirt.engine.core.bll.InstallerMessages] (VdsDeploy) Installation 192.168.1.5: Stage: Environment customization
2014-09-17 09:43:13,106 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) Correlation ID: 1bbe00d9, Call Stack: null, Custom Event ID: -1, Message: Installing Host xxx-vmnode4. Stage: Environment customization.
2014-09-17 09:43:17,496 INFO [org.ovirt.engine.core.bll.InstallerMessages] (VdsDeploy) Installation 192.168.1.5: Kdump supported
2014-09-17 09:43:17,503 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) Correlation ID: 1bbe00d9, Call Stack: null, Custom Event ID: -1, Message: Installing Host xxx-vmnode4. Kdump supported.
2014-09-17 09:43:17,512 INFO [org.ovirt.engine.core.bll.InstallerMessages] (VdsDeploy) Installation 192.168.1.5: Logs at host located at: '/tmp/ovirt-host-deploy-20140917094259-b7qbdo.log'
2014-09-17 09:43:17,518 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) Correlation ID: 1bbe00d9, Call Stack: null, Custom Event ID: -1, Message: Installing Host xxx-vmnode4. Logs at host located at: '/tmp/ovirt-host-deploy-20140917094259-b7qbdo.log'.
2014-09-17 09:43:17,584 INFO [org.ovirt.engine.core.bll.VdsDeploy] (VdsDeploy) Host 192.168.1.5 reports unique id 03000200-0400-0500-0006-000700080009
2014-09-17 09:43:17,605 INFO [org.ovirt.engine.core.bll.VdsDeploy] (VdsDeploy) Assigning unique id 03000200-0400-0500-0006-000700080009 to Host 192.168.1.5
2014-09-17 09:43:17,668 INFO [org.ovirt.engine.core.bll.InstallerMessages] (VdsDeploy) Installation 192.168.1.5: Disabling Kdump integration
2014-09-17 09:43:17,674 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) Correlation ID: 1bbe00d9, Call Stack: null, Custom Event ID: -1, Message: Installing Host xxx-vmnode4. Disabling Kdump integration.
2014-09-17 09:43:17,686 INFO [org.ovirt.engine.core.bll.InstallerMessages] (VdsDeploy) Installation 192.168.1.5: Stage: Setup validation
2014-09-17 09:43:17,690 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) Correlation ID: 1bbe00d9, Call Stack: null, Custom Event ID: -1, Message: Installing Host xxx-vmnode4. Stage: Setup validation.
2014-09-17 09:43:17,743 INFO [org.ovirt.engine.core.bll.InstallerMessages] (VdsDeploy) Installation 192.168.1.5: Hardware supports virtualization
2014-09-17 09:43:17,747 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) Correlation ID: 1bbe00d9, Call Stack: null, Custom Event ID: -1, Message: Installing Host xxx-vmnode4. Hardware supports virtualization.
2014-09-17 09:43:17,894 INFO [org.ovirt.engine.core.bll.InstallerMessages] (VdsDeploy) Installation 192.168.1.5: Stage: Transaction setup
2014-09-17 09:43:17,900 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) Correlation ID: 1bbe00d9, Call Stack: null, Custom Event ID: -1, Message: Installing Host xxx-vmnode4. Stage: Transaction setup.
2014-09-17 09:43:17,942 INFO [org.ovirt.engine.core.bll.InstallerMessages] (VdsDeploy) Installation 192.168.1.5: Stage: Misc configuration
2014-09-17 09:43:17,946 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) Correlation ID: 1bbe00d9, Call Stack: null, Custom Event ID: -1, Message: Installing Host xxx-vmnode4. Stage: Misc configuration.
2014-09-17 09:43:17,947 INFO [org.ovirt.engine.core.bll.InstallerMessages] (VdsDeploy) Installation 192.168.1.5: Stage: Package installation
2014-09-17 09:43:17,951 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) Correlation ID: 1bbe00d9, Call Stack: null, Custom Event ID: -1, Message: Installing Host xxx-vmnode4. Stage: Package installation.

...and then garbage in the log file.
Looks like something really bad happened, but I can't see it as a hosted-engine issue. Whatever happened, it happened around 2014-09-17 09:43:17. Looking at the logs around that timestamp.
Around that time I can see in the vdsm log file:

Thread-65::DEBUG::2014-09-17 09:43:14,514::blockSD::596::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '1+0 records in\n1+0 records out\n4096 bytes (4.1 kB) copied, 0.00544066 s, 753 kB/s\n'; <rc> = 0
GuestMonitor-HostedEngine::DEBUG::2014-09-17 09:43:18,069::vm::486::vm.Vm::(_getUserCpuTuneInfo) vmId=`8411bfbb-f35c-45bf-8266-b0a6c69b3101`::Domain Metadata is not set
GuestMonitor-HostedEngine::DEBUG::2014-09-17 09:43:18,070::vm::486::vm.Vm::(_getUserCpuTuneInfo) vmId=`8411bfbb-f35c-45bf-8266-b0a6c69b3101`::Domain Metadata is not set
GuestMonitor-HostedEngine::DEBUG::2014-09-17 09:43:18,071::vm::486::vm.Vm::(_getUserCpuTuneInfo) vmId=`8411bfbb-f35c-45bf-8266-b0a6c69b3101`::Domain Metadata is not set
MainThread::DEBUG::2014-09-17 09:43:22,119::vdsm::58::vds::(sigtermHandler) Received signal 15
MainThread::DEBUG::2014-09-17 09:43:22,120::protocoldetector::133::vds.MultiProtocolAcceptor::(stop) Stopping Acceptor
Thread-30::ERROR::2014-09-17 09:43:22,120::__init__::317::IOProcessClient::(_communicate) IOProcess failure
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line 272, in _communicate
    raise Exception("FD closed")
Exception: FD closed
MainThread::INFO::2014-09-17 09:43:22,121::__init__::563::jsonrpc.JsonRpcServer::(stop) Stopping JsonRPC Server
Detector thread::DEBUG::2014-09-17 09:43:22,121::protocoldetector::104::vds.MultiProtocolAcceptor::(_cleanup) Cleaning Acceptor
Thread-30::DEBUG::2014-09-17 09:43:22,122::__init__::159::IOProcessClient::(_run) Starting IOProcess...
MainThread::INFO::2014-09-17 09:43:22,123::vmchannels::188::vds::(stop) VM channels listener was stopped.
MainThread::INFO::2014-09-17 09:43:22,129::momIF::91::MOM::(stop) Shutting down MOM

Moving to ioprocess since I can't see anything else failing around that time.
I don't think it's related to ioprocess. Its FD was closed right after:

MainThread::DEBUG::2014-09-17 09:43:22,119::vdsm::58::vds::(sigtermHandler) Received signal 15

We need to understand what stopped/restarted vdsm. Could it be hosted-engine-setup itself?

Something else that is worrying is the repeated calls to hbaScan() every 10 seconds.
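One way to hunt for the culprit is to filter the node's syslog around the SIGTERM timestamp for service stop/start events. A sketch of that filter (my suggestion, not from the thread), run here against a hypothetical sample standing in for /var/log/messages:

```shell
# Hypothetical /var/log/messages excerpt standing in for the real file;
# the grep below is the filter to run on the node's actual syslog.
cat > /tmp/messages.sample <<'EOF'
Sep 17 09:43:22 vmnode4 systemd: Stopping Virtual Desktop Server Manager...
Sep 17 09:43:25 vmnode4 systemd: Starting Virtualization daemon...
EOF
# Show any unit being stopped or started around the failure window.
grep -nE 'Stopping|Starting' /tmp/messages.sample
```

Whichever unit shows a "Stopping" line just before 09:43:22 is the best candidate for what sent vdsmd the SIGTERM.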
Do you have hints of vdsmd restart on /var/log/vdsm/spm-lock.log or /var/log/messages?
(In reply to Dan Kenigsberg from comment #4)
> Do you have hints of vdsmd restart on /var/log/vdsm/spm-lock.log or
> /var/log/messages?

Do you need those from the node or the engine VM? I am currently not able to see those files, but I will review and post them ASAP. Thanks for your help and interest.
(In reply to Thomas Keppler from comment #5)
> Do you need those from the node or the engine VM?

From the node which had vdsmd restarted - we need to guess what caused that.
Created attachment 938756 [details]
/var/log/messages of xxx-vmnode4
(In reply to Dan Kenigsberg from comment #6)
> (In reply to Thomas Keppler from comment #5)
> > Do you need those from the node or the engine VM?
>
> From the node which had vdsmd restarted - we need to guess what caused that.

I don't have the /var/log/vdsm/spm-lock.log you requested; otherwise it would have been in the file I've already uploaded. Instead I'll provide the /var/log/messages file and the output of journalctl -o verbose (still uploading as of now). I hope you can find the information you need in there and that it helps hunt the problem down!

Best regards
Thomas
Created attachment 938757 [details]
journalctl -o verbose of xxx-vmnode4
I've reproduced the issue following http://pastebin.com/raw.php?i=sH9cfvDx so it seems reproducible. I think it may be somehow related to bug #1136368. The engine VM got paused:

# vdsClient -s 0 getVmStats b692df43-4c12-498e-b6d4-aff7e928bd85
b692df43-4c12-498e-b6d4-aff7e928bd85
    Status = Paused
    displayInfo = [{'tlsPort': '-1', 'ipAddress': '0', 'port': '5900', 'type': 'vnc'}]
    memUsage = 0
    acpiEnable = true
    guestFQDN =
    session = Unknown
    displaySecurePort = -1
    timeOffset = 0
    balloonInfo = {'balloon_max': '8388608', 'balloon_min': '0', 'balloon_target': '0', 'balloon_cur': '8388608'}
    pauseCode = NOERR
    network = {'vnet0': {'macAddr': '22:33:aa:bb:33:22', 'rxDropped': '0', 'txDropped': '0', 'rxErrors': '0', 'txRate': '0.0', 'rxRate': '0.0', 'txErrors': '0', 'state': 'unknown', 'speed': '1000', 'name': 'vnet0'}}
    vmType = kvm
    cpuUser = 0.00
    elapsedTime = 1918
    vmJobs = {}
    cpuSys = 0.00
    appsList = []
    displayType = vnc
    hash = -7643996875850563108
    pid = 4676
    displayIp = 0
    displayPort = 5900
    guestIPs =
    kvmEnable = true
    disks = {'vda': {'readLatency': '0', 'apparentsize': '21474836480', 'writeLatency': '0', 'imageID': '94f46485-ab0f-4ac8-995f-aaea87724284', 'flushLatency': '0', 'truesize': '21474836480'}, 'hdc': {'flushLatency': '0', 'readLatency': '0', 'truesize': '0', 'apparentsize': '0', 'writeLatency': '0'}}
    monitorResponse = 0
    statsAge = 0.16
    username = Unknown
    guestCPUCount = -1
    clientIp =

# vdsClient -s 0 getDeviceList
[{'GUID': '360000000000000000e00000000010001', 'capacity': '26843545600', 'devtype': 'iSCSI', 'fwrev': '0001', 'logicalblocksize': '512', 'pathlist': [{'connection': '192.168.1.108', 'initiatorname': 'default', 'iqn': 'iqn.2009-02.com.example:for.all', 'port': '3260', 'portal': '1'}], 'pathstatus': [{'lun': '1', 'physdev': 'sdb', 'state': 'active', 'type': 'iSCSI'}], 'physicalblocksize': '4096', 'productID': 'VIRTUAL-DISK', 'pvUUID': '3rQ820-EGtx-gdXL-SiYM-SgGY-ghv2-TYa18Z', 'serial': 'SIET_VIRTUAL-DISK', 'status': 'used', 'vendorID': 'IET', 'vgUUID': 'OgZx4n-phrw-3zoV-p6wv-ZsqT-2Imh-Fe5crv'}]

# vdsClient -s 0 getDevicesVisibility 360000000000000000e00000000010001
360000000000000000e00000000010001 = True

# iptables-save
# Generated by iptables-save v1.4.21 on Thu Sep 18 10:23:29 2014
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [19483:1557514]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp -m tcp --dport 54321 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
-A INPUT -p udp -m udp --dport 161 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 16514 -j ACCEPT
-A INPUT -p tcp -m multiport --dports 5900:6923 -j ACCEPT
-A INPUT -p tcp -m multiport --dports 49152:49216 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -m physdev ! --physdev-is-bridged -j REJECT --reject-with icmp-host-prohibited
COMMIT
# Completed on Thu Sep 18 10:23:29 2014

# vdsClient -s 0 getStorageDomainsList
be03d311-751b-4623-82a0-f3113f542043

# vdsClient -s 0 getStorageDomainStats be03d311-751b-4623-82a0-f3113f542043
    disktotal = 26440892416 (24.0GB)
    diskfree = 536870912 (512.0MB)

# vdsClient -s 0 getStorageDomainInfo be03d311-751b-4623-82a0-f3113f542043
    uuid = be03d311-751b-4623-82a0-f3113f542043
    vguuid = OgZx4n-phrw-3zoV-p6wv-ZsqT-2Imh-Fe5crv
    state = OK
    version = 3
    role = Master
    type = ISCSI
    class = Data
    pool = ['ffcdbc2f-0fa2-4991-8b6d-94050a1954e7']
    name = hosted_storage

# vdsClient -s 0 getVolumesList be03d311-751b-4623-82a0-f3113f542043 0
31ccf8c8-0547-4b98-8e8c-4109f3c345f8 : Hosted Engine Image.
d602fc34-e4b4-498a-b2df-84de80e68ec5 : hosted-engine.lockspace.
a1b1c513-9367-4092-a5c3-3744395a279b : hosted-engine.metadata.
# vdsClient -s 0 getVolumeInfo be03d311-751b-4623-82a0-f3113f542043 0 0 31ccf8c8-0547-4b98-8e8c-4109f3c345f8
    status = OK
    domain = be03d311-751b-4623-82a0-f3113f542043
    capacity = 21474836480
    voltype = LEAF
    description = Hosted Engine Image
    parent = 00000000-0000-0000-0000-000000000000
    format = RAW
    image = 94f46485-ab0f-4ac8-995f-aaea87724284
    uuid = 31ccf8c8-0547-4b98-8e8c-4109f3c345f8
    disktype = 2
    legality = LEGAL
    mtime = 0
    apparentsize = 21474836480
    truesize = 21474836480
    type = PREALLOCATED
    children = []
    pool =
    ctime = 1411024926
The vdsm log shows:

Thread-15::ERROR::2014-09-18 10:29:17,084::sampling::475::vm.Vm::(collect) vmId=`b692df43-4c12-498e-b6d4-aff7e928bd85`::Stats function failed: <AdvancedStatsFunction _sampleCpu at 0x3134750>
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/sampling.py", line 471, in collect
    statsFunction()
  File "/usr/share/vdsm/virt/sampling.py", line 346, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/virt/vm.py", line 313, in _sampleCpu
    cpuStats = self._vm._dom.getCPUStats(True, 0)
  File "/usr/share/vdsm/virt/vm.py", line 670, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2016, in getCPUStats
    if ret is None: raise libvirtError ('virDomainGetCPUStats() failed', dom=self)
libvirtError: Requested operation is not valid: cgroup CPUACCT controller is not mounted

Thread-15::DEBUG::2014-09-18 10:29:17,088::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 55 edom: 10 level: 2 message: Requested operation is not valid: cgroup CPU controller is not mounted
Thread-15::ERROR::2014-09-18 10:29:17,088::sampling::475::vm.Vm::(collect) vmId=`b692df43-4c12-498e-b6d4-aff7e928bd85`::Stats function failed: <AdvancedStatsFunction _sampleCpuTune at 0x312bd18>
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/sampling.py", line 471, in collect
    statsFunction()
  File "/usr/share/vdsm/virt/sampling.py", line 346, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/virt/vm.py", line 359, in _sampleCpuTune
    infos = self._vm._dom.schedulerParameters()
  File "/usr/share/vdsm/virt/vm.py", line 670, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2134, in schedulerParameters
    if ret is None: raise libvirtError ('virDomainGetSchedulerParameters() failed', dom=self)
libvirtError: Requested operation is not valid: cgroup CPU controller is not mounted
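The error means libvirt could not find a mounted cpuacct cgroup controller when sampling the domain. A quick, hedged check (my sketch, assuming the cgroup v1 layout CentOS 7 uses) is to ask the kernel which controllers exist and whether any cpuacct hierarchy is mounted:

```shell
# List the cpu-related controllers the kernel knows about; on a healthy
# cgroup v1 host each enabled one is also mounted under /sys/fs/cgroup.
awk 'NR == 1 || $1 ~ /^cpu/' /proc/cgroups 2>/dev/null || echo "no /proc/cgroups"
# Check whether any cgroup filesystem carrying the cpuacct option is mounted.
grep cpuacct /proc/mounts || echo "cpuacct controller not mounted"
```

If the second command prints the fallback message on the failing host, that matches the libvirtError exactly.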
Need to clear up the cgroups issue first anyway, then re-determine if there's anything else to be done. Moving to SLA to handle the above.
Created attachment 938809 [details] English description of my steps until the failure happened Added my steps to failure for future reference.
Not sure if it helps, but:

# mount|grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,seclabel,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
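Since the mount table looks sane, the cpu,cpuacct hierarchy is mounted system-wide; the next hedged check (my suggestion, not from the thread) is whether the qemu process itself still sits in a cpuacct hierarchy after the libvirtd restart. $$ below is a stand-in PID for illustration only:

```shell
# Show which cgroup hierarchies a process belongs to. For the HostedEngine VM,
# replace $$ (this shell, used as a runnable stand-in) with the qemu-kvm PID,
# e.g. the one obtained from `pgrep -f qemu-kvm`.
grep -E 'cpuacct|::' /proc/$$/cgroup || echo "no cpuacct entry for this process"
```

A process whose /proc/PID/cgroup lacks a cpu,cpuacct line despite the mount above would explain libvirt's "controller is not mounted" complaint for that domain.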
Tried to reproduce using a CentOS 6.5 host and a CentOS 6.5 VM; the issue is not reproducible there. Now going to try a Fedora 20 host and a Fedora 20 VM; I fear it's a CentOS 7-only issue.
I reproduced with a Fedora 20 host and a Fedora 20 VM. For a reason I don't understand yet, on Fedora 20 I can see in the manager UI that libvirt has been stopped by host-deploy:

2014-Sep-18, 17:05 Installing Host hosted_engine_1. Stopping libvirtd.
2014-Sep-18, 17:05 Installing Host hosted_engine_1. Stage: Closing up.
2014-Sep-18, 17:05 Installing Host hosted_engine_1. Stage: Transaction commit.
2014-Sep-18, 17:05 Installing Host hosted_engine_1. Enrolling certificate.
2014-Sep-18, 17:05 Installing Host hosted_engine_1. Setting time.
2014-Sep-18, 17:05 Installing Host hosted_engine_1. Stage: Misc configuration.
2014-Sep-18, 17:05 Installing Host hosted_engine_1. Yum Verify: 10/10: python-setuptools.noarch 0:1.4.2-1.fc20 - u.
...
...
2014-Sep-18, 17:05 Installing Host hosted_engine_1. Hardware supports virtualization.
2014-Sep-18, 17:05 Host hosted_engine_1 installation in progress . Cannot validate host name settings, reason: cannot resolve own name 'mididell.home'.
2014-Sep-18, 17:05 Installing Host hosted_engine_1. Stage: Setup validation.
2014-Sep-18, 17:05 Installing Host hosted_engine_1. Disabling Kdump integration.
2014-Sep-18, 17:05 Installing Host hosted_engine_1. Logs at host located at: '/tmp/ovirt-host-deploy-20140918170506-bmvcub.log'.
2014-Sep-18, 17:05 Installing Host hosted_engine_1. Kdump unsupported.
...
2014-Sep-18, 17:05 Installing Host hosted_engine_1. Stage: Initializing.
2014-Sep-18, 17:05 Installing Host hosted_engine_1. Connected to host 192.168.1.106 with SSH key fingerprint: 97:dc:00:97:de:42:d3:d4:e8:e3:b8:01:f5:04:d8:9b.

This may explain why everything stops working later on. I'm sure this didn't happen in 3.4 and in the 3.5 beta.
For Martin, here is the xml dump:

<domain type='kvm' id='3'>
  <name>HostedEngine</name>
  <uuid>7f9cd40e-252e-4bed-8247-5e2221fe9807</uuid>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <vcpu placement='static'>4</vcpu>
  <cputune>
    <shares>1020</shares>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <sysinfo type='smbios'>
    <system>
      <entry name='manufacturer'>oVirt</entry>
      <entry name='product'>oVirt Node</entry>
      <entry name='version'>20-3</entry>
      <entry name='serial'>4C4C4544-0039-4610-8039-B4C04F563032_f8:b1:56:c8:96:1f</entry>
      <entry name='uuid'>7f9cd40e-252e-4bed-8247-5e2221fe9807</entry>
    </system>
  </sysinfo>
  <os>
    <type arch='x86_64' machine='pc-i440fx-1.6'>hvm</type>
    <smbios mode='sysinfo'/>
  </os>
  <features>
    <acpi/>
  </features>
  <cpu mode='custom' match='exact'>
    <model fallback='allow'>Haswell</model>
  </cpu>
  <clock offset='variable' adjustment='0' basis='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>destroy</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-kvm</emulator>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/var/tmp/Fedora-20-x86_64-netinst.iso' startupPolicy='optional'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <serial></serial>
      <alias name='ide0-1-0'/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
      <source dev='/var/run/vdsm/storage/442b0aca-7a4c-45bd-9d54-f41c05571bb1/a65bb620-5324-47b1-8832-44866f8fa22d/5e3f5725-156d-4da1-a0f7-f00846c93d58'/>
      <target dev='vda' bus='virtio'/>
      <serial>a65bb620-5324-47b1-8832-44866f8fa22d</serial>
      <boot order='1'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>
    <controller type='scsi' index='0' model='virtio-scsi'>
      <alias name='scsi0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <controller type='usb' index='0'>
      <alias name='usb0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='ide' index='0'>
      <alias name='ide0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </controller>
    <lease>
      <lockspace>442b0aca-7a4c-45bd-9d54-f41c05571bb1</lockspace>
      <key>5e3f5725-156d-4da1-a0f7-f00846c93d58</key>
      <target path='/dev/442b0aca-7a4c-45bd-9d54-f41c05571bb1/leases' offset='111149056'/>
    </lease>
    <interface type='bridge'>
      <mac address='00:16:3e:72:62:3f'/>
      <source bridge='ovirtmgmt'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <filterref filter='vdsm-no-mac-spoofing'/>
      <link state='up'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <console type='pty' tty='/dev/pts/2'>
      <source path='/dev/pts/2'/>
      <target type='virtio' port='0'/>
      <alias name='console0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channels/7f9cd40e-252e-4bed-8247-5e2221fe9807.com.redhat.rhevm.vdsm'/>
      <target type='virtio' name='com.redhat.rhevm.vdsm'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channels/7f9cd40e-252e-4bed-8247-5e2221fe9807.org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <alias name='channel1'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='5900' autoport='yes' listen='0' passwdValidTo='2014-09-18T17:34:15'>
      <listen type='address' address='0'/>
    </graphics>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='none'>
      <alias name='balloon0'/>
    </memballoon>
  </devices>
  <seclabel type='dynamic' model='selinux' relabel='yes'>
    <label>system_u:system_r:svirt_t:s0:c379,c386</label>
    <imagelabel>system_u:object_r:svirt_image_t:s0:c379,c386</imagelabel>
  </seclabel>
</domain>
How:
"""
This may explain why everything stops working later on.
I'm sure this didn't happen in 3.4 and in the 3.5 beta.
"""

What explains? What is the explanation?
(In reply to Alon Bar-Lev from comment #17)
> What explains? What is the explanation?

Restarting libvirt causes the iSCSI device to become unreachable, vdsm to receive SIGTERM, and the hosted-engine VM to be paused. The host-deploy is left incomplete since the VM running the engine is paused due to the above cascade of events.
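If that cascade is right, a snapshot of the iSCSI sessions taken before and after host-deploy restarts libvirtd should show the session dropping. A sketch (my suggestion; iscsiadm comes from iscsi-initiator-utils and may not be installed everywhere):

```shell
# Record active iSCSI sessions; comparing snapshots taken before and after
# the libvirtd/vdsmd restart shows whether the session to the LUN dropped.
if command -v iscsiadm >/dev/null 2>&1; then
    iscsiadm -m session 2>&1
else
    echo "iscsiadm not installed on this machine"
fi
```

A session to 192.168.1.108:3260 present in the "before" snapshot and missing in the "after" one would confirm the first step of the cascade.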
host-deploy stops vdsmd, stops libvirtd, then starts libvirtd (if no systemd) and starts vdsmd. vdsm is down during the libvirtd restart.

please explain again how it is related to host-deploy. if libvirt does not survive a restart due to invalid configuration, it is something else.
I tried it again with firewalld removed, both on the host machine and in the engine VM. I'll provide some logs later, but for now it seems to be OK. I will also try to add another host to this configuration later and see what I can do...

A (really bad) workaround for this issue would be to be really quick: forcefully power the VM down (hosted-engine --vm-poweroff) and start it back up (hosted-engine --vm-start). After that (if you are lucky) the host is "successfully" added; the hosted-engine --deploy script doesn't notice that, though, so you have to wait for the timeout in order to go on with the deploy process. All in all, it sort of works, but this is far from stable or good.
(In reply to Alon Bar-Lev from comment #19)
> host-deploy stops vdsmd, stops libvirtd then start libvirtd (if no systemd)
> and start vdsm.
>
> vdsm is down during libvirtd restart.
>
> please explain again how it is related to host-deploy.
>
> if libvirt does not survive restart due to invalid configuration it is
> something else.

Alon, this is hosted engine. libvirt and vdsmd are configured and running because they're needed for creating the VM where the engine runs. host-deploy is executed when the host running the hosted engine is added to the engine itself.

Is host-deploy restarting libvirt because both firewalld and iptables are installed on the host? Looking at comment #20 it seems that removing firewalld before deploying the host solves the issue. If so, I can add a Conflicts to the hosted-engine spec file.

BTW, I don't think that stopping vdsm / libvirt on a host already running any VM managed by VDSM (not only the hosted-engine one) is the right thing to do. Also, looking at the sequence in comment #16, vdsm is not stopped before stopping libvirt: stopping libvirt causes a SIGTERM in vdsmd, and that is also something I don't think host-deploy should do.

Reducing severity from urgent to high since a workaround (removing firewalld on the host) seems to be enough for having the system running.
I'm sorry if I didn't make that clear before, but no, it *doesn't help*. Same result.
It is not new that vdsm and libvirt are restarted. Please reassign to yourself until you figure out what's wrong.
Alon, I'm reassigning this to myself, but nothing changed in hosted-engine setup between 3.4 and 3.5 that could explain this issue, and I still think it's wrong to restart vdsm and libvirt on a system where a VM relying on those two daemons is already running. Anyway, any help finding the root cause from the 20 people in CC on this bug is welcome.
It may be wrong, but this is what ensures that after an update of these components the latest version is used. Also, if you look at their spec files, you can see that they restart themselves post-upgrade anyway. Nothing in host-deploy was modified in this regard since 3.2; you can see the git log. The entire solution of hosted-engine in this regard is faulty anyway... not that we did not know. Restarting libvirt should not affect running VMs as far as I know, so I still think the root cause has not been found. ovirt-host-deploy behaves naturally given the default component behavior above, which we do not control.
(In reply to Alon Bar-Lev from comment #25)
> the entire solution of hosted-engine in this regard is faulty anyway... not
> that we did not know.

Please open a separate bug describing why it is faulty and how it should work. We can consider using a different implementation in a future release.
Update: using NFS seems to work on CentOS 7/FC20! ...but this doesn't fix the problem.
Moving to integration for further investigation. Please needinfo me in case it turns out to be an infra issue.
I can confirm sbonazzo's result that with a CentOS 6.5 host + CentOS 6.5 engine (nothing else changed), this setup works perfectly.
Please check audit.log for any SELinux warnings. Can you try deploying with SELinux in permissive mode (run setenforce 0 before you start the deploy)?
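A concrete way to do that check: grep the audit log for AVC denial records. The AVC line in the sample file below is made up for illustration, not taken from this bug's attachments; on the real host you would grep /var/log/audit/audit.log directly:

```shell
#!/bin/sh
# Write a sample audit log; this AVC record is illustrative only, not
# taken from this bug's attachments.
cat > /tmp/audit-sample.log <<'EOF'
type=AVC msg=audit(1410940620.123:456): avc:  denied  { read } for  pid=1234 comm="qemu-kvm" path="/dev/dm-7" scontext=system_u:system_r:svirt_t:s0 tcontext=system_u:object_r:fixed_disk_device_t:s0 tclass=blk_file
type=SYSCALL msg=audit(1410940620.123:456): arch=c000003e syscall=0
EOF

# Look for SELinux denials (on the real host: /var/log/audit/audit.log).
grep 'avc:  denied' /tmp/audit-sample.log

# To deploy in permissive mode, run (as root) before starting the deploy:
#   setenforce 0
```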
I have already reinstalled CentOS 6.5 on those servers and currently don't have the time to debug this fault. If you need any files, talk to Sandro Bonazzola, as he will have the same issues on his testing setup. I will, of course, provide those files if I can in the future, but I'm redirecting this to Mr. Bonazzola for the reason stated above.

(In reply to Jiri Moskovcak from comment #30)
> Please check the audit.log if there any selinux warning. Can you try to
> deploy it with selinux in permissive mode (running setenforce 0 before you
> start the deploy)
Dropping the Regression keyword, since the NFS setup available in previous releases is not affected. Jiri, I'll try to reproduce again and will check SELinux. But if you're thinking about the issue you had with tgtd, this is not the same issue.
Created attachment 941013 [details] audit log
Created attachment 941018 [details] ovirt-host-deploy logs
(In reply to Sandro Bonazzola from comment #34)
> Created attachment 941018 [details]
> ovirt-host-deploy logs

Job for vdsmd.service failed. See 'systemctl status vdsmd.service' and 'journalctl -xn' for details.

?
I've been able to perform the whole setup on EL7 with SELinux in permissive mode. As Jiri suspected, it is an SELinux issue. Now trying to work out the needed rules. The workaround for now is to set SELinux to permissive mode.
Dan, since this seems to be something that may impact vdsm in general, and since you already have SELinux dependencies in vdsm, maybe it's better to bump the SELinux requirements in vdsm once a new release fixing the issue is available. Moving this to the vdsm component. I can take care of bumping the requirements if you don't want to do it yourself.
Nir, does this look related to bug 1127460 ?
VDSM provided a workaround for lvm selinux rules, currently under verification.
(In reply to Sandro Bonazzola from comment #39)
> VDSM provided a workaround for lvm selinux rules, currently under
> verification.

Sandro, do you have any reason to believe that this issue is related to the SELinux label on lvm devices? That issue is relevant when you have a thin-provisioned disk and the disk becomes full: vdsm detects this, extends the disk and refreshes the lv. Refreshing the lv causes it to lose the SELinux label (a systemd bug). Here I think you are working with a preallocated disk, so this does not seem to be the issue.

On xxx-vmnode4, we can see 3 refreshes during bootstrap - these refresh only the special lvs, which are not accessed by libvirt/qemu and do not have the libvirt image SELinux label:

storageRefresh::INFO::2014-09-17 09:01:36,503::lvm::668::Storage.LVM::(bootstrap) Refreshing lvs: vg=c316df3b-2309-4196-8d43-6b54e446daed lvs=['ids', 'master', 'metadata', 'outbox', 'inbox', 'leases']
storageRefresh::DEBUG::2014-09-17 09:01:36,505::lvm::288::Storage.Misc.excCmd::(cmd) /usr/bin/sudo -n /usr/sbin/lvm lvchange --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ '\''a|/dev/mapper/36090a058f04424900541b5000000904b|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --refresh c316df3b-2309-4196-8d43-6b54e446daed/ids c316df3b-2309-4196-8d43-6b54e446daed/master c316df3b-2309-4196-8d43-6b54e446daed/metadata c316df3b-2309-4196-8d43-6b54e446daed/outbox c316df3b-2309-4196-8d43-6b54e446daed/inbox c316df3b-2309-4196-8d43-6b54e446daed/leases (cwd None)
storageRefresh::INFO::2014-09-17 09:07:00,218::lvm::668::Storage.LVM::(bootstrap) Refreshing lvs: vg=c316df3b-2309-4196-8d43-6b54e446daed lvs=['ids', 'master', 'metadata', 'outbox', 'inbox', 'leases']
storageRefresh::DEBUG::2014-09-17 09:07:00,219::lvm::288::Storage.Misc.excCmd::(cmd) /usr/bin/sudo -n /usr/sbin/lvm lvchange --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ '\''a|/dev/mapper/36090a058f04424900541b5000000904b|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --refresh c316df3b-2309-4196-8d43-6b54e446daed/ids c316df3b-2309-4196-8d43-6b54e446daed/master c316df3b-2309-4196-8d43-6b54e446daed/metadata c316df3b-2309-4196-8d43-6b54e446daed/outbox c316df3b-2309-4196-8d43-6b54e446daed/inbox c316df3b-2309-4196-8d43-6b54e446daed/leases (cwd None)
storageRefresh::INFO::2014-09-17 09:43:41,176::lvm::668::Storage.LVM::(bootstrap) Refreshing lvs: vg=c316df3b-2309-4196-8d43-6b54e446daed lvs=['outbox', 'master', 'inbox', 'metadata', 'ids', 'leases']
storageRefresh::DEBUG::2014-09-17 09:43:41,178::lvm::288::Storage.Misc.excCmd::(cmd) /usr/bin/sudo -n /usr/sbin/lvm lvchange --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ '\''a|/dev/mapper/36090a058f04424900541b5000000904b|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --refresh c316df3b-2309-4196-8d43-6b54e446daed/outbox c316df3b-2309-4196-8d43-6b54e446daed/master c316df3b-2309-4196-8d43-6b54e446daed/inbox c316df3b-2309-4196-8d43-6b54e446daed/metadata c316df3b-2309-4196-8d43-6b54e446daed/ids c316df3b-2309-4196-8d43-6b54e446daed/leases (cwd None)

I don't see any other vdsm log that shows a related error, so I don't see any relation to bug 1127460.
Nir, are you sure those are unrelated? I've just verified on EL7 that using http://gerrit.ovirt.org/#/c/33492/4/vdsm/storage/vdsm-lvm.rules.in as suggested yesterday by Dan allows completing the deploy without reproducing this bug.
(In reply to Sandro Bonazzola from comment #41)
> Nir are you sure those are unrelated?

I'm sure there is no evidence in this bug that they are. Can you verify that with the same version but without the patch, the deployment fails?

To verify that this was indeed the issue: when you run without the patch and the VM pauses, find the block device backing the hosted-engine lv and check its SELinux label. You can tell the device using lvs, then find the block device using:

ls -l /dev/vgname/lvname
/dev/vgname/lvname -> /dev/dm-xxx

And check the SELinux label with:

ls -Z /dev/dm-xxx

It should be svirt_image_t:s0,cxxx,cyyy. If you see fixed_disk_device_t:s0, then you indeed have the same issue.
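The check above can be sketched as a small script. Only the two type names (svirt_image_t, fixed_disk_device_t) come from this comment; the classify_label helper and the sample contexts are illustrative:

```shell
#!/bin/sh
# Classify the SELinux context on the device backing the hosted-engine lv:
# the libvirt image label means all is well, the default disk label means
# the label was lost (the suspected issue).
classify_label() {
    case "$1" in
        *svirt_image_t*)       echo "ok: libvirt image label" ;;
        *fixed_disk_device_t*) echo "bug: default disk label, same issue" ;;
        *)                     echo "unexpected label: $1" ;;
    esac
}

# On the real host you would do something like:
#   dev=$(readlink -f /dev/vgname/lvname)   # resolves to /dev/dm-xxx
#   classify_label "$(ls -Z "$dev")"
# Illustrative sample contexts:
classify_label "system_u:object_r:svirt_image_t:s0:c12,c34"
classify_label "system_u:object_r:fixed_disk_device_t:s0"
```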
I tried to reproduce by taking the latest tag and removing your patches from the code, but I can't reproduce. So maybe you're right that it's not the same issue. On the other hand, it seems to be fixed anyway in the latest vdsm. Pushing a new hosted-engine-setup package requiring the latest vdsm.
oVirt 3.5 has been released and should include the fix for this issue.