Bug 1461429 - PPC - Failed to run VM - VM is down with error. Exit message: internal error: process exited while connecting to monitor: 2017-06-11T16:33:41.103745Z qemu-kvm
Status: CLOSED NEXTRELEASE
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt
Version: 4.1.3.2
Hardware: Unspecified  OS: Unspecified
Priority: unspecified  Severity: medium
Assigned To: Michal Skrivanek
QA Contact: meital avital
Depends On:
Blocks:
Reported: 2017-06-14 08:34 EDT by Avihai
Modified: 2017-07-11 01:54 EDT
CC List: 6 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-07-11 01:54:59 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
engine , vdsm and server logs (1.55 MB, application/x-gzip)
2017-06-14 08:34 EDT, Avihai
relevant_host_vdsm log (470.96 KB, application/x-xz)
2017-06-16 03:10 EDT, Avihai
ibm-p8-rhevm-03_vdsm_log (470.96 KB, application/x-xz)
2017-06-17 09:50 EDT, Avihai
ibm-p8-rhevm-04_vdsm_log (612.71 KB, application/x-xz)
2017-06-17 09:51 EDT, Avihai
New_reproduction_17_2017_logs (1.18 MB, application/x-gzip)
2017-06-17 10:30 EDT, Avihai
rpm -qa | sort (24.15 KB, text/plain)
2017-06-20 06:06 EDT, Michal Skrivanek
rpm -qa | sort (24.80 KB, text/plain)
2017-06-20 06:06 EDT, Michal Skrivanek

Description Avihai 2017-06-14 08:34:10 EDT
Created attachment 1287644 [details]
engine , vdsm and server logs

Description of problem:
Starting a VM failed with:
VM vm_TestCase5062_REST_ISCSI_1119330305 is  down with error. Exit message: internal error: process exited while connecting to monitor: 2017-06-11T16:33:41.103745Z qemu-kvm: -drive file=/rhev/data-center/c546b026-4632-4450-8e9b-17dc879b62a4/21e4752c-eabd-4b30-abe7-a5b9322adb1a/images/87a080bc-60af-405b-aa2a-a6cc722bcc27/ddee92d7-eb1c-4d16-ba23-c8e2b984d515,format=qcow2,if=none,id=drive-virtio-disk1,serial=87a080bc-60af-405b-aa2a-a6cc722bcc27,cache=none,werror=stop,rerror=stop,aio=native: Could not open '/rhev/data-center/c546b026-4632-4450-8e9b-17dc879b62a4/21e4752c-eabd-4b30-abe7-a5b9322adb1a/images/87a080bc-60af-405b-aa2a-a6cc722bcc27/ddee92d7-eb1c-4d16-ba23-c8e2b984d515': Permission denied


Version-Release number of selected component (if applicable):
Engine:
ovirt-engine-4.1.3.2-0.1.el7.noarch


VDSM:
vdsm-4.19.18-1.el7ev.ppc64le

How reproducible:
Happened once so far in Tier1 PPC test runs; it did not reproduce locally on a non-PPC environment.


Steps to Reproduce:
Automation TestCase5062 (a rough REST sketch of the same flow is shown after the steps)
1. Clone a VM from a template (cloning vm vm_TestCase5062_REST_ISCSI_1119330305)
2. Start the VM (2017-06-11 19:33:38)
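For reference, a rough sketch of the same flow against the oVirt REST API (the engine URL, credentials, cluster and template names below are placeholders, not values taken from this run):

ENGINE=https://engine.example.com/ovirt-engine/api
AUTH='admin@internal:password'

# 1. Clone a new VM from a template (POST /vms)
curl -k -u "$AUTH" -H 'Content-Type: application/xml' -X POST "$ENGINE/vms" -d '
<vm>
  <name>vm_TestCase5062_clone</name>
  <cluster><name>my_cluster</name></cluster>
  <template><name>my_template</name></template>
</vm>'

# 2. Start the VM (POST /vms/{id}/start), replacing VM_ID with the id returned above
curl -k -u "$AUTH" -H 'Content-Type: application/xml' -X POST "$ENGINE/vms/VM_ID/start" -d '<action/>'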

Actual results:
The VM did not start and the error message above is seen in the engine log.

Expected results:
The VM should start.


Additional info:
The error appears only in the engine log; the vdsm/libvirt logs do not show an error at this time.
 
Engine log:
2017-06-11 19:33:43,568+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-7) [] EVENT_ID: VM_DOWN_ERROR(119), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm
_TestCase5062_REST_ISCSI_1119330305 is down with error. Exit message: internal error: process exited while connecting to monitor: 2017-06-11T16:33:41.103745Z qemu-kvm: -drive file=/rhev/data-center/c546b026-4632-4450-8e9b-17dc879b62a4/21e
4752c-eabd-4b30-abe7-a5b9322adb1a/images/87a080bc-60af-405b-aa2a-a6cc722bcc27/ddee92d7-eb1c-4d16-ba23-c8e2b984d515,format=qcow2,if=none,id=drive-virtio-disk1,serial=87a080bc-60af-405b-aa2a-a6cc722bcc27,cache=none,werror=stop,rerror=stop,a
io=native: Could not open '/rhev/data-center/c546b026-4632-4450-8e9b-17dc879b62a4/21e4752c-eabd-4b30-abe7-a5b9322adb1a/images/87a080bc-60af-405b-aa2a-a6cc722bcc27/ddee92d7-eb1c-4d16-ba23-c8e2b984d515': Permission denied.
2017-06-11 19:33:43,569+03 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-7) [] add VM '4057c03f-3a04-498c-8a33-53608327b085'(vm_TestCase5062_REST_ISCSI_1119330305) to rerun treatment
2017-06-11 19:33:43,572+03 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (ForkJoinPool-1-worker-7) [] Rerun VM '4057c03f-3a04-498c-8a33-53608327b085'. Called from VDS 'host_mixed_1'
2017-06-11 19:33:43,578+03 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-26) [] EVENT_ID: USER_INITIATED_RUN_VM_FAILED(151), Correlation ID: vms_syncAction_16b4f525-e8ee-42b5,
 Job ID: 914a57b0-be17-4135-997e-2442501d5fce, Call Stack: null, Custom Event ID: -1, Message: Failed to run VM vm_TestCase5062_REST_ISCSI_1119330305 on Host host_mixed_1.


The VDSM log does not show any error at this time:
[root@ibm-p8-rhevm-03 vdsm]# less vdsm.log.11.xz

2017-06-11 19:33:38,777-0400 INFO  (periodic/1) [vdsm.api] FINISH repoStats return={u'21e4752c-eabd-4b30-abe7-a5b9322adb1a': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000672249', 'lastCheck': '2.0', 'valid': 
True}, u'a1c8a1f5-b7a1-4299-91b7-782b20c52d60': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000594578', 'lastCheck': '1.9', 'valid': True}, u'4e986561-0848-40b0-9bb5-bf3c27d53d39': {'code': 0, 'actual': True, '
version': 4, 'acquired': True, 'delay': '0.0292774', 'lastCheck': '2.0', 'valid': True}, u'eccfa045-b125-4cec-a0a5-e40988a24cd7': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000329248', 'lastCheck': '3.0', 'val
id': True}, u'5efcc54c-e195-41f6-ab60-38ff0037141a': {'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay': '0.000361319', 'lastCheck': '3.0', 'valid': True}, u'ee93477d-c1da-4979-9e20-51ae86c67e35': {'code': 0, 'actual': Tr
ue, 'version': 4, 'acquired': True, 'delay': '0.000388562', 'lastCheck': '3.1', 'valid': True}, u'25f5a19e-0997-4139-8813-bd25ae578d4d': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000335127', 'lastCheck': '3.0
', 'valid': True}} from=internal (api:52)
2017-06-11 19:33:38,778-0400 DEBUG (periodic/1) [storage.TaskManager.Task] (Task='c0c07188-6c98-4821-ab02-e189e3135822') finished: {u'21e4752c-eabd-4b30-abe7-a5b9322adb1a': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'dela
y': '0.000672249', 'lastCheck': '2.0', 'valid': True}, u'a1c8a1f5-b7a1-4299-91b7-782b20c52d60': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000594578', 'lastCheck': '1.9', 'valid': True}, u'4e986561-0848-40b0-9
bb5-bf3c27d53d39': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.0292774', 'lastCheck': '2.0', 'valid': True}, u'eccfa045-b125-4cec-a0a5-e40988a24cd7': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000329248', 'lastCheck': '3.0', 'valid': True}, u'5efcc54c-e195-41f6-ab60-38ff0037141a': {'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay': '0.000361319', 'lastCheck': '3.0', 'valid': True}, u'ee93477d-c1da-4979-9e20-51ae86c67e35': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000388562', 'lastCheck': '3.1', 'valid': True}, u'25f5a19e-0997-4139-8813-bd25ae578d4d': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.000335127', 'lastCheck': '3.0', 'valid': True}} (task:1195)
2017-06-11 19:33:38,778-0400 DEBUG (periodic/1) [storage.TaskManager.Task] (Task='c0c07188-6c98-4821-ab02-e189e3135822') moving from state preparing -> state finished (task:599)
2017-06-11 19:33:38,778-0400 DEBUG (periodic/1) [storage.ResourceManager.Owner] Owner.releaseAll requests {} resources {} (resourceManager:910)
2017-06-11 19:33:38,778-0400 DEBUG (periodic/1) [storage.ResourceManager.Owner] Owner.cancelAll requests {} (resourceManager:947)
2017-06-11 19:33:38,778-0400 DEBUG (periodic/1) [storage.TaskManager.Task] (Task='c0c07188-6c98-4821-ab02-e189e3135822') ref 0 aborting False (task:997)

In the server log, 1 minute before this happened, I see:
2017-06-11 19:32:43,469+03 INFO  [org.quartz.core.JobRunShell] (DefaultQuartzScheduler10) Job DEFAULT.org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher.poll#-9223372036854775799 threw a JobExecutionException: : org.quartz.JobExecutionException: failed to execute job
        at sun.reflect.GeneratedMethodAccessor188.invoke(Unknown Source) [:1.8.0_131]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_131]
        at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_131]
        at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:83) [scheduler.jar:]
        at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:55) [scheduler.jar:]
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_131]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_131]
        at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_131]


Jenkins timeline (automation log):
2017-06-11 19:33:03,874 - MainThread - vms - INFO - Using Correlation-Id: vms_create_bdaaa938-5587-40bf
2017-06-11 19:33:05,593 - MainThread - vms - INFO - New entity was added
2017-06-11 19:33:05,602 - MainThread - vms - INFO - Property 'vm->virtio_scsi->enabled' has correct value: true
2017-06-11 19:33:05,603 - MainThread - vms - INFO - Property 'vm->name' has correct value: vm_TestCase5062_REST_ISCSI_1119330305
2017-06-11 19:33:05,603 - MainThread - vms - INFO - Property 'vm->display->type_' has correct value: vnc
2017-06-11 19:33:05,604 - MainThread - vms - INFO - Property 'vm->cluster->id' has correct value: e95581eb-2155-4001-9522-0a84375ff047
2017-06-11 19:33:05,607 - MainThread - vms - INFO - Property 'vm->type_' has correct value: desktop
2017-06-11 19:33:05,609 - MainThread - vms - INFO - Property 'vm->template->id' has correct value: 03e56fce-8c43-48b0-817b-510cf3f23034
2017-06-11 19:33:05,610 - MainThread - vms - INFO - Property 'vm->os->type_' has correct value: rhel_7_ppc64
2017-06-11 19:33:36,040 - MainThread - vms - INFO - vm status is 'down'
2017-06-11 19:33:36,602 - MainThread - diskattachments - INFO - Using Correlation-Id: diskattachments_update_744ef35a-7e25-484f
2017-06-11 19:33:38,170 - MainThread - diskattachments - INFO - disk_attachment was updated
2017-06-11 19:33:38,171 - MainThread - diskattachments - INFO - Property 'disk_attachment->interface' has correct value: virtio
2017-06-11 19:33:38,171 - MainThread - diskattachments - INFO - Property 'disk_attachment->bootable' has correct value: true
2017-06-11 19:33:38,176 - MainThread - root - INFO - Get SPM host from the list of hosts host_mixed_1,host_mixed_2
2017-06-11 19:33:38,180 - MainThread - root - INFO - Checking Storage Pool Manager (SPM) status of the host host_mixed_1.
2017-06-11 19:33:38,184 - MainThread - root - INFO - Get host object by host_name host_mixed_1. with {'attribute': 'name'}
2017-06-11 19:33:38,260 - MainThread - art.ll_lib.hosts - INFO - check_host_spm_status - SPM Status of host host_mixed_1 is: none
2017-06-11 19:33:38,260 - MainThread - root - ERROR - Failed to checking storage pool manager (spm) status of the host host_mixed_1.
2017-06-11 19:33:38,265 - MainThread - root - INFO - Checking Storage Pool Manager (SPM) status of the host host_mixed_2.
2017-06-11 19:33:38,269 - MainThread - root - INFO - Get host object by host_name host_mixed_2. with {'attribute': 'name'}
2017-06-11 19:33:38,338 - MainThread - art.ll_lib.hosts - INFO - check_host_spm_status - SPM Status of host host_mixed_2 is: spm
2017-06-11 19:33:38,342 - MainThread - root - INFO - Get IP of a host host_mixed_2 with given name in RHEVM
2017-06-11 19:33:38,346 - MainThread - root - INFO - Get host object by host_name host_mixed_2. with {'attribute': 'name'}
2017-06-11 19:33:38,512 - MainThread - art.ll_lib.vms - INFO - Starting vm vm_TestCase5062_REST_ISCSI_1119330305
2017-06-11 19:33:38,610 - Thread-3461 - art.ll_lib.vms - INFO - start VM vm_TestCase5062_REST_ISCSI_1119330305
2017-06-11 19:33:38,611 - Thread-3461 - vms - INFO - Using Correlation-Id: vms_syncAction_16b4f525-e8ee-42b5
2017-06-11 19:33:39,042 - Thread-3461 - vms - INFO - Waiting for query `name=vm_TestCase5062_REST_ISCSI_1119330305 and status=up or name=vm_TestCase5062_REST_ISCSI_1119330305 and status=up` and event_id None up to 600 seconds,sampling every 10 second.
2017-06-11 19:34:19,625 - MainThread - art.ll_lib.disks - INFO - Adding disk disk_TestCase5062_REST_ISCSI_1119341950
2017-06-11 19:34:19,627 - MainThread - disks - INFO - Using Correlation-Id: disks_create_f539130b-aa6b-4668
2017-06-11 19:34:21,119 - MainThread - disks - INFO - New entity was added
2017-06-11 19:34:21,122 - MainThread - disks - INFO - Property 'disk->storage_domains->storage_domain->id' has correct value: 21e4752c-eabd-4b30-abe7-a5b9322adb1a
2017-06-11 19:34:21,123 - MainThread - disks - INFO - Attribute: import___ changed to: import
2017-06-11 19:34:21,123 - MainThread - disks - INFO - Property 'disk->wipe_after_delete' has correct value: false
2017-06-11 19:34:21,124 - MainThread - disks - INFO - Property 'disk->description' has correct value: 
2017-06-11 19:34:21,124 - MainThread - disks - INFO - Property 'disk->provisioned_size' has correct value: 1073741824
2017-06-11 19:34:21,124 - MainThread - disks - INFO - Property 'disk->format' has correct value: raw
2017-06-11 19:34:21,124 - MainThread - disks - INFO - Property 'disk->alias' has correct value: disk_TestCase5062_REST_ISCSI_1119341950
2017-06-11 19:34:21,125 - MainThread - disks - INFO - Property 'disk->shareable' has correct value: false
2017-06-11 19:34:21,125 - MainThread - disks - INFO - Property 'disk->sparse' has correct value: false
2017-06-11 19:34:21,126 - MainThread - art.ll_lib.disks - INFO - Waiting for status ok on disks ['disk_TestCase5062_REST_ISCSI_1119341950']
2017-06-11 19:34:41,686 - MainThread - art.ll_lib.disks - INFO - Attaching disk disk_TestCase5062_REST_ISCSI_1119341950 to vm vm_TestCase5062_REST_ISCSI_1119330305
2017-06-11 19:34:41,687 - MainThread - diskattachments - INFO - Using Correlation-Id: diskattachments_create_21d60de6-7b29-40b7
2017-06-11 19:34:43,674 - MainThread - diskattachments - INFO - New entity was added
2017-06-11 19:34:43,675 - MainThread - diskattachments - INFO - Property 'disk_attachment->interface' has correct value: virtio
2017-06-11 19:34:43,676 - MainThread - diskattachments - INFO - Property 'disk_attachment->disk->id' has correct value: 6f795e25-f68a-4c5b-870b-162da28c35b2
2017-06-11 19:34:43,676 - MainThread - diskattachments - INFO - Property 'disk_attachment->bootable' has correct value: false
2017-06-11 19:34:43,677 - MainThread - art.ll_lib.disks - INFO - Waiting for status ok on disks ['disk_TestCase5062_REST_ISCSI_1119341950']
2017-06-11 19:34:43,746 - MainThread - art.logging - INFO - ================================================================================
2017-06-11 19:34:43,746 - MainThread - art.logging - INFO - Test Name: rhevmtests.storage.storage_virtual_disk_resize.test_virtual_disk_resize.TestCase5062_REST_ISCSI.test_preallocated_block_resize
2017-06-11 19:34:43,747 - MainThread - art.logging - INFO - Test Description: - VM with preallocated disk and OS
2017-06-11 19:34:43,747 - MainThread - art.logging - INFO - Test Description: - Resize the VM disk to 2G total
2017-06-11 19:34:43,747 - MainThread - art.logging - INFO - Test Description: - Send IOs to disk
2017-06-11 19:34:43,747 - MainThread - art.logging - INFO - Test Description: - Check LV size on VDSM and disk size on guest
2017-06-11 19:34:43,747 - MainThread - art.logging - INFO - Iteration number: 206
Comment 1 Allon Mureinik 2017-06-14 08:48:32 EDT
Can you try this with SELinux disabled in order to confirm/refute the suspicion that that's the cause?
Comment 2 Avihai 2017-06-14 10:11:53 EDT
(In reply to Allon Mureinik from comment #1)
> Can you try this with SELinux disabled in order to confirm/refute the
> suspicion that that's the cause?

I ran the same test on another stand with the same SELinux enabled state as in this stand and the test passed without issues, so it does not look like SELinux (unless something specific occurs with SELinux in this stand).

The stand that I opened this bug on is not mine.
This stand is Nelly's TierX automation stand that runs all the storage/network/infra test cases in Tier1/2/3 every couple of days, so I'm not sure if it's still in the same state or has already been reinstalled, and whether I can use it.
Comment 3 Avihai 2017-06-14 10:35:14 EDT
I got hold of the same stand, jenkins-vm-14.qa.lab.tlv.redhat.com, and ran the same test without disabling SELinux; the issue did not reproduce either.
Comment 4 Avihai 2017-06-14 10:42:12 EDT
(In reply to Avihai from comment #3)
> I got hold of the same stand, jenkins-vm-14.qa.lab.tlv.redhat.com, and ran
> the same test without disabling SELinux; the issue did not reproduce
> either.

Disregard the last comment; I rechecked, and although the test passed, the same issue did reproduce.

I'm now rerunning with SELinux in permissive mode and I'll update soon.
Comment 5 Avihai 2017-06-14 11:32:12 EDT
(In reply to Avihai from comment #2)
> (In reply to Allon Mureinik from comment #1)
> > Can you try this with SELinux disabled in order to confirm/refute the
> > suspicion that that's the cause?
> 
> I ran the same test on another stand with the same SELinux enabled state as
> in this stand and the test passed without issues, so it does not look like
> SELinux (unless something specific occurs with SELinux in this stand).
> 
> The stand that I opened this bug on is not mine.
> This stand is Nelly's TierX automation stand that runs all the
> storage/network/infra test cases in Tier1/2/3 every couple of days, so I'm
> not sure if it's still in the same state or has already been reinstalled,
> and whether I can use it.

I ran the same test with SELinux in permissive mode and got the same issue, so it does not look like an SELinux issue.
Comment 6 Michal Skrivanek 2017-06-15 02:07:06 EDT
- Please attach libvirt logs in debug mode and the qemu logs (a sketch of enabling debug logging follows below).
- vdsm.log doesn't seem to be the right one since it doesn't contain anything about the VM in question. Please double-check the host and the time period.
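For reference, one common way to turn on libvirtd debug logging on a RHEL 7 host (stock file locations; a general sketch, not settings verified on this stand):

cat >> /etc/libvirt/libvirtd.conf <<'EOF'
log_level = 1
log_outputs = "1:file:/var/log/libvirt/libvirtd.log"
EOF
systemctl restart libvirtd
# per-VM qemu logs are written to /var/log/libvirt/qemu/<vm-name>.log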
Comment 7 Avihai 2017-06-16 03:09:54 EDT
(In reply to Michal Skrivanek from comment #6)
> - please attach libvirt in debug mode and qemu logs


I added libvirtd.log plus the relevant qemu log (vm_TestCase5062_REST_ISCSI_1119330305.log), but as libvirtd.log is very big (50M+ after zip) I put it on my Red Hat Google Drive and shared it with you - you should receive a mail at michal.skrivanek@redhat.com with the share info.

Please ping me if something is missing. 

> - vdsm.log doesn't seem to be the right one since it doesn't contain
> anything about the VM in question. Please double check the host and the time
> period

Rechecked: the relevant host is host_mixed_1 (hypervisor ibm-p8-rhevm-03). There is a time shift of -3h from the engine time, so on the hosts the issue is at 2017-06-11 16:33:41, not 19:33:41.


In its libvirt log you can see an issue at the relevant time, 2017-06-11 16:33:41:

2017-06-11 16:33:41.603+0000: 5883: info : virObjectRef:296 : OBJECT_REF: obj=0x3fff9c108c80
2017-06-11 16:33:41.603+0000: 5883: info : virObjectUnref:259 : OBJECT_UNREF: obj=0x3fff9c108c80
2017-06-11 16:33:41.607+0000: 5883: info : virSecuritySELinuxRestoreFileLabel:1035 : Restoring SELinux context on '/rhev/data-center/c546b026-4632-4450-8e9b-17dc879b62a4/21e4752c-eabd-4b30-abe7-a5b9322adb1a/images/87a080bc-60af-405b-aa2a-
a6cc722bcc27/ddee92d7-eb1c-4d16-ba23-c8e2b984d515'
2017-06-11 16:33:41.607+0000: 5883: info : virSecuritySELinuxSetFileconHelper:894 : Setting SELinux context on '/dev/dm-64' to 'system_u:object_r:fixed_disk_device_t:s0'
2017-06-11 16:33:41.607+0000: 5883: info : virDBusCall:1548 : DBUS_METHOD_CALL: 'org.freedesktop.DBus.ListActivatableNames' on '/org/freedesktop/DBus' at 'org.freedesktop.DBus'
2017-06-11 16:33:41.607+0000: 5883: info : virDBusCall:1577 : DBUS_METHOD_REPLY: 'org.freedesktop.DBus.ListActivatableNames' on '/org/freedesktop/DBus' at 'org.freedesktop.DBus'
2017-06-11 16:33:41.607+0000: 5883: info : virDBusCall:1548 : DBUS_METHOD_CALL: 'org.freedesktop.DBus.ListNames' on '/org/freedesktop/DBus' at 'org.freedesktop.DBus'
2017-06-11 16:33:41.607+0000: 5883: info : virDBusCall:1577 : DBUS_METHOD_REPLY: 'org.freedesktop.DBus.ListNames' on '/org/freedesktop/DBus' at 'org.freedesktop.DBus'
2017-06-11 16:33:41.607+0000: 5832: info : virEventPollDispatchHandles:506 : EVENT_POLL_DISPATCH_HANDLE: watch=11 events=1
2017-06-11 16:33:41.607+0000: 5883: info : virDBusCall:1548 : DBUS_METHOD_CALL: 'org.freedesktop.machine1.Manager.TerminateMachine' on '/org/freedesktop/machine1' at 'org.freedesktop.machine1'
2017-06-11 16:33:41.607+0000: 5832: info : virEventPollRunOnce:640 : EVENT_POLL_RUN: nhandles=14 timeout=-1
2017-06-11 16:33:41.607+0000: 5832: info : virEventPollDispatchHandles:506 : EVENT_POLL_DISPATCH_HANDLE: watch=11 events=1
2017-06-11 16:33:41.608+0000: 5883: info : virDBusCall:1558 : DBUS_METHOD_ERROR: 'org.freedesktop.machine1.Manager.TerminateMachine' on '/org/freedesktop/machine1' at 'org.freedesktop.machine1' error org.freedesktop.machine1.NoSuchMachine: No machine 'qemu-29-vmTestCase5062RESTISCSI1119330305' known
2017-06-11 16:33:41.608+0000: 5832: info : virEventPollRunOnce:640 : EVENT_POLL_RUN: nhandles=14 timeout=-1


I found the relevant qemu error in vm_TestCase5062_REST_ISCSI_1119330305.log:

2017-06-11 16:33:41.034+0000: starting up libvirt version: 2.0.0, package: 10.el7_3.9 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2017-05-04-06:49:32, ppc-027.build.eng.bos.redhat.com), qemu version: 2.6.0 (qemu-kvm-rhev-2.6.0-28.el7_3.10), hostname: ibm-p8-rhevm-03.rhts.eng.bos.redhat.com
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=vm_TestCase5062_REST_ISCSI_1119330305,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-29-vm_TestCase5062_REST/master-key.aes -machine pseries-rhel7.3.0,accel=kvm,usb=off -cpu POWER8 -m size=1048576k,slots=16,maxmem=4194304k -realtime mlock=off -smp 1,maxcpus=16,sockets=16,cores=1,threads=1 -numa node,nodeid=0,cpus=0,mem=1024 -uuid 4057c03f-3a04-498c-8a33-53608327b085 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-29-vm_TestCase5062_REST/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2017-06-11T16:33:40,driftfix=slew -no-shutdown -boot strict=on -device nec-usb-xhci,id=usb,bus=pci.0,addr=0x3 -device spapr-vscsi,id=scsi0,reg=0x2000 -device virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x2 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x4 -drive if=none,id=drive-scsi0-0-0-0,readonly=on -device scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 -drive file=/rhev/data-center/c546b026-4632-4450-8e9b-17dc879b62a4/21e4752c-eabd-4b30-abe7-a5b9322adb1a/images/87a080bc-60af-405b-aa2a-a6cc722bcc27/ddee92d7-eb1c-4d16-ba23-c8e2b984d515,format=qcow2,if=none,id=drive-virtio-disk1,serial=87a080bc-60af-405b-aa2a-a6cc722bcc27,cache=none,werror=stop,rerror=stop,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1,bootindex=1 -netdev tap,fd=29,id=hostnet0,vhost=on,vhostfd=31 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:16:01:75,bus=pci.0,addr=0x1 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/4057c03f-3a04-498c-8a33-53608327b085.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/4057c03f-3a04-498c-8a33-53608327b085.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -device usb-kbd,id=input1,bus=usb.0,port=2 -device usb-mouse,id=input2,bus=usb.0,port=3 -vnc 10.16.69.249:0,password -k en-us -device VGA,id=video0,vgamem_mb=16,bus=pci.0,addr=0x8 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -object rng-random,id=objrng0,filename=/dev/urandom -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x7 -msg timestamp=on
2017-06-11 16:33:41.039+0000: 54913: info : libvirt version: 2.0.0, package: 10.el7_3.9 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2017-05-04-06:49:32, ppc-027.build.eng.bos.redhat.com)
2017-06-11 16:33:41.039+0000: 54913: info : hostname: ibm-p8-rhevm-03.rhts.eng.bos.redhat.com
2017-06-11 16:33:41.039+0000: 54913: info : virObjectUnref:259 : OBJECT_UNREF: obj=0x3fff9c1b3190
2017-06-11T16:33:41.103745Z qemu-kvm: -drive file=/rhev/data-center/c546b026-4632-4450-8e9b-17dc879b62a4/21e4752c-eabd-4b30-abe7-a5b9322adb1a/images/87a080bc-60af-405b-aa2a-a6cc722bcc27/ddee92d7-eb1c-4d16-ba23-c8e2b984d515,format=qcow2,if=none,id=drive-virtio-disk1,serial=87a080bc-60af-405b-aa2a-a6cc722bcc27,cache=none,werror=stop,rerror=stop,aio=native: Could not open '/rhev/data-center/c546b026-4632-4450-8e9b-17dc879b62a4/21e4752c-eabd-4b30-abe7-a5b9322adb1a/images/87a080bc-60af-405b-aa2a-a6cc722bcc27/ddee92d7-eb1c-4d16-ba23-c8e2b984d515': Permission denied
2017-06-11 16:33:41.339+0000: shutting down



The VDSM log from this host is attached, but here too nothing unusual is seen in the relevant time frame (2017-06-11 16:33:41):
2017-06-11 16:33:39,986-0400 DEBUG (jsonrpc/4) [storage.TaskManager.Task] (Task='02c810c1-4074-40bc-8022-bc2ab47f0608') moving from state preparing -> state finished (task:599)
2017-06-11 16:33:39,986-0400 DEBUG (jsonrpc/4) [storage.ResourceManager.Owner] Owner.releaseAll requests {} resources {} (resourceManager:910)
2017-06-11 16:33:39,987-0400 DEBUG (jsonrpc/4) [storage.ResourceManager.Owner] Owner.cancelAll requests {} (resourceManager:947)
2017-06-11 16:33:39,987-0400 DEBUG (jsonrpc/4) [storage.TaskManager.Task] (Task='02c810c1-4074-40bc-8022-bc2ab47f0608') ref 0 aborting False (task:997)
2017-06-11 16:33:39,992-0400 INFO  (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC call Host.getStats succeeded in 0.01 seconds (__init__:539)
2017-06-11 16:33:44,078-0400 DEBUG (periodic/0) [storage.TaskManager.Task] (Task='9ffdfdcf-ce3e-42f0-bdfa-b983b6ec072d') moving from state init -> state preparing (task:599)
2017-06-11 16:33:44,078-0400 INFO  (periodic/0) [vdsm.api] START repoStats(options=None) from=internal (api:46)
Comment 8 Avihai 2017-06-16 03:10 EDT
Created attachment 1288270 [details]
relevant_host_vdsm log
Comment 9 Michal Skrivanek 2017-06-16 09:11:19 EDT
(In reply to Avihai from comment #8)
> Created attachment 1288270 [details]
> relevant_host_vdsm log

This is still not the right vdsm.log, as there is no mention of that VM. Please identify the right log file.

Do you have a reproducible environment I can log in to?
Comment 10 Avihai 2017-06-17 09:49:13 EDT
(In reply to Michal Skrivanek from comment #9)
> (In reply to Avihai from comment #8)
> > Created attachment 1288270 [details]
> > relevant_host_vdsm log
> 
> This is still not the right vdsm.log, as there is no mention of that VM.
> Please identify the right log file.

I have only 2 hosts in this environment, so this should be it; anyway, I'll reattach both vdsm logs.

When I looked at both of them I did not see anything unusual, no warnings/errors. What are you looking for?

Host1:ibm-p8-rhevm-03.rhts.eng.bos.redhat.com
Host2:ibm-p8-rhevm-04.rhts.eng.bos.redhat.com
Engine:jenkins-vm-14.qa.lab.tlv.redhat.com


> do you have a reproducible environment I can log in to?
It's not mine and can change at any minute since it runs TierX automation for all the QA teams, but I checked and for now (Jun 17, 16:45) it's still in the same state as before.

I did already reproduce it, as requested by Allon M.; see comment 5.

If by the time you log in the environment is taken or changed and you need additional logs, you can get them from Jenkins at the following link:
https://rhev-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.1v4-ge-runner-tier1/167/artifact/ansible-playbooks/playbooks/ovirt-collect-logs/logs/
Comment 11 Avihai 2017-06-17 09:50 EDT
Created attachment 1288558 [details]
ibm-p8-rhevm-03_vdsm_log
Comment 12 Avihai 2017-06-17 09:51 EDT
Created attachment 1288559 [details]
ibm-p8-rhevm-04_vdsm_log
Comment 13 Avihai 2017-06-17 10:29:23 EDT
I reran the same test on the same environment (described in the previous comment) and the issue reproduced again.

New logs attached.

The time shift is worse than I first thought: the engine is 7 hours ahead of the hosts!

[root@jenkins-vm-14 ~]# date
Sat Jun 17 17:14:47 IDT 2017

[root@ibm-p8-rhevm-04 ~]# date
Sat Jun 17 10:14:47 EDT 2017

[root@ibm-p8-rhevm-03 ~]# date
Sat Jun 17 10:14:47 EDT 2017
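For reference, a quick way to compare all three clocks in one go (hostnames taken from comment 10; root SSH access is assumed); date -u prints UTC, so a real skew shows up regardless of the local timezone:

for h in jenkins-vm-14.qa.lab.tlv.redhat.com ibm-p8-rhevm-03.rhts.eng.bos.redhat.com ibm-p8-rhevm-04.rhts.eng.bos.redhat.com; do
    echo -n "$h: "; ssh root@"$h" date -u
done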

So the issue occurred now at:

Engine time = Jun 17, 2017 5:05:39 PM (IDT)
Hosts time  = Jun 17, 2017 10:05:39 AM (EDT)

Event:
Jun 17, 2017 5:05:39 PM VM vm_TestCase5062_REST_ISCSI_1717050783 is down with error. Exit message: internal error: process exited while connecting to monitor: 2017-06-17T14:05:36.290977Z qemu-kvm: -drive file=/rhev/data-center/c546b026-4632-4450-8e9b-17dc879b62a4/21e4752c-eabd-4b30-abe7-a5b9322adb1a/images/1f4d5f83-8d8f-470f-9e42-cf46ff03de64/007d4b65-96b6-40d6-81ca-d2b785b3c3a9,format=qcow2,if=none,id=drive-virtio-disk1,serial=1f4d5f83-8d8f-470f-9e42-cf46ff03de64,cache=none,werror=stop,rerror=stop,aio=native: Could not open '/rhev/data-center/c546b026-4632-4450-8e9b-17dc879b62a4/21e4752c-eabd-4b30-abe7-a5b9322adb1a/images/1f4d5f83-8d8f-470f-9e42-cf46ff03de64/007d4b65-96b6-40d6-81ca-d2b785b3c3a9': Permission denied. 

VDSM_H1.log:
2017-06-17 10:05:36,817-0400 ERROR (vm/d96972e1) [virt.vm] (vmId='d96972e1-89e2-40c4-a0e7-729f4aadfd3c') The vm start process failed (vm:631)
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 562, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 2021, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 123, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 941, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3777, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: internal error: process exited while connecting to monitor: 2017-06-17T14:05:36.290977Z qemu-kvm: -drive file=/rhev/data-center/c546b026-4632-4450-8e9b-17dc879b62a4/21e4752c-eabd-4b30-abe7-a5b9322adb1a/images/1f4d5f83-8d8f-4
70f-9e42-cf46ff03de64/007d4b65-96b6-40d6-81ca-d2b785b3c3a9,format=qcow2,if=none,id=drive-virtio-disk1,serial=1f4d5f83-8d8f-470f-9e42-cf46ff03de64,cache=none,werror=stop,rerror=stop,aio=native: Could not open '/rhev/data-center/c546b026-46
32-4450-8e9b-17dc879b62a4/21e4752c-eabd-4b30-abe7-a5b9322adb1a/images/1f4d5f83-8d8f-470f-9e42-cf46ff03de64/007d4b65-96b6-40d6-81ca-d2b785b3c3a9': Permission denied
2017-06-17 10:05:36,819-0400 INFO  (vm/d96972e1) [virt.vm] (vmId='d96972e1-89e2-40c4-a0e7-729f4aadfd3c') Changed state to Down: internal error: process exited while connecting to monitor: 2017-06-17T14:05:36.290977Z qemu-kvm: -drive file=
/rhev/data-center/c546b026-4632-4450-8e9b-17dc879b62a4/21e4752c-eabd-4b30-abe7-a5b9322adb1a/images/1f4d5f83-8d8f-470f-9e42-cf46ff03de64/007d4b65-96b6-40d6-81ca-d2b785b3c3a9,format=qcow2,if=none,id=drive-virtio-disk1,serial=1f4d5f83-8d8f-4
70f-9e42-cf46ff03de64,cache=none,werror=stop,rerror=stop,aio=native: Could not open '/rhev/data-center/c546b026-4632-4450-8e9b-17dc879b62a4/21e4752c-eabd-4b30-abe7-a5b9322adb1a/images/1f4d5f83-8d8f-470f-9e42-cf46ff03de64/007d4b65-96b6-40d
6-81ca-d2b785b3c3a9': Permission denied (code=1) (vm:1221)
2017-06-17 10:05:36,819-0400 INFO  (vm/d96972e1) [virt.vm] (vmId='d96972e1-89e2-40c4-a0e7-729f4aadfd3c') Stopping connection (guestagent:430)
Comment 14 Avihai 2017-06-17 10:30 EDT
Created attachment 1288561 [details]
New_reproduction_17_2017_logs
Comment 15 Michal Skrivanek 2017-06-20 04:23:37 EDT
Avihai, I am not sure what you've been doing in comment #5, but I reproduced it in your environment as an SELinux labeling issue. Setting the host to Permissive resolved the problem, and I can see the label not being set correctly, e.g. "lrwxrwxrwx. vdsm kvm system_u:object_r:unlabeled_t:s0 61b0684c-2ba4-48ea-ad5a-5ca0d7c053b0 -> /dev/21e4752c-eabd-4b30-abe7-a5b9322adb1a/61b0684c-2ba4-48ea-ad5a-5ca0d7c053b0"

on host -04 it works, labeled as:
"lrwxrwxrwx. vdsm kvm system_u:object_r:mnt_t:s0       /rhev/data-center/c546b026-4632-4450-8e9b-17dc879b62a4/21e4752c-eabd-4b30-abe7-a5b9322adb1a/images/78d83f92-0da5-410c-847e-b43b497373a5/61b0684c-2ba4-48ea-ad5a-5ca0d7c053b0 -> /dev/21e4752c-eabd-4b30-abe7-a5b9322adb1a/61b0684c-2ba4-48ea-ad5a-5ca0d7c053b0"

I wasn't able to find any configuration difference, but apparently there is one, as it works on one host and doesn't on the other.
Allon/Nir?
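For reference, a quick way to compare what the two hosts show (the path below is the storage domain from this report; the audit tools are assumed to be installed):

# SELinux labels on the image links of the affected storage domain
ls -lZ /rhev/data-center/c546b026-4632-4450-8e9b-17dc879b62a4/21e4752c-eabd-4b30-abe7-a5b9322adb1a/images/*/

# current enforcement mode; 'setenforce 0' switches to Permissive for a test, 'setenforce 1' reverts
getenforce

# recent AVC denials around the failed VM start
ausearch -m avc -ts recent | audit2why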
Comment 16 Allon Mureinik 2017-06-20 04:50:42 EDT
(In reply to Michal Skrivanek from comment #15)
> Avihai, not sure what you've been doing in comment #5 but I reproduced it in
> your environment as a SELinux labeling issue. Setting host to Permissive
> resolved the problem and I can see the label not being set correctly e.g.
> "lrwxrwxrwx. vdsm kvm system_u:object_r:unlabeled_t:s0
> 61b0684c-2ba4-48ea-ad5a-5ca0d7c053b0 ->
> /dev/21e4752c-eabd-4b30-abe7-a5b9322adb1a/61b0684c-2ba4-48ea-ad5a-
> 5ca0d7c053b0"
> 
> on host -04 it works, labeled as:
> "lrwxrwxrwx. vdsm kvm system_u:object_r:mnt_t:s0      
> /rhev/data-center/c546b026-4632-4450-8e9b-17dc879b62a4/21e4752c-eabd-4b30-
> abe7-a5b9322adb1a/images/78d83f92-0da5-410c-847e-b43b497373a5/61b0684c-2ba4-
> 48ea-ad5a-5ca0d7c053b0 ->
> /dev/21e4752c-eabd-4b30-abe7-a5b9322adb1a/61b0684c-2ba4-48ea-ad5a-
> 5ca0d7c053b0"
> 
> I wasn't able to find any configuration difference, but apparently there is
> some as it works on one host and doesn't on the other
> Allon/Nir?

Do both have the same selinux-policy RPMs?
Comment 17 Nir Soffer 2017-06-20 05:59:10 EDT
Please attach output of 
rpm -qa 
for both hosts.

If they are not using the same and latest packages, please update them and check again whether this issue is reproducible.
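For reference, a one-liner that spots the package differences directly (hostnames from comment 10; root SSH access is assumed):

diff <(ssh root@ibm-p8-rhevm-03.rhts.eng.bos.redhat.com 'rpm -qa | sort') \
     <(ssh root@ibm-p8-rhevm-04.rhts.eng.bos.redhat.com 'rpm -qa | sort') \
  | grep -Ei 'selinux|libvirt|qemu|vdsm'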
Comment 18 Michal Skrivanek 2017-06-20 06:06 EDT
Created attachment 1289556 [details]
rpm -qa | sort
Comment 19 Michal Skrivanek 2017-06-20 06:06 EDT
Created attachment 1289557 [details]
rpm -qa | sort
Comment 20 Michal Skrivanek 2017-06-20 06:09:01 EDT
Seems close enough. Comment #18 is host -04 (working), comment #19 is host -03 (not working).
Comment 22 Michal Skrivanek 2017-06-21 04:00:12 EDT
Possibly the same issue as in bug 1461811.
Comment 23 Tomas Jelinek 2017-07-10 08:23:23 EDT
@Avihai - could you please check whether it reproduces on 4.1.4? As per comment 22 it should be fixed.
Comment 24 Avihai 2017-07-11 01:33:56 EDT
@Tomas, I ran the same test on a PPC stand with 4.1.4 and the issue does not reproduce.
