Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1162539

Summary: libvirtd cannot be started using upstart if started manually using sysvinit
Product: Red Hat Enterprise Virtualization Manager Reporter: Israel Pinto <ipinto>
Component: vdsmAssignee: Yaniv Bronhaim <ybronhei>
Status: CLOSED WONTFIX QA Contact: Israel Pinto <ipinto>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 3.5.0CC: alonbl, bazulay, danken, dougsland, ecohen, fromani, gklein, ipinto, istein, lpeer, lsurette, mavital, mtayer, ofrenkel, oourfali, rbalakri, Rhev-m-bugs, ybronhei, yeylon
Target Milestone: ---   
Target Release: 3.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: infra
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-05-07 07:13:35 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Host logs (vdsm and libvirt)
none
engine logs
none
certificate check
none
hosts deplay logs
none
engine screenshot
none
host1 (mig src) deploy log
none
host2 (mig dst) deploy log
none
starting libvirt twice none

Description Israel Pinto 2014-11-11 09:50:54 UTC
Created attachment 956197 [details]
Host logs (vdsm and libvirt)

Description of problem:
Migration failed due to peer certificate issue,
The Migration is trigered since the host is set to maintenance.

Version-Release number of selected component (if applicable):
RHEL: 6.6
RHEVM: 3.5.0-0.19.beta.el6ev

Steps to Reproduce:
1. Setup:
   2 hosts on one cluster.
   6 vms running on one of the hosts.
2. Set the host with the VM's to maintenance.
3. Check VM's and Hosts status.

Actual results:
1. Host is was switched to Maintenance mode with warring sing.
2. All VM's failed to do migrate to second host.

Expected results:
1. All VM's migrate to second host.
2. Host switched to Maintenance mode successfully

Additional info:
1. From libvirt log:
2014-11-11 09:23:23.015+0000: 9122: warning : virNetTLSContextCheckCertificate:1142 : Certificate check failed Certificate failed validation: The certificate hasn't got a known issuer.
2014-11-11 09:23:23.015+0000: 9122: error : virNetTLSContextCheckCertificate:1145 : authentication failed: Failed to verify peer's certificate
2014-11-11 09:23:23.015+0000: 9122: error : doPeer2PeerMigrate:2768 : operation failed: Failed to connect to remote libvirt URI qemu+tls://10.35.4.91/system
2. From vdsm log:
Thread-179218::DEBUG::2014-11-11 09:57:31,308::migration::154::vm.Vm::(_prepareGuest) vmId=`5be3b022-44c7-45a5-a579-7caf105283d0`::Migration started
JsonRpc (StompReactor)::DEBUG::2014-11-11 09:57:31,350::stompReactor::98::Broker.StompAdapter::(handle_frame) Handling message <StompFrame command='SEND'>
JsonRpcServer::DEBUG::2014-11-11 09:57:31,358::__init__::504::jsonrpc.JsonRpcServer::(serve_requests) Waiting for request
Thread-179221::DEBUG::2014-11-11 09:57:31,358::__init__::467::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'VM.migrate' in bridge with {u'params': {u'tunneled': u'false', u'dstqemu': u'10.35.4.91', u'src': u'10.35.4.161', u'dst': u'10.35.4.91:54321', u'vmId': u'208f82f4-6e3c-4496-aaf7-5ca5b8fc5e97', u'abortOnError': u'true', u'method': u'online'}, u'vmID': u'208f82f4-6e3c-4496-aaf7-5ca5b8fc5e97'}
Thread-179221::DEBUG::2014-11-11 09:57:31,366::API::511::vds::(migrate) {u'tunneled': u'false', u'dstqemu': u'10.35.4.91', u'src': u'10.35.4.161', u'dst': u'10.35.4.91:54321', u'vmId': u'208f82f4-6e3c-4496-aaf7-5ca5b8fc5e97', u'abortOnError': u'true', u'method': u'online'}
Thread-179214::DEBUG::2014-11-11 09:57:31,371::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 9 edom: 10 level: 2 message: operation failed: Failed to connect to remote libvirt URI qemu+tls://10.35.4.91/system
Thread-179221::DEBUG::2014-11-11 09:57:31,372::__init__::498::jsonrpc.JsonRpcServer::(_serveRequest) Return 'VM.migrate' in bridge with True

See also  bug#: https://bugzilla.redhat.com/show_bug.cgi?id=1151953 
Maybe there is relationship.

Comment 1 Israel Pinto 2014-11-11 09:52:12 UTC
Created attachment 956198 [details]
engine logs

Comment 2 Omer Frenkel 2014-11-11 11:34:00 UTC
Alon, can you please take a look on this error? could it be a deploy issue?

2014-11-11 09:23:23.015+0000: 9122: warning : virNetTLSContextCheckCertificate:1142 : Certificate check failed Certificate failed validation: The certificate hasn't got a known issuer.
2014-11-11 09:23:23.015+0000: 9122: error : virNetTLSContextCheckCertificate:1145 : authentication failed: Failed to verify peer's certificate
2014-11-11 09:23:23.015+0000: 9122: error : doPeer2PeerMigrate:2768 : operation failed: Failed to connect to remote libvirt URI qemu+tls://10.35.4.91/system

Comment 3 Alon Bar-Lev 2014-11-11 11:48:27 UTC
host:/etc/pki/CA/cacert.pem
host:/etc/pki/vdsm/certs/vdsmcert.pem
engine:/etc/pki/ovirt-engine/ca.pem

you can use:
$ openssl x509 -in ${FILE}

the result should be same content.

Comment 4 Israel Pinto 2014-11-12 08:18:29 UTC
Hi
Did the certificate check on both host and engine see out put below, i see that in the cacert.pem there is problem in both hosts.

host 10.35.4.161:
[root@dhcp-4-161 pki]# openssl x509 -in cacert.pam
Error opening Certificate cacert.pam
139935435089736:error:02001002:system library:fopen:No such file or directory:bss_file.c:398:fopen('cacert.pam','r')
139935435089736:error:20074002:BIO routines:FILE_CTRL:system lib:bss_file.c:400:

[root@dhcp-4-161 pki]# openssl x509 -in /etc/pki/vdsm/certs/vdsmcert.pem
-----BEGIN CERTIFICATE-----
MIIEojCCA4qgAwIBAgICEAkwDQYJKoZIhvcNAQEFBQAwYDELMAkGA1UEBhMCVVMx
HzAdBgNVBAoTFnNjbC5sYWIudGx2LnJlZGhhdC5jb20xMDAuBgNVBAMTJ2RoY3Ax
NjMtNDMuc2NsLmxhYi50bHYucmVkaGF0LmNvbS4yNDkxMjAiFxExNDExMDgxMzI3
MTArMDAwMBcNMTkxMTA4MTMyNzExWjA3MR8wHQYDVQQKExZzY2wubGFiLnRsdi5y
ZWRoYXQuY29tMRQwEgYDVQQDEwsxMC4zNS40LjE2MTCCASIwDQYJKoZIhvcNAQEB
BQADggEPADCCAQoCggEBAMqJ0iEZHsAyyAOA6jSNQQum+c+nUTwmJn9zoRCNhHAz
E3E5qHq/J0GPjTFqQ25hQyui1cNxtVi+bsz+PNXr0v/Av/DSxTzGO30j7UxIwkcd
44hrADzJ56y5BA3kP2XEhTjlMcCxgCrBKkPzDxzLh55rEvT2h7UnkB/fx1rmUuAt
ueNou2h+VVqFwkyGqU3mufyANUK55VxakaSTn8s1IROhHjl8s2FL3kJkrIOhAzLs
6zKFcIxrmT7w0zg6KMElVi3RQchuNb77xQuUlCNEgGmxC/AIAuWBeivWFhmEYi12
B+gYiaV7z+yCbO8z6sMnANp5RJfHAI8mpVALvj5BQj8CAwEAAaOCAYkwggGFMB0G
A1UdDgQWBBSHcqKOQsnUiX5RAVF1oDNNcey/hjCBmAYIKwYBBQUHAQEEgYswgYgw
gYUGCCsGAQUFBzAChnlodHRwOi8vZGhjcDE2My00My5zY2wubGFiLnRsdi5yZWRo
YXQuY29tOjgwL292aXJ0LWVuZ2luZS9zZXJ2aWNlcy9wa2ktcmVzb3VyY2U/cmVz
b3VyY2U9Y2EtY2VydGlmaWNhdGUmZm9ybWF0PVg1MDktUEVNLUNBMIGLBgNVHSME
gYMwgYCAFLNH1HjnPDKcD9IdiDiIVftUPbf8oWSkYjBgMQswCQYDVQQGEwJVUzEf
MB0GA1UEChMWc2NsLmxhYi50bHYucmVkaGF0LmNvbTEwMC4GA1UEAxMnZGhjcDE2
My00My5zY2wubGFiLnRsdi5yZWRoYXQuY29tLjI0OTEyggIQADAJBgNVHRMEAjAA
MA4GA1UdDwEB/wQEAwIFoDAgBgNVHSUBAf8EFjAUBggrBgEFBQcDAQYIKwYBBQUH
AwIwDQYJKoZIhvcNAQEFBQADggEBAJ73ghRYPEpSIJ8sZ9cNAg0YamPJ8JUPzF2h
hRLLMzzpEFC1W5qm5cxvzWQ0K/30LIkkfqAevyLAQQg7XqQ6ad0k99QNFq8Qs/oY
VwJRZhibKPIC7++B8R0quvhZFO8mMcnvFiU8r07Yq0J39gFsctcG8wJ2T7gY4f8B
8WyHKmNV4wqSOUW2e9V/8X4uT5xuPMa5D60LKp1ABwKjcDmuDxRFBA+XKFiXhNe7
E+RpT8UIArVS9fgNle73SOT4bOqgOUmBkkK6IfyyVH0nVyuWvqFyhDKhEb1qubzX
zOGiZL/AwTqA2ZGuUtYeGeKYpx+FwbgkEm6cVbYN/lv6U0N0+44=
-----END CERTIFICATE-----

host 10.35.4.137:
[root@dhcp-4-137 pki]# openssl x509 -in cacert.pam
Error opening Certificate cacert.pam
140472460683080:error:02001002:system library:fopen:No such file or directory:bss_file.c:398:fopen('cacert.pam','r')
140472460683080:error:20074002:BIO routines:FILE_CTRL:system lib:bss_file.c:400:
unable to load certificate

[root@dhcp-4-137 pki]# openssl x509 -in /etc/pki/vdsm/certs/vdsmcert.pem
-----BEGIN CERTIFICATE-----
MIIEoTCCA4mgAwIBAgICEAgwDQYJKoZIhvcNAQEFBQAwYDELMAkGA1UEBhMCVVMx
HzAdBgNVBAoTFnNjbC5sYWIudGx2LnJlZGhhdC5jb20xMDAuBgNVBAMTJ2RoY3Ax
NjMtNDMuc2NsLmxhYi50bHYucmVkaGF0LmNvbS4yNDkxMjAiFxExNDExMDgxMzEy
MDErMDAwMBcNMTkxMTA4MTMxMjAxWjA2MR8wHQYDVQQKExZzY2wubGFiLnRsdi5y
ZWRoYXQuY29tMRMwEQYDVQQDEwoxMC4zNS40LjkxMIIBIjANBgkqhkiG9w0BAQEF
AAOCAQ8AMIIBCgKCAQEA1dx9gDdJv68ZDaYPgjOVlGQXOadXx2lg356L2RYjgfKJ
PANfXt3nimevKPBnOO4kQB1nd3UW2JC51JsCt/7uulTSaTKqfgB5SqKsuvJefo5p
Jr/8Zk/N0NShdz+d2+BlyTEXO/ZzRiVh86xaHbjsFYWSKSsXaQA2p+EzXyXBzZcM
0KOo6aY1ZavALDq3LU/eAbc+i8pvU/PMbPZSOxPrB5e9efDjoMD+MuU3QqESE0vB
CmO7Udz+CwVXSSaASoYzXDYMrCpRKWKgaEr4KxyxjZOCob5WMRX2boybyGqxc4sW
+iZqmCS+4WdoJeRzYWqqN8UMyTeqYjOky/VwRay1yQIDAQABo4IBiTCCAYUwHQYD
VR0OBBYEFIt/znwkvyHka/9IhSGJyi8XjuI4MIGYBggrBgEFBQcBAQSBizCBiDCB
hQYIKwYBBQUHMAKGeWh0dHA6Ly9kaGNwMTYzLTQzLnNjbC5sYWIudGx2LnJlZGhh
dC5jb206ODAvb3ZpcnQtZW5naW5lL3NlcnZpY2VzL3BraS1yZXNvdXJjZT9yZXNv
dXJjZT1jYS1jZXJ0aWZpY2F0ZSZmb3JtYXQ9WDUwOS1QRU0tQ0EwgYsGA1UdIwSB
gzCBgIAUs0fUeOc8MpwP0h2IOIhV+1Q9t/yhZKRiMGAxCzAJBgNVBAYTAlVTMR8w
HQYDVQQKExZzY2wubGFiLnRsdi5yZWRoYXQuY29tMTAwLgYDVQQDEydkaGNwMTYz
LTQzLnNjbC5sYWIudGx2LnJlZGhhdC5jb20uMjQ5MTKCAhAAMAkGA1UdEwQCMAAw
DgYDVR0PAQH/BAQDAgWgMCAGA1UdJQEB/wQWMBQGCCsGAQUFBwMBBggrBgEFBQcD
AjANBgkqhkiG9w0BAQUFAAOCAQEAj5ZYCkPdko3BygcCxGCObq/x/vDD5yD6WKVs
M2jTA91eYESadN4PPIzhv3clxW9O/F2NX5xyKkXOsdHxdTPQKUaQgCTRsW4+I1QZ
k2a+yMYkVUkpCkZSFuXsR43vUSRIZQQVEPJW76u0JgAjRdWf2M3iZGz9ijT2rjmc
cO2xotbQHyv+ORBBHAm7iOp5RVU3JNE6/lJFrkMAqy4dRsgSOIKt2Wc7J19ZnmoB
e8uZq+Vxh/tw6mGR8FCGnPK92+llwHwivVxc0Ghk/I/xwOHkpWNP7vxCunUvgWJH
rX8JU0TX+Unlzxs0YQU6pC76js83FKAXvT1h2HS5B6vJ1ju0tQ==
-----END CERTIFICATE-----

Engine 10.35.163.43:
[root@dhcp163-43 ~]# openssl x509 -in /etc/pki/ovirt-engine/ca.pem
-----BEGIN CERTIFICATE-----
MIIEEjCCAvqgAwIBAgICEAAwDQYJKoZIhvcNAQEFBQAwYDELMAkGA1UEBhMCVVMx
HzAdBgNVBAoTFnNjbC5sYWIudGx2LnJlZGhhdC5jb20xMDAuBgNVBAMTJ2RoY3Ax
NjMtNDMuc2NsLmxhYi50bHYucmVkaGF0LmNvbS4yNDkxMjAiFxExNDExMDgxMjA2
NDIrMDAwMBcNMjQxMTA2MTIwNjQyWjBgMQswCQYDVQQGEwJVUzEfMB0GA1UEChMW
c2NsLmxhYi50bHYucmVkaGF0LmNvbTEwMC4GA1UEAxMnZGhjcDE2My00My5zY2wu
bGFiLnRsdi5yZWRoYXQuY29tLjI0OTEyMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8A
MIIBCgKCAQEAqJLDSWdje0utJGBCi9dJ5XoqLpFJtmop4JFU6F12zZahs3cbX0MH
Eb5JJTfjiKUpSgmO+WHvroM8XLncRZID648HnhswD20pvSfkpAi6hsMC3WCOj4kq
0kDNcuOaP7WxFoG5UnuB70dA6FThGHC/NTVuP3QXU7rx23HqOHhnyDi+1bMyw1XN
GQYoolxSM2lzPTmuYf9v104yueXtqZOr91+ad01If4061g0RjTH4Fu9CQs0v2nYE
1kFqY1rwY/kqsX9sRy+zabvqB17UlBv79ndQWxGSeeVQXASwU1SeppJW7YhU5mDJ
TEzhKy1j9tDVFCPQMJsRBzvTC/quIUwwFwIDAQABo4HRMIHOMB0GA1UdDgQWBBSz
R9R45zwynA/SHYg4iFX7VD23/DCBiwYDVR0jBIGDMIGAgBSzR9R45zwynA/SHYg4
iFX7VD23/KFkpGIwYDELMAkGA1UEBhMCVVMxHzAdBgNVBAoTFnNjbC5sYWIudGx2
LnJlZGhhdC5jb20xMDAuBgNVBAMTJ2RoY3AxNjMtNDMuc2NsLmxhYi50bHYucmVk
aGF0LmNvbS4yNDkxMoICEAAwDwYDVR0TAQH/BAUwAwEB/zAOBgNVHQ8BAf8EBAMC
AQYwDQYJKoZIhvcNAQEFBQADggEBAHQ6Kfv4iY8DbYkwPh6XuXKpK3C4bxI1Py8D
G9xr7iK0cgXdU3prLPbEp3AXcuUfocXIY0Z9Nki4Hr3ueSbWWVmN+5u5jQLHBT1v
smLbrR+rvBIr2Pv6EFxB/n0QYjzB4fBynAFsoS6lNiFj6nXCU2aS0RphCvmImE3g
yEYr+pDVWPmL9RT7rRhliPASlKNgYc4BQNRQOI0YeWhYy4uTrR6HsIn9Dz37uPVB
dPv83i0EnszPOXUsi/KrGAYYwkBnAJbeiZkqyq6y/TBY78syLY6etppF0kRlIBuT
Og69dV7CjCk7EyefDpscnk9lCVE2z/SUkP2WHQK3WIhPcATyZRw=
-----END CERTIFICATE-----

Comment 5 Alon Bar-Lev 2014-11-12 08:25:05 UTC
No need to paste certificates, just to compare to engine.

My mistake... it should be /etc/pki/CA/cacert.pem for libvirt.

Comment 6 Israel Pinto 2014-11-12 08:33:56 UTC
Created attachment 956619 [details]
certificate check

Comment 7 Alon Bar-Lev 2014-11-12 08:39:19 UTC
as you can see there are different ca certificate spread all over, this is probably due to invalid deployments.
please re-deploy all hosts, or at least these that have CA that differ from the engine.

Comment 8 Israel Pinto 2014-11-12 12:04:13 UTC
Created attachment 956700 [details]
hosts deplay logs

Comment 9 Alon Bar-Lev 2014-11-12 12:51:11 UTC
not sure why you communicate in multiple channels, already wrote to you:

1. certificates are ok now, and host-deploy do not change them.

2014-11-12 11:56:54 DEBUG otopi.transaction transaction._prepare:77 preparing 'File transaction for '/etc/pki/vdsm/certs/cacert.pem''
2014-11-12 11:56:54 DEBUG otopi.filetransaction filetransaction.prepare:196 file '/etc/pki/vdsm/certs/cacert.pem' exists
2014-11-12 11:56:54 DEBUG otopi.filetransaction filetransaction.prepare:200 file '/etc/pki/vdsm/certs/cacert.pem' already has content
2014-11-12 11:56:54 DEBUG otopi.transaction transaction._prepare:77 preparing 'File transaction for '/etc/pki/vdsm/libvirt-spice/ca-cert.pem''
2014-11-12 11:56:54 DEBUG otopi.filetransaction filetransaction.prepare:196 file '/etc/pki/vdsm/libvirt-spice/ca-cert.pem' exists
2014-11-12 11:56:54 DEBUG otopi.filetransaction filetransaction.prepare:200 file '/etc/pki/vdsm/libvirt-spice/ca-cert.pem' already has content
2014-11-12 11:56:54 DEBUG otopi.transaction transaction._prepare:77 preparing 'File transaction for '/etc/pki/libvirt/../CA/cacert.pem''
2014-11-12 11:56:54 DEBUG otopi.filetransaction filetransaction.prepare:196 file '/etc/pki/libvirt/../CA/cacert.pem' exists
2014-11-12 11:56:54 DEBUG otopi.filetransaction filetransaction.prepare:200 file '/etc/pki/libvirt/../CA/cacert.pem' already has content

2. I also checked this manually using ssh, you can do the same.

3. libvirtd is being restarted, also vdsm.

please confirm the above.
if a file suddenly changed post deploy, please find root cause.

Comment 10 Israel Pinto 2014-11-12 15:35:13 UTC
Check certificate again after deploy.
copmare hosts files to ca on engine.
Host files:
openssl x509 -n /etc/pki/CA/cacert.pem /etc/pki/vdsm/certs/cacert.pem
openssl x509 -n /etc/pki/vdsm/libvirt-spice/ca-cert.pem
Engine file:
openssl x509 -n /etc/pki/ovirt-engine/ca.pem

The certificates are the same, Alon confirm that the deplay is OK.
Suggesting check with Virt the following:
1. libvirtd configration
2. vdsm usage of libvirt client

Comment 11 Omer Frenkel 2014-11-12 15:47:02 UTC
Francesco can you take a look?

Comment 12 Francesco Romani 2014-11-12 16:18:23 UTC
I checked the affected hosts.
The only strange thing I found is this when (re)starting libvirt:

2014-11-12 16:09:15.330+0000: 22929: info : libvirt version: 0.10.2, package: 46.el6 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2014-09-09-07:49:04, x86-022.build.eng.bos.redhat.com)
2014-11-12 16:09:15.330+0000: 22929: debug : main:1021 : Decided on pid file path '/var/run/libvirtd.pid'
2014-11-12 16:09:15.330+0000: 22929: debug : main:1031 : Decided on socket paths '/var/run/libvirt/libvirt-sock' and '/var/run/libvirt/libvirt-sock-ro'
2014-11-12 16:09:15.330+0000: 22929: debug : main:1068 : Ensuring run dir '/var/run/libvirt' exists
2014-11-12 16:09:15.331+0000: 22929: error : virPidFileAcquirePath:410 : Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable
2014-11-12 16:09:15.331+0000: 22929: info : virNetlinkEventServiceStopAll:420 : stopping all netlink event services
2014-11-12 16:09:15.387+0000: 22968: info : libvirt version: 0.10.2, package: 46.el6 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2014-09-09-07:49:04, x86-022.build.eng.bos.redhat.com)
2014-11-12 16:09:15.387+0000: 22968: debug : main:1021 : Decided on pid file path '/var/run/libvirtd.pid'
2014-11-12 16:09:15.387+0000: 22968: debug : main:1031 : Decided on socket paths '/var/run/libvirt/libvirt-sock' and '/var/run/libvirt/libvirt-sock-ro'
2014-11-12 16:09:15.387+0000: 22968: debug : main:1068 : Ensuring run dir '/var/run/libvirt' exists
2014-11-12 16:09:15.387+0000: 22968: error : virPidFileAcquirePath:410 : Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable
2014-11-12 16:09:15.387+0000: 22968: info : virNetlinkEventServiceStopAll:420 : stopping all netlink event services
2014-11-12 16:09:15.420+0000: 22973: info : libvirt version: 0.10.2, package: 46.el6 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2014-09-09-07:49:04, x86-022.build.eng.bos.redhat.com)
2014-11-12 16:09:15.420+0000: 22973: debug : main:1021 : Decided on pid file path '/var/run/libvirtd.pid'
2014-11-12 16:09:15.420+0000: 22973: debug : main:1031 : Decided on socket paths '/var/run/libvirt/libvirt-sock' and '/var/run/libvirt/libvirt-sock-ro'
2014-11-12 16:09:15.420+0000: 22973: debug : main:1068 : Ensuring run dir '/var/run/libvirt' exists
2014-11-12 16:09:15.420+0000: 22973: error : virPidFileAcquirePath:410 : Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable
2014-11-12 16:09:15.420+0000: 22973: info : virNetlinkEventServiceStopAll:420 : stopping all netlink event services
2014-11-12 16:09:15.454+0000: 22982: info : libvirt version: 0.10.2, package: 46.el6 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2014-09-09-07:49:04, x86-022.build.eng.bos.redhat.com)
2014-11-12 16:09:15.454+0000: 22982: debug : main:1021 : Decided on pid file path '/var/run/libvirtd.pid'
2014-11-12 16:09:15.454+0000: 22982: debug : main:1031 : Decided on socket paths '/var/run/libvirt/libvirt-sock' and '/var/run/libvirt/libvirt-sock-ro'
2014-11-12 16:09:15.454+0000: 22982: debug : main:1068 : Ensuring run dir '/var/run/libvirt' exists
2014-11-12 16:09:15.454+0000: 22982: error : virPidFileAcquirePath:410 : Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable
2014-11-12 16:09:15.454+0000: 22982: info : virNetlinkEventServiceStopAll:420 : stopping all netlink event services
2014-11-12 16:09:15.499+0000: 22988: info : libvirt version: 0.10.2, package: 46.el6 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2014-09-09-07:49:04, x86-022.build.eng.bos.redhat.com)
2014-11-12 16:09:15.499+0000: 22988: debug : main:1021 : Decided on pid file path '/var/run/libvirtd.pid'
2014-11-12 16:09:15.499+0000: 22988: debug : main:1031 : Decided on socket paths '/var/run/libvirt/libvirt-sock' and '/var/run/libvirt/libvirt-sock-ro'
2014-11-12 16:09:15.499+0000: 22988: debug : main:1068 : Ensuring run dir '/var/run/libvirt' exists
2014-11-12 16:09:15.499+0000: 22988: error : virPidFileAcquirePath:410 : Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable
2014-11-12 16:09:15.499+0000: 22988: info : virNetlinkEventServiceStopAll:420 : stopping all netlink event services
2014-11-12 16:09:15.533+0000: 22993: info : libvirt version: 0.10.2, package: 46.el6 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2014-09-09-07:49:04, x86-022.build.eng.bos.redhat.com)
2014-11-12 16:09:15.533+0000: 22993: debug : main:1021 : Decided on pid file path '/var/run/libvirtd.pid'
2014-11-12 16:09:15.533+0000: 22993: debug : main:1031 : Decided on socket paths '/var/run/libvirt/libvirt-sock' and '/var/run/libvirt/libvirt-sock-ro'
2014-11-12 16:09:15.533+0000: 22993: debug : main:1068 : Ensuring run dir '/var/run/libvirt' exists
2014-11-12 16:09:15.533+0000: 22993: error : virPidFileAcquirePath:410 : Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable
2014-11-12 16:09:15.533+0000: 22993: info : virNetlinkEventServiceStopAll:420 : stopping all netlink event services
2014-11-12 16:09:15.567+0000: 22999: info : libvirt version: 0.10.2, package: 46.el6 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2014-09-09-07:49:04, x86-022.build.eng.bos.redhat.com)
2014-11-12 16:09:15.567+0000: 22999: debug : main:1021 : Decided on pid file path '/var/run/libvirtd.pid'
2014-11-12 16:09:15.567+0000: 22999: debug : main:1031 : Decided on socket paths '/var/run/libvirt/libvirt-sock' and '/var/run/libvirt/libvirt-sock-ro'
2014-11-12 16:09:15.567+0000: 22999: debug : main:1068 : Ensuring run dir '/var/run/libvirt' exists
2014-11-12 16:09:15.567+0000: 22999: error : virPidFileAcquirePath:410 : Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable
2014-11-12 16:09:15.567+0000: 22999: info : virNetlinkEventServiceStopAll:420 : stopping all netlink event services
2014-11-12 16:09:15.600+0000: 23004: info : libvirt version: 0.10.2, package: 46.el6 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2014-09-09-07:49:04, x86-022.build.eng.bos.redhat.com)
2014-11-12 16:09:15.600+0000: 23004: debug : main:1021 : Decided on pid file path '/var/run/libvirtd.pid'
2014-11-12 16:09:15.600+0000: 23004: debug : main:1031 : Decided on socket paths '/var/run/libvirt/libvirt-sock' and '/var/run/libvirt/libvirt-sock-ro'
2014-11-12 16:09:15.600+0000: 23004: debug : main:1068 : Ensuring run dir '/var/run/libvirt' exists
2014-11-12 16:09:15.600+0000: 23004: error : virPidFileAcquirePath:410 : Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable
2014-11-12 16:09:15.600+0000: 23004: info : virNetlinkEventServiceStopAll:420 : stopping all netlink event services
2014-11-12 16:09:15.634+0000: 23009: info : libvirt version: 0.10.2, package: 46.el6 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2014-09-09-07:49:04, x86-022.build.eng.bos.redhat.com)
2014-11-12 16:09:15.634+0000: 23009: debug : main:1021 : Decided on pid file path '/var/run/libvirtd.pid'
2014-11-12 16:09:15.634+0000: 23009: debug : main:1031 : Decided on socket paths '/var/run/libvirt/libvirt-sock' and '/var/run/libvirt/libvirt-sock-ro'
2014-11-12 16:09:15.634+0000: 23009: debug : main:1068 : Ensuring run dir '/var/run/libvirt' exists
2014-11-12 16:09:15.634+0000: 23009: error : virPidFileAcquirePath:410 : Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable

however eventually libvirt (re)started correctly on both hosts.
I also restarted vdsm on both hosts.
After that, VMs migrated back and forth succesfully, and also they migrated succesfully after I put the host where they were running into maintenance mode.
See attachment for proof.

Comment 13 Francesco Romani 2014-11-12 16:19:45 UTC
Created attachment 956795 [details]
engine screenshot

Comment 14 Barak 2014-11-16 10:22:46 UTC
So guys - is there a bug here ? or was it just bad deployment ?

Comment 15 Alon Bar-Lev 2014-11-16 10:52:13 UTC
(In reply to Barak from comment #14)
> So guys - is there a bug here ? or was it just bad deployment ?

I have no idea, deployment is ok, not sure about the other.

from logs it is obvious that libvirtd is restarted and certificates are ok.

we need predictable reproduction from clean setup (CLEAN: aka fresh installation of all components, including operating system).

Comment 16 Barak 2014-11-16 18:40:14 UTC
Israel - please provide a reproduction scenario.

Comment 17 Francesco Romani 2014-11-17 08:21:31 UTC
(In reply to Alon Bar-Lev from comment #15)
> (In reply to Barak from comment #14)
> > So guys - is there a bug here ? or was it just bad deployment ?
> 
> I have no idea, deployment is ok, not sure about the other.
> 
> from logs it is obvious that libvirtd is restarted and certificates are ok.
> 
> we need predictable reproduction from clean setup (CLEAN: aka fresh
> installation of all components, including operating system).

+1

I manually restarted (using service vdsmd restart / service libvirtd restart) the services once again before to test it and after that everything worked like a charm.

From a virt perspective everything seems right, and everything works.
Only possible issues seems a missed restart after deploy, but i Alon says the deploy is correct, then I don't know.

To go ahead, we need a reproducer as Alon suggested.

Comment 18 Francesco Romani 2014-11-17 08:22:01 UTC
Restoring NEEDINFO as per https://bugzilla.redhat.com/show_bug.cgi?id=1162539#c16

Comment 19 Oved Ourfali 2014-11-17 13:43:00 UTC
(In reply to Francesco Romani from comment #18)
> Restoring NEEDINFO as per
> https://bugzilla.redhat.com/show_bug.cgi?id=1162539#c16

Reducing severity, as it seems like it is hard to reproduce.

Comment 20 Ilanit Stein 2014-11-17 15:59:08 UTC
I reproduced the problem on vt10.1.

1. I had engine with vt9,
2. I run engine-cleanup, and yum remove rhevm-*
3. I installed, vt10.1 (clean install).
4. 2 hosts were installed (hosts are not after fresh installed of OS, but were installed before on the same setup, when it was engine vt9)
5. 5 vms created & run (installed with OS from pxe)
6. 4 vms run on host2, and 1 vm run on host1
7. I tried from UI to migrate 1 vm, from host1, to host2 => Migration failed.
Migration failed.

- Certificate on host1, host2, engine (read as in comment #10) are the same.

src libvirt.log:
----------------
014-11-17 13:18:06.747+0000: 2121: error : virNetTLSContextValidCertificate:1031 : Certificate failed validation: The certificate hasn't got a known issuer.
2014-11-17 13:18:06.747+0000: 2121: warning : virNetTLSContextCheckCertificate:1142 : Certificate check failed Certificate failed validation: The certificate hasn't got a known issuer.
2014-11-17 13:18:06.747+0000: 2121: error : virNetTLSContextCheckCertificate:1145 : authentication failed: Failed to verify peer's certificate
2014-11-17 13:18:06.747+0000: 2121: debug : virFileClose:72 : Closed fd 24
2014-11-17 13:18:06.747+0000: 2121: debug : virFileClose:72 : Closed fd 28
2014-11-17 13:18:06.747+0000: 2121: debug : virFileClose:72 : Closed fd 27
2014-11-17 13:18:06.747+0000: 2121: debug : do_open:1180 : driver 2 remote returned ERROR
2014-11-17 13:18:06.747+0000: 2121: error : doPeer2PeerMigrate:2768 : operation failed: Failed to connect to remote libvirt URI qemu+tls://10.35.117.22/system

src vdsm.log:
------------
Thread-2439::ERROR::2014-11-17 15:18:06,874::migration::259::vm.Vm::(run) vmId=`321985ed-796f-48d0-beac-51e35a84d536`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/migration.py", line 245, in run
    self._startUnderlyingMigration(time.time())
  File "/usr/share/vdsm/virt/migration.py", line 324, in _startUnderlyingMigration
    None, maxBandwidth)
  File "/usr/share/vdsm/virt/vm.py", line 668, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1210, in migrateToURI2
    if ret == -1: raise libvirtError ('virDomainMigrateToURI2() failed', dom=self)
libvirtError: operation failed: Failed to connect to remote libvirt URI qemu+tls://10.35.117.22/system

Attached deploy logs.

Comment 21 Ilanit Stein 2014-11-17 16:02:06 UTC
Created attachment 958281 [details]
host1 (mig src) deploy log

Comment 22 Ilanit Stein 2014-11-17 16:03:57 UTC
Created attachment 958283 [details]
host2 (mig dst) deploy log

Comment 23 Alon Bar-Lev 2014-11-17 16:04:23 UTC
OK, reproduction:

service vdsmd stop
initctl stop libvirtd
service libvirtd stop
sleep 5
service libvirtd start
sleep 5
initctl start libvirtd
sleep 5
service vdsmd start

Notice 3 times libvirtd is started, while the sysvinit is the one that survives.
----
[root@alma02 ~]# service vdsmd stop
Shutting down vdsm daemon: 
vdsm watchdog stop                                         [  OK  ]
vdsm: Running run_final_hooks                              [  OK  ]
vdsm stop                                                  [  OK  ]
[root@alma02 ~]# initctl stop libvirtd
initctl: Unknown instance: 
[root@alma02 ~]# service libvirtd stop
Stopping libvirtd daemon:                                  [  OK  ]
[root@alma02 ~]# sleep 5
[root@alma02 ~]# service libvirtd start
Starting libvirtd daemon:                                  [  OK  ]
[root@alma02 ~]# sleep 5
[root@alma02 ~]# initctl start libvirtd
libvirtd start/running, process 117675
[root@alma02 ~]# sleep 5
[root@alma02 ~]# service vdsmd start
libvirtd start/running, process 117846
vdsm: Running mkdirs
vdsm: Running configure_coredump
vdsm: Running configure_vdsm_logs
vdsm: Running run_init_hooks
vdsm: Running upgraded_version_check
vdsm: Running check_is_configured
libvirt is already configured for vdsm
vdsm: Running validate_configuration
SUCCESS: ssl configured to true. No conflicts
vdsm: Running prepare_transient_repository
vdsm: Running syslog_available
vdsm: Running nwfilter
vdsm: Running dummybr
vdsm: Running load_needed_modules
vdsm: Running tune_system
vdsm: Running test_space
vdsm: Running test_lo
vdsm: Running unified_network_persistence_upgrade
vdsm: Running restore_nets
vdsm: Running upgrade_300_nets
Starting up vdsm daemon: 
vdsm start                                                 [  OK  ]
[root@alma02 ~]# ps -efa | grep libvirtd
root     117605      1  0 18:01 ?        00:00:00 libvirtd --daemon --listen

Comment 24 Alon Bar-Lev 2014-11-17 16:25:55 UTC
it is actually vdsmd issue, as it should stop libvirtd sysvinit if started before attempting to start libvirtd via upstart. as manual invocation of libvirtd via sysvinit is possible at any time.

Comment 26 Dan Kenigsberg 2014-11-18 09:50:46 UTC
The sysv/initctl conflict is as old as RHEV. It should not block 3.5; `pkill libvirtd` is a workaround.

Comment 27 Gil Klein 2014-11-19 09:08:16 UTC
QE were able to reproduce this easily when connecting a new engine to a host that was already installed (older build). 

Dan, why not fixing this for 3.5?

Comment 28 Ilanit Stein 2014-11-19 09:40:54 UTC
Adding more info, by alonbl:

Workaround:
==========
stop libvirt
restart vdsm
when vdsm is started it will start libvirt properly
only if libvirt was starting using service libvirtd start it will fail to do so

Upgrade case (this bug was reported for clean installation):
===========================================================
if for some reason the libvirtd is not stopped during upgrade and it was started using upstart and not sysinit
then we have a problem
as far as I seen upstrart manages the process right
but again...
if you have started libvirtd manually service libvirtd start
upgrade will not be able to stop it properly

Comment 29 Dan Kenigsberg 2014-11-19 10:51:26 UTC
(In reply to Gil Klein from comment #27)
> QE were able to reproduce this easily when connecting a new engine to a host
> that was already installed (older build). 
> 
> Dan, why not fixing this for 3.5?

Because (as I said in comment 26) this problem is not new. It's is as old as RHEV's usage of libvirt. And since 3.5.0 is around the corner, it should not block on it.

Unless I misunderstand something and there's a regression due to a change in RHEL-6.6 or vdsm?

What made libvirtd start using sysv on the formerly-installed host?

Comment 30 Gil Klein 2014-11-19 11:51:45 UTC
I can't say what made libvirtd using sysv. 

My concern is that QE are working in this flow for a while, and this issue only appeared lately in several test environments.
 
The only change we did is moving more hosts to RHEL 6.6.

Could you please invest some more effort in this issue before we are pushing it 3.6 without a clear explanation?

Comment 31 Dan Kenigsberg 2014-11-19 12:45:26 UTC
Gil, I gave a very clear explanation: the problem --- as understood by me --- is ancient. If an admin starts up libvirtd via sysv, things break.

HOWEVER, if you believe that rhev-3.5 or rhel-6.6 introduced some kind of a change that triggers this with no admin intervension, then this is a regression.

If so, please assist in understanding what is the trigger: in what instances libvirtd is being run by sysv?

Comment 32 Mooli Tayer 2014-11-20 18:46:13 UTC
After inspection, I confirm there is no regression here:

our code now and always ignores the possibility of someone manually
starting libvirtd with sysV, the only thing that we do is:
upon installation if upstart is present (/sbin/initctl) and libvirt package
has a file for upstart (libvirtd.upstart) we install it under /etc/init/.

I would be glad to fix this for 3.6(If it will prove possible since it might be problematic).

Alon:
I noticed that on el6.6, when libvirt was initiated by sysV:
sudo initctl status libvirtd
libvirtd stop/waiting

sudo service libvirtd status
libvirtd (pid  8795) is running...

And when it is run by upstart both return running.

Can/Should I relay on this to add a validate check at configurator?
(If libvirtd is running and upstart is present, and /etc/init/libvirt.conf installed and initctl returns false then return False)

Comment 33 Alon Bar-Lev 2014-11-20 18:53:57 UTC
(In reply to Mooli Tayer from comment #32)
> Can/Should I relay on this to add a validate check at configurator?
> (If libvirtd is running and upstart is present, and /etc/init/libvirt.conf
> installed and initctl returns false then return False)

this is not configurator issue, as someone can start libvirt manually post deploy and when vdsm is started it will attempt to start another instance.

from what I could see you can simply stop libvirt using service libvirtd stop command, if it is started via upstart then it will refuse to stop.

Comment 34 Mooli Tayer 2014-11-21 13:41:26 UTC
(In reply to Alon Bar-Lev from comment #33)
> (In reply to Mooli Tayer from comment #32)
> > Can/Should I relay on this to add a validate check at configurator?
> > (If libvirtd is running and upstart is present, and /etc/init/libvirt.conf
> > installed and initctl returns false then return False)
> 
> this is not configurator issue, as someone can start libvirt manually post
> deploy and when vdsm is started it will attempt to start another instance.

But configurator also runs when vdsm starts, so unless someone fiddles with
libvirtd while vdsm is running we can handle it, no?

ps it seems to me I cannot run two services at least not naively(see attached).


> 
> from what I could see you can simply stop libvirt using service libvirtd
> stop command, if it is started via upstart then it will refuse to stop.

That should work and is easier(could be done at vdsm_init_common), however I
would still prefer to put in configurator. There is additional logic that already sits there - we switch to sysV only when we have both sysV and upstart and libvirtd.upstart is packaged by libvirt. 

I would also prefer that the administrator will see (vdsm failed to start since libvirt is manged by sysV, please restart using upstart or use "vdsm-tool configure --force". )

Comment 35 Mooli Tayer 2014-11-21 13:50:55 UTC
Created attachment 959744 [details]
starting libvirt twice

Comment 36 Alon Bar-Lev 2014-11-21 19:44:29 UTC
(In reply to Mooli Tayer from comment #34)
> I would also prefer that the administrator will see (vdsm failed to start
> since libvirt is manged by sysV, please restart using upstart or use
> "vdsm-tool configure --force". )

ok, this will be nice. you can use any mean to do so. probably get both status of upstart and sysvinit and decide.

Comment 37 Israel Pinto 2014-11-23 15:36:04 UTC
I upgrate engine to vt11 from vt10.1
Deplay 2 host to engine with vt10.1, hosts upgrate to vt11.
Create new VM on one host, did migration --> migration pass.

Comment 38 Yaniv Bronhaim 2015-01-12 08:46:43 UTC
Reducing severity, as it requires a manual intervention in the service level in order to get to this use-case.
Installing and using the host according to the documentation will not cause any issue with that.

Comment 39 Yaniv Bronhaim 2015-05-07 07:13:35 UTC
In 3.6 we use only systemd , this bug is not relevant anymore