Bug 1162539
| Summary: | libvirtd cannot be started using upstart if started manually using sysvinit | ||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Israel Pinto <ipinto> | ||||||||||||||||||
| Component: | vdsm | Assignee: | Yaniv Bronhaim <ybronhei> | ||||||||||||||||||
| Status: | CLOSED WONTFIX | QA Contact: | Israel Pinto <ipinto> | ||||||||||||||||||
| Severity: | medium | Docs Contact: | |||||||||||||||||||
| Priority: | unspecified | ||||||||||||||||||||
| Version: | 3.5.0 | CC: | alonbl, bazulay, danken, dougsland, ecohen, fromani, gklein, ipinto, istein, lpeer, lsurette, mavital, mtayer, ofrenkel, oourfali, rbalakri, Rhev-m-bugs, ybronhei, yeylon | ||||||||||||||||||
| Target Milestone: | --- | ||||||||||||||||||||
| Target Release: | 3.6.0 | ||||||||||||||||||||
| Hardware: | Unspecified | ||||||||||||||||||||
| OS: | Unspecified | ||||||||||||||||||||
| Whiteboard: | infra | ||||||||||||||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||||||||||||||
| Doc Text: | Story Points: | --- | |||||||||||||||||||
| Clone Of: | Environment: | ||||||||||||||||||||
| Last Closed: | 2015-05-07 07:13:35 UTC | Type: | Bug | ||||||||||||||||||
| Regression: | --- | Mount Type: | --- | ||||||||||||||||||
| Documentation: | --- | CRM: | |||||||||||||||||||
| Verified Versions: | Category: | --- | |||||||||||||||||||
| oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |||||||||||||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||||||||||||
| Embargoed: | |||||||||||||||||||||
| Attachments: |
|
||||||||||||||||||||
|
Description
Israel Pinto
2014-11-11 09:50:54 UTC
Created attachment 956198 [details]
engine logs
Alon, can you please take a look on this error? could it be a deploy issue? 2014-11-11 09:23:23.015+0000: 9122: warning : virNetTLSContextCheckCertificate:1142 : Certificate check failed Certificate failed validation: The certificate hasn't got a known issuer. 2014-11-11 09:23:23.015+0000: 9122: error : virNetTLSContextCheckCertificate:1145 : authentication failed: Failed to verify peer's certificate 2014-11-11 09:23:23.015+0000: 9122: error : doPeer2PeerMigrate:2768 : operation failed: Failed to connect to remote libvirt URI qemu+tls://10.35.4.91/system host:/etc/pki/CA/cacert.pem
host:/etc/pki/vdsm/certs/vdsmcert.pem
engine:/etc/pki/ovirt-engine/ca.pem
you can use:
$ openssl x509 -in ${FILE}
the result should be same content.
Hi
Did the certificate check on both host and engine see out put below, i see that in the cacert.pem there is problem in both hosts.
host 10.35.4.161:
[root@dhcp-4-161 pki]# openssl x509 -in cacert.pam
Error opening Certificate cacert.pam
139935435089736:error:02001002:system library:fopen:No such file or directory:bss_file.c:398:fopen('cacert.pam','r')
139935435089736:error:20074002:BIO routines:FILE_CTRL:system lib:bss_file.c:400:
[root@dhcp-4-161 pki]# openssl x509 -in /etc/pki/vdsm/certs/vdsmcert.pem
-----BEGIN CERTIFICATE-----
MIIEojCCA4qgAwIBAgICEAkwDQYJKoZIhvcNAQEFBQAwYDELMAkGA1UEBhMCVVMx
HzAdBgNVBAoTFnNjbC5sYWIudGx2LnJlZGhhdC5jb20xMDAuBgNVBAMTJ2RoY3Ax
NjMtNDMuc2NsLmxhYi50bHYucmVkaGF0LmNvbS4yNDkxMjAiFxExNDExMDgxMzI3
MTArMDAwMBcNMTkxMTA4MTMyNzExWjA3MR8wHQYDVQQKExZzY2wubGFiLnRsdi5y
ZWRoYXQuY29tMRQwEgYDVQQDEwsxMC4zNS40LjE2MTCCASIwDQYJKoZIhvcNAQEB
BQADggEPADCCAQoCggEBAMqJ0iEZHsAyyAOA6jSNQQum+c+nUTwmJn9zoRCNhHAz
E3E5qHq/J0GPjTFqQ25hQyui1cNxtVi+bsz+PNXr0v/Av/DSxTzGO30j7UxIwkcd
44hrADzJ56y5BA3kP2XEhTjlMcCxgCrBKkPzDxzLh55rEvT2h7UnkB/fx1rmUuAt
ueNou2h+VVqFwkyGqU3mufyANUK55VxakaSTn8s1IROhHjl8s2FL3kJkrIOhAzLs
6zKFcIxrmT7w0zg6KMElVi3RQchuNb77xQuUlCNEgGmxC/AIAuWBeivWFhmEYi12
B+gYiaV7z+yCbO8z6sMnANp5RJfHAI8mpVALvj5BQj8CAwEAAaOCAYkwggGFMB0G
A1UdDgQWBBSHcqKOQsnUiX5RAVF1oDNNcey/hjCBmAYIKwYBBQUHAQEEgYswgYgw
gYUGCCsGAQUFBzAChnlodHRwOi8vZGhjcDE2My00My5zY2wubGFiLnRsdi5yZWRo
YXQuY29tOjgwL292aXJ0LWVuZ2luZS9zZXJ2aWNlcy9wa2ktcmVzb3VyY2U/cmVz
b3VyY2U9Y2EtY2VydGlmaWNhdGUmZm9ybWF0PVg1MDktUEVNLUNBMIGLBgNVHSME
gYMwgYCAFLNH1HjnPDKcD9IdiDiIVftUPbf8oWSkYjBgMQswCQYDVQQGEwJVUzEf
MB0GA1UEChMWc2NsLmxhYi50bHYucmVkaGF0LmNvbTEwMC4GA1UEAxMnZGhjcDE2
My00My5zY2wubGFiLnRsdi5yZWRoYXQuY29tLjI0OTEyggIQADAJBgNVHRMEAjAA
MA4GA1UdDwEB/wQEAwIFoDAgBgNVHSUBAf8EFjAUBggrBgEFBQcDAQYIKwYBBQUH
AwIwDQYJKoZIhvcNAQEFBQADggEBAJ73ghRYPEpSIJ8sZ9cNAg0YamPJ8JUPzF2h
hRLLMzzpEFC1W5qm5cxvzWQ0K/30LIkkfqAevyLAQQg7XqQ6ad0k99QNFq8Qs/oY
VwJRZhibKPIC7++B8R0quvhZFO8mMcnvFiU8r07Yq0J39gFsctcG8wJ2T7gY4f8B
8WyHKmNV4wqSOUW2e9V/8X4uT5xuPMa5D60LKp1ABwKjcDmuDxRFBA+XKFiXhNe7
E+RpT8UIArVS9fgNle73SOT4bOqgOUmBkkK6IfyyVH0nVyuWvqFyhDKhEb1qubzX
zOGiZL/AwTqA2ZGuUtYeGeKYpx+FwbgkEm6cVbYN/lv6U0N0+44=
-----END CERTIFICATE-----
host 10.35.4.137:
[root@dhcp-4-137 pki]# openssl x509 -in cacert.pam
Error opening Certificate cacert.pam
140472460683080:error:02001002:system library:fopen:No such file or directory:bss_file.c:398:fopen('cacert.pam','r')
140472460683080:error:20074002:BIO routines:FILE_CTRL:system lib:bss_file.c:400:
unable to load certificate
[root@dhcp-4-137 pki]# openssl x509 -in /etc/pki/vdsm/certs/vdsmcert.pem
-----BEGIN CERTIFICATE-----
MIIEoTCCA4mgAwIBAgICEAgwDQYJKoZIhvcNAQEFBQAwYDELMAkGA1UEBhMCVVMx
HzAdBgNVBAoTFnNjbC5sYWIudGx2LnJlZGhhdC5jb20xMDAuBgNVBAMTJ2RoY3Ax
NjMtNDMuc2NsLmxhYi50bHYucmVkaGF0LmNvbS4yNDkxMjAiFxExNDExMDgxMzEy
MDErMDAwMBcNMTkxMTA4MTMxMjAxWjA2MR8wHQYDVQQKExZzY2wubGFiLnRsdi5y
ZWRoYXQuY29tMRMwEQYDVQQDEwoxMC4zNS40LjkxMIIBIjANBgkqhkiG9w0BAQEF
AAOCAQ8AMIIBCgKCAQEA1dx9gDdJv68ZDaYPgjOVlGQXOadXx2lg356L2RYjgfKJ
PANfXt3nimevKPBnOO4kQB1nd3UW2JC51JsCt/7uulTSaTKqfgB5SqKsuvJefo5p
Jr/8Zk/N0NShdz+d2+BlyTEXO/ZzRiVh86xaHbjsFYWSKSsXaQA2p+EzXyXBzZcM
0KOo6aY1ZavALDq3LU/eAbc+i8pvU/PMbPZSOxPrB5e9efDjoMD+MuU3QqESE0vB
CmO7Udz+CwVXSSaASoYzXDYMrCpRKWKgaEr4KxyxjZOCob5WMRX2boybyGqxc4sW
+iZqmCS+4WdoJeRzYWqqN8UMyTeqYjOky/VwRay1yQIDAQABo4IBiTCCAYUwHQYD
VR0OBBYEFIt/znwkvyHka/9IhSGJyi8XjuI4MIGYBggrBgEFBQcBAQSBizCBiDCB
hQYIKwYBBQUHMAKGeWh0dHA6Ly9kaGNwMTYzLTQzLnNjbC5sYWIudGx2LnJlZGhh
dC5jb206ODAvb3ZpcnQtZW5naW5lL3NlcnZpY2VzL3BraS1yZXNvdXJjZT9yZXNv
dXJjZT1jYS1jZXJ0aWZpY2F0ZSZmb3JtYXQ9WDUwOS1QRU0tQ0EwgYsGA1UdIwSB
gzCBgIAUs0fUeOc8MpwP0h2IOIhV+1Q9t/yhZKRiMGAxCzAJBgNVBAYTAlVTMR8w
HQYDVQQKExZzY2wubGFiLnRsdi5yZWRoYXQuY29tMTAwLgYDVQQDEydkaGNwMTYz
LTQzLnNjbC5sYWIudGx2LnJlZGhhdC5jb20uMjQ5MTKCAhAAMAkGA1UdEwQCMAAw
DgYDVR0PAQH/BAQDAgWgMCAGA1UdJQEB/wQWMBQGCCsGAQUFBwMBBggrBgEFBQcD
AjANBgkqhkiG9w0BAQUFAAOCAQEAj5ZYCkPdko3BygcCxGCObq/x/vDD5yD6WKVs
M2jTA91eYESadN4PPIzhv3clxW9O/F2NX5xyKkXOsdHxdTPQKUaQgCTRsW4+I1QZ
k2a+yMYkVUkpCkZSFuXsR43vUSRIZQQVEPJW76u0JgAjRdWf2M3iZGz9ijT2rjmc
cO2xotbQHyv+ORBBHAm7iOp5RVU3JNE6/lJFrkMAqy4dRsgSOIKt2Wc7J19ZnmoB
e8uZq+Vxh/tw6mGR8FCGnPK92+llwHwivVxc0Ghk/I/xwOHkpWNP7vxCunUvgWJH
rX8JU0TX+Unlzxs0YQU6pC76js83FKAXvT1h2HS5B6vJ1ju0tQ==
-----END CERTIFICATE-----
Engine 10.35.163.43:
[root@dhcp163-43 ~]# openssl x509 -in /etc/pki/ovirt-engine/ca.pem
-----BEGIN CERTIFICATE-----
MIIEEjCCAvqgAwIBAgICEAAwDQYJKoZIhvcNAQEFBQAwYDELMAkGA1UEBhMCVVMx
HzAdBgNVBAoTFnNjbC5sYWIudGx2LnJlZGhhdC5jb20xMDAuBgNVBAMTJ2RoY3Ax
NjMtNDMuc2NsLmxhYi50bHYucmVkaGF0LmNvbS4yNDkxMjAiFxExNDExMDgxMjA2
NDIrMDAwMBcNMjQxMTA2MTIwNjQyWjBgMQswCQYDVQQGEwJVUzEfMB0GA1UEChMW
c2NsLmxhYi50bHYucmVkaGF0LmNvbTEwMC4GA1UEAxMnZGhjcDE2My00My5zY2wu
bGFiLnRsdi5yZWRoYXQuY29tLjI0OTEyMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8A
MIIBCgKCAQEAqJLDSWdje0utJGBCi9dJ5XoqLpFJtmop4JFU6F12zZahs3cbX0MH
Eb5JJTfjiKUpSgmO+WHvroM8XLncRZID648HnhswD20pvSfkpAi6hsMC3WCOj4kq
0kDNcuOaP7WxFoG5UnuB70dA6FThGHC/NTVuP3QXU7rx23HqOHhnyDi+1bMyw1XN
GQYoolxSM2lzPTmuYf9v104yueXtqZOr91+ad01If4061g0RjTH4Fu9CQs0v2nYE
1kFqY1rwY/kqsX9sRy+zabvqB17UlBv79ndQWxGSeeVQXASwU1SeppJW7YhU5mDJ
TEzhKy1j9tDVFCPQMJsRBzvTC/quIUwwFwIDAQABo4HRMIHOMB0GA1UdDgQWBBSz
R9R45zwynA/SHYg4iFX7VD23/DCBiwYDVR0jBIGDMIGAgBSzR9R45zwynA/SHYg4
iFX7VD23/KFkpGIwYDELMAkGA1UEBhMCVVMxHzAdBgNVBAoTFnNjbC5sYWIudGx2
LnJlZGhhdC5jb20xMDAuBgNVBAMTJ2RoY3AxNjMtNDMuc2NsLmxhYi50bHYucmVk
aGF0LmNvbS4yNDkxMoICEAAwDwYDVR0TAQH/BAUwAwEB/zAOBgNVHQ8BAf8EBAMC
AQYwDQYJKoZIhvcNAQEFBQADggEBAHQ6Kfv4iY8DbYkwPh6XuXKpK3C4bxI1Py8D
G9xr7iK0cgXdU3prLPbEp3AXcuUfocXIY0Z9Nki4Hr3ueSbWWVmN+5u5jQLHBT1v
smLbrR+rvBIr2Pv6EFxB/n0QYjzB4fBynAFsoS6lNiFj6nXCU2aS0RphCvmImE3g
yEYr+pDVWPmL9RT7rRhliPASlKNgYc4BQNRQOI0YeWhYy4uTrR6HsIn9Dz37uPVB
dPv83i0EnszPOXUsi/KrGAYYwkBnAJbeiZkqyq6y/TBY78syLY6etppF0kRlIBuT
Og69dV7CjCk7EyefDpscnk9lCVE2z/SUkP2WHQK3WIhPcATyZRw=
-----END CERTIFICATE-----
No need to paste certificates, just to compare to engine. My mistake... it should be /etc/pki/CA/cacert.pem for libvirt. Created attachment 956619 [details]
certificate check
as you can see there are different ca certificate spread all over, this is probably due to invalid deployments. please re-deploy all hosts, or at least these that have CA that differ from the engine. Created attachment 956700 [details]
hosts deplay logs
not sure why you communicate in multiple channels, already wrote to you: 1. certificates are ok now, and host-deploy do not change them. 2014-11-12 11:56:54 DEBUG otopi.transaction transaction._prepare:77 preparing 'File transaction for '/etc/pki/vdsm/certs/cacert.pem'' 2014-11-12 11:56:54 DEBUG otopi.filetransaction filetransaction.prepare:196 file '/etc/pki/vdsm/certs/cacert.pem' exists 2014-11-12 11:56:54 DEBUG otopi.filetransaction filetransaction.prepare:200 file '/etc/pki/vdsm/certs/cacert.pem' already has content 2014-11-12 11:56:54 DEBUG otopi.transaction transaction._prepare:77 preparing 'File transaction for '/etc/pki/vdsm/libvirt-spice/ca-cert.pem'' 2014-11-12 11:56:54 DEBUG otopi.filetransaction filetransaction.prepare:196 file '/etc/pki/vdsm/libvirt-spice/ca-cert.pem' exists 2014-11-12 11:56:54 DEBUG otopi.filetransaction filetransaction.prepare:200 file '/etc/pki/vdsm/libvirt-spice/ca-cert.pem' already has content 2014-11-12 11:56:54 DEBUG otopi.transaction transaction._prepare:77 preparing 'File transaction for '/etc/pki/libvirt/../CA/cacert.pem'' 2014-11-12 11:56:54 DEBUG otopi.filetransaction filetransaction.prepare:196 file '/etc/pki/libvirt/../CA/cacert.pem' exists 2014-11-12 11:56:54 DEBUG otopi.filetransaction filetransaction.prepare:200 file '/etc/pki/libvirt/../CA/cacert.pem' already has content 2. I also checked this manually using ssh, you can do the same. 3. libvirtd is being restarted, also vdsm. please confirm the above. if a file suddenly changed post deploy, please find root cause. Check certificate again after deploy. copmare hosts files to ca on engine. Host files: openssl x509 -n /etc/pki/CA/cacert.pem /etc/pki/vdsm/certs/cacert.pem openssl x509 -n /etc/pki/vdsm/libvirt-spice/ca-cert.pem Engine file: openssl x509 -n /etc/pki/ovirt-engine/ca.pem The certificates are the same, Alon confirm that the deplay is OK. Suggesting check with Virt the following: 1. libvirtd configration 2. vdsm usage of libvirt client Francesco can you take a look? I checked the affected hosts. The only strange thing I found is this when (re)starting libvirt: 2014-11-12 16:09:15.330+0000: 22929: info : libvirt version: 0.10.2, package: 46.el6 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2014-09-09-07:49:04, x86-022.build.eng.bos.redhat.com) 2014-11-12 16:09:15.330+0000: 22929: debug : main:1021 : Decided on pid file path '/var/run/libvirtd.pid' 2014-11-12 16:09:15.330+0000: 22929: debug : main:1031 : Decided on socket paths '/var/run/libvirt/libvirt-sock' and '/var/run/libvirt/libvirt-sock-ro' 2014-11-12 16:09:15.330+0000: 22929: debug : main:1068 : Ensuring run dir '/var/run/libvirt' exists 2014-11-12 16:09:15.331+0000: 22929: error : virPidFileAcquirePath:410 : Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable 2014-11-12 16:09:15.331+0000: 22929: info : virNetlinkEventServiceStopAll:420 : stopping all netlink event services 2014-11-12 16:09:15.387+0000: 22968: info : libvirt version: 0.10.2, package: 46.el6 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2014-09-09-07:49:04, x86-022.build.eng.bos.redhat.com) 2014-11-12 16:09:15.387+0000: 22968: debug : main:1021 : Decided on pid file path '/var/run/libvirtd.pid' 2014-11-12 16:09:15.387+0000: 22968: debug : main:1031 : Decided on socket paths '/var/run/libvirt/libvirt-sock' and '/var/run/libvirt/libvirt-sock-ro' 2014-11-12 16:09:15.387+0000: 22968: debug : main:1068 : Ensuring run dir '/var/run/libvirt' exists 2014-11-12 16:09:15.387+0000: 22968: error : virPidFileAcquirePath:410 : Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable 2014-11-12 16:09:15.387+0000: 22968: info : virNetlinkEventServiceStopAll:420 : stopping all netlink event services 2014-11-12 16:09:15.420+0000: 22973: info : libvirt version: 0.10.2, package: 46.el6 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2014-09-09-07:49:04, x86-022.build.eng.bos.redhat.com) 2014-11-12 16:09:15.420+0000: 22973: debug : main:1021 : Decided on pid file path '/var/run/libvirtd.pid' 2014-11-12 16:09:15.420+0000: 22973: debug : main:1031 : Decided on socket paths '/var/run/libvirt/libvirt-sock' and '/var/run/libvirt/libvirt-sock-ro' 2014-11-12 16:09:15.420+0000: 22973: debug : main:1068 : Ensuring run dir '/var/run/libvirt' exists 2014-11-12 16:09:15.420+0000: 22973: error : virPidFileAcquirePath:410 : Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable 2014-11-12 16:09:15.420+0000: 22973: info : virNetlinkEventServiceStopAll:420 : stopping all netlink event services 2014-11-12 16:09:15.454+0000: 22982: info : libvirt version: 0.10.2, package: 46.el6 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2014-09-09-07:49:04, x86-022.build.eng.bos.redhat.com) 2014-11-12 16:09:15.454+0000: 22982: debug : main:1021 : Decided on pid file path '/var/run/libvirtd.pid' 2014-11-12 16:09:15.454+0000: 22982: debug : main:1031 : Decided on socket paths '/var/run/libvirt/libvirt-sock' and '/var/run/libvirt/libvirt-sock-ro' 2014-11-12 16:09:15.454+0000: 22982: debug : main:1068 : Ensuring run dir '/var/run/libvirt' exists 2014-11-12 16:09:15.454+0000: 22982: error : virPidFileAcquirePath:410 : Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable 2014-11-12 16:09:15.454+0000: 22982: info : virNetlinkEventServiceStopAll:420 : stopping all netlink event services 2014-11-12 16:09:15.499+0000: 22988: info : libvirt version: 0.10.2, package: 46.el6 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2014-09-09-07:49:04, x86-022.build.eng.bos.redhat.com) 2014-11-12 16:09:15.499+0000: 22988: debug : main:1021 : Decided on pid file path '/var/run/libvirtd.pid' 2014-11-12 16:09:15.499+0000: 22988: debug : main:1031 : Decided on socket paths '/var/run/libvirt/libvirt-sock' and '/var/run/libvirt/libvirt-sock-ro' 2014-11-12 16:09:15.499+0000: 22988: debug : main:1068 : Ensuring run dir '/var/run/libvirt' exists 2014-11-12 16:09:15.499+0000: 22988: error : virPidFileAcquirePath:410 : Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable 2014-11-12 16:09:15.499+0000: 22988: info : virNetlinkEventServiceStopAll:420 : stopping all netlink event services 2014-11-12 16:09:15.533+0000: 22993: info : libvirt version: 0.10.2, package: 46.el6 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2014-09-09-07:49:04, x86-022.build.eng.bos.redhat.com) 2014-11-12 16:09:15.533+0000: 22993: debug : main:1021 : Decided on pid file path '/var/run/libvirtd.pid' 2014-11-12 16:09:15.533+0000: 22993: debug : main:1031 : Decided on socket paths '/var/run/libvirt/libvirt-sock' and '/var/run/libvirt/libvirt-sock-ro' 2014-11-12 16:09:15.533+0000: 22993: debug : main:1068 : Ensuring run dir '/var/run/libvirt' exists 2014-11-12 16:09:15.533+0000: 22993: error : virPidFileAcquirePath:410 : Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable 2014-11-12 16:09:15.533+0000: 22993: info : virNetlinkEventServiceStopAll:420 : stopping all netlink event services 2014-11-12 16:09:15.567+0000: 22999: info : libvirt version: 0.10.2, package: 46.el6 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2014-09-09-07:49:04, x86-022.build.eng.bos.redhat.com) 2014-11-12 16:09:15.567+0000: 22999: debug : main:1021 : Decided on pid file path '/var/run/libvirtd.pid' 2014-11-12 16:09:15.567+0000: 22999: debug : main:1031 : Decided on socket paths '/var/run/libvirt/libvirt-sock' and '/var/run/libvirt/libvirt-sock-ro' 2014-11-12 16:09:15.567+0000: 22999: debug : main:1068 : Ensuring run dir '/var/run/libvirt' exists 2014-11-12 16:09:15.567+0000: 22999: error : virPidFileAcquirePath:410 : Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable 2014-11-12 16:09:15.567+0000: 22999: info : virNetlinkEventServiceStopAll:420 : stopping all netlink event services 2014-11-12 16:09:15.600+0000: 23004: info : libvirt version: 0.10.2, package: 46.el6 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2014-09-09-07:49:04, x86-022.build.eng.bos.redhat.com) 2014-11-12 16:09:15.600+0000: 23004: debug : main:1021 : Decided on pid file path '/var/run/libvirtd.pid' 2014-11-12 16:09:15.600+0000: 23004: debug : main:1031 : Decided on socket paths '/var/run/libvirt/libvirt-sock' and '/var/run/libvirt/libvirt-sock-ro' 2014-11-12 16:09:15.600+0000: 23004: debug : main:1068 : Ensuring run dir '/var/run/libvirt' exists 2014-11-12 16:09:15.600+0000: 23004: error : virPidFileAcquirePath:410 : Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable 2014-11-12 16:09:15.600+0000: 23004: info : virNetlinkEventServiceStopAll:420 : stopping all netlink event services 2014-11-12 16:09:15.634+0000: 23009: info : libvirt version: 0.10.2, package: 46.el6 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2014-09-09-07:49:04, x86-022.build.eng.bos.redhat.com) 2014-11-12 16:09:15.634+0000: 23009: debug : main:1021 : Decided on pid file path '/var/run/libvirtd.pid' 2014-11-12 16:09:15.634+0000: 23009: debug : main:1031 : Decided on socket paths '/var/run/libvirt/libvirt-sock' and '/var/run/libvirt/libvirt-sock-ro' 2014-11-12 16:09:15.634+0000: 23009: debug : main:1068 : Ensuring run dir '/var/run/libvirt' exists 2014-11-12 16:09:15.634+0000: 23009: error : virPidFileAcquirePath:410 : Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable however eventually libvirt (re)started correctly on both hosts. I also restarted vdsm on both hosts. After that, VMs migrated back and forth succesfully, and also they migrated succesfully after I put the host where they were running into maintenance mode. See attachment for proof. Created attachment 956795 [details]
engine screenshot
So guys - is there a bug here ? or was it just bad deployment ? (In reply to Barak from comment #14) > So guys - is there a bug here ? or was it just bad deployment ? I have no idea, deployment is ok, not sure about the other. from logs it is obvious that libvirtd is restarted and certificates are ok. we need predictable reproduction from clean setup (CLEAN: aka fresh installation of all components, including operating system). Israel - please provide a reproduction scenario. (In reply to Alon Bar-Lev from comment #15) > (In reply to Barak from comment #14) > > So guys - is there a bug here ? or was it just bad deployment ? > > I have no idea, deployment is ok, not sure about the other. > > from logs it is obvious that libvirtd is restarted and certificates are ok. > > we need predictable reproduction from clean setup (CLEAN: aka fresh > installation of all components, including operating system). +1 I manually restarted (using service vdsmd restart / service libvirtd restart) the services once again before to test it and after that everything worked like a charm. From a virt perspective everything seems right, and everything works. Only possible issues seems a missed restart after deploy, but i Alon says the deploy is correct, then I don't know. To go ahead, we need a reproducer as Alon suggested. Restoring NEEDINFO as per https://bugzilla.redhat.com/show_bug.cgi?id=1162539#c16 (In reply to Francesco Romani from comment #18) > Restoring NEEDINFO as per > https://bugzilla.redhat.com/show_bug.cgi?id=1162539#c16 Reducing severity, as it seems like it is hard to reproduce. I reproduced the problem on vt10.1. 1. I had engine with vt9, 2. I run engine-cleanup, and yum remove rhevm-* 3. I installed, vt10.1 (clean install). 4. 2 hosts were installed (hosts are not after fresh installed of OS, but were installed before on the same setup, when it was engine vt9) 5. 5 vms created & run (installed with OS from pxe) 6. 4 vms run on host2, and 1 vm run on host1 7. I tried from UI to migrate 1 vm, from host1, to host2 => Migration failed. Migration failed. - Certificate on host1, host2, engine (read as in comment #10) are the same. src libvirt.log: ---------------- 014-11-17 13:18:06.747+0000: 2121: error : virNetTLSContextValidCertificate:1031 : Certificate failed validation: The certificate hasn't got a known issuer. 2014-11-17 13:18:06.747+0000: 2121: warning : virNetTLSContextCheckCertificate:1142 : Certificate check failed Certificate failed validation: The certificate hasn't got a known issuer. 2014-11-17 13:18:06.747+0000: 2121: error : virNetTLSContextCheckCertificate:1145 : authentication failed: Failed to verify peer's certificate 2014-11-17 13:18:06.747+0000: 2121: debug : virFileClose:72 : Closed fd 24 2014-11-17 13:18:06.747+0000: 2121: debug : virFileClose:72 : Closed fd 28 2014-11-17 13:18:06.747+0000: 2121: debug : virFileClose:72 : Closed fd 27 2014-11-17 13:18:06.747+0000: 2121: debug : do_open:1180 : driver 2 remote returned ERROR 2014-11-17 13:18:06.747+0000: 2121: error : doPeer2PeerMigrate:2768 : operation failed: Failed to connect to remote libvirt URI qemu+tls://10.35.117.22/system src vdsm.log: ------------ Thread-2439::ERROR::2014-11-17 15:18:06,874::migration::259::vm.Vm::(run) vmId=`321985ed-796f-48d0-beac-51e35a84d536`::Failed to migrate Traceback (most recent call last): File "/usr/share/vdsm/virt/migration.py", line 245, in run self._startUnderlyingMigration(time.time()) File "/usr/share/vdsm/virt/migration.py", line 324, in _startUnderlyingMigration None, maxBandwidth) File "/usr/share/vdsm/virt/vm.py", line 668, in f ret = attr(*args, **kwargs) File "/usr/lib/python2.6/site-packages/vdsm/libvirtconnection.py", line 111, in wrapper ret = f(*args, **kwargs) File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1210, in migrateToURI2 if ret == -1: raise libvirtError ('virDomainMigrateToURI2() failed', dom=self) libvirtError: operation failed: Failed to connect to remote libvirt URI qemu+tls://10.35.117.22/system Attached deploy logs. Created attachment 958281 [details]
host1 (mig src) deploy log
Created attachment 958283 [details]
host2 (mig dst) deploy log
OK, reproduction: service vdsmd stop initctl stop libvirtd service libvirtd stop sleep 5 service libvirtd start sleep 5 initctl start libvirtd sleep 5 service vdsmd start Notice 3 times libvirtd is started, while the sysvinit is the one that survives. ---- [root@alma02 ~]# service vdsmd stop Shutting down vdsm daemon: vdsm watchdog stop [ OK ] vdsm: Running run_final_hooks [ OK ] vdsm stop [ OK ] [root@alma02 ~]# initctl stop libvirtd initctl: Unknown instance: [root@alma02 ~]# service libvirtd stop Stopping libvirtd daemon: [ OK ] [root@alma02 ~]# sleep 5 [root@alma02 ~]# service libvirtd start Starting libvirtd daemon: [ OK ] [root@alma02 ~]# sleep 5 [root@alma02 ~]# initctl start libvirtd libvirtd start/running, process 117675 [root@alma02 ~]# sleep 5 [root@alma02 ~]# service vdsmd start libvirtd start/running, process 117846 vdsm: Running mkdirs vdsm: Running configure_coredump vdsm: Running configure_vdsm_logs vdsm: Running run_init_hooks vdsm: Running upgraded_version_check vdsm: Running check_is_configured libvirt is already configured for vdsm vdsm: Running validate_configuration SUCCESS: ssl configured to true. No conflicts vdsm: Running prepare_transient_repository vdsm: Running syslog_available vdsm: Running nwfilter vdsm: Running dummybr vdsm: Running load_needed_modules vdsm: Running tune_system vdsm: Running test_space vdsm: Running test_lo vdsm: Running unified_network_persistence_upgrade vdsm: Running restore_nets vdsm: Running upgrade_300_nets Starting up vdsm daemon: vdsm start [ OK ] [root@alma02 ~]# ps -efa | grep libvirtd root 117605 1 0 18:01 ? 00:00:00 libvirtd --daemon --listen it is actually vdsmd issue, as it should stop libvirtd sysvinit if started before attempting to start libvirtd via upstart. as manual invocation of libvirtd via sysvinit is possible at any time. The sysv/initctl conflict is as old as RHEV. It should not block 3.5; `pkill libvirtd` is a workaround. QE were able to reproduce this easily when connecting a new engine to a host that was already installed (older build). Dan, why not fixing this for 3.5? Adding more info, by alonbl: Workaround: ========== stop libvirt restart vdsm when vdsm is started it will start libvirt properly only if libvirt was starting using service libvirtd start it will fail to do so Upgrade case (this bug was reported for clean installation): =========================================================== if for some reason the libvirtd is not stopped during upgrade and it was started using upstart and not sysinit then we have a problem as far as I seen upstrart manages the process right but again... if you have started libvirtd manually service libvirtd start upgrade will not be able to stop it properly (In reply to Gil Klein from comment #27) > QE were able to reproduce this easily when connecting a new engine to a host > that was already installed (older build). > > Dan, why not fixing this for 3.5? Because (as I said in comment 26) this problem is not new. It's is as old as RHEV's usage of libvirt. And since 3.5.0 is around the corner, it should not block on it. Unless I misunderstand something and there's a regression due to a change in RHEL-6.6 or vdsm? What made libvirtd start using sysv on the formerly-installed host? I can't say what made libvirtd using sysv. My concern is that QE are working in this flow for a while, and this issue only appeared lately in several test environments. The only change we did is moving more hosts to RHEL 6.6. Could you please invest some more effort in this issue before we are pushing it 3.6 without a clear explanation? Gil, I gave a very clear explanation: the problem --- as understood by me --- is ancient. If an admin starts up libvirtd via sysv, things break. HOWEVER, if you believe that rhev-3.5 or rhel-6.6 introduced some kind of a change that triggers this with no admin intervension, then this is a regression. If so, please assist in understanding what is the trigger: in what instances libvirtd is being run by sysv? After inspection, I confirm there is no regression here: our code now and always ignores the possibility of someone manually starting libvirtd with sysV, the only thing that we do is: upon installation if upstart is present (/sbin/initctl) and libvirt package has a file for upstart (libvirtd.upstart) we install it under /etc/init/. I would be glad to fix this for 3.6(If it will prove possible since it might be problematic). Alon: I noticed that on el6.6, when libvirt was initiated by sysV: sudo initctl status libvirtd libvirtd stop/waiting sudo service libvirtd status libvirtd (pid 8795) is running... And when it is run by upstart both return running. Can/Should I relay on this to add a validate check at configurator? (If libvirtd is running and upstart is present, and /etc/init/libvirt.conf installed and initctl returns false then return False) (In reply to Mooli Tayer from comment #32) > Can/Should I relay on this to add a validate check at configurator? > (If libvirtd is running and upstart is present, and /etc/init/libvirt.conf > installed and initctl returns false then return False) this is not configurator issue, as someone can start libvirt manually post deploy and when vdsm is started it will attempt to start another instance. from what I could see you can simply stop libvirt using service libvirtd stop command, if it is started via upstart then it will refuse to stop. (In reply to Alon Bar-Lev from comment #33) > (In reply to Mooli Tayer from comment #32) > > Can/Should I relay on this to add a validate check at configurator? > > (If libvirtd is running and upstart is present, and /etc/init/libvirt.conf > > installed and initctl returns false then return False) > > this is not configurator issue, as someone can start libvirt manually post > deploy and when vdsm is started it will attempt to start another instance. But configurator also runs when vdsm starts, so unless someone fiddles with libvirtd while vdsm is running we can handle it, no? ps it seems to me I cannot run two services at least not naively(see attached). > > from what I could see you can simply stop libvirt using service libvirtd > stop command, if it is started via upstart then it will refuse to stop. That should work and is easier(could be done at vdsm_init_common), however I would still prefer to put in configurator. There is additional logic that already sits there - we switch to sysV only when we have both sysV and upstart and libvirtd.upstart is packaged by libvirt. I would also prefer that the administrator will see (vdsm failed to start since libvirt is manged by sysV, please restart using upstart or use "vdsm-tool configure --force". ) Created attachment 959744 [details]
starting libvirt twice
(In reply to Mooli Tayer from comment #34) > I would also prefer that the administrator will see (vdsm failed to start > since libvirt is manged by sysV, please restart using upstart or use > "vdsm-tool configure --force". ) ok, this will be nice. you can use any mean to do so. probably get both status of upstart and sysvinit and decide. I upgrate engine to vt11 from vt10.1 Deplay 2 host to engine with vt10.1, hosts upgrate to vt11. Create new VM on one host, did migration --> migration pass. Reducing severity, as it requires a manual intervention in the service level in order to get to this use-case. Installing and using the host according to the documentation will not cause any issue with that. In 3.6 we use only systemd , this bug is not relevant anymore |