Bug 1172387
Summary: | Failed to start domain continually due to virDBusCall() during concurrent jobs | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Hu Jianwei <jiahu> | |
Component: | systemd | Assignee: | systemd-maint | |
Status: | CLOSED ERRATA | QA Contact: | Frantisek Sumsal <fsumsal> | |
Severity: | medium | Docs Contact: | ||
Priority: | unspecified | |||
Version: | 7.1 | CC: | dyuan, ebarrera, fsumsal, honzhang, jherrman, lnykryn, lyarwood, mzhan, pdhange, psklenar, rbalakri, shyu, solganik, systemd-maint-list | |
Target Milestone: | rc | Keywords: | Reopened, ZStream | |
Target Release: | --- | |||
Hardware: | x86_64 | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | systemd-219-1.el7 | Doc Type: | Bug Fix | |
Doc Text: |
Previously, migrating several guests at once in some cases failed with a "did not receive reply" error. This update improves the responsiveness of systemd when handling multiple guests at the same time, which prevents the described problem from occurring.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1243401 | Environment: | ||
Last Closed: | 2015-11-19 15:02:17 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1243401 |
Description
Hu Jianwei
2014-12-10 01:02:58 UTC
Libvirt currently uses a timeout of 30 seconds for all DBus calls. If systemd is unable to reply to us in that time, I don't think there's much we can do. I mean, yeah, we can lift the timeout, but sooner or later you'll hit it again; just migrate more domains at once.

I've discussed this bug on the internal call and nobody seems to have any bright idea. I mean, the timeout is big already, so I don't think raising it will help anything. Therefore I'm closing this one. I mean, if the host is under such heavy load that a couple of mkdirs() take systemd more than 30 seconds, you certainly don't want to run virtual machines there.

Hi, I encountered a similar issue, and investigation showed that the reason for the issue is the behavior of systemd-machined.
See the detailed description and the bug itself filed on the CentOS bugzilla:
https://bugs.centos.org/view.php?id=8564
To summarize, it seems that machined might miss handling dbus messages in some cases, which causes libvirt to fail to spawn the VM.

(In reply to Solganik Alexander from comment #4)
> Hi, I had encountered a similar issue and investigation showed that the
> reason for the issue is the behavior of systemd-machined.
> See the detailed description and the bug itself filed on Centos bugzilla
> https://bugs.centos.org/view.php?id=8564
> To summarize, seems that machined might miss handling dbus messages in some
> cases which causes libvirt to fail spawning vm.

This confirms my initial suspicion that libvirt's innocent. Switching over to systemd.

Would you mind sending that patch to the systemd-devel mailing list, unless you have already done so?

Can you please test https://copr.fedoraproject.org/coprs/lnykryn/systemd/ ? This should be in 7.2.

You mean the src rpm below from your link? I cannot find the related patch (mentioned in comment 7) in that rpm. Could you give me a scratch build?
https://people.redhat.com/lnykryn/systemd/systemd-219-3.el7.1.src.rpm

Sorry, please ignore my comment 9.
After using your repo, I installed the systemd version below, right?

    [root@localhost rpmbuild]# rpm -qa| grep systemd
    systemd-devel-219-3.el7.centos.1.x86_64
    systemd-sysv-219-3.el7.centos.1.x86_64
    systemd-python-219-3.el7.centos.1.x86_64
    systemd-debuginfo-208-18.el7.x86_64
    systemd-libs-219-3.el7.centos.1.x86_64
    systemd-219-3.el7.centos.1.x86_64

If the version is right, I'll do some testing next week, thanks.

Yep, that patch is not there; the whole loop was rewritten.

Were you able to test the rebased version?

If you give me a candidate version or link, I can help to test it. Thanks.

Have you tried the version from my copr repo?
https://copr.fedoraproject.org/coprs/lnykryn/systemd/

When I use the version you mentioned, the issue disappears, so I think the patch works.

    [root@localhost reproduce]# rpm -q systemd libvirt
    systemd-219-3.el7.centos.1.x86_64
    libvirt-1.2.15-2.el7.x86_64

I can still reproduce it on the old version below:

    [root@localhost reproduce]# rpm -q systemd libvirt
    systemd-208-20.el7.x86_64
    libvirt-1.2.15-2.el7.x86_64
    ...
    Domain test saved to /test/save-dir/test.save
    error: Failed to restore domain from /test/save-dir/usb.save
    error: error from service: CreateMachine: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
    ...

I think you can move on to the next step.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHBA-2015-2092.html
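
For readers following the timeout argument made early in this thread (a fixed 30-second DBus timeout will eventually be exceeded if enough domains are handled at once), the arithmetic can be sketched as below. This is only an illustrative model, not how systemd or libvirt actually behave: it assumes all requests arrive together and are answered strictly one at a time, and the function name `first_request_to_time_out` and the per-request times are hypothetical. The thread itself later attributes the failure to systemd-machined missing dbus messages rather than to raw load.

```python
import math


def first_request_to_time_out(service_time_s: float, timeout_s: float = 30.0) -> int:
    """Return the 1-based queue position of the first request that times out.

    Simplified model (an assumption, not systemd's real behaviour): all
    requests arrive at once and are answered strictly one at a time, each
    taking service_time_s seconds. timeout_s defaults to the 30-second
    DBus timeout mentioned in the thread.
    """
    if service_time_s <= 0 or timeout_s <= 0:
        raise ValueError("service_time_s and timeout_s must be positive")
    # Request k is answered after k * service_time_s seconds, so it times
    # out once k * service_time_s exceeds timeout_s.
    return math.floor(timeout_s / service_time_s) + 1


if __name__ == "__main__":
    # Hypothetical per-request handling times, in seconds.
    for per_request in (0.5, 2.0, 5.0):
        print(per_request, first_request_to_time_out(per_request))
```

Under this model, raising the timeout only moves the threshold: doubling `timeout_s` roughly doubles the number of concurrent requests that fit, which matches the point that the limit would be hit again by migrating more domains at once.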