Bug 1582379
| Summary: | All hosts stuck in connecting/not responding state until engine restarted | ||
|---|---|---|---|
| Product: | [oVirt] vdsm-jsonrpc-java | Reporter: | Germano Veit Michel <gveitmic> |
| Component: | Core | Assignee: | Ravi Nori <rnori> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Pavol Brilla <pbrilla> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | ||
| Version: | --- | CC: | bugs, ebenahar, fromani, gveitmic, mgoldboi, mperina, nicolas, nsoffer, pkliczew, rnori |
| Target Milestone: | ovirt-4.2.7 | Flags: | rule-engine: ovirt-4.2+ |
| Target Release: | 1.4.15 | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | v1.4.15 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-11-02 14:30:44 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Germano Veit Michel
2018-05-25 04:03:23 UTC
Germano, it sounds like vdsm crashed (e.g. segfault) on host 3, and we would like to understand why. Can you attach /var/log/messages from this host, and the relevant abrt crash reports?

This probably needs a separate bug, feel free to open one for vdsm.

(In reply to Nir Soffer from comment #11)
> Germano, it sounds like vdsm crashed (e.g. segfault) on host 3, and we would
> like to understand why. Can you attach /var/log/messages from this host, and
> the relevant abrt crash reports?
>
> This probably needs a separate bug, feel free to open one for vdsm.

Yes, this is the reboot I mentioned in comment #0. It was not vdsm that crashed. The host had a kernel panic in kvm and rebooted; this host doesn't have much memory and there is some cache flushing involved. I'll take a closer look and submit a kernel bug later if necessary.

So I think the only thing to be done here is to make the engine more resilient to vdsm/host failures?

(In reply to Germano Veit Michel from comment #12)
>
> So I think the only thing to be done here is to make the engine more
> resilient to vdsm/host failures?

Yes, now we need to reproduce it, see exactly how SSLEngine behaves in a similar situation, and handle it correctly.

Reducing the priority since this is a corner case; however, we would probably like to target this one to 4.3, pending Piotr's analysis.

Ravi, did you try to reproduce the issue?

@Piotr I was unable to reproduce the issue.

Germano, any ideas how to reproduce?

(In reply to Piotr Kliczewski from comment #17)
> Germano, any ideas how to reproduce?

Unfortunately, no. I tried a few more times to repeat what I was doing as per comment #0, with no luck. I've been using the same environment for some time, and it has been all good.

Can't you attempt to force such a situation by modifying the code?

(In reply to Germano Veit Michel from comment #18)
>
> Can't you attempt to force such a situation by modifying the code?

Let's try to do it. I will talk to Ravi about what needs to be done.
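The thread concludes that the engine side has to cope with a host that disappears abruptly (here, a kernel panic followed by a reboot), in which case the peer loses all TLS state without a clean shutdown. Below is a minimal, hypothetical Python sketch of that idea, not the actual vdsm-jsonrpc-java fix: `FakeHost`, `FakeChannel`, `HostClient`, and `HostState` are illustrative names, and the rule it demonstrates (never reuse a channel after an I/O failure; open a fresh one on the next poll) is an assumption about what "handle it correctly" could mean.

```python
from enum import Enum


class HostState(Enum):
    UP = "Up"
    CONNECTING = "Connecting"
    NOT_RESPONDING = "NotResponding"


class FakeHost:
    """Simulated host (hypothetical); boot_id changes on every reboot."""

    def __init__(self) -> None:
        self.up = True
        self.boot_id = 0

    def crash(self) -> None:
        self.up = False

    def reboot(self) -> None:
        self.up = True
        self.boot_id += 1

    def connect(self) -> "FakeChannel":
        if not self.up:
            raise ConnectionRefusedError("host is down")
        return FakeChannel(self, self.boot_id)


class FakeChannel:
    """Simulated TLS session bound to one boot of the host (hypothetical)."""

    def __init__(self, host: FakeHost, boot_id: int) -> None:
        self._host = host
        self._boot_id = boot_id

    def read(self) -> bytes:
        # A crashed peer never sends a TLS close_notify, and a session opened
        # before the crash stays unusable even after the host comes back.
        if not self._host.up or self._host.boot_id != self._boot_id:
            raise ConnectionResetError("stale session")
        return b"heartbeat"


class HostClient:
    """Hypothetical monitor that never reuses a channel after a failure."""

    def __init__(self, connect) -> None:
        self._connect = connect  # returns a new channel or raises OSError
        self._channel = None
        self.state = HostState.CONNECTING

    def poll(self) -> HostState:
        """One monitoring cycle: (re)connect if needed, then read a heartbeat."""
        try:
            if self._channel is None:
                self.state = HostState.CONNECTING
                self._channel = self._connect()
            self._channel.read()
            self.state = HostState.UP
        except OSError:
            # Discard the broken channel so the next poll opens a fresh one,
            # instead of waiting forever on a dead session.
            self._channel = None
            self.state = HostState.NOT_RESPONDING
        return self.state


if __name__ == "__main__":
    host = FakeHost()
    client = HostClient(host.connect)
    print(client.poll())  # HostState.UP
    host.crash()
    print(client.poll())  # HostState.NOT_RESPONDING
    host.reboot()
    print(client.poll())  # HostState.UP
```

The design choice that matters in this sketch is treating any error on an existing channel as terminal for that channel. A client that kept the old reference would stay in a non-Up state after the host rebooted, which resembles the symptom in the summary, and only discarding it (or restarting the client) would recover.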
Verification steps?

Verification steps:
1. Have vdsm in Up status.
2. Kill the vdsm host.
3. Start the vdsm host and make sure vdsm is running.

The host should move to Up status. Repeat the above 20 times to make sure everything works.

Ran 35 times with 0 issues; the host always went Up.

*** Bug 1641836 has been marked as a duplicate of this bug. ***

This bugzilla is included in the oVirt 4.2.7 release, published on November 2nd 2018.

Since the problem described in this bug report should be resolved in the oVirt 4.2.7 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

*** Bug 1657852 has been marked as a duplicate of this bug. ***
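The verification procedure amounts to a loop that counts the cycles in which the host fails to return to Up. Below is a minimal Python sketch, assuming hypothetical callbacks `kill_host`, `start_host`, and `host_status`. In a real setup these might wrap power management and an engine status query, which are not shown. The sketch also assumes the status is reported as the string "Up", and the timeout values are arbitrary placeholders.

```python
import time
from typing import Callable


def wait_for_status(get_status: Callable[[], str], wanted: str,
                    timeout_s: float, poll_s: float) -> bool:
    """Poll get_status() until it returns `wanted` or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if get_status() == wanted:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


def verify_recovery(cycles: int,
                    kill_host: Callable[[], None],
                    start_host: Callable[[], None],
                    host_status: Callable[[], str],
                    timeout_s: float = 600.0,
                    poll_s: float = 5.0) -> int:
    """Repeat kill/start cycles; return how many cycles never reached Up."""
    # Step 1: the host must be Up before the first cycle.
    if not wait_for_status(host_status, "Up", timeout_s, poll_s):
        raise RuntimeError("host is not Up before the test starts")
    failures = 0
    for _ in range(cycles):
        kill_host()   # step 2 (hypothetical callback)
        start_host()  # step 3: vdsm is expected to start with the host
        if not wait_for_status(host_status, "Up", timeout_s, poll_s):
            failures += 1
    return failures
```

Counting failures instead of stopping at the first one matches the way the result is reported above (number of runs, number of issues).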