Bug 1632128
| Summary: | Invalid status on Data Center/lvm segfault | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | [oVirt] vdsm | Reporter: | oliver.albl | ||||||||||
| Component: | Core | Assignee: | bugs <bugs> | ||||||||||
| Status: | CLOSED WONTFIX | QA Contact: | mlehrer | ||||||||||
| Severity: | medium | Docs Contact: | |||||||||||
| Priority: | unspecified | ||||||||||||
| Version: | 4.20.31 | CC: | bugs, dagur, eshenitz, lsvaty, maurizio.antillon, oliver.albl, tnisan, vjuranek | ||||||||||
| Target Milestone: | --- | Keywords: | Performance | ||||||||||
| Target Release: | --- | Flags: | rule-engine: ovirt-4.3+ | ||||||||||
| Hardware: | x86_64 | ||||||||||||
| OS: | Linux | ||||||||||||
| Whiteboard: | |||||||||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||||||
| Doc Text: | Story Points: | --- | |||||||||||
| Clone Of: | Environment: | ||||||||||||
| Last Closed: | 2021-03-09 07:56:54 UTC | Type: | Bug | ||||||||||
| Regression: | --- | Mount Type: | --- | ||||||||||
| Documentation: | --- | CRM: | |||||||||||
| Verified Versions: | Category: | --- | |||||||||||
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |||||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||||
| Embargoed: | |||||||||||||
| Attachments: | |||||||||||||
Created attachment 1486317 [details]
Logfiles (vdsm.log, messages and core files) from server 1
Created attachment 1486318 [details]
Logfiles (vdsm.log, messages and core files) from server 2
This bug has not been marked as blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Hi Oliver,
Does this issue still occur?

Hi Tal,
no, I did not see another occurrence of this situation.

Created attachment 1542701 [details]
Logfiles (vdsm.log, messages) from SPM
Hi Tal,
today, we saw the problem again (10:19). I attached vdsm.log and messages from SPM.

Vojtech, please try to understand from the logs what the issue is here.

Looking at the initial description, this looks like an RHV scale issue. Changing QA contact to scale team leader Mordechai.
Oliver, can you please add details of exactly what workloads are used here?

I run an oVirt installation with 50 hosts and 45 FC storage domains connected to two all-flash arrays. The datacenter has eight clusters; the largest cluster has 20 hosts. The main workload comes from automatically creating VMs from templates/clones (up to 100-200 new VMs in 5-10 minutes), running automatic test workloads within the VMs, and removing the VMs.

low reproducibility -> lowering severity, CLOSE?

This random issue suggests that there might be an environmental issue here. Also, since there is low reproducibility, I suggest closing the bug and re-opening it if we see it again.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

Created attachment 1486316 [details]
Engine log

Description of problem:
I had two occurrences of "Invalid status on Data Center <name>. Setting status to Non Responsive.". It seems this is caused by an lvm segfault (kernel: lvm[274391]: segfault at 18 ip 00007f9a03d30905 sp 00007ffd10f64740 error 4 in libc-2.17.so[7f9a03cae000+1c3000])

Version-Release number of selected component (if applicable):
vdsm-4.20.39.1-1.el7.x86_64
oVirt 4.2.6.4-1.el7
Linux <server> 3.10.0-862.11.6.el7.x86_64 #1 SMP Tue Aug 14 21:49:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
Generate load on hosts by automatically creating vms, running vm workloads, deleting vms.

Steps to Reproduce:
1.
2.
3.

Actual results:
SPM change, datacenter error

Expected results:

Additional info:
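
For anyone triaging a similar report, the kernel segfault line quoted in the description can be decoded mechanically. The following is only a sketch, not part of vdsm or lvm: the helper name `decode_segfault` and the regular expression are illustrative assumptions, and the error-code bit meanings are the x86 page-fault ones (bit 0 = page present, bit 1 = write access, bit 2 = user mode). It extracts the faulting address and computes the instruction offset inside the mapped library by subtracting the mapping base from `ip`.

```python
import re

# Illustrative pattern for the x86 Linux "segfault at ..." kernel log line.
# All numeric fields in this message are hexadecimal except the PID.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<obj>\S+?)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_segfault(line):
    """Hypothetical helper: split a kernel segfault line into its fields."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    err = int(m.group("err"), 16)
    info = {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "fault_addr": int(m.group("addr"), 16),
        "ip": int(m.group("ip"), 16),
        # x86 page-fault error code bits
        "page_present": bool(err & 1),
        "access": "write" if err & 2 else "read",
        "user_mode": bool(err & 4),
        "object": m.group("obj"),
        "offset": None,
    }
    if m.group("base") is not None:
        # Offset of the faulting instruction within the mapped object.
        info["offset"] = info["ip"] - int(m.group("base"), 16)
    return info


# The line from the description of this bug.
LINE = (
    "kernel: lvm[274391]: segfault at 18 ip 00007f9a03d30905 "
    "sp 00007ffd10f64740 error 4 in libc-2.17.so[7f9a03cae000+1c3000]"
)

info = decode_segfault(LINE)
print(info["comm"], info["object"], hex(info["fault_addr"]), hex(info["offset"]))
print(info["access"], "user" if info["user_mode"] else "kernel", info["page_present"])
```

For the line in this report, error 4 decodes to a user-mode read of a page that was not mapped, and the fault address 0x18 is a small value, which is often what a dereference through a NULL pointer at a small field offset looks like. The computed offset into libc-2.17.so could then be resolved to a function with a symbolizer such as `addr2line -e`, provided debug symbols matching that exact libc build are available.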