Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1632128

Summary: Invalid status on Data Center/lvm segfault
Product: [oVirt] vdsm
Reporter: oliver.albl
Component: Core
Assignee: bugs <bugs>
Status: CLOSED WONTFIX
QA Contact: mlehrer
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.20.31
CC: bugs, dagur, eshenitz, lsvaty, maurizio.antillon, oliver.albl, tnisan, vjuranek
Target Milestone: ---
Keywords: Performance
Target Release: ---
Flags: rule-engine: ovirt-4.3+
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-03-09 07:56:54 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  Engine log (flags: none)
  Logfiles (vdsm.log, messages and core files) from server 1 (flags: none)
  Logfiles (vdsm.log, messages and core files) from server 2 (flags: none)
  Logfiles (vdsm.log, messages) from SPM (flags: none)

Description oliver.albl 2018-09-24 07:44:08 UTC
Created attachment 1486316 [details]
Engine log

Description of problem:
I had two occurrences of "Invalid status on Data Center <name>. Setting status to Non Responsive.". This appears to be caused by an lvm segfault:

kernel: lvm[274391]: segfault at 18 ip 00007f9a03d30905 sp 00007ffd10f64740 error 4 in libc-2.17.so[7f9a03cae000+1c3000]
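The kernel log line above can be decoded to locate the fault: the faulting address 0x18 suggests a read through a near-NULL struct pointer, and subtracting the module base from the instruction pointer gives the offset inside libc for symbolization. A minimal standalone sketch (not part of vdsm; the regex and variable names are my own):

```python
import re

# Kernel segfault report format:
#   "segfault at <addr> ip <ip> sp <sp> error <code> in <module>[<base>+<size>]"
LINE = ("lvm[274391]: segfault at 18 ip 00007f9a03d30905 sp 00007ffd10f64740 "
        "error 4 in libc-2.17.so[7f9a03cae000+1c3000]")

m = re.search(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) .*"
    r"error (?P<err>\d+) in (?P<mod>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]",
    LINE,
)
addr = int(m["addr"], 16)                        # faulting address: 0x18, near-NULL
offset = int(m["ip"], 16) - int(m["base"], 16)   # instruction offset inside libc
err = int(m["err"])                              # x86 page-fault error code bits

print(f"fault address: {addr:#x}")
# The offset can be symbolized against the matching debuginfo, e.g.
# addr2line -e /usr/lib64/libc-2.17.so <offset>
print(f"libc offset:   {offset:#x}")
# bit 2 = fault from user mode, bit 1 = write access (clear means a read)
print(f"user-mode: {bool(err & 4)}, write: {bool(err & 2)}")
```

For this log line the computed libc offset is 0x82905, and error 4 indicates a user-mode read of an unmapped page.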

Version-Release number of selected component (if applicable):
vdsm-4.20.39.1-1.el7.x86_64
oVirt 4.2.6.4-1.el7
Linux <server> 3.10.0-862.11.6.el7.x86_64 #1 SMP Tue Aug 14 21:49:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
Generate load on hosts by automatically creating VMs, running VM workloads, and deleting VMs.

Steps to Reproduce:
1.
2.
3.

Actual results:
SPM change, datacenter error

Expected results:


Additional info:

Comment 1 oliver.albl 2018-09-24 07:44:57 UTC
Created attachment 1486317 [details]
Logfiles (vdsm.log, messages and core files) from server 1

Comment 2 oliver.albl 2018-09-24 07:45:24 UTC
Created attachment 1486318 [details]
Logfiles (vdsm.log, messages and core files) from server 2

Comment 3 Sandro Bonazzola 2019-01-28 09:36:36 UTC
This bug has not been marked as blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 4 Tal Nisan 2019-03-01 03:26:47 UTC
Hi Oliver,
Does this issue still occur?

Comment 5 oliver.albl 2019-03-01 07:42:31 UTC
Hi Tal,
no, I did not see another occurrence of this situation.

Comment 6 oliver.albl 2019-03-10 20:49:14 UTC
Created attachment 1542701 [details]
Logfiles (vdsm.log, messages) from SPM

Hi Tal,
today we saw the problem again (at 10:19). I have attached vdsm.log and messages from the SPM.

Comment 8 Tal Nisan 2019-11-18 12:40:20 UTC
Vojtech, please try to determine from the logs what the issue is here.

Comment 9 Avihai 2019-11-24 09:28:31 UTC
Looking at the initial description, this looks like a RHV scale issue.
Changing QA contact to the scale team leader, Mordechai.

Oliver, can you please add details of exactly what workloads are used here?

Comment 10 oliver.albl 2019-11-24 12:29:44 UTC
I run an oVirt installation with 50 hosts and 45 FC storage domains connected to two all flash arrays. The datacenter has eight clusters, the largest cluster has 20 hosts. Main workload is created by automatically creating VMs from templates/clones (up to 100-200 new VMs in 5-10 minutes), running automatic test workload within the VMs and removing VMs.

Comment 11 Lukas Svaty 2020-04-03 09:26:07 UTC
low reproducibility -> lowering severity, CLOSE?

Comment 18 Eyal Shenitzky 2021-03-09 07:56:54 UTC
The randomness of this issue suggests that there might be an environmental problem here.
Also, since reproducibility is low, I suggest closing the bug and re-opening it if we see it again.

Comment 19 Red Hat Bugzilla 2023-09-15 00:12:34 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days