Bug 1151835 - [PPC] Vdsm fails to reconstruct master domain, sanlock causes hosts to reboot instead of vdsmd restart
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.5.0
Hardware: ppc64
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 3.4.3
Assignee: Nir Soffer
QA Contact: Ori Gofen
URL:
Whiteboard: storage
Depends On: 1142454 1152594 1156017
Blocks: 1122979 1148013
Reported: 2014-10-12 13:13 UTC by Ori Gofen
Modified: 2016-02-10 17:41 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-11-10 13:09:09 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments
vdsm+engine logs (290.63 KB, application/x-bzip), 2014-10-12 13:13 UTC, Ori Gofen

Description Ori Gofen 2014-10-12 13:13:15 UTC
Created attachment 946094
vdsm+engine logs

Description of problem:

This is a clone of BZ #1141658, though the consequences are far worse.
The rebooted host never comes back up because of a wrong PPC boot configuration, which makes the whole setup useless.

We also need to make sure that the sanlock fix for rhel7 (BZ #1142454) will be available for the rhev-m/IBM/fedora OS as well.

Version-Release number of selected component (if applicable):

How reproducible:
100%

See BZ #1141658 for a fuller description.

Comment 1 Nir Soffer 2014-10-12 13:45:19 UTC
(In reply to Ori from comment #0)
It is not clear what "wrong PPC boot configuration" means, or how it is related to vdsm.

If the root cause of this bug is the same as in bug 1141658, then this is a duplicate.

If this bug describes another issue (wrong boot configuration?), then it is not a vdsm bug.

Also I don't see how reconstruct master is related to bug 1141658. That bug is about blocking access to storage, which leads to reboot of the machine, because sanlock cannot terminate or kill vdsm (because selinux policy is incorrect).

To make it more clear, please attach these logs:
- /var/log/sanlock.log
- /var/log/audit/audit.log*

Also missing:
- vdsm version
- steps to reproduce
- reproducible: 100% - how many times did you reproduce this?
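
A minimal sketch of how the requested audit logs could be checked for SELinux denials involving sanlock. It assumes the standard auditd AVC record layout (`avc:  denied  { ... }` together with a `comm="..."` field). The helper names `sanlock_denials` and `scan_audit_logs` are hypothetical and are not part of vdsm or sanlock.

```python
import glob
import re

# Assumption: standard auditd AVC record layout; these patterns are not
# derived from the logs attached to this bug.
AVC_DENIED = re.compile(r"avc:\s+denied\s+\{(?P<perms>[^}]*)\}")
COMM = re.compile(r'comm="(?P<comm>[^"]*)"')


def sanlock_denials(lines, comm="sanlock"):
    """Yield (permissions, line) for AVC denials logged for the given command."""
    for line in lines:
        denied = AVC_DENIED.search(line)
        who = COMM.search(line)
        if denied and who and who.group("comm") == comm:
            yield denied.group("perms").split(), line.rstrip("\n")


def scan_audit_logs(pattern="/var/log/audit/audit.log*"):
    """Collect matching denials from every audit log matching the glob."""
    hits = []
    for path in sorted(glob.glob(pattern)):
        # Reading the audit logs normally requires root privileges.
        with open(path, errors="replace") as f:
            hits.extend(sanlock_denials(f))
    return hits
```

Calling `scan_audit_logs()` as root on the affected host would list any denials recorded for sanlock. An empty result would suggest that the reboot was not caused by the selinux policy issue discussed in bug 1141658.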

Comment 2 Gilad Lazarovich 2014-10-13 08:49:45 UTC
Nir,
This bug is related to BZ #1141658, but it pertains to PPC (RHEV for IBM PPC). We require separate bugs given the different platforms and multiple validations.

Regarding the "wrong PPC boot configuration": the reboot caused by selinux blocking the SAN locking mechanism ends with the OS not booting up (wrong petitboot configuration). This is indeed not vdsm related; it is just my observation.

Per comment https://bugzilla.redhat.com/show_bug.cgi?id=1141658#c4, the reconstruct master is related to this bug.

This is 100% reproducible as stated in the bug; I reproduced it 4 times. I will attach the requested logs with the 5th try.

Thanks, Ori.

Comment 3 Nir Soffer 2014-10-13 10:14:24 UTC
(In reply to Gilad Lazarovich from comment #2)
> Per comment https://bugzilla.redhat.com/show_bug.cgi?id=1141658#c4, the
> reconstruct master is related to this bug.

The issue is not related in any way to reconstruct master.

The issue (bug 1141658) is:
1. blocking access to storage
2. sanlock tries to stop the SPM and fails (selinux policy bug)
3. sanlock reboots the machine (expected behavior)

Reconstruct master did not happen because the machine was rebooted.

> This is 100% reproducible as stated in the bug, I reproduced this 4 times. 
> I will attach the logs requested with the 5th try.

Do not forget the steps to reproduce. Currently we can do nothing with this bug.

Comment 7 Ori Gofen 2014-10-19 08:45:54 UTC
Sorry Nir, PPC OS is taking its first QA steps, which means that at any given time some engineers are running tests on the system. I cannot make any changes to SELinux; in addition, from https://brewweb.devel.redhat.com/taskinfo?taskID=8095125 it's not clear whether the packages support PPC, and I don't want to break any running tests.

Comment 8 Nir Soffer 2014-10-19 09:07:33 UTC
(In reply to Ori from comment #7)
> Sorry Nir, PPC OS is taking its first QA steps, which means that at any
> given time some engineers are running tests on the system. I cannot make
> any changes to SELinux; in addition, from
> https://brewweb.devel.redhat.com/taskinfo?taskID=8095125 it's not clear
> whether the packages support PPC, and I don't want to break any running tests.

Then this bug will have to wait until you have a release including this selinux policy.

Comment 9 Allon Mureinik 2014-10-22 11:50:44 UTC
(In reply to Ori from comment #7)
> Sorry Nir, PPC OS is taking its first QA steps, which means that at any
> given time some engineers are running tests on the system. I cannot make
> any changes to SELinux; in addition, from
> https://brewweb.devel.redhat.com/taskinfo?taskID=8095125 it's not clear
> whether the packages support PPC, and I don't want to break any running tests.

It's a noarch package, so it supports PPC.
Please schedule yourself a window on the PPC machine and test with this package.

Comment 10 Ori Gofen 2014-10-22 13:07:09 UTC
Will do it the next time I have those hosts.

Comment 13 Michal Skrivanek 2014-10-23 09:55:22 UTC
this is already in 3.4.3, ON_QA for retest…

Comment 14 Ori Gofen 2014-10-23 12:09:09 UTC
BZ #1156017 has been marked as a blocker due to a lack of storage resources on the host.
I need 2 domains with different IPs to verify the correct behavior; right now the creation of any NFS storage domain is impossible.

Comment 16 Ori Gofen 2014-10-28 10:50:58 UTC
Verified on the latest PowerKVM version.
The whole operation takes longer than usual, though:
the normal time is about 15 minutes, whereas reconstruct on the PPC setup took about 22 minutes.

Comment 17 Michal Skrivanek 2014-11-10 13:09:09 UTC
released Oct 31

