Bug 1320078 - RHEV deployment freezes at 87.5% when failed to attach storage, polling indefinitely
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Quickstart Cloud Installer
Classification: Red Hat
Component: Installation - RHEV
Version: 1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ga
Target Release: 1.0
Assignee: Dylan Murray
QA Contact: bmorriso
Docs Contact: Dan Macpherson
URL:
Whiteboard:
Duplicates: 1357684
Depends On:
Blocks: rhci-sprint-16 qci-sprint-17
 
Reported: 2016-03-22 09:43 UTC by Antonin Pagac
Modified: 2016-09-13 16:27 UTC
CC: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-13 16:27:20 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2016:1862 0 normal SHIPPED_LIVE Red Hat Quickstart Installer 1.0 2016-09-13 20:18:48 UTC

Description Antonin Pagac 2016-03-22 09:43:58 UTC
Description of problem:
During deployment configuration I mistakenly selected a non-empty folder to use as storage for RHEV, which caused RHEV to fail when adding the storage domain. From RHEV's engine.log:

"Message: Failed to attach Storage Domain my_storage to Data Center Default."

This is expected. However, it also caused RHCI to poll for the DataCenter all night; in the morning the deployment is still shown as in progress at 87.5%. From deployment.log:

"2016-03-22 05:37:49 [I] ================ Rhev::WaitForDataCenter get_status method ===================="

has been repeating for ~17 hours now.

Version-Release number of selected component (if applicable):
TP3 RC1

How reproducible:
Happened to me once

Steps to Reproduce:
1. Start a RHEV deployment.
2. Input non-empty folder to be used as a storage domain.
3. Installation of both host and engine completes successfully, but the RHEV DataCenter is not green and RHCI does not stop polling for its status.

Actual results:
RHCI polls for the DataCenter status indefinitely.

Expected results:
RHCI polls for the DataCenter status with a timeout (1 hour) and stops when it expires.

Additional info:
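For illustration, the expected bounded polling could look like the minimal Ruby sketch below. The method name, constants, and the block-based status check are hypothetical; this is not the actual Rhev::WaitForDataCenter implementation, just the shape of a poll loop that gives up after a deadline instead of spinning forever.

```ruby
require 'timeout'

POLL_INTERVAL = 30        # seconds between status checks (illustrative)
POLL_TIMEOUT  = 60 * 60   # give up after 1 hour, per the expected behavior

# Repeatedly evaluates the caller-supplied status check until it
# returns true, or raises once the deadline has passed.
def wait_for_data_center(poll_interval: POLL_INTERVAL, timeout: POLL_TIMEOUT)
  deadline = Time.now + timeout
  until yield
    if Time.now >= deadline
      raise Timeout::Error, "DataCenter did not come up within #{timeout}s"
    end
    sleep poll_interval
  end
  true
end
```

With a loop like this, a storage-attach failure would surface as a timed-out (failed) deployment after an hour rather than an 87.5% progress bar that never moves.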

Comment 1 John Matthews 2016-03-31 12:31:28 UTC
The commit below provides a warning, prior to starting the deployment, that the NFS path is non-empty.

https://github.com/fusor/fusor/pull/736


(Note: this commit is not in TP3; it's in our master branch, and we don't plan to bring it into TP3.)

Comment 2 bmorriso 2016-07-14 17:39:51 UTC
Failed to verify on compose QCI-1.2-RHEL-7-20160711.t.1

I went through two deployments for this and didn't receive a warning about a non-empty NFS share in either.

In the first case, the directory contained text files and directories that were added manually. No warning appeared, but the RHEV deployment succeeded anyway, and just left the added files where they were. 

In the second case, the directory contained the contents of a previous RHEV deployment that had since been torn down. No warning appeared, and in this case the RHEV deployment failed because it was unable to attach the storage domain.

From the production.log in both cases:

2016-07-13 16:01:33 [app] [I] Processing by Fusor::Api::V21::DeploymentsController#check_mount_point as JSON                                                                
2016-07-13 16:01:33 [app] [I]   Parameters: {"path"=>"/var/lib/exports/vms", "address"=>"192.168.165.254", "type"=>"NFS", "api_version"=>"v21", "id"=>"2", "deployment"=>{}}
2016-07-13 16:01:33 [app] [I] Completed 200 OK in 94ms (Views: 9.8ms | ActiveRecord: 4.9ms)
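For reference, the emptiness check behind an endpoint like check_mount_point can be sketched in Ruby as follows. The method names are illustrative and the warning text is copied from comment 17; the actual fusor controller action (which also mounts the NFS export before inspecting it) differs.

```ruby
require 'tmpdir'
require 'fileutils'

# True when the directory holds no entries besides '.' and '..'.
def storage_dir_empty?(dir)
  (Dir.entries(dir) - %w[. ..]).empty?
end

# Returns the warning string shown on the storage page, or nil
# when the share is safe to use. (Hypothetical helper.)
def mount_point_warning(dir, domain_name)
  return nil if storage_dir_empty?(dir)
  "Storage domain #{domain_name} is not empty. This could cause deployment problems."
end
```

The failure mode reported below is that the check itself was blocked: SELinux denied the Passenger-run Ruby process read access to the mounted NFS directory, so no warning could be produced.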

Comment 8 bmorriso 2016-07-25 20:23:15 UTC
*** Bug 1357684 has been marked as a duplicate of this bug. ***

Comment 9 Dylan Murray 2016-08-01 18:02:13 UTC
The ISO environment was failing due to an SELinux violation. I have submitted a PR to add read permissions on NFS shares to prevent the violation. https://github.com/fusor/fusor-selinux/pull/26

Comment 10 John Matthews 2016-08-02 18:44:58 UTC
QCI-1.0-RHEL-7-20160801.t.2-QCI-x86_64-dvd1.iso

Comment 11 Dylan Murray 2016-08-03 18:17:38 UTC
https://github.com/fusor/fusor-selinux/pull/27

PR to add the mounton capability as well.

Comment 12 John Matthews 2016-08-05 17:18:28 UTC
Will be in compose 8/5

Comment 13 bmorriso 2016-08-09 19:43:19 UTC
Failed QA on QCI-1.0-RHEL-7-20160808.t.0

I tested this twice with a non-empty NFS share. In the first instance, the share contained directories and text files added by me. In the second instance, it contained the files from a RHEV storage domain created in a previous deployment.

No warnings appeared on the RHV Storage or Installation Review pages related to non-empty NFS shares. These messages appeared in /var/log/audit/audit.log:

type=AVC msg=audit(1470757954.379:4579): avc:  denied  { open } for  pid=23336 comm="diagnostic_con*" path="/tmp/fusor-test-mount-1" dev="0:40" ino=2123793 scontext=system_u:system_r:passenger_t:s0 tcontext=system_u:object_r:nfs_t:s0 tclass=dir
type=SYSCALL msg=audit(1470757954.379:4579): arch=c000003e syscall=257 success=no exit=-13 a0=ffffffffffffff9c a1=7fe967f50e60 a2=90800 a3=0 items=0 ppid=22898 pid=23336 auid=4294967295 uid=989 gid=987 euid=989 suid=989 fsuid=989 egid=987 
sgid=987 fsgid=987 tty=(none) ses=4294967295 comm="diagnostic_con*" exe="/opt/rh/rh-ruby22/root/usr/bin/ruby" subj=system_u:system_r:passenger_t:s0 key=(null)
type=AVC msg=audit(1470757954.469:4580): avc:  denied  { open } for  pid=23336 comm="diagnostic_con*" path="/tmp/fusor-test-mount-1" dev="0:40" ino=2123794 scontext=system_u:system_r:passenger_t:s0 tcontext=system_u:object_r:nfs_t:s0 tclass=dir
type=SYSCALL msg=audit(1470757954.469:4580): arch=c000003e syscall=257 success=no exit=-13 a0=ffffffffffffff9c a1=7fe967f4f070 a2=90800 a3=0 items=0 ppid=22898 pid=23336 auid=4294967295 uid=989 gid=987 euid=989 suid=989 fsuid=989 egid=987 
sgid=987 fsgid=987 tty=(none) ses=4294967295 comm="diagnostic_con*" exe="/opt/rh/rh-ruby22/root/usr/bin/ruby" subj=system_u:system_r:passenger_t:s0 key=(null)
type=AVC msg=audit(1470757965.553:4581): avc:  denied  { open } for  pid=23336 comm="diagnostic_con*" path="/tmp/fusor-test-mount-1" dev="0:40" ino=2123793 scontext=system_u:system_r:passenger_t:s0 tcontext=system_u:object_r:nfs_t:s0 tclass=dir
type=SYSCALL msg=audit(1470757965.553:4581): arch=c000003e syscall=257 success=no exit=-13 a0=ffffffffffffff9c a1=7fe9659e7e80 a2=90800 a3=0 items=0 ppid=22898 pid=23336 auid=4294967295 uid=989 gid=987 euid=989 suid=989 fsuid=989 egid=987 
sgid=987 fsgid=987 tty=(none) ses=4294967295 comm="diagnostic_con*" exe="/opt/rh/rh-ruby22/root/usr/bin/ruby" subj=system_u:system_r:passenger_t:s0 key=(null)
type=AVC msg=audit(1470757965.631:4582): avc:  denied  { open } for  pid=23336 comm="diagnostic_con*" path="/tmp/fusor-test-mount-1" dev="0:40" ino=2123794 scontext=system_u:system_r:passenger_t:s0 tcontext=system_u:object_r:nfs_t:s0 tclass=dir
type=SYSCALL msg=audit(1470757965.631:4582): arch=c000003e syscall=257 success=no exit=-13 a0=ffffffffffffff9c a1=7fe959f97b30 a2=90800 a3=0 items=0 ppid=22898 pid=23336 auid=4294967295 uid=989 gid=987 euid=989 suid=989 fsuid=989 egid=987 
sgid=987 fsgid=987 tty=(none) ses=4294967295 comm="diagnostic_con*" exe="/opt/rh/rh-ruby22/root/usr/bin/ruby" subj=system_u:system_r:passenger_t:s0 key=(null)
type=AVC msg=audit(1470757993.331:4583): avc:  denied  { open } for  pid=23336 comm="diagnostic_con*" path="/tmp/fusor-test-mount-1" dev="0:40" ino=2123793 scontext=system_u:system_r:passenger_t:s0 tcontext=system_u:object_r:nfs_t:s0 tclass=dir
type=SYSCALL msg=audit(1470757993.331:4583): arch=c000003e syscall=257 success=no exit=-13 a0=ffffffffffffff9c a1=7fe96d2effd0 a2=90800 a3=0 items=0 ppid=22898 pid=23336 auid=4294967295 uid=989 gid=987 euid=989 suid=989 fsuid=989 egid=987 
sgid=987 fsgid=987 tty=(none) ses=4294967295 comm="diagnostic_con*" exe="/opt/rh/rh-ruby22/root/usr/bin/ruby" subj=system_u:system_r:passenger_t:s0 key=(null)
type=AVC msg=audit(1470757993.421:4584): avc:  denied  { open } for  pid=23336 comm="diagnostic_con*" path="/tmp/fusor-test-mount-1" dev="0:40" ino=2123794 scontext=system_u:system_r:passenger_t:s0 tcontext=system_u:object_r:nfs_t:s0 tclass=dir

Comment 14 Thom Carlin 2016-08-10 16:14:58 UTC
Note the AVCs appear on the Sat 6 server (acting as the NFS server).

Comment 15 Dylan Murray 2016-08-10 19:59:51 UTC
Added open permissions for passenger_t to resolve the above SELinux problems. https://github.com/fusor/fusor-selinux/pull/29

Comment 16 John Matthews 2016-08-15 17:56:59 UTC
Expected to be in QCI-1.0-RHEL-7-20160815.t.0-QCI-x86_64-dvd1.iso

Comment 17 bmorriso 2016-08-17 18:47:27 UTC
Verified in compose QCI-1.0-RHEL-7-20160817.t.0

Using a non-empty NFS share, I received a warning on "2D. Storage":

'Storage domain my_storage is not empty. This could cause deployment problems.'

Comment 19 errata-xmlrpc 2016-09-13 16:27:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2016:1862

