Description of problem:
After blocking the connection to the master data domain (NFS) on the SPM, the available iSCSI domain did not become the new master data domain.

Version-Release number of selected component (if applicable):
3.5 vt3.1

How reproducible:
100%

Steps to Reproduce:
1. On a setup with a single vdsm host, create 1 NFS data storage domain (master)
2. Create 2 iSCSI domains
3. Block the connection between the vdsm host and the current master domain (NFS)

Actual results:
The oVirt engine does not migrate the master role to one of the iSCSI domains, even though they are available and accessible

Expected results:
One of the available iSCSI domains should be selected as the new master data domain
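For step 3, blocking the connection is typically done with iptables rules on the vdsm host (a later comment notes the rules are cleaned on reboot). A minimal sketch, assuming a hypothetical NFS server address of 10.35.0.1 (replace with the real storage server IP):

```shell
# Hypothetical address of the NFS master domain's server -- substitute
# the actual storage server IP from your setup. Requires root.
NFS_SERVER=10.35.0.1

# Drop all traffic to/from the NFS server to simulate the outage.
iptables -A OUTPUT -d "$NFS_SERVER" -j DROP
iptables -A INPUT  -s "$NFS_SERVER" -j DROP

# To restore connectivity afterwards, delete the same rules.
iptables -D OUTPUT -d "$NFS_SERVER" -j DROP
iptables -D INPUT  -s "$NFS_SERVER" -j DROP
```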
Created attachment 943988 [details]
image and logs
lkuchlan, the other iSCSI domain in your setup (called 'iscsi') is in maintenance, so it's not available to become the new master anyway.

The main issue here seems to be that vdsm isn't being fenced by sanlock after the connection to the master domain is blocked. This leads to the engine/vdsm constantly failing with connection timeout errors instead of failing with a StoragePoolMasterNotFound exception, which would lead to a reconstruct.

lkuchlan, can you please try to reproduce and also attach the sanlock logs?

Nir, have you handled a similar bug recently?

repoStats/getStoragePoolInfo:

Thread-705::INFO::2014-10-02 13:10:01,033::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response: {u'ea54e2ae-e283-460f-97e2-77af7737de7e': {'code': 358, 'version': -1, 'acquired': False, 'delay': '0', 'lastCheck': '69.2', 'valid': False}, 'f24f6d96-51eb-495f-b5ec-6bd1be521de5': {'code': 200, 'version': -1, 'acquired': False, 'delay': '0.00613357', 'lastCheck': '60.4', 'valid': False}, u'96267cb4-280f-43c6-b530-afbf8852fed9': {'code': 358, 'version': -1, 'acquired': False, 'delay': '0', 'lastCheck': '61.2', 'valid': False}}

Thread-758::DEBUG::2014-10-02 13:13:23,677::task::993::Storage.TaskManager.Task::(_decref) Task=`a19c73d0-f572-40ce-be4c-c5549f35e727`::ref 1 aborting False

Thread-758::INFO::2014-10-02 13:13:23,730::logUtils::47::dispatcher::(wrapper) Run and protect: getStoragePoolInfo, Return response: {'info': {'name': 'No Description', 'isoprefix': '', 'pool_status': 'connected', 'lver': 47L, 'domains': '96267cb4-280f-43c6-b530-afbf8852fed9:Active,f24f6d96-51eb-495f-b5ec-6bd1be521de5:Active,847b74d3-f092-40cf-9bb8-19c7ae498e62:Attached,ea54e2ae-e283-460f-97e2-77af7737de7e:Active', 'master_uuid': 'f24f6d96-51eb-495f-b5ec-6bd1be521de5', 'version': '3', 'spm_id': 1, 'type': 'ISCSI', 'master_ver': 5}, 'dominfo': {'96267cb4-280f-43c6-b530-afbf8852fed9': {'status': 'Active', 'isoprefix': '', 'alerts': [], 'version': -1}, 'f24f6d96-51eb-495f-b5ec-6bd1be521de5': {'status': 'Active', 'diskfree': '37044092928', 'isoprefix': '', 'alerts': [], 'disktotal': '53284438016', 'version': -1}, '847b74d3-f092-40cf-9bb8-19c7ae498e62': {'status': 'Attached', 'isoprefix': '', 'alerts': []}, 'ea54e2ae-e283-460f-97e2-77af7737de7e': {'status': 'Active', 'isoprefix': '', 'alerts': [], 'version': -1}}}
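For illustration only (this is not vdsm or engine code): in a repoStats-style response like the one above, a domain whose 'code' is 200 is healthy from the domain monitor's point of view and would be the natural reconstruct candidate when the current master is unreachable. A minimal sketch of that selection, using the UUIDs from the log:

```python
# Illustrative sketch -- not the actual engine selection logic.
# Picks a healthy, non-master domain from a repoStats-style dict.

def pick_new_master(repo_stats, current_master):
    """Return a domain UUID that could serve as the new master, or None.

    repo_stats maps domain UUID -> status dict; a 'code' of 200 means
    the domain monitor sees the domain as healthy.
    """
    candidates = [
        uuid for uuid, stats in repo_stats.items()
        if uuid != current_master and stats.get("code") == 200
    ]
    return sorted(candidates)[0] if candidates else None

# Status values taken from the repoStats response in the log above.
stats = {
    "ea54e2ae-e283-460f-97e2-77af7737de7e": {"code": 358, "valid": False},
    "f24f6d96-51eb-495f-b5ec-6bd1be521de5": {"code": 200, "valid": False},
    "96267cb4-280f-43c6-b530-afbf8852fed9": {"code": 358, "valid": False},
}
print(pick_new_master(stats, "ea54e2ae-e283-460f-97e2-77af7737de7e"))
# -> f24f6d96-51eb-495f-b5ec-6bd1be521de5
```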
Looks like a duplicate of bug 1141658.

Please provide the versions of these packages:
- selinux-policy
- selinux-policy-targeted
Additionally, attach these files:
- /var/log/messages
- /var/log/sanlock.log
- /var/log/audit/audit.log

Add the file covering the timeframe of this error; it may be one of the rotated files (e.g. audit.log.3.gz).
Nir, I don't think this is a DUP of bug 1141658 because, as we can see in the attached screenshot and in vdsm.log, the reconstruct starts (and fails), but the host is not rebooted. In bug 1141658, by contrast, the host is rebooted ~2 minutes after the connection to the master domain is lost; when the host comes back up, the connection to the storage is resumed and the DC becomes active because the iptables rules are cleaned.
Liron, I cannot reproduce it on RHEL 6.6.
(In reply to lkuchlan from comment #6)
> Liron, i can not reproduce it on rhel 6.6

So what OS does this reproduce on? RHEL 7? RHEL 6.5?
(In reply to Allon Mureinik from comment #7)
> (In reply to lkuchlan from comment #6)
> > Liron, i can not reproduce it on rhel 6.6
>
> So what OS does this reproduce on? RHEL 7? RHEL 6.5?

The bug reproduces on RHEL 7.
(In reply to Nir Soffer from comment #3)
> Looks like a duplicate of bug 1141658.
>
> Please provide the version these packages:
> - selinux-policy
> - selinux-policy-targeted

(In reply to Nir Soffer from comment #4)
> Additionally, attach these files:
> - /var/log/messages
> - /var/log/sanlock.log
> - /var/log/audit/audit.log
>
> Add the file showing the timeframe of this error, it may be one
> of the rotated files (e.g. audit.log.3.gz)

These requests were somehow missed among all the comments here. Liron, can you please provide this info?
Created attachment 964217 [details]
logs

Package versions:
libselinux-2.2.2-6.el7.x86_64
libselinux-ruby-2.2.2-6.el7.x86_64
selinux-policy-3.12.1-153.el7.noarch
libselinux-utils-2.2.2-6.el7.x86_64
selinux-policy-targeted-3.12.1-153.el7.noarch
libselinux-python-2.2.2-6.el7.x86_64

Please find the logs attached.
Created attachment 970134 [details]
logs and image

It still reproduces. Please find the logs attached.
Can you recreate this on RHEL 7.1?
(In reply to Yaniv Dary from comment #13)
> Can you recreate this on RHEL 7.1?

Hi Yaniv,
It does not reproduce on RHEL 7.1.
lkuchlan, can you please add the SELinux logs from the target host?

Bronce, can we bring this up with the RHEL PM to try to understand why this is happening, and try to get the fix for the policy into 7.0.z?
(In reply to Yaniv Dary from comment #18)
> lkuchlan, can you please add the SELINUX logs on the target host?
> Bronce, can we bring this up with the RHEL PM to try to understand why this
> is happening and try to get the fix for thr policy to 7.0.z?

Is there a RHEL BZ for this with the build that fixed it in 7.1?
(In reply to Bronce McClain from comment #19)
> (In reply to Yaniv Dary from comment #18)
> > lkuchlan, can you please add the SELINUX logs on the target host?
> > Bronce, can we bring this up with the RHEL PM to try to understand why this
> > is happening and try to get the fix for thr policy to 7.0.z?
>
> Is there a rhel bz for this w/ the build that fixed it in 7.1?

They made extensive changes to the SELinux policy, so we don't know the bug number. lkuchlan, what 7.1 build did you test with?
vt13.4 contains a newer sanlock (requiring selinux-policy-targeted >= 3.12.1-153.el7_0.13) than the one checked here (see bug 1141658). We're unable to reproduce this in dev - can you please retry with the latest 3.5.0 on RHEL 7 and confirm this is closed? Thanks!
Created attachment 985054 [details]
selinux log

Tested using vdsm-4.16.8.1-6.el7ev.x86_64, rhevm-3.5.0-0.30.el6ev.noarch.
Please find attached the SELinux log from the target host.
(In reply to lkuchlan from comment #22)
> Created attachment 985054 [details]
> selinux log
>
> Tested using vdsm-4.16.8.1-6.el7ev.x86_64, rhevm-3.5.0-0.30.el6ev.noarch
> Please find attached the selinux log on the target host

What are the results of this test? Did you reproduce this error or not? If it cannot be reproduced, this should be moved to VERIFIED. The bug is still ON_QA - do you need additional testing?
Adding back needinfo for Bronce, added in comment 18.
Verified. Tested using vdsm-4.16.8.1-6.el7ev.x86_64, rhevm-3.5.0-0.30.el6ev.noarch.
RHEV-M 3.5.0 has been released, closing this bug.