Bug 1182616 - [ISO] iso domain state is stuck on "Preparing For Maintenance" when using json-rpc
Summary: [ISO] iso domain state is stuck on "Preparing For Maintenance" when using json-rpc
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine-webadmin-portal
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.5.0-1
Assignee: Liron Aravot
QA Contact: Ori Gofen
URL:
Whiteboard: storage
Depends On: 1199835
Blocks:
 
Reported: 2015-01-15 15:00 UTC by Ori Gofen
Modified: 2016-02-10 18:05 UTC
CC List: 16 users

Fixed In Version: org.ovirt.engine-root-3.5.0-31
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1199835 (view as bug list)
Environment:
Last Closed: 2015-02-16 14:50:45 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments
engine/vdsm.log (1.81 MB, text/plain), 2015-01-26 13:35 UTC, mauro fattore
new traces (8.37 MB, application/x-tar), 2015-01-27 14:10 UTC, mauro fattore
image (123.72 KB, image/png), 2015-01-27 15:39 UTC, mauro fattore
storage tab (104.70 KB, image/png), 2015-01-27 15:40 UTC, mauro fattore


Links
Red Hat Product Errata RHBA-2015:0230 (SHIPPED_LIVE): Red Hat Enterprise Virtualization Manager 3.5.0-1 ASYNC, last updated 2015-02-16 19:50:27 UTC

Description Ori Gofen 2015-01-15 15:00:18 UTC
Description of problem:

The oVirt web admin portal does not change the ISO domain state to 'Maintenance' after the domain is deactivated; instead the UI shows the domain as "Preparing For Maintenance" indefinitely, so the operations available to a domain in maintenance mode are never offered.
The behavior described above happens with json-rpc; it does not reproduce with xml-rpc.

Version-Release number of selected component (if applicable):
13.7

How reproducible:
100%

Steps to Reproduce:
1. Deactivate an ISO domain (make sure the host's RPC flag is set to 'json-rpc').

Actual results:
The oVirt web admin portal reports the domain as "Preparing For Maintenance" indefinitely.

Expected results:
The domain status should change to 'Maintenance', and the operations available to a domain in maintenance should become available.

Additional info:

Comment 1 Ori Gofen 2015-01-15 15:32:16 UTC
Revising my decision: this is not an infra bug (just a strange coincidence), sorry. I'll update this ASAP.

Comment 2 Ori Gofen 2015-01-15 15:56:07 UTC
Found this traceback in vdsm.log.
Deactivating the ISO domain does work from time to time; I can't tell what's really causing this, please advise.

Thread-8665::INFO::2015-01-15 17:00:19,412::logUtils::44::dispatcher::(wrapper) Run and protect: disconnectStorageServer(domType=1, spUUID=u'00000000-0000-0000-0000-000000000000', conList=[{u'connection': u'10.35.160.108:/RHEV/ogofen/iso-domain', u'iqn': u'', u'user': u'', u'tpgt': u'1', u'password': '******', u'id': u'7fd8d5b4-9bf9-467f-a160-e109fbcf903a', u'port': u''}], options=None) 
Thread-8665::DEBUG::2015-01-15 17:00:19,413::mount::227::Storage.Misc.excCmd::(_runcmd) /usr/bin/sudo -n /usr/bin/umount -f -l /rhev/data-center/mnt/10.35.160.108:_RHEV_ogofen_iso-domain (cwd None) 
Thread-8665::ERROR::2015-01-15 17:00:19,444::hsm::2522::Storage.HSM::(disconnectStorageServer) Could not disconnect from storageServer 
Traceback (most recent call last): 
  File "/usr/share/vdsm/storage/hsm.py", line 2518, in disconnectStorageServer 
    conObj.disconnect() 
  File "/usr/share/vdsm/storage/storageServer.py", line 334, in disconnect 
    return self._mountCon.disconnect() 
  File "/usr/share/vdsm/storage/storageServer.py", line 235, in disconnect 
    self._mount.umount(True, True) 
  File "/usr/share/vdsm/storage/mount.py", line 254, in umount 
    return self._runcmd(cmd, timeout) 
  File "/usr/share/vdsm/storage/mount.py", line 239, in _runcmd 
    raise MountError(rc, ";".join((out, err))) 
MountError: (32, ';umount: /rhev/data-center/mnt/10.35.160.108:_RHEV_ogofen_iso-domain: mountpoint not found\n')
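
For context, a minimal sketch of the failure path the traceback shows (assumed and simplified; the real code lives in vdsm's storage/mount.py): umount exits with rc 32 when the mountpoint is already gone, and _runcmd turns any nonzero rc into a MountError, which disconnectStorageServer then reports as a failed disconnect.

import subprocess

class MountError(Exception):
    pass

def force_umount(mountpoint):
    # Same command as in the log above: lazy, forced unmount via sudo.
    cmd = ["/usr/bin/sudo", "-n", "/usr/bin/umount", "-f", "-l", mountpoint]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    if proc.returncode != 0:
        # rc 32 ("mountpoint not found") means the path is no longer mounted,
        # yet it still surfaces as a disconnect failure further up the stack.
        raise MountError(proc.returncode, ";".join((out.decode(), err.decode())))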

Comment 3 Allon Mureinik 2015-01-15 21:37:35 UTC
Liron, can you take a look please?

Comment 4 Liron Aravot 2015-01-18 11:34:16 UTC
Ori, how many data centers is the domain attached to?
Have you checked that this isn't a UI refresh issue?

Furthermore, please try to reproduce (as the reproducer here is quick) and attach the logs.

thanks.

Comment 5 Liron Aravot 2015-01-18 18:56:07 UTC
Ori, if it was somehow unclear from my last comment, this bug lacks the logs.

Comment 6 Allon Mureinik 2015-01-20 16:40:29 UTC
(In reply to Ori Gofen from comment #0)
> The behavior described above happens with json-rpc; it does not reproduce with xml-rpc.
> 

(In reply to Ori Gofen from comment #1)
> Revising my decision: this is not an infra bug (just a strange coincidence),
> sorry. I'll update this ASAP.

Not sure I'm following.
Ori - is this bug JSON-RPC specific or not?

Comment 7 Ori Gofen 2015-01-21 12:20:07 UTC
(In reply to Allon Mureinik from comment #6)
> (In reply to Ori Gofen from comment #0)
> > The behavior described above happens with json-rpc; it does not reproduce with xml-rpc.
> > 
> 
> (In reply to Ori Gofen from comment #1)
> > Revising my decision: this is not an infra bug (just a strange coincidence),
> > sorry. I'll update this ASAP.
> 
> Not sure I'm following.
> Ori - is this bug JSON-RPC specific or not?

(In reply to Liron Aravot from comment #5)
> Ori, if it was somehow unclear from my last comment, this bug lacks the logs.

Hey guys, sorry for the long interval. I still don't know the cause of this; it's probably not a json-rpc bug, but as mentioned in comment #1, I still can't pinpoint what caused it. I tend to think it's related to a storage timeout we had. I need more time to explore this as I'm loaded right now.

Comment 8 Allon Mureinik 2015-01-21 14:29:16 UTC
Returning the needinfo - unless we have a clear reproducer with clear parameters, we cannot proceed.

Comment 9 Allon Mureinik 2015-01-25 16:36:52 UTC
Liron, this seems to reproduce on our dev-env too.
Please take a look?

Comment 10 mauro fattore 2015-01-26 13:12:58 UTC
Hi,
   the described bug appears in 3.5.1 too (oVirt Engine Version: 3.5.1-1.el6).

If the ISO domain remains locked, it is not possible to put any other storage domain into maintenance.
Please advise on any patch or workaround.

regards

Comment 11 Ori Gofen 2015-01-26 13:23:33 UTC
(In reply to mauro from comment #10)
> Hi,
>    the described bug appears in 3.5.1 too (oVirt Engine Version:
> 3.5.1-1.el6).
> 
> If the ISO domain remains locked, it is not possible to put any other
> storage domain into maintenance.
> Please advise on any patch or workaround.
> 
> regards

Hey Mauro, this behavior is well known for oVirt >= 3.5, but as I browse through my old server logs it seems quite random as well. Please attach engine + vdsm logs here if possible.
Thanks

Comment 12 mauro fattore 2015-01-26 13:35:14 UTC
Created attachment 984215 [details]
engine/vdsm.log

Comment 13 Yaniv Lavi 2015-01-26 13:37:29 UTC
A basic flow seems to be broken; moving to consider for an async release.

Comment 14 mauro fattore 2015-01-26 14:08:46 UTC
On my side the bug is almost always reproducible.


Traces:
I guess only DEBUG level is active.
Attached are engine.log + vdsm.log of storage node s20gfs.ovirt.prisma.


- If you need a deeper trace level, please tell me how to set it up (config file, process restart, etc.).


- Is there any way to trigger maintenance mode of an ISO storage domain via a CLI command?


[oVirt shell (connected)]# list datacenters --show-all 

id                                         : 00000002-0002-0002-0002-00000000001a
name                                       : Default
description                                : The default Data Center
local                                      : False
status-state                               : up
storage_format                             : v3
supported_versions-version-major           : 3
supported_versions-version-minor           : 5
version-major                              : 3
version-minor                              : 5



list storagedomains --datacenter-identifier 00000002-0002-0002-0002-00000000001a

id         : 6abdee28-5a1a-458a-93c5-e1081f7feac7
name       : DATA

id         : 2ba95789-fc29-401f-a3c1-57e6526b4983
name       : ISO

[oVirt shell (connected)]# action storagedomain deactivate --datacenter-identifier 00000002-0002-0002-0002-00000000001a 
async                correlation_id       grace_period-expiry  
[oVirt shell (connected)]# action storagedomain deactivate --datacenter-identifier 00000002-0002-0002-0002-00000000001a
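
(The two deactivate attempts above omit the storage domain itself. If ovirt-shell follows its usual 'action <type> <name> <verb>' form, the full command is presumably something like the line below; this is inferred from the transcript, not a verified invocation.)

[oVirt shell (connected)]# action storagedomain ISO deactivate --datacenter-identifier 00000002-0002-0002-0002-00000000001a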

Comment 15 Ori Gofen 2015-01-27 13:01:27 UTC
Mauro, there is a way, of course; I will check it out for you. Please e-mail me at ogofen.

By the way, from the py-sdk this flow should look like:

dc = api.datacenters.get(YOUR_DC_NAME_HERE)
isod = dc.storagedomains.get(YOUR_ISO_DOMAIN_NAME_HERE)
isod.deactivate()
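
A slightly fuller sketch of the same flow, assuming the RHEV/oVirt 3.x Python SDK (ovirtsdk); the engine URL, credentials and polling loop are placeholders, and the data center / domain names are taken from the listing in comment #14:

# Sketch only: assumes the oVirt/RHEV 3.x Python SDK (ovirtsdk package);
# engine URL and credentials below are placeholders.
import time
from ovirtsdk.api import API

api = API(url='https://YOUR_ENGINE_FQDN/ovirt-engine/api',
          username='admin@internal',
          password='YOUR_PASSWORD',
          insecure=True)  # or pass ca_file='/path/to/ca.crt'

dc = api.datacenters.get(name='Default')      # data center from the listing above
isod = dc.storagedomains.get(name='ISO')      # the ISO domain attached to it
isod.deactivate()

# Poll the engine; when this bug hits, the state never reaches 'maintenance'
# and the UI keeps showing "Preparing For Maintenance".
for _ in range(30):
    state = dc.storagedomains.get(name='ISO').get_status().get_state()
    print(state)
    if state == 'maintenance':
        break
    time.sleep(10)

api.disconnect()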

Please, before you try to deactivate from the API, copy /var/log/ovirt-engine/engine.log and /var/log/vdsm/vdsm.log to one folder, tar them into one file, and upload it again; the current attachment is not so clear.
Thanks

Comment 16 mauro fattore 2015-01-27 14:10:11 UTC
Created attachment 984682 [details]
new traces

Hi, attached are the vdsm/engine traces of 26 and 27 January.

Thanks for the workaround, but I don't feel comfortable using the API for administrative tasks.

Comment 17 Liron Aravot 2015-01-27 15:06:25 UTC
Hi Mauro,
How many hosts do you have in the data center? How many data centers is the domain attached to?

Can you please also provide screenshots of the Hosts and Storage tabs?

thanks.

Comment 18 Liron Aravot 2015-01-27 15:20:52 UTC
Mauro,
I asked for that info for verification; the issue seems related to having a gluster cluster.

Comment 19 Liron Aravot 2015-01-27 15:30:25 UTC
please also attach the exact rpm version, thanks.

Comment 20 Liron Aravot 2015-01-27 15:37:54 UTC
The problem in Mauro's case seems to be related to the fact that he has gluster nodes. We do not collect domain monitoring information from gluster nodes (see bug 1105513) and we do not consider them when loading hosts by status for the data center, but when moving a domain to maintenance we load all the hosts, including the gluster ones, to check that they no longer see the domain. Since we do not collect monitoring information from those hosts, this prevents the domain from ever moving to maintenance.

Anyway, that's Mauro's case (and it obviously should be fixed), but AFAIK it's not the same issue as described on Ori's and Allon's environments, so perhaps we should separate this into two different bugs (pending more info on their cases).
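
To illustrate the mismatch, a purely schematic sketch (Python with hypothetical names; the engine itself is Java and structured differently):

def can_move_to_maintenance(all_hosts, monitoring_reports, domain_id):
    # monitoring_reports maps host -> set of domain ids the host still reports
    # seeing; gluster-only hosts never appear in it (see bug 1105513).
    for host in all_hosts:                     # gluster hosts are included here
        report = monitoring_reports.get(host)
        if report is None:
            # No monitoring data for this host, so "domain no longer seen"
            # can never be confirmed and the domain stays
            # "Preparing For Maintenance" indefinitely.
            return False
        if domain_id in report:
            return False
    return True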

Comment 21 mauro fattore 2015-01-27 15:39:46 UTC
Created attachment 984723 [details]
image

Only the Default data center is used. The ISO domain and the DATA domain are defined and attached to this data center.


v10 and v11 are hypervisor hosts; s10 and s11 are storage (gluster) hosts.

Comment 22 mauro fattore 2015-01-27 15:40:35 UTC
Created attachment 984724 [details]
storage tab

Comment 23 mauro fattore 2015-01-27 15:49:01 UTC
[root@s20 tmp]# rpm -qa | grep gluster
glusterfs-libs-3.6.1-1.el6.x86_64
glusterfs-fuse-3.6.1-1.el6.x86_64
glusterfs-3.6.1-1.el6.x86_64
glusterfs-cli-3.6.1-1.el6.x86_64
vdsm-gluster-4.16.10-8.gitc937927.el6.noarch
glusterfs-api-3.6.1-1.el6.x86_64
glusterfs-rdma-3.6.1-1.el6.x86_64
glusterfs-server-3.6.1-1.el6.x86_64

Comment 24 Allon Mureinik 2015-01-27 15:57:47 UTC
Mauro, Liron, let's handle one issue at a time.
If this issue isn't EXACTLY the same issue that Ori is reporting, please open a new BZ.

Comment 25 Tal Nisan 2015-01-28 14:17:50 UTC
BZ 1186687 has been opened for Mauro's issue. As for the original issue, we could not reproduce it even with the help of QE; it seems that the fix for BZ 1184454 has resolved it. Accordingly, moving to MODIFIED so it can be tested in the next version.

Comment 26 Eyal Edri 2015-01-28 14:30:16 UTC
Tal, is this a test-only bug?
If not, please attach the merged patch to the tracker if this is fixed for 3.5.0-1.

Comment 27 Allon Mureinik 2015-01-28 14:40:57 UTC
(In reply to Eyal Edri from comment #26)
> Tal, is this a test-only bug?
Yes, it is, thanks.
I've added the appropriate keyword.

Comment 28 Julie 2015-02-02 00:55:23 UTC
If this bug requires doc text for errata release, please provide draft text in the doc text field in the following format:

Cause:
Consequence:
Fix:
Result:

The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to -.

Comment 29 Ori Gofen 2015-02-02 12:07:11 UTC
Couldn't reproduce this bug with vt13.9; the fix has probably been merged, as previous JSON issues have also been fixed. Verifying this bug (with a small doubt).
If the same behavior is encountered again with appropriate steps to reproduce, I will reopen.

Comment 31 errata-xmlrpc 2015-02-16 14:50:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0230.html

