Description of problem:
oVirt web-admin doesn't change the ISO domain's state to 'Maintenance' upon the domain's deactivation. Instead the UI shows the domain as "Preparing For Maintenance" forever, and therefore never offers the operations available to a domain in maintenance mode.

This happens with json-rpc; it does not reproduce with xml-rpc.

Version-Release number of selected component (if applicable):
13.7

How reproducible:
100%

Steps to Reproduce:
1. Deactivate an ISO domain (make sure the host's RPC flag is set to 'json-rpc').

Actual results:
oVirt webadmin reports the domain as "Preparing For Maintenance" forever.

Expected results:
The domain status should change to 'Maintenance' and the options available to a maintained domain should appear; otherwise the operation is of no use.

Additional info:
Revising my decision: not an infra bug (just a strange coincidence), sorry. I'll update this ASAP.
Found this traceback in vdsm.log. The operation of deactivating the ISO domain does work from time to time; I can't tell what's really causing this, please advise.

Thread-8665::INFO::2015-01-15 17:00:19,412::logUtils::44::dispatcher::(wrapper) Run and protect: disconnectStorageServer(domType=1, spUUID=u'00000000-0000-0000-0000-000000000000', conList=[{u'connection': u'10.35.160.108:/RHEV/ogofen/iso-domain', u'iqn': u'', u'user': u'', u'tpgt': u'1', u'password': '******', u'id': u'7fd8d5b4-9bf9-467f-a160-e109fbcf903a', u'port': u''}], options=None)
Thread-8665::DEBUG::2015-01-15 17:00:19,413::mount::227::Storage.Misc.excCmd::(_runcmd) /usr/bin/sudo -n /usr/bin/umount -f -l /rhev/data-center/mnt/10.35.160.108:_RHEV_ogofen_iso-domain (cwd None)
Thread-8665::ERROR::2015-01-15 17:00:19,444::hsm::2522::Storage.HSM::(disconnectStorageServer) Could not disconnect from storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2518, in disconnectStorageServer
    conObj.disconnect()
  File "/usr/share/vdsm/storage/storageServer.py", line 334, in disconnect
    return self._mountCon.disconnect()
  File "/usr/share/vdsm/storage/storageServer.py", line 235, in disconnect
    self._mount.umount(True, True)
  File "/usr/share/vdsm/storage/mount.py", line 254, in umount
    return self._runcmd(cmd, timeout)
  File "/usr/share/vdsm/storage/mount.py", line 239, in _runcmd
    raise MountError(rc, ";".join((out, err)))
MountError: (32, ';umount: /rhev/data-center/mnt/10.35.160.108:_RHEV_ogofen_iso-domain: mountpoint not found\n')
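For reference, MountError(32) here is just umount(8)'s exit status 32 with "mountpoint not found", i.e. the ISO mount was already gone when vdsm tried to tear it down. A minimal, hypothetical sketch (not vdsm code; only the path is taken from the log above) of detecting that condition before calling umount:

    # Hypothetical illustration only - not vdsm code. Shows how the
    # "mountpoint not found" condition from the traceback can be detected
    # before invoking umount. The path below is copied from the log above.
    import os
    import subprocess

    MOUNT_POINT = "/rhev/data-center/mnt/10.35.160.108:_RHEV_ogofen_iso-domain"

    def is_mounted(path):
        # True only if the path exists and is an active mount point
        return os.path.exists(path) and os.path.ismount(path)

    if is_mounted(MOUNT_POINT):
        # -f/-l mirror the flags seen in the vdsm log line above
        subprocess.check_call(["umount", "-f", "-l", MOUNT_POINT])
    else:
        # Already unmounted (or never mounted): this is exactly the state
        # that makes the plain umount call fail with exit code 32.
        print("mountpoint not present, nothing to unmount")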
Liron, can you take a look please?
Ori, how many Data Centers is the domain attached to? Have you checked that this isn't a UI refresh issue? Furthermore, please try to reproduce (as the reproducer here is quick) and attach the logs. Thanks.
Ori, if it was somehow unclear from my last comment, this bug lacks the logs.
(In reply to Ori Gofen from comment #0)
> This happens with json-rpc; it does not reproduce with xml-rpc.

(In reply to Ori Gofen from comment #1)
> Revising my decision: not an infra bug (just a strange coincidence), sorry.
> I'll update this ASAP.

Not sure I'm following.
Ori - is this bug JSON-RPC specific or not?
(In reply to Allon Mureinik from comment #6)
> (In reply to Ori Gofen from comment #0)
> > This happens with json-rpc; it does not reproduce with xml-rpc.
>
> (In reply to Ori Gofen from comment #1)
> > Revising my decision: not an infra bug (just a strange coincidence),
> > sorry. I'll update this ASAP.
>
> Not sure I'm following.
> Ori - is this bug JSON-RPC specific or not?

(In reply to Liron Aravot from comment #5)
> Ori, if it was somehow unclear from my last comment, this bug lacks the logs.

Hey guys, sorry about the long interval. I still don't know what the cause of this is. It's probably not a json-rpc bug, but as mentioned in comment #1 I still can't point out what caused it; I tend to think it's related to a storage timeout we had. I need more time to explore this, as I'm loaded right now.
Returning the needinfo - unless we have a clear reproducer with clear parameters, we cannot proceed.
Liron, this seems to reproduce on our dev-env too. Please take a look?
Hi,
the described bug appears in 3.5.1 too (oVirt Engine Version: 3.5.1-1.el6).

If the ISO domain remains locked, it's not possible to put any other storage domain into maintenance.
Please advise on any patch or workaround.

regards
(In reply to mauro from comment #10)
> Hi,
> the described bug appears in 3.5.1 too (oVirt Engine Version: 3.5.1-1.el6).
>
> If the ISO domain remains locked, it's not possible to put any other
> storage domain into maintenance.
> Please advise on any patch or workaround.
>
> regards

Hey Mauro, this behavior is well known for oVirt >= 3.5, but as I browse through my old server logs it seems quite random as well. Please attach engine + vdsm logs here if possible. Thanks.
Created attachment 984215 [details] engine/vdsm.log
Seems the basic flow is broken; moving to consider for an async release.
On my side the bug is almost persistent.

Traces: I guess only DEBUG level is active. In the attachment, engine.log + vdsm.log of storage node s20gfs.ovirt.prisma.
- If you need a deeper trace level, please tell me how to set it up (config file, process restart, etc.).
- Is there any way to trigger maintenance mode of the ISO storage domain by a CLI command?

[oVirt shell (connected)]# list datacenters --show-all

id                                : 00000002-0002-0002-0002-00000000001a
name                              : Default
description                       : The default Data Center
local                             : False
status-state                      : up
storage_format                    : v3
supported_versions-version-major  : 3
supported_versions-version-minor  : 5
version-major                     : 3
version-minor                     : 5

list storagedomains --datacenter-identifier 00000002-0002-0002-0002-00000000001a

id   : 6abdee28-5a1a-458a-93c5-e1081f7feac7
name : DATA

id   : 2ba95789-fc29-401f-a3c1-57e6526b4983
name : ISO

[oVirt shell (connected)]# action storagedomain deactivate --datacenter-identifier 00000002-0002-0002-0002-00000000001a
async
correlation_id
grace_period-expiry

[oVirt shell (connected)]# action storagedomain deactivate --datacenter-identifier 00000002-0002-0002-0002-00000000001a
Mauro, there is a way of course, I will check it out for you; please e-mail me at ogofen. Btw, from the py-sdk this flow should look like:

dc = api.datacenters.get(YOUR_DC_NAME_HERE)
isod = dc.storagedomains.get(YOUR_ISO_DOMAIN_NAME_HERE)
isod.deactivate()

Before you try to deactivate from the API, please copy /var/log/ovirt-engine/engine.log and /var/log/vdsm/vdsm.log to one folder, tar them into one file, and upload it again; the current attachment is not so clear. Thanks.
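To expand on the snippet above, a self-contained sketch using the oVirt 3.x Python SDK (ovirtsdk package) might look like the following. The engine URL, credentials, and the DC/domain names are placeholders, and the exact connection options may differ on your setup:

    # Hypothetical end-to-end version of the flow above, using the oVirt 3.x
    # Python SDK (ovirt-engine-sdk-python). URL, credentials and names are
    # placeholders - adjust them to your environment.
    from ovirtsdk.api import API

    api = API(url='https://your-engine.example.com/api',
              username='admin@internal',
              password='your-password',
              insecure=True)  # or pass ca_file=... instead of insecure
    try:
        dc = api.datacenters.get('Default')      # data center name
        isod = dc.storagedomains.get('ISO')      # ISO domain name
        isod.deactivate()                        # request maintenance
        print("deactivation requested for %s" % isod.get_name())
    finally:
        api.disconnect()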
Created attachment 984682 [details]
new traces

Hi, in the attachment are the vdsm/engine traces of 26-27 January. Thanks for the workaround, but I don't feel comfortable using the API for administrative tasks.
Hi Mauro, how many hosts do you have in the data center? How many data centers is the domain attached to? Can you please also provide screenshots of the hosts and storage tabs? Thanks.
Mauro, I asked for the info for verification; the issue seems related to having a gluster cluster.
please also attach the exact rpm version, thanks.
The problem in Mauro's case seems to be related to the fact that he has gluster nodes. We do not collect domain monitoring information from gluster nodes (see bug 1105513), and we do not consider them when loading hosts by status for the data center. But when moving a domain to maintenance we load all the hosts, including the gluster ones, to verify that they no longer see the domain. Since we do not collect monitoring information from those hosts, this prevents the domain from ever moving to maintenance.

Anyway, that's Mauro's case (and it obviously should be fixed), but afaik it's not the same issue as described on Ori's/Allon's envs, so perhaps we should separate this into two different bugs (pending more info on their cases).
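To make that asymmetry concrete, here is a rough Python-style sketch of the check described above. This is illustrative only: the actual engine code is Java, and the function and variable names here are invented; the host names and domain id are taken from the earlier comments.

    # Illustrative sketch only - not engine code. Shows why a gluster-only
    # host with no monitoring data keeps the domain stuck in
    # "Preparing For Maintenance".
    def can_move_to_maintenance(domain_id, all_hosts, monitoring_reports):
        # all_hosts includes gluster-only hosts, but monitoring_reports is
        # never populated for them (see bug 1105513).
        for host in all_hosts:
            report = monitoring_reports.get(host)
            if report is None:
                # No data from this host, so we cannot prove it stopped
                # seeing the domain -> deactivation never completes.
                return False
            if domain_id in report:
                return False  # domain still visible from this host
        return True

    # Example: one virt host that no longer reports the domain, plus one
    # gluster host with no report at all.
    reports = {"v10.ovirt.prisma": set()}                 # virt host, domain gone
    hosts = ["v10.ovirt.prisma", "s20gfs.ovirt.prisma"]   # s20gfs = gluster node
    print(can_move_to_maintenance(
        "2ba95789-fc29-401f-a3c1-57e6526b4983", hosts, reports))  # False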
Created attachment 984723 [details]
image

Only data center Default is used. The ISO domain and DATA domain are defined and attached to this data center. v10 and v11 are hypervisor hosts; s10 and s11 are storage (gluster) hosts.
Created attachment 984724 [details] storage tab
[root@s20 tmp]# rpm -qa | grep gluster
glusterfs-libs-3.6.1-1.el6.x86_64
glusterfs-fuse-3.6.1-1.el6.x86_64
glusterfs-3.6.1-1.el6.x86_64
glusterfs-cli-3.6.1-1.el6.x86_64
vdsm-gluster-4.16.10-8.gitc937927.el6.noarch
glusterfs-api-3.6.1-1.el6.x86_64
glusterfs-rdma-3.6.1-1.el6.x86_64
glusterfs-server-3.6.1-1.el6.x86_64
Mauro, Liron, let's handle one issue at a time. If this issue isn't EXACTLY the same issue that Ori is reporting, please open a new BZ.
BZ 1186687 was opened for Mauro's issue. As for the original issue, we could not reproduce it even with the help of QE; it seems that the fix for BZ 1184454 has fixed it. Accordingly, moving to MODIFIED so it can be tested in the next version.
Tal, is this a test-only bug? If not, please attach the merged patch to the tracker if this is fixed for 3.5.0-1.
(In reply to Eyal Edri from comment #26)
> Tal, is this a test-only bug?

Yes, it is, thanks. I've added the appropriate keyword.
If this bug requires doc text for errata release, please provide draft text in the doc text field in the following format:

Cause:
Consequence:
Fix:
Result:

The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to -.
Couldn't reproduce this bug with vt13.9; the fix has probably been merged, as previous JSON issues have also been fixed. Verifying this bug (with a small doubt). If the same behavior is encountered with appropriate steps to reproduce, I will reopen.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-0230.html