Bug 1182616
| Summary: | [ISO] iso domain state is stuck on "Preparing For Maintenance" when using json-rpc | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Ori Gofen <ogofen> |
| Component: | ovirt-engine-webadmin-portal | Assignee: | Liron Aravot <laravot> |
| Status: | CLOSED ERRATA | QA Contact: | Ori Gofen <ogofen> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.5.0 | CC: | acanan, amureini, ecohen, eedri, gklein, iheim, juwu, laravot, lsurette, maurof, ogofen, rbalakri, Rhev-m-bugs, tnisan, yeylon, ylavi |
| Target Milestone: | --- | Keywords: | TestOnly |
| Target Release: | 3.5.0-1 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | storage | | |
| Fixed In Version: | org.ovirt.engine-root-3.5.0-31 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1199835 (view as bug list) | Environment: | |
| Last Closed: | 2015-02-16 14:50:45 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1199835 | | |
| Bug Blocks: | | | |
| Attachments: | | | |
Description
Ori Gofen
2015-01-15 15:00:18 UTC
Revising my decision: this is not an infra bug (just a strange coincidence), sorry, I'll update this ASAP.

The operation of deactivating the ISO domain does work from time to time; I can't tell what's really causing this, please advise. I found this traceback in vdsm.log:
```
Thread-8665::INFO::2015-01-15 17:00:19,412::logUtils::44::dispatcher::(wrapper) Run and protect: disconnectStorageServer(domType=1, spUUID=u'00000000-0000-0000-0000-000000000000', conList=[{u'connection': u'10.35.160.108:/RHEV/ogofen/iso-domain', u'iqn': u'', u'user': u'', u'tpgt': u'1', u'password': '******', u'id': u'7fd8d5b4-9bf9-467f-a160-e109fbcf903a', u'port': u''}], options=None)
Thread-8665::DEBUG::2015-01-15 17:00:19,413::mount::227::Storage.Misc.excCmd::(_runcmd) /usr/bin/sudo -n /usr/bin/umount -f -l /rhev/data-center/mnt/10.35.160.108:_RHEV_ogofen_iso-domain (cwd None)
Thread-8665::ERROR::2015-01-15 17:00:19,444::hsm::2522::Storage.HSM::(disconnectStorageServer) Could not disconnect from storageServer
Traceback (most recent call last):
File "/usr/share/vdsm/storage/hsm.py", line 2518, in disconnectStorageServer
conObj.disconnect()
File "/usr/share/vdsm/storage/storageServer.py", line 334, in disconnect
return self._mountCon.disconnect()
File "/usr/share/vdsm/storage/storageServer.py", line 235, in disconnect
self._mount.umount(True, True)
File "/usr/share/vdsm/storage/mount.py", line 254, in umount
return self._runcmd(cmd, timeout)
File "/usr/share/vdsm/storage/mount.py", line 239, in _runcmd
raise MountError(rc, ";".join((out, err)))
MountError: (32, ';umount: /rhev/data-center/mnt/10.35.160.108:_RHEV_ogofen_iso-domain: mountpoint not found\n')
```
Liron, can you take a look please?

Ori, how many data centers is the domain attached to? Have you checked that this isn't a UI refresh issue? Furthermore, please try to reproduce (as the reproducer here is quick) and attach the logs. Thanks.

Ori, if it was somehow unclear from my last comment, this bug lacks the logs.

(In reply to Ori Gofen from comment #0)
> as described above, happens with json-rpc, does not reproduce with xml-rpc

(In reply to Ori Gofen from comment #1)
> revising my decision, not an infra bug (just a strange coincidence), sorry, I'll update this asap

Not sure I'm following. Ori - is this bug JSON-RPC specific or not?

(In reply to Allon Mureinik from comment #6)
> Not sure I'm following.
> Ori - is this bug JSON-RPC specific or not?

(In reply to Liron Aravot from comment #5)
> Ori, if it was somehow unclear from my last comment, this bug lacks the logs.

Hey guys, sorry about the long interval. I still don't know what the cause of this is - probably not a json-rpc bug, but as mentioned in comment #1 I still can't point out what caused it. I tend to think it's related to a storage timeout we had; I need more time to explore this, as I'm loaded right now.

Returning the needinfo - unless we have a clear reproducer with clear parameters, we cannot proceed.

Liron, this seems to reproduce on our dev-env too. Please take a look?

Hi, the described bug appears in 3.5.1 too (oVirt Engine Version: 3.5.1-1.el6). If the ISO domain remains locked, it's not possible to put any other storage domain into maintenance. Please advise on any patch or workaround. Regards.

(In reply to mauro from comment #10)
> Hi, the described bug appears in 3.5.1 too (oVirt Engine Version: 3.5.1-1.el6).
> If the ISO domain remains locked, it's not possible to put any other storage domain into maintenance.
> Please advise on any patch or workaround.

Hey Mauro, this behavior is well known for oVirt >= 3.5, but as I browse through my old server logs it seems quite random as well. Please try to attach engine+vdsm logs here if possible. Thanks.

Created attachment 984215 [details]
engine/vdsm.log
Seems like a basic flow is broken; moving to consider for an async release.

On my side the bug is almost persistent.

Traces: I guess only the DEBUG level is active. In the attachment are engine.log + vdsm.log of storage node s20gfs.ovirt.prisma.
- If you need a deeper trace level, please tell me how to set it up (config file, process restart, etc.).
- Is there any way to trigger maintenance mode of the ISO storage domain by CLI command?

```
[oVirt shell (connected)]# list datacenters --show-all

id                               : 00000002-0002-0002-0002-00000000001a
name                             : Default
description                      : The default Data Center
local                            : False
status-state                     : up
storage_format                   : v3
supported_versions-version-major : 3
supported_versions-version-minor : 5
version-major                    : 3
version-minor                    : 5

list storagedomains --datacenter-identifier 00000002-0002-0002-0002-00000000001a

id   : 6abdee28-5a1a-458a-93c5-e1081f7feac7
name : DATA

id   : 2ba95789-fc29-401f-a3c1-57e6526b4983
name : ISO

[oVirt shell (connected)]# action storagedomain deactivate --datacenter-identifier 00000002-0002-0002-0002-00000000001a
async
correlation_id
grace_period-expiry

[oVirt shell (connected)]# action storagedomain deactivate --datacenter-identifier 00000002-0002-0002-0002-00000000001a
```

Mauro, there is a way of course, I will check it out for you - please e-mail me at ogofen. By the way, from the py-sdk this flow should look like:

```
dc = api.datacenters.get(YOUR_DC_NAME_HERE)
isod = dc.storagedomains.get(YOUR_ISO_DOMAIN_NAME_HERE)
isod.deactivate()
```

Before you try to deactivate from the API, please copy /var/log/ovirt-engine/engine.log and /var/log/vdsm/vdsm.log to one folder, tar them into one file, and upload it again; the current attachment is not so clear. Thanks.

Created attachment 984682 [details]
new traces
Hi, in the attachment are the vdsm/engine traces of 26-27 January.
Thanks for the workaround, but I don't feel comfortable going through the API for administrative tasks.
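
For readers looking for the same workaround, here is a slightly fuller sketch of the py-sdk flow Ori outlines above, using the oVirt Python SDK v3 series that matches the 3.5 engine discussed in this bug. The engine URL, credentials, and the `insecure=True` flag are placeholder assumptions for illustration; the data center and domain names are taken from Mauro's environment. Adjust them for your setup, and copy the engine/vdsm logs aside first, as requested above.

```python
# Sketch only: deactivate an ISO domain through the oVirt 3.x Python SDK.
# The connection details below are hypothetical placeholders.
from ovirtsdk.api import API

api = API(url='https://ENGINE_FQDN/api',   # hypothetical engine address
          username='admin@internal',
          password='PASSWORD',
          insecure=True)                   # skips CA verification; pass ca_file=... in production

try:
    dc = api.datacenters.get('Default')    # data center name from Mauro's environment
    isod = dc.storagedomains.get('ISO')    # ISO domain name from Mauro's environment
    isod.deactivate()                      # ask the engine to move the domain to maintenance
finally:
    api.disconnect()
```

This is the same three-call flow shown in the comment above; whether the deactivation actually completes is exactly what this bug is about, so check the domain status in the webadmin portal afterwards.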
Hi Mauro, how many hosts do you have in the data center? How many data centers is the domain attached to? Can you also provide screenshots of the hosts and storage tabs? Thanks.

Mauro, I asked for the info for verification; the issue seems related to having a gluster cluster. Please also attach the exact rpm versions, thanks.

The problem in Mauro's case seems to be related to the fact that he has gluster nodes. We do not collect domain monitoring information from gluster nodes (see bug 1105513) and we do not consider them when loading hosts by status for the data center. But when moving a domain to maintenance we load all the hosts, including the gluster ones, to check that they no longer see the domain; as we do not collect monitoring information from those hosts, this prevents the domain from moving to maintenance. Anyway, that's for Mauro's case (and it obviously should be fixed), but AFAIK it's not the same issue as described on Ori's/Allon's environments - so perhaps we should separate this into two different bugs (pending more info on their cases).

Created attachment 984723 [details]
image
Only the default data center is used. The ISO domain and the DATA domain are defined and attached to this data center.
v10 and v11 are hypervisor hosts; s10 and s11 are storage (gluster) hosts.
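
To make Liron's explanation above easier to follow, here is a rough, purely illustrative model of the check he describes - not the actual engine code, and all names in it are hypothetical. The idea is that deactivation waits for every host loaded for the data center to confirm it no longer sees the domain, but gluster-only hosts never report domain monitoring data at all (bug 1105513), so in a setup like Mauro's the confirmation can never arrive and the domain stays in "Preparing For Maintenance".

```python
# Illustrative sketch only -- models the behavior described in the comments,
# not the real ovirt-engine implementation. All names are hypothetical.
def can_complete_maintenance(domain_id, host_ids, monitoring_reports):
    """host_ids: every host loaded for the data center, gluster-only nodes included.
    monitoring_reports: {host_id: set of domain ids that host currently reports};
    gluster-only hosts are simply absent, since no monitoring data is collected
    from them."""
    for host_id in host_ids:
        seen = monitoring_reports.get(host_id)
        if seen is None:
            # No report from this host (e.g. a gluster node): the engine cannot
            # confirm the domain is unseen, so it stays "Preparing For Maintenance".
            return False
        if domain_id in seen:
            # Host still sees the domain: keep waiting.
            return False
    return True
```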
Created attachment 984724 [details]
storage tab
```
[root@s20 tmp]# rpm -qa | grep gluster
glusterfs-libs-3.6.1-1.el6.x86_64
glusterfs-fuse-3.6.1-1.el6.x86_64
glusterfs-3.6.1-1.el6.x86_64
glusterfs-cli-3.6.1-1.el6.x86_64
vdsm-gluster-4.16.10-8.gitc937927.el6.noarch
glusterfs-api-3.6.1-1.el6.x86_64
glusterfs-rdma-3.6.1-1.el6.x86_64
glusterfs-server-3.6.1-1.el6.x86_64
```

Mauro, Liron, let's handle one issue at a time. If this issue isn't EXACTLY the same issue that Ori is reporting, please open a new BZ.

BZ 1186687 has been opened for Mauro's issue. As for the original issue, we could not reproduce it even with the help of QE; it seems that the fix for BZ 1184454 has fixed it. Accordingly, moving to MODIFIED so it can be tested in the next version.

Tal, is this a test-only bug? If not, please attach the merged patch to the tracker, if this is fixed for 3.5.0-1.

(In reply to Eyal Edri from comment #26)
> tal, is this a test only bug?

Yes, it is, thanks. I've added the appropriate keyword.

If this bug requires doc text for the errata release, please provide draft text in the doc text field in the following format:

Cause:
Consequence:
Fix:
Result:

The documentation team will review, edit, and approve the text. If this bug does not require doc text, please set the 'requires_doc_text' flag to -.

Couldn't reproduce this bug with vt13.9; the fix has probably been merged, as previous JSON issues have also been fixed. Verifying this bug (with a small doubt). If the same behavior is encountered with appropriate steps to reproduce, I will reopen.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0230.html