Bug 918915
| Summary: | [vdsm] \| Negative Flow \| VDSM service is not operational (fails to commit storage actions) in case LVM operation disturbed | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Elad <ebenahar> |
| Component: | vdsm | Assignee: | Yaniv Bronhaim <ybronhei> |
| Status: | CLOSED WONTFIX | QA Contact: | Elad <ebenahar> |
| Severity: | high | Docs Contact: | |
| Priority: | low | | |
| Version: | 3.2.0 | CC: | abaron, acathrow, bazulay, hateya, iheim, jkt, lpeer, pstehlik |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | 3.3.0 | | |
| Hardware: | x86_64 | | |
| OS: | Unspecified | | |
| Whiteboard: | infra | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-07-29 13:38:29 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

I don't see any malfunction in supervdsm in that log. It looks like vdsm never restarted during this log, and each operation that failed has a valid reason for the failure. Can you point me to the specific location where you think supervdsm gets stuck?
You get a repeated error about a wrong operator that vdsm receives; could it be that you are sending the request incorrectly?
421ec672-0978-4d7b-8973-e9f83d9eb99a::WARNING::2013-03-06 15:41:59,867::task::633::TaskManager.Task::(_dump) Task._dump: object zeroImage_e1d7d91d-3132-4001-942f-9ec412c268b2: <bound method BlockStorageDomain.zeroImage of <storage.blockSD.BlockStorageDomain instance at 0x7f524c0bf8c0>> (args: ('ad3962a4-30b8-47b1-a3df-cf3bd852cb20', 'e1d7d91d-3132-4001-942f-9ec412c268b2', {'a1b84906-73b8-4a75-a598-7c254c19c274': ImgsPar(imgs=('e1d7d91d-3132-4001-942f-9ec412c268b2',), parent='00000000-0000-0000-0000-000000000000')}) kwargs: {}) skipping field runcmd
Traceback (most recent call last):
File "/usr/share/vdsm/storage/task.py", line 629, in _dump
"character" % KEY_SEPERATOR)
ValueError: field and value cannot include = character
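For context, the warning above is consistent with a key/value serializer that refuses any field or value containing its separator; here the dumped value contains `ImgsPar(imgs=..., parent=...)`, which includes `=`. The following is only a minimal sketch of that kind of check. The names `KEY_SEPARATOR` and `dump_field` are hypothetical and this is not the actual code in `task.py`:

```python
# Hypothetical sketch of a separator check, not vdsm's actual implementation.
KEY_SEPARATOR = "="  # assumed separator, inferred from the error message


def dump_field(field, value):
    """Serialize one field as 'field = value', rejecting embedded separators."""
    if KEY_SEPARATOR in field or KEY_SEPARATOR in value:
        raise ValueError(
            "field and value cannot include %s character" % KEY_SEPARATOR
        )
    return "%s %s %s" % (field, KEY_SEPARATOR, value)


if __name__ == "__main__":
    print(dump_field("state", "running"))
    try:
        # A repr that contains keyword-style arguments trips the check.
        dump_field("runcmd", "ImgsPar(imgs=('e1d7d91d',), parent='0000')")
    except ValueError as exc:
        print("skipping field runcmd:", exc)
```

Under this assumption, any field whose value is a repr with keyword arguments would be skipped during the dump, which would explain why the warning is only a logging matter.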
Do you use the engine to run this test?
About the error with CMD_LOWPRIO, I think it was fixed as part of bug 918469
And about:
Thread-111441::DEBUG::2013-03-06 15:07:55,971::misc::83::Storage.Misc.excCmd::(<lambda>) '/usr/bin/sudo -n /sbin/iscsiadm -m session -r 11 -u' (cwd None)
Thread-111441::DEBUG::2013-03-06 15:07:56,505::misc::83::Storage.Misc.excCmd::(<lambda>) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-111441::ERROR::2013-03-06 15:07:56,506::supervdsmServer::81::SuperVdsm.ServerCallback::(wrapper) Error in readSessionInfo
Traceback (most recent call last):
File "/usr/share/vdsm/supervdsmServer.py", line 79, in wrapper
return func(*args, **kwargs)
File "/usr/share/vdsm/supervdsmServer.py", line 145, in readSessionInfo
return _readSessionInfo(sessionID)
File "/usr/share/vdsm/storage/iscsi.py", line 83, in readSessionInfo
raise OSError(errno.ENOENT, "No such session")
OSError: [Errno 2] No such session
This is probably because you killed the lvm processes and the iSCSI session was not created, which sounds pretty reasonable.
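The traceback shows the lookup raising `OSError(errno.ENOENT, "No such session")`. A caller that wants to treat a missing session as an expected condition could check `errno` explicitly, roughly as sketched below. The functions `read_session_info` and `session_exists`, and the dictionary of known sessions, are hypothetical stand-ins and not vdsm's API:

```python
import errno


def read_session_info(session_id, known_sessions):
    """Hypothetical stand-in for the lookup shown in the traceback."""
    if session_id not in known_sessions:
        raise OSError(errno.ENOENT, "No such session")
    return known_sessions[session_id]


def session_exists(session_id, known_sessions):
    """Return False for a missing session; re-raise any other OSError."""
    try:
        read_session_info(session_id, known_sessions)
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            return False
        raise
    return True


if __name__ == "__main__":
    print(session_exists(11, {}))              # session 11 is absent
    print(session_exists(11, {11: "portal"}))  # session 11 is present
```

Checking `errno` rather than catching every `OSError` keeps unrelated failures visible, so a missing session is not confused with a real fault.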
Ayal,

Is the above scenario interesting? Have we ever encountered such a phenomenon in the past?

Elad,

Is this a new test?

(In reply to comment #3)
> Elad,
>
> Is this a new test?

We are always adding new tests, so to answer your question: yes, it's a new test.

> 421ec672-0978-4d7b-8973-e9f83d9eb99a::WARNING::2013-03-06
> 15:41:59,867::task::633::TaskManager.Task::(_dump) Task._dump: object
> zeroImage_e1d7d91d-3132-4001-942f-9ec412c268b2: <bound method
> BlockStorageDomain.zeroImage of <storage.blockSD.BlockStorageDomain instance
> at 0x7f524c0bf8c0>> (args: ('ad3962a4-30b8-47b1-a3df-cf3bd852cb20',
> 'e1d7d91d-3132-4001-942f-9ec412c268b2',
> {'a1b84906-73b8-4a75-a598-7c254c19c274':
> ImgsPar(imgs=('e1d7d91d-3132-4001-942f-9ec412c268b2',),
> parent='00000000-0000-0000-0000-000000000000')}) kwargs: {}) skipping field
> runcmd
> Traceback (most recent call last):
> File "/usr/share/vdsm/storage/task.py", line 629, in _dump
> "character" % KEY_SEPERATOR)
> ValueError: field and value cannot include = character

This is a logging issue that has been fixed and is irrelevant here.

(In reply to comment #2)
> Ayal,
>
> Is the above scenario interesting?
> Have we ever encountered such a phenomenon in the past?

*If* supervdsm hangs, then we should understand why (and see whether it is worth fixing); otherwise I wouldn't bother with it.

The above scenario is synthetic (and is not likely to happen in a real deployment). Hence moving it to 3.3.

Same comment as #6. Any reason not to close this?

Moving to 3.4 for now.

Created attachment 706421
logs

Description of problem:
When an LVM operation is disturbed and gets into a zombie state, superVDSM gets stuck and does not return to normal when the operation continues. As a result, the vdsm service fails to commit any storage action, such as getDeviceList or getStoragePoolInfo, and the pool becomes non-operational.

Version-Release number of selected component (if applicable):
vdsm-4.10.2-10.0.el6ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Run one or more LVM operations on VDSM (such as CreateVolume or DeleteVolume).
2. Run the following script:
   while true; do kill -STOP `pgrep lvm` && sleep 10 && kill -CONT `pgrep lvm`; done

Actual results:
superVDSM gets stuck because the LVM operation was stopped, and it does not return to normal when the LVM operation continues. Note that the lvm processes remained in a zombie state.

Expected results:
superVDSM should return to normal when the LVM operation continues.

Additional info:
See the attached logs.
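
The stop/continue pattern from the reproduction steps can be illustrated on a harmless child process. The sketch below is not vdsm or supervdsm code: the function `run_with_pause` is hypothetical, and a short Python one-liner stands in for an LVM command. It assumes a POSIX system. It shows that a wait on a stopped child can be bounded with a timeout, and that the child completes normally once it receives `SIGCONT`:

```python
import signal
import subprocess
import sys


def run_with_pause(timeout=0.5):
    """Stop a child, show that a bounded wait times out, then resume it."""
    # The child stands in for an LVM command (an assumption for illustration).
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(0.2); print('done')"],
        stdout=subprocess.PIPE,
        text=True,
    )
    child.send_signal(signal.SIGSTOP)  # same effect as `kill -STOP <pid>`

    timed_out = False
    try:
        # A wait without a timeout would block here for as long as the
        # child stays stopped.
        child.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        timed_out = True

    child.send_signal(signal.SIGCONT)  # same effect as `kill -CONT <pid>`
    out, _ = child.communicate(timeout=10)
    return timed_out, out.strip(), child.returncode


if __name__ == "__main__":
    print(run_with_pause())
```

Whether supervdsm's calls into LVM could use a bounded wait like this is an open question that this bug does not answer; the sketch only demonstrates the general behavior of a caller while its child is stopped.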