| Summary: | Clearing deleteImage task leaves zombie dd process | | |
|---|---|---|---|
| Product: | [oVirt] vdsm | Reporter: | Nir Soffer <nsoffer> |
| Component: | Core | Assignee: | Idan Shaby <ishaby> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Lilach Zitnitski <lzitnits> |
| Severity: | low | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.18.0 | CC: | amureini, bugs, nsoffer, ratamir, tnisan, ylavi |
| Target Milestone: | ovirt-4.2.0 | Flags: | ylavi: ovirt-4.2+ |
| Target Release: | 4.20.1 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-12-20 11:04:14 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Is this still relevant with the new blkdiscard implementation?

--------------------------------------

(In reply to Allon Mureinik from comment #1)
> Is this still relevant with the new blkdiscard implementation?

It is not related to the new implementation, but it may be fixed in master, since https://gerrit.ovirt.org/69782 removed the usage of ddWatchCopy. It should be easy to test, as explained in comment 0. I suggest asking QE to test this.

--------------------------------------

Tested with the following versions:

ovirt-engine-4.2.0-0.0.master.20171030210714.gitef6bb9c.el7.centos.noarch
vdsm-4.20.5-5.git3d47978.el7.centos.x86_64

Tested with the following scenario:

Steps to Reproduce:
1. Start live storage migration with a running vm
2. Stop the vm
3. Put host to maintenance

Actual results:
No zombie processes left

Expected results:

Moving to VERIFIED!

--------------------------------------

This bugzilla is included in oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be resolved in oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.
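A quick way to confirm the "no zombie processes left" result is to scan /proc for defunct entries. This is a hypothetical verification helper for illustration only (Linux-only, standard library only) — it is not part of vdsm or of the QE test tooling:

```python
import os

def zombies(comm=None):
    """Return PIDs of defunct (zombie) processes, optionally filtered
    by command name, by scanning /proc (Linux only)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open("/proc/%s/stat" % entry) as f:
                data = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        # /proc/<pid>/stat looks like "pid (comm) STATE ..."; the comm
        # field may itself contain spaces, so split around the last ")".
        name = data.split("(", 1)[1].rsplit(")", 1)[0]
        state = data.rsplit(")", 1)[1].split()[0]
        if state == "Z" and (comm is None or name == comm):
            pids.append(int(entry))
    return pids

# A host with the fix applied should report no leftover dd zombies:
print(zombies("dd"))
```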
Description of problem:

After clearing a deleteImage task, we leave a zombie dd process:

```
2016-12-07 13:01:44,531 INFO  (jsonrpc/0) [dispatcher] Run and protect: deleteImage(sdUUID=u'29eb00cc-4d39-44c9-90f3-aa4b88141eb5', spUUID=u'f9374c0e-ae24-4bc1-a596-f61d5f05bc5f', imgUUID=u'dda1d164-427a-48a6-80d2-8756da292814', postZero=u'true', force=u'false') (logUtils:49)
2016-12-07 13:01:44,987 INFO  (jsonrpc/0) [dispatcher] Run and protect: deleteImage, Return response: None (logUtils:52)
2016-12-07 13:01:45,445 INFO  (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call Image.delete succeeded in 0.92 seconds (__init__:515)
2016-12-07 13:01:46,686 INFO  (jsonrpc/7) [dispatcher] Run and protect: stopTask(taskID=u'55c4f367-bf64-4128-a056-9d12bcf3b0c5', spUUID=None, options=None) (logUtils:49)
2016-12-07 13:01:46,687 INFO  (jsonrpc/7) [dispatcher] Run and protect: stopTask, Return response: True (logUtils:52)
2016-12-07 13:01:46,687 INFO  (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call Task.stop succeeded in 0.00 seconds (__init__:515)
2016-12-07 13:01:47,343 ERROR (tmap/0) [storage.BlockSD] Zero volume 4958300c-e73b-4cf5-bd0a-9d16fd9b1c99 task 55c4f367-bf64-4128-a056-9d12bcf3b0c5 failed (blockSD:233)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/blockSD.py", line 230, in zeroVolume
    misc.ddWatchCopy("/dev/zero", path, aborting, size)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/misc.py", line 223, in ddWatchCopy
    ioclass=utils.IOCLASS.IDLE)
  File "/usr/lib/python2.7/site-packages/vdsm/commands.py", line 362, in watchCmd
    raise ActionStopped()
ActionStopped: Action was stopped: ()
2016-12-07 13:01:47,344 ERROR (tmap/1) [storage.BlockSD] Zero volume dbe2a240-4ed5-4282-8580-ebb66d489e10 task 55c4f367-bf64-4128-a056-9d12bcf3b0c5 failed (blockSD:233)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/blockSD.py", line 230, in zeroVolume
    misc.ddWatchCopy("/dev/zero", path, aborting, size)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/misc.py", line 223, in ddWatchCopy
    ioclass=utils.IOCLASS.IDLE)
  File "/usr/lib/python2.7/site-packages/vdsm/commands.py", line 362, in watchCmd
    raise ActionStopped()
ActionStopped: Action was stopped: ()
2016-12-07 13:01:47,346 ERROR (tasks/0) [storage.TaskManager.Task] (Task='55c4f367-bf64-4128-a056-9d12bcf3b0c5') Unexpected error (task:870)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 877, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 333, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/blockSD.py", line 1141, in zeroImage
    zeroImgVolumes(sdUUID, imgUUID, toZero)
  File "/usr/share/vdsm/storage/blockSD.py", line 257, in zeroImgVolumes
    raise se.VolumesZeroingError(errors)
VolumesZeroingError: Cannot zero out volume: (['Action was stopped: ()', 'Action was stopped: ()'],)
```

```
# ps auxf
vdsm  2742  2.1  2.2 4809412 83332 ?  S<sl 09:58 3:54 /usr/bin/python2 /usr/share/vdsm/vdsm
vdsm  3707  0.0  0.0  382608  1044 ?  S<l  10:05 0:01 \_ /usr/libexec/ioprocess --read-pipe-fd 131 --write-pipe-fd 130 --max-threads 10 --max-queued-requests 10
vdsm  3762  0.0  0.0  382580   888 ?  S<l  10:05 0:00 \_ /usr/libexec/ioprocess --read-pipe-fd 154 --write-pipe-fd 153 --max-threads 10 --max-queued-requests 10
vdsm  3777  0.0  0.0  382580   880 ?  S<l  10:05 0:00 \_ /usr/libexec/ioprocess --read-pipe-fd 162 --write-pipe-fd 161 --max-threads 10 --max-queued-requests 10
vdsm  3793  0.0  0.0  382608  1020 ?  S<l  10:05 0:00 \_ /usr/libexec/ioprocess --read-pipe-fd 178 --write-pipe-fd 177 --max-threads 10 --max-queued-requests 10
vdsm 20876  0.0  0.0  456340  1088 ?  S<l  12:07 0:00 \_ /usr/libexec/ioprocess --read-pipe-fd 191 --write-pipe-fd 190 --max-threads 10 --max-queued-requests 10
vdsm  2441  0.0  0.0  382608   980 ?  S<l  13:01 0:00 \_ /usr/libexec/ioprocess --read-pipe-fd 205 --write-pipe-fd 199 --max-threads 10 --max-queued-requests 10
vdsm  2596  0.0  0.0       0     0 ?  Z<   13:01 0:00 \_ [dd] <defunct>
vdsm  2597  0.0  0.0       0     0 ?  Z<   13:01 0:00 \_ [dd] <defunct>
```

This looks like an issue in vdsm.storage.misc.ddWatchCopy.

Version-Release number of selected component (if applicable):
master

How reproducible:
Not sure

Steps to Reproduce:
1. Start live storage migration with a running vm
2. Stop the vm
3. Put host to maintenance

Actual results:
Zombie dd processes

Expected results:
No zombies left
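The failure mode is a general Unix one: the watcher kills the dd child when the action is stopped, but if nobody collects the child's exit status, the kernel keeps it as a <defunct> entry in the process table. A minimal Linux-only sketch of the mechanism, using sleep as a stand-in for the dd child (demo code, not vdsm's actual ddWatchCopy):

```python
import signal
import subprocess
import time

def state(pid):
    """Kernel state letter for pid, read from /proc (Linux only)."""
    with open("/proc/%d/stat" % pid) as f:
        return f.read().rsplit(")", 1)[1].split()[0]

# Stand-in for the dd child; kill it, as stopping the task does.
child = subprocess.Popen(["sleep", "60"])
child.send_signal(signal.SIGKILL)
time.sleep(0.2)

# Until the parent wait()s on it, the dead child stays a "Z"
# (<defunct>) entry -- the zombie seen in the ps output above.
assert state(child.pid) == "Z"

child.wait()  # reaping the exit status removes the zombie entry
assert child.returncode == -signal.SIGKILL
```

In general, whoever kills the child must still wait() on it, even on the abort path; a kill without a matching reap is what leaves the zombies reported here.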