Bug 836161 - 3.1 - vdsm: move of 20-30 disks will cause image corruption
Summary: 3.1 - vdsm: move of 20-30 disks will cause image corruption
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: vdsm
Version: 6.3
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Eduardo Warszawski
QA Contact: vvyazmin@redhat.com
URL:
Whiteboard: storage
Depends On:
Blocks:
Reported: 2012-06-28 09:12 UTC by Dafna Ron
Modified: 2022-07-09 05:36 UTC

Fixed In Version: vdsm-4.9.6-42.0
Doc Type: Bug Fix
Doc Text:
Previously, process limit errors would cause image corruption in some disks when migrating multiple disks from one domain to another. This has been corrected so that migrating multiple disks simultaneously between domains does not cause image corruption.
Clone Of:
Environment:
Last Closed: 2012-12-04 19:01:39 UTC
Target Upstream Version:
Embargoed:


Attachments
log (3.04 MB, application/octet-stream)
2012-06-28 09:15 UTC, Dafna Ron


Links
System: Red Hat Product Errata
ID: RHSA-2012:1508
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: rhev-3.1.0 vdsm security, bug fix, and enhancement update
Last Updated: 2012-12-04 23:48:05 UTC

Description Dafna Ron 2012-06-28 09:12:42 UTC
Description of problem:

Moving 20-30 disks from one domain to a second or third domain causes image corruption.
This is caused by process limit errors and happens in both iSCSI and NFS domains.

Version-Release number of selected component (if applicable):

si7
vdsm-4.9.6-17.0.el6.x86_64
libvirt-0.9.10-21.el6.x86_64
qemu-img-rhev-0.12.1.2-2.295.el6.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Create several VMs with several images in a multi-domain pool.
2. Move images from one domain to a second or third domain (so that multiple moveImage tasks run at the same time).
  
Actual results:

Some of the images get corrupted (their status in the UI becomes "illegal").

Expected results:

We should be able to move several images at once without any corruption.

Additional info: vdsm log attached

Comment 1 Dafna Ron 2012-06-28 09:15:37 UTC
Created attachment 594978
log

Comment 2 Yaniv Kaul 2012-06-28 09:18:02 UTC
Is this a regression from RHEVM3.0/VDSM6.3?

Comment 3 Dafna Ron 2012-06-28 09:22:11 UTC
I am not sure - this needs to be tested in 3.0 as well.

Comment 4 Ayal Baron 2012-08-01 10:16:36 UTC
Edu, please update bug with status

Comment 5 Eduardo Warszawski 2012-09-19 18:25:54 UTC
41 moveImage tasks.

All of them move from
src SD: bb4250b5-df22-4a03-856a-0cca6ec54331
to dst SD: ff6714d9-94f8-452f-92dc-40f0464ace40
with op=2, postZero=false, force=false

# Start time, thread -> task, img

2012-06-27 18:50:38.804000      Thread-5922 -> 620ded5b-f96c-483d-991c-87156603b675     fff444e2-a8f8-43f3-a212-5d93e4a8a702
2012-06-27 18:50:39.902000      Thread-5924 -> 9e3b7863-1994-4a4d-a38b-6032f2de6b74     8884f137-86a0-41be-86ee-a802a45236ca
2012-06-27 18:50:41.956000      Thread-5926 -> 68b6113f-bfb1-436c-8a69-13acb569ff1f     79df821b-7996-4d8b-a352-be1b2d3d9c4e
2012-06-27 18:50:43.225000      Thread-5928 -> 907dd7b5-8dc5-47c5-a9b2-2814e316e334     29abab31-0e0a-4d9c-b616-32d1d19fbaf8
2012-06-27 18:50:45.509000      Thread-5930 -> c701c4da-d0da-415e-9207-b7a5eca07a95     0f58331a-c370-4004-b605-7f8bb5293d84
2012-06-27 18:50:46.736000      Thread-5933 -> 66451e6c-7c0c-4beb-91e3-1464a7688866     12eca447-0be5-4aec-a120-a6001e1a26c2
2012-06-27 18:50:48.148000      Thread-5937 -> 035e5256-2fd4-41eb-9e50-1d46694d9b27     40614e39-238b-4517-a246-93baa674483a
2012-06-27 18:50:49.559000      Thread-5938 -> f1a6f35c-1b36-4b18-bece-137c8948717a     a2546e4d-1b53-4bba-89ad-d5805c4b454d
2012-06-27 18:50:50.974000      Thread-5940 -> 23acc4d4-6bfd-4b96-beee-4787b74eee0b     c079db36-d48a-4a47-b9e3-ac8f4b47baef
2012-06-27 18:50:54.880000      Thread-5944 -> 815105be-3f03-45e8-84a0-a3e24ad8299f     9d95b6b6-b005-4b8f-ac7b-d4bcef356e26
2012-06-27 18:51:06.254000      Thread-5952 -> 707da893-e0d2-411b-aa85-e46c684212ba     2b131078-f375-4824-9245-116f766da14b
2012-06-27 18:51:08.829000      Thread-5958 -> 83ea83cd-f04e-40d0-99e0-7c9565684240     b66cfb83-7b07-4315-8992-5699f7abaab5
2012-06-27 18:51:13.131000      Thread-5961 -> fac94ec5-ac19-43c8-9077-2179c63604f5     1e8d7e85-dc35-4530-8eba-5a8971d1da4e
2012-06-27 18:51:16.216000      Thread-5963 -> 29f95292-96e1-4080-bb28-9b3f609f2fa4     3864422d-8abc-4677-aa45-daa2f3715e87
2012-06-27 18:51:23.698000      Thread-5969 -> 1d507cbf-9c61-4117-a0c7-a9314e17143a     97ee7305-6dfc-4635-99e6-58396faeaba8
2012-06-27 18:51:26.203000      Thread-5971 -> 72c58057-019b-4338-a842-ab5e86843f6e     cf8aebf6-578a-4aad-a180-ecf854cb7ad0
2012-06-27 18:51:45.839000      Thread-5986 -> a4725c99-dab5-4224-93d0-54fe5c1ccca7     bf1c84ab-622c-4462-9cad-d92c0302c5e2
2012-06-27 18:51:48.558000      Thread-5990 -> 8cabcabd-03bf-4972-941a-c12cb9f0c112     083ea641-88d2-4b99-8efc-7d994acef4f1
2012-06-27 18:51:53.127000      Thread-5993 -> c106596a-3897-483b-80bc-e24096446885     55c10e40-10cc-428a-9bd5-bdf2d71aee13
2012-06-27 18:51:56.387000      Thread-5997 -> 54f3eb63-b3f0-4d92-8c27-6e22d237ec51     560714c5-7d3a-43a2-a239-b220ef670ba5
2012-06-27 18:52:13.740000      Thread-6008 -> 6cb43012-7e25-4eb8-b103-92fe3b3346e9     b3e13b0c-a5ac-47f4-8aa4-101af2c07cf3
2012-06-27 18:52:15.838000      Thread-6010 -> 60462b92-1a68-4bfa-bd70-94c660dbd6d9     77577699-3c54-4164-9109-3110ea68cc84
2012-06-27 18:52:19.819000      Thread-6017 -> eb443be2-d659-4cab-b46f-2143da4d5bab     baf411f0-6d7f-4fa1-b9f4-ed9e54aaedcb
2012-06-27 18:52:21.486000      Thread-6019 -> 9545536c-a0f5-47a9-b6ad-49b8fc65ce40     964f2e4c-ba39-4120-86b9-cafcca0d0245
2012-06-27 18:52:52.668000      Thread-6038 -> 6d1ddf0c-0fc7-4145-bc33-1c7f7561a38d     77f7c709-187e-49bc-b649-a37769427272
2012-06-27 18:53:08.704000      Thread-6048 -> e4d5c41b-bcff-4321-9dea-573db1068264     467a57f3-fa18-4656-b4a1-7794118698a6
2012-06-27 18:53:16.706000      Thread-6054 -> 12023e4a-febc-4a78-b625-0e427112c2a6     84cd4103-c6b9-4fef-a6c2-545e1bdd3acd
2012-06-27 18:53:36.457000      Thread-6069 -> 0857da21-22f1-4c6f-8881-584c1a8fb2b0     057d5877-b6c1-40b0-8cda-723940a3437f
2012-06-27 18:53:43.244000      Thread-6074 -> 7f969f6b-b599-4f18-861b-713d1b11f20c     8a688a35-bad2-47c7-b84c-8766edbab30a
2012-06-27 18:53:48.389000      Thread-6078 -> 88e04a72-a8fd-42d1-a3eb-1afe6aafdd40     e3191456-f074-44cd-97ff-76a5dcb2b43c
2012-06-27 18:53:52.098000      Thread-6080 -> 12f8a16d-879f-4535-ac9a-10488c430b74     16a132f9-e213-4ea9-b968-50688e727d79
2012-06-27 18:53:58.582000      Thread-6086 -> ffdcaaf5-192f-439d-921d-26a0c9d5ff4e     c683a521-d362-4952-bcf6-727dd1e4ae2c
2012-06-27 18:54:12.837000      Thread-6098 -> 46a8e236-1d40-4ddb-a40d-f5f170d216c9     1d089174-299b-4e36-8784-eebadc0b35c3
2012-06-27 18:54:14.037000      Thread-6101 -> d13d8cc4-5de7-46a3-aaea-77aef8f9f4a1     f013fb76-85ba-49d2-abd8-5fba8e0457fd
2012-06-27 18:54:41.592000      Thread-6118 -> 895687a8-4830-4189-bcb6-a9144ff58e37     c5f6c888-9824-4c32-988a-a76a718a640d
2012-06-27 18:54:42.943000      Thread-6119 -> 47a82f40-68a3-4311-9bee-fbbc32ff4ed5     02f8304f-fb10-4b79-a41a-245d2cb6c84d
2012-06-27 18:56:13.381000      Thread-6200 -> 576491fd-23c6-4903-9d46-1fca31cf052f     60771171-732f-49c4-aa7d-884239966c2a
2012-06-27 18:56:18.067000      Thread-6204 -> 0a41e715-1a4a-4ed6-8622-e29d8b061c7c     15a53512-1758-48ca-b3b1-a472f9e9093c
2012-06-27 18:56:40.904000      Thread-6222 -> b7914e73-242c-45db-899e-5c6219aa1539     8ef92528-dab7-4710-a5c0-198b651698fe
2012-06-27 18:56:49.625000      Thread-6228 -> 736ff763-65eb-46fe-87b7-524300db29fa     589dec2a-79f6-47ef-bf2a-d62030da45ae
2012-06-27 18:57:08.379000      Thread-6261 -> a0c543a6-71c5-42cd-9888-5c8af1dc8138     a9008996-5120-4408-97e7-825cff443838
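
For anyone who wants to work with the listing above programmatically, here is a minimal sketch of a parser for lines in the "time, thread -> task, img" layout. The function name, the regular expression, and the two-line sample are illustrative assumptions, not part of vdsm or of the attached log tooling.

```python
import re
from datetime import datetime

# Assumed layout, taken from the listing above:
#   <date> <time.micros>   Thread-<n> -> <task uuid>   <image uuid>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<thread>Thread-\d+)\s+->\s+"
    r"(?P<task>[0-9a-f-]{36})\s+"
    r"(?P<img>[0-9a-f-]{36})\s*$"
)


def parse_task_line(line):
    """Return a dict for one listing line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "start": datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f"),
        "thread": m.group("thread"),
        "task": m.group("task"),
        "img": m.group("img"),
    }


# First and last entries of the listing, copied verbatim.
sample = [
    "2012-06-27 18:50:38.804000      Thread-5922 -> "
    "620ded5b-f96c-483d-991c-87156603b675     fff444e2-a8f8-43f3-a212-5d93e4a8a702",
    "2012-06-27 18:57:08.379000      Thread-6261 -> "
    "a0c543a6-71c5-42cd-9888-5c8af1dc8138     a9008996-5120-4408-97e7-825cff443838",
]

tasks = [t for t in map(parse_task_line, sample) if t is not None]
# Time between the first and the last task start in the sample.
span = tasks[-1]["start"] - tasks[0]["start"]
print(len(tasks), "tasks started within", span.total_seconds(), "seconds")
```

Feeding all 41 lines through `parse_task_line` gives a list that can be sorted, counted, or compared against the failed tasks reported later in this bug.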

Comment 6 Eduardo Warszawski 2012-09-19 19:37:55 UTC
The system is under stress.
Some of the tasks failed or were aborted due to:

  File "/usr/share/vdsm/storage/processPool.py", line 60, in runExternally
    raise NoFreeHelpersError("You reached the process limit")
NoFreeHelpersError: You reached the process limit

The failed moveImage tasks are:

# Raised tasks: time Thread -> task image

2012-06-27 18:50:49.559000      Thread-5938 -> f1a6f35c-1b36-4b18-bece-137c8948717a     a2546e4d-1b53-4bba-89ad-d5805c4b454d
2012-06-27 18:50:54.880000      Thread-5944 -> 815105be-3f03-45e8-84a0-a3e24ad8299f     9d95b6b6-b005-4b8f-ac7b-d4bcef356e26
2012-06-27 18:51:13.131000      Thread-5961 -> fac94ec5-ac19-43c8-9077-2179c63604f5     1e8d7e85-dc35-4530-8eba-5a8971d1da4e
2012-06-27 18:51:23.698000      Thread-5969 -> 1d507cbf-9c61-4117-a0c7-a9314e17143a     97ee7305-6dfc-4635-99e6-58396faeaba8
2012-06-27 18:51:45.839000      Thread-5986 -> a4725c99-dab5-4224-93d0-54fe5c1ccca7     bf1c84ab-622c-4462-9cad-d92c0302c5e2
2012-06-27 18:51:48.558000      Thread-5990 -> 8cabcabd-03bf-4972-941a-c12cb9f0c112     083ea641-88d2-4b99-8efc-7d994acef4f1
2012-06-27 18:51:53.127000      Thread-5993 -> c106596a-3897-483b-80bc-e24096446885     55c10e40-10cc-428a-9bd5-bdf2d71aee13
2012-06-27 18:51:56.387000      Thread-5997 -> 54f3eb63-b3f0-4d92-8c27-6e22d237ec51     560714c5-7d3a-43a2-a239-b220ef670ba5
2012-06-27 18:52:21.486000      Thread-6019 -> 9545536c-a0f5-47a9-b6ad-49b8fc65ce40     964f2e4c-ba39-4120-86b9-cafcca0d0245
2012-06-27 18:52:52.668000      Thread-6038 -> 6d1ddf0c-0fc7-4145-bc33-1c7f7561a38d     77f7c709-187e-49bc-b649-a37769427272
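
To illustrate the failure mode in the traceback above, here is a minimal sketch of a helper pool with a fixed number of slots. It is not vdsm's processPool.py and it does not represent the change that fixed this bug. The class `BoundedHelperPool` and its methods are hypothetical, and only the exception name and message are taken from the traceback. The sketch contrasts rejecting a call as soon as the limit is reached with waiting for a free slot.

```python
import threading


class NoFreeHelpersError(Exception):
    """Raised when every helper slot is already in use."""


class BoundedHelperPool:
    """Hypothetical pool that allows at most max_helpers concurrent calls."""

    def __init__(self, max_helpers):
        self._slots = threading.BoundedSemaphore(max_helpers)

    def run_fail_fast(self, func, *args):
        # Reject immediately when no slot is free (the behaviour seen in the log).
        if not self._slots.acquire(blocking=False):
            raise NoFreeHelpersError("You reached the process limit")
        try:
            return func(*args)
        finally:
            self._slots.release()

    def run_waiting(self, func, *args, timeout=None):
        # Alternative: block until a slot frees up, or until the timeout expires.
        if not self._slots.acquire(timeout=timeout):
            raise NoFreeHelpersError("You reached the process limit")
        try:
            return func(*args)
        finally:
            self._slots.release()


pool = BoundedHelperPool(max_helpers=1)
started = threading.Event()
release = threading.Event()


def slow_job():
    started.set()    # signal that the only slot is now taken
    release.wait()   # hold the slot until the main thread lets go
    return "done"


worker = threading.Thread(target=pool.run_fail_fast, args=(slow_job,))
worker.start()
started.wait()

try:
    pool.run_fail_fast(lambda: "never runs")
    fail_fast_result = "ran"
except NoFreeHelpersError:
    fail_fast_result = "rejected"

release.set()
worker.join()
waiting_result = pool.run_waiting(lambda: "ok", timeout=5)
print(fail_fast_result, waiting_result)
```

Under a fail-fast policy, a caller that starts many tasks at once has to handle the rejection cleanly, for example by rolling back partial work. Whether vdsm does this, and how, is not shown in this bug report.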

Comment 7 Eduardo Warszawski 2012-10-30 13:11:37 UTC
http://gerrit.ovirt.org/#/c/8507/

Comment 9 vvyazmin@redhat.com 2012-11-12 14:54:55 UTC
Tested by moving 50 disks in an iSCSI DC and 50 disks in an NFS DC.

Verified on RHEVM 3.1 - SI24

RHEVM: rhevm-3.1.0-28.el6ev.noarch
VDSM: vdsm-4.9.6-42.0.el6_3.x86_64
LIBVIRT: libvirt-0.9.10-21.el6_3.5.x86_64
QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.295.el6_3.5.x86_64
SANLOCK: sanlock-2.3-4.el6_3.x86_64

Comment 11 errata-xmlrpc 2012-12-04 19:01:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-1508.html

