Description of problem:
This happens when the source volume is big enough (more than 500GB). With smaller volumes (~100GB) there is no problem. I will post the full error and the logs in a new private comment, since they contain customer-sensitive information.

Version-Release number of selected component (if applicable):
openstack-cinder-12.0.6-3.el7ost.noarch              Thu Jun 13 11:05:12 2019
openstack-nova-api-17.0.9-9.el7ost.noarch            Thu Jun 13 11:05:36 2019
openstack-nova-common-17.0.9-9.el7ost.noarch         Thu Jun 13 11:03:27 2019
openstack-nova-compute-17.0.9-9.el7ost.noarch        Thu Jun 13 11:05:07 2019
openstack-nova-conductor-17.0.9-9.el7ost.noarch      Thu Jun 13 11:05:36 2019
openstack-nova-console-17.0.9-9.el7ost.noarch        Thu Jun 13 11:05:36 2019
openstack-nova-migration-17.0.9-9.el7ost.noarch      Thu Jun 13 11:05:07 2019
openstack-nova-novncproxy-17.0.9-9.el7ost.noarch     Thu Jun 13 11:05:36 2019
openstack-nova-placement-api-17.0.9-9.el7ost.noarch  Thu Jun 13 11:05:35 2019
openstack-nova-scheduler-17.0.9-9.el7ost.noarch      Thu Jun 13 11:05:36 2019
puppet-cinder-12.4.1-4.el7ost.noarch                 Thu Jun 13 11:02:56 2019
puppet-nova-12.4.0-17.el7ost.noarch                  Thu Jun 13 11:02:56 2019
python2-cinderclient-3.5.0-1.el7ost.noarch           Tue Dec 11 17:34:09 2018
python2-novaclient-10.1.0-1.el7ost.noarch            Tue Dec 11 17:34:09 2018
python-cinder-12.0.6-3.el7ost.noarch                 Thu Jun 13 11:03:41 2019
python-nova-17.0.9-9.el7ost.noarch                   Thu Jun 13 11:03:27 2019

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
This seems to be similar to this bug[1] that was opened for OSP9. Both show a similar error:
~~~
2019-10-21 14:25:28.790 76 ERROR cinder.volume.manager ProcessExecutionError: Unexpected error while running command.
2019-10-21 14:25:28.790 76 ERROR cinder.volume.manager Command: /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=8 -- env LC_ALL=C qemu-img info /var/lib/cinder/mnt/[OMITTED]
2019-10-21 14:25:28.790 76 ERROR cinder.volume.manager Exit code: -9
~~~
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1402594
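For context on the trace above: oslo_concurrency reports "Exit code: -9", which follows Python's subprocess convention of reporting a child killed by a signal as a negative return code. -9 therefore means the qemu-img process received SIGKILL rather than exiting on its own, which is consistent with the prlimit wrapper's resource caps being hit. A minimal sketch of that return-code convention:

```python
import signal
import subprocess

# Python's subprocess module reports a child terminated by a signal as
# a negative return code, -<signum>. oslo_concurrency surfaces the same
# value in ProcessExecutionError, so "Exit code: -9" means the wrapped
# qemu-img process was killed with SIGKILL (signal 9) instead of
# exiting normally.
proc = subprocess.Popen(["sleep", "60"])
proc.send_signal(signal.SIGKILL)
proc.wait()
print(proc.returncode)  # -9
```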
Hi, I was checking the code for the customer's version[1], and it already has the workaround that increases cpu_time to 30[2]. However, the command that raises the issue has the cpu limit set to 8[3]. Is this an issue? It seems that it should reflect the cpu_time set in the code, but that is not the case.

[1]
~~~
$ grep -ir nova pollux-tds-controller-1/sos_commands/rpm/sh_-c_rpm_--nodigest_-qa_--qf_NAME_-_VERSION_-_RELEASE_._ARCH_INSTALLTIME_date_awk_-F_printf_-59s_s_n_1_2_sort_-V
openstack-nova-api-17.0.9-9.el7ost.noarch
openstack-nova-common-17.0.9-9.el7ost.noarch
openstack-nova-compute-17.0.9-9.el7ost.noarch
openstack-nova-conductor-17.0.9-9.el7ost.noarch
openstack-nova-console-17.0.9-9.el7ost.noarch
openstack-nova-migration-17.0.9-9.el7ost.noarch
openstack-nova-novncproxy-17.0.9-9.el7ost.noarch
openstack-nova-placement-api-17.0.9-9.el7ost.noarch
openstack-nova-scheduler-17.0.9-9.el7ost.noarch
~~~

[2]
~~~
QEMU_IMG_LIMITS = processutils.ProcessLimits(
    cpu_time=30,
    address_space=1 * units.Gi)
~~~

[3]
~~~
2019-10-21 14:25:28.790 76 ERROR cinder.volume.manager Command: /usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=8 -- env LC_ALL=C qemu-img info /var/lib/cinder/mnt/[OMITTED]
~~~
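To make the relationship between the code constant and the logged command line concrete, here is a hedged mock-up (not the actual oslo_concurrency implementation) of how ProcessLimits-style values map onto the prlimit wrapper flags seen in [3]: cpu_time becomes --cpu (seconds of CPU time) and address_space becomes --as (bytes of virtual memory). The --cpu=8 in the log therefore means the limits actually applied came from a constant with cpu_time=8, not the cpu_time=30 constant shown in [2].

```python
# Illustrative mock-up only; the real class lives in
# oslo_concurrency.processutils and is invoked via oslo_concurrency.prlimit.
class ProcessLimits(object):
    """Resource caps applied to a child process."""

    def __init__(self, cpu_time=None, address_space=None):
        self.cpu_time = cpu_time            # seconds of CPU time
        self.address_space = address_space  # bytes of virtual memory

    def prlimit_args(self):
        """Build flags in the shape the prlimit wrapper is invoked with."""
        args = []
        if self.address_space is not None:
            args.append("--as=%d" % self.address_space)
        if self.cpu_time is not None:
            args.append("--cpu=%d" % self.cpu_time)
        return args


Gi = 1024 ** 3
# Reproduces the flags from the failing command in the log.
limits = ProcessLimits(cpu_time=8, address_space=1 * Gi)
print(" ".join(limits.prlimit_args()))  # --as=1073741824 --cpu=8
```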
Are you looking at the code in nova or cinder? Both have QEMU_IMG_LIMITS that are applied for these calls.
I was checking this in the nova code; do we have it in both places? If so, where is the nova variable being used? I want to increase this value so the customer can try it. How should we proceed?
(In reply to Andre from comment #3)
> I was checking this on nova code, do we have it in both places? If so, where
> the nova variable is being used?
> I wanna try to increase this value so customer can try it, how should we
> proceed?

The error shown in the description and comment #1 came from the cinder volume manager, so the nova code is not relevant here. The nova code also already has higher limits than cinder does. Changing it in cinder and restarting cinder-volume should help. If this works, we can work toward getting this patch into the relevant branches: https://review.opendev.org/#/c/691901/
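The comparison above can be summarized as data (values taken from this bug's log and comments; treat them as a snapshot of these releases, not as current code):

```python
Gi = 1024 ** 3

# Snapshot of the qemu-img wrapper limits discussed in this bug:
# nova's QEMU_IMG_LIMITS (reference [2] in comment #1) allows 30s of
# CPU time, while the failing cinder command line showed --cpu=8.
nova_limits = {"cpu_time": 30, "address_space": 1 * Gi}
cinder_limits = {"cpu_time": 8, "address_space": 1 * Gi}

# The failing log line came from cinder.volume.manager, so cinder's
# lower cap is what killed qemu-img; editing nova's constant cannot
# help this code path.
print(cinder_limits["cpu_time"] < nova_limits["cpu_time"])  # True
```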
How should we proceed with testing? Since it requires a change in the code, I need an engineering patch, right?
Hi Eric, would you mind helping us with this topic, please? The customer has a production environment, so should we provide a hotfix or a test package to the customer instead of manual changes?
I am trying to get https://review.opendev.org/#/c/691901/ merged into upstream master, will provide a hotfix package once we at least get this merged there.
Waiting for a newer puddle. 2020-01-15.3 resulted in a pre-fixed-in version: openstack-cinder-12.0.8-3.el7ost < openstack-cinder-12.0.8-5.el7ost
Verified on:
openstack-cinder-12.0.10-2.el7ost.noarch

Using a K2 iSCSI-backed 500G volume:

cinder create 500 --name K2_500G

(overcloud) [stack@undercloud-0 ~]$ cinder show 7bace336-d2cc-4530-864e-e4f455e73eb1
+--------------------------------+------------------------------------------+
| Property                       | Value                                    |
+--------------------------------+------------------------------------------+
| attached_servers               | ['2ff82e12-e95a-45e5-ad28-4bd6d840e9e7'] |
| attachment_ids                 | ['def3d853-c19f-4ec6-82b8-92e09131873f'] |
| availability_zone              | nova                                     |
| bootable                       | false                                    |
| consistencygroup_id            | None                                     |
| created_at                     | 2020-02-09T15:52:44.000000               |
| description                    | None                                     |
| encrypted                      | False                                    |
| id                             | 7bace336-d2cc-4530-864e-e4f455e73eb1     |
| metadata                       | attached_mode : rw                       |
| migration_status               | None                                     |
| multiattach                    | False                                    |
| name                           | K2_500G                                  |
| os-vol-host-attr:host          | controller-0@k2iscsi#k2iscsi             |
| os-vol-mig-status-attr:migstat | None                                     |
| os-vol-mig-status-attr:name_id | None                                     |
| os-vol-tenant-attr:tenant_id   | 2cd01b0fe6c644a48cbfa6da5a03d25b         |
| replication_status             | None                                     |
| size                           | 500                                      |
| snapshot_id                    | None                                     |
| source_volid                   | None                                     |
| status                         | in-use                                   |
| updated_at                     | 2020-02-09T15:53:31.000000               |
| user_id                        | b8c1ccd7e02c4f22ad8929f1cb5fcaba         |
| volume_type                    | tripleo                                  |
+--------------------------------+------------------------------------------+

Attached it to an instance and filled it with random data:

nova volume-attach 2ff82e12-e95a-45e5-ad28-4bd6d840e9e7 7bace336-d2cc-4530-864e-e4f455e73eb1

# lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda     253:0    0    1G  0 disk
|-vda1  253:1    0 1015M  0 part /
`-vda15 253:15   0    8M  0 part
vdb     253:16   0  500G  0 disk /root/kuku

# df -h
Filesystem                Size      Used Available Use% Mounted on
/dev                    240.1M         0    240.1M   0% /dev
/dev/vda1               978.9M     23.9M    914.2M   3% /
tmpfs                   244.2M         0    244.2M   0% /dev/shm
tmpfs                   244.2M     88.0K    244.1M   0% /run
/dev/vdb                492.0G    466.4G    652.5M 100% /root/kuku

Now let's detach the volume and clone it:

nova volume-detach 2ff82e12-e95a-45e5-ad28-4bd6d840e9e7 7bace336-d2cc-4530-864e-e4f455e73eb1

cinder create 501 --source-volid 7bace336-d2cc-4530-864e-e4f455e73eb1 --name 501G_ClonedVolume
+--------------------------------+--------------------------------------+
| Property                       | Value                                |
+--------------------------------+--------------------------------------+
| attachments                    | []                                   |
| availability_zone              | nova                                 |
| bootable                       | false                                |
| consistencygroup_id            | None                                 |
| created_at                     | 2020-02-10T05:36:30.000000           |
| description                    | None                                 |
| encrypted                      | False                                |
| id                             | b23cd5d0-2579-4ac4-a1dc-a04863319497 |
| metadata                       | {}                                   |
| migration_status               | None                                 |
| multiattach                    | False                                |
| name                           | 501G_ClonedVolume                    |
| os-vol-host-attr:host          | controller-0@k2iscsi#k2iscsi         |
| os-vol-mig-status-attr:migstat | None                                 |
| os-vol-mig-status-attr:name_id | None                                 |
| os-vol-tenant-attr:tenant_id   | 2cd01b0fe6c644a48cbfa6da5a03d25b     |
| replication_status             | None                                 |
| size                           | 501                                  |
| snapshot_id                    | None                                 |
| source_volid                   | 7bace336-d2cc-4530-864e-e4f455e73eb1 |
| status                         | creating                             |
| updated_at                     | 2020-02-10T05:36:31.000000           |
| user_id                        | b8c1ccd7e02c4f22ad8929f1cb5fcaba     |
| volume_type                    | tripleo                              |
+--------------------------------+--------------------------------------+

Wait a while for the clone operation to finish:

(overcloud) [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+-----------+-------------------+------+-------------+----------+--------------------------------------+
| ID                                   | Status    | Name              | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+-----------+-------------------+------+-------------+----------+--------------------------------------+
| 7bace336-d2cc-4530-864e-e4f455e73eb1 | available | K2_500G           | 500  | tripleo     | false    |                                      |
| b23cd5d0-2579-4ac4-a1dc-a04863319497 | in-use    | 501G_ClonedVolume | 501  | tripleo     | false    | 2ff82e12-e95a-45e5-ad28-4bd6d840e9e7 |
+--------------------------------------+-----------+-------------------+------+-------------+----------+--------------------------------------+
-> cloned volume

Attach the cloned volume to the instance and check we have the same data on both:

# nova volume-attach 2ff82e12-e95a-45e5-ad28-4bd6d840e9e7 b23cd5d0-2579-4ac4-a1dc-a04863319497

Looking inside, all data is there; we successfully cloned a 500G volume. Good to verify.
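The "same data on both" check above can also be made mechanical by comparing checksums of the source and cloned data. A small sketch, assuming hypothetical mount points for the two volumes (the paths are illustrative, not from the actual environment):

```python
import hashlib

def file_digest(path, chunk=1024 * 1024):
    """Stream a file in fixed-size chunks and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

# Hypothetical paths: the same test file as seen through the source
# volume's mount point and through the clone's mount point.
# assert file_digest("/mnt/source/kuku") == file_digest("/mnt/clone/kuku")
```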
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0764