Description of problem:
Overcloud deployment fails on nodes with 4G of RAM with:

Unable to write image to /tmp/ec6cd4ad-e2ec-4f3a-a3fb-7b01a87440c1. Error: [Errno 28] No space left on device.

It looks like /tmp is backed by the root tmpfs, so if the image fills up the free memory, the deployment fails with "No space left on device".

Version-Release number of selected component (if applicable):
rhosp-director-images-14.0-20180713.3.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP14 virtual env with 4G of RAM overcloud nodes

Actual results:
Node provisioning fails, with /var/log/containers/ironic/ironic-conductor.log showing:
Error: [Errno 28] No space left on device

Expected results:
Deployment succeeds without issues.

Additional info:
This is a regression compared to OSP13, where we didn't see this behavior. If it is expected, we need to make sure it is properly documented.
As Marius pointed out, this looks similar to https://bugs.launchpad.net/ironic-python-agent/+bug/1661328

<TheJulia> bfournier, yup, it is totally valid too and the only way around it is to use a raw file type, not qcow2. raw gets streamed out. alternatively iscsi deploy is another option since it does not get held in memory on the node being deployed
<bfournier> mcornea: are we using a different file type in 14 or…
<TheJulia> bfournier, I suspect the default to direct deploy..
<mcornea> bfournier: TheJulia ok, so I can confirm that it passed after increasing the ceph memory from 4G to 6G
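To make the failure mode above concrete: with direct deploy, a qcow2 image is held in memory on the node before it is written to disk, so roughly speaking the image has to fit in RAM alongside whatever the deploy ramdisk itself uses. The helper below is only a rough sketch of that comparison. The function name and the reserve parameter are hypothetical, and the real limit on a node will be lower than total RAM.

```shell
#!/bin/sh
# Hypothetical helper (not part of ironic or ironic-python-agent).
# Succeeds if an image of IMAGE_BYTES fits in RAM_BYTES of memory while
# leaving RESERVE_BYTES (default 0) free for the ramdisk and agent.
image_fits_in_ram() {
    image_bytes=$1
    ram_bytes=$2
    reserve_bytes=${3:-0}
    [ $((image_bytes + reserve_bytes)) -le "$ram_bytes" ]
}

# Example call (commented out; the image path is an assumption):
#   image_fits_in_ram "$(stat -c %s /path/to/image.qcow2)" \
#       $((4 * 1024 * 1024 * 1024)) || echo "image too large for a 4G node"
```

This ignores tmpfs size limits and memory used by the kernel, so a passing check does not guarantee that the deployment will succeed.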
The direct deploy RFE is here - https://bugzilla.redhat.com/show_bug.cgi?id=1477713. Not sure if related or not.
Yes, I think it's because of the direct deploy. The options are:
1. Recommend that low-memory deployments use IronicDefaultDeployInterface=iscsi
2. Revert the default to iscsi, and allow high-scale deployments to override it
3. Store images as RAW so they can be streamed straight to disk (probably too late for Rocky, and it will also consume undercloud space).

Discussed with shardy, he votes for #2. We can revisit the default again for Stein if bug 1607779 helps with streaming images. Ramon, thoughts?
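For reference, option 1 could look roughly like the following heat environment file. This is a sketch only: the parameter name comes from this bug, while the file name and location are hypothetical.

```shell
#!/bin/sh
# Hypothetical file name; override ENV_FILE to write it elsewhere.
ENV_FILE="${ENV_FILE:-./ironic-iscsi-deploy.yaml}"

# Write a minimal heat environment file that selects the iscsi deploy
# interface instead of direct deploy.
cat > "$ENV_FILE" <<'EOF'
parameter_defaults:
  # Parameter name taken from this bug report.
  IronicDefaultDeployInterface: iscsi
EOF
```

How the file gets passed in depends on the setup. On a containerized undercloud it would presumably be listed under custom_env_files in undercloud.conf, but that is an assumption and was not verified in this bug.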
I am voting for #2 as well. We want minimum requirements and a low barrier to entry for the director node. If users want to go big in production, they should tweak the config to allow that.
Jarda, this conversation is not quite about production. IIRC we don't support nodes with less than 8 GiB (12 or even 16 in practice). But I do agree with making it opt-in for now.
Direct deploy is supposed to allow the undercloud to deploy a larger number of nodes by default, by reducing the load added per overcloud node deployed at a time. I wouldn't base the decision on the fact that it forces the VMs used for the overcloud to have 6GB instead of 4GB, but rather on how much of an improvement it brings for operators in production environments. I propose making it opt-in for OSP 14, testing the performance improvements in director, and, based on the results, making it the default in OSP 15. This would also allow time for issues that have not been uncovered yet to surface.
The partial revert landed. We will look into making it less RAM-consuming in the next release.
Verified:

Environment:
openstack-tripleo-heat-templates-9.0.0-0.20180831204457.17bb71e.0rc1.el7ost.noarch

Successfully deployed ceph nodes with 4GB of ram:

[heat-admin@overcloud-cephstorage2-1 ~]$ free
              total        used        free      shared  buff/cache   available
Mem:        3880860      144940     3471580         680      264340     3467496
Swap:             0           0           0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:0045