Bug 1541412

Summary: Ansible deployment should clean up files in /var once finished
Product: [oVirt] ovirt-hosted-engine-setup
Reporter: Yihui Zhao <yzhao>
Component: General
Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED CURRENTRELEASE
QA Contact: Nikolai Sednev <nsednev>
Severity: urgent
Docs Contact:
Priority: medium
Version: ---
CC: bugs, cshao, huzhao, phbailey, qiyuan, rbarry, sbonazzo, stirabos, weiwang, yaniwang, ycui, yturgema
Target Milestone: ovirt-4.2.2
Keywords: Triaged
Target Release: 2.2.10
Flags: rule-engine: ovirt-4.2+, rule-engine: exception+, sbonazzo: devel_ack+, rule-engine: testing_ack+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ovirt-hosted-engine-setup-2.2.10
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-03-29 11:09:35 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Integration
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1458709

Attachments:
  issue (flags: none)
  /var/log/* (flags: none)

Description Yihui Zhao 2018-02-02 14:05:39 UTC
Created attachment 1390166 [details]
issue

Description of problem: 
There is no space left in /var for the HE ansible deployment after running the setup about four times.

From the CLI:
"""
[ INFO  ] TASK [Extract appliance to local vm dir]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "dest": "/var/tmp/localvmiMKbq4", "extract_results": {"cmd": ["/usr/bin/gtar", "--extract", "-C", "/var/tmp/localvmiMKbq4", "-z", "--show-transformed-names", "--sparse", "-f", "/root/.ansible/tmp/ansible-tmp-1517574118.41-79777751702578/source"], "err": "/usr/bin/gtar: images/d73f231b-d1ec-43bd-bf4f-622cd3d4f1f5/55387735-bcd9-4eb6-89dc-d5a576b151ed: Wrote only 512 of 10240 bytes\n/usr/bin/gtar: Exiting with failure status due to previous errors\n", "out": "", "rc": 2}, "gid": 36, "group": "kvm", "handler": "TgzArchive", "mode": "0775", "msg": "failed to unpack /root/.ansible/tmp/ansible-tmp-1517574118.41-79777751702578/source to /var/tmp/localvmiMKbq4", "owner": "vdsm", "secontext": "unconfined_u:object_r:user_tmp_t:s0", "size": 4096, "src": "/root/.ansible/tmp/ansible-tmp-1517574118.41-79777751702578/source", "state": "directory", "uid": 36}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO  ] Stage: Clean up
[ INFO  ] Cleaning temporary resources
[ INFO  ] TASK [Gathering Facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Remove local vm dir]
[ INFO  ] ok: [localhost]
[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20180202072316.conf'
[ ERROR ] Failed to execute stage 'Clean up': [Errno 28] No space left on device
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: please check the logs for the issue, fix accordingly or re-deploy from scratch.
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20180202072040-rmwhj6.log
"""

And /var is at 100% usage:

[root@dell-per515-02 ~]# df -h
Filesystem                                                         Size  Used Avail Use% Mounted on
/dev/mapper/rhvh_bootp--73--75--130-rhvh--4.2.1.2--0.20180201.0+1  2.1T  4.6G  2.0T   1% /
devtmpfs                                                            16G     0   16G   0% /dev
tmpfs                                                               16G  4.0K   16G   1% /dev/shm
tmpfs                                                               16G   17M   16G   1% /run
tmpfs                                                               16G     0   16G   0% /sys/fs/cgroup
/dev/mapper/rhvh_bootp--73--75--130-home                           976M  2.6M  907M   1% /home
/dev/sda2                                                          976M  202M  707M  23% /boot
/dev/mapper/rhvh_bootp--73--75--130-tmp                            976M  4.1M  905M   1% /tmp
/dev/mapper/rhvh_bootp--73--75--130-var                             15G   15G     0 100% /var
/dev/mapper/rhvh_bootp--73--75--130-var_log                        7.8G  109M  7.3G   2% /var/log
/dev/mapper/rhvh_bootp--73--75--130-var_log_audit                  2.0G  9.4M  1.8G   1% /var/log/audit
/dev/mapper/rhvh_bootp--73--75--130-var_crash                      9.8G   37M  9.2G   1% /var/crash
10.66.148.11:/home/yzhao/nfs3                                      237G  133G   92G  60% /rhev/data-center/mnt/10.66.148.11:_home_yzhao_nfs3
tmpfs                                                              3.2G     0  3.2G   0% /run/user/0
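
As a manual workaround before re-deploying, the space can be reclaimed by removing the stale appliance copies by hand. This is only a sketch, assuming no deployment is currently running and that the leftovers follow the /var/tmp/localvm* naming seen in the error above:

# Check how much space the leftover appliance extraction dirs use,
# then remove them to free /var before the next deployment attempt.
du -sh /var/tmp/localvm*
rm -rf /var/tmp/localvm*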



Version-Release number of selected component (if applicable): 
cockpit-ws-157-1.el7.x86_64
cockpit-bridge-157-1.el7.x86_64
cockpit-storaged-157-1.el7.noarch
cockpit-dashboard-157-1.el7.x86_64
cockpit-157-1.el7.x86_64
cockpit-ovirt-dashboard-0.11.9-0.1.el7ev.noarch
cockpit-system-157-1.el7.noarch
ovirt-hosted-engine-setup-2.2.9-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.4-1.el7ev.noarch
rhvh-4.2.1.2-0.20180201.0+1
rhvm-appliance-4.2-20180125.0.el7.noarch


How reproducible: 
Run the HE ansible-based deployment about four times.


Steps to Reproduce: 
1. Clean install the latest RHVH 4.2.1 with kickstart (rhvh-4.2.1.2-0.20180201.0+1)
2. Deploy HE via the CLI with the ansible-based deployment; the deployment fails for some reason
3. Continue redeploying HE with the ansible-based deployment about four times (see the sketch below)
4. Check the /var partition
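
A rough sketch of steps 2-4 from a shell prompt (assuming the ansible-based flow is what `hosted-engine --deploy` runs on this build; interactive answers and options are omitted):

# Run the deployment a few times (each attempt may fail for its own reason),
# then check how much of /var the leftover appliance copies consume.
for attempt in 1 2 3 4; do
    hosted-engine --deploy
    du -sh /var/tmp/localvm*
    df -h /var
done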

Actual results: 
Same as in the description: /var fills up to 100% and the deployment fails.


Expected results: 
HE can be redeployed successfully; temporary files in /var are cleaned up once each deployment finishes.


Additional info:

Comment 1 Yihui Zhao 2018-02-02 14:17:33 UTC
Created attachment 1390170 [details]
/var/log/*

Comment 2 Yihui Zhao 2018-02-02 14:40:35 UTC
The appliance VM data from each attempt is left in /var/tmp/:

[root@hp-dl385pg8-11 tmp]# du -h /var/tmp/
3.2G	/var/tmp/localvmamUArL/images/d73f231b-d1ec-43bd-bf4f-622cd3d4f1f5
3.2G	/var/tmp/localvmamUArL/images
8.0K	/var/tmp/localvmamUArL/master/vms/f84ed495-554f-4d03-a4a6-ca2b54435f38
12K	/var/tmp/localvmamUArL/master/vms
16K	/var/tmp/localvmamUArL/master
3.2G	/var/tmp/localvmamUArL
3.2G	/var/tmp/localvmOMA24F/images/d73f231b-d1ec-43bd-bf4f-622cd3d4f1f5
3.2G	/var/tmp/localvmOMA24F/images
8.0K	/var/tmp/localvmOMA24F/master/vms/f84ed495-554f-4d03-a4a6-ca2b54435f38
12K	/var/tmp/localvmOMA24F/master/vms
16K	/var/tmp/localvmOMA24F/master
3.2G	/var/tmp/localvmOMA24F
3.1G	/var/tmp/localvm3qda9t/images/d73f231b-d1ec-43bd-bf4f-622cd3d4f1f5
3.1G	/var/tmp/localvm3qda9t/images
8.0K	/var/tmp/localvm3qda9t/master/vms/f84ed495-554f-4d03-a4a6-ca2b54435f38
12K	/var/tmp/localvm3qda9t/master/vms
16K	/var/tmp/localvm3qda9t/master
3.1G	/var/tmp/localvm3qda9t
4.0K	/var/tmp/systemd-private-22f922f5ce70430ab5df7b87146c09b7-chronyd.service-oZPn20/tmp
8.0K	/var/tmp/systemd-private-22f922f5ce70430ab5df7b87146c09b7-chronyd.service-oZPn20
4.0K	/var/tmp/abrt
3.2G	/var/tmp/localvmc6biHf/images/d73f231b-d1ec-43bd-bf4f-622cd3d4f1f5
3.2G	/var/tmp/localvmc6biHf/images
8.0K	/var/tmp/localvmc6biHf/master/vms/f84ed495-554f-4d03-a4a6-ca2b54435f38
12K	/var/tmp/localvmc6biHf/master/vms
16K	/var/tmp/localvmc6biHf/master
3.2G	/var/tmp/localvmc6biHf
2.2G	/var/tmp/localvmSsPUt5/images/d73f231b-d1ec-43bd-bf4f-622cd3d4f1f5
2.2G	/var/tmp/localvmSsPUt5/images
8.0K	/var/tmp/localvmSsPUt5/master/vms/f84ed495-554f-4d03-a4a6-ca2b54435f38
12K	/var/tmp/localvmSsPUt5/master/vms
16K	/var/tmp/localvmSsPUt5/master
2.2G	/var/tmp/localvmSsPUt5
15G	/var/tmp/
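
Each failed attempt leaves a roughly 3 GB localvm* copy of the appliance behind, so five attempts are enough to fill the 15 GB /var partition. Below is a minimal sketch of the behaviour the summary asks for (hypothetical shell form only; the real setup drives this from its Ansible playbooks, and the fix lands in ovirt-hosted-engine-setup-2.2.10): the extraction directory should be removed when the run ends, whether it succeeds or fails.

# Hypothetical sketch: create the temporary appliance directory and guarantee
# it is removed when the deployment exits, on success, failure, or interruption.
LOCAL_VM_DIR="$(mktemp -d /var/tmp/localvmXXXXXX)"
cleanup() {
    rm -rf -- "$LOCAL_VM_DIR"
}
trap cleanup EXIT

# ... extract the appliance into "$LOCAL_VM_DIR" and run the deployment ...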

Comment 3 Yihui Zhao 2018-03-16 06:52:25 UTC
Hit this issue again.

From the cockpit: 
[ INFO ] TASK [Extract appliance to local vm dir]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "dest": "/var/tmp/localvmNNzO7q", "extract_results": {"cmd": ["/bin/gtar", "--extract", "-C", "/var/tmp/localvmNNzO7q", "-z", "--show-transformed-names", "--sparse", "-f", "/root/.ansible/tmp/ansible-tmp-1521182717.81-74224666147249/source"], "err": "/bin/gtar: images/44b5a089-6ce7-4a23-b4d1-b9dab3fe2687/5ef59632-5a32-4285-95d9-54c59673e2f3: Wrote only 512 of 10240 bytes\n/bin/gtar: Exiting with failure status due to previous errors\n", "out": "", "rc": 2}, "gid": 36, "group": "kvm", "handler": "TgzArchive", "mode": "0775", "msg": "failed to unpack /root/.ansible/tmp/ansible-tmp-1521182717.81-74224666147249/source to /var/tmp/localvmNNzO7q", "owner": "vdsm", "secontext": "unconfined_u:object_r:user_tmp_t:s0", "size": 4096, "src": "/root/.ansible/tmp/ansible-tmp-1521182717.81-74224666147249/source", "state": "directory", "uid": 36}
[ INFO ] TASK [Remove local vm dir]
[ INFO ] TASK [Notify the user about a failure]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}


[root@ibm-x3650m5-05 tmp]# df -h
Filesystem                                                       Size  Used Avail Use% Mounted on
/dev/mapper/rhvh_ibm--x3650m5--05-rhvh--4.2.1.4--0.20180305.0+1  774G  4.7G  730G   1% /
devtmpfs                                                          16G     0   16G   0% /dev
tmpfs                                                             16G  204K   16G   1% /dev/shm
tmpfs                                                             16G   18M   16G   1% /run
tmpfs                                                             16G     0   16G   0% /sys/fs/cgroup
/dev/mapper/rhvh_ibm--x3650m5--05-var                             15G   13G  1.6G  90% /var
/dev/sda1                                                        976M  209M  701M  23% /boot
/dev/mapper/rhvh_ibm--x3650m5--05-tmp                            976M  3.3M  906M   1% /tmp
/dev/mapper/rhvh_ibm--x3650m5--05-var_crash                      9.8G   37M  9.2G   1% /var/crash
/dev/mapper/rhvh_ibm--x3650m5--05-var_log                        7.8G   57M  7.3G   1% /var/log
/dev/mapper/rhvh_ibm--x3650m5--05-home                           976M  2.6M  907M   1% /home
/dev/mapper/rhvh_ibm--x3650m5--05-var_log_audit                  2.0G  7.5M  1.8G   1% /var/log/audit
tmpfs                                                            3.1G     0  3.1G   0% /run/user/0
10.66.148.11:/home/yzhao1/nfs5                                   237G   88G  137G  39% /rhev/data-center/mnt/10.66.148.11:_home_yzhao1_nfs5



[root@ibm-x3650m5-05 ~]# ll /var/tmp/
total 28
drwxr-xr-x. 2 abrt abrt 4096 Mar 16 11:29 abrt
drwxrwxr-x. 4 vdsm kvm  4096 Mar 16 12:12 localvm5FZSH7
drwxrwxr-x. 4 vdsm kvm  4096 Mar 16 11:46 localvma3A2QG
drwxrwxr-x. 4 vdsm kvm  4096 Mar 16 14:38 localvmgoqiaw
drwxrwxr-x. 4 vdsm kvm  4096 Mar 16 14:29 localvmzeL1ug
-rw-r--r--. 1 root root    0 Mar  5 16:30 sssd_is_running
drwx------. 3 root root 4096 Mar 16 11:29 systemd-private-ddaef151e49e4cf0814729251cde71f5-chronyd.service-iZ3Yyq
drwx------. 3 root root 4096 Mar 16 11:43 systemd-private-ddaef151e49e4cf0814729251cde71f5-systemd-timedated.service-5xU3Zu

Test version:

cockpit-dashboard-160-3.el7.x86_64
cockpit-system-160-3.el7.noarch
cockpit-ovirt-dashboard-0.11.17-1.el7ev.noarch
cockpit-bridge-160-3.el7.x86_64
cockpit-ws-160-3.el7.x86_64
cockpit-storaged-160-3.el7.noarch
cockpit-160-3.el7.x86_64
ovirt-hosted-engine-ha-2.2.7-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.13-1.el7ev.noarch

vdsm-http-4.20.22-1.el7ev.noarch
vdsm-hook-ethtool-options-4.20.22-1.el7ev.noarch
vdsm-network-4.20.22-1.el7ev.x86_64
vdsm-api-4.20.22-1.el7ev.noarch
vdsm-python-4.20.22-1.el7ev.noarch
vdsm-hook-vmfex-dev-4.20.22-1.el7ev.noarch
vdsm-hook-vhostmd-4.20.22-1.el7ev.noarch
vdsm-yajsonrpc-4.20.22-1.el7ev.noarch
vdsm-client-4.20.22-1.el7ev.noarch
vdsm-4.20.22-1.el7ev.x86_64
vdsm-gluster-4.20.22-1.el7ev.noarch
vdsm-hook-vfio-mdev-4.20.22-1.el7ev.noarch
vdsm-common-4.20.22-1.el7ev.noarch
vdsm-hook-openstacknet-4.20.22-1.el7ev.noarch
vdsm-jsonrpc-4.20.22-1.el7ev.noarch
vdsm-hook-fcoe-4.20.22-1.el7ev.noarch

rhvm-appliance-4.2-20180202.0.el7.noarch
OS tree: rhvh-4.2.1.4-0.20180305.0+1

Comment 4 Nikolai Sednev 2018-03-21 14:37:19 UTC
After numerous re-deployments I see that /var/tmp/localvm5V6Avl/ is empty:
alma03 ~]# ll   /var/tmp/localvm5V6Avl/
total 0
4.0K drwxrwxr-x.  2 vdsm kvm  4.0K Mar 21 16:31 localvm5V6Avl

Works for me on these components on host:
ovirt-hosted-engine-ha-2.2.7-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.13-1.el7ev.noarch
rhvm-appliance-4.2-20180202.0.el7.noarch
Linux 3.10.0-861.el7.x86_64 #1 SMP Wed Mar 14 10:21:01 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

Moving to verified.

Comment 5 Sandro Bonazzola 2018-03-29 11:09:35 UTC
This bug is included in the oVirt 4.2.2 release, published on March 28th 2018.

Since the problem described in this bug report should be resolved in the oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.