Bug 1903649

Summary: Automated cleaning is disabled by default
Product: OpenShift Container Platform Reporter: Sai Sindhur Malleni <smalleni>
Component: Bare Metal Hardware Provisioning Assignee: Derek Higgins <derekh>
Bare Metal Hardware Provisioning sub component: ironic QA Contact: Ori Michaeli <omichael>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: dblack, derekh, djuran, jlema, omichael, shardy, tsedovic
Version: 4.6.z Keywords: Triaged
Target Milestone: ---   
Target Release: 4.6.z   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-02-08 13:50:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1904064    
Bug Blocks:    

Description Sai Sindhur Malleni 2020-12-02 15:04:02 UTC
Description of problem:
Automated node cleaning is disabled by default. We want it enabled by default so that we do not end up with situations like

NAME                                                                                                  MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT                                                                                                   
sda                                                                                                     8:0    0 446.6G  0 disk                                                                                                              
├─sda1                                                                                                  8:1    0   384M  0 part /boot                                                                                                        
├─sda2                                                                                                  8:2    0   127M  0 part /boot/efi                                                                                                    
├─sda3                                                                                                  8:3    0     1M  0 part                                                                                                              
├─sda4                                                                                                  8:4    0 446.1G  0 part                                                                                                              
│ └─coreos-luks-root-nocrypt                                                                          253:0    0   446G  0 dm   /sysroot                                                                                                     
└─sda5                                                                                                  8:5    0    65M  0 part                                                                                                              
sdb                                                                                                     8:16   0  14.9G  0 disk                                                                                                              
nvme0n1                                                                                               259:0    0   1.5T  0 disk                                                                                                              
nvme3n1                                                                                               259:1    0   1.5T  0 disk                                                                                                              
nvme1n1                                                                                               259:2    0   1.5T  0 disk                                                                                                              
└─ceph--342e23cd--4600--4c23--aaab--5e5f9524aa90-osd--block--24ec8bdd--1f5d--4af7--8dea--0a988f294bfd 253:1    0   1.5T  0 lvm                                                                                                               
nvme2n1         


on worker nodes that have been used for other things in the past and still carry disk data/metadata.
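
Leftover state like the ceph LVM volume in the listing above could be spotted programmatically. The following is a minimal sketch, not part of the original report. It assumes the output of `lsblk --json` has already been parsed into a dict. The function names and the heuristic (a disk that has child devices but no mountpoint anywhere beneath it) are illustrative assumptions; a legitimately unmounted data disk would be flagged too.

```python
from typing import Any, Dict, List


def _is_mounted(dev: Dict[str, Any]) -> bool:
    """Return True if this device or any of its descendants has a mountpoint."""
    # Some lsblk versions report "mountpoints" (a list), others "mountpoint".
    points = dev.get("mountpoints") or [dev.get("mountpoint")]
    if any(points):
        return True
    return any(_is_mounted(child) for child in dev.get("children", []))


def disks_with_leftovers(lsblk: Dict[str, Any]) -> List[str]:
    """List disks that carry partitions or LVM volumes but have nothing mounted.

    Heuristic only (an assumption of this sketch): such disks may hold stale
    data or metadata from a previous use of the node.
    """
    return [
        dev["name"]
        for dev in lsblk.get("blockdevices", [])
        if dev.get("type") == "disk"
        and dev.get("children")
        and not _is_mounted(dev)
    ]
```

Applied to a tree shaped like the listing above, only the disk carrying the ceph LVM volume would be reported; the boot disk is skipped because one of its descendants is mounted.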

[kni@e16-h18-b03-fc640 ansible]$ oc exec -it metal3-568449f7fc-79mkm  -c metal3-ironic-conductor cat /etc/ironic/ironic.conf -n openshift-machine-api | grep automated_clean                                                                 
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.                                                                                                             
automated_clean = false
#automated_clean = true
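
The grep output above shows only the matching lines. As a minimal sketch (not part of the original report), the effective setting could also be read by parsing the file contents. This assumes the option lives in the `[conductor]` section, as in upstream Ironic, and that an unset option defaults to true, as the commented-out line suggests. The function name is hypothetical.

```python
import configparser


def automated_clean_enabled(conf_text: str, default: bool = True) -> bool:
    """Return the effective automated_clean value from ironic.conf contents.

    Assumptions of this sketch: the option is in the [conductor] section and
    falls back to `default` when it is absent or commented out.
    """
    # interpolation=None keeps literal '%' characters in values from raising;
    # strict=False tolerates duplicated sections or options.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return parser.getboolean("conductor", "automated_clean", fallback=default)
```

The text passed in would be the output of the `cat /etc/ironic/ironic.conf` command above, without the `grep` filter so that section headers are preserved.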


Version-Release number of selected component (if applicable):
4.6.4

How reproducible:
100%

Steps to Reproduce:
1. Deploy a 4.6 build
2. oc exec -it metal3-568449f7fc-79mkm -c metal3-ironic-conductor -n openshift-machine-api -- cat /etc/ironic/ironic.conf | grep automated_clean
3.

Actual results:
automated_clean = false

Expected results:
automated_clean = true

Additional info:

Comment 1 Steven Hardy 2020-12-02 15:08:09 UTC
Note that cleaning is enabled by default on 4.7, but it wasn't in 4.6:

https://github.com/openshift/ironic-image/blob/release-4.6/ironic.conf#L33

Comment 2 Derek Higgins 2020-12-02 16:07:32 UTC
Automated cleaning was disabled because of a bug: https://storyboard.openstack.org/#!/story/2007229
4.6 is using IPA python3-ironic-python-agent-6.4.1-0.20201103152810.7306c73.el8.noarch, which contains a fix for that bug: https://review.opendev.org/c/openstack/ironic-python-agent/+/705062
So it should be safe to re-enable automated cleaning. PR here: https://github.com/openshift/ironic-image/pull/128

Comment 3 Derek Higgins 2021-01-18 09:49:54 UTC
Increasing priority/severity as we're getting additional reports of this.

Comment 8 errata-xmlrpc 2021-02-08 13:50:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.6.16 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0308