Created attachment 1759463 [details]
pdb data

Description of problem:
On nodes with existing software RAID disks, IPA tends to pick /dev/md? devices by default. In this case, out of /dev/md125, /dev/md126, /dev/md127 and another 20-30 disks, IPA selects /dev/md125 as the default root device. However, the _get_partition function in ironic_python_agent/extensions/image.py does not locate the partition that actually carries "img-rootfs", which is /dev/md125p2; instead it ends up returning /dev/md125p1. As a result, the deployment fails: /dev/md125p1 is mounted in a tmp directory and IPA does not find /dev under that mount point.

Version-Release number of selected component (if applicable):
SUPERMICRO 6049P
RHOSP 16.1.3 GA

How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
For a detailed investigation, I logged into the IPA environment (via ssh) and manually started IPA with pdb breakpoints at a few points.
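For orientation, the faulty selection described above boils down to the following heuristic. This is a hypothetical sketch based on the pdb trace further below, not the actual image.py source:

```python
def guess_md_partition(device):
    """Observed heuristic in _get_partition's md branch: blindly
    assume the root filesystem lives on the first partition of
    the md device (hypothetical reconstruction)."""
    return device + 'p1'

# On the affected node this names the config-drive partition
# ('config-2'), not the partition labelled 'img-rootfs'
# (/dev/md125p2). The subsequent os.path.exists /
# stat.S_ISBLK check in the trace below cannot catch this,
# because /dev/md125p1 really does exist as a block device.
```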
After the tmp directory was created, I confirmed the mounted partition:

--------------------------------------------------------
[root@host-192-168-24-38 ~]# df | grep tmp
devtmpfs       197159460     0 197159460   0% /dev
tmpfs          197436444     0 197436444   0% /dev/shm
tmpfs          197436444 10448 197425996   1% /run
tmpfs          197436444     0 197436444   0% /sys/fs/cgroup
tmpfs           39487288     0  39487288   0% /run/user/0
/dev/md125p1         492   492         0 100% /tmp/tmpt94qs6e2

[root@host-192-168-24-38 ~]# ls -l /dev/md125
md125  md125p1  md125p2

[root@host-192-168-24-38 ~]# tune2fs -l /dev/md125p1
tune2fs 1.45.4 (23-Sep-2019)
tune2fs: Bad magic number in super-block while trying to open /dev/md125p1
/dev/md125p1 contains a iso9660 file system labelled 'config-2'

[root@host-192-168-24-38 ~]# tune2fs -l /dev/md125p2
tune2fs 1.45.4 (23-Sep-2019)
tune2fs: Bad magic number in super-block while trying to open /dev/md125p2
/dev/md125p2 contains a xfs file system labelled 'img-rootfs'

[root@host-192-168-24-38 ~]# blkid /dev/md125p2
/dev/md125p2: LABEL="img-rootfs" UUID="0ec3dea5-f293-4729-b676-5d38a611ce81" TYPE="xfs" PARTUUID="6722008c-02"

[root@host-192-168-24-38 ~]# ls /tmp/tmpt94qs6e2/ -a
.  ..  ec2  openstack
--------------------------------------------------------

PFA detailed pdb data.

Note:
1. "img-rootfs" is on /dev/md125p2
2. UUID of "img-rootfs" is "0ec3dea5-f293-4729-b676-5d38a611ce81"
3. IPA recognizes this UUID and passes it correctly to _get_partition along with the disk.
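As a side note, the KEY="value" records printed by blkid (and by lsblk -P) are easy to parse mechanically; a small standalone sketch, not part of IPA itself:

```python
import re

# Matches KEY="value" pairs as emitted by blkid / lsblk --pairs.
_PAIR_RE = re.compile(r'(\w+)="([^"]*)"')

def parse_blkid_line(line):
    """Split one blkid output line, e.g.
       /dev/md125p2: LABEL="img-rootfs" UUID="..." TYPE="xfs"
    into (device, {key: value})."""
    device, _, rest = line.partition(':')
    return device.strip(), dict(_PAIR_RE.findall(rest))
```

Applied to the blkid line above, this yields the device '/dev/md125p2' and a dict whose LABEL is 'img-rootfs' and whose UUID matches the one IPA later passes to _get_partition.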
I noticed something after adding pdb in _get_partition. It seems this function finds the uuid on the wrong partition:

------------------------------------------------------------------------------------------------------------
2021-02-26 05:48:42.829 2599 DEBUG ironic_lib.utils [-] Execution completed, command line is "mdadm --detail /dev/md125" execute /usr/lib/python3.6/site-packages/ironic_lib/utils.py:101
2021-02-26 05:48:42.830 2599 DEBUG ironic_lib.utils [-] Command stdout is: "/dev/md125:
           Version : 1.2
     Creation Time : Fri Feb 26 04:38:33 2021
        Raid Level : raid1
        Array Size : 971910144 (926.89 GiB 995.24 GB)
     Used Dev Size : 971910144 (926.89 GiB 995.24 GB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent
     Intent Bitmap : Internal
       Update Time : Fri Feb 26 05:48:39 2021
             State : clean, resyncing
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0
Consistency Policy : bitmap
     Resync Status : 38% complete
              Name : 2
              UUID : 2c6a5e72:fcf61094:02275431:b11b11c5
            Events : 1160

    Number   Major   Minor   RaidDevice State
       0      66      211        0      active sync   /dev/sdat3
       1      66      227        1      active sync   /dev/sdau3
" execute /usr/lib/python3.6/site-packages/ironic_lib/utils.py:103
2021-02-26 05:48:42.830 2599 DEBUG ironic_lib.utils [-] Command stderr is: "" execute /usr/lib/python3.6/site-packages/ironic_lib/utils.py:104
2021-02-26 05:48:42.830 2599 DEBUG root [-] /dev/md125 is an md device is_md_device /usr/lib/python3.6/site-packages/ironic_python_agent/hardware.py:200
> /usr/lib/python3.6/site-packages/ironic_python_agent/extensions/image.py(70)_get_partition()
-> md_partition = device + 'p1'
(Pdb) locals()
{'uuid': '0ec3dea5-f293-4729-b676-5d38a611ce81', 'device': '/dev/md125'}
(Pdb) n
> /usr/lib/python3.6/site-packages/ironic_python_agent/extensions/image.py(71)_get_partition()
-> if (not os.path.exists(md_partition) or
(Pdb) n
> /usr/lib/python3.6/site-packages/ironic_python_agent/extensions/image.py(72)_get_partition()
-> not stat.S_ISBLK(os.stat(md_partition).st_mode)):
(Pdb) locals()
{'uuid': '0ec3dea5-f293-4729-b676-5d38a611ce81', 'device': '/dev/md125', 'md_partition': '/dev/md125p1'}
(Pdb) n
> /usr/lib/python3.6/site-packages/ironic_python_agent/extensions/image.py(78)_get_partition()
-> LOG.debug("Found md device with partition %s", md_partition)
(Pdb) n
2021-02-26 05:49:35.248 2599 DEBUG ironic_python_agent.extensions.image [-] Found md device with partition /dev/md125p1 _get_partition /usr/lib/python3.6/site-packages/ironic_python_agent/extensions/image.py:78
> /usr/lib/python3.6/site-packages/ironic_python_agent/extensions/image.py(79)_get_partition()
-> return md_partition
(Pdb)
2021-02-26 05:50:11.463 2599 INFO ironic_python_agent.agent [-] heartbeat successful
2021-02-26 05:50:11.463 2599 INFO ironic_python_agent.agent [-] sleeping before next heartbeat, interval: 136.20341590356992
--Return--
> /usr/lib/python3.6/site-packages/ironic_python_agent/extensions/image.py(79)_get_partition()->'/dev/md125p1'
-> return md_partition
(Pdb) locals()
{'uuid': '0ec3dea5-f293-4729-b676-5d38a611ce81', 'device': '/dev/md125', 'md_partition': '/dev/md125p1', '__return__': '/dev/md125p1'}
------------------------------------------------------------------------------------------------------------

[root@host-192-168-24-86 ~]# lsblk -PbioKNAME,UUID,PARTUUID,TYPE | grep md125
KNAME="md125" UUID="" PARTUUID="" TYPE="raid1"
KNAME="md125" UUID="" PARTUUID="" TYPE="raid1"
KNAME="md125p1" UUID="2021-02-26-10-38-08-00" PARTUUID="dc23973e-01" TYPE="md"
KNAME="md125p1" UUID="2021-02-26-10-38-08-00" PARTUUID="dc23973e-01" TYPE="md"
KNAME="md125p2" UUID="0ec3dea5-f293-4729-b676-5d38a611ce81" PARTUUID="dc23973e-02" TYPE="md"
KNAME="md125p2" UUID="0ec3dea5-f293-4729-b676-5d38a611ce81" PARTUUID="dc23973e-02" TYPE="md"

I won't have this hardware next week, so I am trying to capture as much data as possible now. That said, this should be reproducible fairly easily on any node with software RAID.
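The lsblk output above suggests the shape of a fix: rather than hard-coding 'p1', match the partition whose filesystem UUID equals the one IPA already holds. A minimal sketch under that assumption (illustrative only, not the patch that was actually merged upstream):

```python
import re

# Matches KEY="value" pairs as emitted by `lsblk --pairs` (-P).
_PAIR_RE = re.compile(r'(\w+)="([^"]*)"')

def find_partition_by_uuid(lsblk_output, uuid):
    """Return /dev/<KNAME> for the first lsblk -P record whose
    UUID field matches the requested filesystem UUID, else None.
    Hypothetical helper, not an ironic-python-agent function."""
    for line in lsblk_output.splitlines():
        fields = dict(_PAIR_RE.findall(line))
        if fields.get('UUID') == uuid:
            return '/dev/' + fields['KNAME']
    return None

# Sample data taken from the lsblk output captured above.
LSBLK = '''\
KNAME="md125" UUID="" PARTUUID="" TYPE="raid1"
KNAME="md125p1" UUID="2021-02-26-10-38-08-00" PARTUUID="dc23973e-01" TYPE="md"
KNAME="md125p2" UUID="0ec3dea5-f293-4729-b676-5d38a611ce81" PARTUUID="dc23973e-02" TYPE="md"
'''
```

With the data from this node, find_partition_by_uuid(LSBLK, '0ec3dea5-f293-4729-b676-5d38a611ce81') selects /dev/md125p2, the partition that actually carries 'img-rootfs', instead of the config-drive partition p1.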
Could you attach the agent logs for this?
Given the complexity, the fact that software RAID is an upstream Ironic feature that was still being worked on when the Train release was cut, and the fact that the fix requires an API change between the conductor and the agent, we are going to let this come in with 17, where it is already merged, rather than attempt a backport to 16.1. This also reflects that backporting code in such a critical and complex path carries more risk than simply letting the fix arrive with the next version release.
Given that this issue will be fixed in 17, and that software RAID is not a supported use case in OSP 16.x, I'm closing this out as a next-release item. If you have any questions or concerns, please let us know.