
Bug 2319581

Summary: [crimson] OSD deployment fails due to low value being set for fs.aio-max-nr
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Harsh Kumar <hakumar>
Component: RADOS
Assignee: Matan Breizman <mbreizma>
Status: CLOSED ERRATA
QA Contact: Harsh Kumar <hakumar>
Severity: high
Docs Contact:
Priority: unspecified
Version: 8.0
CC: bhubbard, ceph-eng-bugs, cephqe-warriors, jcaratza, mbreizma, nojha, rzarzyns, vumrao
Target Milestone: ---
Keywords: TechPreview
Target Release: 9.0
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version: ceph-20.1.0-63
Doc Type: Technology Preview
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2026-01-29 06:53:15 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Harsh Kumar 2024-10-18 05:24:07 UTC
Description of problem:
Crimson OSD deployment on freshly re-imaged nodes fails, complaining about an insufficient value of fs.aio-max-nr.
As per BZ2035331, aio-max-nr used to be 65536 on OSD nodes and caused problems during OSD deployment; that was fixed by BZ2035331 itself in RHCS 5.2.

Crimson OSD is somehow again setting aio-max-nr on the OSD nodes to 65536; consequently, the OSDs fail to start with the following error:
    117 INFO  2024-10-14 15:58:00,328 [shard 0:main] bluestore - bluestore(/var/lib/ceph/osd/ceph-2/) mkfs path /var/lib/ceph/osd/ceph-2/
    118 INFO  2024-10-14 15:58:00,328 [shard 0:main] bdev - bdev(0x5557ef03ae00 /var/lib/ceph/osd/ceph-2//block) open path /var/lib/ceph/osd/ceph-2//block
    119 WARN  2024-10-14 15:58:00,328 [shard 0:main] bdev - bdev(0x5557ef03ae00 /var/lib/ceph/osd/ceph-2//block) ioctl(F_SET_FILE_RW_HINT) on /var/lib/ceph/osd/ceph-2//block failed: (22) Invalid argument
    120 ERROR 2024-10-14 15:58:00,328 [shard 0:main] none - bdev(0x5557ef03ae00 /var/lib/ceph/osd/ceph-2//block) _aio_start io_setup(2) failed with EAGAIN; try increasing /proc/sys/fs/aio-max-nr
    121 ERROR 2024-10-14 15:58:00,328 [shard 0:main] none - bluestore(/var/lib/ceph/osd/ceph-2/) _read_fsid unparsable uuid
    122 INFO  2024-10-14 15:58:00,328 [shard 0:main] bluestore - bluestore(/var/lib/ceph/osd/ceph-2/) mkfs using provided fsid 03ff578d-897d-477f-8bf8-e5c2915848b5
    123 INFO  2024-10-14 15:58:00,328 [shard 0:main] bdev - bdev(0x5557ef03ae00 /var/lib/ceph/osd/ceph-2//block) open path /var/lib/ceph/osd/ceph-2//block
    124 WARN  2024-10-14 15:58:00,328 [shard 0:main] bdev - bdev(0x5557ef03ae00 /var/lib/ceph/osd/ceph-2//block) ioctl(F_SET_FILE_RW_HINT) on /var/lib/ceph/osd/ceph-2//block failed: (22) Invalid argument
    125 ERROR 2024-10-14 15:58:00,329 [shard 0:main] none - bdev(0x5557ef03ae00 /var/lib/ceph/osd/ceph-2//block) _aio_start io_setup(2) failed with EAGAIN; try increasing /proc/sys/fs/aio-max-nr

Once OSD deployment fails with the above error, the recommended workaround is to manually increase aio-max-nr to 1048576, but even with that value set, the OSDs do not start.

Only when aio-max-nr is set to an even higher value, say 2097152, and the OSD service is applied again do the remaining OSDs get deployed. At this point the initial set of OSDs that had failed also try to restart, but now fail with a different error:
    953 INFO  2024-10-14 14:40:24,073 [shard 0:main] bdev - bdev(0x561a6abb2e00 /var/lib/ceph/osd/ceph-4/block) open path /var/lib/ceph/osd/ceph-4/block
    954 WARN  2024-10-14 14:40:24,073 [shard 0:main] bdev - bdev(0x561a6abb2e00 /var/lib/ceph/osd/ceph-4/block) ioctl(F_SET_FILE_RW_HINT) on /var/lib/ceph/osd/ceph-4/block failed: (22) Invalid argument
    955 INFO  2024-10-14 14:40:24,073 [shard 0:main] bdev - bdev(0x561a6abb2e00 /var/lib/ceph/osd/ceph-4/block) open size 959652560896 (0xdf6fc00000, 894 GiB) block_size 4096 (4 KiB) rotational device, discard not supported
    956 ERROR 2024-10-14 14:40:24,074 [shard 0:main] none - bluestore(/var/lib/ceph/osd/ceph-4/block) _read_bdev_label unable to decode label /var/lib/ceph/osd/ceph-4/block at offset 102: void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input [buffer:3]
    957 INFO  2024-10-14 14:40:24,074 [shard 0:main] bdev - bdev(0x561a6abb2e00 /var/lib/ceph/osd/ceph-4/block) close
    958 INFO  2024-10-14 14:40:24,335 [shard 0:main] bluestore - bluestore(/var/lib/ceph/osd/ceph-4) _mount::NCB::calling open_db_and_around(read/write)
    959 INFO  2024-10-14 14:40:24,335 [shard 0:main] bluestore - bluestore(/var/lib/ceph/osd/ceph-4) _open_db_and_around::NCB::read_only=0, to_repair=0
    960 INFO  2024-10-14 14:40:24,335 [shard 0:main] bdev - bdev(0x561a6abb2e00 /var/lib/ceph/osd/ceph-4/block) open path /var/lib/ceph/osd/ceph-4/block
    961 WARN  2024-10-14 14:40:24,336 [shard 0:main] bdev - bdev(0x561a6abb2e00 /var/lib/ceph/osd/ceph-4/block) ioctl(F_SET_FILE_RW_HINT) on /var/lib/ceph/osd/ceph-4/block failed: (22) Invalid argument
    962 INFO  2024-10-14 14:40:24,336 [shard 0:main] bdev - bdev(0x561a6abb2e00 /var/lib/ceph/osd/ceph-4/block) open size 959652560896 (0xdf6fc00000, 894 GiB) block_size 4096 (4 KiB) rotational device, discard not supported
    963 INFO  2024-10-14 14:40:24,336 [shard 0:main] bdev - bdev(0x561a6abb2e00 /var/lib/ceph/osd/ceph-4/block) close
    964 INFO  2024-10-14 14:40:24,358 [shard 0:main] bdev - bdev(0x561a6abb2e00 /var/lib/ceph/osd/ceph-4/block) open path /var/lib/ceph/osd/ceph-4/block
    965 WARN  2024-10-14 14:40:24,358 [shard 0:main] bdev - bdev(0x561a6abb2e00 /var/lib/ceph/osd/ceph-4/block) ioctl(F_SET_FILE_RW_HINT) on /var/lib/ceph/osd/ceph-4/block failed: (22) Invalid argument
    966 INFO  2024-10-14 14:40:24,358 [shard 0:main] bdev - bdev(0x561a6abb2e00 /var/lib/ceph/osd/ceph-4/block) open size 959652560896 (0xdf6fc00000, 894 GiB) block_size 4096 (4 KiB) rotational device, discard not supported
    967 ERROR 2024-10-14 14:40:24,358 [shard 0:main] none - bluestore(/var/lib/ceph/osd/ceph-4/block) _read_bdev_label unable to decode label /var/lib/ceph/osd/ceph-4/block at offset 102: void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input [buffer:3]
    968 ERROR 2024-10-14 14:40:24,358 [shard 0:main] none - bluestore(/var/lib/ceph/osd/ceph-4) _check_main_bdev_label not all labels read properly
    969 INFO  2024-10-14 14:40:24,358 [shard 0:main] bdev - bdev(0x561a6abb2e00 /var/lib/ceph/osd/ceph-4/block) close


Basically, the value of fs.aio-nr (not fs.aio-max-nr) increases from 0 to 65536, which is the current value of fs.aio-max-nr. From the kernel documentation:
  aio-nr & aio-max-nr:
     aio-nr is the running total of the number of events specified on the
     io_setup system call for all currently active aio contexts.  If aio-nr
     reaches aio-max-nr then io_setup will fail with EAGAIN.  Note that
     raising aio-max-nr does not result in the pre-allocation or re-sizing
     of any kernel data structures.
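
For illustration only, one quick way to check the remaining AIO headroom on an OSD host before deployment is to compare the two counters directly. This is a hedged sketch and not part of the reproduction; the 65536 threshold merely mirrors the consumption observed on the failing hosts above.

    # Sketch: report free AIO events on this host (illustrative threshold only).
    aio_nr=$(cat /proc/sys/fs/aio-nr)
    aio_max=$(cat /proc/sys/fs/aio-max-nr)
    echo "aio-nr=${aio_nr} aio-max-nr=${aio_max} free=$((aio_max - aio_nr))"
    # If fewer events remain than the OSDs will request, io_setup() fails with EAGAIN.
    if [ $((aio_max - aio_nr)) -lt 65536 ]; then
        echo "WARNING: less than 65536 AIO events available"
    fi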

The issue is reproducible every time on newly re-imaged bare-metal machines.

Version-Release number of selected component (if applicable):
ceph version 19.1.1-60.0.crimson.el9cp (c1b82983c7814645622bbf28e147fbb28708ba17) squid (rc)

How reproducible:
5/5

Steps to Reproduce:
1. Use bare-metal machines where aio-max-nr has not been configured previously
2. Deploy a Crimson cluster and OSDs
3. OSD deployment will fail; check the logs for the complaint about the low value of fs.aio-max-nr (see the sketch below)
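
One way to pull the failing daemon's log and confirm the complaint, assuming the standard cephadm systemd unit naming; the daemon id (osd.2) is only an example taken from the log above:

    # Hedged example: search a failed Crimson OSD's journal for the aio-max-nr error.
    cephadm logs --name osd.2 | grep -i 'aio-max-nr'
    # Equivalent directly via journalctl on the OSD host (fsid is this cluster's):
    journalctl -u ceph-548cb644-8cce-11ef-bd89-78ac44504f32@osd.2 --no-pager | grep -i 'aio-max-nr'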

Actual results:
Crimson OSD is setting fs.aio-max-nr to 65536, which is too low for OSD deployment.

Expected results:
Crimson OSD should set aio-max-nr to a value greater than 1048576, closer to 2097152, to enable smooth deployment of OSDs.
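
As a hedged illustration of that expectation, the deployment tooling could lay down the same persistent sysctl override that is used as a manual workaround later in this report (the drop-in file name is only an example):

    # Example sysctl drop-in mirroring the manual workaround applied below.
    echo "fs.aio-max-nr = 2097152" > /etc/sysctl.d/90-ceph-aio.conf
    sysctl -p /etc/sysctl.d/90-ceph-aio.conf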

Additional info:

>>> Value of fs.aio-nr and fs.aio-max-nr before OSD deployment
    [root@folio01 ubuntu]# for host in `cat ~/host_list` ; do echo $host ; ssh root@$host cat /proc/sys/fs/aio-nr ; done
    folio01
    0
    folio02
    0
    folio03
    0
    folio04
    0
    folio05
    0
    folio09
    0
    folio10
    0
    folio15
    0
    [root@folio01 ubuntu]# for host in `cat ~/host_list` ; do echo $host ; ssh root@$host cat /proc/sys/fs/aio-max-nr ; done
    folio01
    1048576
    folio02
    65536
    folio03
    65536
    folio04
    65536
    folio05
    65536
    folio09
    65536
    folio10
    65536
    folio15
    65536

>>> Using spec file to deploy OSDs
    [root@folio01 ubuntu]# cephadm shell --mount ~/osd_spec_crimson.yaml 
    Inferring fsid 548cb644-8cce-11ef-bd89-78ac44504f32
    Inferring config /var/lib/ceph/548cb644-8cce-11ef-bd89-78ac44504f32/mon.folio01/config
    Using ceph image with id '599d00243e97' and tag 'latest' created on 2024-09-18 19:54:38 +0000 UTC
    cp.stg.icr.io/cp/ibm-ceph/ceph-8-crimson-rhel9@sha256:aab02cb8cb2a143d61ef11ac230e7f04ee1252fb558f1234a8b0e6c29483ae18
    [ceph: root@folio01 /]# cat /mnt/osd_spec_crimson.yaml 
    service_type: osd
    service_id: osd_crimson_hdd
    placement:
      hosts:
        - folio02
        - folio03
        - folio04
        - folio05
        - folio09
    data_devices:
      rotational: 1
    db_devices:
      rotational: 0
     
    [ceph: root@folio01 /]# ceph orch ls
    2024-10-17T21:33:55.783+0000 7fa0f3baf640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
    2024-10-17T21:33:55.783+0000 7fa0f3baf640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
    NAME  PORTS  RUNNING  REFRESHED  AGE  PLACEMENT  
    mgr              3/3  7m ago     2m   label:mgr  
    mon              3/3  7m ago     2m   label:mon  
    
    [ceph: root@folio01 /]# ceph status
    2024-10-17T21:33:58.767+0000 7fbcecd4a640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
    2024-10-17T21:33:58.767+0000 7fbcecd4a640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
      cluster:
        id:     548cb644-8cce-11ef-bd89-78ac44504f32
        health: HEALTH_WARN
                OSD count 0 < osd_pool_default_size 3
     
      services:
        mon: 3 daemons, quorum folio01,folio10,folio05 (age 73s)
        mgr: folio01.ehcvql(active, since 7m), standbys: folio10.ygxvkp, folio05.olappc
        osd: 0 osds: 0 up, 0 in
     
      data:
        pools:   0 pools, 0 pgs
        objects: 0 objects, 0 B
        usage:   0 B used, 0 B / 0 B avail
        pgs:     

>>> OSD spec is applied
    [ceph: root@folio01 /]# ceph orch apply -i /mnt/osd_spec_crimson.yaml 
    2024-10-17T21:34:07.596+0000 7fde983ab640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
    2024-10-17T21:34:07.597+0000 7fde983ab640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
    Scheduled osd.osd_crimson_hdd update...
    [ceph: root@folio01 /]# ceph orch ls
    2024-10-17T21:34:56.719+0000 7fcf34c34640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
    2024-10-17T21:34:56.719+0000 7fcf34c34640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
    NAME                 PORTS  RUNNING  REFRESHED  AGE  PLACEMENT                                
    mgr                             3/3  8m ago     3m   label:mgr                                
    mon                             3/3  8m ago     3m   label:mon                                
    osd.osd_crimson_hdd               0  -          49s  folio02;folio03;folio04;folio05;folio09  

>>> Cluster tries to deploy one OSD on each OSD node, all of them fail
    [ceph: root@folio01 /]# ceph osd tree
    2024-10-17T21:35:02.558+0000 7f96dbcb9640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
    2024-10-17T21:35:02.558+0000 7f96dbcb9640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
    ID  CLASS  WEIGHT  TYPE NAME     STATUS  REWEIGHT  PRI-AFF
    -1              0  root default                           
     0              0  osd.0           down   1.00000  1.00000
     1              0  osd.1           down   1.00000  1.00000
     2              0  osd.2           down   1.00000  1.00000
     3              0  osd.3           down   1.00000  1.00000
     4              0  osd.4           down   1.00000  1.00000

>>> Value of fs.aio-max-nr and fs.aio-nr immediately after the attempt to deploy Crimson OSDs
    [root@folio01 ubuntu]# for host in `cat ~/host_list` ; do echo $host ; ssh root@$host cat /proc/sys/fs/aio-max-nr ; done
    folio01
    1048576
    folio02
    65536
    folio03
    65536
    folio04
    65536
    folio05
    65536
    folio09
    65536
    folio10
    65536
    folio15
    65536

    [root@folio01 ubuntu]# for host in `cat ~/host_list` ; do echo $host ; ssh root@$host cat /proc/sys/fs/aio-nr ; done
    folio01
    0
    folio02
    65536
    folio03
    65536
    folio04
    65536
    folio05
    65536
    folio09
    65536
    folio10
    0
    folio15
    0

>>> Changing the value of fs.aio-max-nr to 1048576
    [root@folio01 ubuntu]# for host in `cat ~/host_list` ; do echo $host ; ssh root@$host 'echo "fs.aio-max-nr = 1048576" >> /etc/sysctl.d/99-sysctl.conf' ; done
    folio01
    folio02
    folio03
    folio04
    folio05
    folio09
    folio10
    folio15

    [root@folio01 ubuntu]# for host in `cat ~/host_list` ; do ssh root@$host sysctl -p /etc/sysctl.d/99-sysctl.conf ; done
    fs.aio-max-nr = 2097152
    fs.aio-max-nr = 1048576
    fs.aio-max-nr = 1048576
    fs.aio-max-nr = 1048576
    fs.aio-max-nr = 1048576
    fs.aio-max-nr = 1048576
    fs.aio-max-nr = 1048576
    fs.aio-max-nr = 1048576
    fs.aio-max-nr = 1048576
    fs.aio-max-nr = 1048576
    fs.aio-max-nr = 1048576

    [root@folio01 ubuntu]# for host in `cat ~/host_list` ; do echo $host ; ssh root@$host cat /proc/sys/fs/aio-nr ; done
    folio01
    0
    folio02
    65536
    folio03
    65536
    folio04
    65536
    folio05
    65536
    folio09
    65536
    folio10
    0
    folio15
    0

    [root@folio01 ubuntu]# for host in `cat ~/host_list` ; do echo $host ; ssh root@$host cat /proc/sys/fs/aio-max-nr ; done
    folio01
    1048576
    folio02
    1048576
    folio03
    1048576
    folio04
    1048576
    folio05
    1048576
    folio09
    1048576
    folio10
    1048576
    folio15
    1048576

>>> OSD re-deployment does not work with fs.aio-max-nr = 1048576
    [root@folio01 ubuntu]# cephadm shell --mount ~/osd_spec_crimson.yaml 
    Inferring fsid 548cb644-8cce-11ef-bd89-78ac44504f32
    Inferring config /var/lib/ceph/548cb644-8cce-11ef-bd89-78ac44504f32/mon.folio01/config
    Using ceph image with id '599d00243e97' and tag 'latest' created on 2024-09-18 19:54:38 +0000 UTC
    cp.stg.icr.io/cp/ibm-ceph/ceph-8-crimson-rhel9@sha256:aab02cb8cb2a143d61ef11ac230e7f04ee1252fb558f1234a8b0e6c29483ae18
    [ceph: root@folio01 /]# ceph orch ls
    2024-10-17T21:37:48.589+0000 7f641fdc9640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
    2024-10-17T21:37:48.590+0000 7f641fdc9640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
    NAME                 PORTS  RUNNING  REFRESHED  AGE  PLACEMENT                                
    mgr                             3/3  11m ago    6m   label:mgr                                
    mon                             3/3  11m ago    6m   label:mon                                
    osd.osd_crimson_hdd               0  -          3m   folio02;folio03;folio04;folio05;folio09  
    [ceph: root@folio01 /]# ceph osd tree
    2024-10-17T21:37:51.803+0000 7faad5042640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
    2024-10-17T21:37:51.804+0000 7faad5042640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
    ID  CLASS  WEIGHT  TYPE NAME     STATUS  REWEIGHT  PRI-AFF
    -1              0  root default                           
     0              0  osd.0           down   1.00000  1.00000
     1              0  osd.1           down   1.00000  1.00000
     2              0  osd.2           down   1.00000  1.00000
     3              0  osd.3           down   1.00000  1.00000
     4              0  osd.4           down   1.00000  1.00000
    [ceph: root@folio01 /]# ceph orch apply -i /mnt/osd_spec_crimson.yaml 
    2024-10-17T21:37:56.635+0000 7f6472706640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
    2024-10-17T21:37:56.636+0000 7f6472706640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
    Scheduled osd.osd_crimson_hdd update...
    [ceph: root@folio01 /]# ceph osd tree
    2024-10-17T21:38:14.753+0000 7f8ceac98640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
    2024-10-17T21:38:14.754+0000 7f8ceac98640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
    ID  CLASS  WEIGHT  TYPE NAME     STATUS  REWEIGHT  PRI-AFF
    -1              0  root default                           
     0              0  osd.0           down   1.00000  1.00000
     1              0  osd.1           down   1.00000  1.00000
     2              0  osd.2           down   1.00000  1.00000
     3              0  osd.3           down   1.00000  1.00000
     4              0  osd.4           down   1.00000  1.00000
    [ceph: root@folio01 /]# ceph orch ls
    2024-10-17T21:38:24.547+0000 7fb31ad8d640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
    2024-10-17T21:38:24.547+0000 7fb31ad8d640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
    NAME                 PORTS  RUNNING  REFRESHED  AGE  PLACEMENT                                
    mgr                             3/3  12m ago    7m   label:mgr                                
    mon                             3/3  12m ago    6m   label:mon                                
    osd.osd_crimson_hdd               0  -          27s  folio02;folio03;folio04;folio05;folio09  
    [ceph: root@folio01 /]# ceph status
    2024-10-17T21:38:27.913+0000 7f0070271640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
    2024-10-17T21:38:27.913+0000 7f0070271640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
      cluster:
        id:     548cb644-8cce-11ef-bd89-78ac44504f32
        health: HEALTH_OK
     
      services:
        mon: 3 daemons, quorum folio01,folio10,folio05 (age 5m)
        mgr: folio01.ehcvql(active, since 12m), standbys: folio10.ygxvkp, folio05.olappc
        osd: 5 osds: 0 up, 5 in (since 4m)
     
      data:
        pools:   0 pools, 0 pgs
        objects: 0 objects, 0 B
        usage:   0 B used, 0 B / 0 B avail
        pgs:

>>> Increasing the value of fs.aio-max-nr to 2097152 (Refer: https://access.redhat.com/solutions/7040846)
    [root@folio01 ubuntu]# for host in `cat ~/host_list` ; do echo $host ; ssh root@$host 'echo "fs.aio-max-nr = 2097152" >> /etc/sysctl.d/99-sysctl.conf' ; done
    folio01
    folio02
    folio03
    folio04
    folio05
    folio09
    folio10
    folio15
    
    [root@folio01 ubuntu]# for host in `cat ~/host_list` ; do ssh root@$host sysctl -p /etc/sysctl.d/99-sysctl.conf ; done
    fs.aio-max-nr = 2097152
    fs.aio-max-nr = 1048576
    fs.aio-max-nr = 1048576
    fs.aio-max-nr = 1048576
    fs.aio-max-nr = 2097152
    fs.aio-max-nr = 1048576
    fs.aio-max-nr = 2097152
    fs.aio-max-nr = 1048576
    fs.aio-max-nr = 2097152
    fs.aio-max-nr = 1048576
    fs.aio-max-nr = 2097152
    fs.aio-max-nr = 1048576
    fs.aio-max-nr = 2097152
    fs.aio-max-nr = 1048576
    fs.aio-max-nr = 2097152
    fs.aio-max-nr = 1048576
    fs.aio-max-nr = 2097152
    fs.aio-max-nr = 1048576
    fs.aio-max-nr = 2097152
    
    [root@folio01 ubuntu]# for host in `cat ~/host_list` ; do echo $host ; ssh root@$host cat /proc/sys/fs/aio-nr ; done
    folio01
    0
    folio02
    339376
    folio03
    339376
    folio04
    339376
    folio05
    339376
    folio09
    339376
    folio10
    0
    folio15
    0
    
    [root@folio01 ubuntu]# for host in `cat ~/host_list` ; do echo $host ; ssh root@$host cat /proc/sys/fs/aio-max-nr ; done
    folio01
    2097152
    folio02
    2097152
    folio03
    2097152
    folio04
    2097152
    folio05
    2097152
    folio09
    2097152
    folio10
    2097152
    folio15
    2097152

>>> Redeploying Crimson OSDs
    [root@folio01 ubuntu]# cephadm shell --mount ~/osd_spec_crimson.yaml 
    Inferring fsid 548cb644-8cce-11ef-bd89-78ac44504f32
    Inferring config /var/lib/ceph/548cb644-8cce-11ef-bd89-78ac44504f32/mon.folio01/config
    Using ceph image with id '599d00243e97' and tag 'latest' created on 2024-09-18 19:54:38 +0000 UTC
    cp.stg.icr.io/cp/ibm-ceph/ceph-8-crimson-rhel9@sha256:aab02cb8cb2a143d61ef11ac230e7f04ee1252fb558f1234a8b0e6c29483ae18
    [ceph: root@folio01 /]# ceph orch apply -i /mnt/osd_spec_crimson.yaml 
    2024-10-17T21:55:10.555+0000 7f7468502640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
    2024-10-17T21:55:10.555+0000 7f7468502640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
    Scheduled osd.osd_crimson_hdd update...
    
    [ceph: root@folio01 /]# ceph orch ls
    2024-10-17T21:55:14.160+0000 7f8051817640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
    2024-10-17T21:55:14.160+0000 7f8051817640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
    NAME                 PORTS  RUNNING  REFRESHED  AGE  PLACEMENT                                
    mgr                             3/3  6m ago     23m  label:mgr                                
    mon                             3/3  6m ago     23m  label:mon                                
    osd.osd_crimson_hdd              15  2m ago     3s   folio02;folio03;folio04;folio05;folio09  

>>> New OSDs are deployed successfully; the originally failed OSDs are also restarted but fail because they cannot read all bdev labels properly
    [ceph: root@folio01 /]# ceph osd tree
    2024-10-17T21:55:34.990+0000 7f10f0310640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
    2024-10-17T21:55:34.990+0000 7f10f0310640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
    ID  CLASS  WEIGHT    TYPE NAME         STATUS  REWEIGHT  PRI-AFF
    -1         18.55042  root default                               
    -6          3.71008      host folio02                           
     7          1.23669          osd.7         up   1.00000  1.00000
    14          1.23669          osd.14        up   1.00000  1.00000
    19          1.23669          osd.19        up   1.00000  1.00000
    -4          3.71008      host folio03                           
     6          1.23669          osd.6         up   1.00000  1.00000
    12          1.23669          osd.12        up   1.00000  1.00000
    17          1.23669          osd.17        up   1.00000  1.00000
    -5          3.71008      host folio04                           
     8          1.23669          osd.8         up   1.00000  1.00000
    10          1.23669          osd.10        up   1.00000  1.00000
    15          1.23669          osd.15        up   1.00000  1.00000
    -2          3.71008      host folio05                           
     9          1.23669          osd.9         up   1.00000  1.00000
    11          1.23669          osd.11        up   1.00000  1.00000
    16          1.23669          osd.16        up   1.00000  1.00000
    -3          3.71008      host folio09                           
     5          1.23669          osd.5         up   1.00000  1.00000
    13          1.23669          osd.13        up   1.00000  1.00000
    18          1.23669          osd.18        up   1.00000  1.00000
     0                0  osd.0               down         0  1.00000
     1                0  osd.1               down         0  1.00000
     2                0  osd.2               down         0  1.00000
     3                0  osd.3               down         0  1.00000
     4                0  osd.4               down         0  1.00000
 
    [ceph: root@folio01 /]# ceph config dump
    2024-10-18T04:22:12.959+0000 7f8abf246640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
    2024-10-18T04:22:12.960+0000 7f8abf246640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
    WHO     MASK          LEVEL     OPTION                                                      VALUE                                                                                                                   RO
    global                basic     container_image                                             cp.stg.icr.io/cp/ibm-ceph/ceph-8-crimson-rhel9@sha256:aab02cb8cb2a143d61ef11ac230e7f04ee1252fb558f1234a8b0e6c29483ae18  * 
    global                advanced  enable_experimental_unrecoverable_data_corrupting_features  crimson                                                                                                                   
    global                basic     log_to_file                                                 true                                                                                                                      
    global                advanced  mon_cluster_log_to_file                                     true                                                                                                                      
    global                advanced  public_network                                              10.1.172.0/23                                                                                                           * 
    mon                   advanced  auth_allow_insecure_global_id_reclaim                       false                                                                                                                     
    mon                   advanced  osd_pool_default_crimson                                    true                                                                                                                      
    mgr                   advanced  mgr/cephadm/container_init                                  True                                                                                                                    * 
    mgr                   advanced  mgr/cephadm/migration_current                               7                                                                                                                       * 
    mgr                   advanced  mgr/dashboard/ssl_server_port                               8443                                                                                                                    * 
    mgr                   advanced  mgr/orchestrator/orchestrator                               cephadm                                                                                                                   
    osd                   advanced  crimson_seastar_num_threads                                 8                                                                                                                       * 
    osd     host:folio02  basic     osd_memory_target                                           17364155289                                                                                                               
    osd     host:folio03  basic     osd_memory_target                                           17364156723                                                                                                               
    osd     host:folio04  basic     osd_memory_target                                           17364157440                                                                                                               
    osd     host:folio05  basic     osd_memory_target                                           16021979443                                                                                                               
    osd     host:folio09  basic     osd_memory_target                                           17364156723                                                                                                               
    osd                   advanced  osd_memory_target_autotune                                  true

    [ceph: root@folio01 /]# ceph status
    2024-10-18T04:43:18.447+0000 7f276375c640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
    2024-10-18T04:43:18.447+0000 7f276375c640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
      cluster:
        id:     548cb644-8cce-11ef-bd89-78ac44504f32
        health: HEALTH_WARN
                5 failed cephadm daemon(s)
     
      services:
        mon: 3 daemons, quorum folio01,folio10,folio05 (age 7h)
        mgr: folio01.ehcvql(active, since 7h), standbys: folio10.ygxvkp, folio05.olappc
        osd: 20 osds: 15 up (since 6h), 15 in (since 6h)
     
      data:
        pools:   1 pools, 1 pgs
        objects: 2 objects, 449 KiB
        usage:   5.5 TiB used, 13 TiB / 19 TiB avail
        pgs:     1 active+clean

Comment 11 errata-xmlrpc 2026-01-29 06:53:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 9.0 Security and Enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2026:1536