The OSD prepare pod is stuck while running the ceph-volume prepare command. The backing disk for this PV is /dev/sdd on the compute-1 node. From the ceph-volume logs on compute-1, where this OSD prepare pod is stuck:
```
[2023-08-11 10:11:21,471][ceph_volume.process][INFO ] Running command: /usr/bin/ceph-bluestore-tool show-label --dev /dev/sdd
[2023-08-11 10:11:21,492][ceph_volume.process][INFO ] stderr unable to read label for /dev/sdd: (2) No such file or directory
[2023-08-11 10:11:21,492][ceph_volume.devices.raw.list][DEBUG ] assuming device /dev/sdd is not BlueStore; ceph-bluestore-tool failed to get info from device: [] ['unable to read label for /dev/sdd: (2) No such file or directory']
[2023-08-11 10:11:21,492][ceph_volume.devices.raw.list][INFO ] device /dev/sdd does not have BlueStore information
```
Could be something to do with the disk itself.
ceph-volume inventory on this disk:
```
sh-5.1# ceph-volume inventory /dev/sdd
 stderr: lsblk: /dev/sdd: not a block device
Traceback (most recent call last):
  File "/usr/sbin/ceph-volume", line 33, in <module>
    sys.exit(load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')())
  File "/usr/lib/python3.9/site-packages/ceph_volume/main.py", line 41, in __init__
    self.main(self.argv)
  File "/usr/lib/python3.9/site-packages/ceph_volume/decorators.py", line 59, in newfunc
    return f(*a, **kw)
  File "/usr/lib/python3.9/site-packages/ceph_volume/main.py", line 153, in main
    terminal.dispatch(self.mapper, subcommand_args)
  File "/usr/lib/python3.9/site-packages/ceph_volume/terminal.py", line 194, in dispatch
    instance.main()
  File "/usr/lib/python3.9/site-packages/ceph_volume/inventory/main.py", line 50, in main
    self.format_report(Device(self.args.path, with_lsm=self.args.with_lsm))
  File "/usr/lib/python3.9/site-packages/ceph_volume/util/device.py", line 131, in __init__
    self._parse()
  File "/usr/lib/python3.9/site-packages/ceph_volume/util/device.py", line 225, in _parse
    dev = disk.lsblk(self.path)
  File "/usr/lib/python3.9/site-packages/ceph_volume/util/disk.py", line 243, in lsblk
    result = lsblk_all(device=device,
  File "/usr/lib/python3.9/site-packages/ceph_volume/util/disk.py", line 337, in lsblk_all
    raise RuntimeError(f"Error: {err}")
RuntimeError: Error: ['lsblk: /dev/sdd: not a block device']
```
```
sh-5.1# lsblk /dev/sdd
lsblk: /dev/sdd: not a block device
sh-5.1#
```
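To narrow down whether the node still exposes a usable device node for the disk, a quick check from compute-1 (for example via `oc debug node/compute-1` and `chroot /host`, which is an assumed access method) could look like the minimal sketch below, using only standard utilities:
```
# Check whether /dev/sdd exists and is actually a block device.
# A healthy disk shows type "b" (e.g. "brw-rw----"); a stale or broken
# entry will fail the test below.
ls -l /dev/sdd
test -b /dev/sdd && echo "/dev/sdd is a block device" || echo "/dev/sdd is NOT a block device"

# Cross-check what the kernel currently reports for disks on the node.
lsblk -d -o NAME,SIZE,TYPE,MODEL
grep -w sdd /proc/partitions || echo "sdd is not known to the kernel"
```
If sdd is also missing from /proc/partitions, the problem is at the node/hardware level rather than anything ceph-volume is doing.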
Moving out of 4.14 while the bad disk is being investigated.
I have no idea about its reproducibility. If I'm not wrong, this is the first time it has been hit.
Ceph is complaining about the disk:
```
RuntimeError: Error: ['lsblk: /dev/sdd: not a block device']
```
This is coming directly from the `lsblk` output, so the issue could be with the disk itself. I can't investigate further since the cluster is no longer available, and I didn't get a chance to look into the cluster while it was up. Suggesting to retry the scenario and share a fresh cluster. Meanwhile I'm looking into the attached must-gather for any clues.
Nothing interesting in the must-gather. Requesting QE to retry this scenario and provide the cluster again. Thanks.
Is the issue still reproducible?
(In reply to Santosh Pillai from comment #13)
> Is the issue still reproducible?

It would take time to re-test it. Hence not a blocker for now.
In a new repro, the issue was that the cluster was full and the Ceph health was HEALTH_ERR because of the full cluster. By increasing the full ratio [1], the OSD creation was able to complete.

Shall we close this issue?

[1] In the toolbox, run: ceph osd set-full-ratio 0.9
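For anyone hitting the same state, a minimal sketch of confirming the full condition and applying this workaround from the rook-ceph toolbox pod (any value other than the 0.9 used above is an assumption):
```
# Confirm the cluster is reporting a full/OSD_FULL condition in HEALTH_ERR
ceph health detail

# Check per-OSD utilization and the currently configured ratios
ceph osd df
ceph osd dump | grep -i ratio   # full_ratio, backfillfull_ratio, nearfull_ratio

# Temporarily raise the full ratio so the OSD prepare job can proceed
ceph osd set-full-ratio 0.9
```
The ratio should be lowered back to its previous value once new capacity has been added and rebalancing has brought utilization down.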
(In reply to Travis Nielsen from comment #15)
> In a new repro, the issue was that the cluster was full and the ceph health
> was HEALTH_ERR because of the full cluster. By increasing the full ratio
> [1], the OSD creation was able to complete.
> 
> Shall we close this issue?
> 
> [1] In the toolbox run: ceph osd set-full-ratio 0.9

Hi Travis,

I don't think we should close the issue, because adding capacity had issues: the osd-prepare jobs were stuck in the Running state and never completed, meaning OSDs were never added until you changed ceph osd set-full-ratio to 0.9.

This won't be a feasible solution for customers, as we don't recommend they use the toolbox, and adding OSDs while Ceph is reporting HEALTH_ERR would always be a problem.

We should probably track this BZ to improve adding OSDs in scenarios like this.
Sounds good to keep it open, with the new BZ title to improve this scenario. It does get into a bigger question of what Rook should automatically do when the cluster does fill up. Rook could potentially detect this health error that the OSDs are full and increase the full ratio at the moment of adding a new OSD. But we have to be very conservative with this automation. The admin needs to be aware and likely reduce load on the cluster to avoid issues even while the new OSDs are coming online.
(In reply to Travis Nielsen from comment #17)
> Sounds good to keep it open, with the new BZ title to improve this scenario.
> It does get into a bigger question of what Rook should automatically do when
> the cluster does fill up. Rook could potentially detect this health error
> that the OSDs are full and increase the full ratio at the moment of adding a
> new OSD. But we have to be very conservative with this automation. The admin
> needs to be aware and likely reduce load on the cluster to avoid issues even
> while the new OSDs are coming online.

I like the idea. Do you think we should raise a warning-level alert along with this change to make admins aware of the situation?
(In reply to Aman Agrawal from comment #16)
> (In reply to Travis Nielsen from comment #15)
> > In a new repro, the issue was that the cluster was full and the ceph health
> > was HEALTH_ERR because of the full cluster. By increasing the full ratio
> > [1], the OSD creation was able to complete.
> > 
> > Shall we close this issue?
> > 
> > [1] In the toolbox run: ceph osd set-full-ratio 0.9
> 
> Hi Travis,
> 
> I don't think we should close the issue because adding capacity had issues
> as osd-prepare jobs were stuck in running state and never completed,
> meaning OSDs were never added until you changed the value for ceph osd
> set-full-ratio to 0.9.

IMO, the original issue for which the BZ was filed is different from this one. The original cluster had the following issue:
```
sh-5.1# lsblk /dev/sdd
lsblk: /dev/sdd: not a block device
sh-5.1#
```
The new issue is about the OSDs being full, due to which the user is not able to add new OSDs.

So we should either change the BZ title/description as Travis mentioned, or open a new one for not being able to add new OSDs when existing OSDs are running full/near full.

> This won't be a feasible solution for customers as we don't recommend them
> using toolbox, and addition of OSDs when ceph is reporting HEALTH_ERR would
> always be a problem.
> 
> We should probably track this BZ to improve adding OSDs in scenarios like
> this.
(In reply to Santosh Pillai from comment #19)
> (In reply to Aman Agrawal from comment #16)
> > (In reply to Travis Nielsen from comment #15)
> > > In a new repro, the issue was that the cluster was full and the ceph health
> > > was HEALTH_ERR because of the full cluster. By increasing the full ratio
> > > [1], the OSD creation was able to complete.
> > > 
> > > Shall we close this issue?
> > > 
> > > [1] In the toolbox run: ceph osd set-full-ratio 0.9
> > 
> > Hi Travis,
> > 
> > I don't think we should close the issue because adding capacity had issues
> > as osd-prepare jobs were stuck in running state and never completed,
> > meaning OSDs were never added until you changed the value for ceph osd
> > set-full-ratio to 0.9.
> 
> IMO, the original issue for which the BZ was filed is different from this one.

The OSDs were full in the original issue as well, and that's probably why one of the OSDs wasn't added.

> The original cluster had the following issue:
> ```
> sh-5.1# lsblk /dev/sdd
> lsblk: /dev/sdd: not a block device
> sh-5.1#
> ```
> 
> The new issue is about OSDs being full due to which the user is not able to
> add new OSDs.
> 
> So we should either change the BZ title/description like Travis mentioned, or
> open a new one for not being able to add new OSDs when existing OSDs are
> running full/near full.
> 
> > This won't be a feasible solution for customers as we don't recommend them
> > using toolbox, and addition of OSDs when ceph is reporting HEALTH_ERR would
> > always be a problem.
> > 
> > We should probably track this BZ to improve adding OSDs in scenarios like
> > this.
Still needs investigation
Not a blocker
Moving it to 4.16 since it's not a blocker and a workaround is available.
(In reply to Travis Nielsen from comment #17)
> Sounds good to keep it open, with the new BZ title to improve this scenario.
> It does get into a bigger question of what Rook should automatically do when
> the cluster does fill up. Rook could potentially detect this health error
> that the OSDs are full and increase the full ratio at the moment of adding a
> new OSD. But we have to be very conservative with this automation. The admin
> needs to be aware and likely reduce load on the cluster to avoid issues even
> while the new OSDs are coming online.

Rather than automating this, can it be part of the ODF CLI tool? The admin would have to run it when adding a new OSD while the existing OSDs are already full. It would also need documentation.
As nice as it would be to adjust the OSD full ratio setting during OSD creation to allow for creation of the OSDs, it also suffers from several issues:
- How high to adjust the full ratio? If the default is 85%, should it be 90% or something else?
- If already adjusted higher and the OSDs still failed to add for some other reason, how can the customer proceed if the OSDs have now reached the new threshold?
- When would the threshold be returned to its previous level? If all goes well, after OSD creation is completed, but what if there is some error?

Exposing a command in the new odf CLI tool will better answer these concerns about the automation. While the CLI tool is less optimal since it requires the admin to run it, it does seem a better design:
- The admin can decide exactly when to increase the threshold and to what value
- The admin can decide when to return the threshold back to its previous value

Any concerns with requiring the CLI tool intervention?
(In reply to Travis Nielsen from comment #25) > As nice as it would be to adjust the OSD full ratio setting during OSD > creation to allow for creation of the OSDs, it also suffers from several > issues: > - How high to adjust the full ratio? If the default is 85%, should it be 90% > or something else? > - If already adjusted higher and the OSDs still failed to add for some other > reason, how can the customer proceed if the OSDs have now reached the new > threshold? > - When would the threshold be returned to its previous level? If all goes > well, after OSD creation is completed, but what if there is some error? Agree, these questions will be difficult to answer since the workload might already be running on the cluster. If Rook increases the full ratio to say, 90%, and workloads are still running, the OSDs might reach 90% before the customer has added a new OSD. So it won't be very easy to automate via Rook. > > Exposing a command in the new odf CLI tool will better answer these concerns > about the automation. While the CLI tool is less optimal since it requires > the admin to run it, it does seem a better design: > - The admin can decide exactly when to increase the threshold and to what > value > - The admin can decide when to return the threshold back to its previous > value > > Any concerns with requiring the CLI tool intervention? No concerns. Apart from CLI, it should require documentation effort to help the admin know the exact steps to be taken. Steps I can think of are: - Stop the workload - Increase the full ratio - Add OSDs - Change full ratio back to default. - Wait for data rebalance. Thoughts? @srai and Aman
(In reply to Santosh Pillai from comment #26)
> (In reply to Travis Nielsen from comment #25)
> > As nice as it would be to adjust the OSD full ratio setting during OSD
> > creation to allow for creation of the OSDs, it also suffers from several
> > issues:
> > - How high to adjust the full ratio? If the default is 85%, should it be 90%
> > or something else?
> > - If already adjusted higher and the OSDs still failed to add for some other
> > reason, how can the customer proceed if the OSDs have now reached the new
> > threshold?
> > - When would the threshold be returned to its previous level? If all goes
> > well, after OSD creation is completed, but what if there is some error?
> 
> Agree, these questions will be difficult to answer since the workload might
> already be running on the cluster. If Rook increases the full ratio to say,
> 90%, and workloads are still running, the OSDs might reach 90% before the
> customer has added a new OSD. So it won't be very easy to automate via Rook.
> 
> > Exposing a command in the new odf CLI tool will better answer these concerns
> > about the automation. While the CLI tool is less optimal since it requires
> > the admin to run it, it does seem a better design:
> > - The admin can decide exactly when to increase the threshold and to what
> > value
> > - The admin can decide when to return the threshold back to its previous
> > value
> > 
> > Any concerns with requiring the CLI tool intervention?
> 
> No concerns. Apart from CLI, it should require documentation effort to help
> the admin know the exact steps to be taken. Steps I can think of are:
> - Stop the workload
> - Increase the full ratio
> - Add OSDs
> - Change full ratio back to default.
> - Wait for data rebalance.
> 
> Thoughts? @srai and Aman

I don't think it is feasible to recommend stopping IOs. At times, it's even difficult to do so for QE setups.

Isn't there a way to have a successful OSD addition without the need to stop IOs in cases like this? This could be with or without changing the current threshold of 85%.
> I don't think it is feasible to recommend stopping IOs. At times, it's
> even difficult to do so for QE setups.
> 
> Isn't there a way to have a successful OSD addition without the need to stop
> IOs in cases like this? This could be with or without changing the current
> threshold of 85%.

What is critical is for the new OSDs to be created and to start rebalancing before the cluster fills up again to the new threshold. So stopping IO isn't really necessary; it just puts the cluster at risk of filling up to the new threshold before the OSDs are ready to handle the load. Realistically, this shouldn't be an issue as long as the OSDs are created immediately.
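To verify that the new OSDs are actually taking data before utilization climbs back to the raised threshold, monitoring from the toolbox might look like this (a sketch, not a prescribed procedure):
```
# Overall health plus recovery/backfill progress
ceph -s

# Per-OSD utilization: the new OSDs should gain PGs and data while the
# previously full OSDs drop back below the full ratio
ceph osd df tree

# Pool-level raw vs. stored usage
ceph df
```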
(In reply to Santosh Pillai from comment #26)
> (In reply to Travis Nielsen from comment #25)
> > As nice as it would be to adjust the OSD full ratio setting during OSD
> > creation to allow for creation of the OSDs, it also suffers from several
> > issues:
> > - How high to adjust the full ratio? If the default is 85%, should it be 90%
> > or something else?
> > - If already adjusted higher and the OSDs still failed to add for some other
> > reason, how can the customer proceed if the OSDs have now reached the new
> > threshold?
> > - When would the threshold be returned to its previous level? If all goes
> > well, after OSD creation is completed, but what if there is some error?
> 
> Agree, these questions will be difficult to answer since the workload might
> already be running on the cluster. If Rook increases the full ratio to say,
> 90%, and workloads are still running, the OSDs might reach 90% before the
> customer has added a new OSD. So it won't be very easy to automate via Rook.
> 
> > Exposing a command in the new odf CLI tool will better answer these concerns
> > about the automation. While the CLI tool is less optimal since it requires
> > the admin to run it, it does seem a better design:
> > - The admin can decide exactly when to increase the threshold and to what
> > value
> > - The admin can decide when to return the threshold back to its previous
> > value
> > 
> > Any concerns with requiring the CLI tool intervention?
> 
> No concerns. Apart from CLI, it should require documentation effort to help
> the admin know the exact steps to be taken. Steps I can think of are:
> - Stop the workload
> - Increase the full ratio
> - Add OSDs
> - Change full ratio back to default.
> - Wait for data rebalance.
> 
> Thoughts? @srai and Aman

Yeah, it sounds good to add this to the CLI tool. And yes, increasing the ratio to 90% may not be the right solution on its own, since we'll always cross that limit at some point, so it's better to leave it to the admin to update that value.
(In reply to Travis Nielsen from comment #28)
> > I don't think it is feasible to recommend stopping IOs. At times, it's
> > even difficult to do so for QE setups.
> > 
> > Isn't there a way to have a successful OSD addition without the need to stop
> > IOs in cases like this? This could be with or without changing the current
> > threshold of 85%.
> 
> What is critical is for the new OSDs to be created and to start rebalancing
> before the cluster fills up again to the new threshold. So stopping IO isn't
> really necessary; it just puts the cluster at risk of filling up to the new
> threshold before the OSDs are ready to handle the load. Realistically, this
> shouldn't be an issue as long as the OSDs are created immediately.

ACK, so the feasible solution is to allow immediate addition of OSDs. And I hope by CLI we don't mean interacting with the toolbox?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:4591