Bug 1837493 - After RHCS 4.0 to RHCS 4.1 upgrade, OSDs from one node are not booting up
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 4.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 5.*
Assignee: Neha Ojha
QA Contact: Manohar Murthy
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-05-19 14:39 UTC by karan singh
Modified: 2023-09-15 00:31 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-04-16 22:22:22 UTC
Embargoed:


Attachments
As required attached are the logs (2.58 MB, application/zip), 2020-05-19 19:19 UTC, karan singh
OSD nodes CPU Utilization grafana screenshots (1.68 MB, application/zip), 2020-05-19 19:30 UTC, karan singh
Ceph PG Dump (266.40 KB, text/plain), 2020-05-19 19:38 UTC, karan singh

Description karan singh 2020-05-19 14:39:55 UTC
Description of problem:

After upgrading from RHCS 4.0 to RHCS 4.1, all OSDs on one of the nodes fail to boot cleanly. As a result, they are being marked down by peer OSDs / MONs.

I am seeing a bunch of these messages after enabling debug logs:


2020-05-19 08:35:36.784 7fca97bcddc0 20 osd.8 43581  clearing temps in 18.987s5_head pgid 18.987s5
2020-05-19 08:35:37.394 7fca97bcddc0 20 osd.8 43581  clearing temps in 18.7a2s3_head pgid 18.7a2s3
2020-05-19 08:35:46.022 7fca97bcddc0 20 osd.8 43581  clearing temps in 18.1728s1_head pgid 18.1728s1
2020-05-19 08:35:48.081 7fca97bcddc0 20 osd.8 43581  clearing temps in 18.148s5_head pgid 18.148s5
2020-05-19 08:35:48.670 7fca97bcddc0 20 osd.8 43581  clearing temps in 18.27f1s2_head pgid 18.27f1s2
2020-05-19 08:35:50.809 7fca97bcddc0 20 osd.8 43581  clearing temps in 18.211cs4_head pgid 18.211cs4
2020-05-19 08:35:52.101 7fca97bcddc0 20 osd.8 43581  clearing temps in 18.4f8s3_head pgid 18.4f8s3
2020-05-19 08:36:00.470 7fca97bcddc0 20 osd.8 43581  clearing temps in 18.35c7s4_head pgid 18.35c7s4
2020-05-19 08:36:01.737 7fca97bcddc0 20 osd.8 43581  clearing temps in 18.2123s0_head pgid 18.2123s0
2020-05-19 08:36:03.773 7fca97bcddc0 20 osd.8 43581  clearing temps in 18.2134s5_head pgid 18.2134s5
2020-05-19 08:36:04.417 7fca97bcddc0 20 osd.8 43581  clearing temps in 18.12d4s0_head pgid 18.12d4s0
2020-05-19 08:36:06.423 7fca97bcddc0 20 osd.8 43581  clearing temps in 17.710_head pgid 17.710
2020-05-19 08:36:06.423 7fca97bcddc0 20 osd.8 43581  clearing temps in 18.2ff7s3_head pgid 18.2ff7s3
2020-05-19 08:36:14.785 7fca97bcddc0 20 osd.8 43581  clearing temps in 18.3f22s5_head pgid 18.3f22s5
2020-05-19 08:36:15.391 7fca97bcddc0 20 osd.8 43581  clearing temps in 18.1daes0_head pgid 18.1daes0
2020-05-19 08:36:17.243 7fca97bcddc0 20 osd.8 43581  clearing temps in 18.2149s0_head pgid 18.2149s0
2020-05-19 08:36:19.015 7fca97bcddc0 20 osd.8 43581  clearing temps in 18.5c4s5_head pgid 18.5c4s5
2020-05-19 08:36:19.591 7fca97bcddc0 20 osd.8 43581  clearing temps in 18.3cabs3_head pgid 18.3cabs3
2020-05-19 08:36:27.661 7fca97bcddc0 20 osd.8 43581  clearing temps in 18.13bas4_head pgid 18.13bas4
2020-05-19 08:36:28.874 7fca97bcddc0 20 osd.8 43581  clearing temps in 18.2d1fs2_head pgid 18.2d1fs2
2020-05-19 08:36:30.947 7fca97bcddc0 20 osd.8 43581  clearing temps in 18.216as4_head pgid 18.216as4
2020-05-19 08:36:32.168 7fca97bcddc0 20 osd.8 43581  clearing temps in 18.21aas5_head pgid 18.21aas5
2020-05-19 08:36:32.750 7fca97bcddc0 20 osd.8 43581  clearing temps in 18.34das2_head pgid 18.34das2
2020-05-19 08:36:34.896 7fca97bcddc0 20 osd.8 43581  clearing temps in 18.21c2s5_head pgid 18.21c2s5
2020-05-19 08:36:35.485 7fca97bcddc0 20 osd.8 43581  clearing temps in 18.21e6s4_head pgid 18.21e6s4
2020-05-19 08:36:36.713 7fca97bcddc0 20 osd.8 43581  clearing temps in 18.22e2s3_head pgid 18.22e2s3
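
The timestamps above suggest that some PGs take far longer than others. A minimal Python sketch for measuring the gap between consecutive "clearing temps" lines follows. It uses four lines copied from the log above; the helper name clearing_temps_gaps is made up for illustration. It assumes each line is written before the work on that PG starts, so the gap is attributed to the PG on the earlier line. That is an assumption, not something the log confirms.

```python
from datetime import datetime

# Four sample lines copied from the debug log above.
LOG = """\
2020-05-19 08:35:36.784 7fca97bcddc0 20 osd.8 43581  clearing temps in 18.987s5_head pgid 18.987s5
2020-05-19 08:35:37.394 7fca97bcddc0 20 osd.8 43581  clearing temps in 18.7a2s3_head pgid 18.7a2s3
2020-05-19 08:35:46.022 7fca97bcddc0 20 osd.8 43581  clearing temps in 18.1728s1_head pgid 18.1728s1
2020-05-19 08:35:48.081 7fca97bcddc0 20 osd.8 43581  clearing temps in 18.148s5_head pgid 18.148s5
"""


def clearing_temps_gaps(text):
    """Return (pgid, seconds) pairs: time from each 'clearing temps' line to the next."""
    entries = []
    for line in text.splitlines():
        if "clearing temps in" not in line:
            continue
        fields = line.split()
        # The first two fields are the date and the time with milliseconds.
        stamp = datetime.strptime(fields[0] + " " + fields[1], "%Y-%m-%d %H:%M:%S.%f")
        # The last field is the pgid.
        entries.append((stamp, fields[-1]))
    return [
        (pgid, round((nxt - stamp).total_seconds(), 3))
        for (stamp, pgid), (nxt, _) in zip(entries, entries[1:])
    ]


gaps = clearing_temps_gaps(LOG)
# Print the slowest steps first.
for pgid, seconds in sorted(gaps, key=lambda g: g[1], reverse=True):
    print(f"{pgid}: {seconds:.3f}s")
```

Feeding the full OSD log to the same function would show whether the multi-second gaps cluster on particular PGs or are spread evenly.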


Version-Release number of selected component (if applicable):

RHCS 4.1


How reproducible:


Steps to Reproduce:
1. Store several million objects in the cluster
2. Upgrade Ceph from RHCS 4.0 to 4.1
3. Check whether all OSDs are up and running

Actual results:

All OSDs on one node are down. While the OSDs are trying to boot, they log a large number of "clearing temps" messages for PGs (see above).

Expected results:

As on the other nodes, the OSDs on this node should come up cleanly.

Additional info:

Debug logs from affected OSDs : https://pastebin.com/raw/RLwec9mT

Other outputs : https://pastebin.com/raw/fYJDm1Th

Comment 1 karan singh 2020-05-19 14:50:47 UTC
Here is the output from one of the OSDs, which has been running for the last 2 hours:

[root@rgw-1 ceph]# podman ps | head -1
CONTAINER ID  IMAGE                                                            COMMAND               CREATED             STATUS                 PORTS  NAMES
93b496fdfa83  registry.redhat.io/rhceph-beta/rhceph-4-rhel8:latest                                   2 hours ago         Up 2 hours ago                ceph-osd-19


Logs: https://pastebin.com/raw/bMEXescH (unfortunately these are not debug logs; I enabled debugging later)

Comment 2 karan singh 2020-05-19 15:03:47 UTC
Some more logs from the same OSD



8156-f6950be79fa2.1523928.1104_readprepround97040:head by client.1552659.0:7485602 2020-05-18 23:08:11.895158 0
2020-05-19 08:50:23.855 7fca97bcddc0 20 read_log_and_missing 40489'1516 (0'0) modify   18:71f11d72:::747eb73a-0107-4d9f-8156-f6950be79fa2.1523928.768_readprepround80394:head by client.1527012.0:6742513 2020-05-18 23:08:12.146016 0
2020-05-19 08:50:23.855 7fca97bcddc0 20 read_log_and_missing 40489'1517 (0'0) modify   18:71f360a4:::747eb73a-0107-4d9f-8156-f6950be79fa2.1523928.806_readprepround3492:head by client.1524210.0:6836207 2020-05-18 23:08:13.612280 0
2020-05-19 08:50:23.855 7fca97bcddc0 20 read_log_and_missing 40506'1518 (0'0) modify   18:71f1334b:::747eb73a-0107-4d9f-8156-f6950be79fa2.1523928.1120_readprepround87252:head by client.1552644.0:7779797 2020-05-18 23:08:48.789095 0
2020-05-19 08:50:23.855 7fca97bcddc0 20 read_log_and_missing 40623'1519 (0'0) modify   18:71f291cd:::747eb73a-0107-4d9f-8156-f6950be79fa2.1523928.1120_readprepround89767:head by client.1552644.0:7787160 2020-05-18 23:17:01.783024 0
2020-05-19 08:50:23.861 7fca97bcddc0 10 read_log_and_missing done
2020-05-19 08:50:23.861 7fca97bcddc0 10 osd.8 pg_epoch: 43579 pg[18.f8es1( v 40623'1519 lc 40068'1384 (0'0,40623'1519] local-lis/les=40550/40551 n=1457 ec=38437/36901 lis/c 43546/39072 les/c/f 43547/39095/0 43579/43579/42441) [97,8,144,129,255,394]/[97,357,144,129,255,394]p97(0) r=-1 lpr=0 pi=[39072,43579)/3 crt=40506'1518 lcod 0'0 unknown m=118 mbc={}] handle_initialize
2020-05-19 08:50:23.861 7fca97bcddc0  5 osd.8 pg_epoch: 43579 pg[18.f8es1( v 40623'1519 lc 40068'1384 (0'0,40623'1519] local-lis/les=40550/40551 n=1457 ec=38437/36901 lis/c 43546/39072 les/c/f 43547/39095/0 43579/43579/42441) [97,8,144,129,255,394]/[97,357,144,129,255,394]p97(0) r=-1 lpr=0 pi=[39072,43579)/3 crt=40506'1518 lcod 0'0 unknown m=118 mbc={}] exit Initial 0.036784 0 0.000000
2020-05-19 08:50:23.861 7fca97bcddc0  5 osd.8 pg_epoch: 43579 pg[18.f8es1( v 40623'1519 lc 40068'1384 (0'0,40623'1519] local-lis/les=40550/40551 n=1457 ec=38437/36901 lis/c 43546/39072 les/c/f 43547/39095/0 43579/43579/42441) [97,8,144,129,255,394]/[97,357,144,129,255,394]p97(0) r=-1 lpr=0 pi=[39072,43579)/3 crt=40506'1518 lcod 0'0 unknown m=118 mbc={}] enter Reset
2020-05-19 08:50:23.861 7fca97bcddc0 20 osd.8 pg_epoch: 43579 pg[18.f8es1( v 40623'1519 lc 40068'1384 (0'0,40623'1519] local-lis/les=40550/40551 n=1457 ec=38437/36901 lis/c 43546/39072 les/c/f 43547/39095/0 43579/43579/42441) [97,8,144,129,255,394]/[97,357,144,129,255,394]p97(0) r=-1 lpr=0 pi=[39072,43579)/3 crt=40506'1518 lcod 0'0 unknown m=118 mbc={}] set_last_peering_reset 43579
2020-05-19 08:50:23.861 7fca97bcddc0 10 osd.8 pg_epoch: 43579 pg[18.f8es1( v 40623'1519 lc 40068'1384 (0'0,40623'1519] local-lis/les=40550/40551 n=1457 ec=38437/36901 lis/c 43546/39072 les/c/f 43547/39095/0 43579/43579/42441) [97,8,144,129,255,394]/[97,357,144,129,255,394]p97(0) r=-1 lpr=43579 pi=[39072,43579)/3 crt=40506'1518 lcod 0'0 unknown m=118 mbc={}] Clearing blocked outgoing recovery messages
2020-05-19 08:50:23.861 7fca97bcddc0 10 osd.8 pg_epoch: 43579 pg[18.f8es1( v 40623'1519 lc 40068'1384 (0'0,40623'1519] local-lis/les=40550/40551 n=1457 ec=38437/36901 lis/c 43546/39072 les/c/f 43547/39095/0 43579/43579/42441) [97,8,144,129,255,394]/[97,357,144,129,255,394]p97(0) r=-1 lpr=43579 pi=[39072,43579)/3 crt=40506'1518 lcod 0'0 unknown m=118 mbc={}] Not blocking outgoing recovery messages
2020-05-19 08:50:23.861 7fca97bcddc0  6 write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 4294967295'18446744073709551615, trimmed: , trimmed_dups: , clear_divergent_priors: 0
2020-05-19 08:50:23.861 7fca97bcddc0 10 osd.8 43581 load_pgs loaded pg[18.f8es1( v 40623'1519 lc 40068'1384 (0'0,40623'1519] local-lis/les=40550/40551 n=1457 ec=38437/36901 lis/c 43546/39072 les/c/f 43547/39095/0 43579/43579/42441) [97,8,144,129,255,394]/[97,357,144,129,255,394]p97(0) r=-1 lpr=43579 pi=[39072,43579)/3 crt=40506'1518 lcod 0'0 unknown m=118 mbc={}]
2020-05-19 08:50:23.861 7fca97bcddc0 20 osd.8 43581 register_pg 18.f8es1 0x5585497bc000
2020-05-19 08:50:23.861 7fca97bcddc0 10 osd.8:2._attach_pg 18.f8es1 0x5585497bc000
2020-05-19 08:50:23.864 7fca97bcddc0 10 osd.8 43581 pgid 18.f78s4 coll 18.f78s4_head
2020-05-19 08:50:23.868 7fca97bcddc0 10 osd.8 43581 _make_pg 18.f78s4
2020-05-19 08:50:23.868 7fca97bcddc0  5 osd.8 pg_epoch: 43579 pg[18.f78s4(unlocked)] enter Initial
2020-05-19 08:50:23.868 7fca97bcddc0 20 ErasureCodePluginJerasure: factory: {crush-device-class=,crush-failure-domain=host,crush-root=default,jerasure-per-chunk-alignment=false,k=4,m=2,plugin=jerasure,technique=reed_sol_van,w=8}
2020-05-19 08:50:23.868 7fca97bcddc0 10 ErasureCodeJerasure: technique=reed_sol_van
2020-05-19 08:50:23.868 7fca97bcddc0 20 osd.8 pg_epoch: 43579 pg[18.f78s4(unlocked)] enter NotTrimming
2020-05-19 08:50:23.868 7fca97bcddc0 20 read_log_and_missing coll 18.f78s4_head 4#18:1ef00000::::head#
2020-05-19 08:50:23.870 7fca97bcddc0 20 read_log_and_missing 39504'1 (0'0) modify   18:1ef28f21:::747eb73a-0107-4d9f-8156-f6950be79fa2.1523928.10_finaltestround13376:head by client.1552644.0:8252 2020-05-18 18:48:10.321021 0
2020-05-19 08:50:23.870 7fca97bcddc0 20 read_log_and_missing 39504'2 (0'0) modify   18:1ef1da7f:::747eb73a-0107-4d9f-8156-f6950be79fa2.1523928.112_finaltestround1638:head by client.1552659.0:26018 2020-05-18 18:48:15.220968 0
2020-05-19 08:50:23.870 7fca97bcddc0 20 read_log_and_missing 39504'3 (0'0) modify   18:1ef07e21:::747eb73a-0107-4d9f-8156-f6950be79fa2.1523928.174_finaltestround12501:head by client.1552659.0:53418 2020-05-18 18:48:20.852752 0
2020-05-19 08:50:23.870 7fca97bcddc0 20 read_log_and_missing 39504'4 (0'0) modify   18:1ef2f6ae:::747eb73a-0107-4d9f-8156-f6950be79fa2.1523928.432_finaltestround13304:head by client.1552659.0:102456 2020-05-18 18:48:28.083833 0
2020-05-19 08:50:23.870 7fca97bcddc0 20 read_log_and_missing 39504'5 (0'0) modify   18:1ef3676a:::747eb73a-0107-4d9f-8156-f6950be79fa2.1523928.591_finaltestround12609:head by client.1552644.0:102682 2020-05-18 18:48:30.424963 0
2020-05-19 08:50:23.870 7fca97bcddc0 20 read_log_and_missing 39504'6 (0'0) modify   18:1ef3f72a:::747eb73a-0107-4d9f-8156-f6950be79fa2.1523928.329_finaltestround14045:head by client.1552659.0:139482 2020-05-18 18:48:32.834324 0
2020-05-19 08:50:23.870 7fca97bcddc0 20 read_log_and_missing 39504'7 (0'0) modify   18:1ef1536c:::747eb73a-0107-4d9f-8156-f6950be79fa2.1523928.115_finaltestround1143:head by client.1552644.0:205218 2020-05-18 18:48:42.736130 0
2020-05-19 08:50:23.870 7fca97bcddc0 20 read_log_and_missing 39504'8 (0'0) modify   18:1ef1c391:::747eb73a-0107-4d9f-8156-f6950be79fa2.1523928.128_finaltestround13711:head by client.1552659.0:246120 2020-05-18 18:48:44.733724 0
2020-05-19 08:50:23.870 7fca97bcddc0 20 read_log_and_missing 39504'9 (0'0) modify   18:1ef25102:::747eb73a-0107-4d9f-8156-f6950be79fa2.1523928.161_finaltestround13765:head by client.1552644.0:308222 2020-05-18 18:48:53.713128 0
2020-05-19 08:50:23.870 7fca97bcddc0 20 read_log_and_missing 39504'10 (0'0) modify   18:1ef34789:::747eb73a-0107-4d9f-8156-f6950be79fa2.1523928.23_finaltestround12764:head by client.1552659.0:333352 2020-05-18 18:48:54.128010 0
2020-05-19 08:50:23.870 7fca97bcddc0 20 read_log_and_missing 39504'11 (0'0) modify   18:1ef10fd4:::747eb73a-0107-4d9f-8156-f6950be79fa2.1523928.329_finaltestround12067:head by client.1552659.0:379022 2020-05-18 18:48:58.510291 0

Comment 4 karan singh 2020-05-19 19:19:19 UTC
Created attachment 1689970 [details]
As required attached are the logs

Comment 5 karan singh 2020-05-19 19:29:06 UTC
Hi Neha

We tried a few more things. The RGW data pool held about 45 million objects (COSBench data), so we deleted all pools. I thought that once I did so, the OSDs would no longer do this checking. Right after the pools were deleted, OSD CPU utilization on all nodes skyrocketed, so we had to reboot all nodes one by one. Even after the reboot, CPU consumption by the OSD containers is very high (see the attached utilization screenshots).


As of now, the cluster has:
- 0 objects
- 1 pool (.rgw.root)
- rgw-1 host has most of its OSDs down
- CPU % utilization on all nodes (except rgw-1) is extremely high (even though I rebooted all nodes)


[root@rgw-5 ~]# ceph df
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
    hdd       3.6 PiB     3.4 PiB     198 TiB      198 TiB          5.39
    TOTAL     3.6 PiB     3.4 PiB     198 TiB      198 TiB          5.39

POOLS:
    POOL          ID     STORED     OBJECTS     USED     %USED     MAX AVAIL
    .rgw.root     21        0 B           0      0 B         0       1.4 PiB
[root@rgw-5 ~]#


[root@rgw-5 ~]# ceph osd tree
ID  CLASS WEIGHT     TYPE NAME       STATUS REWEIGHT PRI-AFF
 -1       4878.71826 root default
 -7        813.11523     host rgw-1
  0   hdd   15.34180         osd.0     down  1.00000 1.00000
  1   hdd   15.34180         osd.1       up  1.00000 1.00000
  2   hdd   15.34180         osd.2     down  1.00000 1.00000
  3   hdd   15.34180         osd.3       up  1.00000 1.00000
  4   hdd   15.34180         osd.4       up  1.00000 1.00000
  5   hdd   15.34180         osd.5     down  1.00000 1.00000
  6   hdd   15.34180         osd.6     down  1.00000 1.00000
  7   hdd   15.34180         osd.7       up  1.00000 1.00000
  8   hdd   15.34180         osd.8       up  1.00000 1.00000
  9   hdd   15.34180         osd.9       up  1.00000 1.00000
 10   hdd   15.34180         osd.10    down  1.00000 1.00000
 11   hdd   15.34180         osd.11    down  1.00000 1.00000
 14   hdd   15.34180         osd.14    down  1.00000 1.00000
 15   hdd   15.34180         osd.15    down  1.00000 1.00000
 18   hdd   15.34180         osd.18    down  1.00000 1.00000
 19   hdd   15.34180         osd.19    down  1.00000 1.00000
 20   hdd   15.34180         osd.20    down  1.00000 1.00000
 23   hdd   15.34180         osd.23      up  1.00000 1.00000
 24   hdd   15.34180         osd.24      up  1.00000 1.00000
 26   hdd   15.34180         osd.26      up  1.00000 1.00000
 28   hdd   15.34180         osd.28    down  1.00000 1.00000
 29   hdd   15.34180         osd.29    down  1.00000 1.00000
 32   hdd   15.34180         osd.32      up  1.00000 1.00000
 33   hdd   15.34180         osd.33    down  1.00000 1.00000
 35   hdd   15.34180         osd.35    down  1.00000 1.00000
 37   hdd   15.34180         osd.37      up  1.00000 1.00000
 38   hdd   15.34180         osd.38    down  1.00000 1.00000
 41   hdd   15.34180         osd.41      up  1.00000 1.00000
 42   hdd   15.34180         osd.42    down  1.00000 1.00000
 45   hdd   15.34180         osd.45    down  1.00000 1.00000
 46   hdd   15.34180         osd.46      up  1.00000 1.00000
 47   hdd   15.34180         osd.47    down  1.00000 1.00000
 50   hdd   15.34180         osd.50      up  1.00000 1.00000
 51   hdd   15.34180         osd.51      up  1.00000 1.00000
 54   hdd   15.34180         osd.54    down  1.00000 1.00000
 55   hdd   15.34180         osd.55    down  1.00000 1.00000
 58   hdd   15.34180         osd.58    down  1.00000 1.00000
 59   hdd   15.34180         osd.59      up  1.00000 1.00000
 61   hdd   15.34180         osd.61    down  1.00000 1.00000
 63   hdd   15.34180         osd.63    down  1.00000 1.00000
 65   hdd   15.34180         osd.65    down  1.00000 1.00000
 67   hdd   15.34180         osd.67    down  1.00000 1.00000
 68   hdd   15.34180         osd.68      up  1.00000 1.00000
 71   hdd   15.34180         osd.71    down  1.00000 1.00000
 72   hdd   15.34180         osd.72    down  1.00000 1.00000
 75   hdd   15.34180         osd.75    down  1.00000 1.00000
 76   hdd   15.34180         osd.76    down  1.00000 1.00000
 79   hdd   15.34180         osd.79    down  1.00000 1.00000
 80   hdd   15.34180         osd.80      up  1.00000 1.00000
 83   hdd   15.34180         osd.83      up  1.00000 1.00000
 84   hdd   15.34180         osd.84    down  1.00000 1.00000
 87   hdd   15.34180         osd.87    down  1.00000 1.00000
 88   hdd   15.34180         osd.88    down  1.00000 1.00000
 -5        813.11523     host rgw-2
 13   hdd   15.34180         osd.13      up  1.00000 1.00000
 17   hdd   15.34180         osd.17      up  1.00000 1.00000
 22   hdd   15.34180         osd.22      up  1.00000 1.00000
 27   hdd   15.34180         osd.27      up  1.00000 1.00000
 31   hdd   15.34180         osd.31      up  1.00000 1.00000
 36   hdd   15.34180         osd.36      up  1.00000 1.00000
 40   hdd   15.34180         osd.40      up  1.00000 1.00000
 44   hdd   15.34180         osd.44      up  1.00000 1.00000
 49   hdd   15.34180         osd.49      up  1.00000 1.00000
 53   hdd   15.34180         osd.53      up  1.00000 1.00000
 57   hdd   15.34180         osd.57      up  1.00000 1.00000
 62   hdd   15.34180         osd.62      up  1.00000 1.00000
 66   hdd   15.34180         osd.66      up  1.00000 1.00000
 70   hdd   15.34180         osd.70      up  1.00000 1.00000
 74   hdd   15.34180         osd.74      up  1.00000 1.00000
 78   hdd   15.34180         osd.78      up  1.00000 1.00000
 82   hdd   15.34180         osd.82      up  1.00000 1.00000
 86   hdd   15.34180         osd.86      up  1.00000 1.00000
 90   hdd   15.34180         osd.90      up  1.00000 1.00000
 92   hdd   15.34180         osd.92      up  1.00000 1.00000
 94   hdd   15.34180         osd.94      up  1.00000 1.00000
 96   hdd   15.34180         osd.96      up  1.00000 1.00000
 98   hdd   15.34180         osd.98      up  1.00000 1.00000
100   hdd   15.34180         osd.100     up  1.00000 1.00000
102   hdd   15.34180         osd.102     up  1.00000 1.00000
104   hdd   15.34180         osd.104     up  1.00000 1.00000
106   hdd   15.34180         osd.106     up  1.00000 1.00000
108   hdd   15.34180         osd.108     up  1.00000 1.00000
110   hdd   15.34180         osd.110     up  1.00000 1.00000
112   hdd   15.34180         osd.112     up  1.00000 1.00000
114   hdd   15.34180         osd.114     up  1.00000 1.00000
116   hdd   15.34180         osd.116     up  1.00000 1.00000
118   hdd   15.34180         osd.118     up  1.00000 1.00000
120   hdd   15.34180         osd.120     up  1.00000 1.00000
122   hdd   15.34180         osd.122     up  1.00000 1.00000
124   hdd   15.34180         osd.124     up  1.00000 1.00000
126   hdd   15.34180         osd.126     up  1.00000 1.00000
128   hdd   15.34180         osd.128     up  1.00000 1.00000
130   hdd   15.34180         osd.130     up  1.00000 1.00000
132   hdd   15.34180         osd.132     up  1.00000 1.00000
134   hdd   15.34180         osd.134     up  1.00000 1.00000
136   hdd   15.34180         osd.136     up  1.00000 1.00000
138   hdd   15.34180         osd.138     up  1.00000 1.00000
140   hdd   15.34180         osd.140     up  1.00000 1.00000
142   hdd   15.34180         osd.142     up  1.00000 1.00000
144   hdd   15.34180         osd.144     up  1.00000 1.00000
146   hdd   15.34180         osd.146     up  1.00000 1.00000
148   hdd   15.34180         osd.148     up  1.00000 1.00000
150   hdd   15.34180         osd.150     up  1.00000 1.00000
152   hdd   15.34180         osd.152     up  1.00000 1.00000
154   hdd   15.34180         osd.154     up  1.00000 1.00000
156   hdd   15.34180         osd.156     up  1.00000 1.00000
158   hdd   15.34180         osd.158     up  1.00000 1.00000
 -3        813.11523     host rgw-3
 43   hdd   15.34180         osd.43      up  1.00000 1.00000
 52   hdd   15.34180         osd.52      up  1.00000 1.00000
107   hdd   15.34180         osd.107     up  1.00000 1.00000
109   hdd   15.34180         osd.109     up  1.00000 1.00000
111   hdd   15.34180         osd.111     up  1.00000 1.00000
113   hdd   15.34180         osd.113     up  1.00000 1.00000
115   hdd   15.34180         osd.115     up  1.00000 1.00000
117   hdd   15.34180         osd.117     up  1.00000 1.00000
119   hdd   15.34180         osd.119     up  1.00000 1.00000
121   hdd   15.34180         osd.121     up  1.00000 1.00000
123   hdd   15.34180         osd.123     up  1.00000 1.00000
125   hdd   15.34180         osd.125     up  1.00000 1.00000
127   hdd   15.34180         osd.127     up  1.00000 1.00000
129   hdd   15.34180         osd.129     up  1.00000 1.00000
131   hdd   15.34180         osd.131     up  1.00000 1.00000
133   hdd   15.34180         osd.133     up  1.00000 1.00000
135   hdd   15.34180         osd.135     up  1.00000 1.00000
137   hdd   15.34180         osd.137     up  1.00000 1.00000
139   hdd   15.34180         osd.139     up  1.00000 1.00000
141   hdd   15.34180         osd.141     up  1.00000 1.00000
143   hdd   15.34180         osd.143     up  1.00000 1.00000
145   hdd   15.34180         osd.145     up  1.00000 1.00000
147   hdd   15.34180         osd.147     up  1.00000 1.00000
149   hdd   15.34180         osd.149     up  1.00000 1.00000
151   hdd   15.34180         osd.151     up  1.00000 1.00000
153   hdd   15.34180         osd.153     up  1.00000 1.00000
155   hdd   15.34180         osd.155     up  1.00000 1.00000
157   hdd   15.34180         osd.157     up  1.00000 1.00000
187   hdd   15.34180         osd.187     up  1.00000 1.00000
188   hdd   15.34180         osd.188     up  1.00000 1.00000
190   hdd   15.34180         osd.190     up  1.00000 1.00000
191   hdd   15.34180         osd.191     up  1.00000 1.00000
192   hdd   15.34180         osd.192     up  1.00000 1.00000
193   hdd   15.34180         osd.193     up  1.00000 1.00000
194   hdd   15.34180         osd.194     up  1.00000 1.00000
196   hdd   15.34180         osd.196     up  1.00000 1.00000
197   hdd   15.34180         osd.197     up  1.00000 1.00000
198   hdd   15.34180         osd.198     up  1.00000 1.00000
199   hdd   15.34180         osd.199     up  1.00000 1.00000
200   hdd   15.34180         osd.200     up  1.00000 1.00000
202   hdd   15.34180         osd.202     up  1.00000 1.00000
203   hdd   15.34180         osd.203     up  1.00000 1.00000
204   hdd   15.34180         osd.204     up  1.00000 1.00000
205   hdd   15.34180         osd.205     up  1.00000 1.00000
206   hdd   15.34180         osd.206     up  1.00000 1.00000
208   hdd   15.34180         osd.208     up  1.00000 1.00000
209   hdd   15.34180         osd.209     up  1.00000 1.00000
210   hdd   15.34180         osd.210     up  1.00000 1.00000
211   hdd   15.34180         osd.211     up  1.00000 1.00000
212   hdd   15.34180         osd.212     up  1.00000 1.00000
214   hdd   15.34180         osd.214     up  1.00000 1.00000
215   hdd   15.34180         osd.215     up  1.00000 1.00000
216   hdd   15.34180         osd.216     up  1.00000 1.00000
 -9        813.11523     host rgw-4
265   hdd   15.34180         osd.265     up  1.00000 1.00000
268   hdd   15.34180         osd.268     up  1.00000 1.00000
271   hdd   15.34180         osd.271     up  1.00000 1.00000
273   hdd   15.34180         osd.273     up  1.00000 1.00000
276   hdd   15.34180         osd.276     up  1.00000 1.00000
279   hdd   15.34180         osd.279     up  1.00000 1.00000
281   hdd   15.34180         osd.281     up  1.00000 1.00000
284   hdd   15.34180         osd.284     up  1.00000 1.00000
287   hdd   15.34180         osd.287     up  1.00000 1.00000
290   hdd   15.34180         osd.290     up  1.00000 1.00000
293   hdd   15.34180         osd.293     up  1.00000 1.00000
296   hdd   15.34180         osd.296     up  1.00000 1.00000
299   hdd   15.34180         osd.299     up  1.00000 1.00000
301   hdd   15.34180         osd.301     up  1.00000 1.00000
304   hdd   15.34180         osd.304     up  1.00000 1.00000
307   hdd   15.34180         osd.307     up  1.00000 1.00000
310   hdd   15.34180         osd.310     up  1.00000 1.00000
313   hdd   15.34180         osd.313     up  1.00000 1.00000
316   hdd   15.34180         osd.316     up  1.00000 1.00000
319   hdd   15.34180         osd.319     up  1.00000 1.00000
321   hdd   15.34180         osd.321     up  1.00000 1.00000
324   hdd   15.34180         osd.324     up  1.00000 1.00000
327   hdd   15.34180         osd.327     up  1.00000 1.00000
329   hdd   15.34180         osd.329     up  1.00000 1.00000
332   hdd   15.34180         osd.332     up  1.00000 1.00000
335   hdd   15.34180         osd.335     up  1.00000 1.00000
338   hdd   15.34180         osd.338     up  1.00000 1.00000
341   hdd   15.34180         osd.341     up  1.00000 1.00000
344   hdd   15.34180         osd.344     up  1.00000 1.00000
347   hdd   15.34180         osd.347     up  1.00000 1.00000
349   hdd   15.34180         osd.349     up  1.00000 1.00000
352   hdd   15.34180         osd.352     up  1.00000 1.00000
354   hdd   15.34180         osd.354     up  1.00000 1.00000
357   hdd   15.34180         osd.357     up  1.00000 1.00000
360   hdd   15.34180         osd.360     up  1.00000 1.00000
363   hdd   15.34180         osd.363     up  1.00000 1.00000
366   hdd   15.34180         osd.366     up  1.00000 1.00000
369   hdd   15.34180         osd.369     up  1.00000 1.00000
372   hdd   15.34180         osd.372     up  1.00000 1.00000
375   hdd   15.34180         osd.375     up  1.00000 1.00000
378   hdd   15.34180         osd.378     up  1.00000 1.00000
380   hdd   15.34180         osd.380     up  1.00000 1.00000
382   hdd   15.34180         osd.382     up  1.00000 1.00000
385   hdd   15.34180         osd.385     up  1.00000 1.00000
388   hdd   15.34180         osd.388     up  1.00000 1.00000
391   hdd   15.34180         osd.391     up  1.00000 1.00000
394   hdd   15.34180         osd.394     up  1.00000 1.00000
397   hdd   15.34180         osd.397     up  1.00000 1.00000
399   hdd   15.34180         osd.399     up  1.00000 1.00000
402   hdd   15.34180         osd.402     up  1.00000 1.00000
404   hdd   15.34180         osd.404     up  1.00000 1.00000
407   hdd   15.34180         osd.407     up  1.00000 1.00000
410   hdd   15.34180         osd.410     up  1.00000 1.00000
-13        813.14203     host rgw-5
189   hdd   15.34279         osd.189     up  1.00000 1.00000
195   hdd   15.34279         osd.195     up  1.00000 1.00000
201   hdd   15.34180         osd.201     up  1.00000 1.00000
207   hdd   15.34180         osd.207     up  1.00000 1.00000
213   hdd   15.34180         osd.213     up  1.00000 1.00000
217   hdd   15.34180         osd.217     up  1.00000 1.00000
218   hdd   15.34279         osd.218     up  1.00000 1.00000
219   hdd   15.34180         osd.219     up  1.00000 1.00000
220   hdd   15.34279         osd.220     up  1.00000 1.00000
221   hdd   15.34180         osd.221     up  1.00000 1.00000
222   hdd   15.34279         osd.222     up  1.00000 1.00000
223   hdd   15.34180         osd.223     up  1.00000 1.00000
224   hdd   15.34279         osd.224     up  1.00000 1.00000
225   hdd   15.34180         osd.225     up  1.00000 1.00000
226   hdd   15.34279         osd.226     up  1.00000 1.00000
227   hdd   15.34180         osd.227     up  1.00000 1.00000
228   hdd   15.34279         osd.228     up  1.00000 1.00000
229   hdd   15.34180         osd.229     up  1.00000 1.00000
230   hdd   15.34279         osd.230     up  1.00000 1.00000
231   hdd   15.34180         osd.231     up  1.00000 1.00000
232   hdd   15.34279         osd.232     up  1.00000 1.00000
233   hdd   15.34180         osd.233     up  1.00000 1.00000
234   hdd   15.34180         osd.234     up  1.00000 1.00000
235   hdd   15.34279         osd.235     up  1.00000 1.00000
236   hdd   15.34180         osd.236     up  1.00000 1.00000
237   hdd   15.34279         osd.237     up  1.00000 1.00000
238   hdd   15.34180         osd.238     up  1.00000 1.00000
239   hdd   15.34279         osd.239     up  1.00000 1.00000
240   hdd   15.34180         osd.240     up  1.00000 1.00000
241   hdd   15.34279         osd.241     up  1.00000 1.00000
242   hdd   15.34180         osd.242     up  1.00000 1.00000
243   hdd   15.34279         osd.243     up  1.00000 1.00000
244   hdd   15.34180         osd.244     up  1.00000 1.00000
245   hdd   15.34279         osd.245     up  1.00000 1.00000
246   hdd   15.34180         osd.246     up  1.00000 1.00000
247   hdd   15.34279         osd.247     up  1.00000 1.00000
248   hdd   15.34180         osd.248     up  1.00000 1.00000
249   hdd   15.34279         osd.249     up  1.00000 1.00000
250   hdd   15.34180         osd.250     up  1.00000 1.00000
251   hdd   15.34279         osd.251     up  1.00000 1.00000
252   hdd   15.34180         osd.252     up  1.00000 1.00000
253   hdd   15.34279         osd.253     up  1.00000 1.00000
254   hdd   15.34180         osd.254     up  1.00000 1.00000
255   hdd   15.34279         osd.255     up  1.00000 1.00000
256   hdd   15.34180         osd.256     up  1.00000 1.00000
257   hdd   15.34279         osd.257     up  1.00000 1.00000
258   hdd   15.34180         osd.258     up  1.00000 1.00000
259   hdd   15.34279         osd.259     up  1.00000 1.00000
260   hdd   15.34180         osd.260     up  1.00000 1.00000
261   hdd   15.34279         osd.261     up  1.00000 1.00000
262   hdd   15.34279         osd.262     up  1.00000 1.00000
263   hdd   15.34279         osd.263     up  1.00000 1.00000
264   hdd   15.34279         osd.264     up  1.00000 1.00000
-11        813.11523     host rgw-6
 12   hdd   15.34180         osd.12      up  1.00000 1.00000
 16   hdd   15.34180         osd.16      up  1.00000 1.00000
 21   hdd   15.34180         osd.21      up  1.00000 1.00000
 25   hdd   15.34180         osd.25      up  1.00000 1.00000
 30   hdd   15.34180         osd.30      up  1.00000 1.00000
 34   hdd   15.34180         osd.34      up  1.00000 1.00000
 39   hdd   15.34180         osd.39      up  1.00000 1.00000
 48   hdd   15.34180         osd.48      up  1.00000 1.00000
 56   hdd   15.34180         osd.56      up  1.00000 1.00000
 60   hdd   15.34180         osd.60      up  1.00000 1.00000
 64   hdd   15.34180         osd.64      up  1.00000 1.00000
 69   hdd   15.34180         osd.69      up  1.00000 1.00000
 73   hdd   15.34180         osd.73      up  1.00000 1.00000
 77   hdd   15.34180         osd.77      up  1.00000 1.00000
 81   hdd   15.34180         osd.81      up  1.00000 1.00000
 85   hdd   15.34180         osd.85      up  1.00000 1.00000
 89   hdd   15.34180         osd.89      up  1.00000 1.00000
 91   hdd   15.34180         osd.91      up  1.00000 1.00000
 93   hdd   15.34180         osd.93      up  1.00000 1.00000
 95   hdd   15.34180         osd.95      up  1.00000 1.00000
 97   hdd   15.34180         osd.97      up  1.00000 1.00000
 99   hdd   15.34180         osd.99      up  1.00000 1.00000
101   hdd   15.34180         osd.101     up  1.00000 1.00000
103   hdd   15.34180         osd.103     up  1.00000 1.00000
105   hdd   15.34180         osd.105     up  1.00000 1.00000
159   hdd   15.34180         osd.159     up  1.00000 1.00000
160   hdd   15.34180         osd.160     up  1.00000 1.00000
161   hdd   15.34180         osd.161     up  1.00000 1.00000
162   hdd   15.34180         osd.162     up  1.00000 1.00000
163   hdd   15.34180         osd.163     up  1.00000 1.00000
164   hdd   15.34180         osd.164     up  1.00000 1.00000
165   hdd   15.34180         osd.165     up  1.00000 1.00000
166   hdd   15.34180         osd.166     up  1.00000 1.00000
167   hdd   15.34180         osd.167     up  1.00000 1.00000
168   hdd   15.34180         osd.168     up  1.00000 1.00000
169   hdd   15.34180         osd.169     up  1.00000 1.00000
170   hdd   15.34180         osd.170     up  1.00000 1.00000
171   hdd   15.34180         osd.171     up  1.00000 1.00000
172   hdd   15.34180         osd.172     up  1.00000 1.00000
173   hdd   15.34180         osd.173     up  1.00000 1.00000
174   hdd   15.34180         osd.174     up  1.00000 1.00000
175   hdd   15.34180         osd.175     up  1.00000 1.00000
176   hdd   15.34180         osd.176     up  1.00000 1.00000
177   hdd   15.34180         osd.177     up  1.00000 1.00000
178   hdd   15.34180         osd.178     up  1.00000 1.00000
179   hdd   15.34180         osd.179     up  1.00000 1.00000
180   hdd   15.34180         osd.180     up  1.00000 1.00000
181   hdd   15.34180         osd.181     up  1.00000 1.00000
182   hdd   15.34180         osd.182     up  1.00000 1.00000
183   hdd   15.34180         osd.183     up  1.00000 1.00000
184   hdd   15.34180         osd.184     up  1.00000 1.00000
185   hdd   15.34180         osd.185     up  1.00000 1.00000
186   hdd   15.34180         osd.186     up  1.00000 1.00000
[root@rgw-5 ~]#
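
With more than 300 OSDs, the listing above is hard to scan by eye. A minimal Python sketch for counting up/down OSDs per host follows. It assumes the plain-text `ceph osd tree` layout shown above (host rows with `host <name>`, OSD rows with the status in the fifth column). The sample is a shortened copy of that output, and the function name osd_status_by_host is made up for illustration.

```python
from collections import Counter

# Shortened sample in the same layout as the `ceph osd tree` output above.
TREE = """\
ID  CLASS WEIGHT     TYPE NAME       STATUS REWEIGHT PRI-AFF
 -1       4878.71826 root default
 -7        813.11523     host rgw-1
  0   hdd   15.34180         osd.0     down  1.00000 1.00000
  1   hdd   15.34180         osd.1       up  1.00000 1.00000
  2   hdd   15.34180         osd.2     down  1.00000 1.00000
 -5        813.11523     host rgw-2
 13   hdd   15.34180         osd.13      up  1.00000 1.00000
 17   hdd   15.34180         osd.17      up  1.00000 1.00000
"""


def osd_status_by_host(tree_text):
    """Map each host name to a Counter of OSD statuses ('up' / 'down')."""
    counts = {}
    host = None
    for line in tree_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "host":
            # Host rows look like: "-7 813.11523 host rgw-1".
            host = fields[3]
            counts[host] = Counter()
        elif host is not None and len(fields) >= 5 and fields[3].startswith("osd."):
            # OSD rows look like: "0 hdd 15.34180 osd.0 down 1.00000 1.00000".
            counts[host][fields[4]] += 1
    return counts


summary = osd_status_by_host(TREE)
for name, statuses in summary.items():
    print(f"{name}: {statuses['up']} up, {statuses['down']} down")
```

Run against the full listing, the per-host totals should line up with the "34 osds down" line in the `ceph -s` output, with all of the down OSDs on rgw-1.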


[root@rgw-5 ~]# ceph -s
  cluster:
    id:     ebe0aa4b-4fb5-4c68-84ab-cbf1118937a2
    health: HEALTH_WARN
            nodown,noout,norebalance,norecover flag(s) set
            34 osds down
            Long heartbeat ping times on back interface seen, longest is 27716.598 msec
            Long heartbeat ping times on front interface seen, longest is 27723.520 msec
            Reduced data availability: 16 pgs inactive, 12 pgs peering, 2 pgs stale

  services:
    mon: 3 daemons, quorum rgw-1,rgw-2,rgw-3 (age 36m)
    mgr: rgw-4(active, since 100m), standbys: rgw-6, rgw-5
    osd: 318 osds: 284 up (since 83s), 318 in (since 4h); 10 remapped pgs
         flags nodown,noout,norebalance,norecover

  data:
    pools:   1 pools, 32 pgs
    objects: 0 objects, 0 B
    usage:   199 TiB used, 3.4 PiB / 3.6 PiB avail
    pgs:     9.375% pgs unknown
             40.625% pgs not active
             11 active+clean
             7  peering
             5  active+clean+remapped
             3  unknown
             3  remapped+peering
             2  stale+peering
             1  activating

[root@rgw-5 ~]#
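
As a quick consistency check, the `osd:` summary line can be compared with the "34 osds down" health warning. A small Python sketch follows; the regular expression is written only against the line format shown above, and the function name osd_counts is hypothetical.

```python
import re

# The `osd:` line copied from the `ceph -s` output above.
STATUS_LINE = "osd: 318 osds: 284 up (since 83s), 318 in (since 4h); 10 remapped pgs"


def osd_counts(line):
    """Extract total/up/in counts from an `osd:` summary line and derive the down count."""
    match = re.search(r"(\d+) osds: (\d+) up\b.*?(\d+) in\b", line)
    if match is None:
        raise ValueError("not an osd summary line")
    total, up, in_cluster = map(int, match.groups())
    return {"total": total, "up": up, "in": in_cluster, "down": total - up}


counts = osd_counts(STATUS_LINE)
print(counts)
```

Here total minus up gives 34, which agrees with the health warning.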

Comment 6 karan singh 2020-05-19 19:30:15 UTC
Created attachment 1689971 [details]
OSD nodes CPU Utilization grafana screenshots

Comment 7 karan singh 2020-05-19 19:38:33 UTC
Created attachment 1689972 [details]
Ceph PG Dump

One more interesting output is the `ceph pg dump` (attached here).

As I mentioned, I deleted all the pools (approximately 45 million objects), after which one of the running RGWs auto-created the .rgw.root pool with 128 PGs. The PG dump output contains many stale, undeleted PG entries, which should have been removed when I deleted the pools.

Do you think these stale entries are causing (1) the very high CPU utilization on all OSD nodes and (2) the OSDs on rgw-1 failing to boot?
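
One way to pick the leftover entries out of a PG dump is to compare the pool ID prefix of each PG ID with the pools that still exist. A minimal Python sketch follows. It assumes PG IDs of the form `<pool>.<hex>` with an optional `s<shard>` suffix, as seen in the OSD logs above. The live pool set comes from the `ceph df` output (.rgw.root, ID 21). The sample list and the function names are illustrative only, and `21.1f` is a hypothetical PG ID, not taken from the attachment.

```python
def pool_of(pgid):
    """Return the numeric pool ID encoded before the first dot of a PG ID."""
    return int(pgid.split(".", 1)[0])


def leftover_pgs(pgids, live_pools):
    """Return the PG IDs whose pool is not in the set of existing pools."""
    return [pgid for pgid in pgids if pool_of(pgid) not in live_pools]


LIVE_POOLS = {21}  # .rgw.root, per the `ceph df` output
# The first three IDs appear in the OSD logs above; "21.1f" is hypothetical.
PGIDS = ["18.987s5", "17.710", "18.f8es1", "21.1f"]

stale = leftover_pgs(PGIDS, LIVE_POOLS)
print(stale)
```

This only flags entries that belong to deleted pools; it does not show whether those entries are what is driving the CPU load.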

Comment 11 Red Hat Bugzilla 2023-09-15 00:31:53 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

