Bug 1987034
| Summary: | [Multus][VMware] Ceph status reporting osds down and slow ops during or after OCS deployment | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Sidhant Agrawal <sagrawal> |
| Component: | rook | Assignee: | Rohan Gupta <rohgupta> |
| Status: | CLOSED WORKSFORME | QA Contact: | Elad <ebenahar> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.8 | CC: | madam, muagarwa, ocs-bugs, odf-bz-bot, rohgupta, tnielsen |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-09-29 09:19:09 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Comment 3
Travis Nielsen
2021-07-28 19:33:28 UTC
Removing the blocker flag, which was added due to the urgent severity. Can you define exactly when this issue started reproducing? When it happens, can we get a connection to the live cluster to debug? (Rohan/Seb/me) Not a blocker for 4.8. Rohan, can you take a look? I see from the logs that there is a network connection issue in the cluster:
```
debug 2021-08-04 19:20:05.453 7fb954953700 1 osd.0 pg_epoch: 89 pg[10.5( empty local-lis/les=74/75 n=0 ec=48/48 lis/c 83/68 les/c/f 84/69/0 89/89/89) [0,2,1] r=0 lpr=89 pi=[68,89)/3 crt=0'0 mlcod 0'0 unknown mbc={}] state<Start>: transitioning to Primary
debug 2021-08-04 19:20:06.455 7fb961767700 0 log_channel(cluster) log [WRN] : Monitor daemon marked osd.0 down, but it is still running
debug 2021-08-04 19:20:06.455 7fb961767700 0 log_channel(cluster) log [DBG] : map e90 wrongly marked me down at e90
debug 2021-08-04 19:20:06.455 7fb961767700 0 osd.0 90 _committed_osd_maps marked down 6 > osd_max_markdown_count 5 in last 600.000000 seconds, shutting down
debug 2021-08-04 19:20:06.455 7fb961767700 1 osd.0 90 start_waiting_for_healthy
debug 2021-08-04 19:20:06.459 7fb954152700 1 osd.0 pg_epoch: 90 pg[10.1f( empty local-lis/les=81/82 n=0 ec=48/48 lis/c 83/81 les/c/f 84/82/0 90/90/74) [1,2] r=-1 lpr=90 pi=[81,90)/1 crt=0'0 unknown NOTIFY mbc={}] start_peering_interval up [1,2,0] -> [1,2], acting [1,2,0] -> [1,2], acting_primary 1 -> 1, up_primary 1 -> 1, role 2 -> -1, features acting 4611087858330828799 upacting 4611087858330828799
```
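The `shutting down` line above is Ceph's OSD flapping protection: once the monitors have marked an OSD down more than `osd_max_markdown_count` times (default 5, visible in the log) within `osd_max_markdown_period` (600 seconds, also visible), the OSD exits rather than keep rejoining. A minimal sketch for spotting this event in an OSD log; the log path is illustrative, and on an OCS/ODF cluster the log would typically come from `oc logs` on the `rook-ceph-osd` pod instead:

```shell
#!/bin/sh
# Sketch: detect Ceph's flapping-protection shutdown in an OSD log.
# The default path below is an assumption for illustration.
LOG="${1:-/var/log/ceph/ceph-osd.0.log}"

# The OSD prints a line like this just before exiting:
#   _committed_osd_maps marked down 6 > osd_max_markdown_count 5 in last 600.000000 seconds, shutting down
if grep -E 'marked down [0-9]+ > osd_max_markdown_count [0-9]+' "$LOG" 2>/dev/null; then
    echo "flapping-protection shutdown found in $LOG"
else
    echo "no flapping-protection shutdown in $LOG"
fi
```

On a live cluster the threshold can be inspected with `ceph config get osd osd_max_markdown_count`, though raising it would only mask the underlying Multus connectivity drops rather than fix them.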
At certain intervals the network connectivity between the pods on the Multus network breaks.

Sidhant, is this still reproducible?