Bug 2231684
| Summary: | keep alive timeout and host disconnects when working with ~100+ namespaces | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Aviv Caro <acaro> |
| Component: | NVMeOF | Assignee: | Aviv Caro <aviv.caro> |
| Status: | CLOSED WORKSFORME | QA Contact: | Manohar Murthy <mmurthy> |
| Severity: | high | Docs Contact: | ceph-doc-bot <ceph-doc-bugzilla> |
| Priority: | unspecified | | |
| Version: | 7.0 | CC: | cephqe-warriors, idryomov |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | 7.1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2024-01-25 14:32:44 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
|
Description
Aviv Caro
2023-08-13 09:02:34 UTC
The issue appeared to be fixed after bumping to version 23.01.1 LTS, but we are re-opening it because it is still seen at 23.01.1. Details at https://github.com/ceph/ceph-nvmeof/issues/161#issuecomment-1685718132

Observations after some test runs:

1) With a 16 GB RAM GW node, the test passes with no KA timeouts; the host and subsystem are intact even after a day:
   1) http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-BPXH6Y/Scale_to_256_namespaces_in_single_subsystem_on_NVMeOF_GW_0.log
   2) http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-BPXH6Y/GW_server.log
   3) [root@ceph-nvmf3-bpxh6y-node5 ceph-nvmeof]# cat /proc/meminfo
      MemTotal:       16107316 kB
      MemFree:          476232 kB
      MemAvailable:    1078012 kB

2) With an 8 GB RAM GW node, the test fails (though not with a KA timeout message) and the GW crashes:
   1) http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-GRYB7N/Scale_to_256_namespaces_in_single_subsystem_on_NVMeOF_GW_0.log
   2) http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-GRYB7N/GW_server.log
   3) [root@ceph-nvmf1-gryb7n-node5 ceph-nvmeof]# cat /proc/meminfo
      MemTotal:        7862076 kB
      MemFree:          939148 kB
      MemAvailable:    1239144 kB

After some discussions with @orit.was, rlepaksh, and manohar.m, we agreed that for the 7.0 TP we will need at least 16 GB for the GW. We also agreed to reconsider whether we can work with less memory for the GA in 7.1, so the target release needs to change to 7.1.

With the upstream container build, we also see a GW crash when running in containers: on a 4 GB RAM node it crashes at 135 namespaces with IO (http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-KDQSHS), whereas on a 16 GB RAM node it crashes at 391 namespaces with IO (http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-Y4GMR7/test_1k_namespace_with_1_subsystem_in_Single_GW_0.log).
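As a side note, the memory figures quoted above can be sanity-checked before deploying a GW node. The following is a minimal sketch (not part of the product) that parses `/proc/meminfo`-style output and compares `MemTotal` against the agreed 16 GB guidance; the 15 GiB floor is an assumption, since a nominal "16 GB" node reports slightly less than 16 GiB in `MemTotal`. The sample values are taken from the passing 16 GB run in this report.

```python
# Sketch: check whether a GW node meets the ~16 GB RAM guidance from this bug.
# MIN_GIB is an assumed floor; nominal 16 GB nodes report a bit under 16 GiB.
MIN_GIB = 15.0

def mem_total_gib(meminfo_text: str) -> float:
    """Return MemTotal from /proc/meminfo-style text, in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # /proc/meminfo reports "kB" (KiB)
            return kib / (1024 * 1024)
    raise ValueError("MemTotal not found")

# Sample output from the 16 GB GW node in this report:
sample = """\
MemTotal:       16107316 kB
MemFree:          476232 kB
MemAvailable:    1078012 kB
"""

total = mem_total_gib(sample)
print(f"MemTotal: {total:.2f} GiB; meets {MIN_GIB} GiB floor: {total >= MIN_GIB}")
```

On a live node the same check can read the real file with `open("/proc/meminfo").read()` instead of the sample string.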