+++ This bug was initially created as a clone of Bug #2257730 +++
+++ This bug was initially created as a clone of Bug #2239951 +++

Description of problem:

- Initially the attached case was submitted because the customer was experiencing the following error message:
  "MDS_CLIENT_OLDEST_TID: 15 clients failing to advance oldest client/flush tid"

- The customer then took action to update the cluster. In the customer's own words:
~~~
Around 3:40am Eastern today (09/15) some of our mds's failed and standbys took over and we were left with insufficient standbys. The missing standbys were identified and systemctl restarted. After which the "failing to advance TID" errors went away.

This cluster has mds issues, since upgrading to 6.1 it has got worse. if you manually fail 1 mds it will take multiple others with it. (This was happening under 5.x as well)

Under 6.1 when this happens the standbys take over and the failed mds stay in some kind of hung state. These hung mds will not show up as standbys until they are bounced via systemctl. (ceph orch daemon restart mds.instance <-- this doesn't work in this state, only systemctl restart works)
~~~

- It's important to note that this cluster has another bz/hotfix pending resolution/deployment. That bz is located at:
  https://bugzilla.redhat.com/show_bug.cgi?id=2228635

- Please also note that on 09/18 client jobs were failing due to issues with mds, but most of the errors at that time pertained to being behind on trimming, blocked ops, and clients failing to release caps, and they saw this in mds.5: "failed to authpin, dir is being fragmented"

- The primary issue for this particular bug is to address the following errors:

HEALTH_WARN 2 clients failing to advance oldest client/flush tid
[WRN] MDS_CLIENT_OLDEST_TID: 2 clients failing to advance oldest client/flush tid
    mds.root.host11.emjjsf(mds.1): Client client75:pthnrt failing to advance its oldest client/flush tid.  client_id: 69394526
    mds.root.host10.fckajv(mds.4): Client client75:pthnrt failing to advance its oldest client/flush tid.  client_id: 69394526

- I'm fully aware that this could indeed be related to bz https://bugzilla.redhat.com/show_bug.cgi?id=2228635, but I'm requesting confirmation of that from engineering on this bz, given that this customer is experiencing frequent problems with mds.

Version-Release number of selected component (if applicable):

How reproducible:
Only reproducible in the customer's environment

Steps to Reproduce:
1.
2.
3.

Actual results:
"MDS_CLIENT_OLDEST_TID" warnings occur

Expected results:
"MDS_CLIENT_OLDEST_TID" warnings do not occur

Additional info:

- The customer has uploaded sosreports from the failing clients and a Ceph mon node. They've also uploaded mds debug logs as attachment "0100-ceph-mds.debug.logs.all.9.mds.instances.tar.gz". Here's the mds layout before mds.4 was failed over and debug mds logs were collected:
====
RANK  STATE   MDS                 ACTIVITY       DNS    INOS   DIRS   CAPS
 0    active  root.host13.kpbxhd  Reqs:  133 /s   426k   424k  10.6k   142k
 1    active  root.host11.emjjsf  Reqs:   47 /s  8899k  8897k  10.9k  1668k
 2    active  root.host12.fzjadk  Reqs:   37 /s  19.4M  19.3M  35.8k  1067k
 3    active  root.host6.gxtzai   Reqs:   42 /s  19.1M  19.1M  93.7k   960k
 4    active  root.host10.fckajv  Reqs: 1257 /s  17.7M  17.7M  2173k  6024k
 5    active  root.host7.vainnz   Reqs:  289 /s  14.2M  14.2M  64.7k  2506k

- Please let me know if you require any further information.

Thanks
-Brandon

--- Additional comment from Venky Shankar on 2023-09-25 16:32:31 IST ---

Similar to BZ2238319 - esp. BZ2238319#c12

--- Additional comment from on 2023-09-27 00:34:24 IST ---

Hi Venky,

Thank you for responding. I just wanted to get some clarification on a few things. You stated that this issue is similar to "BZ.2238319#c12". I'm a bit confused as to how, since this is standalone Ceph and not ODF. A couple of questions:

Are you seeing evidence of the selinux relabelling issue present in this case? (I ask because I have never known this to be an issue outside of ODF)

Are the "MDS_CLIENT_OLDEST_TID" errors we see ultimately a result of https://bugzilla.redhat.com/show_bug.cgi?id=2238663 in your opinion?

Thanks
-Brandon

--- Additional comment from on 2023-09-29 21:50:17 IST ---

Hi Venky,

Just checking in to see if I can get your input on my last post.

Thanks
-Brandon

--- Additional comment from on 2023-09-30 06:04:56 IST ---

The customer has updated the case:
---
hotfix applied to this cluster. ceph version 17.2.6-100.0.hotfix.bz2228635.el9cp
after upgrade we still have: 5 clients failing to advance oldest client/flush tid
---

I've also asked for the following the next time we have an mds hang:
---
$ ceph daemon mds.rank dump_ops_in_flight > dump_ops_in_flight.txt
$ ceph daemon mds.rank dump_historic_ops > dump_historic_ops.txt
$ ceph daemon mds.rank ops > ops.txt
$ ceph daemon mds.rank session ls > session_ls.txt
$ ceph daemon mds.rank client ls > client_ls.tx
---

The rest of the data from https://access.redhat.com/solutions/7031034 should be available in supportshell via the various sosreports from the affected clients and monitor nodes, and debug mds logs are available at 0100-ceph-mds.debug.logs.all.9.mds.instances.tar.gz.
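The per-MDS collection commands above can be wrapped in a small loop so nothing is missed during an incident. This is only a sketch: `mds.rank` is a placeholder for the real daemon name (e.g. `mds.root.host10.fckajv`), and the `ceph daemon` calls assume you are on the host with that daemon's admin socket. Off-cluster, the loop just prints what it would run.

```shell
MDS="mds.rank"   # placeholder: substitute the actual daemon, e.g. mds.root.host10.fckajv

# Run each admin-socket command from the list above, writing each dump to a
# matching file (spaces in the command become underscores in the filename).
for cmd in "dump_ops_in_flight" "dump_historic_ops" "ops" "session ls" "client ls"; do
    out="$(echo "$cmd" | tr ' ' '_').txt"
    if command -v ceph >/dev/null 2>&1; then
        # $cmd is intentionally unquoted so "session ls" splits into two args
        ceph daemon "$MDS" $cmd > "$out"
    else
        echo "would run: ceph daemon $MDS $cmd > $out"   # dry run off-cluster
    fi
done
```

Collecting all five dumps in one pass, close in time, makes it easier to correlate a stuck tid in `session ls` with any op visible in `dump_ops_in_flight`.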
Here's a list of all of the attachments in supportshell:

[bmcmurra@supportshell-1 03614313]$ ls
0010-ceph.health.detail.txt
0020-sosreport-host13-mds.1-2023-09-14-ebfnpen-obfuscated.tar.xz
0030-soscleaner-9724157162355248.client2.tar.gz
0040-sosreport-host9-mds.2-2023-09-14-yvzskfq-obfuscated.tar.xz
0050-sosreport-host13-mds.0-2023-09-14-imfqgvb-obfuscated.tar.xz
0060-soscleaner-7122134623234608.client1.tar.gz
0070-soscleaner-6643370664242141.client3.tar.gz
0080-sosreport-host9-mds.4-2023-09-14-raykejc-obfuscated.tar.xz
0090-sosreport-host14-mds.5-2023-09-14-wqimlsp-obfuscated.tar.xz
0100-ceph-mds.debug.logs.all.9.mds.instances.tar.gz
0110-soscleaner-2114610557032500.tar.gz
0120-mgr.host1.axzzmg.crash.info.txt
0130-core.ceph-mgr.167.d93f36e12cd94a0881642abfd94b5a74.20977.1695063963000000.lz4
0140-sosreport-host0-03614313-2023-09-18-mivyczp-obfuscated.tar.xz
sosreport-20230914-123211
sosreport-20230918-190716

Let me know if you require any more data.

Thanks
-Brandon

--- Additional comment from Venky Shankar on 2023-10-03 10:56:01 IST ---

Hi Brandon,

(In reply to bmcmurra from comment #2)
> Hi Venky,
>
> Thank you for responding. I just wanted to get some clarification on a few
> things. You stated that this issue is similar to "BZ.2238319#c12". I'm a bit
> confused as to how since this is standalone Ceph and not ODF. A couple of
> questions:
>
> Are you seeing evidence of the selinux relabelling issue present in this
> case? (I ask because I have never known this to be an issue outside of ODF)

The issue is due to a bug (likely in the MDS) that causes a buildup of session metadata to the point that when the metadata is flushed by the MDS to RADOS, bad things happen, since there is essentially a per-transaction limit on the op size that RADOS can handle. With selinux in the picture, the problem is just aggravated.

> Are the "MDS_CLIENT_OLDEST_TID" errors we see ultimately a result of
> https://bugzilla.redhat.com/show_bug.cgi?id=2238663 in your opinion?

BZ2238663 is a temporary workaround where the MDS blocklists the client that is experiencing the buildup of session metadata. Without this, the MDS would go read-only since RADOS would reject the op (due to the size limit explained above).

(In reply to bmcmurra from comment #3)
> Hi Venky,
>
> Just checking-in to see if I can get your input on my last post.

Sorry for the late reply (was on PTO -- public holiday).

--- Additional comment from on 2023-10-10 23:40:17 IST ---

Hi Venky,

Thanks for your response. I've relayed it to the customer. They came back with some questions/concerns:

---
while I was on PTO we had a couple mds failovers due to slow/blocked ops and the requested logs were uploaded to case 03573559.
As of today this cluster has "HEALTH_WARN 10 clients failing to advance oldest client/flush tid"
Let me know if you want additional debug logs / sosreports....

So the BZ2238663 would just blocklist these clients?
Would the expectation be that the clients should be rebooted?
Client side reboots/down time is not something we can handle in this environment. These clients have a single monthly maintenance window of 4 hours. Any client outage outside that window requires a P1 case and RCA. blocklisting/reboots are the absolute last ditch effort to be avoided if possible.
---

Let me know if we need any more data from the customer to answer these questions.

Thanks
-Brandon

--- Additional comment from Venky Shankar on 2023-10-11 10:16:00 IST ---

Hi Brandon,

(In reply to bmcmurra from comment #6)
> So the BZ2238663 would just blocklist these clients?

Yes. That's a workaround we added so that the MDS does not go into read-only mode.

> Would the expectation be that the clients should be rebooted?

Yes. Once a client is blocklisted, it needs to be restarted.

> Client side reboots/down time is not something we can handle in this
> environment.
> These clients have a single monthly maintenance window of 4 hours.
> Any client outage outside that window requires a P1 case and RCA.
> blocklisting/reboots are the absolute last ditch effort to be avoided if
> possible.

Got it, but until the underlying issue is RCA'd and fixed (the cause of the warnings), that's the only workaround we have.

> Let me know if we need anymore data from the customer to answer these
> questions.

Would it be possible to turn on mds debug logs for a short period and then collect those via sosreports? We might get some insights into the underlying issue.

--- Additional comment from on 2023-10-13 06:32:17 IST ---

Hi Venky,

Sosreports have been uploaded with debug logs. Let me know if you'd like any more data from the customer.

Thanks
-Brandon

--- Additional comment from on 2023-10-13 06:33:08 IST ---

Hi Venky,

Sosreports have been uploaded with debug logs. Let me know if you'd like any more data from the customer.

Thanks
-Brandon

--- Additional comment from Venky Shankar on 2023-10-18 11:22:31 IST ---

(In reply to bmcmurra from comment #9)
> Hi Venky,
>
> Sosreports have been uploaded with debug logs. Let me know if you'd like
> any more data from the customer.

Thanks. Will have a look today.

Keeping NI on me.

--- Additional comment from Venky Shankar on 2023-10-20 16:19:28 IST ---

(In reply to Venky Shankar from comment #10)
> (In reply to bmcmurra from comment #9)
> > Hi Venky,
> >
> > Sosreports have been uploaded with debug logs. Let me know if you'd like
> > any more data from the customer.
>
> Thanks. Will have a look today.
>
> Keeping NI on me.

I'm on planned PTO next week.
Handing this over to Milind for debugging.

Milind, PTAL on prio.

--- Additional comment from Milind Changire on 2023-10-20 21:06:52 IST ---

What type of ceph client was in use: ceph-fuse or the kernel client module?

--- Additional comment from Brett Hull on 2023-10-24 02:34:36 IST ---

(In reply to Milind Changire from comment #12)
> What type of ceph client was in use: ceph-fuse or kernel client module ?

Hello Milind,

My name is Brett; I work with Brandon. Brandon is out today.

I do not see any indication of ceph-fuse being in use. I looked at the sosreport for a process of that description and there were none. I also looked to see if there were any systemd services; again, there were not any.

Is there a better method for me to use?

Best regards,
Brett Hull

--- Additional comment from on 2023-10-28 04:56:00 IST ---

Hi Milind,

I've relayed your question to the customer and here's what they said:
---
"All clients are kernel module mounts."
---

Let me know if you have any more questions. Thanks.

-Brandon

--- Additional comment from Xiubo Li on 2023-10-31 06:45:15 IST ---

Hi Brandon,

What's the kernel version for the client nodes?

- Xiubo

--- Additional comment from Xiubo Li on 2023-10-31 09:24:53 IST ---

Currently I have figured out one possible case that could cause this issue; for more detail please see ceph tracker https://tracker.ceph.com/issues/63364, where I have raised one ceph PR to fix it. If this makes sense, I need to fix it in the kclient later.

--- Additional comment from on 2023-11-01 05:13:54 IST ---

Thanks for the update @Xiubo. Is there any way to verify we are indeed hitting https://tracker.ceph.com/issues/63364? I dug into https://tracker.ceph.com/issues/61947 a bit, but the posted logs look fairly generic, and I wasn't sure if they were the relevant Diagnostic Steps or not.

Thanks
-Brandon

--- Additional comment from Xiubo Li on 2023-11-01 05:57:34 IST ---

(In reply to bmcmurra from comment #17)
> Thanks for the update @Xiubo. Is there any way to verify we are indeed
> hitting https://tracker.ceph.com/issues/63364? I dug into
> https://tracker.ceph.com/issues/61947 a bit, but the posted logs look fairly
> generic, and I wasn't sure if they were the relevant Diagnostic Steps or not.

Hi Brandon,

Currently we have figured out three cases:

1. https://tracker.ceph.com/issues/63364 - the clients become silent just after issuing and finishing a lot of client requests in a burst.

2. The MDS has more than one blocked op, waiting on some resources and then skipped; in this case there should be slow request warnings.

3. The clients are using old kernels (< kernel-4.14.0-1.el8), which could also cause the oldest client tid to not be advanced. Kernels lower than kernel-4.14.0-1.el8 will miss commit e8a7b8b12b13 ("ceph: exclude setfilelock requests when calculating oldest tid").

For this BZ I checked the sosreports and didn't find any slow requests, and all the logs look good. So it shouldn't be case 2.

For case 3, this is why I was asking for the kernel version in https://bugzilla.redhat.com/show_bug.cgi?id=2239951#c15.

For case 1, currently there is no good way to verify this.

Thanks
- Xiubo

--- Additional comment from on 2023-11-02 01:17:42 IST ---

Hi Xiubo,

Thank you for your clear response. It's very helpful.

Judging from the sosreport the customer provided, it appears the client in question is running kernel version: 3.10.0-1160.92.1.el7.x86_64

Let me know if any further client data would be useful, and what the next steps might be (upgrading the client kernel version, maybe? I'm not sure if the customer would be open to this or not, but I can ask). The sosreport for the client in question is in supportshell at: 0030-soscleaner-9724157162355248.client2.tar.gz

Thanks
- Brandon

--- Additional comment from Xiubo Li on 2023-11-02 06:01:42 IST ---

(In reply to bmcmurra from comment #19)
> Hi Xiubu,
>
> Thank you for your clear response. It's very helpful.
>
> Judging from the sosereport the customer provided, it appears the client in
> question is running kernel version: 3.10.0-1160.92.1.el7.x86_64

Okay, the customer is using rhel-7. I checked rhel-7 and we have included the fix since 'kernel-3.10.0-1000.el7'.

> Let me know if any further client data would be useful, and what next steps
> might be (upgrading the client kernel version maybe? I'm not sure if the
> customer would open to this or not, but I can ask).

For the fixes for case 1 in https://bugzilla.redhat.com/show_bug.cgi?id=2239951#c18, I haven't generated the kclient fix patches yet. So currently, upgrading the client kernel wouldn't resolve anything.

As a workaround, please try to do something on the client side, such as having the applications do some work, or manually creating/deleting some tmp or test files. If that still doesn't work, please try to evict the corresponding client.

Thanks
- Xiubo

--- Additional comment from Venky Shankar on 2023-12-15 07:54:38 IST ---

Xiubo - please post an (MDS side) MR.

--- Additional comment from Xiubo Li on 2023-12-15 08:25:50 IST ---

(In reply to Venky Shankar from comment #21)
> Xiubo - please post an (MDS side) MR.

Venky,

The MDS side patches depend on another PR; the corresponding PR for https://tracker.ceph.com/issues/62952 is still under review.

Should we wait or just backport them all?

Thanks
- Xiubo

--- Additional comment from Venky Shankar on 2023-12-15 09:32:56 IST ---

(In reply to Xiubo Li from comment #22)
> (In reply to Venky Shankar from comment #21)
> > Xiubo - please post an (MDS side) MR.
>
> Venky,
>
> The MDS side patches will depend on another PR, which the corresponding PR
> https://tracker.ceph.com/issues/62952 is still under reviewing.
>
> Should we wait or just backport them all ?

In that case, let's wait till the (Reef) backport is tested and merged. Thx Xiubo!
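The kernel cutoffs discussed above (commit e8a7b8b12b13 missing before kernel-4.14.0-1.el8 upstream-wise, and carried in RHEL 7 since kernel-3.10.0-1000.el7 per Xiubo's comment) can be checked with a simple version compare. A triage sketch, assuming `sort -V` ordering is adequate for these RPM-style release strings; the threshold value is taken from the comment thread, not from an errata lookup:

```shell
# Succeeds if the client kernel release is at or past the first RHEL 7 build
# said (in the comments above) to carry commit e8a7b8b12b13
# ("ceph: exclude setfilelock requests when calculating oldest tid").
has_setfilelock_fix() {
    kver="$1"
    threshold="3.10.0-1000.el7"
    # sort -V orders the two strings; if the threshold sorts first (or the
    # strings are equal), the client kernel is new enough.
    [ "$(printf '%s\n' "$threshold" "$kver" | sort -V | head -n1)" = "$threshold" ]
}

# The customer's client kernel from the sosreport:
has_setfilelock_fix "3.10.0-1160.92.1.el7.x86_64" && echo "fix present" || echo "fix missing"
```

This matches Xiubo's conclusion for this cluster: 3.10.0-1160.92.1.el7 is past 3.10.0-1000.el7, so case 3 is ruled out for this client.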
--- Additional comment from Xiubo Li on 2023-12-15 09:37:16 IST ---

(In reply to Venky Shankar from comment #23)
> (In reply to Xiubo Li from comment #22)
> > (In reply to Venky Shankar from comment #21)
> > > Xiubo - please post an (MDS side) MR.
> >
> > Venky,
> >
> > The MDS side patches will depend on another PR, which the corresponding PR
> > https://tracker.ceph.com/issues/62952 is still under reviewing.
> >
> > Should we wait or just backport them all ?
>
> In that case, let's wait till the (Reef) backport is tested and merged. Thx
> Xiubo!

Sure, thanks!
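Xiubo's client-side workaround from the earlier comment (generate a little metadata activity on the stuck mount; evict only as a last resort) could look like the sketch below. `CLIENT_MOUNT` is a placeholder for the affected client's CephFS mountpoint, and the eviction command is left commented out because, as the customer noted, blocklisting is a last-ditch option:

```shell
CLIENT_MOUNT="/mnt/cephfs"   # placeholder: the stuck client's CephFS mountpoint

# Create and delete a few temp files so the client issues fresh metadata
# requests and, hopefully, flushes and advances its oldest tid.
if [ -d "$CLIENT_MOUNT" ]; then
    for i in 1 2 3; do
        f="$CLIENT_MOUNT/.tid-nudge.$$.$i"
        touch "$f" && rm -f "$f"
    done
else
    echo "mountpoint $CLIENT_MOUNT not present; nothing to do"
fi

# Last resort only (the evicted client sees I/O errors until remounted):
#   ceph tell mds.<rank> client evict id=<client_id>   # e.g. id=69394526
```

The nudge is cheap and non-disruptive, so it is worth trying before scheduling an eviction inside the customer's maintenance window.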
See KCS #6971376 (https://access.redhat.com/solutions/6971376). /MC
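When tracking which sessions trip the MDS_CLIENT_OLDEST_TID warning over time, the `client_id` values can be pulled straight out of `ceph health detail`. A parsing sketch using the warning text from the description of this bug; on a live cluster, substitute `"$(ceph health detail)"` for the sample variable:

```shell
# Sample copied from this report's description; replace with live output
# from `ceph health detail` when running on a cluster.
health_detail='
[WRN] MDS_CLIENT_OLDEST_TID: 2 clients failing to advance oldest client/flush tid
    mds.root.host11.emjjsf(mds.1): Client client75:pthnrt failing to advance its oldest client/flush tid.  client_id: 69394526
    mds.root.host10.fckajv(mds.4): Client client75:pthnrt failing to advance its oldest client/flush tid.  client_id: 69394526
'

# One id per offending session; duplicates collapse when several MDS ranks
# report the same client, as they do here.
echo "$health_detail" | grep -o 'client_id: [0-9]*' | awk '{print $2}' | sort -u
```

For this sample the pipeline prints a single id, 69394526, even though two ranks reported the client.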
Waiting on bkunal to get feedback from customer.
(In reply to Bipin Kunal from comment #33)
> Venky,
>   I have lost my permission to access internal comment, so I am unable to
> see comments between 22-32 so no clue what is happening. I am working on
> getting the access back but in the meanwhile If there a hotfix request from
> support please suggest them to follow the hotfix process and send a formal
> hotfix request.

There is no hotfix request; we just need to check with the customer whether they are fine upgrading to RHCS 6 (where the fix is available) instead of pulling the stripped-down fix into z7.

(I'm unable to mark comments as private, IDK why - can someone please do the needful)
(In reply to Bipin Kunal from comment #35)
> (In reply to Venky Shankar from comment #34)
> > (In reply to Bipin Kunal from comment #33)
> > > Venky,
> > >   I have lost my permission to access internal comment, so I am unable to
> > > see comments between 22-32 so no clue what is happening. I am working on
> > > getting the access back but in the meanwhile If there a hotfix request from
> > > support please suggest them to follow the hotfix process and send a formal
> > > hotfix request.
> >
> > There is not hotfix request, just that we need to check with the customer if
> > they are fine to upgrade to RHCS6 (where the fix is available) instead of
> > pulling in the stripped down fix in z7.
>
> I had a discussion with the TAM, and got to know that the customer is waiting for
> it.
> They might be on 5.3z till the middle of next year. So it is better to ship the fix
> in z7.

ACK. cc Xiubo, who did the hotfix backport (stripped-down version) for the customer.

Xiubo, we need to get the same set of patches into z7.

(In reply to Venky Shankar from comment #36)
> (In reply to Bipin Kunal from comment #35)
> > I had discussion with the TAM, and got to know that customer is waiting for
> > it.
> > They might be on 5.3z till mid of next year. So it is better to ship the fix
> > in z7.
>
> ACK. cc Xiubo who did the hotfix backport (stripped down version) for the
> customer.
>
> Xiubo, we need to get in the same set of patches for z7.

Sure, will do it today.

(In reply to Xiubo Li from comment #37)
> (In reply to Venky Shankar from comment #36)
> > Xiubo, we need to get in the same set of patches for z7.
>
> Sure, will do it today.

Done, please see https://gitlab.cee.redhat.com/ceph/ceph/-/merge_requests/642.

(In reply to Xiubo Li from comment #38)
> (In reply to Xiubo Li from comment #37)
> > (In reply to Venky Shankar from comment #36)
> > > Xiubo, we need to get in the same set of patches for z7.
> >
> > Sure, will do it today.
>
> Done, please see
> https://gitlab.cee.redhat.com/ceph/ceph/-/merge_requests/642.

Can we merge the above into ceph-5.3-rhel-patches? Or is some review needed?

Thomas
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage 5.3 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:4118