Bug 1822125

Summary: Allow XML IDs of arbitrary length in Pacemaker CIB
Product: Red Hat Enterprise Linux 8
Component: pacemaker
Version: 8.3
Target Release: 8.8
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: high
Target Milestone: rc
Keywords: Triaged
Type: Bug
Reporter: michael.oldham
Assignee: Ken Gaillot <kgaillot>
QA Contact: cluster-qe <cluster-qe>
CC: ccaulfie, cfeist, cluster-maint, jfriesse, kgaillot, kostos, msmazova, nwahl, sbradley, tojeline
Bug Blocks: 1824206
Fixed In Version: pacemaker-2.1.5-1.el8
Last Closed: 2023-05-16 08:35:22 UTC
Doc Type: Bug Fix

Doc Text:
Cause: Pacemaker read XML IDs into a fixed-length buffer.
Consequence: XML IDs longer than the buffer size could crash Pacemaker.
Fix: Pacemaker reads XML IDs into dynamically sized memory.
Result: XML IDs of arbitrary length can be used in the Pacemaker CIB.
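As an illustration of the doc text above, here is a minimal C sketch (hypothetical function and buffer names, not the actual Pacemaker source) of the difference between the two behaviours: copying an ID of arbitrary length into a fixed-size buffer can overrun it and corrupt memory, while sizing the allocation to the input cannot.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical "before" behaviour: the ID is copied into a fixed-size
     * stack buffer, so an ID longer than the buffer overruns it and can
     * corrupt adjacent memory (the crash described in this bug). */
    void read_id_fixed(const char *xml_id)
    {
        char buf[1024];                 /* fixed-length buffer */

        strcpy(buf, xml_id);            /* overflows for long IDs */
        printf("parsed id: %s\n", buf);
    }

    /* Hypothetical "after" behaviour: the copy is sized to the input, so
     * IDs of arbitrary length are handled safely. */
    void read_id_dynamic(const char *xml_id)
    {
        char *buf = strdup(xml_id);     /* dynamically sized memory */

        if (buf != NULL) {
            printf("parsed id: %s\n", buf);
            free(buf);
        }
    }

    int main(void)
    {
        /* A short ID is fine either way; a multi-kilobyte ID is only
         * safe with the dynamically sized version. */
        read_id_fixed("short_id");
        read_id_dynamic("pcs_rsc_colocation_set_example");
        return 0;
    }

The real fix lives in Pacemaker's CIB/XML handling (upstream commit a120f342, see the comments below); the sketch only shows the failure mode.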
Description
michael.oldham
2020-04-08 10:22:18 UTC
Created attachment 1677193 [details]
stack back trace
Created attachment 1677195 [details]
corosync.log excerpt
@michael: Thank you for the report. Would you mind also sharing the hot-fix?

Hi Jan,

Yes, sorry - the hot-fix was simply to delete the line at #571:

    cpg_inst->finalize = 1;

I then supplied a new version of libcpg.so (corosync 2.4.3). Do you need the actual library file I sent?

@michael: Thank you for the hot-fix description. No, the actual library file is not needed :) I just wanted to make sure that I understand what was happening. I've spent some time reading and trying to understand the backtrace, but sadly it hasn't helped much.

1. I tried a simple example where cpg_finalize is called in the dispatch callback - just to be sure, but I was pretty positive we solved that problem a long time ago. That doesn't reproduce the bug and behaves correctly because of refcounting.

2. I'm sure that deleting cpg_inst->finalize (no matter how useless that line is) is not really a solution for the problem. If you take a look at the backtrace, there are a few weird things:
- cpg_inst = 0x273d6469405b6665 - points to a weird address; other pointers are much shorter.
- cpg_inst_copy contains corrupted data. This one shouldn't really be corrupted. A few examples: finalize = 1769236852 - should be zero or one, it's never set to such a huge value; model = (CPG_MODEL_V1 | unknown: 23846) - should be just CPG_MODEL_V1; ... But the weirdest one is cpg_totem_confchg_fn = 0x273d6469405b6665, which is exactly equal to cpg_inst. AFAIK cpg_totem_confchg_fn is/was not used by pcmk in RHEL 7.7, so it should be NULL.

3. I checked the refcounting twice. It works as expected: really every function in libcpg gets the handle from handle_db and puts it back as expected, so refcounting works. Also, a get of a freed handle correctly does not work.

4. Studying the pcmk code hasn't helped much either. It is pretty normal for pcmk to call cpg_finalize, but it always logs the problem. There was no such log in the logfile.

All in all, I believe the problem is either somewhere in the pcmk for RHEL 7.7 where memory is overwritten (more specifically stack memory), or it may be some hw problem (corrupted memory, ...).

Ken, do you have any opinion? Does it ring a bell maybe?

Created attachment 1677488 [details]
Corosync log post hot fix
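As a side note on the refcounting described in point 3 above, here is a minimal sketch of a get/put handle pattern (hypothetical names and structures, not the actual corosync/libqb handle database code): each API call takes a reference on the handle before using it and puts it back afterwards, so a handle finalized while a dispatch callback is still running is only destroyed once the last reference is dropped, and a get on an already-freed handle fails cleanly.

    #include <errno.h>
    #include <stdlib.h>

    /* Hypothetical handle object -- not the real corosync/libqb structures. */
    struct handle {
        int refcount;   /* active users, including the creator's reference */
        int freed;      /* set by finalize; destruction waits for refcount 0 */
        void *instance; /* per-connection private data */
    };

    /* Take a reference; a get on a finalized handle fails ("bad handle"). */
    static int handle_get(struct handle *h)
    {
        if (h == NULL || h->freed) {
            return -EBADF;
        }
        h->refcount++;
        return 0;
    }

    /* Drop a reference; only the last user actually destroys the instance. */
    static void handle_put(struct handle *h)
    {
        if (--h->refcount == 0) {
            free(h->instance);
            free(h);
        }
    }

    /* Finalize marks the handle freed but defers destruction while any
     * other caller (e.g. a dispatch invoking callbacks) still holds a
     * reference. */
    static void handle_finalize(struct handle *h)
    {
        h->freed = 1;
        handle_put(h);                  /* drop the creator's reference */
    }

    int main(void)
    {
        struct handle *h = calloc(1, sizeof(*h));

        h->refcount = 1;                /* creator's reference */

        if (handle_get(h) == 0) {       /* e.g. start of a dispatch call */
            handle_finalize(h);         /* finalize during the callback... */
            handle_put(h);              /* ...instance is freed only here */
        }
        return 0;
    }

With a scheme like this in place, copying the callbacks out of the instance before dispatching is arguably no longer needed, which is the point Jan makes about the quoted comment further down.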
Great .. I had typed a lot in here and lost it when I added my attachment .. sigh!

I agree with your assessment - I did not uncover the cause of the memory corruption, I just removed some of the impact. However, given the comment

/*
 * Make copy of callbacks, message data, unlock instance, and call callback
 * A risk of this dispatch method is that the callback routines may
 * operate at the same time that cpgFinalize has been called.
 */

and the lack of any thread locking logic, I realise I do not yet understand the code (other than the line I removed being redundant).

I have added the corosync.log from after the hot fix; look for 'Bad Handle' and you can see that things are still far from happy, but they do appear to be functional, as the cluster creates and starts. Maybe this will help reproduce the problem.

Currently the customer cannot proceed, as we cannot support using the hot-fix. I did look at trying to investigate further, but there is little trace in this code and I suspect we would need thread-enabled trace anyway.

Fyi, we think this is also occurring in another of their clusters. We could get the old cluster to create if they drastically reduced the number of resources.

Created attachment 1677532 [details]
Debug core
@michael: Yup, bugzilla is not super intuitive.

That comment has been there since the very beginning and I'm not entirely convinced that it is really correct. Actually, I'm pretty sure the whole idea is incorrect (thanks to refcounting) and copying of the callbacks is not really needed. I've filed an upstream issue to take a second look at whether that copy is really needed and potentially remove it (https://github.com/corosync/corosync/issues/550), because it is no longer needed. No matter what, pcmk doesn't use multiple threads for cpg calls, so it wouldn't apply anyway.

The "Bad handle" error is a result of `cpg_inst_copy.finalize` being set to a "random" non-zero value, so it is actually not the problem I had in mind. Log messages like "Could not get local node id from the CPG API", "Could not obtain the CPG API connection", ... would be interesting and would mean `cpg_finalize` was called in the dispatch callback. But I don't think that was the case. I really believe the main problem is stack corruption in pcmk code resulting in broken cpg api data structures. Maybe you could try to generate a huge cib update to trigger the bug easily (at least the messages logged right before the "Bad Handle" log messages seem to be the result of quite a big cib change)?

The first unusual thing I noticed is an insanely long XML id in the CIB change just before the core dump:

Jan 30 07:38:39 [5568] sinlxsveshi01 cib: info: cib_perform_op: ++ /cib/configuration/constraints: <rsc_colocation score="INFINITY" id="pcs_rsc_colocation_set_sv.svtest.aa.partition.10_sv.svtest.aa.partition.11_sv.svtest.aa.partition.12_sv.svtest.aa.partition.13_sv.svtest.aa.partition.14_sv.svtest.aa.partition.15_sv.svtest.aa.partition.16_sv.svtest.aa.partition.17_sv.svtest.aa.partition.18_sv.svtest.aa.partition.19_sv.svtest.aa.partition.20_sv.svtest.aa.partition.21_sv.svtest.aa.partition.22_sv

The log message cuts off after a certain number of characters, so there's no telling how long it actually is. I did some tests, and I was able to reproduce the core dump with an id 5000 characters long (1024 was OK; I haven't checked what the exact threshold is). Hopefully I can get a fix in 7.9. The obvious workaround is to keep XML IDs under 1KB.

Thanks guys for the assessment. I had thought it was linked to the number of resources, which is why I asked the customer to drop the number (whereupon it worked). However, as you can see from the excerpt from our trace file below, we do not directly create this XML id. We perform a 'pcs constraint order' and a 'pcs constraint colocation' prior to the 'pcs resource update' which triggers the core dump. The number of resources listed (about 90) is required for a 7 node cluster supporting a number of data partitions per node (about 7). The only things I could think to do would be to reduce the length of the names or attempt to break these commands up, either of which may not work or may not support the required semantics. It would seem that the creation of the id is performed by pacemaker from the list of associated resources. The length of the id would be 2730 bytes, I think.

Is the reference to 7.9 for the Linux version? I'm concerned that the Severity has been altered to medium when I still can't see a viable workaround yet (although I will be looking for one now!)

Regards,
Michael

P.S. Jan, I am relieved that the comment is incorrect .. I couldn't find any sign of thread locking!

Trace excerpt:
-----@@@@----- pcs constraint order set sv.svtest.aa.sv.monitor-clone action=start set sv.svtest.aa.partition.10 sv.svtest.aa.partition.11 sv.svtest.aa.partition.12 sv.svtest.aa.partition.13 sv.svtest.aa.partition.14 sv.svtest.aa.partition.15 sv.svtest.aa.partition.16 sv.svtest.aa.partition.17 sv.svtest.aa.partition.18 sv.svtest.aa.partition.19 sv.svtest.aa.partition.20 sv.svtest.aa.partition.21 sv.svtest.aa.partition.22 sv.svtest.aa.partition.23 sv.svtest.aa.partition.24 sv.svtest.aa.partition.25 sv.svtest.aa.partition.26 sv.svtest.aa.partition.27 sv.svtest.aa.partition.28 sv.svtest.aa.partition.29 sv.svtest.aa.partition.30 sv.svtest.aa.partition.31 sv.svtest.aa.partition.32 sv.svtest.aa.partition.33 sv.svtest.aa.partition.34 sv.svtest.aa.partition.35 sv.svtest.aa.partition.36 sv.svtest.aa.partition.37 sv.svtest.aa.partition.38 sv.svtest.aa.partition.39 sv.svtest.aa.partition.40 sv.svtest.aa.partition.41 sv.svtest.aa.partition.42 sv.svtest.aa.partition.43 sv.svtest.aa.partition.44 sv.svtest.aa.partition.45 sv.svtest.aa.partition.46 sv.svtest.aa.partition.47 sv.svtest.aa.partition.48 sv.svtest.aa.partition.49 sv.svtest.aa.partition.50 sv.svtest.aa.partition.51 sv.svtest.aa.partition.52 sv.svtest.aa.partition.53 sv.svtest.aa.partition.54 sv.svtest.aa.partition.56 sv.svtest.aa.partition.backup.10 sv.svtest.aa.partition.backup.11 sv.svtest.aa.partition.backup.12 sv.svtest.aa.partition.backup.13 sv.svtest.aa.partition.backup.14 sv.svtest.aa.partition.backup.15 sv.svtest.aa.partition.backup.16 sv.svtest.aa.partition.backup.17 sv.svtest.aa.partition.backup.18 sv.svtest.aa.partition.backup.19 sv.svtest.aa.partition.backup.20 sv.svtest.aa.partition.backup.21 sv.svtest.aa.partition.backup.22 sv.svtest.aa.partition.backup.23 sv.svtest.aa.partition.backup.24 sv.svtest.aa.partition.backup.25 sv.svtest.aa.partition.backup.26 sv.svtest.aa.partition.backup.27 sv.svtest.aa.partition.backup.28 sv.svtest.aa.partition.backup.29 sv.svtest.aa.partition.backup.30 sv.svtest.aa.partition.backup.31 sv.svtest.aa.partition.backup.32 sv.svtest.aa.partition.backup.33 sv.svtest.aa.partition.backup.34 sv.svtest.aa.partition.backup.35 sv.svtest.aa.partition.backup.36 sv.svtest.aa.partition.backup.37 sv.svtest.aa.partition.backup.38 sv.svtest.aa.partition.backup.39 sv.svtest.aa.partition.backup.40 sv.svtest.aa.partition.backup.41 sv.svtest.aa.partition.backup.42 sv.svtest.aa.partition.backup.43 sv.svtest.aa.partition.backup.44 sv.svtest.aa.partition.backup.45 sv.svtest.aa.partition.backup.46 sv.svtest.aa.partition.backup.47 sv.svtest.aa.partition.backup.48 sv.svtest.aa.partition.backup.49 sv.svtest.aa.partition.backup.50 sv.svtest.aa.partition.backup.51 sv.svtest.aa.partition.backup.52 sv.svtest.aa.partition.backup.53 sv.svtest.aa.partition.backup.54 sv.svtest.aa.partition.backup.56 sv.svtest.aa.std sequential=false action=start require-all=true setoptions kind=Optional symmetrical=true => Return=0 2020-01-30 07:38:39.205 [18804:sinlxsveshi01] sv_ha_ctl(6850).Main(6724).zClusterCreate(3190).zClusterCreateActiveActive(3256).zCreateResource(3376).CM_ResourceCreate(353).CM_ResourceCreate(1995).zCM_Exec(4739) <Rc=0 Errno=0 2020-01-30 07:38:39.205 [18804:sinlxsveshi01] sv_ha_ctl(6850).Main(6724).zClusterCreate(3190).zClusterCreateActiveActive(3256).zCreateResource(3376).CM_ResourceCreate(353).CM_ResourceCreate(1995).zCM_Exec(4685) > noexec=0 always=0 <pcs constraint colocation set sv.svtest.aa.partition.10 sv.svtest.aa.partition.11 sv.svtest.aa.partition.12 sv.svtest.aa.partition.13 sv.svtest.aa.partition.14 
sv.svtest.aa.partition.15 sv.svtest.aa.partition.16 sv.svtest.aa.partition.17 sv.svtest.aa.partition.18 sv.svtest.aa.partition.19 sv.svtest.aa.partition.20 sv.svtest.aa.partition.21 sv.svtest.aa.partition.22 sv.svtest.aa.partition.23 sv.svtest.aa.partition.24 sv.svtest.aa.partition.25 sv.svtest.aa.partition.26 sv.svtest.aa.partition.27 sv.svtest.aa.partition.28 sv.svtest.aa.partition.29 sv.svtest.aa.partition.30 sv.svtest.aa.partition.31 sv.svtest.aa.partition.32 sv.svtest.aa.partition.33 sv.svtest.aa.partition.34 sv.svtest.aa.partition.35 sv.svtest.aa.partition.36 sv.svtest.aa.partition.37 sv.svtest.aa.partition.38 sv.svtest.aa.partition.39 sv.svtest.aa.partition.40 sv.svtest.aa.partition.41 sv.svtest.aa.partition.42 sv.svtest.aa.partition.43 sv.svtest.aa.partition.44 sv.svtest.aa.partition.45 sv.svtest.aa.partition.46 sv.svtest.aa.partition.47 sv.svtest.aa.partition.48 sv.svtest.aa.partition.49 sv.svtest.aa.partition.50 sv.svtest.aa.partition.51 sv.svtest.aa.partition.52 sv.svtest.aa.partition.53 sv.svtest.aa.partition.54 sv.svtest.aa.partition.56 sv.svtest.aa.partition.backup.10 sv.svtest.aa.partition.backup.11 sv.svtest.aa.partition.backup.12 sv.svtest.aa.partition.backup.13 sv.svtest.aa.partition.backup.14 sv.svtest.aa.partition.backup.15 sv.svtest.aa.partition.backup.16 sv.svtest.aa.partition.backup.17 sv.svtest.aa.partition.backup.18 sv.svtest.aa.partition.backup.19 sv.svtest.aa.partition.backup.20 sv.svtest.aa.partition.backup.21 sv.svtest.aa.partition.backup.22 sv.svtest.aa.partition.backup.23 sv.svtest.aa.partition.backup.24 sv.svtest.aa.partition.backup.25 sv.svtest.aa.partition.backup.26 sv.svtest.aa.partition.backup.27 sv.svtest.aa.partition.backup.28 sv.svtest.aa.partition.backup.29 sv.svtest.aa.partition.backup.30 sv.svtest.aa.partition.backup.31 sv.svtest.aa.partition.backup.32 sv.svtest.aa.partition.backup.33 sv.svtest.aa.partition.backup.34 sv.svtest.aa.partition.backup.35 sv.svtest.aa.partition.backup.36 sv.svtest.aa.partition.backup.37 sv.svtest.aa.partition.backup.38 sv.svtest.aa.partition.backup.39 sv.svtest.aa.partition.backup.40 sv.svtest.aa.partition.backup.41 sv.svtest.aa.partition.backup.42 sv.svtest.aa.partition.backup.43 sv.svtest.aa.partition.backup.44 sv.svtest.aa.partition.backup.45 sv.svtest.aa.partition.backup.46 sv.svtest.aa.partition.backup.47 sv.svtest.aa.partition.backup.48 sv.svtest.aa.partition.backup.49 sv.svtest.aa.partition.backup.50 sv.svtest.aa.partition.backup.51 sv.svtest.aa.partition.backup.52 sv.svtest.aa.partition.backup.53 sv.svtest.aa.partition.backup.54 sv.svtest.aa.partition.backup.56 sv.svtest.aa.std sequential=false set sv.svtest.aa.sv.monitor-clone setoptions score=INFINITY> Errno=0 2020-01-30 07:38:39.205 [18804:sinlxsveshi01] CM_Pacemaker sv_ha_ctl(6850).Main(6724).zClusterCreate(3190).zClusterCreateActiveActive(3256).zCreateResource(3376).CM_ResourceCreate(353).CM_ResourceCreate(1995).zCM_Exec(4683)> 2020-01-30 07:38:39.764 [18804:sinlxsveshi01] CM_Pacemaker sv_ha_ctl(6850).Main(6724).zClusterCreate(3190).zClusterCreateActiveActive(3256).zCreateResource(3376).CM_ResourceCreate(353).CM_ResourceCreate(1995).zCM_Exec(4683)< pcs constraint colocation set sv.svtest.aa.partition.10 sv.svtest.aa.partition.11 sv.svtest.aa.partition.12 sv.svtest.aa.partition.13 sv.svtest.aa.partition.14 sv.svtest.aa.partition.15 sv.svtest.aa.partition.16 sv.svtest.aa.partition.17 sv.svtest.aa.partition.18 sv.svtest.aa.partition.19 sv.svtest.aa.partition.20 sv.svtest.aa.partition.21 sv.svtest.aa.partition.22 sv.svtest.aa.partition.23 
sv.svtest.aa.partition.24 sv.svtest.aa.partition.25 sv.svtest.aa.partition.26 sv.svtest.aa.partition.27 sv.svtest.aa.partition.28 sv.svtest.aa.partition.29 sv.svtest.aa.partition.30 sv.svtest.aa.partition.31 sv.svtest.aa.partition.32 sv.svtest.aa.partition.33 sv.svtest.aa.partition.34 sv.svtest.aa.partition.35 sv.svtest.aa.partition.36 sv.svtest.aa.partition.37 sv.svtest.aa.partition.38 sv.svtest.aa.partition.39 sv.svtest.aa.partition.40 sv.svtest.aa.partition.41 sv.svtest.aa.partition.42 sv.svtest.aa.partition.43 sv.svtest.aa.partition.44 sv.svtest.aa.partition.45 sv.svtest.aa.partition.46 sv.svtest.aa.partition.47 sv.svtest.aa.partition.48 sv.svtest.aa.partition.49 sv.svtest.aa.partition.50 sv.svtest.aa.partition.51 sv.svtest.aa.partition.52 sv.svtest.aa.partition.53 sv.svtest.aa.partition.54 sv.svtest.aa.partition.56 sv.svtest.aa.partition.backup.10 sv.svtest.aa.partition.backup.11 sv.svtest.aa.partition.backup.12 sv.svtest.aa.partition.backup.13 sv.svtest.aa.partition.backup.14 sv.svtest.aa.partition.backup.15 sv.svtest.aa.partition.backup.16 sv.svtest.aa.partition.backup.17 sv.svtest.aa.partition.backup.18 sv.svtest.aa.partition.backup.19 sv.svtest.aa.partition.backup.20 sv.svtest.aa.partition.backup.21 sv.svtest.aa.partition.backup.22 sv.svtest.aa.partition.backup.23 sv.svtest.aa.partition.backup.24 sv.svtest.aa.partition.backup.25 sv.svtest.aa.partition.backup.26 sv.svtest.aa.partition.backup.27 sv.svtest.aa.partition.backup.28 sv.svtest.aa.partition.backup.29 sv.svtest.aa.partition.backup.30 sv.svtest.aa.partition.backup.31 sv.svtest.aa.partition.backup.32 sv.svtest.aa.partition.backup.33 sv.svtest.aa.partition.backup.34 sv.svtest.aa.partition.backup.35 sv.svtest.aa.partition.backup.36 sv.svtest.aa.partition.backup.37 sv.svtest.aa.partition.backup.38 sv.svtest.aa.partition.backup.39 sv.svtest.aa.partition.backup.40 sv.svtest.aa.partition.backup.41 sv.svtest.aa.partition.backup.42 sv.svtest.aa.partition.backup.43 sv.svtest.aa.partition.backup.44 sv.svtest.aa.partition.backup.45 sv.svtest.aa.partition.backup.46 sv.svtest.aa.partition.backup.47 sv.svtest.aa.partition.backup.48 sv.svtest.aa.partition.backup.49 sv.svtest.aa.partition.backup.50 sv.svtest.aa.partition.backup.51 sv.svtest.aa.partition.backup.52 sv.svtest.aa.partition.backup.53 sv.svtest.aa.partition.backup.54 sv.svtest.aa.partition.backup.56 sv.svtest.aa.std sequential=false set sv.svtest.aa.sv.monitor-clone setoptions score=INFINITY => Return=0 2020-01-30 07:38:39.765 [18804:sinlxsveshi01] sv_ha_ctl(6850).Main(6724).zClusterCreate(3190).zClusterCreateActiveActive(3256).zCreateResource(3376).CM_ResourceCreate(353).CM_ResourceCreate(1995).zCM_Exec(4739) <Rc=0 Errno=0 2020-01-30 07:38:39.765 [18804:sinlxsveshi01] sv_ha_ctl(6850).Main(6724).zClusterCreate(3190).zClusterCreateActiveActive(3256).zCreateResource(3376).CM_ResourceCreate(353).CM_ResourceCreate(1995).zCM_Exec(4685) > noexec=0 always=0 <pcs resource update sv.svtest.aa.sv.monitor sv_ha_cluster_managed=0 --wait=30 --force> Errno=0 2020-01-30 07:38:39.765 [18804:sinlxsveshi01] CM_Pacemaker sv_ha_ctl(6850).Main(6724).zClusterCreate(3190).zClusterCreateActiveActive(3256).zCreateResource(3376).CM_ResourceCreate(353).CM_ResourceCreate(1995).zCM_Exec(4683)> 2020-01-30 07:38:45.667 [18804:sinlxsveshi01] CM_Pacemaker sv_ha_ctl(6850).Main(6724).zClusterCreate(3190).zClusterCreateActiveActive(3256).zCreateResource(3376).CM_ResourceCreate(353).CM_ResourceCreate(1995).zCM_Exec(4683)< pcs resource update sv.svtest.aa.sv.monitor sv_ha_cluster_managed=0 --wait=30 --force => 
Return=1 Warning: Agent 'ocf:csg:sv' is not installed or does not provide valid metadata: Metadata query for ocf:csg:sv failed: Input/output error -----@@@@@@@-------

@michael: Yup, corosync APIs are really not thread safe (at least they are not declared as being so), even though there actually is thread locking - the refcounting also acts as a lock. And messages sent/received by QB IPC should be atomic, so in theory it should work fine... but I wouldn't try it anyway.

Hi Michael,

I've cloned this BZ for the pcs component, which is what creates the long ID. The number of resources is fine -- the cluster should be able to handle 100s or even 1000s of resources. By 7.9, I mean the fix is targeted to be released in RHEL 7.9 (7.8 was just released).

As a workaround, you could use pcs to create the configuration in a separate file, then edit that file to change the IDs before uploading it to the cluster. With this method, you first save the current configuration to a file:

    pcs cluster cib <filename>

Then you can run all the configuration commands you normally would, adding "-f <filename>" at the start (like "pcs -f <filename> constraint order ..."). At this point, you can manually edit the file to use short IDs. Finally, upload the configuration to the cluster with:

    pcs cluster cib-push <filename> --config

Alternatively, you could create the constraint XML yourself instead of with pcs, and use "pcs cluster edit" or "cibadmin" to add it to the cluster. You can look at your existing CIB XML with "pcs cluster cib" to see what the XML syntax for constraints is (not very user-friendly). With "pcs cluster edit", you see the raw XML of the CIB and can edit it manually. If you want to automate it rather than edit it manually, you can create the constraint XML in a file (starting with the constraints element and putting your new constraint as its child), then run "cibadmin --create --scope constraints -x <filename>".

Due to a fix in pcs to not create long IDs, and a workaround of shortening the IDs being available, handling such IDs in pacemaker is now a lower priority and will be considered for RHEL 8 only.

Fixed upstream by commit a120f342.

Verified as SanityOnly in pacemaker-2.1.5-4.el8.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (pacemaker bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2818