Bug 883429
Summary: | Incorrect synchronization in mmap cache | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Florian Weimer <fweimer> | ||||||||
Component: | sssd | Assignee: | Jakub Hrozek <jhrozek> | ||||||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||||
Severity: | unspecified | Docs Contact: | |||||||||
Priority: | unspecified | ||||||||||
Version: | 17 | CC: | jhrozek, sbose, sgallagh, ssorce | ||||||||
Target Milestone: | --- | ||||||||||
Target Release: | --- | ||||||||||
Hardware: | Unspecified | ||||||||||
OS: | Unspecified | ||||||||||
Whiteboard: | |||||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||||
Doc Text: | Story Points: | --- | |||||||||
Clone Of: | Environment: | ||||||||||
Last Closed: | 2013-01-12 00:10:15 UTC | Type: | Bug | ||||||||
Regression: | --- | Mount Type: | --- | ||||||||
Documentation: | --- | CRM: | |||||||||
Verified Versions: | Category: | --- | |||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Embargoed: | |||||||||||
Bug Depends On: | |||||||||||
Bug Blocks: | 882335 | ||||||||||
Attachments: |
|
Description
Florian Weimer
2012-12-04 15:18:52 UTC
(In reply to comment #0) > sss_nss_check_header() in src/sss_client/nss_mc_common.c lacks a memory > barrier. I've thought about this some more, and I think you need at least acquire barriers after loading h.b1 and before loading h.b2. Similar for sss_nss_mc_get_record(). (Plus the change to deal with counter overflow.) I don't feel confident to confirm the absence of any remaining concurrency bugs, though. Upstream ticket: https://fedorahosted.org/sssd/ticket/1694 sssd-1.9.3-1.fc18 has been submitted as an update for Fedora 18. https://admin.fedoraproject.org/updates/sssd-1.9.3-1.fc18 Package sssd-1.9.3-1.fc18: * should fix your issue, * was pushed to the Fedora 18 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing sssd-1.9.3-1.fc18' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2012-19960/sssd-1.9.3-1.fc18 then log in and leave karma (feedback). Created attachment 660930 [details]
mmap-cache.patch
The upstream patch fb0de650e7454e1dfa76136e325e62a00748238b is still incorrect, I believe. Barriers are needed after loading b1 and before loading b2, so that the internal ordering of the memcpy does not matter. I'm attaching a patch on top of current master which hopefully clarifies my concerns.
I do not see why the new patch would be necessary. The way b1 and b2 are *written* guarantees that if you read the memory and you get both with the same value then the memory chunk is consistent. Can you describe a case where the current code would fail ? Are you worring that the memcpy may end up copying the stuff in the middle before either b1 or b2 are loaded ? Is that possible ? (In reply to comment #6) > I do not see why the new patch would be necessary. > The way b1 and b2 are *written* guarantees that if you read the memory and > you get both with the same value then the memory chunk is consistent. Reads can be reordered as well, not just writes. And in the sss_nss_mc_get_record, you really have to check against the old b1 value, otherwise you don't detect the case where the slot has been altered during the memcpy. > Can you describe a case where the current code would fail ? > Are you worring that the memcpy may end up copying the stuff in the middle > before either b1 or b2 are loaded ? Is that possible ? It's definitely possible. Even if the machine code contains the read instructions in the expected order, the processor may execute them in a different order or issue speculative reads which are used later. Perhaps you could put the reader and writer code into a library, then we could write a test harness and run it on several architectures, to check that the results are as expected. But I'm not sure how effective such testing is in uncovering concurrency bugs. (In reply to comment #7) > (In reply to comment #6) > > I do not see why the new patch would be necessary. > > The way b1 and b2 are *written* guarantees that if you read the memory and > > you get both with the same value then the memory chunk is consistent. > > Reads can be reordered as well, not just writes. > > And in the sss_nss_mc_get_record, you really have to check against the old > b1 value, otherwise you don't detect the case where the slot has been > altered during the memcpy. > > > Can you describe a case where the current code would fail ? > > Are you worring that the memcpy may end up copying the stuff in the middle > > before either b1 or b2 are loaded ? Is that possible ? > > It's definitely possible. Even if the machine code contains the read > instructions in the expected order, the processor may execute them in a > different order or issue speculative reads which are used later. You are right, I didn't properly consider the effect on CPU architectures with SMP and multiple layers of caching. This may hit particularly hard on NUMA architectures where CPUs try to optmize as much as possible cross-node transfers. I'll reopen the upstream ticket. > Perhaps you could put the reader and writer code into a library, then we > could write a test harness and run it on several architectures, to check > that the results are as expected. But I'm not sure how effective such > testing is in uncovering concurrency bugs. I used to have test code to verify barriers worked properly. and you can catch concurrency issues if you have multiple threads running in tight loops over and over. I will try to find out if I still have them around. Created attachment 661730 [details]
USe a macro to do memcpy with barriers
I came up with a slightly different patch that uses a macro to memcpy with barriers.
Can you check this is ok, it is not 100% identical to your code.
Also I found a glaring error around line 200, b2 wsa ssigned the value of rec-b1 instead of rec-b2 ...
(In reply to comment #10) > Created attachment 661730 [details] > USe a macro to do memcpy with barriers > > I came up with a slightly different patch that uses a macro to memcpy with > barriers. Can you check this is ok, it is not 100% identical to your code. + if (copy_ok && b1 == copy_rec->b2) { The read of b2 does not necessarily come after all the other reads for other members of the copy_rec structure, so an unchanged value does not necessarily mean that copy is consistent (I think). Using rec->b2 should be okay, but the macro really obscures what's going on here. Created attachment 662504 [details]
Macro to deal with memcpy barriers
I think the patch is correct wrt using copy_rec->b2 however Florian pointed out on IRC that I mistakenly swapped src/dest in the macro when testing b2.
With that corrected I am confident this code properly handles the necessary synchronizations now.
sssd-1.9.3-1.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report. |