| Summary: | kernel: Backwards-incompatible change in set_tid_address breaks nscd file mapping liveness check | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Deepu K S <dkochuka> | ||||
| Component: | kernel | Assignee: | Oleg Nesterov <onestero> | ||||
| kernel sub component: | Process management | QA Contact: | Chunyu Hu <chuhu> | ||||
| Status: | CLOSED WORKSFORME | Docs Contact: | |||||
| Severity: | high | ||||||
| Priority: | high | CC: | ashankar, codonell, cww, dj, dkochuka, fweimer, glibc-bugzilla, mnewsome, pfrankli, sjohnsto, skozina | ||||
| Version: | 7.2 | Keywords: | Patch | ||||
| Target Milestone: | rc | Flags: | onestero: needinfo?(dkochuka) | ||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | | Story Points: | --- | ||||
| Clone Of: | | Environment: | | ||||
| Last Closed: | 2020-05-15 13:54:41 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | | Category: | --- | ||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1298243, 1420851, 1469551, 1549423 | ||||||
| Attachments: | |||||||
Description
Deepu K S
2016-09-01 17:04:55 UTC
Created attachment 1196899
reproducer program
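The liveness check named in the summary depends on set_tid_address(2). A task registers an address, and when that task exits while sharing memory with other tasks, the kernel writes 0 to the address and performs a FUTEX_WAKE on it. The C sketch below is a standalone illustration of that kernel behaviour. It is not nscd's code and not the attached reproducer. It assumes Linux and glibc's clone() wrapper, and the function names are hypothetical. A CLONE_VM child registers a word with set_tid_address and exits normally, and the parent then checks whether the kernel zeroed the word.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Word registered as the child's clear_child_tid address (hypothetical name).
   The kernel writes it outside normal program flow, hence volatile. */
static volatile pid_t tid_word;

/* Child: register tid_word with set_tid_address(2), then exit normally.
   Returning from fn makes glibc's clone() wrapper invoke the exit syscall,
   so the child is not killed by a signal. */
static int child_fn(void *arg)
{
    (void)arg;
    syscall(SYS_set_tid_address, (pid_t *)&tid_word);
    return 0;
}

/* Hypothetical helper. Returns 1 if the kernel zeroed the registered word
   after the child's normal exit, 0 if it did not, -1 on setup failure. */
int tid_word_cleared_after_normal_exit(void)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    if (stack == NULL)
        return -1;

    tid_word = -1; /* sentinel: only the kernel's clear-on-exit writes 0 */

    /* CLONE_VM: the child shares the parent's memory, so it is "sharing
       memory with other threads" when it exits and the clear applies.
       SIGCHLD: exit signal, so the parent can reap it with waitpid(). */
    pid_t pid = clone(child_fn, stack + stack_size, CLONE_VM | SIGCHLD, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status;
    pid_t r = waitpid(pid, &status, 0);
    free(stack);
    if (r != pid || !WIFEXITED(status))
        return -1;

    return tid_word == 0 ? 1 : 0;
}
```

A driver can be as small as `int main(void) { return tid_word_cleared_after_normal_exit() == 1 ? 0 : 1; }`. The sketch deliberately covers only the normal-exit path. Whether the word is also cleared for a task killed by a signal is the kernel-dependent part (the PF_SIGNALED check discussed in the thread), so it is not asserted here.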
This issue is not being considered for release with RHEL 7.4, but we will continue to look at this problem in more detail and keep you updated as we make progress. This can include a rhel-7.4.z release with this fix, or an immediate hotfix, depending on the user requirements, timeline, and impact.

RCA done, looks like we upstream kernel patch 735f2770a770156100f534646158cb58cb8b2939 (see https://patchwork.kernel.org/patch/9254247/)

Reassigning to kernel.

(In reply to DJ Delorie from comment #7)
> RCA done, looks like we upstream kernel patch
> 735f2770a770156100f534646158cb58cb8b2939 (see
> https://patchwork.kernel.org/patch/9254247/)

Yes, but why doesn't nscd hit this problem on rhel-6, which has the same PF_SIGNALED check?

OK, probably rhel6's version doesn't rely on CLONE_CHILD_CLEARTID, but it would be nice to verify this to ensure we actually understand what's going on.

(In reply to Deepu K S from comment #0)
>
> In RHEL5/6 after restarting the nscd with volatile cache the running test
> binaries next gethostbyname() causes a NIS query to be started right away.

And afaik there are no incompatible cleartid changes between rhel6 and rhel7. Do you know how they came to the conclusion that something is wrong with set_tid_address() (compared to rhel6)? There is nothing in the description about that...

I agree that the simple patch might help, but only because the changelog mentions the (hopefully) same nscd/CLONE_CHILD_CLEARTID problem.

I can build the rhel7 kernel with the trivial backport, can you test it?

(In reply to Oleg Nesterov from comment #9)
> (In reply to Deepu K S from comment #0)
> >
> > In RHEL5/6 after restarting the nscd with volatile cache the running test
> > binaries next gethostbyname() causes a NIS query to be started right away.
>
> And afaik there are no incompatible cleartid changes between rhel6 and rhel7.
> Do you know how they came to the conclusion that something is wrong with
> set_tid_address() (compared to rhel6)? There is nothing in the description
> about that...

The customer didn't point out an issue with set_tid_address(). They had found the difference in behaviour between RHEL 6 and 7.
Under RHEL5/6, after restarting nscd with a volatile cache, the running test binaries' next gethostbyname() causes a NIS query to be started right away.
Under RHEL7.1, restarting nscd finally decouples the running binary from the NIS nameservice, and no subsequent gethostbyname() makes an NIS query leave the machine. Thus no chance of getting any update exists.
I think it was found from our RCA (from comment #7).

>
> I agree that the simple patch might help, but only because the changelog
> mentions
> the (hopefully) same nscd/CLONE_CHILD_CLEARTID problem.
>
> I can build the rhel7 kernel with the trivial backport, can you test it?

Yes. I can test that.
I shall also share it with our customer, who would be able to test it in actual environments.

Thanks.

(In reply to Deepu K S from comment #10)
> (In reply to Oleg Nesterov from comment #9)
> > (In reply to Deepu K S from comment #0)
> > >
> > > In RHEL5/6 after restarting the nscd with volatile cache the running test
> > > binaries next gethostbyname() causes a NIS query to be started right away.
> >
> > And afaik there are no incompatible cleartid changes between rhel6 and rhel7.
> > Do you know how they came to the conclusion that something is wrong with
> > set_tid_address() (compared to rhel6)? There is nothing in the description
> > about that...
>
> The customer didn't point out an issue with set_tid_address(). They had found the
> difference in behaviour between RHEL 6 and 7.

But the subject clearly blames set_tid_address?

> Under RHEL5/6, after restarting nscd with a volatile cache, the running test
> binaries' next gethostbyname() causes a NIS query to be started right away.
> Under RHEL7.1, restarting nscd finally decouples the running binary from the
> NIS nameservice, and no subsequent gethostbyname() makes an NIS query leave the
> machine. Thus no chance of getting any update exists.

Sorry, I can't understand this ;) OK, never mind. If you can test the patch and the problem goes away, this all doesn't really matter.

> > I can build the rhel7 kernel with the trivial backport, can you test it?
>
> Yes. I can test that.
> I shall also share it with our customer, who would be able to test it in actual
> environments.

Great, thanks,

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=13812322

(In reply to Oleg Nesterov from comment #11)
> > Yes. I can test that.
> > I shall also share it with our customer, who would be able to test it in actual
> > environments.
>
> Great, thanks,
>
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=13812322

Hi Oleg,

Sorry, I missed downloading the packages from the above build. It seems the download link is no longer valid. Could you please initiate a new brew build?

Thanks.

We checked with the latest 7.4 kernel and nscd packages, and the nscd cache works correctly now. The customer has also acknowledged that the issue is resolved for them.

Thanks!

(In reply to Deepu K S from comment #14)
>
> We checked with the latest 7.4 kernel

hmm. afaics the 7.4 kernel doesn't have that patch, so it was something else. Never mind...

> and nscd packages, and the nscd cache works
> correctly now.
> The customer has also acknowledged that the issue is resolved for them.

OK, can we close this bug? Or did you mean they checked the test kernel with the patch I built for you?