Bug 1400957
Summary: | [SAMBA-CTDB]CTDB ip failover test leads to creation of huge number of core files that leads to memory & disc full status | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Vivek Das <vdas> |
Component: | samba | Assignee: | Anoop C S <anoopcs> |
Status: | CLOSED NOTABUG | QA Contact: | surabhi <sbhaloth> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | rhgs-3.2 | CC: | amukherj, anoopcs, gdeschner, madam, rcyriac, rhinduja, rhs-smb, vdas |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Last Closed: | 2016-12-09 15:12:44 UTC | Type: | Bug |
Description
Vivek Das
2016-12-02 12:13:26 UTC
Some findings:

/var/log/messages repeatedly showed the following log entries and backtrace:

Dec 2 10:02:18 dhcp47-12 smbd[31742]: [2016/12/02 10:02:18.084385, 0] ../source3/lib/util.c:478(reinit_after_fork)
Dec 2 10:02:18 dhcp47-12 smbd[31742]: messaging_reinit() failed: NT_STATUS_IO_DEVICE_ERROR
Dec 2 10:02:18 dhcp47-12 smbd[31742]: [2016/12/02 10:02:18.084532, 0] ../source3/smbd/server.c:758(smbd_accept_connection)
Dec 2 10:02:18 dhcp47-12 smbd[31742]: reinit_after_fork() failed
Dec 2 10:02:18 dhcp47-12 smbd[31742]: [2016/12/02 10:02:18.084680, 0] ../source3/lib/util.c:791(smb_panic_s3)
Dec 2 10:02:18 dhcp47-12 smbd[31742]: PANIC (pid 31742): reinit_after_fork() failed
Dec 2 10:02:18 dhcp47-12 smbd[31742]: [2016/12/02 10:02:18.085999, 0] ../source3/lib/util.c:902(log_stack_trace)
Dec 2 10:02:18 dhcp47-12 smbd[31742]: BACKTRACE: 11 stack frames:
Dec 2 10:02:18 dhcp47-12 smbd[31742]: #0 /lib64/libsmbconf.so.0(log_stack_trace+0x1a) [0x7fd128de8e5a]
Dec 2 10:02:18 dhcp47-12 smbd[31742]: #1 /lib64/libsmbconf.so.0(smb_panic_s3+0x20) [0x7fd128de8f30]
Dec 2 10:02:18 dhcp47-12 smbd[31742]: #2 /lib64/libsamba-util.so.0(smb_panic+0x2f) [0x7fd12b2db57f]
Dec 2 10:02:18 dhcp47-12 smbd[31742]: #3 /usr/sbin/smbd(+0xc37c) [0x7fd12b97737c]
Dec 2 10:02:18 dhcp47-12 smbd[31742]: #4 /lib64/libsmbconf.so.0(run_events_poll+0x16c) [0x7fd128dfe34c]
Dec 2 10:02:18 dhcp47-12 smbd[31742]: #5 /lib64/libsmbconf.so.0(+0x355a0) [0x7fd128dfe5a0]
Dec 2 10:02:18 dhcp47-12 smbd[31742]: #6 /lib64/libtevent.so.0(_tevent_loop_once+0x8d) [0x7fd12782540d]
Dec 2 10:02:18 dhcp47-12 smbd[31742]: #7 /lib64/libtevent.so.0(tevent_common_loop_wait+0x1b) [0x7fd1278255ab]
Dec 2 10:02:18 dhcp47-12 smbd[31742]: #8 /usr/sbin/smbd(main+0x15d4) [0x7fd12b972ad4]
Dec 2 10:02:18 dhcp47-12 smbd[31742]: #9 /lib64/libc.so.6(__libc_start_main+0xf5) [0x7fd127481b35]
Dec 2 10:02:18 dhcp47-12 smbd[31742]: #10 /usr/sbin/smbd(+0x7ea9) [0x7fd12b972ea9]
Dec 2 10:02:18 dhcp47-12 smbd[31742]: [2016/12/02 10:02:18.088052, 0] ../source3/lib/dumpcore.c:303(dump_core)
Dec 2 10:02:18 dhcp47-12 smbd[31742]: dumping core in /var/log/samba/cores/smbd
Dec 2 10:02:18 dhcp47-12 smbd[31742]:
Dec 2 10:02:18 dhcp47-12 abrt-hook-ccpp: Process 31742 (smbd) of user 0 killed by SIGABRT - dumping core

Even so, the Samba logs (at least those provided at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1400957) do not contain any information about these crashes. NT_STATUS_IO_DEVICE_ERROR is being mapped from the Unix errno EIO by map_nt_error_from_unix() within messaging_reinit() [1]. If so, with log level set to 5 in smb.conf we should have seen one of the following log entries:

messaging_dgm_ref failed: Input/output error <-- debug level 2
OR
messaging_ctdbd_init failed: Input/output error <-- debug level 1

But these messages are also missing from the Samba logs. Needs more investigation...

[1] https://github.com/samba-team/samba/blob/samba-4.4.6/source3/lib/messages.c#L393

Hi Vivek,

Gunther and I looked into the issue today. As mentioned in my previous comment, we couldn't find enough log entries in what you have provided so far to pinpoint the exact reason for the crash. This is because max log size is set to 50, which renames the log file to log.smbd.old once it exceeds 50 KB, and this repeats. Although we suspect that some fixes already present in Samba upstream master resolve this issue (found by going through the code path), we can only confirm that with better logs, which would let us provide an RCA for the crash. So, can you please try reproducing the crash after making the following changes to smb.conf?

log level = 10
max log size = 0

The misbehavior was triggered by a connection to an internal (non-public) interface. This was in fact a crashed/half-unmounted cifs mount from node #1 to node #0. The mount was no longer visible, but cifs.ko still periodically tried to connect to the node-internal address of node #1.
When ctdb is stopped (ctdb stop), each such SMB connection to an internal address triggers a fork, and reinit_after_fork() panics when it tries to connect to ctdb (because ctdb rejects the connection). This is by current design. Getting rid of the cifs mount (by rebooting) solved the problem for us.
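
For anyone triaging a similar flood of core files, a rough sketch for summarizing smbd panics from syslog follows. It is not part of Samba or CTDB; the function name summarize_smbd_crashes is hypothetical, and the patterns are derived only from the /var/log/messages lines quoted in this bug, so they may need adjusting for other Samba versions or syslog formats.

```python
import re
from collections import Counter

# Assumption: line shapes match the syslog excerpts quoted in this bug.
PANIC_RE = re.compile(r"smbd\[(\d+)\]: PANIC \(pid \d+\): (.+)$")
CORE_RE = re.compile(r"abrt-hook-ccpp: Process (\d+) \(smbd\) .*dumping core")


def summarize_smbd_crashes(lines):
    """Summarize smbd panics found in an iterable of syslog lines.

    Returns a tuple of:
      - Counter mapping panic reason text to number of occurrences
      - set of smbd PIDs that logged a PANIC
      - set of smbd PIDs for which abrt reported a core dump
    """
    reasons = Counter()
    panic_pids = set()
    core_pids = set()
    for line in lines:
        line = line.rstrip("\n")
        match = PANIC_RE.search(line)
        if match:
            panic_pids.add(int(match.group(1)))
            reasons[match.group(2).strip()] += 1
            continue
        match = CORE_RE.search(line)
        if match:
            core_pids.add(int(match.group(1)))
    return reasons, panic_pids, core_pids
```

Passing an open file handle for /var/log/messages as lines should work, since the function only iterates over it. If nearly every panic reason is "reinit_after_fork() failed" and the PIDs are all different, that would be consistent with the pattern described above, where each incoming connection forks a child that panics while ctdb is stopped.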