Bug 829694
Summary: | F17pre-ga3 PPC64: running fsstress test on samba4 mount points triggered call trace on client system | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | IBM Bug Proxy <bugproxy>
Component: | samba4 | Assignee: | Andreas Schneider <asn>
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | urgent | Docs Contact: |
Priority: | unspecified | |
Version: | 17 | CC: | abokovoy, asn, bbaude, gansalmon, gdeschner, itamar, jkachuck, jonathan, kernel-maint, madhu.chinakonda, ovasik, sbose, ssorce, wgomerin
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | ppc64 | |
OS: | All | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2012-07-23 20:21:33 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Description
IBM Bug Proxy
2012-06-07 11:00:58 UTC
Created attachment 590161 [details]
dmesg.txt
Created attachment 590162 [details]
var-log-messages.txt
Created attachment 590163 [details]
dmesg-server.txt
Created attachment 590164 [details]
var-log-messages-server.txt.tgz
This has nothing to do with the filesystem package. What we see here is that the kernel has exhausted available RAM, and smbd was the process executing at the time the kernel was asked to allocate one page marked with GFP_ATOMIC | __GFP_COMP. In the attached logs I do not see any issue with smbd on the server side. If we look closer, what happens is that smbd received data over the network and tried to write it to a file on a reiserfs file system. At that point reiserfs tried to allocate a page, which caused the kernel to reclaim the free list of the cgroup in use. That caused the XFS driver to shrink its buffers, and that shrink failed. So something leaked enough memory over a long-running test.

Could you please repeat the test with smbd from the samba package (not samba4)? This way we will know whether the problem boils down to the source3/ code changes added in Samba master.

------- Comment From maknayak.com 2012-06-08 07:40 EDT-------
Hello Red Hat,
I removed the samba4 packages and installed the samba packages. I tried to start the services, but starting the smb service failed.

[root@miz11 ~]# systemctl start smb.service nmb.service
Active: failed (Result: exit-code) since Fri, 08 Jun 2012 07:23:56 -0400; 2s ago
Process: 822 ExecStart=/usr/sbin/smbd $SMBDOPTIONS (code=exited, status=0/SUCCESS)
Main PID: 824 (code=exited, status=1/FAILURE)
Active: active (running) since Fri, 08 Jun 2012 07:23:56 -0400; 3s ago
Process: 823 ExecStart=/usr/sbin/nmbd $NMBDOPTIONS (code=exited, status=0/SUCCESS)
Main PID: 826 (nmbd)
? 826 /usr/sbin/nmbd

--- output from /var/log/messages ---
Jun 8 07:37:56 miz11 smbd[692]: [2012/06/08 07:37:56.800347, 0] smbd/server.c:1107(main)
Jun 8 07:37:56 miz11 smbd[692]: standard input is not a socket, assuming -D option
Jun 8 07:37:56 miz11 systemd[1]: PID 519 read from file /run/smbd.pid does not exist.
Jun 8 07:37:56 miz11 smbd[693]: [2012/06/08 07:37:56.811814, 0] registry/reg_init_basic.c:36(registry_init_common)
Jun 8 07:37:56 miz11 smbd[693]: Failed to initialize the registry: WERR_CAN_NOT_COMPLETE
Jun 8 07:37:57 miz11 systemd[1]: smb.service: main process exited, code=exited, status=1
Jun 8 07:37:57 miz11 systemd[1]: Unit smb.service entered failed state.

--- Packages installed ---
samba-winbind-clients-3.6.5-86.fc17.1.ppc64
samba-client-3.6.5-86.fc17.1.ppc64
samba-common-3.6.5-86.fc17.1.ppc64
samba-domainjoin-gui-3.6.5-86.fc17.1.ppc64
samba-doc-3.6.5-86.fc17.1.ppc64
samba-3.6.5-86.fc17.1.ppc64
samba-winbind-3.6.5-86.fc17.1.ppc64
samba-winbind-krb5-locator-3.6.5-86.fc17.1.ppc64
samba-swat-3.6.5-86.fc17.1.ppc64

You need to clean up the samba4 install correctly before starting the samba packages. /var/lib/samba contains databases which may or may not be portable across different versions. In particular, registry.tdb has a newer version number in samba4 than in samba3, and this causes WERR_CAN_NOT_COMPLETE.

Please back up /var/lib/samba and remove all databases from there between tests of different versions of Samba. You will need to set things up from scratch (add accounts to samba, etc.) when downgrading Samba versions.

------- Comment From maknayak.com 2012-06-08 09:03 EDT-------
(In reply to comment #15)
> you needed to correctly clean up samba4 install before starting samba
> packages. In particular, /var/lib/samba contains databases which may or may
> not be portable across different versions.
> In particular, registry.tdb has newer version number in samba4 than in
> samba3, this causes WERR_CAN_NOT_COMPLETE.
>
> Please back up /var/lib/samba and remove all databases from there between
> tests of different versions of Samba. You would need to set up things from
> scratch (add accounts to samba, etc) when downgrading Samba versions.

Hello Alexander,
Thanks a lot for the trick. It worked.
I have restarted the test on samba (smbd version 3.6.5-86.fc17). I will leave the test running for at least 24 hours and will update you with the results.

------- Comment From maknayak.com 2012-06-08 09:06 EDT-------
One more thing I would like to report for samba4: while cleaning up the samba4 tests, I unmounted all cifs mounts on the samba4 client, which triggered the following call trace in the dmesg output.

[332333.794651] ------------[ cut here ]------------
[332333.794658] WARNING: at fs/namespace.c:795
[332333.794660] Modules linked in: des_generic md4 nls_utf8 cifs fscache lockd sunrpc bnep bluetooth rfkill ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter nf_conntrack_ipv4 ip6_tables nf_defrag_ipv4 xt_state nf_conntrack uinput windfarm_smu_sat i2c_core ibmveth windfarm_pid ibmvscsic scsi_transport_srp scsi_tgt [last unloaded: scsi_wait_scan]
[332333.794699] NIP: c000000000263fd0 LR: c000000000263fc4 CTR: 0000000001679580
[332333.794703] REGS: c000000041583a00 TRAP: 0700 Not tainted (3.3.4-5.fc17.ppc64)
[332333.794707] MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI> CR: 22000428 XER: 20000000
[332333.794717] CFAR: c000000000262f90
[332333.794720] TASK = c000000009980000[3311] 'umount' THREAD: c000000041580000 CPU: 2
[332333.794724] GPR00: c000000000263fc4 c000000041583c80 c0000000012d0a28 0000000000000000
[332333.794730] GPR04: 0000000000000400 0000000000000400 ffff000000000000 ffffffffffffffff
[332333.794735] GPR08: 0000000000000000 0000000000000001 0000000000002000 0000000000000000
[332333.794741] GPR12: 0000000022000422 c00000000ec81000 0000000000000000 0000000000000000
[332333.794746] GPR16: 0000000000000000 0000000000000000 0000000000000000 c000000041583d90
[332333.794752] GPR20: c000000041583da0 0000000000000000 c0000000015d9e00 0000000000000000
[332333.794757] GPR24: c000000053c73800 0000000040343fb8 c000000053c73800 0000000000000000
[332333.795272] GPR28: c0000000049afc00 c000000053c73800 c0000000012503e8 c000000041583c80
[332333.795289] NIP [c000000000263fd0] .mntput_no_expire+0x100/0x180
[332333.795294] LR [c000000000263fc4] .mntput_no_expire+0xf4/0x180
[332333.795297] Call Trace:
[332333.795302] [c000000041583c80] [c000000000263fc4] .mntput_no_expire+0xf4/0x180 (unreliable)
[332333.795308] [c000000041583d20] [c000000000265424] .SyS_umount+0xb4/0x450
[332333.795314] [c000000041583e30] [c0000000000098e4] syscall_exit+0x0/0x40
[332333.795318] Instruction dump:
[332333.795321] 2f880000 409e0068 387d0068 481b8ce5 60000000 4bffdffd 387d0038 eb9d0028
[332333.795330] 4bffef11 7c630034 5463d97e 68690001 <0b090000> 387d0020 4802e0f9 60000000
[332333.795341] ---[ end trace ce6c48dbf9d981c0 ]---

Thanks...
Manas

------- Comment From maknayak.com 2012-06-11 10:46 EDT-------
(In reply to comment #16)
> Hello Alexander,
> Thanks a lot for the trick.It worked.
> I have restarted the test on samba (smbd version Version 3.6.5-86.fc17) , I
> will leave the test run for at-least 24 hours. I will update you with
> results.
>
> Thanks...
> Manas

Hello Alexander,
I could not reproduce this issue with samba version 3.6.5. The test ran for more than 48 hours with no sign of a call trace or any other error so far.

Thanks...
Manas

Thank you, Manas. I think this bug has to be moved to samba4. We'll keep looking into the memory leak but will probably fix it around or after the samba4 4.0 release, as the code changes have to slow down first.

Moved to samba4 for further research.

Could you run the test with samba4 again and, while the test is running, check with smbcontrol <smbd-pid> pool-usage whether you can find something suspicious?

Can you explain what the test is doing so that we can create a simpler version of it and maybe reproduce it here?

------- Comment From maknayak.com 2012-06-12 15:23 EDT-------
(In reply to comment #21)
> Can you explain what the test is doing so that we can create a simpler
> version of it and maybe reproduce it here.

Hello Andreas,
fsstress is an I/O load generator that creates several directories and files with different modes. The test is from the LTP test suite, and I don't think you need to recreate it; using it from LTP should be sufficient. The test case can be found in ltp-full-xxx/testcases/kernel/fs/fsstress/

Thanks...
Manas

samba4-4.0.0-53alpha18.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/samba4-4.0.0-53alpha18.fc17

Could you run the test again with the version from comment #17? Could you save 'smbcontrol <smbd-pid> pool-usage' every hour? Something like this:

for p in $(pidof smbd); do smbcontrol $p pool-usage >> /tmp/samba-fsstress/smbd-pool-usage.$(date -u +%Y%m%d).$p; done

Thanks.

------- Comment From maknayak.com 2012-06-13 19:48 EDT-------
(In reply to comment #24)
> Could you run the test again with the version from comment #17? Could you
> save 'smbcontrol <smbd-pid> pool-usage' every hour? Something like that:
>
> for p in $(pidof smbd); do smbcontrol $p pool-usage >>
> /tmp/samba-fsstress/smbd-pool-usage.$(date -u +%Y%m%d).$p; done
>
> Thanks.

Running the test on samba4 with the above script on the server. I will update with results soon.

Thanks...
Manas

------- Comment From maknayak.com 2012-06-14 07:04 EDT-------
(In reply to comment #25)
> (In reply to comment #24)
> > Could you run the test again with the version from comment #17? Could you
> > save 'smbcontrol <smbd-pid> pool-usage' every hour? Something like that:
> >
> > for p in $(pidof smbd); do smbcontrol $p pool-usage >>
> > /tmp/samba-fsstress/smbd-pool-usage.$(date -u +%Y%m%d).$p; done
> >
> > Thanks.
>
> Running the test on samba4 with above script on server ... will update
> result soon.
>
> Thanks...
> Manas

Here are the files attached to the bugzilla, which contain pool-usage data for the smbd PIDs during the samba4 fsstress test:
smbd-pool-usage_PID.970
smbd-pool-usage_PID.971
smbd-pool-usage_PID.993.tgz (this was 7 MB of data, so I made a tar file)

Thanks...
Manas

Created attachment 591751 [details]
smbd-pool-usage_PID.970
------- Comment (attachment only) From maknayak.com 2012-06-14 07:05 EDT-------
Created attachment 591752 [details]
smbd-pool-usage_PID.971
------- Comment (attachment only) From maknayak.com 2012-06-14 07:05 EDT-------
Created attachment 591753 [details]
smbd-pool-usage_PID.993.tgz
------- Comment (attachment only) From maknayak.com 2012-06-14 07:06 EDT-------
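For anyone repeating the hourly collection requested above, the one-line loop can be wrapped in a small function. This is only a sketch: the output directory and the `smbcontrol <pid> pool-usage` call come from the comment above, while the function name `snapshot_pool_usage` and the `OUTDIR`, `SMBCONTROL`, and `PIDOF` variables are hypothetical hooks added here so the loop can be dry-run with stand-in commands.

```shell
#!/bin/sh
# Sketch: append one pool-usage snapshot per smbd PID to a per-day, per-PID file.
# OUTDIR matches the path used in the thread; SMBCONTROL and PIDOF are
# hypothetical override hooks (not part of Samba) for dry runs with stubs.
OUTDIR="${OUTDIR:-/tmp/samba-fsstress}"
SMBCONTROL="${SMBCONTROL:-smbcontrol}"
PIDOF="${PIDOF:-pidof}"

snapshot_pool_usage() {
    # The original one-liner assumed the directory already existed.
    mkdir -p "$OUTDIR" || return 1
    stamp=$(date -u +%Y%m%d)
    for p in $("$PIDOF" smbd); do
        {
            # A timestamp header makes successive hourly snapshots separable.
            printf '=== %s pid %s ===\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$p"
            "$SMBCONTROL" "$p" pool-usage
        } >> "$OUTDIR/smbd-pool-usage.$stamp.$p"
    done
}
```

One way to get the hourly cadence is `while snapshot_pool_usage; do sleep 3600; done` in a root shell on the server; a cron entry would work equally well.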
Package samba4-4.0.0-53alpha18.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing samba4-4.0.0-53alpha18.fc17'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-9393/samba4-4.0.0-53alpha18.fc17
then log in and leave karma (feedback).

samba4-4.0.0-54alpha18.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/samba4-4.0.0-54alpha18.fc17

The memory usage looks fine. It is always between 500 KB and 2 MB. So you did run into the problem again with these tests? The memory usage is higher with a lot of locks else it looks fine.

Did you test the latest packages from comment #24 or comment #25?

------- Comment From maknayak.com 2012-06-20 06:38 EDT-------
(In reply to comment #32)
> The memory usage looks fine. It is always between 500 KB and 2 MB. So you
> did run into the problem again with these tests? The memory usage is higher
> with a lot of locks else it looks fine.
>
> Did you test the latest packages from comment #24 or comment #25?

I could not find ppc64 packages for latest samba4 from the link
http://koji.fedoraproject.org/koji/buildinfo?buildID=326027
&
https://admin.fedoraproject.org/updates/samba4-4.0.0-54alpha18.fc17
It contains packages for ic86 & x86_64 only.
Please share the exact link to download samba4 packages with latest patch for F17 PPC64.

Thanks...
Manas

samba4-4.0.0-55alpha18.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/samba4-4.0.0-55alpha18.fc17

ppc64 builds should eventually land here -> http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=593100

samba4-4.0.0-56alpha18.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/samba4-4.0.0-56alpha18.fc17

------- Comment From maknayak.com 2012-06-28 03:10 EDT-------
Verified on F17 GA build with the samba packages mentioned in Comment # 35 and this issue did not reproduce again. This issue is fixed with updated packages.

--- upgraded samba packages ---
[root@miz12 ~]# rpm -qa | grep -i samba
samba4-test-4.0.0-55alpha18.fc17.ppc64
samba4-debuginfo-4.0.0-55alpha18.fc17.ppc64
samba4-winbind-clients-4.0.0-55alpha18.fc17.ppc64
samba4-libs-4.0.0-55alpha18.fc17.ppc64
samba4-client-4.0.0-55alpha18.fc17.ppc64
samba4-winbind-krb5-locator-4.0.0-55alpha18.fc17.ppc64
samba4-python-4.0.0-55alpha18.fc17.ppc64
samba4-dc-4.0.0-55alpha18.fc17.ppc64
samba4-4.0.0-55alpha18.fc17.ppc64
samba4-devel-4.0.0-55alpha18.fc17.ppc64
samba4-pidl-4.0.0-55alpha18.fc17.ppc64
samba4-dc-libs-4.0.0-55alpha18.fc17.ppc64
samba4-swat-4.0.0-55alpha18.fc17.ppc64
samba4-common-4.0.0-55alpha18.fc17.ppc64
samba4-winbind-4.0.0-55alpha18.fc17.ppc64

Thanks...
Manas

samba4-4.0.0-58alpha18.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/samba4-4.0.0-58alpha18.fc17

samba4-4.0.0-58alpha18.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.
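For anyone rerunning the samba4 versus samba 3.6 comparison from this report, the cleanup advised earlier in the thread (back up /var/lib/samba, then remove its databases before switching versions) might look roughly like the sketch below. The function name `backup_and_clear_tdb` is hypothetical, and treating "all databases" as every `*.tdb` file under the directory is an assumption based on the `registry.tdb` example named in the thread.

```shell
#!/bin/sh
# Sketch: archive a Samba state directory, then delete its *.tdb databases.
# Assumptions: "databases" means *.tdb files; the Samba services are already
# stopped; the tarball path lies outside the directory being archived.
backup_and_clear_tdb() {
    dir=$1      # e.g. /var/lib/samba
    backup=$2   # e.g. /root/var-lib-samba-backup.tgz
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    # Archive first; if the backup fails, nothing is deleted.
    tar -czf "$backup" -C "$(dirname "$dir")" "$(basename "$dir")" || return 1
    find "$dir" -type f -name '*.tdb' -exec rm -f {} +
}
```

A typical call would be `backup_and_clear_tdb /var/lib/samba /root/var-lib-samba-backup.tgz`, run as root. As noted in the thread, accounts and other settings then have to be set up again from scratch.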