Bug 829694

Summary: F17pre-ga3 PPC64: running fsstress test on samba4 mount points triggered call trace on client system
Product: [Fedora] Fedora
Reporter: IBM Bug Proxy <bugproxy>
Component: samba4
Assignee: Andreas Schneider <asn>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Priority: unspecified
CC: abokovoy, asn, bbaude, gansalmon, gdeschner, itamar, jkachuck, jonathan, kernel-maint, madhu.chinakonda, ovasik, sbose, ssorce, wgomerin
Hardware: ppc64
OS: All
Doc Type: Bug Fix
Last Closed: 2012-07-23 20:21:33 UTC

Attachments:

Description	Flags
dmesg.txt	none
var-log-messages.txt	none
dmesg-server.txt	none
var-log-messages-server.txt.tgz	none
smbd-pool-usage_PID.970	none
smbd-pool-usage_PID.971	none
smbd-pool-usage_PID.993.tgz	none

Description IBM Bug Proxy 2012-06-07 11:00:58 UTC

== Comment: #0 - MANAS K. NAYAK <maknayak.com> - 2012-06-06 03:57:09 ==
While running fsstress on Samba mount points on F17 prega3 PPC64 on a P7 Juno IOCL system, I noticed the following call traces in dmesg and /var/log/messages on the Samba client.

--- Call trace on dmesg ---

[159579.063954] --- Exception: 901 at .smp_call_function_many+0x2b4/0x360
[159579.063960]     LR = .smp_call_function_many+0x294/0x360
[159579.063971] [c000000042003500] [c00000000011d4d0] .smp_call_function+0x60/0x90
[159579.063984] [c0000000420035a0] [c00000000011d548] .on_each_cpu+0x48/0x100
[159579.063999] [c000000042003650] [c0000000001bf874] .drain_all_pages+0x34/0x50
[159579.064012] [c0000000420036d0] [c0000000001c3a18] .__alloc_pages_nodemask+0x5f8/0x880
[159579.064028] [c000000042003860] [c00000000021becc] .new_slab+0x13c/0x4f0
[159579.064040] [c000000042003920] [c0000000008072a4] .__slab_alloc+0x284/0x574
[159579.064054] [c000000042003a50] [c00000000021eab8] .kmem_cache_alloc_node+0xb8/0x260
[159579.064068] [c000000042003b10] [c000000000018c30] .alloc_thread_info_node+0x40/0x60
[159579.064083] [c000000042003ba0] [c000000000097e18] .copy_process.part.22+0x138/0x1120
[159579.064095] [c000000042003ca0] [c000000000098ff0] .do_fork+0x140/0x540
[159579.064108] [c000000042003d80] [c000000000018250] .sys_clone+0x90/0xc0
[159579.064121] [c000000042003e30] [c000000000009b14] .ppc_clone+0x8/0xc
[159651.643728] INFO: rcu_sched detected stalls on CPUs/tasks: { 7} (detected by 2, t=13926706 jiffies)
[159651.643754] Call Trace:
[159651.643772] [c00000005ae82ae0] [c0000000000185e0] .show_stack+0xb0/0x1c0 (unreliable)
[159651.643790] [c00000005ae82ba0] [c00000000080397c] .dump_stack+0x28/0x3c
[159651.643805] [c00000005ae82c20] [c00000000016d1d0] .__rcu_pending+0x490/0x4a0
[159651.643819] [c00000005ae82cf0] [c00000000016e18c] .rcu_check_callbacks+0xac/0x270
[159651.643834] [c00000005ae82d90] [c0000000000b3564] .update_process_times+0x54/0xb0
[159651.643848] [c00000005ae82e30] [c00000000011522c] .tick_sched_timer+0x8c/0x110
[159651.643863] [c00000005ae82ee0] [c0000000000d4e54] .__run_hrtimer+0xc4/0x2c0
[159651.643877] [c00000005ae82f90] [c0000000000d6198] .hrtimer_interrupt+0x128/0x370
[159651.643891] [c00000005ae83080] [c000000000022bf0] .timer_interrupt+0xf0/0x2b0
[159651.643905] [c00000005ae83130] [c000000000003ab4] decrementer_common+0x134/0x180


NOTE: The system had been running for more than 24 hours.
          dmesg.txt and var-log-messages.txt are attached; they contain the detailed logs for the exception and call traces.

 --- uname -a ---
Linux miz12.austin.ibm.com 3.3.4-5.fc17.ppc64 #1 SMP Mon May 14 10:18:37 MST 2012 ppc64 ppc64 ppc64 GNU/Linux

--- Steps to reproduce ---
On the Samba server side:
1. Create five 4 GB partitions on a spare hard disk with fdisk.
2. Make a different file system (ext2, ext3, ext4, xfs, reiserfs) on each of the 5 partitions.
3. mkdir /SAMBA1 /SAMBA2 /SAMBA3 /SAMBA4 /SAMBA5
4. Mount the partitions on these directories.
5. Give these 5 directories 777 permissions: chmod 777 /SAMBA*
6. Install the samba4 packages, then edit /etc/samba/smb.conf and add a share section for each mounted directory:

[SAMBA1]
path = /SAMBA1
public = yes
only guest = yes
writable = yes
printable = yes

[SAMBA2]
path = /SAMBA2
public = yes
only guest = yes
writable = yes
printable = yes
.
.
.
[SAMBA5]
path = /SAMBA5
public = yes
only guest = yes
writable = yes
printable = yes

7. Enable and start the Samba server (the smb and nmb services):
systemctl enable smb.service nmb.service
systemctl start smb.service nmb.service
[root@miz11 ~]# systemctl status smb.service nmb.service
smb.service - Samba SMB Daemon
	  Loaded: loaded (/usr/lib/systemd/system/smb.service; enabled)
	  Active: active (running) since Mon, 04 Jun 2012 11:29:34 -0400; 1 day and 20h ago
	Main PID: 984 (smbd)
	  CGroup: name=systemd:/system/smb.service
		  ├  984 /usr/sbin/smbd
		  ├  985 /usr/sbin/smbd
		  └ 7966 /usr/sbin/smbd


nmb.service - Samba NMB Daemon
	  Loaded: loaded (/usr/lib/systemd/system/nmb.service; enabled)
	  Active: active (running) since Mon, 04 Jun 2012 11:29:33 -0400; 1 day and 20h ago
	Main PID: 982 (nmbd)
	  CGroup: name=systemd:/system/nmb.service
		  └ 982 /usr/sbin/nmbd

8. Set the Samba password for root:
# smbpasswd -a root

9. Verify the mounts:
# mount
/dev/sda6 on /SAMBA1 type ext3 (rw,relatime,user_xattr,barrier=1,nodelalloc,data=ordered)
/dev/sda7 on /SAMBA2 type ext4 (rw,relatime,user_xattr,barrier=1,data=ordered)
/dev/sda8 on /SAMBA3 type xfs (rw,relatime,attr2,noquota)
/dev/sda9 on /SAMBA4 type reiserfs (rw,relatime)
/dev/sda11 on /SAMBA5 type ext2 (rw,relatime,user_xattr,barrier=1)
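The server-side steps above can be sketched as two small POSIX shell helpers that only print what they would do, so the output can be reviewed before anything touches a disk. The partition-to-file-system mapping is taken from the `mount` output in step 9 (not from the order listed in step 2), and the share stanzas copy the smb.conf settings above, including `printable = yes`. The function names are hypothetical, and some mkfs tools (for example mkfs.xfs or mkfs.reiserfs) may ask for confirmation or need a force option on a disk that already holds a file system.

```shell
# Hypothetical helper: print the server setup commands (mkfs, mkdir, mount,
# chmod) for the five shares. Nothing is executed; mkfs destroys the data
# on the named partitions, so review the output before running it.
server_cmds() {
  i=1
  # partition:filesystem pairs as shown by "mount" in step 9
  for spec in sda6:ext3 sda7:ext4 sda8:xfs sda9:reiserfs sda11:ext2; do
    dev=/dev/${spec%%:*}
    fs=${spec#*:}
    echo "mkfs -t $fs $dev"
    echo "mkdir -p /SAMBA$i && mount $dev /SAMBA$i && chmod 777 /SAMBA$i"
    i=$((i + 1))
  done
}

# Hypothetical helper: print one smb.conf share stanza per mount point
# (default 5), with the same settings as the report's smb.conf.
share_stanzas() {
  i=1
  while [ "$i" -le "${1:-5}" ]; do
    printf '[SAMBA%d]\npath = /SAMBA%d\npublic = yes\nonly guest = yes\nwritable = yes\nprintable = yes\n\n' "$i" "$i"
    i=$((i + 1))
  done
}
```

For example, `server_cmds > /tmp/server-setup.sh` produces a script to inspect and then run as root, and `share_stanzas >> /etc/samba/smb.conf` appends the share sections (back up smb.conf first).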


On the Samba client side:
1) mkdir /SAMBA1 /SAMBA2 /SAMBA3 /SAMBA4 /SAMBA5
2) chmod 777 /SAMBA*
3) Mount the Samba shares on the client:
    mount -t cifs -o username=root //9.3.110.176/SAMBA1 /SAMBA1
    mount -t cifs -o username=root //9.3.110.176/SAMBA2 /SAMBA2
    mount -t cifs -o username=root //9.3.110.176/SAMBA3 /SAMBA3
    mount -t cifs -o username=root //9.3.110.176/SAMBA4 /SAMBA4
    mount -t cifs -o username=root //9.3.110.176/SAMBA5 /SAMBA5

4) # mount 
//9.3.110.176/SAMBA1 on /SAMBA1 type cifs (rw,relatime,sec=ntlm,unc=\\9.3.110.176\SAMBA1,username=root,uid=0,noforceuid,gid=0,noforcegid,addr=9.3.110.176,unix,posixpaths,serverino,acl,rsize=1048576,wsize=65536,actimeo=1)
//9.3.110.176/SAMBA2 on /SAMBA2 type cifs (rw,relatime,sec=ntlm,unc=\\9.3.110.176\SAMBA2,username=root,uid=0,noforceuid,gid=0,noforcegid,addr=9.3.110.176,unix,posixpaths,serverino,acl,rsize=1048576,wsize=65536,actimeo=1)
//9.3.110.176/SAMBA3 on /SAMBA3 type cifs (rw,relatime,sec=ntlm,unc=\\9.3.110.176\SAMBA3,username=root,uid=0,noforceuid,gid=0,noforcegid,addr=9.3.110.176,unix,posixpaths,acl,rsize=1048576,wsize=65536,actimeo=1)
//9.3.110.176/SAMBA4 on /SAMBA4 type cifs (rw,relatime,sec=ntlm,unc=\\9.3.110.176\SAMBA4,username=root,uid=0,noforceuid,gid=0,noforcegid,addr=9.3.110.176,unix,posixpaths,serverino,acl,rsize=1048576,wsize=65536,actimeo=1)
//9.3.110.176/SAMBA5 on /SAMBA5 type cifs (rw,relatime,sec=ntlm,unc=\\9.3.110.176\SAMBA5,username=root,uid=0,noforceuid,gid=0,noforcegid,addr=9.3.110.176,unix,posixpaths,serverino,acl,rsize=1048576,wsize=65536,actimeo=1)

5) Run fsstress on these mount points to generate I/O on the file systems:
nohup ./fsstress -d /SAMBA1 -l 0 -n 1000 -p 4 -r &
nohup ./fsstress -d /SAMBA2 -l 0 -n 1000 -p 4 -r &
nohup ./fsstress -d /SAMBA3 -l 0 -n 1000 -p 4 -r &
nohup ./fsstress -d /SAMBA4 -l 0 -n 1000 -p 4 -r &
nohup ./fsstress -d /SAMBA5 -l 0 -n 1000 -p 4 -r &

Let it run for more than 24 hours; you may then see call traces in the client's dmesg.
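The client steps can be sketched as a command generator under the same assumptions as the report (server 9.3.110.176, shares SAMBA1 to SAMBA5, an fsstress binary in the current directory). `client_cmds` is a hypothetical name; it only prints the commands, and it sends each fsstress instance's output to its own log file instead of a shared nohup.out, which is a choice made here rather than part of the original steps. In LTP's fsstress, `-l 0` loops until killed, `-n` sets the number of operations per process and `-p` the number of processes.

```shell
# Hypothetical helper: print the client-side commands for shares 1..N
# (default 5) exported by the given server. Nothing is executed here.
client_cmds() {
  server=$1
  n=${2:-5}
  i=1
  while [ "$i" -le "$n" ]; do
    echo "mkdir -p /SAMBA$i && chmod 777 /SAMBA$i"
    echo "mount -t cifs -o username=root //$server/SAMBA$i /SAMBA$i"
    i=$((i + 1))
  done
  # Start one background fsstress per mount point, each with its own log.
  i=1
  while [ "$i" -le "$n" ]; do
    echo "nohup ./fsstress -d /SAMBA$i -l 0 -n 1000 -p 4 -r > /tmp/fsstress.SAMBA$i.log 2>&1 &"
    i=$((i + 1))
  done
}
```

Usage: `client_cmds 9.3.110.176 > /tmp/client-setup.sh`, review the file, then run it as root with `sh /tmp/client-setup.sh`; mount.cifs prompts for the password set with smbpasswd on the server.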

== Comment: #1 - MANAS K. NAYAK <maknayak.com> - 2012-06-06 05:19:48 ==


== Comment: #2 - MANAS K. NAYAK <maknayak.com> - 2012-06-06 05:25:22 ==


== Comment: #3 - SRINIVASA N. THIMALAPUR <srinivasa.tn.com> - 2012-06-06 05:48:54 ==
Hi Manas,
   What are the OS and hardware of the server? Are there any odd messages in the server log? Does fsstress report any failures?

Regards,
Seenu.

== Comment: #4 - MANAS K. NAYAK <maknayak.com> - 2012-06-06 06:22:06 ==
(In reply to comment #3)
> Hi Manas,
>    What is the os and h/w of server?  Are there any odd messages in the
> server log?  Are there any failure reports in the fstress?
> 
> Regards,
> Seenu.

Hi Seenu,
I already mentioned the OS and hardware details in the problem description above.
Here they are again, along with some other details:

OS : F17 prega3 PPC64 
H/W: P7 Juno IOCL LPAR
Memory: 2GB 

[root@miz12 ~]# lscpu 
Architecture:          ppc64
Byte Order:            Big Endian
CPU(s):                48
On-line CPU(s) list:   0-7
Off-line CPU(s) list:  8-47
Thread(s) per core:    4
Core(s) per socket:    1
Socket(s):             2
NUMA node(s):          2
Model:                 IBM,8246-L2C
L1d cache:             32K
L1i cache:             32K
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     

--- samba installed packages ---
[root@miz11 ~]# rpm -qa | grep -i samba
samba4-dc-4.0.0-47alpha18.fc17.ppc64
samba4-dc-libs-4.0.0-47alpha18.fc17.ppc64
samba4-libs-4.0.0-47alpha18.fc17.ppc64
samba4-client-4.0.0-47alpha18.fc17.ppc64
samba4-common-4.0.0-47alpha18.fc17.ppc64
samba4-4.0.0-47alpha18.fc17.ppc64
samba4-winbind-4.0.0-47alpha18.fc17.ppc64

[root@miz11 ~]# smbd -V
Version 4.0.0alpha18
[root@miz11 ~]# nmbd -V
Version 4.0.0alpha18

Thanks...
Manas

== Comment: #5 - SRINIVASA N. THIMALAPUR <srinivasa.tn.com> - 2012-06-06 06:34:36 ==
Hi Manas,
   The details above are for the client side. I was asking for the server details. Or are the server and client running on the same system?

Regards,
Seenu.

== Comment: #6 - MANAS K. NAYAK <maknayak.com> - 2012-06-06 10:42:42 ==
(In reply to comment #5)
> Hi Manas,

Hi Seenu,
>    The above mentioned details are for client side.  I was asking for the
> details of server? 

Ah, sorry, I misunderstood.
Both systems (client and server) run the same OS, F17 prega3 ppc64.

> Or is that server and client are running on the same
> system?
No, they run on separate P7 Juno IOCL LPARs.


The Samba server also shows the following call trace in dmesg and /var/log/messages:

[180836.911287] smbd: page allocation failure: order:0, mode:0x4020
[180836.911296] Call Trace:
[180836.911310] [c00000007fff3380] [c0000000000185e0] .show_stack+0xb0/0x1c0 (unreliable)
[180836.911320] [c00000007fff3440] [c00000000080397c] .dump_stack+0x28/0x3c
[180836.911327] [c00000007fff34c0] [c0000000001bfe98] .warn_alloc_failed+0x118/0x170
[180836.911332] [c00000007fff3580] [c0000000001c3acc] .__alloc_pages_nodemask+0x6ac/0x880
[180836.911337] [c00000007fff3710] [c0000000002112d0] .alloc_pages_current+0xc0/0x150
[180836.911343] [c00000007fff37b0] [c00000000021c234] .new_slab+0x4a4/0x4f0
[180836.911347] [c00000007fff3870] [c0000000008072a4] .__slab_alloc+0x284/0x574
[180836.911352] [c00000007fff39a0] [c00000000022219c] .__kmalloc_node_track_caller+0xdc/0x310
[180836.911358] [c00000007fff3a60] [c00000000069e6c4] .__alloc_skb+0xa4/0x1b0
[180836.911362] [c00000007fff3b10] [c00000000069f04c] .__netdev_alloc_skb+0x4c/0x90
[180836.911374] [c00000007fff3bb0] [d000000000d50920] .ibmveth_replenish_task+0x170/0x4f0 [ibmveth]
[180836.911380] [c00000007fff3cb0] [d000000000d52008] .ibmveth_poll+0x168/0x3e0 [ibmveth]
[180836.911386] [c00000007fff3db0] [c0000000006ad730] .net_rx_action+0x1f0/0x360
[180836.911392] [c00000007fff3ea0] [c0000000000a5988] .__do_softirq+0x118/0x2d0
[180836.911398] [c00000007fff3f90] [c000000000028024] .call_do_softirq+0x14/0x24
[180836.911403] [c00000005d682500] [c0000000000114c0] .do_softirq+0x130/0x170
[180836.911408] [c00000005d6825b0] [c0000000000a5f14] .irq_exit+0xd4/0x100
[180836.911412] [c00000005d682640] [c000000000011044] .do_IRQ+0xd4/0x310
[180836.911417] [c00000005d682700] [c0000000000052b4] hardware_interrupt_entry+0x18/0x1c
[180836.911517] --- Exception: 501 at .xfs_count_page_state+0x74/0xc0 [xfs]
[180836.911520]     LR = .xfs_vm_releasepage+0x50/0x140 [xfs]
[180836.911531] [c00000005d6829f0] [c00000000022abe4] .__mem_cgroup_uncharge_common+0x114/0x300 (unreliable)
[180836.911559] [c00000005d682a90] [d00000000137feb0] .xfs_vm_releasepage+0x50/0x140 [xfs]
[180836.911566] [c00000005d682b40] [c0000000001b6e68] .try_to_release_page+0x88/0xe0
[180836.911571] [c00000005d682bd0] [c0000000001d2f84] .shrink_page_list+0x724/0xbb0
[180836.911576] [c00000005d682d60] [c0000000001d39fc] .shrink_inactive_list+0x24c/0x630
[180836.911581] [c00000005d682e70] [c0000000001d4580] .shrink_mem_cgroup_zone+0x330/0x620
[180836.911586] [c00000005d682fe0] [c0000000001d4908] .shrink_zone+0x98/0xe0
[180836.911590] [c00000005d6830b0] [c0000000001d4b08] .zone_reclaim+0x1b8/0x380
[180836.911596] [c00000005d6831c0] [c0000000001c3240] .get_page_from_freelist+0x7d0/0x9b0
[180836.911600] [c00000005d683350] [c0000000001c35a0] .__alloc_pages_nodemask+0x180/0x880
[180836.911605] [c00000005d6834e0] [c0000000002112d0] .alloc_pages_current+0xc0/0x150
[180836.911610] [c00000005d683580] [c0000000001b8c1c] .__page_cache_alloc+0xdc/0x130
[180836.911615] [c00000005d683610] [c0000000001b8fe4] .grab_cache_page_write_begin+0x84/0x140
[180836.911628] [c00000005d6836c0] [d00000000157d6f0] .reiserfs_write_begin+0x90/0x2d0 [reiserfs]
[180836.911633] [c00000005d683780] [c0000000001b7a08] .generic_file_buffered_write+0x228/0x340
[180836.911637] [c00000005d6838b0] [c0000000001ba2c8] .__generic_file_aio_write+0x298/0x430
[180836.911642] [c00000005d6839b0] [c0000000001ba4f8] .generic_file_aio_write+0x98/0x170
[180836.911648] [c00000005d683aa0] [c000000000236094] .do_sync_write+0xc4/0x150
[180836.911657] [c00000005d683c30] [d000000001583854] .reiserfs_file_write+0xa4/0xf0 [reiserfs]
[180836.911662] [c00000005d683cd0] [c000000000236cdc] .vfs_write+0xdc/0x200
[180836.911667] [c00000005d683d80] [c000000000237360] .SyS_pwrite64+0xc0/0xe0
[180836.911673] [c00000005d683e30] [c0000000000098e4] syscall_exit+0x0/0x40
[180836.911677] Mem-Info:
[180836.911680] Node 1 DMA per-cpu:
[180836.911684] CPU    0: hi:    6, btch:   1 usd:   5
[180836.911687] CPU    1: hi:    6, btch:   1 usd:   5
[180836.911689] CPU    2: hi:    6, btch:   1 usd:   0
[180836.911692] CPU    3: hi:    6, btch:   1 usd:   0
[180836.911695] CPU    4: hi:    6, btch:   1 usd:   5
[180836.911697] CPU    5: hi:    6, btch:   1 usd:   4
[180836.911700] CPU    6: hi:    6, btch:   1 usd:   0
[180836.911703] CPU    7: hi:    6, btch:   1 usd:   0
[180836.911710] active_anon:1231 inactive_anon:1285 isolated_anon:0
[180836.911711]  active_file:7075 inactive_file:7159 isolated_file:0
[180836.911713]  unevictable:0 dirty:717 writeback:0 unstable:0
[180836.911714]  free:52 slab_reclaimable:1158 slab_unreclaimable:13074
[180836.911716]  mapped:966 shmem:1126 pagetables:213 bounce:0
[180836.911719] Node 1 DMA free:3328kB min:5760kB low:7168kB high:8640kB active_anon:78784kB inactive_anon:82240kB active_file:452800kB inactive_file:458176kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2095360kB mlocked:0kB dirty:45888kB writeback:0kB mapped:61824kB shmem:72064kB slab_reclaimable:74112kB slab_unreclaimable:836736kB kernel_stack:2448kB pagetables:13632kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[180836.911736] lowmem_reserve[]: 0 0 0
[180836.911741] Node 1 DMA: 5*64kB 8*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB 0*8192kB 0*16384kB = 3392kB
[180836.911753] 15428 total pagecache pages
[180836.911755] 179 pages in swap cache
[180836.911758] Swap cache stats: add 8816, delete 8637, find 26309/27606
[180836.911761] Free swap  = 856384kB
[180836.911762] Total swap = 1023936kB
[180836.912708] 32768 pages RAM
[180836.912712] 922 pages reserved
[180836.912714] 16065 pages shared
[180836.912716] 17655 pages non-shared
[180836.912720] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[180836.912725]   cache: kmalloc-4096, object size: 4096, buffer size: 4096, default order: 1, min order: 0
[180836.912732]   node 1: slabs: 84, objs: 2688, free: 115

The following messages appear repeatedly in /var/log/messages on the Samba server:
Jun  6 06:27:06 miz11 smbd[7882]: [2012/06/06 06:27:06.616061,  0] ../source3/printing/print_cups.c:110(cups_connect)
Jun  6 06:27:06 miz11 smbd[7882]:   Unable to connect to CUPS server localhost:631 - Connection refused
Jun  6 06:27:06 miz11 smbd[985]: [2012/06/06 06:27:06.616335,  0] ../source3/printing/print_cups.c:487(cups_async_callback)
Jun  6 06:27:06 miz11 smbd[985]:   failed to retrieve printer list: NT_STATUS_UNSUCCESSFUL
Jun  6 06:27:11 miz11 systemd-logind[435]: New session 305 of user root.
Jun  6 06:28:36 miz11 smbd[7966]: [2012/06/06 06:28:36.521272,  0] ../source3/auth/auth.c:425(load_auth_module)
Jun  6 06:28:36 miz11 smbd[7966]:   load_auth_module: can't find auth method guest!
Jun  6 06:28:36 miz11 smbd[7966]: [2012/06/06 06:28:36.875498,  0] ../source3/smbd/trans2.c:1285(unix_filetype)
Jun  6 06:28:36 miz11 smbd[7966]:   unix_filetype: unknown filetype 0
Jun  6 06:28:36 miz11 smbd[7966]: [2012/06/06 06:28:36.891520,  0] ../source3/smbd/trans2.c:1285(unix_filetype)
Jun  6 06:28:36 miz11 smbd[7966]:   unix_filetype: unknown filetype 0

I have attached the dmesg and /var/log/messages output as dmesg-server.txt and var-log-messages-server.txt.tgz.

Thanks...
Manas

Comment 1 IBM Bug Proxy 2012-06-07 11:01:22 UTC

Created attachment 590161
dmesg.txt

Comment 2 IBM Bug Proxy 2012-06-07 11:01:39 UTC

Created attachment 590162
var-log-messages.txt

Comment 3 IBM Bug Proxy 2012-06-07 11:01:54 UTC

Created attachment 590163
dmesg-server.txt

Comment 4 IBM Bug Proxy 2012-06-07 11:02:15 UTC

Created attachment 590164
var-log-messages-server.txt.tgz

Comment 5 Ondrej Vasik 2012-06-07 11:19:04 UTC

This has nothing to do with the filesystem package.

Comment 6 Alexander Bokovoy 2012-06-07 12:05:23 UTC

So what we see here is that the kernel had exhausted the available RAM, and smbd happened to be the running process when the kernel was asked to allocate one page (order 0) with GFP_ATOMIC | __GFP_COMP (mode 0x4020).
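The mode value can be checked by decoding it bit by bit. Below is a minimal sketch of such a decoder; `decode_gfp` is a hypothetical helper, the flag values are assumed to match include/linux/gfp.h in 3.3-era kernels (verify against the headers of the kernel in use), and only common flags are listed.

```shell
# Hypothetical helper: name the bits of a gfp mask such as "mode:0x4020".
# Flag values are assumed to match include/linux/gfp.h of 3.3-era kernels;
# only common flags are listed, and leftover bits are printed in hex.
decode_gfp() {
  mask=$(( $1 ))
  out=""
  for pair in 0x01:__GFP_DMA 0x02:__GFP_HIGHMEM 0x04:__GFP_DMA32 \
              0x08:__GFP_MOVABLE 0x10:__GFP_WAIT 0x20:__GFP_HIGH \
              0x40:__GFP_IO 0x80:__GFP_FS 0x200:__GFP_NOWARN \
              0x4000:__GFP_COMP 0x8000:__GFP_ZERO; do
    bit=$(( ${pair%%:*} ))
    if [ $(( mask & bit )) -ne 0 ]; then
      out="${out:+$out|}${pair#*:}"   # append the flag name
      mask=$(( mask & ~bit ))         # clear the bit we just named
    fi
  done
  if [ "$mask" -ne 0 ]; then
    out="${out:+$out|}$(printf '0x%x' "$mask")"
  fi
  printf '%s\n' "$out"
}
```

`decode_gfp 0x4020` prints `__GFP_HIGH|__GFP_COMP`; in these kernels GFP_ATOMIC is defined as __GFP_HIGH, so this matches GFP_ATOMIC | __GFP_COMP. The SLUB message's `gfp=0x20` decodes to plain `__GFP_HIGH`, i.e. GFP_ATOMIC.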

In the attached logs I do not see any problem with smbd itself on the server side.

If we look closer, smbd received data over the network and tried to write it to a file on the reiserfs file system. At that point reiserfs tried to allocate a page, which made the kernel run zone reclaim on the memory cgroup in use, and reclaim asked the XFS driver to release its pages. While that reclaim was in progress, a network interrupt arrived and the ibmveth driver's atomic receive-buffer allocation failed, which is the page allocation failure reported above.

So something leaked a significant amount of memory during the long-running test.
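The Mem-Info dump in the server trace backs this up: for Node 1 DMA, free memory (3328kB) was below the zone's min watermark (5760kB), and unreclaimable slab accounted for 836736kB of the 2095360kB present. The hypothetical helper below pulls those numbers out of a zone line with awk; it is a sketch that assumes the field layout shown in this report.

```shell
# Hypothetical helper: summarize a Mem-Info zone line such as
# "Node 1 DMA free:3328kB min:5760kB ... present:2095360kB ...".
# All values are taken as kB, exactly as the kernel prints them there.
zone_pressure() {
  printf '%s\n' "$1" | awk '{
    split("", val)                      # reset per input line
    for (i = 1; i <= NF; i++) {
      n = split($i, kv, ":")
      if (n != 2) continue
      if (kv[1] == "free" || kv[1] == "min" || kv[1] == "present" ||
          kv[1] == "slab_unreclaimable") {
        v = kv[2]; sub(/kB$/, "", v); val[kv[1]] = v + 0
      }
    }
    if (val["present"] <= 0) next       # not a zone line; skip it
    printf "free=%dkB min=%dkB below_min=%s slab_unreclaimable=%.1f%%\n",
      val["free"], val["min"], (val["free"] < val["min"] ? "yes" : "no"),
      100 * val["slab_unreclaimable"] / val["present"]
  }'
}
```

Fed the full `Node 1 DMA free:...` line from dmesg-server.txt, it prints `free=3328kB min=5760kB below_min=yes slab_unreclaimable=39.9%`.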

Could you please repeat the test with smbd from the samba package (not samba4)? That way we will know whether the problem comes down to the source3/ code changes added in Samba master.

Comment 7 IBM Bug Proxy 2012-06-08 07:52:50 UTC

------- Comment From maknayak.com 2012-06-08 07:40 EDT-------
Hello Red Hat,

I removed the samba4 packages and installed the samba packages instead.
I tried to start the services, but the smb service failed to start.

[root@miz11 ~]# systemctl start smb.service nmb.service
Active: failed (Result: exit-code) since Fri, 08 Jun 2012 07:23:56 -0400; 2s ago
Process: 822 ExecStart=/usr/sbin/smbd $SMBDOPTIONS (code=exited, status=0/SUCCESS)
Main PID: 824 (code=exited, status=1/FAILURE)

Active: active (running) since Fri, 08 Jun 2012 07:23:56 -0400; 3s ago
Process: 823 ExecStart=/usr/sbin/nmbd $NMBDOPTIONS (code=exited, status=0/SUCCESS)
Main PID: 826 (nmbd)
└ 826 /usr/sbin/nmbd

--- output from /var/log/messages ---
Jun  8 07:37:56 miz11 smbd[692]: [2012/06/08 07:37:56.800347,  0] smbd/server.c:1107(main)
Jun  8 07:37:56 miz11 smbd[692]:   standard input is not a socket, assuming -D option
Jun  8 07:37:56 miz11 systemd[1]: PID 519 read from file /run/smbd.pid does not exist.
Jun  8 07:37:56 miz11 smbd[693]: [2012/06/08 07:37:56.811814,  0] registry/reg_init_basic.c:36(registry_init_common)
Jun  8 07:37:56 miz11 smbd[693]:   Failed to initialize the registry: WERR_CAN_NOT_COMPLETE
Jun  8 07:37:57 miz11 systemd[1]: smb.service: main process exited, code=exited, status=1
Jun  8 07:37:57 miz11 systemd[1]: Unit smb.service entered failed state.

--- Packages installed ---
samba-winbind-clients-3.6.5-86.fc17.1.ppc64
samba-client-3.6.5-86.fc17.1.ppc64
samba-common-3.6.5-86.fc17.1.ppc64
samba-domainjoin-gui-3.6.5-86.fc17.1.ppc64
samba-doc-3.6.5-86.fc17.1.ppc64
samba-3.6.5-86.fc17.1.ppc64
samba-winbind-3.6.5-86.fc17.1.ppc64
samba-winbind-krb5-locator-3.6.5-86.fc17.1.ppc64
samba-swat-3.6.5-86.fc17.1.ppc64

Comment 8 Alexander Bokovoy 2012-06-08 08:05:22 UTC

You need to clean up the samba4 installation properly before starting the samba packages. In particular, /var/lib/samba contains databases which may or may not be portable across different versions.

Specifically, registry.tdb has a newer version number in samba4 than in samba3, which causes WERR_CAN_NOT_COMPLETE.

Please back up /var/lib/samba and remove all databases from there between tests of different versions of Samba. You would need to set up things from scratch (add accounts to samba, etc) when downgrading Samba versions.
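The clean-up described above could look like the following sketch. It assumes that "databases" means the `*.tdb` and `*.ldb` files under /var/lib/samba (including the private/ subdirectory) and that the smb and nmb services have been stopped first; the function name and the default backup location are hypothetical.

```shell
# Hypothetical helper: back up a Samba state directory, then delete the
# TDB/LDB databases in it so a different Samba version starts clean.
# Stop the Samba services (smb, nmb) before running this.
reset_samba_state() {
  statedir=${1:-/var/lib/samba}
  backup=${2:-/root/samba-state-$(date -u +%Y%m%d%H%M%S).tgz}
  [ -d "$statedir" ] || { echo "no such directory: $statedir" >&2; return 1; }
  # Archive the whole directory first; abort if the backup fails.
  tar -czf "$backup" -C "$(dirname "$statedir")" "$(basename "$statedir")" || return 1
  # Remove only database files, leaving the directory layout in place.
  find "$statedir" -type f \( -name '*.tdb' -o -name '*.ldb' \) -exec rm -f {} +
  echo "backup: $backup"
}
```

After this, accounts have to be re-added (for example with `smbpasswd -a root`), as noted above.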

Comment 9 IBM Bug Proxy 2012-06-08 09:11:42 UTC

------- Comment From maknayak.com 2012-06-08 09:03 EDT-------
(In reply to comment #15)
> you needed to correctly clean up samba4 install before starting samba
> packages. In particular, /var/lib/samba contains databases which may or may
> not be portable across different versions.
> In particular, registry.tdb has newer version number in samba4 than in
> samba3, this causes WERR_CAN_NOT_COMPLETE.
>
> Please back up /var/lib/samba and remove all databases from there between
> tests of different versions of Samba. You would need to set up things from
> scratch (add accounts to samba, etc) when downgrading Samba versions.

Hello Alexander,
Thanks a lot for the tip. It worked.
I have restarted the test with samba (smbd version 3.6.5-86.fc17) and will leave it running for at least 24 hours. I will update you with the results.

------- Comment From maknayak.com 2012-06-08 09:06 EDT-------
One more samba4 update:
While cleaning up the samba4 tests, I unmounted all CIFS mounts on the samba4 client, which triggered the following call trace in dmesg:

[332333.794651] ------------[ cut here ]------------
[332333.794658] WARNING: at fs/namespace.c:795
[332333.794660] Modules linked in: des_generic md4 nls_utf8 cifs fscache lockd sunrpc bnep bluetooth rfkill ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter nf_conntrack_ipv4 ip6_tables nf_defrag_ipv4 xt_state nf_conntrack uinput windfarm_smu_sat i2c_core ibmveth windfarm_pid ibmvscsic scsi_transport_srp scsi_tgt [last unloaded: scsi_wait_scan]
[332333.794699] NIP: c000000000263fd0 LR: c000000000263fc4 CTR: 0000000001679580
[332333.794703] REGS: c000000041583a00 TRAP: 0700   Not tainted  (3.3.4-5.fc17.ppc64)
[332333.794707] MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI>  CR: 22000428  XER: 20000000
[332333.794717] CFAR: c000000000262f90
[332333.794720] TASK = c000000009980000[3311] 'umount' THREAD: c000000041580000 CPU: 2
[332333.794724] GPR00: c000000000263fc4 c000000041583c80 c0000000012d0a28 0000000000000000
[332333.794730] GPR04: 0000000000000400 0000000000000400 ffff000000000000 ffffffffffffffff
[332333.794735] GPR08: 0000000000000000 0000000000000001 0000000000002000 0000000000000000
[332333.794741] GPR12: 0000000022000422 c00000000ec81000 0000000000000000 0000000000000000
[332333.794746] GPR16: 0000000000000000 0000000000000000 0000000000000000 c000000041583d90
[332333.794752] GPR20: c000000041583da0 0000000000000000 c0000000015d9e00 0000000000000000
[332333.794757] GPR24: c000000053c73800 0000000040343fb8 c000000053c73800 0000000000000000
[332333.795272] GPR28: c0000000049afc00 c000000053c73800 c0000000012503e8 c000000041583c80
[332333.795289] NIP [c000000000263fd0] .mntput_no_expire+0x100/0x180
[332333.795294] LR [c000000000263fc4] .mntput_no_expire+0xf4/0x180
[332333.795297] Call Trace:
[332333.795302] [c000000041583c80] [c000000000263fc4] .mntput_no_expire+0xf4/0x180 (unreliable)
[332333.795308] [c000000041583d20] [c000000000265424] .SyS_umount+0xb4/0x450
[332333.795314] [c000000041583e30] [c0000000000098e4] syscall_exit+0x0/0x40
[332333.795318] Instruction dump:
[332333.795321] 2f880000 409e0068 387d0068 481b8ce5 60000000 4bffdffd 387d0038 eb9d0028
[332333.795330] 4bffef11 7c630034 5463d97e 68690001 <0b090000> 387d0020 4802e0f9 60000000
[332333.795341] ---[ end trace ce6c48dbf9d981c0 ]---

Thanks...
Manas

Comment 10 IBM Bug Proxy 2012-06-11 10:54:05 UTC

------- Comment From maknayak.com 2012-06-11 10:46 EDT-------
(In reply to comment #16)
> Hello Alexander,
> Thanks a lot for the trick.It worked.
> I have restarted the test on samba (smbd version Version 3.6.5-86.fc17) , I
> will leave the test run for at-least 24 hours. I will update you with
> results.
>
> Thanks...
> Manas

Hello Alexander,
I could not reproduce this issue with samba 3.6.5. The test ran for more than 48 hours with no call trace or any other error.

Thanks...
Manas

Comment 11 Alexander Bokovoy 2012-06-11 11:38:32 UTC

Thank you Manas.

I think this bug should be moved to samba4. We'll keep looking into the memory leak, but will probably fix it around or after the samba4 4.0 release, since code changes have to slow down first.

Comment 12 Alexander Bokovoy 2012-06-11 11:43:06 UTC

Moved to samba4 for further research.

Comment 13 Andreas Schneider 2012-06-12 14:12:24 UTC

Could you run the test with samba4 again and which the test is running check with

smbcontrol <smbd-pid> pool-usage

if you can find something suspicious?

Comment 14 Andreas Schneider 2012-06-12 14:14:23 UTC

s/which/while/

Comment 15 Andreas Schneider 2012-06-12 14:28:20 UTC

Can you explain what the test is doing, so that we can create a simpler version of it and maybe reproduce it here?

Comment 16 IBM Bug Proxy 2012-06-12 15:32:35 UTC

------- Comment From maknayak.com 2012-06-12 15:23 EDT-------
(In reply to comment #21)
> Can you explain what the test is doing so that we can create a simpler
> version of it and maybe reproduce it here.

Hello Andreas,
fsstress is an I/O load generator that creates many directories and files with different modes.
It is part of the LTP test suite, so I don't think you need to recreate it; using it from LTP should be sufficient.
The test case can be found in ltp-full-xxx/testcases/kernel/fs/fsstress/

Thanks...
Manas

Comment 17 Fedora Update System 2012-06-12 18:48:12 UTC

samba4-4.0.0-53alpha18.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/samba4-4.0.0-53alpha18.fc17

Comment 18 Andreas Schneider 2012-06-13 09:06:47 UTC

Could you run the test again with the version from comment #17? Could you save 'smbcontrol <smbd-pid> pool-usage' every hour? Something like this:

for p in $(pidof smbd); do smbcontrol $p pool-usage >> /tmp/samba-fsstress/smbd-pool-usage.$(date -u +%Y%m%d).$p; done
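Collecting the snapshots every hour, as asked above, needs a loop or a cron job around that command. The sketch below adds a timestamp header to each snapshot and creates the output directory first (the one-liner above assumes /tmp/samba-fsstress already exists). `snapshot_pool_usage`, `collect_hourly` and the `SMBCONTROL` override are hypothetical names introduced here.

```shell
# Hypothetical helper: append one pool-usage snapshot per smbd PID, with a
# UTC timestamp header, to <outdir>/smbd-pool-usage.<YYYYMMDD>.<pid>.
# SMBCONTROL is a hypothetical override (defaults to the real smbcontrol).
snapshot_pool_usage() {
  outdir=$1
  shift
  mkdir -p "$outdir" || return 1
  day=$(date -u +%Y%m%d)
  for p in "$@"; do
    {
      echo "=== $(date -u '+%Y-%m-%d %H:%M:%S') UTC pid $p ==="
      ${SMBCONTROL:-smbcontrol} "$p" pool-usage
    } >> "$outdir/smbd-pool-usage.$day.$p"
  done
}

# Take a snapshot of every running smbd once an hour until interrupted.
collect_hourly() {
  while :; do
    snapshot_pool_usage /tmp/samba-fsstress $(pidof smbd)
    sleep 3600
  done
}
```

Running `collect_hourly` under nohup on the server for the length of the fsstress run gives one file per smbd PID, similar to the attachments below.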

Thanks.

Comment 19 IBM Bug Proxy 2012-06-13 19:51:31 UTC

------- Comment From maknayak.com 2012-06-13 19:48 EDT-------
(In reply to comment #24)
> Could you run the test again with the version from comment #17? Could you
> save 'smbcontrol <smbd-pid> pool-usage' every hour? Something like that:
>
> for p in $(pidof smbd); do smbcontrol $p pool-usage >>
> /tmp/samba-fsstress/smbd-pool-usage.$(date -u +%Y%m%d).$p; done
>
> Thanks.

I am running the test on samba4 with the above script on the server; I will update with the results soon.

Thanks...
Manas

Comment 20 IBM Bug Proxy 2012-06-14 07:12:00 UTC

------- Comment From maknayak.com 2012-06-14 07:04 EDT-------
(In reply to comment #25)
> (In reply to comment #24)
> > Could you run the test again with the version from comment #17? Could you
> > save 'smbcontrol <smbd-pid> pool-usage' every hour? Something like that:
> >
> > for p in $(pidof smbd); do smbcontrol $p pool-usage >>
> > /tmp/samba-fsstress/smbd-pool-usage.$(date -u +%Y%m%d).$p; done
> >
> > Thanks.
>
> Running the test on samba4 with above script on server ... will update
> result soon.
>
> Thanks...
> Manas

Here are the files attached to the bugzilla, which contain pool-usage data for the smbd PIDs from the samba4 fsstress test:

smbd-pool-usage_PID.970
smbd-pool-usage_PID.971
smbd-pool-usage_PID.993.tgz (compressed with tar because it was 7 MB)

Thanks...
Manas

Comment 21 IBM Bug Proxy 2012-06-14 07:12:22 UTC

Created attachment 591751
smbd-pool-usage_PID.970


------- Comment (attachment only) From maknayak.com 2012-06-14 07:05 EDT-------

Comment 22 IBM Bug Proxy 2012-06-14 07:12:38 UTC

Created attachment 591752
smbd-pool-usage_PID.971


------- Comment (attachment only) From maknayak.com 2012-06-14 07:05 EDT-------

Comment 23 IBM Bug Proxy 2012-06-14 07:12:53 UTC

Created attachment 591753
smbd-pool-usage_PID.993.tgz


------- Comment (attachment only) From maknayak.com 2012-06-14 07:06 EDT-------

Comment 24 Fedora Update System 2012-06-15 00:27:13 UTC

Package samba4-4.0.0-53alpha18.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing samba4-4.0.0-53alpha18.fc17'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-9393/samba4-4.0.0-53alpha18.fc17
then log in and leave karma (feedback).

Comment 25 Fedora Update System 2012-06-18 12:26:18 UTC

samba4-4.0.0-54alpha18.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/samba4-4.0.0-54alpha18.fc17

Comment 26 Andreas Schneider 2012-06-18 13:20:42 UTC

The memory usage looks fine. It is always between 500 KB and 2 MB. So did you run into the problem again with these tests? The memory usage is higher when there are a lot of locks; otherwise it looks fine.

Did you test the latest packages from comment #24 or comment #25?
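A quick way to check a range like the 500 KB to 2 MB quoted above across the hourly snapshots is to pull the talloc totals out of each attached file. This sketch assumes every snapshot contains talloc's "full talloc report on '...' (total N bytes in M blocks)" header line at the start of a line; the function name is hypothetical.

```shell
# Hypothetical helper: report how many talloc report headers a pool-usage
# file holds and the smallest and largest "total N bytes" values among them.
pool_usage_range() {
  awk '/^full talloc report on / {
         n = -1
         for (i = 1; i < NF; i++) if ($i == "(total") { n = $(i + 1) + 0; break }
         if (n < 0) next                # header without a total; skip it
         if (count == 0 || n < min) min = n
         if (count == 0 || n > max) max = n
         count++
       }
       END {
         if (count == 0) { print "no talloc report headers found"; exit 1 }
         printf "snapshots=%d min=%d max=%d bytes\n", count, min, max
       }' "$1"
}
```

For example, `pool_usage_range smbd-pool-usage_PID.970` summarizes one attachment; a max that keeps rising across files would point at a leak, while a bounded range is consistent with the assessment above.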

Comment 27 IBM Bug Proxy 2012-06-20 06:42:32 UTC

------- Comment From maknayak.com 2012-06-20 06:38 EDT-------
(In reply to comment #32)
> The memory usage looks fine. It is always between 500 KB and 2 MB. So you
> did run into the problem again with these tests? The memory usage is higher
> with a lot of locks else it looks fine.
>
> Did you test the latest packages from comment #24 or comment #25?

I could not find ppc64 packages for the latest samba4 at the links
http://koji.fedoraproject.org/koji/buildinfo?buildID=326027 & https://admin.fedoraproject.org/updates/samba4-4.0.0-54alpha18.fc17

They contain packages for i686 and x86_64 only.
Please share the exact link to download the samba4 packages with the latest patch for F17 PPC64.

Thanks...
Manas

Comment 28 Fedora Update System 2012-06-21 20:02:21 UTC

samba4-4.0.0-55alpha18.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/samba4-4.0.0-55alpha18.fc17

Comment 29 Brent Baude 2012-06-21 21:52:40 UTC

ppc64 builds should eventually land here -> http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=593100

Comment 30 Fedora Update System 2012-06-22 15:21:32 UTC

samba4-4.0.0-56alpha18.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/samba4-4.0.0-56alpha18.fc17

Comment 31 IBM Bug Proxy 2012-06-28 03:40:06 UTC

------- Comment From maknayak.com 2012-06-28 03:10 EDT-------
Verified on the F17 GA build with the samba packages mentioned in Comment #35; the issue did not reproduce. It is fixed by the updated packages.

--- upgraded samba packages ---

[root@miz12 ~]# rpm -qa | grep -i samba
samba4-test-4.0.0-55alpha18.fc17.ppc64
samba4-debuginfo-4.0.0-55alpha18.fc17.ppc64
samba4-winbind-clients-4.0.0-55alpha18.fc17.ppc64
samba4-libs-4.0.0-55alpha18.fc17.ppc64
samba4-client-4.0.0-55alpha18.fc17.ppc64
samba4-winbind-krb5-locator-4.0.0-55alpha18.fc17.ppc64
samba4-python-4.0.0-55alpha18.fc17.ppc64
samba4-dc-4.0.0-55alpha18.fc17.ppc64
samba4-4.0.0-55alpha18.fc17.ppc64
samba4-devel-4.0.0-55alpha18.fc17.ppc64
samba4-pidl-4.0.0-55alpha18.fc17.ppc64
samba4-dc-libs-4.0.0-55alpha18.fc17.ppc64
samba4-swat-4.0.0-55alpha18.fc17.ppc64
samba4-common-4.0.0-55alpha18.fc17.ppc64
samba4-winbind-4.0.0-55alpha18.fc17.ppc64

Thanks...
Manas

Comment 32 Fedora Update System 2012-07-12 16:11:34 UTC

samba4-4.0.0-58alpha18.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/samba4-4.0.0-58alpha18.fc17

Comment 33 Fedora Update System 2012-07-23 20:21:33 UTC

samba4-4.0.0-58alpha18.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.