Bug 859131

Summary: Performance problem copying/writing file on CIFS using Oracle FILE_UTL procedures.
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.8
Hardware: x86_64
OS: Linux
Status: CLOSED WONTFIX
Severity: high
Priority: unspecified
Target Milestone: rc
Reporter: Johan Bergström <johan.bergstrom>
Assignee: Sachin Prabhu <sprabhu>
QA Contact: Red Hat Kernel QE team <kernel-qe>
CC: asn, gdeschner, jlayton, rwheeler, sbose
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-10-15 19:21:03 UTC

Attachments:
copyfile.sql

Description Johan Bergström 2012-09-20 15:47:28 UTC
Created attachment 614966 [details]
copyfile.sql

Description of problem:

When using Oracle FILE_UTL procedures to copy a file from a CIFS directory to a new file on the same CIFS share, performance is terrible. We're talking 1-10 kB/s.

Running the same Oracle procedure to copy a file locally on ext3 (/tmp) completes in normal time: 48 seconds for ~450 MB.

Using dd to write a file in the CIFS directory using /dev/zero as source gives around 3 MB/s writes.
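A throughput check along these lines was presumably what was run; a minimal sketch, where /mnt/cifs stands in for the actual CIFS mount point (the report does not give the real path) and /tmp serves as a local baseline:

```shell
# On the CIFS share (path is illustrative):
#   dd if=/dev/zero of=/mnt/cifs/testfile bs=1M count=100
# Local baseline for comparison; dd reports throughput on its last line:
dd if=/dev/zero of=/tmp/ddtest bs=1M count=10 2>&1 | tail -n 1
rm -f /tmp/ddtest
```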

Also, the same procedure to create/copy files on a CIFS share worked fine on RHEL 5.1; this behaviour appeared when we moved the database to a new platform running RHEL 5.8.

Version-Release number of selected component (if applicable):
samba3x-client-3.5.10-0.110.el5_8

How reproducible:
Always.

 
Actual results:
Hours and hours of wait time in the batch job.

Expected results:
Completion in a minute or two.

Additional info:
PL/SQL code is attached as an example.

Comment 1 Andreas Schneider 2012-09-26 10:20:52 UTC
Are you talking about copying a file on a share mounted with the Kernel CIFS client?

Comment 2 Johan Bergström 2012-09-26 11:20:06 UTC
Yes. I have mounted a Windows 2008R2 share on RHEL 5.8 and am trying to copy files to it using Oracle PL/SQL FILE_UTL procedures, with terrible performance, while creating a file on the share using dd from /dev/zero works like a charm.

We have done some more tests and there is more debugging information available in this open internal case: 00711928.

Comment 4 Johan Bergström 2012-09-27 16:25:04 UTC
We just managed to solve this issue by changing mount options.

I changed it from:

username=<username>,password=<pass>,uid=oracle,gid=dba,file_mode=0664,noserverino

to:

username=<domain>\<username>,password=<pass>,rsize=8192,wsize=8192,domain=<domain>,uid=oracle,gid=dba

And suddenly performance is good as expected while writing from oracle procedures.
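The before/after mounts could be sketched roughly as follows; the //server/share path and mount point are illustrative (the report does not give them), and the <...> placeholders are copied from the report and must be replaced with real values:

```shell
# Old options (slow with FILE_UTL writes):
mount -t cifs //server/share /mnt/oracle \
  -o 'username=<username>,password=<pass>,uid=oracle,gid=dba,file_mode=0664,noserverino'

# New options (normal performance):
mount -t cifs //server/share /mnt/oracle \
  -o 'username=<domain>\<username>,password=<pass>,rsize=8192,wsize=8192,domain=<domain>,uid=oracle,gid=dba'
```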

So I'm sure the issue has something to do with one of the changed mount options and how FILE_UTL performs writes. I can run some more tests tomorrow and try to figure out exactly which mount option fixed the problem.

Comment 5 Johan Bergström 2012-09-27 18:46:01 UTC
I believe it has something to do with the fact that I previously didn't have the domain specified, either via domain=<domain> or via username=<domain>\<username>.

When I have the working mount points up and running and then mount the same CIFS share on a new mount point, but with the old mount options, it works fine on that one too.

But I can see in /proc/mounts that it has domain=<domain> specified, even though I didn't set it in the options on the command line, probably because there is already an active connection to the same server with the same user.

I am not sure what the difference is between the way dd and Oracle write their files, but I can imagine that dd just fopen()s and pushes data until it's told to stop, while Oracle may do a lot of poll()/select() calls which cause domain name lookups on the CIFS volume for each little bit of data, causing bad performance.

Just speculation; I haven't read any code related to this.
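The report never confirms FILE_UTL's actual write pattern, but the underlying hypothesis is essentially that many small writes cost far more over CIFS than a few large ones. A minimal, hypothetical sketch of the two patterns; on a local filesystem both finish in similar time, whereas on a synchronous CIFS mount each write() can cost a network round trip:

```python
import os
import tempfile

def write_in_chunks(path, total, chunk):
    """Write `total` bytes to `path` using `chunk`-sized write() calls."""
    buf = b"\0" * chunk
    with open(path, "wb") as f:
        for _ in range(total // chunk):
            f.write(buf)
    return os.path.getsize(path)

tmpdir = tempfile.mkdtemp()
# FILE_UTL-like pattern: 1024 writes of 1 KiB each.
small = write_in_chunks(os.path.join(tmpdir, "small"), 1 << 20, 1024)
# dd-like pattern: a single 1 MiB write.
large = write_in_chunks(os.path.join(tmpdir, "large"), 1 << 20, 1 << 20)
assert small == large == 1 << 20
```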

Comment 6 Johan Bergström 2012-09-28 07:26:09 UTC
Ran into the same problem again on another machine (other oracle RAC node).

Although this time I used the same mount options that worked on the first machine, they didn't work on node2. But to solve it, I reloaded the cifs module with CIFSMaxBufSize=130048 and performance went back to normal.
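The module reload described above could look roughly like this; it requires root, all CIFS shares must be unmounted first, and the sysfs check assumes the parameter is exposed under /sys/module/cifs/parameters/ as on typical kernels of this era:

```shell
umount -a -t cifs                      # unmount all CIFS shares first
modprobe -r cifs                       # unload the module
modprobe cifs CIFSMaxBufSize=130048    # reload with a larger buffer size
cat /sys/module/cifs/parameters/CIFSMaxBufSize   # verify the value took effect
```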

I am not sure if it's the mount options or something in the module that causes this performance problem after running for some time.

Comment 7 Sumit Bose 2012-09-28 07:49:11 UTC
Since this ticket is about performance tuning of the cifs kernel module, I am changing the component to kernel.

Comment 8 Sachin Prabhu 2012-09-28 10:12:43 UTC
Johan,

Can you please open a case with Red Hat support, who will help with the data gathering required to debug this issue? You should also consider moving to RHEL 6, which includes async write support that substantially improves write performance over CIFS.

Sachin Prabhu

Comment 9 Johan Bergström 2012-09-28 10:15:16 UTC
Hello.

I have a case opened: 00711928.

I don't have the option to move to RHEL6 currently, we will look at that when we upgrade the database engine perhaps during Q2 2013.

Comment 10 Sachin Prabhu 2012-09-28 10:59:08 UTC
Johan,

Can you please get two tcpdumps, taken:
1) when the writes seem to work fine, and
2) when the writes seem to have slowed down.

Also, can you please enable additional CIFS debugging using the command
echo 7 > /proc/fs/cifs/cifsFYI
and then perform the write tasks. This needs to be done for the case where the writes are slow. The debug messages will be captured in /var/log/messages.
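The capture and debug steps requested above could be combined roughly as follows; <server-ip> is a placeholder for the CIFS server address, and the commands require root:

```shell
# Capture CIFS traffic (SMB over port 445) with full packets:
tcpdump -s 0 -i any -w cifs-slow.pcap host <server-ip> and port 445 &
# Turn on verbose CIFS debugging:
echo 7 > /proc/fs/cifs/cifsFYI
# ... perform the slow write workload here ...
# Turn debugging back off and stop the capture:
echo 0 > /proc/fs/cifs/cifsFYI
kill %1
```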

Please attach these files to the support case and the support engineer working on the case will forward those to me.

Sachin Prabhu

Comment 11 Johan Bergström 2012-09-28 12:05:36 UTC
Hello.

The data you asked for has been available in the internal case since yesterday. I used debug level 1 in cifsFYI, not 7; I hope that's OK.

There is only one tcpdump file; it first contains a slow write (PL/SQL), then a fast write (dd).

Comment 13 Andrius Benokraitis 2013-10-15 19:21:03 UTC
No additional minor releases are planned for Production Phase 2 of Red Hat Enterprise Linux 5, and therefore Red Hat is closing this bug, as it does not meet the inclusion criteria stated in:
https://access.redhat.com/site/support/policy/updates/errata/#Production_2_Phase