Red Hat Bugzilla – Bug 859131
Performance problem copying/writing file on CIFS using Oracle FILE_UTL procedures.
Last modified: 2013-10-15 15:21:03 EDT
Created attachment 614966 [details]
Description of problem:
When using Oracle FILE_UTL procedures to copy a file in a CIFS directory to a new file on the same CIFS share, performance is horrible: we are seeing 1-10 kB/s.
Running the same Oracle procedure to copy a file locally on ext3 (/tmp) completes in a normal time, 48 seconds for ~450 MB.
Using dd to write a file in the CIFS directory with /dev/zero as the source gives around 3 MB/s writes.
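The dd comparison above can be sketched roughly as follows; the target path is an assumption (it defaults to /tmp here, and would be pointed at the CIFS mount point to reproduce the slow case):

```shell
# Minimal sketch of the write test described above. TARGET is a
# placeholder, not taken from the report: it defaults to /tmp here;
# point it at the CIFS mount point to test the share instead.
TARGET="${TARGET:-/tmp}"
# Write 100 MiB of zeros and flush to disk; dd reports the throughput.
dd if=/dev/zero of="$TARGET/cifs-write-test" bs=1M count=100 conv=fsync
```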
Also, the same procedure to create/copy files on a CIFS share worked fine on RHEL 5.1; this behaviour appeared when we moved the DB to a new platform running RHEL 5.8.
Version-Release number of selected component (if applicable):
Actual results:
Hours and hours of wait time in the batch job.
Expected results:
Completion in a minute or two.
Attaching the PL/SQL code as an example.
Are you talking about copying a file on a share mounted with the Kernel CIFS client?
Yes, I have mounted a Windows 2008 R2 share on RHEL 5.8 and am trying to copy files to it using Oracle PL/SQL FILE_UTL procedures, with horrible performance, while creating a file on the share using dd from /dev/zero works like a charm.
We have done some more tests and there is more debugging information available in this open internal case: 00711928.
We just managed to solve this issue by changing the mount options.
I changed it from:
And suddenly performance is as good as expected when writing from Oracle procedures.
So the issue has something to do with one of the changed mount options and how FILE_UTL does its writes, I'm sure. I can do some more tests tomorrow and try to figure out exactly which mount option fixed the problem.
I believe it has something to do with the fact that I previously didn't have a domain specified, either via domain=<domain> or via username=<domain>\<username>.
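A hedged sketch of the two ways of passing the domain mentioned above; server, share, mount point, and user names are all placeholders, not taken from this report:

```shell
# Placeholders throughout; only the domain= / username= option syntax
# itself is the point here. Requires root.
mount -t cifs //fileserver/orashare /mnt/orashare \
    -o username=dbuser,domain=EXAMPLE

# or, equivalently, folding the domain into the username:
mount -t cifs //fileserver/orashare /mnt/orashare \
    -o 'username=EXAMPLE\dbuser'
```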
When I have the working mount points up and running and then mount the same CIFS share on a new mount point, but with the old mount options, it works fine on that one too.
But I can see in /proc/mounts that it has domain=<domain> specified even though I didn't set it in the options on the command line, probably because there is already an active connection to the same server with the same user.
I am not sure what the difference is between the way dd and Oracle write their files, but I can imagine that dd just does an fopen() and pushes data until it is told to stop, while Oracle may do a lot of poll()/select() calls that cause domain-name lookups on the CIFS volume for each little bit of data, causing bad performance.
Just speculation... I haven't read any code related to this.
Ran into the same problem again on another machine (the other Oracle RAC node).
Although this time I used the same mount options that worked on the first machine, they didn't work on node2. To solve it, I reloaded the cifs module with CIFSMaxBufSize=130048 and performance went back to normal.
I am not sure whether it is the mount options or something in the module that causes this performance problem after it has been running for some time.
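The module reload described above might look like the following sketch; the mount point is a placeholder, and this assumes root access with no CIFS shares in use:

```shell
# Sketch of reloading the cifs module with a larger buffer size, as
# described in the comment above. Mount point is a placeholder.
umount /mnt/orashare                    # all CIFS mounts must be released
modprobe -r cifs                        # unload the module
modprobe cifs CIFSMaxBufSize=130048     # reload with the larger buffer
# The value should then be visible under sysfs (path assumed):
cat /sys/module/cifs/parameters/CIFSMaxBufSize
```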
Since this ticket is about performance tuning of the cifs kernel module, I am changing the component to kernel.
Can you please open a case with Red Hat support, who will help with the data gathering required to debug this issue. You should also consider moving to RHEL 6, which includes async write support that improves write performance substantially over CIFS.
I have a case opened: 00711928.
I don't have the option to move to RHEL6 currently, we will look at that when we upgrade the database engine perhaps during Q2 2013.
Can you please get 2 tcpdumps, taken:
1) when the writes seem to work fine, and
2) when the writes seem to have slowed down.
Also, can you please enable additional cifs debugging using the command
echo 7 >/proc/fs/cifs/cifsFYI
and perform the write tasks. This needs to be run for the case where the writes are slow. The debug messages will be captured in /var/log/messages.
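Put together, the debug capture described above would look roughly like this (requires root; the grep at the end is an assumed way of pulling the messages back out):

```shell
echo 7 > /proc/fs/cifs/cifsFYI     # enable verbose cifs debugging
#   ... run the slow PL/SQL write here ...
echo 0 > /proc/fs/cifs/cifsFYI     # turn debugging back off
grep -i cifs /var/log/messages     # debug output lands in the syslog
```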
Please attach these files to the support case and the support engineer working on the case will forward those to me.
The data you asked for has been available in the internal case since yesterday. I used debug level 1 in cifsFYI, not 7; I hope that's OK.
There is only one tcpdump file; it contains first a slow write (PL/SQL), then a fast write (dd).
No additional minor releases are planned for Production Phase 2 in Red Hat Enterprise Linux 5, and therefore Red Hat is closing this bugzilla as it does not meet the inclusion criteria as stated in: