Description of problem:
Version-Release number of selected component (if applicable): 2.6.9-11.ELsmp #1 SMP x86_64
How reproducible: 100%
Steps to Reproduce:
1. Install Oracle 10g R2 according to Metalink Note 339367.1
2. Create an ASM-based database
Actual results: The installer hangs during 'shutdown immediate' (74% of total progress)
Expected results: The installation should complete smoothly
Additional info:
We use the following hardware:
IBM e326 server (2 * AMD64 CPU, 8 GB RAM)
QLogic QLA2342 FC card
IBM TotalStorage DS400
Please note: this configuration also fails to create a database on raw devices, but it works fine on an ext3 filesystem. We have to install a clustered database, so our only option is an ASM-based database. If it helps, I filed the same problem with Oracle as TAR #4855265.992.
Thanks, Boris
Unless you can prove otherwise, this looks like it should be filed with Oracle. Please reopen it if there is a Red Hat-specific issue. Thanks.
What would you consider proof? When I use Oracle on a raw device and it does not work, I can assume the problem lies somewhere between Oracle and Red Hat. What should I do to reopen it? I just did it!
Hello, Could you please answer my question? Regards, Boris
Thanks for reopening the bug; we are actively looking into it and will update it with progress.
The "works on ext3 but not on raw" thing is something I wonder about. With raw devices mapping to chunks of the disk, I wonder whether we are accessing parts of the disk that an ext3 filesystem doesn't touch; ext3 would front-load the datafiles.
Hello, It seems to be working with kernel parameter "schedule=deadline". Regards, Boris
Dup of BZ 151368 per above testing.
Hi John, Could you please clue me in on the connection between /etc/dev.d/default (Bugzilla bug #151368) and the problem between Oracle and the IBM e326 in my case? Thanks, Boris
Sorry about that, I meant 151368 in novell.bugzilla.com; the patched file is cfq-iosched.c. Switching from the cfq to the deadline I/O scheduler avoids the code that has the problem. Regards, John
Yes, it's being tracked as issuetracker 88208 (private, unfortunately), but there is no Bugzilla entry open for it yet.
John, when you mention the issue with the cfq scheduler, is this the issue in bug #184535? Thanks.
I'll answer my own question in the affirmative and mark this as a duplicate of 184535. Boris, you can try a test kernel from http://people.redhat.com/~jbaron/rhel4/, which already has this fix, so that you can use the default cfq elevator. Thanks. *** This bug has been marked as a duplicate of 184535 ***
Hi Jason, Which kernel do you want me to try, SMP or SMP-DEVEL? If the "devel" one, do I need special kernel parameters to collect more data? Thanks, Boris
Hello, I tried the new kernel with both the cfq and deadline schedulers, and the problem still exists. Regards, Boris
Reopening bug since it doesn't sound like it was a duplicate of the other one after all. Boris, you mentioned trying the test kernel with both the CFQ and deadline schedulers and the problem still exists. Does this mean that you've also seen this when using the deadline scheduler?
Hi Jeffrey, Yes, I tried the 34.5smp kernel with both I/O schedulers (one after another), and in both cases Oracle dbca got stuck at the same point (74%), when the script tries to shut down the database. At that point the alert.log files were slightly different, but in both cases the DBWR process was waiting for something. It is exactly the same behaviour I saw under kernel 11smp (with the cfq scheduler), which Oracle recommended for x86_64 systems running Oracle 10g R2. Thanks, Boris
Novell BZ 151368 seems to be a private case on their site. John, can you provide some details about what that BZ is about?
Sorry, I forgot to mention: 1) I was able to use Oracle and ASM with the deadline I/O scheduler under kernels 11smp and 34smp (official releases). 2) The 11smp and 34.5smp kernels did not work for Oracle on raw devices with the deadline I/O scheduler. Thanks, Boris
Oracle processes hang in I/O during database creation. The sysrq-t stack is:
Feb 11 05:42:56 gemini2 kernel: oracle D 0000010103e05878 0 6528
Call Trace:
<ffffffff8023c2a5>{elv_next_request+238}
<ffffffff802f87b4>{io_schedule+37}
<ffffffff801933a2>{__blockdev_direct_IO+2899}
<ffffffff801563fb>{__generic_file_aio_read+266}
<ffffffff801953b3>{sys_io_getevents+685}
<ffffffff80193f4d>{__aio_run_iocbs+491}
<ffffffff801944a2>{timeout_func+0}
<ffffffff801320ea>{default_wake_function+0}
<ffffffff80172737>{sys_pread64+86}
<ffffffff8011003e>{system_call+126}
It makes no sense that deadline would be broken in 34.5. Could you attach the patch you used to create 34.5? Thanks, John
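Comparing sysrq-t dumps taken under different kernels and schedulers is easier if the frame symbols are extracted mechanically rather than by eye. The Python sketch below is one way to do that. `parse_call_trace` and `Frame` are hypothetical helper names; the sketch assumes the `<address>{symbol+offset}` frame format shown in the trace above, treats the offset as a decimal number, and tolerates a colon-delimited module prefix in case one appears.

```python
import re
from typing import List, NamedTuple

# One "<address>{symbol+offset}" frame, as printed in the x86_64 trace above.
# Assumptions: the address is 16 hex digits, the offset is decimal, and a
# module symbol may carry a ":module:" prefix (hence ":" in the class).
FRAME_RE = re.compile(r"<([0-9a-f]{16})>\{([A-Za-z0-9_.:]+)\+(\d+)\}")


class Frame(NamedTuple):
    address: int
    symbol: str
    offset: int


def parse_call_trace(text: str) -> List[Frame]:
    """Return every frame found in a sysrq-t call trace, in printed order."""
    return [
        Frame(int(addr, 16), sym, int(off))
        for addr, sym, off in FRAME_RE.findall(text)
    ]


if __name__ == "__main__":
    trace = (
        "Call Trace: <ffffffff8023c2a5>{elv_next_request+238} "
        "<ffffffff802f87b4>{io_schedule+37} "
        "<ffffffff801933a2>{__blockdev_direct_IO+2899}"
    )
    for frame in parse_call_trace(trace):
        print(frame.symbol, frame.offset)
```

Diffing the symbol lists from two dumps (for example cfq versus deadline on 34.5smp) shows quickly whether the hang sits in the same code path.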
This might be interesting: in each test (kernels 11smp and 34smp) I saw kernel errors during the database restart phase. However, only the ASM-based databases restarted; the raw-based databases just hung indefinitely. Example kernel errors from the "34smp / Oracle ASM" test:
Mar 21 14:27:07 gemini2 kernel: end_request: I/O error, dev sdb, sector 1552591
Mar 21 14:27:07 gemini2 kernel: end_request: I/O error, dev sdb, sector 1565391
Mar 21 14:27:07 gemini2 kernel: end_request: I/O error, dev sdb, sector 1569871
P.S. I asked Oracle to double-check my config files for the raw test. Sorry if it is my mistake.
P.P.S. I cannot check the deadline scheduler with ASM at the moment because there is no official release for the development kernel. I will try to build it from source. At the moment the only thing I can check is the raw device installation.
Thanks, Boris
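Every test here produced `end_request: I/O error` messages, so it may help to tabulate which devices and sector numbers fail in each run, for instance to check the earlier idea that raw devices touch parts of the disk an ext3 filesystem never reaches. A minimal Python sketch, assuming the syslog line format quoted above; `io_errors_by_device` is a hypothetical helper name, not part of any existing tool.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Matches kernel lines such as:
#   "... kernel: end_request: I/O error, dev sdb, sector 1552591"
END_REQUEST_RE = re.compile(r"end_request: I/O error, dev ([^,\s]+), sector (\d+)")


def io_errors_by_device(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Group failing sector numbers by block device name, in log order."""
    errors: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        match = END_REQUEST_RE.search(line)
        if match:
            errors[match.group(1)].append(int(match.group(2)))
    return dict(errors)


if __name__ == "__main__":
    # Sample lines copied from this report; on a live system one would pass
    # an open file object for /var/log/messages instead.
    sample = [
        "Mar 21 14:27:07 gemini2 kernel: end_request: I/O error, dev sdb, sector 1552591",
        "Mar 21 14:27:07 gemini2 kernel: end_request: I/O error, dev sdb, sector 1565391",
        "Mar 21 14:27:07 gemini2 kernel: end_request: I/O error, dev sdb, sector 1569871",
        "Mar 24 14:49:48 gemini2 kernel: warning: many lost ticks.",
    ]
    for device, sectors in sorted(io_errors_by_device(sample).items()):
        print(f"{device}: {len(sectors)} error(s), sectors {min(sectors)}-{max(sectors)}")
```

Lines that do not match (such as the "many lost ticks" warning) are simply skipped, so the function can be fed a whole log file.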
John, Please note that on Feb 11 I was using the default I/O scheduler (cfq), not deadline. I was only told to try deadline on Mar 10. Thanks, Boris
Boris, please clarify. I don't see why you cannot check deadline and cfq on the same kernel, as they ship with every kernel. I suspect I'm just not understanding you.
Hi Joel, I just updated TAR #4855265.992 with a request to double-check my config files for the "raw" test. I tested both I/O schedulers under each kernel (11smp, 34smp, 34.4smp, 34.5smp). Because the last two kernels are development builds, there is no official ASMlib release for them. I am also concerned that I'm doing the "raw" test incorrectly. I also noticed that ANY test produces kernel "end_request" errors, but ASMlib recovers from them and "raw" does not. Hope this sheds some light on my chaotic thoughts, Boris
> 3) I used the following raw config file:
> system=/dev/raw/raw1
> sysaux=/dev/raw/raw2
> users=/dev/raw/raw3
> temp=/dev/raw/raw4

What do you mean by "raw config file"? The only raw config file I know is /etc/sysconfig/raw, and it doesn't have the above format.
Hi Joel, Sorry for the confusion. This raw config file is used by dbca to create a new database. Its proper name is "Raw devices mapping file", and it is used in step 6 of 12 (Storage Options) of the Oracle 10g R2 dbca utility. It is NOT /etc/sysconfig/rawdevices. Regards, Boris
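The raw devices mapping file quoted above is a list of `tablespace=device` pairs. Since there was some doubt about whether the raw test was configured correctly, a quick sanity check for malformed lines, or for two tablespaces pointing at the same raw device, can rule out one class of mistake. A minimal Python sketch, assuming one `name=path` entry per line; skipping blank and `#` lines is an assumption of this sketch rather than documented dbca behaviour, and `parse_raw_mapping` is a hypothetical name.

```python
from typing import Dict, Iterable


def parse_raw_mapping(lines: Iterable[str]) -> Dict[str, str]:
    """Parse "name=/dev/raw/rawN" entries into a {name: device} dict.

    Assumption: one entry per line; blank lines and lines starting with "#"
    are ignored. Raises ValueError on malformed or duplicate entries.
    """
    mapping: Dict[str, str] = {}
    used_by: Dict[str, str] = {}  # device path -> first name that claimed it
    for lineno, raw_line in enumerate(lines, start=1):
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, device = line.partition("=")
        name, device = name.strip(), device.strip()
        if not sep or not name or not device:
            raise ValueError(f"line {lineno}: expected name=device, got {line!r}")
        if name in mapping:
            raise ValueError(f"line {lineno}: {name!r} is mapped twice")
        if device in used_by:
            raise ValueError(f"line {lineno}: {device} already used by {used_by[device]!r}")
        mapping[name] = device
        used_by[device] = name
    return mapping


if __name__ == "__main__":
    example = """\
system=/dev/raw/raw1
sysaux=/dev/raw/raw2
users=/dev/raw/raw3
temp=/dev/raw/raw4
"""
    for name, device in parse_raw_mapping(example.splitlines()).items():
        print(f"{name:8} -> {device}")
```

Rejecting a reused device is the main point of the check: two tablespaces written to the same raw device would overwrite each other.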
Hello, I tried kernel 2.6.9-34.7.ELsmp with "schedule=deadline" in the "Oracle raw" test. dbca stopped at the usual 74%, but /var/log/messages showed just a single kernel error instead of three:
Mar 24 14:06:11 gemini2 kernel: end_request: I/O error, dev sdc, sector 32136351
I also see the following error from time to time:
Mar 24 14:49:48 gemini2 kernel: warning: many lost ticks.
Mar 24 14:49:48 gemini2 kernel: Your time source seems to be instable or some driver is hogging interupts
Mar 24 14:49:48 gemini2 kernel: rip __do_softirq+0x4d/0xd0
This error appears under all kernels. Thanks, Boris
This request is not planned for inclusion in the next update. The decision is based on weighing the priority and number of requests for a component as well as the impact on the Red Hat Enterprise Linux user base: other components are considered to have higher priority, and the number of changes we intend to include in update cycles is limited.
Product Management has reviewed and declined this request. You may appeal this decision by reopening this request.