Description of problem:
During the tier testing for RHEL5, we've encountered oprofile on ia64 hanging on something like this:

opcontrol --start-daemon --no-vmlinux --verbose 2>&1 | tee $TMPOUTPUT

At first we suspected that oprofile itself was hanging for some reason, but when I looked closer, I saw that it is the 'tee' command which remains sitting there forever, waiting for input, even after opcontrol itself has finished. I then dissected what is done when starting the daemon, and I suspect the oprofile daemon is not reopening its streams after the fork, which causes the 'tee' command to run forever - the write end of the pipe never gets closed, and thus tee cannot find out that it should exit. If I run 'opcontrol --shutdown' from another terminal, the command in the first one ends - this supports the theory. The only weird fact is that we observe this behavior on ia64 only, while I would expect it to be independent of the platform :/

Version-Release number of selected component (if applicable):
oprofile-0.9.4-11.el5
but even the old (RHEL5.3) oprofile behaves like this

How reproducible:
always

Steps to Reproduce:
1. opcontrol --deinit; opcontrol --init; opcontrol --start-daemon | tee log
2. wait indefinitely
3. from another terminal, run 'opcontrol --shutdown', and see the command finish

Actual results:
tee keeps waiting for more input, even after the opcontrol script has ended

Expected results:
like other daemons - tee logs just the output of the launching script and ends with it, not logging any output from the daemon itself

Additional info:
I attempted to reproduce the problem on an ia64 machine following the steps above, but was unable to replicate the behavior. This machine is subscribed to RHN and has RHEL 5.3 on it, with the exception of a newer kernel and the listed oprofile rpm. I am setting up a Red Hat test machine with a clean version of RHEL5.4. What kernel version was used for the testing (uname -a)?
I tried a fresh install of RHEL5.4-Server-20090819.0 on a Red Hat ia64 test system and was still unable to recreate the hang using the steps listed. The machine had the following rpms:
kernel-2.6.18-164.el5
oprofile-0.9.4-11.el5

What was the original shell script that produced the problem on ia64? Was there something in there that intercepted signals?
I can reproduce the issue without any problem on RHEL5.3 and RHEL5.4 testing boxes by simply running

# opcontrol --deinit; opcontrol --init; opcontrol --start-daemon | tee log

as stated above. I did the original investigation on a RHEL5.4 box, so it is weird that you cannot reproduce it.

The versions:
oprofile-0.9.4-11.el5.ia64 on both:
2.6.18-128.el5 (rhel5.3)
2.6.18-162.el5xen (some rhel5.4 candidate)

The original script showing the behavior was the runtest.sh of the /tools/oprofile/Sanity/opcontrol-options RHTS test, at line 189, doing

opcontrol --start-daemon --no-vmlinux --verbose 2>&1 | tee $TMPOUTPUT

There is nothing signal-interfering that I know about.
Sample output of the reproducing line:

# opcontrol --deinit; opcontrol --init; opcontrol --start-daemon --verbose | tee log
Stopping profiling.
Killing daemon.
Unloading oprofile module
Parameters used:
SESSION_DIR /var/lib/oprofile
LOCK_FILE /var/lib/oprofile/lock
SAMPLES_DIR /var/lib/oprofile/samples
CURRENT_SAMPLES_DIR /var/lib/oprofile/samples/current
CPUTYPE ia64/itanium2
BUF_SIZE 500
BUF_WATERSHED 250
CPU_BUF_SIZE 1000
SEPARATE_LIB 0
SEPARATE_KERNEL 0
SEPARATE_THREAD 0
SEPARATE_CPU 0
CALLGRAPH 0
VMLINUX none
KERNEL_RANGE
XENIMAGE none
XEN_RANGE
executing oprofiled --session-dir=/var/lib/oprofile --separate-lib=0 --separate-kernel=0 --separate-thread=0 --separate-cpu=0 --events=CPU_CYCLES:18:0:150000:0:1:1, --no-vmlinux --verbose=all
Events: CPU_CYCLES:18:0:150000:0:1:1,
Using 2.6+ OProfile kernel interface.
Running perfmon child on CPU0.
Events: CPU_CYCLES:18:0:150000:0:1:1,
Using 2.6+ OProfile kernel interface.
Waiting on CPU0
Perfmon child up on CPU0
Daemon started.
(... sitting here until ctrl-c or something ...)
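The hang is consistent with ordinary pipe semantics: a reader such as tee sees end-of-file only once every process holding the write end of the pipe has closed it. The sketch below reproduces that effect without oprofile, as an illustration only. It assumes a POSIX `sh` and `sleep` are available; `sleep 2` is a stand-in for a long-lived daemon child, and the helper name `read_until_eof` is hypothetical.

```python
import subprocess
import time


def read_until_eof(shell_cmd):
    """Run shell_cmd with stdout on a pipe; return (output, seconds until EOF)."""
    start = time.monotonic()
    proc = subprocess.Popen(["sh", "-c", shell_cmd], stdout=subprocess.PIPE)
    # read() returns only when *all* holders of the pipe's write end have
    # closed it, not merely when the launching shell has exited.
    output = proc.stdout.read()
    proc.wait()
    return output.decode(), time.monotonic() - start


# The launcher exits at once, but the background child inherits stdout
# (the pipe) and keeps it open for 2 seconds.
LEAKY = "sleep 2 & echo 'Daemon started.'"

# Same launcher, but the background child is detached from the pipe.
DETACHED = "sleep 2 </dev/null >/dev/null 2>&1 & echo 'Daemon started.'"

if __name__ == "__main__":
    for name, cmd in (("leaky", LEAKY), ("detached", DETACHED)):
        out, secs = read_until_eof(cmd)
        print(f"{name}: got {out!r}, EOF after {secs:.1f}s")
```

In the leaky case the reader gets the launcher's output immediately but stays blocked until the background child exits, which matches tee returning only after 'opcontrol --shutdown'.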
Created attachment 359410
Disconnect children running perfmon from stdin/stdout

I originally misunderstood the desired behavior. I compared the ia64 behavior with x86_64 and found out how the "--start-daemon" option was supposed to behave. When ia64 oprofiled starts up, it creates child processes to run perfmon. These child processes still have file descriptors open for stdin, stdout, and stderr. The attached patch closes those file descriptors to allow the tee operation to continue. This patch is not in its final state, but it shows what is going wrong on ia64 and the basic approach to fixing it.
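The approach described above can be sketched at the fork level. This is not the attached patch, whose contents are not shown here; it is a minimal illustration that assumes a POSIX system, and the names `spawn_worker` and `seconds_until_eof` are hypothetical. The worker stands in for a perfmon child, and it points descriptors 0, 1 and 2 at /dev/null rather than simply closing them, a common variant that keeps those descriptor numbers from being reused.

```python
import os
import time


def spawn_worker(detach, seconds=2.0):
    """Fork a long-lived child (hypothetical stand-in for a perfmon child)."""
    pid = os.fork()
    if pid == 0:
        if detach:
            # Drop the inherited stdin/stdout/stderr by pointing them
            # at /dev/null, so the child no longer holds the pipe open.
            devnull = os.open(os.devnull, os.O_RDWR)
            for fd in (0, 1, 2):
                os.dup2(devnull, fd)
            if devnull > 2:
                os.close(devnull)
        time.sleep(seconds)
        os._exit(0)
    return pid


def seconds_until_eof(detach):
    """Run a launcher whose stdout is a pipe; time how long until EOF."""
    read_fd, write_fd = os.pipe()
    start = time.monotonic()
    launcher = os.fork()
    if launcher == 0:
        os.close(read_fd)
        os.dup2(write_fd, 1)          # launcher's stdout is the pipe
        if write_fd != 1:
            os.close(write_fd)
        spawn_worker(detach)          # worker inherits fd 1 unless detached
        os.write(1, b"Daemon started.\n")
        os._exit(0)                   # launcher exits right away
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:                 # EOF: every write end is closed
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(launcher, 0)
    return b"".join(chunks), time.monotonic() - start


if __name__ == "__main__":
    for detach in (False, True):
        out, secs = seconds_until_eof(detach)
        print(f"detach={detach}: {out!r}, EOF after {secs:.1f}s")
```

With `detach=False` the reader stays blocked for as long as the worker lives, even though the launcher has already exited; with `detach=True` it returns as soon as the launcher exits.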
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0283.html