Description of problem:
-----------------------
On our system running Linux Advanced Server 4.6, the warm-boot test hangs from time to time. We probed the system with an Intel In-Target Probe (ITP) and checked the addresses against System.map-2.6.9-67.ELsmp. We found that the CPU was stuck in an endless loop in __delay() (linux-2.6.9-final/arch/x86_64/lib/delay.c). The kernel source code is:

    void __delay(unsigned long loops)
    {
        unsigned long bclock, now;

        rdtscl(bclock);
        do {
            rep_nop();
            rdtscl(now);
        } while ((now - bclock) < loops);
    }

The corresponding assembly code is:

        rdtsc
        mov  rcx, rax
    loop:
        pause
        rdtsc
        sub  rax, rcx
        cmp  rax, rdi
        jb   loop
        ret

This code may fail when the TSC value wraps around. For example, suppose rcx (bclock) is initially 0xfffffffffffffffe and the subsequent values of rax (now) are 3, 15, 27, and so on. In that case the system may hang in __delay().

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
Linux kernel version: 2.6.9-67

How reproducible:
-----------------
Repeat warm boots from a cron job.

Steps to Reproduce:
1. Add "*/5 * * * * date > reboot.log; /sbin/reboot" to the crontab.
Created attachment 312012: Intel In-Target Probe snapshot
Cheng, please attach a sysreport from the system. Thanks, P.
I came up with a proposed patch. While testing it, I came across a similar issue that appears to have been resolved upstream: __delay() can be suspended and then resumed on another processor. When this happens, bclock and now are read from different CPUs' TSCs. Their difference is then bogus, and __delay() misbehaves. I'll submit a patch for both issues. P.
The more I look at this issue, the more I agree that this is a bug, but I wonder whether it is really the issue the reporter is hitting. The TSC is a 64-bit counter tied to the CPU frequency. For simplicity, assume that the CPU frequency is 2.0 GHz, or roughly 2^31 Hz. The TSC then wraps every 2^64 / 2^31 = 2^33 seconds, which is about 8.6 billion seconds. AFAICT, that is roughly 2.4 million hours, ~100,000 days, or about 270 years. (If I have my math right.) I suppose that quantatw could have run a system this long ;). IMO, it is much more likely that quantatw ran into the same strange issue I did: the __delay was suspended and restarted on another CPU. P.
Marking as NOTABUG. P.
1. We modified __delay() based on linux-2.6.26/arch/x86/lib/delay_64.c. Since then, the system has passed warm-boot testing for more than 5 days. Before the change, it hung every 2~3 days of warm-boot testing. The code is listed below for convenience:

    void __delay(unsigned long loops)
    {
        unsigned bclock, now;
        int cpu;

        preempt_disable();
        cpu = smp_processor_id();
        rdtscl(bclock);
        for (;;) {
            rdtscl(now);
            if ((now - bclock) >= loops)
                break;

            /* Allow RT tasks to run */
            preempt_enable();
            rep_nop();
            preempt_disable();

            /*
             * It is possible that we moved to another CPU, and
             * since TSC's are per-cpu we need to calculate
             * that. The delay must guarantee that we wait "at
             * least" the amount of time. Being moved to another
             * CPU could make the wait longer but we just need to
             * make sure we waited long enough. Rebalance the
             * counter for this CPU.
             */
            if (unlikely(cpu != smp_processor_id())) {
                loops -= (now - bclock);
                cpu = smp_processor_id();
                rdtscl(bclock);
            }
        }
        preempt_enable();
    }

2. All of the server machines under development are scheduled for other tests, so I am sorry that I could not gather a sysreport.
Fred, are you saying that you are hitting the issue described in comment #4, i.e. that switching between CPUs is causing your problem? I'm confused, because your initial bug report implies that you thought you had a TSC overflow issue. P.
In the beginning, we guessed that the problem was due to TSC wraparound. After reproducing and investigating the bug, we switched to the direction described in http://www.chineselinuxuniversity.net/articles/12762.shtml. We then modified __delay() and verified the change.

PS. Probing with the ITP shows that the BSP is in __delay() and the other three APs are all in smp_really_stop_cpu(). In principle, therefore, the other processors will not restart __delay():

    void smp_stop_cpu(void)
    {
        /*
         * Remove this CPU:
         */
        cpu_clear(smp_processor_id(), cpu_online_map);
        local_irq_disable();
        disable_local_APIC();
        local_irq_enable();
    }

    static void smp_really_stop_cpu(void *dummy)
    {
        smp_stop_cpu();
        for (;;)
            asm("hlt");
    }
Fred, AFAICT, in order for this to happen, CONFIG_PREEMPT must be on in the .config -- it isn't in RHEL5. So I suspect that there is something else going on. Could you attach your test program to this BZ? I'll run the test to see if I can hit the issue. P.
Hi Prarit,

The OS version in question is Red Hat AS 4 Update 6, not RHEL5. I checked the system files, and CONFIG_PREEMPT is off in .config. Our test procedure runs via crontab:

    */5 * * * * echo "reboot test"; date > reboot.log; /sbin/reboot

BTW, on another of our projects (a different hardware architecture), SLES 10 also hangs in __delay() after about 9 days of warm-boot tests.
This seems like a BIOS issue. The handoff back to the firmware (when leaving the OS during a reboot) seems incomplete or broken. As a result, the hardware may not be re-initialized properly for the next boot. Can we try some different reboot flags to see whether one of them triggers a proper hardware reset during reboot? I'm guessing you want the 'cold' flag, since the warm reboot hangs. Boot with each of the following (one at a time), then reboot and see whether it hangs:

    reboot=hard,cold
    reboot=triple,cold
    reboot=bios,cold
    reboot=kbd,cold

For reference, here are all the possible flags for RHEL4 (for experimentation purposes):

    /* reboot=b[ios] | t[riple] | k[bd] [, [w]arm | [c]old] | [a]cpi
       bios    Use the CPU reboot vector for warm reset
       warm    Don't set the cold reboot flag
       cold    Set the cold reboot flag
       triple  Force a triple fault (init)
       kbd     Use the keyboard controller. cold reset (default)
       acpi    Use the ACPI reset mechanism defined in the FADT
    */