Description of problem:
The dlm recovery code calls schedule() in many lock loops to prevent:
1) the softlockup watchdog from going off
2) openais cluster membership messages from being delayed past the configured timeout

We want to investigate:
- Will cond_resched() work as well as schedule(), and more efficiently? (I expect so.)
- Exactly which loops are taking so long (the watchdog threshold is 10 sec), and why? Are there really that many locks, and/or are we doing so much work on each one that a loop can take 10 sec?
- Why does this appear regularly on ia64 and only rarely on other architectures?

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
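To make the expected efficiency difference concrete, here is a small user-space simulation. It is not dlm or kernel code. need_resched_flag, fake_schedule(), fake_cond_resched() and simulate() are hypothetical stand-ins. The only idea taken from the kernel is that cond_resched() enters the scheduler only when a reschedule has been requested, while schedule() enters it on every call. The "timer tick" that requests a reschedule every `tick` iterations is likewise an assumption made for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's "reschedule needed" flag. */
static bool need_resched_flag;
/* Counts how many times the simulated loop entered the scheduler. */
static unsigned long scheduler_entries;

/* Stand-in for schedule(): always enters the scheduler. */
static void fake_schedule(void)
{
    scheduler_entries++;
    need_resched_flag = false;  /* entering the scheduler satisfies any pending request */
}

/* Stand-in for cond_resched(): enters the scheduler only if asked to. */
static void fake_cond_resched(void)
{
    if (need_resched_flag)
        fake_schedule();
}

/*
 * Walk nlocks simulated locks, yielding once per lock. Every `tick`
 * iterations we pretend the timer tick requested a reschedule.
 * Returns the number of scheduler entries the loop made.
 */
static unsigned long simulate(unsigned long nlocks, unsigned long tick,
                              bool conditional)
{
    need_resched_flag = false;
    scheduler_entries = 0;

    for (unsigned long i = 1; i <= nlocks; i++) {
        /* ... per-lock recovery work would happen here ... */
        if (tick != 0 && i % tick == 0)
            need_resched_flag = true;

        if (conditional)
            fake_cond_resched();
        else
            fake_schedule();
    }
    return scheduler_entries;
}
```

Under these assumptions, simulate(1000, 100, false) enters the scheduler 1000 times (once per lock), while simulate(1000, 100, true) enters it only 10 times (once per simulated tick). Both variants still yield within a bounded number of iterations after a reschedule is requested, which is what keeps the watchdog quiet.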
Dean, do you still get softlockups on ia64?