From Bugzilla Helper: User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040113 Description of problem: There is a lock ordering problem in the 2.4.9 series of kernels that can deadlock a system under heavy swap activity. In several crash dump images obtained, we found the following two processes deadlocked in slight variations on the following scenario: Process 1: ... do_exit -> __exit_mm -> mmput -> exit_mmap -> zap_page_range -> zap_pmd_range -> zap_pte_range -> tlb_remove_page -> __free_pte -> free_page_and_swap_cache -> delete_from_swap_cache_nolock ** swapper_space page cached locked via PAGECACHE_LOCK(page) ** -> __delete_from_swap_cache -> swap_free ** swap_free trying to get swaplock via swap_info_get ** Process 2: ... do_exit -> __exit_mm -> mmput -> exit_mmap -> zap_page_range -> zap_pmd_range -> zap_pte_range -> free_swap_and_cache ** swaplock locked via swap_info_get ** -> find_trylock_page ** find_trylock_page trying to lock swapper_space page cache lock via __PAGECACHE_LOCK ** This lock order mismatch appears to have been introduced with the greedyswap patch. Version-Release number of selected component (if applicable): kernel-2.4.9-e.48smp How reproducible: Didn't try Steps to Reproduce: This has not been reproduced yet in a controlled environment. The deadlocks have been seen intermittently on various systems (the install base we were working with is several thousand systems). In the systems analyzed, there appear to be a lot of processes exiting. Additional info:
hmmmm....find_trylock_page does: if (!spin_trylock(__PAGECACHE_LOCK(mapping, offset))) return NULL; So yes the locks are taken in reverse order, but it shoulnd't be a deadlock the way i read this code.
Jason, any code path that takes the locks in reverse order must release BOTH locks when the trylock fails, otherwise the code path taking locks in the forward order might not make progress... Could it be there's a code path left where we trylock, but forget to release the outer lock (inner lock in the "forward locking" case) ?
not that i can see, if the trylock fails we immediately drop the 'other' lock: mm/swapfile.c:326 if (swap_entry_free(p, SWP_OFFSET(entry)) == 1) page = find_trylock_page(&swapper_space, entry.val); swap_info_put(p);
Good catch. I can't believe that I missed that change. To be honest, the customer's systems that I have been chasing the original deadlocks on are running a much older kernel. This bug report was simply based on reviewing the latest release to see if this problem had already been seen and/or fixed. Since it appeared that the delete_from_swap_cache_nolock routine had the order backwards (at least with respect to newer kernels), I had been focusing on that code path, and totally missed that the trylock was added in the swapout patch.
That trylock patch has always been around...that code hasn't changed in rhel2.1
I believe that the kernel I was debugging was based on RH 7.1. I was reviewing the e.48 code, and thought that I was seeing the same changes (that's what I get for keeping too many windows open at once ;-)). Since I could not find any old references to this problem, I decided to file this bug report to warn you of the potential. Worse case, an obvious oversight on my part would be exposed (which is indeed what happened), but the fix would then be documented for other poor souls like me that have to debug moldy old code ;-). Thank you for the sanity check. This bug report should definitely be closed.
ok, thanks for the explanation....closing