Description of problem:
I have traced a problem to the netconsole driver: "crash" claims that a matching System.map/vmlinux and a netdump-generated vmcore are mismatched. In brief, the pages containing the symbols "smp_num_cpus" and "linux_banner", along with other pages, were not being retrieved properly, so the contents of these locations were invalid.

I see two reasons why such pages are not retrieved properly:

1) The code assumes that the physical-to-virtual address translation is the identity function. The netdump server finds the number of physical pages and the page size, then requests each physical page by specifying a page number in the interval [0, number of pages - 1]. No attempt is made to return the kernel virtual address (kva) given by page_address(page). At a minimum, the kva should be returned (possibly in the "info" field). Better still would be to return the section mapping information, including the "flags" field, so that the server can reflect this information in the vmcore file.

2) The code assumes that if PageReserved(page) is true, the page should be returned zero-filled. In fact, this flag is typically set for pinned pages. When the flag is set, the right thing to do is not to map the page but to access the data directly through the kva (after checking that the kva is not NULL).

These problems could be at least partially addressed with minor changes, but a proper fix would probably involve slight changes to the wire-level protocol.

It would also be good to detect free pages of physical memory (pages that are neither reserved nor referenced, i.e. !PageReserved(page) && page_count(page) == 0) and not bother returning them. empty_zero_page could also be treated as a special case, though this wouldn't gain much. Ultimately, the best approach would probably be to find a better implementation of the equivalent logic elsewhere and make sure netdump handles things at least as well.
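The page-handling policy proposed above can be modelled outside the kernel. This is a minimal user-space sketch, not netdump code: `struct sim_page`, its `reserved`/`count`/`kva` fields, `enum dump_action` and `classify_page()` are hypothetical stand-ins for `PageReserved(page)`, the page reference count and `page_address(page)`. It assumes a page is free when it is neither reserved nor referenced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of struct page that matter here. */
struct sim_page {
	int reserved;   /* models PageReserved(page) */
	int count;      /* models the page reference count */
	void *kva;      /* models page_address(page); NULL if not mapped */
};

/* What the dumper should do with a requested page. */
enum dump_action {
	DUMP_SKIP,      /* free page: not worth sending */
	DUMP_DIRECT,    /* reserved and mapped: read through its kva */
	DUMP_ZERO,      /* reserved but no kva: send the zero page instead */
	DUMP_KMAP       /* ordinary page: map it temporarily, then read */
};

enum dump_action classify_page(const struct sim_page *page)
{
	assert(page != NULL);
	if (page->reserved) {
		/* Reserved pages must not be blindly zero-filled. */
		return page->kva != NULL ? DUMP_DIRECT : DUMP_ZERO;
	}
	if (page->count == 0)
		return DUMP_SKIP;   /* assumption: unreserved + unreferenced == free */
	return DUMP_KMAP;
}
```

With this classification, a reserved page that has a valid kva is read directly instead of being replaced by zeroes. Dropping the `DUMP_SKIP` branch gives the policy argued for later in the report, where freed pages are still dumped.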
Here is a patch that takes care of the problem with reserved pages:

```c
static void send_netdump_mem(struct net_device *dev, req_t *req)
{
	int i;
	char *kaddr;
	char str[1024];
	struct page *page;
	unsigned long nr = req->from;
	int nr_chunks = PAGE_SIZE / 1024;
	reply_t reply;
	int mapped = 0;		/* set if kaddr came from kmap_atomic() */

	reply.nr = req->nr;
	reply.info = 0;

	/* Reject requests for page numbers beyond the end of mem_map. */
	if (req->from >= max_mapnr) {
		sprintf(str, "page %08lx is bigger than max page # %08lx!\n",
			nr, max_mapnr);
		reply.code = REPLY_ERROR;
		send_netdump_skb(dev, str, strlen(str), &reply);
		return;
	}

	page = mem_map + nr;
	if (PageReserved(page)) {
		/* Reserved pages are normally permanently mapped: read them
		 * through their kva instead of zero-filling them. */
		kaddr = (char *)page_address(page);
		if (kaddr == NULL) {
			/* No kernel mapping: fall back to sending the zero page. */
			page = ZERO_PAGE(0);
			kaddr = (char *)kmap_atomic(page, KM_NETDUMP);
			mapped = 1;
		}
	} else {
		/* Ordinary page: map it temporarily. */
		kaddr = (char *)kmap_atomic(page, KM_NETDUMP);
		mapped = 1;
	}

	/* Send the page in 1 KiB chunks; reply.info carries the chunk offset. */
	for (i = 0; i < nr_chunks; i++) {
		unsigned int offset = i * 1024;

		reply.code = REPLY_MEM;
		reply.info = offset;
		send_netdump_skb(dev, kaddr + offset, 1024, &reply);
	}

	if (mapped)
		kunmap_atomic(kaddr, KM_NETDUMP);
}
```

Version-Release number of selected component (if applicable): netdump-0.6.8-2

How reproducible: every time

Steps to Reproduce:
1. Trigger netdump.
2. Run crash against the resulting vmcore.

Actual results: crash fails to initialize.

Expected results: crash is able to use the vmcore.

Additional info:
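The patch sends each page as `PAGE_SIZE/1024` replies and records each chunk's byte offset in `reply.info`, so a receiver can rebuild the page by copying every chunk to that offset. The following user-space sketch simulates both sides; `SIM_PAGE_SIZE` (assumed to be 4096), `struct sim_reply`, `send_page()` and `store_chunk()` are hypothetical and are not part of the netdump client or server.

```c
#include <assert.h>
#include <string.h>

#define SIM_PAGE_SIZE 4096   /* assumption: 4 KiB pages */
#define CHUNK_SIZE    1024   /* chunk size used by send_netdump_mem() */

/* Hypothetical reply: page number, chunk offset and payload. */
struct sim_reply {
	unsigned long nr;
	unsigned int info;                  /* byte offset of this chunk */
	unsigned char data[CHUNK_SIZE];
};

/* Receiver side: place one chunk at its offset within the page buffer. */
void store_chunk(unsigned char *page_buf, const struct sim_reply *r)
{
	assert(r->info % CHUNK_SIZE == 0);
	assert(r->info + CHUNK_SIZE <= SIM_PAGE_SIZE);
	memcpy(page_buf + r->info, r->data, CHUNK_SIZE);
}

/* Sender side: mirrors the patch's loop, one reply per 1 KiB chunk.
 * Replies are handed straight to store_chunk(); a real network link
 * could reorder or drop them. Returns the number of chunks sent. */
int send_page(unsigned long nr, const unsigned char *kaddr,
	      unsigned char *page_buf)
{
	int nr_chunks = SIM_PAGE_SIZE / CHUNK_SIZE;
	struct sim_reply reply;
	int i;

	reply.nr = nr;
	for (i = 0; i < nr_chunks; i++) {
		unsigned int offset = i * CHUNK_SIZE;

		reply.info = offset;
		memcpy(reply.data, kaddr + offset, CHUNK_SIZE);
		store_chunk(page_buf, &reply);
	}
	return nr_chunks;
}
```

Because the offset travels with each chunk, the receiver does not depend on arrival order within a page; a real server would also use `nr` to decide where the page goes in the vmcore.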
We've solved this for the next product. The solution is to determine whether each page is RAM-backed and to always dump RAM-backed pages. (Device-side pages are dangerous to dump: even reads can cause side effects.) I think we should still dump freed pages. For certain types of bugs, such as use-after-free crashes, they can be useful too, and it is generally possible to determine what a page was last used for by looking at its contents.
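The comment above does not say how RAM-backed pages are identified. One common approach is to check the page frame number against the firmware's list of RAM ranges (on x86, the BIOS e820 map). The sketch below assumes such a list is available; `struct ram_range` and `pfn_is_ram()` are hypothetical names, not the implementation used in the later product.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical RAM range in page frame numbers, half-open: [start, end). */
struct ram_range {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/* Return 1 if pfn lies inside any RAM range, 0 otherwise.
 * A caller would dump every page for which this returns 1, free or not,
 * and skip the rest, since even reads of device memory can have side
 * effects. */
int pfn_is_ram(unsigned long pfn, const struct ram_range *ranges, size_t n)
{
	size_t i;

	assert(ranges != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (pfn >= ranges[i].start_pfn && pfn < ranges[i].end_pfn)
			return 1;
	}
	return 0;
}
```

For example, with 4 KiB pages, ranges `{0, 160}` and `{256, 32768}` describe RAM below 640 KiB and from 1 MiB to 128 MiB, leaving the legacy 640 KiB to 1 MiB hole undumped.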