Bug 91045

Summary: netconsole driver mishandles virtual memory page descriptors
Product: [Retired] Red Hat Linux
Component: kernel
Version: 9
Hardware: All
OS: Linux
Status: CLOSED NEXTRELEASE
Severity: medium
Priority: medium
Reporter: Allen Nuttle <anuttle>
Assignee: Ingo Molnar <mingo>
CC: anderson
Doc Type: Bug Fix
Last Closed: 2003-07-05 07:59:40 UTC

Description Allen Nuttle 2003-05-16 19:13:51 UTC
Description of problem:

I have traced a problem, in which "crash" claims that a matched
System.map/vmlinux and a netdump-generated vmcore are mismatched, to
bugs in the netconsole driver.

In brief, the pages containing the symbols "smp_num_cpus" and
"linux_banner" were not being properly retrieved, along with other
pages.  This caused the contents of these locations to be invalid.

There are two reasons I see for not properly retrieving such pages:

  1) The code assumes that the physical to virtual address
      translation uses the identity function; in other words, the
      netdump server finds the number of physical pages and the page
      size, then requests each physical page by specifying a page
      number in the interval [0, number of pages - 1].  There is
      no attempt to return the kernel virtual address (kva), as given
      by "page_address(page)".  At a minimum, the kva should be
      returned (possibly in the "info" field); better still would be
      to return the section mapping information, including the
      "flags" field, so that the server is able to reflect this
      information in the vmcore file.

  2) The code assumes that if "PageReserved(page)" is true, the page
      to be returned is to be zero filled.  In practice this flag is
      typically set for pinned pages; the right thing to do when it
      is set is to not map the page but to use the kva directly to
      access the data (after checking for a NULL kva).
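
As a rough user-space illustration of point 1, the sketch below models how
a page frame number relates to a kernel virtual address under a linear
direct mapping.  The constants assume the default i386 3G/1G split
(PAGE_OFFSET of 0xC0000000, 4 KB pages, 896 MB of directly mapped memory);
these values and all model_* names are assumptions made for illustration
and are not taken from the netdump sources.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed i386 defaults; not taken from the report. */
#define MODEL_PAGE_SHIFT   12
#define MODEL_PAGE_OFFSET  0xC0000000u  /* start of the kernel direct mapping */
#define MODEL_LOWMEM_PAGES 0x38000u     /* 896 MB worth of 4 KB pages */

/*
 * Hypothetical helper: kernel virtual address of a page frame, or 0 if
 * the frame lies outside the direct mapping (highmem in this model).
 */
static uint32_t model_page_kva(uint32_t pfn)
{
	if (pfn >= MODEL_LOWMEM_PAGES)
		return 0;
	return MODEL_PAGE_OFFSET + (pfn << MODEL_PAGE_SHIFT);
}

/* Hypothetical inverse: page frame that holds a direct-mapped kva. */
static uint32_t model_kva_to_pfn(uint32_t kva)
{
	assert(kva >= MODEL_PAGE_OFFSET);
	return (kva - MODEL_PAGE_OFFSET) >> MODEL_PAGE_SHIFT;
}
```

Under these assumptions the page number and the kva are never equal: page
frame 0x300 answers to kva 0xC0300000, and frames beyond the direct mapping
have no permanent kva at all.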

These problems could be at least partially addressed with minor
changes, but a proper fix would probably involve slight changes to
the wire-level protocol.  It would be good to detect free pages of
physical memory and to not bother returning these (!page->reserved
&& page->count != 0).  Also, empty_zero_page could be treated as a
special case, though this wouldn't come to much.
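
To make the idea of a protocol change concrete, here is a minimal
user-space sketch of a reply header that carries the kva in its own field
instead of overloading "info".  The field names code, nr and info mirror
those used by the patch in this report; the kva field, the 13-byte layout,
the big-endian encoding and the model_* names are all assumptions, and the
report does not say how the real reply_t is laid out on the wire.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extended reply; kva is the assumed new field. */
struct model_reply {
	uint8_t  code;	/* reply type */
	uint32_t nr;	/* request number being answered */
	uint32_t info;	/* offset of this chunk within the page */
	uint32_t kva;	/* kernel virtual address of the page, 0 if none */
};

#define MODEL_REPLY_LEN 13	/* 1 + 4 + 4 + 4 bytes on the wire */

/* Store a 32-bit value in network (big-endian) byte order. */
static void model_put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Serialize the header into buf; returns the number of bytes written. */
static size_t model_pack_reply(uint8_t *buf, const struct model_reply *r)
{
	assert(buf != NULL && r != NULL);
	buf[0] = r->code;
	model_put_be32(buf + 1, r->nr);
	model_put_be32(buf + 5, r->info);
	model_put_be32(buf + 9, r->kva);
	return MODEL_REPLY_LEN;
}
```

Packing field by field, rather than copying the struct, keeps the wire
format independent of compiler padding and host byte order.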

In the end, the best thing to do would probably be to find a better
example of another implementation of the equivalent logic and to be
certain netdump handles things at least as well.

Here is a patch that takes care of the problem with reserved pages:


static void send_netdump_mem (struct net_device *dev, req_t *req)
{
	int i;
	char *kaddr;
	char str[1024];
	struct page *page;
	unsigned long nr = req->from;
	int nr_chunks = PAGE_SIZE/1024;
	reply_t reply;
	int mapped = 0;	/* nonzero if kaddr came from kmap_atomic() */

	reply.nr = req->nr;
	reply.info = 0;
	if (nr >= max_mapnr) {
		sprintf(str, "page %08lx is bigger than max page # %08lx!\n",
			nr, max_mapnr);
		reply.code = REPLY_ERROR;
		send_netdump_skb(dev, str, strlen(str), &reply);
		return;
	}
	page = mem_map + nr;
	if (PageReserved(page)) {
		/* Reserved page: read it through its existing kva. */
		kaddr = page_address(page);
		if (kaddr == NULL) {
			/* No kva available: fall back to the zero page. */
			page = ZERO_PAGE(0);
			kaddr = (char *)kmap_atomic(page, KM_NETDUMP);
			mapped = 1;
		}
	} else {
		kaddr = (char *)kmap_atomic(page, KM_NETDUMP);
		mapped = 1;
	}

	/* Send the page in 1024-byte chunks; info carries the offset. */
	for (i = 0; i < nr_chunks; i++) {
		unsigned int offset = i*1024;
		reply.code = REPLY_MEM;
		reply.info = offset;
		send_netdump_skb(dev, kaddr + offset, 1024, &reply);
	}

	/* Only undo the mapping if this function created it. */
	if (mapped)
		kunmap_atomic(kaddr, KM_NETDUMP);
}


Version-Release number of selected component (if applicable):

netdump-0.6.8-2


How reproducible:

every time


Steps to Reproduce:

1. trigger netdump
2. run crash against the resulting vmcore


Actual results:

crash fails to initialize


Expected results:

crash is able to make use of vmcore


Additional info:

Comment 1 Ingo Molnar 2003-07-05 07:59:40 UTC
We've solved this for the next product. The solution is to determine whether the
page is RAM backed or not, and to always dump RAM-backed pages. (Device-side
pages are dangerous to dump: even reads can cause side effects.)

I think we should still dump freed pages. For certain types of bugs they can be
useful too, e.g. use-after-free crashes, and it's generally possible to
determine the purpose a page was last used for by looking at its contents.
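
One way to picture the RAM-backed check is a lookup of the page frame
number in a table of known RAM ranges.  The sketch below is a user-space
model only: the range table, its contents and the model_* names are
hypothetical, and the comment does not say how the actual fix makes this
decision.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one RAM region, in page frame numbers. */
struct model_ram_range {
	uint32_t start_pfn;	/* first frame in the range */
	uint32_t end_pfn;	/* one past the last frame */
};

/*
 * Return 1 if pfn falls inside any RAM range in map, 0 otherwise.
 * Frames that are not RAM (device windows, holes) would be skipped
 * rather than read, since reading them may have side effects.
 */
static int model_pfn_is_ram(const struct model_ram_range *map, size_t n,
			    uint32_t pfn)
{
	size_t i;

	assert(map != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (pfn >= map[i].start_pfn && pfn < map[i].end_pfn)
			return 1;
	}
	return 0;
}
```

Note that the check says nothing about whether a page is free or in use,
which fits the point that freed pages should still be dumped.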