Description of problem: Description of problem: * RHEL5.3-Server-20081008.nightly * Error appears through automate RHTS testing * Behavior appears s390 dependent The install of the latest RHTEL5.3 tree fails. It fails during, or immediately after formatting the DASDs. Version-Release number of selected component (if applicable): RHEL5.3-Server-20081008.nightly How reproducible: always Steps to Reproduce: 1. Try a nfs install from pxe with a ks.cfg file. Actual results: console-zseries: Logging into the hosting VM system to restart the VM guest console-zseries: logging in ... sending username LOGON AF ENTER PASSWORD (IT WILL NOT APPEAR WHEN TYPED): console-zseries: pasword prompted ... console-zseries: sending password ... console-zseries: Password authorized 00: z/VM Version 5 Release 2.0, Service Level 0501 (64-bit), 00: built on IBM Virtualization Technology 00: There is no logmsg data 00: FILES: 0003 RDR, 0001 PRT, NO PUN 00: RECONNECTED AT 16:01:06 EDT THURSDAY 10/09/08 console-zseries: Connected to VM 226 File send OK. 9266960 bytes transferred in 2.773 seconds. Transfer rate 3341.85 Kbytes/sec. 00: 0000003 FILES PURGED 00: RDR FILE 0069 SENT FROM AF PUN WAS 0069 RECS 048K CPY 001 A NOHOLD NO KEEP 00: RDR FILE 0070 SENT FROM AF PUN WAS 0070 RECS 0004 CPY 001 A NOHOLD NO KEEP 00: RDR FILE 0071 SENT FROM AF PUN WAS 0071 RECS 116K CPY 001 A NOHOLD NO KEEP 00: 0000003 FILES CHANGED audit: initializing netlink socket (disabled) type=2000 audit(1223582480.627:1): initialized Total HugeTLB memory allocated, 0 VFS: Disk quotas dquot_6.5.1 Dquot-cache hash table entries: 512 (order 0, 4096 bytes) Initializing Cryptographic API alg: No test for crc32c (crc32c-generic) ksign: Installing public key data Loading keyring io scheduler noop registered io scheduler anticipatory registered io scheduler deadline registered io scheduler cfq registered (default) RAMDISK driver initialized: 16 RAM disks of 40000K size 4096 blocksize md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27 md: bitmap version 4.39 Channel measurement facility using extended format (autodetected) TCP bic registered Initializing IPsec netlink socket NET: Registered protocol family 1 NET: Registered protocol family 17 Freeing unused kernel memory: 128k freed Starting the zSeries initrd to configure networking. Version is 1.1 dasd(eckd): 0.0.0191: 3390/0A(CU:3990/01) Cyl:25 Head:15 Sec:224 dasd(eckd): 0.0.0191: (4kB blks): 18000kB at 48kB/trk linux disk layout dasda:CMS1/ LI0191: dasda1 qdio: loading QDIO base support version 2 NET: Registered protocol family 10 lo: Disabled Privacy Extensions IPv6 over IPv4 tunneling driver qeth: loading qeth S/390 OSA-Express driver qeth: Device 0.0.0600/0.0.0601/0.0.0602 is a OSD Express card (level: 0757) with link type OSD_100 (portname: FOOBAR) qeth: Hardware IP fragmentation not supported on eth0 qeth: VLAN enabled qeth: Multicast enabled qeth: IPV6 enabled qeth: Broadcast enabled qeth: Using SW checksumming on eth0. qeth: Outbound TSO not supported on eth0 eth0 Link encap:Ethernet HWaddr 00:09:6B:1A:47:D7 inet addr:192.168.5.215 Bcast:192.168.5.255 Mask:255.255.255.0 inet6 addr: fe80::9:6b03:491a:47d7/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:4096 Metric:1 RX packets:1 errors:0 dropped:0 overruns:0 frame:0 TX packets:2 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:84 (84.0 B) TX bytes:174 (174.0 B) lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) sit0 Link encap:IPv6-in-IPv4 NOARP MTU:1480 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 20:01:27 INFO : inserted /tmp/cdrom.ko 20:01:27 INFO : load module set done 20:01:27 INFO : modules to insert scsi_mod sd_mod sr_mod 20:01:28 INFO : loaded scsi_mod from /modules/modules.cgz 20:01:28 INFO : loaded sd_mod from /modules/modules.cgz 20:01:28 INFO : loaded sr_mod from /modules/modules.cgz SCSI subsystem initialized 20:01:28 INFO : inserted /tmp/scsi_mod.ko 20:01:28 INFO : inserted /tmp/sd_mod.ko 20:01:28 INFO : inserted /tmp/sr_mod.ko 20:01:28 INFO : load module set done 20:01:28 INFO : modules to insert scsi_transport_fc zfcp 20:01:28 INFO : loaded scsi_transport_fc from /modules/modules.cgz 20:01:28 INFO : loaded zfcp from /modules/modules.cgz 20:01:28 INFO : inserted /tmp/scsi_transport_fc.ko 20:01:28 INFO : inserted /tmp/zfcp.ko 20:01:28 INFO : load module set done 20:01:28 INFO : modules to insert scsi_transport_iscsi libiscsi iscsi_tcp 20:01:28 INFO : loaded scsi_transport_iscsi from /modules/modules.cgz 20:01:28 INFO : loaded libiscsi from /modules/modules.cgz 20:01:28 INFO : loaded iscsi_tcp from /modules/modules.cgz Loading iSCSI transport class v2.0-724. 20:01:28 INFO : inserted /tmp/scsi_transport_iscsi.ko 20:01:28 INFO : inserted /tmp/libiscsi.ko iscsi: registered transport (tcp) 20:01:28 INFO : inserted /tmp/iscsi_tcp.ko 20:01:28 INFO : load module set done 20:01:28 INFO : modules to insert iscsi_ibft 20:01:28 INFO : load module set done 20:01:28 INFO : modules to insert dasd_mod 20:01:28 INFO : loaded dasd_mod from /modules/modules.cgz 20:01:28 INFO : inserted /tmp/dasd_mod.ko 20:01:28 INFO : load module set done 20:01:28 INFO : modules to insert dasd_diag_mod dasd_fba_mod dasd_eckd_mod 20:01:28 INFO : loaded dasd_diag_mod from /modules/modules.cgz 20:01:28 INFO : loaded dasd_fba_mod from /modules/modules.cgz 20:01:28 INFO : loaded dasd_eckd_mod from /modules/modules.cgz 20:01:28 INFO : inserted /tmp/dasd_diag_mod.ko 20:01:28 INFO : inserted /tmp/dasd_fba_mod.ko dasd(eckd): 0.0.303f: 3390/0A(CU:3990/01) Cyl:3339 Head:15 Sec:224 dasd(eckd): 0.0.303f: (4kB blks): 2404080kB at 48kB/trk compatible disk layout dasda:VOL1/ 0X303F: dasda1 dasda2 dasd(eckd): 0.0.313f: 3390/0A(CU:3990/01) Cyl:3339 Head:15 Sec:224 dasd(eckd): 0.0.313f: (4kB blks): 2404080kB at 48kB/trk compatible disk layout dasdb:VOL1/ 0X313F: dasdb1 20:01:29 INFO : inserted /tmp/dasd_eckd_mod.ko 20:01:29 INFO : load module set done 20:01:29 INFO : looking for usb controllers 20:01:29 INFO : no firewire controller found 20:01:29 INFO : Trying to detect vendor driver discs 20:01:29 INFO : found nothing 20:01:29 INFO : getting kickstart file 20:01:29 INFO : only have one network device: eth0 20:01:29 INFO : dnsservers is 172.16.52.28 20:01:29 INFO : setting specified hostname of z215.z900.redhat.com 20:01:29 INFO : doing kickstart... setting it up rtnl_link_change() failure in nl_set_device_mtu(): Operation not supported 20:01:29 ERROR : unable to set eth0 mtu to 4096 (code -5) 20:01:33 INFO : file location: http://192.168.33.237/ks/hosts/z215.z900.redha t.com/ks.cfg 20:01:33 INFO : transferring http://192.168.33.237//ks/hosts/z215.z900.redhat .com/ks.cfg to a fd 20:01:33 INFO : setting up kickstart 20:01:33 INFO : kickstartFromNfs 20:01:33 INFO : results of nfs, host is curly.devel.redhat.com, dir is /vol/e ngineering/devarchive/redhat/nightly/RHEL5.3-Server-20081008.nightly/5/s390x/os, opts are '(null)' 20:01:33 INFO : setting language to en_US.UTF-8 20:01:33 INFO : need to set up networking 20:01:33 INFO : going to pick interface 20:01:33 INFO : going to do getNetConfig 20:01:33 INFO : dnsservers is 172.16.52.28 20:01:33 INFO : setting specified hostname of z215.z900.redhat.com 20:01:33 INFO : doing kickstart... setting it up rtnl_link_change() failure in nl_set_device_mtu(): Operation not supported 20:01:34 ERROR : unable to set eth0 mtu to 4096 (code -5) 20:01:38 INFO : starting to STEP_URL 20:01:38 INFO : going to do nfsGetSetup 20:01:38 INFO : host is curly.devel.redhat.com, dir is /vol/engineering/devar chive/redhat/nightly/RHEL5.3-Server-20081008.nightly/5/s390x/os, opts are '(null )' 20:01:38 INFO : mounting nfs path curly.devel.redhat.com:/vol/engineering/dev archive/redhat/nightly/RHEL5.3-Server-20081008.nightly/5/s390x/os 20:01:39 INFO : can access /mnt/source/images/stage2.img 20:01:39 INFO : mntloop loop0 on /mnt/runtime as /mnt/source/images/stage2.im g fd is 8 Unable to identify CD-ROM format. VFS: Can't find an ext2 filesystem on dev loop0. 20:01:40 INFO : got url nfs://mnt/source/. 20:01:40 INFO : Loading SELinux policy md: raid6 personality registered for level 6 md: raid5 personality registered for level 5 md: raid4 personality registered for level 4 20:01:42 INFO : inserted /tmp/raid456.ko 20:01:42 INFO : inserted /tmp/msdos.ko 20:01:42 INFO : inserted /tmp/jbd2.ko 20:01:42 INFO : inserted /tmp/crc16.ko 20:01:42 INFO : inserted /tmp/ext4dev.ko GFS2 (built Oct 4 2008 00:26:01) installed 20:01:42 INFO : inserted /tmp/gfs2.ko Lock_Nolock (built Oct 4 2008 00:26:10) installed 20:01:42 INFO : inserted /tmp/lock_nolock.ko device-mapper: uevent: version 1.0.3 device-mapper: ioctl: 4.11.5-ioctl (2007-12-12) initialised: dm-devel 20:01:42 INFO : inserted /tmp/dm-mod.ko 20:01:42 INFO : inserted /tmp/dm-zero.ko 20:01:42 INFO : inserted /tmp/dm-log.ko 20:01:42 INFO : inserted /tmp/dm-mirror.ko 20:01:42 INFO : inserted /tmp/dm-snapshot.ko 20:01:42 INFO : inserted /tmp/scsi_dh.ko device-mapper: multipath: version 1.0.5 loaded 20:01:42 INFO : inserted /tmp/dm-multipath.ko device-mapper: multipath round-robin: version 1.0.0 loaded 20:01:42 INFO : inserted /tmp/dm-round-robin.ko device-mapper: multipath emc: version 0.0.3 loaded 20:01:42 INFO : inserted /tmp/dm-emc.ko 20:01:42 INFO : inserted /tmp/dm-crypt.ko 20:01:43 INFO : inserted /tmp/aes_generic.ko 20:01:43 INFO : inserted /tmp/sha256.ko 20:01:43 INFO : load module set done 20:01:43 INFO : looking for USB mouse... 20:01:43 INFO : NFS install method detected, will use RHupdates/ 20:01:43 INFO : Running anaconda script /usr/bin/anaconda Running anaconda, the Red Hat Enterprise Linux Server system installer - please wait... eth0: no IPv6 routers present Running pre-install scripts Please wait while formatting drive dasda... [-- MARK -- Thu Oct 9 16:05:00 2008] dasd(eckd): 0.0.303f: (4kB blks): 2404080kB at 48kB/trk compatible disk layout dasda: detected capacity change from 0 to 2461777920 dasda:(nonl)/ : dasda1 dasda:VOL1/ 0X303F: In progress... Please wait while formatting drive dasdb... dasd(eckd): 0.0.313f: (4kB blks): 2404080kB at 48kB/trk compatible disk layout dasdb: detected capacity change from 0 to 2461777920 dasdb:(nonl)/ : dasdb1 dasdb:VOL1/ 0X313F: In progress... *** glibc detected *** /usr/bin/python: double free or corruption (!prev): 0x000 0000080d55e90 *** Expected results: This should install Additional info: I may have put this BZ in the wrong bucket (anaconda) But hopefully they will know who to pull in.
I just did an install of RHEL5.3-Server-20081008.nightly on z213 and it worked fine. So maybe this is an intermittent problem?
Are we calling this test infrastructure or a real bug? I don't think it's a bug in pyparted.
I think it is still to early to say. We still have conflicting data. We have see this on several systems now. z215.z900.redhat.com z209.z900.redhat.com z206.z900.redhat.com z202.z900.redhat.com Unless it is unique to certain s/390 systems, or it is a timing issue. The system that Bill used for his test was z213.z900.redhat.com The test infrastructure is pretty static. It may deviate a little based on the tests that are selected to run. Some packages may be added to the install but that is about it. Running a query in RHTS. Bills reserve workflow is the only one that passed. http://tinyurl.com/4rq4og
RHEL5.3-Server-20081012.nightly - Failed Full log: http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=4634043 <snip> Running anaconda, the Red Hat Enterprise Linux Server system installer - please wait... eth0: no IPv6 routers present Running pre-install scripts Please wait while formatting drive dasda... dasd(eckd): 0.0.303d: (4kB blks): 2404080kB at 48kB/trk compatible disk layout dasda: detected capacity change from 0 to 2461777920 dasda:(nonl)/ : dasda1 dasda:VOL1/ 0X303D: In progress... . . . . dasd(eckd): 0.0.363d: (4kB blks): 2404080kB at 48kB/trk compatible disk layout dasdg: detected capacity change from 0 to 2461777920 dasdg:(nonl)/ : dasdg1 dasdg:VOL1/ 0X363D: In progress... Please wait while formatting drive dasdh... dasd(eckd): 0.0.373d: (4kB blks): 2404080kB at 48kB/trk compatible disk layout dasdh: detected capacity change from 0 to 2461777920 dasdh:(nonl)/ : dasdh1 dasdh:VOL1/ 0X373D: In progress... *** glibc detected *** /usr/bin/python: double free or corruption (!prev): 0x000 0000080da5100 *** </snip>
I'm not sure this is a pyparted bug. Here is the only instance of free() in pyparted: void py_ped_exception_string_clear (void) { if (py_exception_string != NULL) { free (py_exception_string); py_exception_string = NULL; } } I think this is a libparted bug. I seem to remember hitting glibc double free errors under 5.1 or 5.2 (or both) and it being a strange libparted problem. I'm going to reassign to Joel because I think this is a libparted issue at this point. Joel this could be pyparted or libparted or both. I'm not sure yet, but I really don't think it's pyparted after looking through the code. I think libparted is segfaulting for some other reason.
I really doubt this is a regression related with parted because I reverted all the s390 changes that went into parted. Could this be an issue with python or glibc? Brok: Think we can setup a s390 machine to test parted and see how it partitions a dasd drive. If its a parted bug I would expect to see it while parted makes the partition. Im going to check the frees in libparted and see which one is not checked.
I watched some s390 installs last night and it doesn't happen everytime. When it does happen it happens right after all the low level formatting Please wait while formatting drive dasdg... dasd(eckd): 0.0.363d: (4kB blks): 2404080kB at 48kB/trk compatible disk layout dasdg: unknown partition table dasdg:VOL1/ 0X363D: which is where it would be partitioning the drives it just low level formatted.
After talking with Brock we think the biggest difference between RHTS and Kate is that RHTS does an initlabel on all drives before partitioning. I've scheduled a job on RHTS with 4 recipes doing no initlabel and 4 recipes doing initlabel. http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=32485 We'll wait for these results and see if there is anything.
Bill: comment 12, Seem very logical from my point of view. It has yet to reproduce in parted though. I still think patching up the dangling frees is the best thing I can do at this point. Dcantrell: commit 5241e33a5193f5d0a73d36853645ca0ac10e8530 for anaconda. If I'm reading the diff right, you changed the order in which we call dasdFmt and create the pedDevice and pedDisk. Any particular reason for this change? Do you think we would gain something changing this order back? Seeing the reported behavior it probably a race condition between anaconda/pyparted and parted. I'll still make the parted changes but I would still like to know the answer to my quesiton to dcantrell.
(In reply to comment #13) > After talking with Brock we think the biggest difference between RHTS and Kate > is that RHTS does an initlabel on all drives before partitioning. I've How exactly is this done, with dasdFmt? What is the exact line? Its relevant so I can test it with parted.
clearpart --all --initlabel --drives=dasda,dasdb,dasdc,dasdd,dasde,dasdf,dasdg,dasdh The above is all on one line. clearpart --all Thats the difference between the two recipes so far. without the --initlabel arg I've successfully installed 3 out of 4. Just waiting on the last one now. Then it will do the installs with the initlabel. I'm guessing I will have failures when it gets to those recipes.
The RHTS job finished: 4 recipes with no initlabel in them all passed. 2 out of 4 recipes with initlabel showed the double free error. I've asked Brock to confirm my findings.
(In reply to comment #14) > Dcantrell: > commit 5241e33a5193f5d0a73d36853645ca0ac10e8530 for anaconda. If I'm reading > the diff right, you changed the order in which we call dasdFmt and create the > pedDevice and pedDisk. Any particular reason for this change? Do you think we > would gain something changing this order back? This was a patch to support picking up DASDs with the FBA discipline ("VDSK" devices). I didn't mean to move around the order of dasdFmt and pedDevice/pedDisk, I think that was by accident. Here's what I think it should be: diff --git a/partedUtils.py b/partedUtils.py index 77b8511..2ed84c6 100644 --- a/partedUtils.py +++ b/partedUtils.py @@ -1160,11 +1160,11 @@ class DiskSet: # FIXME: need the right fix for z/VM formatted dasd if rhpl.getArch() == "s390" and drive[:4] == "dasd" and \ not self.isDisciplineFBA(drive): - dev = parted.PedDevice.get(deviceFile) - disk = parted.PedDisk.new(dev) - if self.dasdFmt(drive): raise LabelError, drive + + dev = parted.PedDevice.get(deviceFile) + disk = parted.PedDisk.new(dev) else: disk = labelDisk(deviceFile) except parted.error, msg: Like you said, make sure we get the dasdFmt() done before we call parted.
I've verified the behavior using the --initlabel line as bill specified in kate ... looks like we've identified the root cause ...
Can you please test with http://jgranado.fedorapeople.org/temp/initlabel.img to see if the problem persists. This has the change expressed in comment 18. This plus the changes to the parted free statements should do the trick. Be sure that the install tree that you are working with has anaconda version 11.1.2.142. If you have a nightly you should be ok.
Forgot to specify that initlabel.img is an update image and should be used with the update=http....
I have a patch for a couple of places where this *might* be happening. But I would like to test the anaconda change before building parted with the new changes. Bpeck: remeber that its updates=http://..... (dont forget the 's') do you think we can get a test today?
Job is scheduled now. I'll let you know in an hour or two..
The updates image you provided worked perfect with --initlabel.
moving this to modified... thx bpeck
oh, and it was an anaconda bug.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2009-0164.html