NetBSD/sparc 2.0beta experience

This is a log of my experience with a SPARCstation 10 that has been upgraded from a (more or less) smoothly running NetBSD 1.6.2 installation to a self-built 2.0beta snapshot.

The ss10 is the closest thing to a production machine that I have here.

Hardware setup

Software setup

Installed services

"Issues"

  1. A freeze during a nightly Amanda backup and /etc/daily run.
  2. A panic when I tried to set up the Prism wi adapter. This has been reported as PRs kern/25466 and kern/25604 in the meantime.
  3. Another panic when I pulled out the Prism card during shutdown (no stack trace), reported as kern/25721.
  4. Two crashes under light load, most likely because the automatically dimensioned NKMEMPAGES (see options(4)) is too small. I have never seen this on NetBSD 1.6 and before.
  5. A panic during a make clean run with the pkgsrc tree and /var/obj union-mounted to /usr/src.
  6. Another freeze the following night, shortly before /etc/daily kicked in, for which I do not have a stack trace. A kernel build was running at the time.
  7. The bridge(4) appears to be broken in 2.0beta. With a setup that bridges just fine between WLAN (wi) and ethernet (le) on a SPARCstation 2 with a 1.6ZG kernel on top of a 1.6ZC userland, 2.0beta propagates only (BOOTP) broadcasts from wi to le, and nothing in the other direction.
  8. After a panic, getting a core dump is impossible most of the time, since a sync or a reboot in ddb only leads to more panics, and all too often to a final lockup that requires power-cycling the machine.
  9. Several lockups during shutdown, mostly after unmounting filesystems (no stack trace).
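
For item 4, options(4) allows a fixed NKMEMPAGES value to be set in the kernel configuration file instead of relying on the automatic sizing. A minimal sketch, assuming a custom kernel config file; the number is a placeholder chosen for illustration, not a tested or recommended value:

```
# Kernel config fragment (sketch only).
# NKMEMPAGES is the size of the kernel malloc area, counted in pages.
# 4096 is a hypothetical value; the right number depends on the
# machine's RAM and workload.
options 	NKMEMPAGES=4096
```

Whether a larger value actually cures the crashes is an assumption that only a rebuilt kernel and some uptime can confirm.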

Summary

NetBSD/sparc 2.0beta is not exactly a release candidate yet. While many issues are long-standing (softdep, lockups under high I/O load with hme and fas), there are some that were not present in the netbsd-1-6 branch or even in pre-2.0 netbsd-current.

And then...

[ 2004-05-02 ]
In a discussion on port-sparc, Manuel Bouyer reported his dual-processor ss10 as stable, whereas Bernd Sieker told a different story. Bernd's setup is similar to mine: an SMP ss10, equipped with an additional hme sbus card, that does dsl/pppoe routing.

[ 2004-05-03 ]
My ss10 is running a uniprocessor kernel at the moment, which has survived the nightly /etc/daily and Amanda run (local disks only) while building a full distribution. The next step from here will be a backup from local disks and clients -- I strongly suspect that the two heads on the SunSwift card (hme and fas) occasionally bump into each other.

Network traffic appears to be the sore spot. 2.0beta on the ss10 can take high cpu and disk activity just fine, but as soon as network activity is added to the mix, the box goes ballistic. And this one is even reproducible.

[ 2004-05-04 ]
Last night's routine tasks brought yet another m_copydata-related panic. I shall send in a PR when I find the time...
(Reported as kern/25702 -- kern/25608 may be related.)

A no-smp kernel from May 2nd gave me 12 days of uptime since I carefully avoided network activity from ftp and Amanda. For the latter, either clients backing up to the holding disk, or streaming local disks and holding disk contents to tape are fine. Mixing the two tends to lock up the machine.

[ 2004-05-20 ]
A kernel built from yesterday's cvs lets me nicely reproduce the ipnat panic: ftp from a nat client to (e.g.) ftp.netbsd.org, type 'bye', and the nat router panics.

Good news and bad...

[ 2004-05-25 ]
Suspicious nfs client behaviour - work on nfs-mounted pkgsrc just hangs - and then an attempted reboot which ends up in a panic.

Early in the /etc/daily cronjob there was yet another ipnat-related crash, probably as one of the client machines ran download-vulnerability-list against ftp://ftp.netbsd.org.

[ 2004-05-26 ]
Increased stability if I keep my hands off the network. Last night, the machine survived the daily cronjob and an Amanda run from several clients to tape just fine, and it built a 2.0 release with -j3 at the same time. That second cpu lowers the build time nicely...
During reboot, the machine hung after unmounting the filesystems, as usual, and had to be rebooted from the db> prompt. I managed to panic the resulting kernel easily by pulling the PRISM 2.5 wi card.

[ 2004-06-15 ]
kern/25721, the pull-wi-card-and-panic issue, has been fixed by David Young in the meantime, as has PR 25604. Thanks, David! -- kern/25702, on the other hand, is still painfully present.

There was a union-mount panic in the meantime of which I could not get (and keep) a stack trace.

It is still near impossible to reboot a dual-cpu sparc without breaking into the kernel debugger, and even then the machine sometimes locks up hard and has to be power-cycled. Usually it hangs after unmounting the disks.

netbsd-2-0 builds fail on my sparc at the moment.

[ 2004-06-20 ]
Some good news for a change: A kind friend (thanks, Thomas!) has slapped a "SUNW,spif" serial multiport adapter complete with cable and breakout box onto my table, and many more SUN things that come not into this story. I booted a -current GENERIC kernel on my rusty SPARCstation IPC and found that NetBSD does not support the card. After some googling I came across the OpenBSD driver, and it turned out to be a drop-in extension (well, almost ;). Try out the results of a Saturday night, if you're interested.

[ 2004-06-23 ]
I am trying out lfs for an obj dir. My standard layout is to union-mount a source dir beneath the mountpoint, and an obj dir above it. When I tried that with the lfs filesystem, the 'config KERNEL' panicked the system.
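
The layout described above could be expressed in /etc/fstab roughly as follows. This is a sketch under assumptions: /dev/sd0g is a hypothetical device name, and /var/obj and /usr/src are the paths mentioned elsewhere in this log; the options shown are the plain defaults, not necessarily the ones in use on the ss10.

```
# /etc/fstab fragment (sketch only)
# obj dir on an lfs partition; /dev/sd0g is a placeholder device
/dev/sd0g   /var/obj   lfs     rw   0 0
# union-mount the obj dir as the upper layer on top of the source tree,
# so writes land in /var/obj while /usr/src itself stays untouched
/var/obj    /usr/src   union   rw   0 0
```

The order of the two entries matters, since the lfs partition has to be mounted before it can serve as the upper layer of the union.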

[ 2004-06-24 ]
Union mounts and lfs do not mix well. Yet another two panics; I managed to get a stack trace from the second. As always with an MP kernel, dumping the core is impossible, this time resulting in an endless stream of "xcall(cpu1,0xf02132c8): couldn't ping cpus: cpu0" and a hard lockup. I have send-pr'ed the issue as kern/26043, and turned /var/obj back into an ffs partition for now. It's no good hunting three bugs at a time...

 

Hauke Fath (hauke [at] causeuse dot org)
Last modified: Tue Jul 6 13:13:45 CEST 2004
 