This is a log of my experience with a SPARCstation 10 that has been upgraded from a (more or less) smoothly running NetBSD 1.6.2 installation to a self-built 2.0beta snapshot.
The ss10 is the closest thing to a production machine that I have here.
Hardware: a SunSwift card ('hme' 100BaseT Ethernet, 'fas' fast-wide SCSI controller) and
'nell' with an optional Prism2.5 WLAN card ('wi'). [...] /etc/daily run. [...] wi adapter. This has been
reported as PRs
kern/25466 and
kern/25604 in the meantime. [...] wi) and ethernet (le) on a
SPARCstation 2 with a 1.6ZG kernel on top of a 1.6ZC userland,
2.0beta propagates only (BOOTP) broadcasts from wi
to le, and nothing in the other direction.

NetBSD/sparc 2.0beta is not exactly a release candidate
yet. While many issues are long-standing (softdep, lockups under
high I/O load with hme and fas), some were not present in the
netbsd-1-6 branch or even in pre-2.0 netbsd-current.
[ 2004-05-02 ]
A discussion on
port-sparc, where
Manuel Bouyer reports his dual-processor ss10 as stable,
whereas
Bernd Sieker tells a different story. His setup is
similar to mine: an SMP ss10 doing DSL/PPPoE routing, equipped with an
additional hme SBus card.
[ 2004-05-03 ]
My ss10 is currently running a
uniprocessor kernel, which has survived the
nightly /etc/daily and Amanda runs (local disks only)
while building a full distribution. The next step will
be a backup from both local disks and clients; I strongly suspect
that the two heads on the SunSwift card (hme and
fas) occasionally bump into each other.
Network traffic appears to be the sore spot. 2.0beta on the ss10 can take high CPU and disk activity just fine, but as soon as network activity is added to the mix, the box goes ballistic. And this one is even reproducible.
[ 2004-05-04 ]
Last night's routine tasks brought yet
another m_copydata-related
panic. I shall send in a PR when I find the time...
(Reported as
kern/25702;
kern/25608 may be related.)
A non-SMP kernel from May 2nd gave me 12 days of uptime, as I carefully avoided network activity from ftp and Amanda. With Amanda, either clients backing up to the holding disk or streaming local disks and holding-disk contents to tape works fine; mixing the two tends to lock up the machine.
[ 2004-05-20 ]
A kernel built from yesterday's CVS lets me nicely reproduce
the ipnat panic: ftp from a NAT client
to (e.g.) ftp.netbsd.org, type 'bye', and the NAT router panics.
Good news and bad...
[ 2004-05-25 ]
Suspicious NFS client behaviour: work on
NFS-mounted pkgsrc just hangs, and an attempted
reboot then ends in a
panic.
Early in the /etc/daily cronjob, yet another ipnat-related crash, probably when one of the client machines ran download-vulnerability-list against ftp://ftp.netbsd.org.
[ 2004-05-26 ]
Increased stability if I keep my hands off the network.
Last night, the machine survived the daily cronjob
and an Amanda run from several clients to tape just fine, and it
built a 2.0 release with -j3 at the same time. That second cpu
lowers the build time nicely...
During reboot, the machine hung after unmounting the filesystems,
as usual, and had to be rebooted from the db> prompt.
I easily managed to panic the resulting
kernel by pulling the
Prism2.5 wi card.
[ 2004-06-15 ]
kern/25721, the pull-wi-card-and-panic issue, has been
fixed by David Young in the meantime, as has
PR 25604. Thanks, David!
kern/25702, on the other hand, is still painfully present.
In the meantime there was a union-mount panic, of which I could not get (and keep) a stack trace.
It is still nearly impossible to reboot a dual-CPU sparc without breaking into the kernel debugger, and even then the machine sometimes locks up hard and has to be power-cycled. Usually it hangs after unmounting the disks.
At the moment, netbsd-2-0 builds fail on my sparc.
[ 2004-06-20 ]
Some good news for a change: a kind friend (thanks, Thomas!) has
slapped a "SUNW,spif" serial multiport adapter, complete with
cable and breakout box, onto my table, along with many more Sun things
that do not come into this story. I booted a -current GENERIC
kernel on my rusty SPARCstation IPC and found that NetBSD does
not support the card. After some googling I came across the
OpenBSD driver, and it turned out to be a drop-in extension
(well, almost ;). Try out the
results
of a Saturday night, if you're interested.
[ 2004-06-23 ]
I am trying out lfs for an obj dir. My standard layout is to
union-mount a source dir beneath the mountpoint, and an obj dir
above it. When I tried that with the lfs filesystem, 'config
KERNEL' panicked the
system.
[ 2004-06-24 ]
Union mounts and lfs do not mix well. Two more panics; I managed to get a stack trace
from the second. As always with an MP kernel, dumping the core
is impossible, this time resulting in an endless stream of
xcall(cpu1,0xf02132c8): couldn't ping cpus: cpu0 and a
hard lockup. I have send-pr'ed the issue as
kern/26043, and turned /var/obj back into an
ffs partition for now. It's no good hunting three bugs at a
time...