Bugs, Bugs, Everywhere

The ThinkPad T22 and the Hercules eCafé are two machines that have almost nothing in common. Except that Linux eats ACPI events on both.

Triggering it on the T22 was easy: waking up from S3 without the AC adapter attached worked in more than half the cases. Later versions were even hungrier: removing the power cord was enough to trigger Tux eating poor little Acpis. On the Hercules eCafé there is a similar problem concerning the power button. This time I won't debug it. It takes days and may just lead to bisecting a monolithic monster. Not again.

However, I will rant, er, write about some of my experiences.

First of all, why the hell would somebody want to pull the power cord of his machines at all? Well, these machines are laptops and have batteries attached. They're designed to be mobile.

GNU/Linux is not designed to run on laptops (well, it does not feature a proper design at all[1]). Resource management from the dark ages will, if power saving is enabled, torture your hard drive[2], which may even increase power consumption due to frequent spin-ups. Wireless LAN? When it starts working again[3], I'll have to pick a fight with the most anti-usable way of configuration I've ever encountered[4]. Standby and hibernation? Works sometimes on some machines[5].

The last point I mentioned brings us back to the topic. I'm aware of driver developers claiming that the hardware is usually even more faulty than the software running on top of it. I don't know yet whether they're right. But I have the feeling that there is some truth in this argument.

However, how broken would hardware have to be so that console fonts are garbled[6] (or garbled[7]), that the VGA output freezes on certain actions[8] and that the mouse cannot be disabled in a sane manner[9]?

These are just a few of the bugs and problems that I recently encountered. I even stopped reporting bugs until I'm paid for endless hours of reports and debugging in inefficient low-level languages. FYI, this is my package_mask.conf[10], which I started less than six months ago:

# Failed tests or compilations

=app-shells/zsh-4.3.9
#=sys-devel/binutils-2.18-r3 # I just need a version, however buggy!
=sys-devel/binutils-2.16.1-r3
=sys-devel/binutils-2.20
#=sys-devel/libtool-2.2.6b # Same here.
=sys-devel/libtool-2.2.6a
=sys-devel/gdb-7.0
#=dev-java/antlr-2.7.7 # Needed for my exams. *sigh*
=x11-libs/cairo-1.8.8
=dev-util/gtk-doc-1.11
=x11-libs/gtk+-2.18.6
=sys-devel/autogen-5.9.7
=dev-lang/python-2.6.4
=dev-lang/python-2.5.4-r3
=dev-lang/python-3.1.1-r1
=dev-lang/python-2.5.4-r4
=dev-lang/python-2.4.6
=app-text/xpdf-3.02-r2
=dev-db/sqlite-3.6.21
=app-shells/zsh-4.3.4-r1
=sys-devel/autogen-5.9.4
=sys-devel/gcc-4.3.4
=app-shells/zsh-4.3.10
=dev-libs/glib-2.22.4
=dev-scheme/guile-1.8.5-r1
=dev-scheme/guile-1.8.4

# Software broken themself

=net-fs/curlftpfs-0.9.2
<=net-wireless/wpa_supplicant-0.6.9

I don't trust such a community.


Post Scriptum

For the record, I'm aware of the hardware vendors' policy and the hard time they give developers of free OSs. Many driver or hardware issues could probably have been prevented if they didn't render their hardware almost useless by not providing developer documentation.

Kudos to the OpenBSD crowd and to everyone else who's working on this issue.


References

  1. GNU is just a replica of Unix, whose design has been obsolete since at least the eighties. Additionally, but less important in this case, monolithic kernels (e.g. Linux) have been obsolete since the work of Engler et al. on exokernels. But this is another topic.
  2. See post LinuxAndLaptops.
  3. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=559040
  4. http://www.gentoo.org/doc/en/handbook/handbook-x86.xml?part=4&chap=4
  5. http://www.google.de/search?q=hibernate+broken
  6. It's excused as being a "hardware issue" on http://bugs.gentoo.org/43925 but I once found a similar report on the Fedora forums without any solutions. Nobody has ever explained what this is. I don't believe it's a hardware problem since there are many different machines with this problem.
  7. Another bug that has not been fixed for four years: https://bugs.launchpad.net/ubuntu/+source/xorg/+bug/60600. Especially annoying since the only way to avoid the SecurityHole is to switch to another VT.
  8. Of course in combination with using X. On my machine, multiple X servers are a way to definitely trigger a freeze. But yeah, it could be a driver/hardware issue...
  9. modprobe -r psmouse aka rmmod is obviously not an option, especially if you've made the magic of running multiple X servers without problems on one machine happen. Now that's a multiuser system...
  10. Paludis' configuration file to explicitly mask packages.
Posted 2010-04-02 12:10 Tags: bug

Linux and Laptops

Linux has a feature called laptop_mode. According to /usr/src/linux/Documentation/laptops/laptop-mode.txt, laptop_mode "is used to minimize the time that the hard disk needs to be spun up, to conserve battery power on laptops. It has been reported to cause significant power savings". However, it has also been reported that GNU/Linux causes hard drives to fail far too early[1].

The issue I'm referring to in this post is a bug[2] in the Ubuntu distribution of GNU/Linux. In short, Ubuntu did a

hdparm -B 1 /dev/sd*

which causes all block devices to use the most aggressive form of power saving possible. As a result, the head gets parked and the platters will spin down after a short idle period. Having the disk spin down early is good. Having it spin up and down again frequently is bad. And that's where laptop_mode comes into play.
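To put the value 1 into perspective, here is a small sketch of how the hdparm man page groups the -B (Advanced Power Management) levels. The function name describe_apm_level is made up for illustration; it only classifies a number and never talks to a drive.

```python
def describe_apm_level(level: int) -> str:
    """Classify an `hdparm -B` value the way the hdparm man page groups them."""
    if not 1 <= level <= 255:
        raise ValueError("APM level must be between 1 and 255")
    if level == 255:
        # 255 asks the drive to switch APM off entirely (if it supports that).
        return "APM disabled"
    if level >= 128:
        # 128..254 save power but do not permit spin-down.
        return "power saving without spin-down"
    # 1..127 permit spin-down; 1 is the most aggressive setting.
    return "power saving with spin-down permitted"


if __name__ == "__main__":
    for level in (1, 127, 128, 254, 255):
        print(level, "->", describe_apm_level(level))
```

So a blanket -B 1 picks the most aggressive end of the only range in which the drive is allowed to spin down on its own.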

laptop_mode makes it possible to delay writes to the disk automatically and to flush all pending data once the disk has to be accessed anyway. Sounds clever? No, it's a hack. It's an exemplary semi-solution since the disk is not only accessed by writing but also by reading[3].
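The effect can be illustrated with a toy model. This is not how the kernel implements laptop_mode; the function names, the timestamps and the 30-second spin-down timeout are all assumptions made up for this sketch. Writes are held back for at most max_delay seconds and piggyback on any read that spins the disk up anyway.

```python
def count_spinups(access_times, spindown_after):
    """Count spin-ups of a disk that spins down after `spindown_after` idle seconds.

    The disk is assumed to be spun down before the first access.
    """
    spinups = 0
    last_access = None
    for t in sorted(access_times):
        if last_access is None or t - last_access > spindown_after:
            spinups += 1
        last_access = t
    return spinups


def delayed_accesses(write_times, read_times, max_delay):
    """Return the times at which the disk is really touched when writes are delayed.

    Every read hits the disk immediately and flushes pending writes along the way.
    Pending writes are flushed on their own once the oldest one is `max_delay` old.
    """
    events = sorted([(t, "write") for t in write_times] +
                    [(t, "read") for t in read_times])
    accesses = []
    deadline = None  # latest time the oldest pending write may be flushed
    for t, kind in events:
        if deadline is not None and deadline <= t:
            accesses.append(deadline)  # forced flush before this event
            deadline = None
        if kind == "read":
            accesses.append(t)         # the read spins the disk up ...
            deadline = None            # ... and pending writes go out with it
        elif deadline is None:
            deadline = t + max_delay   # first pending write starts the clock
    if deadline is not None:
        accesses.append(deadline)
    return accesses


if __name__ == "__main__":
    writes = [10, 70, 130, 190, 250]  # one small write per minute
    reads = [400, 700]                # two reads that miss the cache

    print(count_spinups(writes + reads, 30))                         # 7
    print(count_spinups(delayed_accesses(writes, [], 900), 30))      # 1
    print(count_spinups(delayed_accesses(writes, reads, 900), 30))   # 2
```

In this model, delaying writes cuts seven spin-ups down to two, but each uncached read still costs one, which is exactly the part laptop_mode does nothing about.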

Linux, like all other traditional systems, has the inconvenient bug, er, feature that nearly everything has to access the disk[4], sometimes described as the disk-RAM dichotomy[5]. While information that is read from the disk can be cached so that subsequent accesses won't trigger another drive spin-up, caches have to be filled before being useful. GNU/Linux's semi-solution provides no means to accomplish this, leaving users with drives that frequently have to spin up. Oh, how should I know you want to access those files in your home dir you always use? And why should I automatically cache them when there is memory left and the CPU is idle? This is a deep-rooted conceptual issue of traditional systems.

Judging from my own experience, laptop_mode does not even seem to delay writes properly. I got frequent spinups even when not working on the machine, and I'm using a quite minimal install[6] (traditional GUIs are so bad that even the Unix Shell is better). Well, it might just've been another bug or a hole in this kludgy concept.

But there is another point to mention. Let's assume there is an operating system that features proper transparent persistence[7]. Wouldn't it be nice to have the disk spin up at a user-defined interval like fifteen minutes (or in case of low battery) to write all changed data and then stay spun down until the end of the next interval? In other words, let's assume an operating system supporting aggressive power management. Wouldn't it be nice if the hardware supported such aggressive power management, too?

Some people think the above-mentioned bug was not an operating system issue[8]. Blaming the hardware vendors for supporting aggressive power management when the OS misuses it is, however, a sign of blatant ignorance.

And not being able to temporarily disable the disk during normal system usage[9] is a sign of a bad operating system[10].


Footnotes

  1. http://paul.luon.net/journal/hacking/BrokenHDDs.html
  2. http://hardware.slashdot.org/article.pl?sid=07/10/30/1742258
  3. Additionally, hard drives may get probed (e.g. SMART), which always triggers spin-ups. Now the combination of a SMART daemon and hdparm -B 1...
  4. Whenever persistence is desired, the disk is commonly accessed rather directly (through filesystems). Usually all data is stored only on the disk.
  5. http://c2.com/cgi-bin/wiki?ProgrammingWithoutRamDiskDichotomy
  6. This includes not using a SMART daemon or similar.
  7. Read: High-level transparent persistence and economic resource management, and that includes intelligent pre-caching data when possible and probably beneficial.
  8. https://bugs.launchpad.net/ubuntu/+source/acpi-support/+bug/59695
  9. We assume there is enough RAM, which usually is the case.
  10. Of course it can be partially hacked onto GNU/Linux, it's basically mount -t tmpfs tmpfs $DIR with moving data around before and afterwards. The resulting workflow is, however, not straightforward and the mechanism very error-prone if one does not glue together all those loose ends that might cause data loss. A sophisticated solution can only be part of a sophisticated OS.
Posted 2010-03-30 11:57 Tags: bug

The Squeak VM is Broken

SqueakIsAnInsultToSmalltalk and conceptually broken, as I explained in that post. There are more flaws in its concept, but I think I already said enough. Let's focus on its implementation.

That the VM forces users to use a GUI and that it segfaults occasionally, despite being "mathematically perfect C code" as Alan Kay said at OOPSLA97[1], was already mentioned. I just stumbled upon additional information. Squeak is part of Gentoo GNU/Linux's Portage tree (the package repository), but will be removed because of security issues[2]. Turns out the developers of the VM jammed several libraries directly into their own project -- most of them without changes to the codebase. I doubt that this was necessary.

This approach is considered bad even by *nix people. It violates all principles of modular systems, has a severe impact on software maintenance, prevents users from controlling which software (versions) to install, smells of Duplicate Code, and whatnot. Additionally, it increases compile time and disk space usage without providing any benefit.

As a result, the Squeak VM has joyfully imported several security issues from the projects it jammed into its own source directory. If the authors had paid attention to some concepts of software engineering (yes, even *nix has a few of them, probably more than the Squeak VM), vulnerable packages would've automatically been updated on the next system update. And if Squeak had depended on a specific version, the package manager would've directly reported the resulting problems to the user. In any case, Squeak itself would not have been an issue.

That's what you get for disregarding basic software engineering principles. Glad that somebody found out.


References

  1. I strongly recommend watching this talk, titled "The Computer Revolution hasn't happened yet". It's somewhere on the net. Try starting there: http://c2.com/cgi/wiki/?AlanKay
  2. http://bugs.gentoo.org/show_bug.cgi?id=247363
Posted 2010-03-07 12:52 Tags: bug

Squeak is an insult to Smalltalk

Squeak is, actually, the platform I used for getting in (practical) touch with Smalltalk. It was a very influential and important experience. However, Squeak is not only stuck in the 1980s, it is also broken.

Smalltalk has always been a kind of operating system. Squeak is no exception and carries on the spirit of (pre) 1980: It is intended as a single-user system and has no security concept (an object capability model would be easy to implement, though). AFAIK, program windows were originally invented in Smalltalk. To this day, Squeak adheres to this obsolete WIMP (windows, icons, menus, pulldowns) concept which limits direct interaction with objects on an otherwise rather live and reflective system. Additionally, the GUI is hard-coded: you can't run Squeak properly without using its GUI. The "headless" parameter renders it anti-interactive (unusable, in other words). Not only is the user interface inflexible, it also distinguishes between "novice mode" and "expert mode", which is anti-pedagogic. The reason why there supposedly must be an expert mode is that you can perform several actions that could destroy your Squeak image.

That brings us to the next issue: Squeak is irreversibly destructive. AFAIK, objects in Squeak are not versioned, so you can't simply restore a previous state, neither individual nor global, after you've screwed up. Screwing up is actually fairly easy. Sometimes the update of (what I believe to be) the basic image from within Squeak is enough to render it unusable. Even if you just want to uninstall a package you installed using Squeak's native package manager, you won't find an "uninstall" button or menu entry -- because there is no possibility to properly uninstall formerly installed software. (No, I'm not kidding.)

Well, since it is easy to irreversibly damage an image, at least Squeak doesn't automatically write it to disk (i.e. transparent persistence). Oh, wait, that's not a feature, that's a hack. I want transparent persistence!

What would be even better is error-free persistence. Squeak somehow managed to lose my original source code and seems to silently display decompiled bytecode. Parameters are named tn where n is the parameter number and all comments are lost. Except for class comments -- these are referenced by RemoteStrings that point beyond the end of the file, neatly throwing an exception each time I try to access a class in the system browser. Considering these severe bugs, I suppose the next time the virtual machine segfaults, which it does once in a while, I should be happy -- at least I have only lost a little data instead of corrupting it all.

Ultimately, even though a proper smalltalkish system would be far more sophisticated, powerful and efficient than a unix shell, I enjoy working on my command line more than trying to click my way through Squeak, having to tediously move, drag and drop windows and objects all over the place.

It's not even fast enough for my neurons.

Posted 2010-03-01 01:01 Tags: bug

On Debian

The Debian hackers have recently proven how serious they are about their expertise[1]. I have to mention that they also focus strongly on interaction design. For example: the Debian installer.

If your disk is encrypted, Debian knows what to do: overwrite your partition encryption headers so that your keys and data are lost forever[2]. And, of course, the Debian installer knows that you want this even on non-system partitions for which you provided the old passphrase and did not select "kill my data". Since Debian just knows what you want, it does not need to -- and, in fact, will not -- ask or print a warning or any kind of information.

This bug was reported over two years ago[3] and is still not fixed. Well, real hackers keep backups anyway and it's not like the system is intended for people that dare to use software without wanting to know how it works in detail. Which, in fact, is a necessary hack, er, would be a good idea on traditional systems, considering their fundamentally broken security concept.

Damn that culture.


References

  1. OpenSSL disaster, see http://lists.debian.org/debian-security-announce/2008/msg00152.html
  2. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=529343
  3. See merged bug http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=451535
Posted 2010-02-14 17:51 Tags: bug

Installation of Broken Systems

Why would somebody want to install broken systems? Well, when there is no alternative to broken systems, you'll have to choose one of them.

Gentoo GNU/Linux is my broken system of choice. It gives me the most freedom and allows me to use the actual operating system, i.e. I can use software that just sucks instead of software that sucks more, and I get exposed to all of the pitfalls and unconceptualized semi-solutions of unixoid systems. Valuable lessons.

And, of course, I did not have to wait long for the next flaw.

The basic GNU system was installed and I had entered a chroot. Next was the installation of various software using Portage, Gentoo's package manager. Portage, or rather wget, refused to download sources. The address resolution problems it complained about were a bad excuse, since downloading exactly the same URLs by calling wget myself in the very same instance of bash worked.

All right, I wanted to try Paludis, the other package mangler, anyway, since they claim to have concepts and well-factored code as opposed to Portage, which has neither[1]. And, lucky me, this one could fetch source packages. Installation could continue.

By the way, the machine to be equipped with Gentoo is a netbook featuring an AMD Geode LX800. Of course it would be a dumb idea to compile a whole OS on a single machine equipped with a 500MHz CPU. But that's what distribution is for -- and that's where the fun begins.

Unix in itself has no support for distribution whatsoever. Distcc has been written to "address" this issue by providing a hackish solution for compilation on multiple machines. Once installed, the first workaround has to be applied[2].

Configuration is pretty straightforward. There is /etc/env.d/02distcc, /etc/distcc/hosts and the program distcc-config. The latter is a workaround to hide the ugliness of multiple configuration mechanisms. (FYI, the list of configuration files I mentioned does not include the separate configuration file for the included server).

To actually use distcc, $PATH has to be modified additionally. Now, compilation will be distributed.

At least in theory.

Distcc couldn't distribute the compilation because "failed to distribute". How smart. Okay, let's enable verbosity. Output now contains various information like time in milliseconds the compilation took, and I'm told that distcc could not distribute because "failed to distribute". Fuck you, I'm pretty aware of that.

After some time fighting configuration files and distcc-config, I noticed that both "distcc-config --get-hosts" and "distcc --show-hosts" return, regardless of configuration setting, "+zeroconf". Yeah, distributing compilation to a host named +zeroconf won't work. And what's the sanest message to report this problem when the maximum of verbosity is requested? "Failed to distribute"? Bullshit.


References

  1. See Paludis FAQ on http://paludis.pioto.org.
  2. See Gentoo Distcc Manual.
Posted 2010-02-14 17:02 Tags: bug

Blogging in HTML

Before even publishing this blog, I switched from plock to handwritten XHTML.

No big deal, but no features either. This is a temporary solution until plock outputs valid and proper XHTML -- or until I have written a better alternative.

Posted 2009-10-20 00:00 Tags: bug

C++ vs. logic

Ever declared a destructor pure virtual? Do so and try to compile and link it.

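I'll spare you the trouble. Despite being pure virtual, the destructor has to be implemented anyway: every derived class's destructor implicitly calls the base class destructor, so without a definition the linker bails out with an undefined reference.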

You can have a pure virtual destructor, but it is not pure virtual. Thank you very much.

Posted 2009-10-10 13:00 Tags: bug

This blog, plock, and python distutils madness

Over the past few years I've learned quite a bit. Most of it boils down to criticism of traditional software because of insecurity, low usability, missing concepts and consistency, elitism and various other ills. Add a couple of program or operating system crashing bugs to the already large list of pitfalls and the only thing you can rely on is that you will primarily be fighting against the system instead of using it to get work done.

This blog is primarily intended to document my experience with such software. The goal is to not only criticize an isolated piece of software because it is somehow "weird" or "buggy" but to also look at the whole picture and to analyze the causes.

Traditional operating systems are obsolete -- and so is third party software that sticks to those standards or even worsens them.

I'm still on my way to (re)learn software and operating system design -- Unix and Windows hell has ruined so much. So this blog is also about the stuff I'm learning; about ideas, concepts or even projects that could lead to software that is user-friendly, more flexible and powerful, and reliable.

The software I'm using is plock (git://r0q.ath.cx/plock.git). On installing it, I've been bitten by another one of the bugs I described above.

I successfully executed setup.py, the installer script built on Python's distutils. But on running plock, I got "no module named plock". Turns out Python's distutils will naively install the software with root's current umask. Publicly installing data using private permissions is a really good idea.

The default umask on a "secure" operating system like GNU is 022, meaning everyone can read root's files if the mode is not changed on or after creation. But if you're sane, you might want to change it to 077, denying access for everyone but the owner. This should, of course, not affect software that is installed system-wide.
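The arithmetic behind those two numbers is simple enough to sketch. The helper names below are made up for illustration and say nothing about how distutils copies files internally; the second function assumes a POSIX system and creates a throwaway file to show what the process umask does to it.

```python
import os
import tempfile


def effective_mode(requested: int, umask: int) -> int:
    """Permission bits a new file ends up with: requested bits minus umask bits."""
    return requested & ~umask & 0o777


def create_under_umask(umask: int) -> int:
    """Create a file the default way under `umask` and return its permission bits."""
    old = os.umask(umask)  # os.umask returns the previous mask
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "module.py")
            with open(path, "w") as handle:  # open() requests mode 0o666
                handle.write("# placeholder\n")
            return os.stat(path).st_mode & 0o777
    finally:
        os.umask(old)  # always restore the caller's umask


if __name__ == "__main__":
    print(oct(effective_mode(0o666, 0o022)))  # 0o644: readable by everyone
    print(oct(effective_mode(0o666, 0o077)))  # 0o600: readable by the owner only
    print(oct(create_under_umask(0o077)))     # 0o600 on a POSIX system
```

An installer that wants system-wide files to be usable has to set the mode explicitly after copying instead of inheriting whatever mask root happens to run with.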

To top off that dumbness, Python prints an error message that is just wrong. The module is there, but not accessible. 404 != 403.
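Telling the two cases apart is not hard. The following is a minimal sketch with a hypothetical helper, classify_install; it is not Python's import machinery, it merely looks at whether a path exists and whether the read bit for "others" is set.

```python
import os
import stat
import tempfile


def classify_install(path: str) -> str:
    """Tell a missing file (404) apart from one other users may not read (403)."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    except PermissionError:
        # A parent directory already denies access, so nothing more is known.
        return "inaccessible"
    if mode & stat.S_IROTH:
        return "world-readable"
    return "present but private"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        module = os.path.join(tmp, "plock.py")
        print(classify_install(module))  # nothing installed yet

        with open(module, "w") as handle:
            handle.write("# placeholder\n")
        os.chmod(module, 0o600)          # what an install under umask 077 leaves behind
        print(classify_install(module))

        os.chmod(module, 0o644)          # what a system-wide install should look like
        print(classify_install(module))
```

A loader that made this distinction could say "permission denied" instead of pretending the module does not exist.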

It seems that not many care about the umask defaulting to insecure; otherwise this bug in Python's distutils would have been noticed prior to release.

Posted 2009-10-09 12:00 Tags: bug
