agg is a news aggregator that stays focused on its goal, namely reading a feed and creating a representation of it that can be used throughout the whole system. The process is straightforward: store all items[1] whose publication date is newer than that of the latest item received previously.
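The date rule can be illustrated with a small shell sketch. It is not agg's code: `newer_than` is a hypothetical helper, the pubDate values are assumed to be in the RFC 822 format that RSS 2.0 prescribes, and parsing them relies on the `-d` option of GNU date.

```shell
#!/bin/sh
# newer_than LAST_EPOCH (hypothetical helper):
# reads one RFC 822 pubDate per line from stdin and prints only those that
# are strictly newer than LAST_EPOCH (seconds since 1970-01-01 UTC).
# Requires GNU date for parsing arbitrary date strings with -d.
newer_than() {
    while IFS= read -r pubdate; do
        # Skip dates that GNU date cannot parse instead of aborting.
        ts=$(date -d "$pubdate" +%s 2>/dev/null) || continue
        if [ "$ts" -gt "$1" ]; then
            printf '%s\n' "$pubdate"
        fi
    done
}

# Usage: keep only items published after the newest one seen last time.
last=$(date -d 'Tue, 10 Jun 2003 04:00:00 GMT' +%s)
printf '%s\n' 'Mon, 09 Jun 2003 04:00:00 GMT' 'Wed, 11 Jun 2003 04:00:00 GMT' |
    newer_than "$last"
# prints: Wed, 11 Jun 2003 04:00:00 GMT
```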
That works remarkably well in the common case. A less common case is feeds without publication dates for their items. In fact, the pubDate tag is not required according to the RSS 2.0 specification. What to do in this case?
The solution is simple: if the author ignored publication dates, the aggregator shall do so, too. This has the inconvenient effect that once you've read all items and possibly deleted them afterwards, running agg on that feed again brings all of them back!
So we need some form of caching[2]. However, such a rare case, which has little to do with agg's job anyway, should not end up in the same piece of software. Here are my tools for the output format of version 0.2:
agg_filter to remove an item iff it has already been cached:
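#!/bin/sh
# agg_filter ITEM: delete ITEM if its hash is already in its feed's cache
ITEM_NAME="$1"
FEED_PATH="$(dirname "$ITEM_NAME")"
FEED_NAME="$(basename "$FEED_PATH")"
# -x -F: match the hash as a whole, literal line; a missing cache means "not cached"
if grep -qxF "$(agg_hash "$ITEM_NAME")" ".$FEED_NAME.cache" 2>/dev/null; then
    nomtime "rm -rf" "$ITEM_NAME"
fi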
agg_cache to cache an item:
#!/bin/sh
# agg_cache ITEM: append the hash of ITEM to the cache file of its feed
ITEM_NAME="$1"
FEED_PATH="$(dirname "$ITEM_NAME")"
FEED_NAME="$(basename "$FEED_PATH")"
agg_hash "$ITEM_NAME" >> ".$FEED_NAME.cache"
agg_hash to create a hash of an item:
cat "$1/title" "$1/desc" "$1/link" 2>/dev/null | sha1sum | awk '{ print $1; }'
With the agg_each script from an earlier post, and using agg 0.2, the scripts are usually used as follows:
> $fetch_all_feeds_and_pipe_to_agg
> agg_each agg_filter "feed without pubDates"
> agg_each agg_cache "feed without pubDates"
> agg_each agg_read *
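The idea behind agg_filter and agg_cache can be tried without agg itself. The following is a self-contained sketch, not the author's code: `item_hash` mirrors agg_hash, while `is_cached` and `cache_item` are hypothetical stand-ins for the two scripts, and the item directory with its title and link files is made up. Like the original, it relies on sha1sum from GNU coreutils.

```shell
#!/bin/sh
# item_hash ITEM: same recipe as agg_hash (missing files are simply skipped).
item_hash() {
    cat "$1/title" "$1/desc" "$1/link" 2>/dev/null | sha1sum | awk '{ print $1 }'
}

# is_cached ITEM CACHE: true if ITEM's hash is a whole line of CACHE.
# A missing cache file makes grep fail, i.e. "not cached".
is_cached() {
    grep -qxF "$(item_hash "$1")" "$2" 2>/dev/null
}

# cache_item ITEM CACHE: remember ITEM by appending its hash.
cache_item() {
    item_hash "$1" >> "$2"
}

# Demo with a made-up item in a scratch directory.
work=$(mktemp -d)
mkdir "$work/item1"
printf 'Hello\n' > "$work/item1/title"
printf 'http://example.org/1\n' > "$work/item1/link"
cache="$work/.feed.cache"

if is_cached "$work/item1" "$cache"; then first=cached; else first=new; fi
cache_item "$work/item1" "$cache"
if is_cached "$work/item1" "$cache"; then second=cached; else second=new; fi
echo "before caching: $first, after caching: $second"   # new, cached
rm -rf "$work"
```

Because the hash covers title, description and link, an item whose text is edited upstream gets a new hash and shows up again.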
But why this hassle? Why not just integrate it into agg, or use a real newsreader for that matter?
To be fair, on operating systems with more powerful concepts, agg is unnecessary, since it basically does nothing more than deserialize a news feed.
UNIX, however, is a rudimentary operating system and lacks powerful concepts. In this case, the problem boils down to its primitive native objects and means of communication.
When a set of people communicates openly, with the requirement that everyone be able to take part in the communication, every person in the set must speak in a way everyone can understand. Thus, the communication can only be as smart as the dumbest person in that set. Otherwise, you'd have to introduce even more people into the set, because translators between the smart and the dumb ones are required. Not only is this process cumbersome, you'll also occasionally get lost in translation.
Speaking in terms of this metaphor, UNIX is dumb. Processes are only expected to communicate via streams of lines of text, files and directories[3]; this is especially true of the whole base system.
So, in order for users to be able to actually use the system (as opposed to using yet another application), agg either has to be file-based or needs even more (and even more complex) auxiliary tools, which indirectly leads to exactly the problems mentioned here.
Yes, it's a hassle. But it's the only practical way to at least partially achieve something Alan Kay explained in his talk "The Computer Revolution Hasn't Happened Yet" at OOPSLA 1997:
Well I had programmed Caesar Franck's heroic piece, and if you know this piece, it is made for the largest organs that have ever been made. The loudest organs that have ever been made, in the largest cathedrals that had ever been made, because it's a nineteenth century symphonic type organ work, and Biggs was asking my friend to play this on this dinky, little organ. He said, "But how can I play this, on this?" Biggs, he said, "Just play it grand. Just play it grand." To stay with the future as it moves, is to always play your systems more grand than they seem to be right now.
- Currently there's a bug, but this is the concept anyway.
- A cache of your brain, that is, so that the software doesn't try to make you read data that's already in there again.
- Yes, and *argv[], fooenv(), shmfoo() etc. They don't matter in our case, since we have "complex" data structures and no sane person would ever imagine using a computer by writing C and following the ancient edit-compile-debug cycle. Also, this issue has already been covered in TraditionalApplicationsConfigurationInterfaces and DisconnectedMonoliths.
Introduction
In this post, it is assumed that current operating systems are multi-user systems[1].
There are several reasons for computer users to use encryption. To name two common cases: (a) protecting private data is of interest to everyone who owns a laptop, because it might easily get lost or stolen; (b) less interesting, but probably a far more important reason, is protection against political repression, e.g. in the form of police raids.
State Of The Art
Current operating systems provide three primary solutions for encryption:
1. Use no encryption at all.
While this is no solution, it is the default for people using operating systems from a certain vendor that cripples your version to a greater or lesser extent depending on how much money you've paid. If you've bought it, that is.
I assume that this is also the default on systems of the vendor that has recently been satirized in South Park.
But even GNU/Linux distributions install an unencrypted system by default. Depending on the specific distribution, it might be rather easy to install an encrypted one, at least for technically versed people. As so often on GNU/Linux, Joe Regular loses.
2. Individual encryption of private data.
On all major operating systems currently used it is possible for users to set up their own cryptographic container of some sort.
Having to manage the container more or less by hand is, however, inconvenient and not completely orthogonal to regular system usage.
Worse, it is even quite insecure. As pointed out in SucurityHole, the environment in which the user has to enter the passphrase may be compromised. This most likely solves problems of type (a), but anyone willing to spend a tiny effort on getting your data will be able to get it. So this solution is not reliable in case (b).
So, let's assume (a) has been solved. How can we solve (b)?
3. Full-system encryption.
Full-system encryption is usually realized using a single master passphrase that encrypts the whole hard disk (or all of them). If multiple users want to access the system, there are huge problems in case you want to revoke someone's access: a new key might have to be chosen and distributed to the remaining users, and afterwards the whole hard disk might have to be re-encrypted.
On Linux there exists LUKS, the Linux Unified Key Setup. Among other things, LUKS hides the actual master key behind an interface of key slots (eight in the original LUKS1 format), each of which can hold an independently chosen passphrase. This also allows a passphrase to be changed without having to re-encrypt the whole medium, since only the copy of the master key stored in that slot has to be rewritten.
By using a special boot procedure, for example by booting from a certain USB stick that is assumed to be uncompromisable, we get a system that is safe[2]. It can only be accessed by people knowing a passphrase.
But eight individual passphrases imply that only eight individual users are allowed to use the machine physically, which is a problem, at least in theory.
IIRC, there is another problem with LUKS: every passphrase can be used to delete any other one or to add new ones. It is even possible to delete all key slots, rendering all data inaccessible until the cipher can be broken (economically). In other words, it is impossible to enforce a user management policy on this level, because there is no suitable mechanism.
The real problem, however, is that everyone who knows any passphrase has access to all data on the system. For someone knowing a passphrase, the data is as accessible as a regular hard disk is for everyone else. In other words, a single user can compromise the entire system and everyone's data[3].
Again, since the system itself can be compromised, individual encrypted storage only makes things harder for attackers; it can't reliably prevent anything. If it were used, verifying the compromisable part of the system[4] before handing execution to it would solve the issue, but seems to be very costly. However, sharing data between users is not possible in a space-efficient manner, and publishing new modifications is cumbersome and very inelegant.
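One conceivable form of such a check, assumed here for illustration and not proposed in the original, is comparing the installed files against a list of checksums kept on the trusted boot medium. A minimal sketch using sha256sum from GNU coreutils; the helper names `make_manifest` and `verify_tree` are hypothetical.

```shell
#!/bin/sh
# make_manifest DIR MANIFEST: record a SHA-256 checksum for every file in DIR.
# MANIFEST should be an absolute path on trusted storage (e.g. the boot stick).
make_manifest() {
    (cd "$1" && find . -type f -exec sha256sum {} +) > "$2"
}

# verify_tree DIR MANIFEST: succeed only if every listed file is unchanged.
# --quiet suppresses the "OK" line for each file that matches.
verify_tree() {
    (cd "$1" && sha256sum --quiet -c "$2")
}
```

This only detects modified or deleted files, not newly added ones, and it has to hash the whole tree on every boot, which is exactly the cost mentioned above.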
Conclusion
Encryption on current systems is intrinsically non-multiuser. Either it applies only to single users, or everyone is permitted to compromise the system, or it is hella slow and renders user interactions unusable.
A new encryption mechanism is needed that works on the basis of objects instead of complete partitions, and allows objects to be shared securely between a dynamic set of users.
Comment
In other words: obviously, the initial creator has the capability to access and modify the object. She should now be able to pass (a copy of) this capability to other users, who gain access to the resource by using exactly this reference.
Sounds to me like the capability security model on steroids.
- As already implied, this claim is false. Unless we're talking about a very rudimentary concept of multi-userness. But this is not the point of this post.
- At least as long as it is turned off.
- It currently seems to me that this is only possible with intent, though. However, in case of (b) it might be possible to force such intent onto single users.
- Which is essentially the current install of the OS.
What follows is a post that has been transcribed from a complete, but orthographically awful, electronic manuscript, probably dating from mid-2010.
First, I'd like to say that I grew up with certain terms and don't want to complicate this post by introducing new ones. Still, it should be clear that "we always did it this way" is no argument against improving the concept of programming.
To make things clear: in this article, code is a graph of statements that can be executed by the OS[1]. Writing code means creating code via one or more ways of interacting with a system (text files, graphical programming languages, etc.). Said operating system is assumed to be an end-user system, desktop operating system, consumer system, PC OS, or whatever you want to call it.
Any serious operating system provides a (primary) mechanism for programming it, which I will refer to as "the language" in the following.
Traditional operating systems require that you master programming before you can use them fully. And they do little to help you along the way.
With traditional concepts, the bare operating system is a really bad teacher, both of how it works and can be used, and of how it can be programmed. Using and programming are actually not that different in concept if you abstract enough. It is essentially the concept of traditional systems that keeps the distinction between programmers and mere users intact.
When you cannot learn the language from the OS, you have to learn it elsewhere. For beginners, tutorials on the web are the first substitute for a teacher. They have you write code to feed to the OS and comment on what you have been doing. However, the OS could be your personal tutor that shows you the programmatic equivalents of your day-to-day interactions, explains things and teaches you what is really relevant to you.
Traditional operating systems have the additional problem that their language is low-level and/or not powerful[2]. The problem with the latter is obvious. The former makes the language significantly harder for beginners to learn.
So, let's assume our hypothetical user has managed to learn both programming and the language of the system, and has encountered a problem that she wants to solve. Finally, all those hours will pay off! But where to start? What is she going to do? Read tons of documentation provided by the OS in the hope of finding a pointer that gets her further? Most likely she'll ask Google, and afterwards read some specific documents of the OS. Or maybe she has already found a solution, because other people who couldn't solve a similar problem themselves asked before her.
With third-party programs like scripting language interpreters that come with batteries included[3], it is significantly easier to solve problems. However, they suffer from the same flaw: they usually do not teach you programming completely and do not allow you to explore the system programmatically. They may not even provide mechanisms for controlling every part of the underlying hardware that could be controlled. If they do, they are evolving into an OS themselves[4].
I understand every user who is afraid of trying to learn to program a traditional OS. And I understand every user who does not want to spend many hours with static tutorials[5] to solve a simple problem every few weeks.
Simple things should be simple, complex things should be possible. -- Dr. Alan Kay
I understand every user who wants using the OS to be fun.
And I also understand that it is the job of the OS designers to make it easy for users to solve their problems. Making the whole system itself dynamically programmable, creating a reflective, unified user interface that includes and teaches programming facilities, and employing progressive disclosure throughout the system could be a good way to give users the power and comfort they deserve.
- fuzziness intended
- namely C and sh
- hint hint
- or an OS API, but the user doesn't really care about hidden implementation details
- that might actually be outdated and/or completely wrong
Intuitively, it was already clear to me that we have to overcome the traditional way of programming if we want to achieve progress.
In CodingStandardsAreMisleading I explained that writing text which is then translated into business logic is an arbitrary decision. This process is well understood and we have always done it this way. In the days of slow CPUs and scarce memory it actually made sense. But these days are over, at least on PCs. So should be the age of textual programming.
Enter Subtext, yet another programmable system I discovered a few days ago. But this one is different. Not different in the way the various new Smalltalk forks differ from C; no, this one is really different.
I always felt that some kind of GUI that increases efficiency drastically would be the interface for programming in the future. But your gut should not replace your head, and the latter is what Jonathan Edwards obviously used.
We're doing boolean algebra in our heads when there's a computer sitting right in front of us. Meanwhile, the computer is emulating a teletype.
Basically, Edwards moved programming from the first dimension (programs as more or less sequential lists of instructions) into the second one (and added appropriate IDE support).
How did he do this? See for yourself.
Disconnected Monoliths
This is a follow up to TraditionalApplicationsConfigurationInterfaces, which claimed
[on] traditional operating systems, programs [...] run as monolithic objects that are detached from the environment.
First, let me correct the above statement. It is simply wrong that "programs run [...] detached from the environment". Parameters and configuration files have been mentioned; environment variables have been forgotten. However, what that article was trying to say remains valid, but has not been explained properly.
The original article showed how it is practically impossible to dynamically change the runtime behaviour of an individual application.
This article will explain why it is practically impossible for application developers on traditional systems to allow for dynamic reuse of their applications.
As explained, on traditional systems all input to programs is considered to be text[1]. But so is output.
On static class-oriented systems, one could pass any object of any class that inherits from the expected base class or implements the necessary interface. On dynamic object-oriented systems, one could pass any object that understands the required message protocol[2]. These are valid demands[3].
On current operating systems, even if application developers factor their code thoughtfully, there is still no possibility to pass in arbitrary[4] objects. This is not only because the input is just text, but also because the output is just text, too.
There exists no space between different processes. There are no objects. There is only void; and null-terminated byte strings.
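A small illustration of this, not taken from the original: a pipe carries nothing but bytes, so even a file name loses its identity as soon as it contains the separator a tool expects. The sketch assumes find with `-print0` (available in GNU and BSD userlands) and tr; the directory and file names are made up.

```shell
#!/bin/sh
# Two files, one of which has a newline in its name (hypothetical names).
dir=$(mktemp -d)
touch "$dir/plain" "$dir/two
lines"

# Line-based view: wc -l counts newline characters, not files.
lines=$(find "$dir" -type f | wc -l | tr -d ' ')

# NUL-terminated view: keep only the NUL bytes and count them.
files=$(find "$dir" -type f -print0 | tr -cd '\000' | wc -c | tr -d ' ')

echo "line-based tools see $lines records"   # prints 3
echo "NUL-terminated names: $files"          # prints 2
rm -rf "$dir"
```

Both views are still just byte strings; the NUL convention merely moves the problem to a byte that cannot occur in file names.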
Of course, since these systems are Turing-complete, one could make passing objects possible. However, is it the job of application developers to write code that enables their applications to accept and pass arbitrary, live objects? Or is it the job of operating system designers to provide an easily usable environment?
I will create a universe.
- Actually, it's not even text but strings of bytes.
- Or, you could pass any object but get an exception if it does not understand a message it has been sent.
- Again, see Open-Closed-Principle.
- As explained in TraditionalApplicationsConfigurationInterfaces
Linux and Laptops
Linux has a feature called laptop_mode. According to /usr/src/linux/Documentation/laptops/laptop-mode.txt, laptop_mode "is used to minimize the time that the hard disk needs to be spun up, to conserve battery power on laptops. It has been reported to cause significant power savings". However, it has also been reported that GNU/Linux causes hard drives to fail far too early[1].
The issue I'm referring to in this post is a bug[2] in the Ubuntu distribution of GNU/Linux. In short, Ubuntu ran
hdparm -B 1 /dev/sd*
which sets every matching disk to the most aggressive power-saving level possible. As a result, the heads get parked and the platters spin down after a short idle period. Having the disk spin down early is good. Having it spin up and down again frequently is bad. And that's where laptop_mode comes into play.
Laptop_mode automatically delays writes to the disk and flushes all pending data at once when the disk has to be accessed anyway. Sounds clever? No, it's a hack. It's an exemplary semi-solution, since the disk is accessed not only for writing but also for reading[3].
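For reference, the write-delaying part of laptop_mode is controlled through a few files in /proc/sys/vm on Linux. The following sketch only prints sysctl commands (a dry run) that would let the kernel keep dirty data in RAM for roughly N minutes; the helper name `flush_settings` is hypothetical, and 5 is merely a typical non-zero value for vm.laptop_mode (the delay, in seconds, between disk activity and the triggered flush).

```shell
#!/bin/sh
# flush_settings MINUTES (hypothetical helper): print, but do not apply,
# sysctl commands that let dirty data sit in RAM for about MINUTES minutes.
flush_settings() {
    # The dirty_* knobs are in centiseconds: minutes * 60 s * 100.
    centisecs=$(( $1 * 60 * 100 ))
    echo "sysctl -w vm.laptop_mode=5"
    # How often the kernel's flusher threads wake up to write old data.
    echo "sysctl -w vm.dirty_writeback_centisecs=$centisecs"
    # How old dirty data must be before it is written out.
    echo "sysctl -w vm.dirty_expire_centisecs=$centisecs"
}

flush_settings 15   # both dirty_* values become 90000
```

Applying the printed lines requires root and does not survive a reboot. Data can still be written earlier, e.g. when an application calls fsync or when dirty memory exceeds vm.dirty_ratio, and longer delays mean more data lost on a crash.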
Linux, like all other traditional systems, has the inconvenient bugfeature that nearly everything has to access the disk[4], sometimes described as the disk-ram-dichotomy[5]. While information that is read from the disk can be cached so that subsequent accesses won't trigger another drive spin-up, caches have to be filled before they are useful. GNU/Linux's semi-solution provides no means to accomplish this, leaving users with drives that frequently have to spin up. Oh, how should I know you want to access those files in your home dir you always use? And why should I automatically cache them when there is memory left and the CPU is not in active use? This is a deep-rooted conceptual issue of traditional systems.
Judging from my own experience, laptop_mode does not even seem to delay writes properly. I got frequent spin-ups even when not working on the machine, and I'm using a quite minimal install[6] (traditional GUIs are so bad that even the Unix shell is better). Well, it might just have been another bug or a hole in this kludgy concept.
But there is another point to mention. Let's assume there is an operating system that features proper transparent persistence[7]. Wouldn't it be nice to have the disk spin up at a user-defined interval like fifteen minutes (or in case of low battery) to write all changed data, and then stay spun down until the end of the next interval? In other words, let's assume an operating system supporting aggressive power management. Wouldn't it be nice if the hardware supported such aggressive power management, too?
Some people think the above mentioned bug was not an operating system issue[8]. Blaming the hardware vendors for supporting aggressive power management when the OS misuses it is, however, a sign of blatant ignorance.
And not being able to temporarily disable the disk during normal system usage[9] is a sign of a bad operating system[10].
Footnotes
- http://paul.luon.net/journal/hacking/BrokenHDDs.html
- http://hardware.slashdot.org/article.pl?sid=07/10/30/1742258
- Additionally, hard drives may get probed (e.g. via SMART), which always triggers a spin-up. Now imagine the combination of a SMART daemon and hdparm -B 1...
- The disk is commonly accessed rather directly (through filesystems) whenever persistence is desired. All data is usually stored only on the disk.
- http://c2.com/cgi-bin/wiki?ProgrammingWithoutRamDiskDichotomy
- This includes not using a SMART daemon or similar.
- Read: high-level transparent persistence and economic resource management, which includes intelligently pre-caching data when possible and probably beneficial.
- https://bugs.launchpad.net/ubuntu/+source/acpi-support/+bug/59695
- We assume there is enough RAM, which usually is the case.
- Of course it can partially be hacked onto GNU/Linux; it's basically mount -t tmpfs tmpfs $DIR, with data being moved around before and afterwards. The resulting workflow is, however, not straightforward, and the mechanism is very error-prone unless one ties up all the loose ends that might cause data loss. A sophisticated solution can only be part of a sophisticated OS.
Coding Standards Are Misleading
The functional aspect of software is almost always disregarded in coding standards. At best, they focus on technical aspects, but most of them focus primarily on syntax. They explain how source code should be formatted to be readable. It is well established that maintainability is important. However, consistent and clear syntax only creates the illusion of clean and understandable code.
Syntax is just an arbitrary representation of software. In a sophisticated environment, you could redefine the syntax of the language easily. E.g. changing the characters for blocks from [] to {} would be one command (two commands if there were conflicts). Complex redefinitions would be possible, too, as well as displaying and modifying the same code in a graphical way (21st century, anyone?).
We have to think of the source code that we write as just one of several ways to design software. Software does not need to be directly bound to what we write. Software is rather a form of byte code -- semantically correct and independent from any representation. If we think of software as being objects and message sends that exist beyond textual (or any other) source, we have laid the foundation for a language that can evolve and adapt its representation to various display formats or users.
But what does this mean for coding standards?
Well, if the code you write does not have a syntax but will be displayed using the syntax rules of the one viewing it, most (parts of) coding standards have lost their right to exist.
However, software engineering has standards.
Few coding standards imply or mention that programmers will or should also consider the source-level design. But that topic is far too large to be discussed in a coding standard and could fill several books. And, in fact, it already has: there are, for example, works on design principles and smells[1], design patterns[2], refactoring[3][4] and code smells[5][6]. Developers should also be able to choose the right data structures and algorithms for the task at hand -- an issue that fills a lot of other books[7][8][9].
Yet the most important point has not been mentioned: the reason why we write software. Usually, we program (directly or indirectly) for some kind of customer, even if it's only ourselves. Coding standards in no way specify any goals of or approaches to interaction design, usability engineering or whatever one might call it. There are, again, various books, but since I don't know whether they're worth it, I'll spare you.
Coding standards seem to favor syntax over technical aspects, and technical aspects over functionality. A sure way to create conceptually and technically broken software.
References
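- Martin 2003: Agile Software Development: Principles, Patterns, and Practices.
- Gamma, Helm, Johnson & Vlissides (the "Gang of Four") 1994: Design Patterns: Elements of Reusable Object-Oriented Software.
- Fowler & Beck 1999: Refactoring: Improving the Design of Existing Code.
- Kerievsky 2004: Refactoring to Patterns.
- Fowler & Beck 1999: Refactoring: Improving the Design of Existing Code.
- Mäntylä 2003: Bad Smells in Software -- a Taxonomy and an Empirical Study.
- Knuth 1968: The Art of Computer Programming, Vol. 1: Fundamental Algorithms.
- Knuth 1969: The Art of Computer Programming, Vol. 2: Seminumerical Algorithms.
- Knuth 1973: The Art of Computer Programming, Vol. 3: Sorting and Searching.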
Sanity ranking, take two
Suppose there is a directory that contains files which shall be sorted according to the second character of their name.
Windows
Launch a browser and search for a tool that can do this, if not already installed. If you don't find one, you're screwed.
Unix
ls | sed "s/\(.\)\(.*\)/\2 \1/g" | sort | sed "s/\(.*\) \(.\)/\2\1/g"
Sorting is done by manipulating the data you're querying so that it can be sorted. Afterwards, the mangled data is restored manually.
Irrational, potentially faulty and requires a little knowledge of regex.
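The three stages of the pipeline above can be made visible with a few made-up names in place of ls output. This merely illustrates that pipeline; LC_ALL=C is an added assumption so that sort compares plain bytes.

```shell
#!/bin/sh
# Made-up file names standing in for the output of ls.
names='banana
apple
cherry'

# Decorate: move the first character to the end, after a space.
decorated=$(printf '%s\n' "$names" | sed 's/\(.\)\(.*\)/\2 \1/')
# decorated is now: "anana b", "pple a", "herry c"

# Sort: each line now starts with the former second character.
sorted=$(printf '%s\n' "$decorated" | LC_ALL=C sort)

# Undecorate: move the trailing character back to the front.
result=$(printf '%s\n' "$sorted" | sed 's/\(.*\) \(.\)/\2\1/')
printf '%s\n' "$result"   # banana, cherry, apple (second letters a, h, p)
```

Ties on the second character are broken by the remaining characters of the name, so the sed version actually sorts by more than the second character alone.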
Smalltalk
filenames asSortedCollection: [ :a :b | (a at: 2) < (b at: 2) ].
Readable and understandable, logically correct and less tedious to type than the unixish solution.
Splitting a mailbox (sanity ranking I)
Splitting up a mailbox by the recipients' addresses sounds like an easy task: look at each mail and move it into the mailbox named after the recipient.
Simple things should be simple, complex things should be possible. -- Alan Kay
I'm stuck with GNU/Linux for lack of significantly better alternatives. I'll take this system as an example of unixoid systems, as it should mirror the Unix way of doing things quite well.
Many of the basic tools in Unix work on a stream of information, which is processed line by line. Information is typically stored in plaintext files, which are organized in a hierarchy of directories. As email is plaintext, too, and maildir stores each email in a separate file, splitting such a mailbox should be an easy task on unixoid systems.
I don't want to go into detail about the script that I had to hack together -- it doesn't even work properly. Anyway, here's the code:
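#!/bin/sh
# Move the mail file "$1" into a directory named after its matching recipient.
ADDR="$(grep '^To: ' "$1" | sed -e 's/^To: //')"
# Unquoted $ADDR on purpose: split the header into one word per line.
echo $ADDR | tr '[:space:]' '\n' | line::each line::noempty > /tmp/msplit
# Keep the first matching address and strip <, >, quotes and commas.
ADDR="$(grep -Ei 'foo@example.org|bar@example.org' /tmp/msplit | \
tr -d '<>",' | head -n 1)"
if [ -n "$ADDR" ]; then
    mkdir -p "$ADDR"
    mv "$1" "$ADDR"
fi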
The script had to be called for every email (see find and xargs).
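For completeness, a sketch of such a driver. It assumes the script above is saved as an executable file (the helper `split_maildir` and the paths in the example are hypothetical) and uses find's `-exec`, which runs it once per mail, instead of xargs; `-maxdepth`, a GNU/BSD extension, keeps find from descending into the directories the script creates.

```shell
#!/bin/sh
# split_maildir DIR SPLITTER (hypothetical helper): run SPLITTER once for
# every regular file directly inside DIR, passing the file name as $1.
split_maildir() {
    # -maxdepth 1: do not descend into directories created by SPLITTER.
    # -exec ... {} \; : one SPLITTER call per file; names with spaces are safe.
    find "$1" -maxdepth 1 -type f -exec "$2" {} \;
}

# Example (hypothetical paths):
# cd "$HOME/Maildir/cur" && split_maildir . "$HOME/bin/msplit"
```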
The question is: How could one accomplish this task on a sane operating system?
newBoxes := Dictionary new.
mails collect: #recipient thenDo: [ :each | newBoxes at: each put: Mailbox new ].
mails do: [ :each | (newBoxes at: each recipient) add: each ].