I'm currently attending a lecture on programming paradigms which has left me rather fascinated. Up to now, it has primarily been about the Haskell language, which seems to be based on simple and elegant concepts.

As a side note, this is my first time getting in practical touch with functional programming and working with such a language, so I might get things wrong.

Basically, in functional languages there's no state and all function calls lead to some kind of evaluation (or rather transformation) of expressions that return a value. This is totally different from what is currently known as OOP.

As so often, I have a gut feeling but can't put my finger on it. So I feel like dumping a WIP.

As you might know, I am designing an object system. Or maybe I'm creating some Frankensteinian monster out of several systems and concepts that influenced my thoughts, most notably Smalltalk, Self, Newspeak and maybe Lisp (despite not ever having written a line of code in any of them, except for the first one).

Anyway, let me explain how the concept looked the last time I worked on it (which was already several months ago).

struct Object
{
    const struct Object* const slots[SLOTS_SIZE];
};

(Of course, the level of detail has been reduced heavily.)

A message send in this system, for example a foo: b (to use Smalltalk syntax), is just syntactic sugar for the evaluation of a code object stored in the specified slot (for example in slot foo:), using appropriate parameters (foo value: b).

A code object, or rather code generally, is a closure (comparable to a Smalltalk block), which basically is a lambda expression whose free variables have been bound.

When an execution context is set up, a certain variable (this/self) is defined to reference the object receiving the message. This way, we can use the well-known programming style of methods.

Since we're using the capability security model, there are no globally accessible objects and no objects can be accessed that have not been passed as a parameter. Note that since everything is an object, it follows that there is no global state. The accessibility relation, however, is transitive, so one can chain multiple message sends and probably have access to a vast part of the object graph.

Also note that both pointers and pointees are const. This means that all objects and all indirectly reachable ones are immutable[1]. In other words, our message sends are referentially transparent or maybe even pure, probably with the downside of complex "hidden" state.
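The object model described so far can be sketched in Python. This is only an illustration under assumptions: the names Obj and send, and the example slots x and plus:, are hypothetical and not part of the actual design. Slots are fixed at creation, and a message send evaluates the closure stored in the named slot with the receiver bound as its first argument.

```python
from types import MappingProxyType


class Obj:
    """Hypothetical immutable object: a fixed mapping from slot names to objects."""

    __slots__ = ("_slots",)

    def __init__(self, slots):
        # Copy the mapping and expose it read-only, mirroring the const slots.
        object.__setattr__(self, "_slots", MappingProxyType(dict(slots)))

    def __setattr__(self, name, value):
        # No attribute may be rebound after construction.
        raise AttributeError("objects are immutable")

    def send(self, selector, *args):
        # A message send evaluates the code object stored in the slot,
        # passing the receiver explicitly (the this/self of the text).
        return self._slots[selector](self, *args)


point = Obj({
    "x": lambda self: 3,
    "plus:": lambda self, n: self.send("x") + n,
})
result = point.send("plus:", 4)
```

Because nothing can be rebound, sending the same message with the same arguments always evaluates the same closure over the same slots.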

But this seems backwards. What use is an immutable operating system? The real world does have state.

This is a requirement of versioning. At each point in time, the system is in a fixed state. A historic state cannot be changed, by definition. However, in the blink of an eye you can go back in time, fork a new branch of evolution and switch between several of them -- on an object/subtree level, of course.

Currently, I plan to use a few special types of transparent native references, primarily for improving capability security. In this text, the focus lies on an update-following reference used to give its owner the impression that the current state of the system is mutable by allowing certain objects to always automatically reference the latest version of another object.

Unfortunately, an update-following reference reintroduces side effects, even if they are restricted to certain situations.

Consider the following snippet of code for such a system:

f := [ ... ].
((f value: x) = (f value: x)) ifFalse: [
    self error: 'Referential transparency?'.
].

Now that snippet will always pass if it is run in a context in which references are not automatically updated when a new version of an object is created[2].

Such modification of the context is required anyway. After all, we want to be able to "check out" historical state and continue to work from there. This requires even automatically updated references to point to the latest version of an object as it was at that point in time. Similarly, the context can be set up to include only object versions whose date is earlier than or equal to the date at which evaluation of the snippet started.
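A minimal Python sketch of this idea, under stated assumptions: History, FollowingRef, commit, at and resolve are hypothetical names, and timestamps are plain integers. An update-following reference resolves to the latest version that is not newer than the cutoff time of the evaluation context, so a context pinned to its start time keeps seeing the same version even when newer ones appear.

```python
from bisect import bisect_right


class History:
    """Hypothetical append-only version history of a single object."""

    def __init__(self):
        self._times = []
        self._values = []

    def commit(self, time, value):
        # Versions are only ever appended; historic versions never change.
        if self._times and time <= self._times[-1]:
            raise ValueError("versions must be committed in increasing time order")
        self._times.append(time)
        self._values.append(value)

    def at(self, time):
        # Latest version whose timestamp is earlier than or equal to `time`.
        index = bisect_right(self._times, time)
        if index == 0:
            raise LookupError("no version exists at that time")
        return self._values[index - 1]


class FollowingRef:
    """Hypothetical update-following reference, resolved relative to a context."""

    def __init__(self, history):
        self._history = history

    def resolve(self, context_time):
        return self._history.at(context_time)


counter = History()
counter.commit(1, 10)
ref = FollowingRef(counter)

first = ref.resolve(context_time=1)
counter.commit(2, 11)                   # a newer version is created meanwhile
second = ref.resolve(context_time=1)    # the pinned context does not see it
latest = ref.resolve(context_time=2)    # a later context follows the update
```

With the context pinned, first and second agree. Only a context with a later cutoff observes the new version, which is where the side effect becomes visible.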

This raises several questions. Can this simple and elegant[3] scheme reduce side effects significantly? Can it improve the quality of the system? Is object-orientation without side effects, with fewer side effects, or with controlled side effects possible?

Furthermore, code evaluation in this system is still following the execution-pattern and is not designed to be thought of as transformation of expressions. So, can such an object-oriented scheme provide a practically usable way to evaluate expressions lazily?


  1. Let's ignore const-casts; the code above is just an illustration.
  2. This might not be completely true. How is nondeterminism handled, e.g. random numbers and external events/input?
  3. Well, at least I hope it is simple and elegant.
Posted 2011-11-21 20:42 Tags: Smalltalk

Newspeak is a programming language in the spirit of Smalltalk. In fact, its current prototype is even hosted in Squeak. It is also influenced by Self, being completely based on message sends (there are not even variables, at least in concept). But unlike Self, Newspeak is class based.

We're not talking about classes in the sense of Java, nor in the sense of Smalltalk. Newspeak features an entirely different approach, with classes representing namespaces and modules.

I've not read much about that language and I've not programmed a single line in it, so I'm not going to do a write-up here. Instead, I refer you to a series of articles that was a very interesting read.

Posted 2011-05-15 21:20 Tags: Smalltalk

Today, I wrote some code calculating the digital root of a (positive) integer, just for fun.

Again, I used a hypothetical language, here presented using the syntax of Smalltalk. The resulting function reads pretty straightforwardly:

i := 4528.
[ :i | i := (i asString asArray collect: #asInteger) map: #+ onto: 0 ] until: [ i < 10 ].

Obviously, we cannot collect the single digits of a string as integers. The result would be a string... of integers?

map:onto: has been used as a hypothetical object-oriented implementation of the ideas behind inject:into:. The latter cannot be used in conjunction with the "method" +[1]. This may come as a surprise to someone who has used the Symbol>>value: hack too frequently. In fact, that hack creates the impression that certain methods of the Collection framework send messages to the individual elements. That is wrong; it's all about evaluating a block with them as parameters.

Additionally, I encountered a really ugly inconsistency in Integer>>asCharacter and Character>>asInteger. Consider the following snippet:

3 asCharacter asInteger.

What does it evaluate to? 3? Wrong. It's 51. Because that's the ASCII value of the character 3.

It is blatantly obvious that these methods should be inverses of each other and that no low-level detail like character encoding should be leaked on such a high level.
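The difference between a digit's value and its character code can be shown with a Python analogue. This is not Squeak's behaviour, only an illustration; digit_to_char and char_to_digit are hypothetical names for a conversion pair that really are inverses of each other.

```python
def digit_to_char(digit):
    """Convert a digit value 0..9 to its character."""
    if not 0 <= digit <= 9:
        raise ValueError("expected a digit value between 0 and 9")
    return str(digit)


def char_to_digit(char):
    """Inverse of digit_to_char: convert a digit character to its value."""
    if len(char) != 1 or char not in "0123456789":
        raise ValueError("expected a single ASCII digit character")
    return int(char)


round_trip = char_to_digit(digit_to_char(3))   # stays on the level of digits
code_point = ord(digit_to_char(3))             # leaks the character encoding
```

Here round_trip is the original digit, while code_point is the encoding-level number of the character, which is the kind of detail the text argues should not surface.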

Having considered these problems, here's some real Smalltalk code that runs in Squeak:

i := 4528.
[ i < 10 ] whileFalse:
    [ i := ((i asString asArray collect: #asString)
        collect: #asInteger) inject: 0 into:
            [:sum :each | sum + each]].

Actually, we don't need to collect: #asInteger, since converting characters or strings in arithmetical operations is done implicitly in Squeak. In other words, we're losing type safety.

Yeah, and that's how to calculate the digital root!
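For comparison, the same loop as a short Python sketch. The function name digital_root is an assumption for illustration; the approach mirrors the Smalltalk code by repeatedly summing the decimal digits until a single digit remains.

```python
def digital_root(n):
    """Repeatedly sum the decimal digits of n until one digit is left."""
    if n < 0:
        raise ValueError("expected a non-negative integer")
    while n >= 10:
        # Convert to text, turn every character back into a digit, and sum.
        n = sum(int(digit) for digit in str(n))
    return n


root = digital_root(4528)   # 4528 -> 19 -> 10 -> 1
```

Unlike the Squeak version, the conversion from characters to integers is explicit here, so no implicit coercion is involved.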


  1. Yeah, it's not a method, see the following link.
Posted 2011-04-09 21:42 Tags: Smalltalk

Traditional Applications and Configuration Interfaces

Introduction

In this post I will explain how traditional software systems are hard to reuse and thus are hostile to their users. While the solution is pretty simple, I am not aware of any system that has really solved this issue.


In traditional operating systems, programs are monolithic blobs. They may be compiled to machine language or written in a scripting language; in both cases they run as monolithic objects that are detached from the environment.

Configuration of runtime behavior is commonly achieved through parameters and configuration files[1], which I will be calling the configuration interface below.

The most important thing to mention is that all configuration happens through text. Text, however, does not provide any functionality, it simply describes. Thus, the application under configuration must interpret the input.

For this reason there are various configuration formats for users to adhere to. They ensure that the application syntactically understands the configuration. There are two choices for configuration languages. The configuration language may resemble a programming language, giving users flexible and powerful configuration; however, developers then have to write a complex parser and engine for that language, or at least bindings in the application's source code. Or the configuration language may be simplistic: easy to parse, but not Turing-complete and almost useless for complex problems.

In any case the input has to be interpreted by the application, so users can only pass objects[2] that it semantically understands. This is a blatant violation of the Open-Closed Principle[3] (OCP). You cannot, by concept, pass objects of any kind the original software author did not anticipate (and hard-code into the source). You cannot modify the behavior of the program by passing a modified object. You cannot do anything the author did not explicitly program. There is absolutely no polymorphism; users are completely controlled by what the software author dictates[4]. The only way to "configure" traditional applications beyond the limitations forced upon users is to edit the source code, which may result in major trouble considering the usage of low-level languages and the lack of software engineering discipline.
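The contrast can be sketched in Python. Everything here is hypothetical and only illustrates the argument: Greeter stands in for an application, formatter_from_text for a text-based configuration interface, and Shouting for a behaviour the author never anticipated.

```python
class Plain:
    """The only behaviour the application author anticipated."""

    def format(self, name):
        return f"Hello, {name}"


class Shouting:
    """A behaviour written by a user, unknown to the application author."""

    def format(self, name):
        return f"HELLO, {name.upper()}!"


class Greeter:
    """Stand-in for an application with one configurable behaviour."""

    def __init__(self, formatter):
        # Any object that understands format(name) is accepted.
        self._formatter = formatter

    def greet(self, name):
        return self._formatter.format(name)


def formatter_from_text(setting):
    # Text-based configuration: only hard-coded values can be interpreted.
    known = {"plain": Plain}
    if setting not in known:
        raise ValueError(f"unknown formatter: {setting!r}")
    return known[setting]()


configured = Greeter(formatter_from_text("plain")).greet("world")
extended = Greeter(Shouting()).greet("world")
```

Passing an object extends the application without modifying it, whereas the text interface rejects every value its author did not hard-code.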

The OCP was stated more than twenty years ago by Bertrand Meyer and is a basic principle of software engineering. However, current operating systems are still not designed according to such concepts, not even Smalltalk[5].

Still a long way to go.


Footnotes

  1. More complex user interfaces contain functionality to change the settings in the running program in various ways. In any case, the theory and functionality behind simple and complex configuration mechanisms with regard to the current topic is the same, so there's no need to make distinctions here.
  2. Not to be confused with the OOP-philosophy of objects.
  3. Software entities (classes, modules, functions, etc.) should be open for extension, but closed for modification, see also http://c2.com/cgi/wiki?OpenClosedPrinciple.
  4. Even when using so-called "free software".
  5. Squeak Smalltalk mirrors traditional operating systems remarkably well on the user-interface level. Unfortunately, I'm not aware of a Smalltalk system that makes a significant difference. A shame considering that Smalltalk provides the foundation for many improvements that are impossible in traditional systems.
Posted 2010-04-01 12:54 Tags: Smalltalk

The Squeak VM is Broken

Squeak is an insult to Smalltalk and conceptually broken, as I explained in that post. There are more flaws in its concept, but I think I already said enough. Let's focus on its implementation.

That the VM forces users to use a GUI and that it segfaults occasionally, despite being "mathematically perfect C code" as Alan Kay said at OOPSLA97[1], was already mentioned. I just stumbled over additional information. Squeak is part of Gentoo GNU/Linux's Portage tree (the package repository), but will be removed because of security issues[2]. Turns out the developers of the VM jammed several libraries directly into their own project -- most of them without changes to the codebase. I doubt that this was necessary.

This approach is considered bad even by *nix people. It violates all principles of modular systems, has a severe impact on software maintenance, prevents users from controlling which software (versions) to install, smells of Duplicate Code, and whatnot. Additionally, it increases compile time and disk space usage without providing any benefit.

As a result, the Squeak VM has joyfully imported several security issues from the projects it jammed into its own source directory. If the authors had paid attention to some concepts of software engineering (yes, even *nix has a few of them, probably more than the Squeak VM), vulnerable packages would've automatically been updated on the next system update. And if Squeak had depended on a specific version, the package manager would've directly reported the resulting problems to the user. In any case, Squeak itself would not have been an issue.

That's what you get for disregarding basic software engineering principles. Glad that somebody found out.


References

  1. I strongly recommend watching this talk, titled "The Computer Revolution hasn't happened yet". It's somewhere on the net. Try starting there: http://c2.com/cgi/wiki/?AlanKay
  2. http://bugs.gentoo.org/show_bug.cgi?id=247363
Posted 2010-03-07 12:52 Tags: Smalltalk

Generic Programming

Admittedly, I don't really know what to write about this topic. Programming is generic by nature: You specify a list of instructions to be applied to abstract input.

Well, "Generic Programming" is in vogue, which is not surprising since it allows for better abstraction. Better abstraction than the average mainstream language provides, that is. "Generic Programming" is nothing more than hacking statically typed languages with a more dynamic approach.

To state the obvious again, here's a tiny example.

// C++ std::max
template <class T> const T& max(const T& a, const T& b)
{
    return b < a ? a : b;
}

"Same algorithm in Smalltalk"
maxOf: a and: b
    b < a ifTrue: [ ^a ] ifFalse: [ ^b ].

"Or, using proper OOP:"
max: b
    self < b ifTrue: [ ^self ] ifFalse: [ ^b ].

It is somewhat funny that major languages and/or their compilers have problems implementing features to allow for "Generic Programming" -- and monstrous standards and a complex and inconsistent syntax are problems, too. Languages like Smalltalk support this approach out-of-the-box; programmers use it intuitively.
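The same holds for other dynamically typed languages. A minimal Python sketch, where max_of is a hypothetical name chosen to avoid shadowing the built-in max: the function works for any pair of objects that understand the < comparison, without templates or type parameters.

```python
from fractions import Fraction


def max_of(a, b):
    # Same algorithm as above: relies only on the objects understanding "<".
    return a if b < a else b


largest_int = max_of(3, 7)
largest_text = max_of("pear", "apple")
largest_fraction = max_of(Fraction(1, 3), Fraction(1, 2))
```

No declaration says which types are allowed; the only requirement is that the two arguments can be compared with each other.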

Let's just let the term "Generic Programming" die -- it is the most natural form of programming and deserves special treatment only in flawed languages.

Posted 2010-03-03 23:17 Tags: Smalltalk

Squeak is an insult to Smalltalk

Squeak is, actually, the platform I used for getting in (practical) touch with Smalltalk. It was a very influential and important experience. However, Squeak is not only stuck in the 1980s, it is also broken.

Smalltalk has always been a kind of operating system. Squeak is no exception and carries on the spirit of (pre) 1980: It is intended as a single-user system and has no security concept (an object capability model would be easy to implement, though). AFAIK, program windows were originally invented in Smalltalk. To this day, Squeak adheres to this obsolete WIMP (windows, icons, menus, pulldowns) concept which limits direct interaction with objects on an otherwise rather live and reflexive system. Additionally, the GUI is hard-coded: you can't run Squeak properly without using its GUI. The "headless" parameter renders it anti-interactive (unusable, in other words). Not only is the user interface inflexible, it also distinguishes between "novice mode" and "expert mode", which is anti-pedagogic. The reason why there supposedly must be an expert mode is that you can perform several actions that could destroy your Squeak image.

That brings us to the next issue: Squeak is irreversibly destructive. AFAIK, objects in Squeak are not versioned; you can't simply restore a previous state, neither individual nor global, after you've screwed up. Screwing up is actually fairly easy. Sometimes the update of (what I believe to be) the basic image from within Squeak is enough to render it unusable. Even if you just want to uninstall a package you installed using Squeak's native package manager, you won't find an "uninstall" button or menu entry -- because there is no possibility to properly uninstall formerly installed software. (No, I'm not kidding.)

Well, since it is easy to irreversibly damage an image, at least Squeak doesn't automatically write it to disk (i.e. transparent persistence). Oh, wait, that's not a feature, that's a hack. I want transparent persistence!

What would be even better is error-free persistence. Squeak somehow managed to lose my original source code and seems to silently display reverse-compiled bytecode. Parameters are named tn where n is the parameter number and all comments are lost. Except for class comments -- these are referenced by RemoteStrings that point beyond end of file, neatly throwing an exception each time I try to access a class in the system browser. Considering these severe bugs, I suppose the next time the virtual machine segfaults, which it does once in a while, I should be happy -- at least I have lost only a little data instead of corrupting it all.

Ultimately, even though a proper smalltalkish system would be far more sophisticated, powerful and efficient than a unix shell, I enjoy working on my command line more than trying to click my way through Squeak, having to tediously move, drag and drop windows and objects all over the place.

It's not even fast enough for my neurons.

Posted 2010-03-01 01:01 Tags: Smalltalk

Sanity ranking, take two

Suppose there is a directory that contains files which shall be sorted according to the second character of their name.

Windows

Launch a browser and search for a tool that can do this, if not already installed. If you don't find one, you're screwed.

Unix

ls | sed "s/\(.\)\(.*\)/\2 \1/g" | sort | sed "s/\(.*\) \(.\)/\2\1/g"

Sorting is done by manipulating the data you're querying so that it can be sorted. Afterwards the corrupted data is restored manually.

Irrational, potentially faulty and requires a little knowledge of regex.

Smalltalk

filenames asSortedCollection: [ :a :b | (a at: 2) < (b at: 2) ].

Readable and understandable, logically correct and less tedious to type than the unixish solution.
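For comparison, a Python sketch of the same sort. The function name sort_by_second_char and the sample file names are made up for illustration; the sort key is stated directly instead of rewriting the data.

```python
def sort_by_second_char(filenames):
    # Sort by the second character; names shorter than two characters
    # would raise an IndexError and are assumed not to occur.
    return sorted(filenames, key=lambda name: name[1])


ordered = sort_by_second_char(["beta.txt", "alpha.txt", "gamma.txt", "delta.txt"])
```

Since sorted is stable, names that share the same second character keep their original relative order.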

Posted 2009-10-18 13:00 Tags: Smalltalk

Splitting a mailbox (sanity ranking I)

Splitting up a mailbox depending on the recipient's addresses sounds like an easy task: Look at each mail and move it into the mailbox named after the recipient.

Simple things should be simple, complex things should be possible. -- Alan Kay

I'm stuck with GNU/Linux because of the lack of significantly better alternatives. I'll take this system as an example for unixoid systems, as it should mirror the Unix way of doing things quite well.

Many of the basic tools in Unix work on a stream of information, which is processed line by line. Information is typically stored in plaintext files which are organized in a hierarchy of directories. As email is plaintext, too, and maildir stores each email in a separate file, splitting such a mailbox should be an easy task for unixoid systems.

I don't want to go into detail about the script that I had to hack together -- it doesn't even work properly. Anyway, here's the code:

#!/bin/sh

ADDR="`cat $1 | grep '^To: ' | sed -e 's/To: //'`"
echo $ADDR | tr '[:space:]' '\n' | line::each line::noempty > /tmp/msplit
ADDR="`cat /tmp/msplit | \
      egrep -i 'foo@example.org|bar@example.org' | \
      tr -d '<' | tr -d '>' | tr -d '"' `"

if [ -n "$ADDR" ]; then
  mkdir "$ADDR"
  mv "$1" "$ADDR"
fi

The script had to be called for every email (see find and xargs).

The question is: How could one accomplish this task on a sane operating system?

newBoxes := Dictionary new.
mails collect: #recipient thenDo: [ :each | newBoxes at: each put: Mailbox new ].
mails do: [ :each | (newBoxes at: each recipient) add: each ].
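The same grouping can be sketched in Python with the standard email package. This is an illustration under assumptions: split_by_recipient is a hypothetical name, mails are given as raw message strings, and a mailbox is modelled as a plain list instead of a maildir.

```python
from collections import defaultdict
from email import message_from_string
from email.utils import getaddresses


def split_by_recipient(raw_mails):
    """Group raw messages by every address found in their To: headers."""
    boxes = defaultdict(list)
    for raw in raw_mails:
        message = message_from_string(raw)
        # getaddresses copes with display names, angle brackets and quotes.
        for _name, address in getaddresses(message.get_all("To", [])):
            boxes[address.lower()].append(raw)
    return dict(boxes)


mails = [
    "To: Foo <foo@example.org>\nSubject: one\n\nfirst body\n",
    "To: bar@example.org\nSubject: two\n\nsecond body\n",
    'To: "Foo" <FOO@example.org>\nSubject: three\n\nthird body\n',
]
boxes = split_by_recipient(mails)
```

The messages are treated as objects with a recipient, not as lines of text to be cut apart, which is the point the Smalltalk snippet makes.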
Posted 2009-10-10 14:00 Tags: Smalltalk
