Full-system encryption has obvious benefits which I won't explain here. However, there are several downsides, some of which I explained in cryptography on multi user systems. Another downside is that a fully-encrypted system is... secure.
Consider the following two cases:
- Your machine is stolen, which is especially likely with notebooks. Even if you can get it back, the thief or the next illegitimate user finds the system completely unusable and will format your disk. I assume you're keeping backups anyways, but that sucks nevertheless.
- You have no plausible deniability. In fact, having obviously encrypted[1] your system can make you look suspicious or even guilty, even if you only wanted to preserve your privacy. Something like that happens often. After all, a righteous citizen has nothing to hide, right?
With an unencrypted system, you might have a higher chance of catching the thief, as I saw yesterday. You can watch that talk on YouTube[2]. Seriously, watch it, it's hilarious!
Today, then, it turns out that something similar has happened recently.
Additionally, with an unencrypted system, you could prove that you are, in fact, not guilty when being accused of some crime. Okay, you could just hand out the keys, but there is something better: plausible deniability.
Since we still want to use an encrypted system, the idea is to use two of them. The first one, intended for everyday usage, is encrypted and hidden. The second one is completely unencrypted and unprotected, but silently tracks the machine, monitors the user's interactions and is backdoored and accessible for you by default.
I'm quite sure that something like what has been described in the links above won't work often, at least not completely. But it might help. And setting something up like this costs time only once and does not waste that much hard drive space. It might be worth trying.
I'll save the concrete setup for another post, but here's the basic approach[3]:
- Perform a regular setup of the fake system of your choice. Remember: it has to be usable for everyone with physical access. Leave a large enough part of the disk unpartitioned for the real thing.
- It also has to be usable for you via the net. Set up the required software.
- Start a LiveCD or similar. You don't want to leave traces in the fake system.
- Set up loopback devices to somewhere in the unformatted space. These are your system's partitions. Note the start addresses and sizes.
- Set up the encryption mechanism of your choice (e.g. dm-crypt[4]) on the loopback devices.
- Install.
- Set up a USB memory stick or similar as your boot device. It is unencrypted and holds a boot loader and an initial system (e.g. an initramfs) that knows the addresses and sizes of your hidden partitions, sets up the loopback devices, checks whether the user is authorized and, if so, decrypts and boots the real system.
- Configure the BIOS to try to boot from USB before trying to boot from hard disk.
Congratulations. Your machine runs a neatly backdoored system that boots by default and that everyone can use[5]. The only thing that might tell attackers something is the fact that the system hasn't been used for months and that the disk has some unpartitioned space left. But we all like to have a bit of emergency storage for future partitions, right?
Oh, and keep in mind to completely overwrite the disk with random data before performing the steps above.
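To make the loopback and dm-crypt steps a bit more concrete, here is a dry-run sketch that only prints the commands instead of running them. Like the rest of this approach, it is untested on a real disk. The disk name, sector numbers, the loop device /dev/loop0 and the mapping name hidden_root are made-up values for illustration. Plain dm-crypt mode is my assumption here because it writes no header to the disk.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands that would map one hidden partition.
# All concrete values (disk, sectors, /dev/loop0, mapping name) are hypothetical.
SECTOR_SIZE=512

hidden_partition_commands() {
    disk="$1"; start_sector="$2"; sector_count="$3"; name="$4"
    # losetup expects byte offsets, so convert from sectors.
    offset=$((start_sector * SECTOR_SIZE))
    size=$((sector_count * SECTOR_SIZE))
    # Map a region of the unpartitioned space to a loop device.
    echo "losetup --offset $offset --sizelimit $size /dev/loop0 $disk"
    # Assumption: plain mode, so no on-disk header gives the region away.
    echo "cryptsetup open --type plain /dev/loop0 $name"
}

hidden_partition_commands /dev/sda 104857600 20971520 hidden_root
```

The initial system on the USB stick would have to carry exactly these offsets and sizes, which is why they need to be noted down during installation.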
- The MBR is still on the disk, readable, and one can see that the content of each of your partitions is, for example, encrypted using dm-crypt.
- In the likely event that this video will get removed, that talk is named "pwned by the owner: what happens when you steal a hacker's computer" and was held at DEF CON 18.
- This approach has not been tested by me in detail, but something along these lines should work.
- But beware! This mechanism might write unencrypted meta-information which may tell that there might be some encrypted data.
- Which might be, in fact, another security problem as they are able to store any data there and claim it was yours.
agg is a news aggregator that stays focused on its goal, namely reading a feed and creating a representation that can be worked with in the whole system. The process is straight-forward: store all items[1] that have a publication date newer than that of the latest item received previously.
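The idea can be sketched in a few lines of shell. This is not how agg itself is implemented. The input format (epoch seconds, a tab, then the title) and the name newer_than are made up for illustration.

```shell
#!/bin/sh
# newer_than LAST: pass through only the items published after LAST.
# Hypothetical input format: "<epoch seconds><TAB><title>", one item per line.
newer_than() {
    awk -F '\t' -v last="$1" '$1 > last'
}

# Publication date of the latest item received on the previous run (made up).
LAST_SEEN=1000
printf '900\told item\n1000\tlatest known item\n1100\tnew item\n' | newer_than "$LAST_SEEN"
```

Only the item dated after LAST_SEEN is passed through.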
That works remarkably well in the common case. A less common case is feeds without publication dates for their items. In fact, the pubDate tag is not required according to the RSS 2.0 specification. What to do in this case?
The solution is simple: if the author ignored publication dates, the aggregator shall do so, too. This has the inconvenient effect that once you've read all items and probably deleted them afterwards, running agg again on that feed will lead to all these items being there again!
So we need some form of caching[2]. However, such a rare case that hasn't much to do with agg's job anyways should not end up in the same piece of software. Here are my tools for the output format of version 0.2:
agg_filter to remove an item iff it has already been cached:
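ITEM_NAME="$1"
FEED_PATH="$(dirname "$ITEM_NAME")"
FEED_NAME="$(basename "$FEED_PATH")"
# Remove the item if its hash is already listed in the feed's cache file.
# A missing cache file simply means that nothing has been cached yet.
if grep -qF "$(agg_hash "$ITEM_NAME")" ".$FEED_NAME.cache" 2>/dev/null; then
    nomtime "rm -rf" "$ITEM_NAME"
fi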
agg_cache to cache an item:
ITEM_NAME="$1"
FEED_PATH="`dirname "$ITEM_NAME"`"
FEED_NAME="`basename "$FEED_PATH"`"
agg_hash "$ITEM_NAME" >> ".$FEED_NAME.cache"
agg_hash to create a hash of an item:
cat "$1/title" "$1/desc" "$1/link" 2>/dev/null | sha1sum | awk '{ print $1; }'
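To see why hashing the properties is enough, here is a small self-contained demo. It fakes three item directories in a temporary location. The helper make_item and the sample contents are made up, and item_hash just repeats the agg_hash one-liner so that the demo runs on its own.

```shell
#!/bin/sh
# item_hash repeats the agg_hash one-liner: hash the concatenated properties.
item_hash() {
    cat "$1/title" "$1/desc" "$1/link" 2>/dev/null | sha1sum | awk '{ print $1 }'
}

# make_item is a hypothetical helper that fakes an agg 0.2 item directory.
make_item() {
    mkdir -p "$1"
    printf '%s\n' "$2" > "$1/title"
    printf '%s\n' "$3" > "$1/desc"
    printf '%s\n' "$4" > "$1/link"
}

DEMO_DIR="$(mktemp -d)"
make_item "$DEMO_DIR/first fetch" "Bar happened" "Details on Bar." "http://example.org/bar"
make_item "$DEMO_DIR/second fetch" "Bar happened" "Details on Bar." "http://example.org/bar"
make_item "$DEMO_DIR/other" "Baz happened" "Details on Baz." "http://example.org/baz"

# Print one hash per item; identical content yields identical hashes.
for item in "$DEMO_DIR"/*; do
    printf '%s  %s\n' "$(item_hash "$item")" "$(basename "$item")"
done
```

The two "Bar happened" items get the same hash although they live in different directories, so an item that shows up again after deletion is recognized by its content alone.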
With the agg_each script from the post on using agg 0.2, the scripts are usually used as follows:
> $fetch_all_feeds_and_pipe_to_agg
> agg_each agg_filter "feed without pubDates"
> agg_each agg_cache "feed without pubDates"
> agg_each agg_read *
But why this hassle? Why not just integrate it into agg, or use a real newsreader for that matter?
To be fair, on operating systems with more powerful concepts, agg is unnecessary since it basically does nothing more than performing a deserialization of a news feed.
UNIX, however, is a rudimentary operating system and lacks powerful concepts. In this case the problem boils down to the completely rudimentary native objects and methods of communication.
When a set of people communicates openly with the requirement that everyone should be able to take part in the communication, every person in the set must speak in a way everyone can understand. Thus, the communication can only be as smart as the dumbest person in that set. Else, you'd have to introduce even more people into the set because translators between the smart and dumb ones are required. Not only is this process cumbersome, but also you'll be occasionally lost in translation.
Speaking in terms of this metaphor, UNIX is dumb. Processes are only expected to communicate via streams of lines of text, files and directories[3]; this is especially true for the whole base system.
So, in order for users to be able to actually use the system (as opposed to using yet another application), agg either has to be file-based or needs even more (and even more complex) auxiliary tools, which indirectly leads to exactly the problems mentioned here.
Yes, it's a hassle. But it's the only practical way to at least partially achieve something Alan Kay has explained in his talk The Computer Revolution Hasn't Happened Yet at OOPSLA 1997:
Well I had programmed Caesar Franck's heroic piece—and if you know this piece, it is made for the largest organs that have ever been made. The loudest organs that have ever been made, in the largest cathedrals that had ever been made, because it's a nineteenth century symphonic type organ work, and Biggs was asking my friend to play this on this dinky, little organ.—He said, But how can I play this, on this? Biggs, he said, Just play it grand. Just play it grand. To stay with the future as it moves, is to always play your systems more grand than they seem to be right now.
- Currently there's a bug, but this is the concept anyways.
- A cache of your brain, that is, so that the software doesn't try to make you read data that's already in there again.
- Yes, and *argv[], fooenv(), shmfoo() etc. They don't matter in our case since we have "complex" data structures and no sane person would ever imagine using a computer by writing C and following the ancient edit-compile-debug cycle. Also, this issue has already been covered in TraditionalApplicationsConfigurationInterfaces and DisconnectedMonoliths.
Version 0.2 of agg has just been released. It is mostly consistent with the 0.1 versions in terms of bugs, but the output format is completely different.
The previous version created (absolutely poorly formatted) HTML files to represent the news items. This was good enough for my use case. But when I briefly told a friend about this project, he proposed a different use case that was not possible in the current concept.
As always, my subconscious started working on this issue, and some time later something popped into my mind. I've been claiming that agg does the simplest thing that could possibly work, namely only dumping news feed items. But this was not entirely true. In fact, agg also knew a bit about HTML and formatted the output accordingly. This not only violated one of agg's goals (having a single responsibility) but also virtually discarded all meta information. Such meta information, however, is required for use cases like the one proposed by said friend of mine.
Starting with the 0.2 versions, agg represents news items as directories with all supported (and available) properties as single files (currently title, desc(ription) and link). For starters, here's a new (compacted) version of the CLI I posted for the previous versions.
agg_each:
CMD="$1"
while [ $# -gt 1 ]; do
    shift
    find "$1" -mindepth 1 -maxdepth 1 -type d -exec "$CMD" '{}' \;
done
agg_read:
ITEM="$1"
function delete() { nomtime "rm -rf" "$1"; }
function tui() { agg_htmlize "$1" | elinks; }
function gui() {
    TMPFILE="/tmp/`basename "$1"`.html"
    agg_htmlize "$1" > "$TMPFILE"
    opera "$TMPFILE" # asynchronous if already running
    sleep 3
    rm "$TMPFILE"
}
function TUI() { elinks "`cat "$1/link"`"; }
function GUI() { opera "`cat "$1/link"`"; }
function prompt()
{
    echo "$1"
    CMD=
    while [ "$CMD" != t -a "$CMD" != T -a "$CMD" != g -a "$CMD" != G -a "$CMD" != d -a "$CMD" != n ]; do
        echo -n "[t]ui, [g]ui, [T]UI, [G]UI, [d]elete, [n]ext: "
        read CMD
    done
}
while [ "$CMD" != n -a "$CMD" != d ]; do
    prompt "$ITEM"
    case "$CMD" in
        t) tui "$ITEM";;
        g) gui "$ITEM";;
        T) TUI "$ITEM";;
        G) GUI "$ITEM";;
        d) delete "$ITEM";;
        n) ;;
        *) exit 1
    esac
done
Htmlizing can work as follows:
TITLE="`cat "$1"/title`"
BODY="`cat "$1"/desc`"
LINK="`cat "$1"/link`"
cat << EOF
<html>
<h1>$TITLE</h1>
<p>$BODY</p>
<p><a href="$LINK">Link: $LINK</a></p>
</html>
EOF
The workflow then looks as follows:
> cd ~/feeds
> $script_to_fetch_all_feeds
> agg_each agg_read *
./Foo Feed/Just in: Bar happened
[t]ui, [g]ui, [T]UI, [G]UI, [d]elete, [n]ext:
...
Each item, one after another, its link and all the links it contains can be browsed using the browser (text/gui) selected. By using the capitalized key, the respective browser directly opens the link specified by the news item (if any), which is useful for sites that crop their feed items excessively.
Additionally, browsers should not have problems with the file contents anymore, provided you perform proper htmlization that the browser recognizes as such.
Now the interface is truly flexible.
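As a small illustration of what the kept meta information allows, here is a sketch of a second interface that lists the title and link of every item in a feed directory. The name agg_titles is made up and not part of agg. It only assumes the 0.2 layout of one directory per item with title and link files.

```shell
#!/bin/sh
# agg_titles FEED_DIR: print "<title><TAB><link>" for every item of a feed.
# Hypothetical helper, not part of agg; relies only on the 0.2 directory layout.
agg_titles() {
    find "$1" -mindepth 1 -maxdepth 1 -type d | while IFS= read -r item; do
        # Skip directories that do not look like items.
        [ -f "$item/title" ] || continue
        printf '%s\t%s\n' "$(cat "$item/title")" "$(cat "$item/link" 2>/dev/null)"
    done
}
```

Something like `agg_titles "Foo Feed" | grep -i bar` then finds items by title without touching any HTML.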
While the value of UNIX is at least questionable, not even adhering to UNIX standards is worse.
C has one considerable advantage over C++ regarding efficiency for the average UNIX programmer: man pages.
One day I started searching for whether someone had fixed this problem. And the search was successful.
Finally I do not have to try to remember the whole STL in detail.
Today I released the first version of agg, a news aggregator following the UNIX philosophy.
As clearly stated, it has many bugs. However, they are predictable and agg is working fine for all of the other feeds I've subscribed to.
Since a news aggregator that "just dumps" the feed's contents into a directory hierarchy provides only rudimentary efficiency[1] (but much higher flexibility and freedom than those monolithic, unprogrammable, totalitarian systems usually used), I've written a small interface for it (or rather for the file system structure).
These scripts are nothing special. They provide only the necessary features and are simple enough to achieve high efficiency[2].
How agg can be used to subscribe to multiple feeds is already shown in the man page. For reading the news items, I'm using two scripts: agg_read to read a specific item of a specific feed, and agg_read_all that calls the former for every item.
agg_read_all is trivial:
find . -type f -exec agg_read '{}' \;
agg_read is still simple:
### CONFIG
TUI_READER=elinks
GUI_READER=opera
### END CONFIG
set -e
ITEM="$1"
function delete()
{
    owd="`pwd`"
    cd "`dirname \"$1\"`"
    nomtime "rm '`basename \"$1\"`'"
    cd "$owd"
}
function prompt()
{
    echo "$1"
    CMD=
    while [ "$CMD" != t -a "$CMD" != g -a "$CMD" != d -a "$CMD" != n ]; do
        echo -n "[t]ui, [g]ui, [d]elete, [n]ext: "
        read CMD
    done
}
while [ "$CMD" != n -a "$CMD" != d ]; do
    prompt "$ITEM"
    case "$CMD" in
        t) $TUI_READER "$ITEM";;
        g) $GUI_READER "$ITEM";;
        d) delete "$ITEM";;
        n) ;;
        *) exit 1
    esac
done
The workflow then looks as follows:
> cd ~/feeds
> agg_fetch_all
> agg_read_all
./Foo Feed/Just in: Bar happened
[t]ui, [g]ui, [d]elete, [n]ext:
...
And each item, one after another, its link and all the links it contains can be browsed using the browser (text/gui) selected.
A simple solution, and only an example of the seemingly endless number of interfaces that could be written for the output of agg. None of them needs to give a damn about XML, RSS, Atom[3] or download logic.
Heck, even agg itself knows nothing of networking!
- human-time
- as usual, efficiency measured in human-time
- Atom not implemented as of yet
Installation of Broken Systems
Why would somebody want to install broken systems? Well, when there is no alternative to broken systems, you'll have to choose one of them.
Gentoo GNU/Linux is my broken system of choice. It gives me the most freedom and allows me to use the actual operating system, i.e. I can use software that just sucks instead of software that sucks more, and I get exposed to all of the pitfalls and unconceptualized semi-solutions of unixoid systems. Valuable lessons.
And, of course, I did not have to wait long for the next flaw.
The basic GNU system was installed and I had entered a chroot. Next was the installation of various software using Portage, Gentoo's package manager. Portage, or rather wget, refused to download sources. The mentioned address resolution problems were a bad excuse, since downloading exactly the same URLs by calling wget myself in the very same instance of bash worked.
All right, I wanted to try Paludis, the other package mangler, anyways, since they claim to have concepts and well-factored code, as opposed to Portage, which has neither[1]. And, lucky me, this one could fetch source packages. Installation could continue.
By the way, the machine to be equipped with Gentoo is a netbook featuring an AMD Geode LX800. Of course it would be a dumb idea to compile a whole OS on a single machine equipped with a 500MHz CPU. But that's what distribution is for -- and that's where the fun begins.
Unix in itself has no support for distribution whatsoever. Distcc has been written to "address" this issue by providing a hackish solution for compilation on multiple machines. Once installed, the first workaround has to be applied[2].
Configuration is pretty straight-forward. There is /etc/env.d/02distcc, /etc/distcc/hosts and the program distcc-config. The latter is a workaround to hide the ugliness of multiple configuration mechanisms. (FYI, the list of configuration files I mentioned does not include the separate configuration file for the included server.)
To actually use distcc, $PATH has to be modified additionally. Now, compilation will be distributed.
At least in theory.
Distcc couldn't distribute the compilation because "failed to distribute". How smart. Okay, let's enable verbosity. Output now contains various information like time in milliseconds the compilation took, and I'm told that distcc could not distribute because "failed to distribute". Fuck you, I'm pretty aware of that.
After some time fighting configuration files and distcc-config, I noticed that both "distcc-config --get-hosts" and "distcc --show-hosts" return, regardless of configuration setting, "+zeroconf". Yeah, distributing compilation to a host named +zeroconf won't work. And what's the sanest message to report this problem when the maximum of verbosity is requested? "Failed to distribute"? Bullshit.
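For reference, the environment side of the setup boils down to something like the following sketch. The host addresses are made up. DISTCC_HOSTS is the environment variable distcc reads its host list from. The wrapper directory /usr/lib/distcc/bin is what I assume for a Gentoo install, so verify it locally. I have not checked whether setting the variable would have avoided the +zeroconf entry in my case.

```shell
#!/bin/sh
# Hypothetical build hosts; replace with your own machines.
DISTCC_HOSTS="192.168.0.10 192.168.0.11 localhost"
export DISTCC_HOSTS

# Assumed location of the compiler wrappers on Gentoo; verify locally.
DISTCC_BIN="/usr/lib/distcc/bin"
case ":$PATH:" in
    *":$DISTCC_BIN:"*) ;;                          # already in PATH
    *) PATH="$DISTCC_BIN:$PATH"; export PATH ;;    # prepend the wrappers
esac

# Ask distcc which hosts it would really use, if it is installed at all.
if command -v distcc >/dev/null 2>&1; then
    distcc --show-hosts
else
    echo "distcc not installed; would use hosts: $DISTCC_HOSTS"
fi
```

If the list printed at the end is not the list you configured, there is no point in starting a compilation.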
References
- See Paludis FAQ on http://paludis.pioto.org.
- See Gentoo Distcc Manual.
Sucurity Hole
It seems that the use of a single account (that is regularly in contact with the outside) as primary account is fairly widespread. This account seems to be often allowed to log into root by running su.
This is a very dumb idea. Typing the passphrase for root into a potentially compromised system environment is about as secure as working as root directly.
Let's just assume that an application had a security hole and someone tampered with your environment. Since traditional operating systems use the broken concept of access control lists instead of the capability security model, this is not hard to imagine. Run the following line, and then run su.
alias su='echo -n "Password: "; read -s PASS; echo; echo "Hello $YOUR_NAME,"; echo "You have been hacked."; echo "Password was \"$PASS\"."; echo'
The su you're calling might always be a trojan. You might not notice until it's too late.
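A shell can at least tell you what a name currently resolves to. The following sketch uses the POSIX command -v for that. The function name is_real_binary is made up. Keep in mind that this only catches the trivial alias or function case: a tampered environment can fake this check as well, or simply put a trojan binary earlier in $PATH.

```shell
#!/bin/sh
# is_real_binary NAME: succeeds only if NAME resolves to a file path,
# i.e. not to an alias, a shell function or a builtin.
# Hypothetical helper; it proves nothing about the binary itself.
is_real_binary() {
    case "$(command -v "$1" 2>/dev/null)" in
        /*) return 0 ;;
        *)  return 1 ;;
    esac
}

if is_real_binary su; then
    echo "su resolves to $(command -v su)"
else
    echo "su is shadowed or missing in this shell"
fi
```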
If you want to do something as root, switch to a virtual console. Since the login screen might not be the login screen but a phishy process, press CTRL + ALT + SYSRQ + k to kill it and force a reload of the real login screen.
SIGSEG
Write a program the Unix way -- C or its bastard offspring C++, no unit testing and lots of sharp pointers -- and what will you get? Segmentation fault.
I think that a segfault is the most common reason for programs to crash -- and the worst approach to achieve memory protection.
The application is killed without being given a chance to recover or even print a description of what went wrong. If a segfault is trapped, the application is left in an undefined state. It can try to print a description or do something else, but nobody knows what will happen.
This could have been solved much more cleanly and practically by throwing an exception at the point of failure.
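How little information survives such a crash can be seen from the outside. In the following sketch a child shell sends SIGSEGV to itself, standing in for a real segfault, and the parent inspects what is left: an exit status above 128 that encodes the signal number (139 for SIGSEGV on Linux), and nothing else.

```shell
#!/bin/sh
# Run a child that kills itself with SIGSEGV and keep its exit status.
status=0
sh -c 'kill -SEGV $$' 2>/dev/null || status=$?

echo "exit status: $status"
# Shells report death by signal as 128 plus the signal number.
if [ "$status" -gt 128 ]; then
    echo "killed by signal $((status - 128))"
fi
```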
File Transfer Pain
EverythingIsAFile, right? Wrong.
Unix is undistributed. You can't, for example, just work with files regardless of where they are located.
One good example is FTP. Traditionally, working with files via FTP on a user interface level is done by using an FTP client. This is probably the most brain-dead way in which such a task could be accomplished:
- The user's shell loses part of its control.
- Input is handled by the shell that the FTP client provides.
- No access to external programs or at least not in the usual way.
- No output redirection or similar shell features.
- Files cannot be created in the usual manner, if at all.
In short, there is no consistency at all in administering files locally vs. via FTP. Simple things are inconvenient, complex things nearly impossible. The client is the limiting factor of all the features your operating system or your shell provide.
A workaround is setting up an FTP filesystem, which allows you to mount a directory of an FTP server into your local filesystem hierarchy, so that you can work with the remote files almost as usual.
This is, AFAIK, the default on Plan9. On GNU/Linux it can be done by using curlftpfs:
curlftpfs -f ftp.example.org ~/ftp.example.org
When the domain is mounted, you can use rsync to upload new files:
rsync -Pprvu --delete ~/mirror/ftp.example.org/ ~/ftp.example.org/
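Put together, a whole upload cycle might look like the following sketch. The function name ftp_sync and the RUN switch are made up. With RUN left at its default the commands are only printed, so nothing is mounted or deleted until you set RUN to an empty string. Unlike the command above, curlftpfs is called without -f here so that it goes to the background and the script can continue. The trailing slashes make rsync sync the contents of the mirror directory rather than the directory itself.

```shell
#!/bin/sh
# RUN=echo prints the commands (dry run); set RUN= to really execute them.
RUN="${RUN-echo}"

# ftp_sync HOST MIRROR_DIR MOUNTPOINT (hypothetical helper)
ftp_sync() {
    host="$1"; mirror="$2"; mountpoint="$3"
    $RUN mkdir -p "$mountpoint"
    # Mount the FTP server into the local hierarchy.
    $RUN curlftpfs "$host" "$mountpoint"
    # Upload new files and delete remote files that no longer exist locally.
    $RUN rsync -Pprvu --delete "$mirror/" "$mountpoint/"
    # Unmount the FUSE filesystem again.
    $RUN fusermount -u "$mountpoint"
}

ftp_sync ftp.example.org "$HOME/mirror/ftp.example.org" "$HOME/ftp.example.org"
```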
Everything is a file, right?
Wrong.
Some data is represented as a file (system hierarchy), but much of it is not: Sessions, users, groups, processes (/proc just sucks) and many more are not represented as files. Even files on remote machines are often not represented as live files.
Files and directories are the sole components of traditional operating systems that can be understood quite intuitively: People usually think in terms of objects; both file and directory are objects. A file is a very primitive one, whereas a directory is a container that can hold multiple objects (but is still very primitive). Going a step backward and storing/handling/presenting data in a non-file-based manner strips it of its last concrete substance. How's the user supposed to work with something s/he can't grasp?
This is no plea for file-based programs. File systems and especially file-based programs are, after all, only rudimentary object-orientation surrogates. The step backwards that I mentioned above is necessary to leap over all those decades for which user interfaces have not improved at all, to discard file systems and create an intuitive, consistent and transparent object-oriented system.