- agg, the news aggregator
- about
- news
- changelog
- dependencies
- install
- faq
- Writing file names that are specified in the feed? What about security?
- But a malicious feed could use up all space/inodes.
- Why no download mechanism?
- But do I have to download the feed by hand?
- But it only works on a single feed!
- Why no user interface?
- How to fetch only new items from feeds that don't use publication dates?
- bugs to be fixed
- authors
- repo
- homepage
- download
- license
agg, the news aggregator
about
agg is a news aggregator (currently RSS 2.0 only) for POSIX-compliant systems (currently tested on GNU/Linux only).
It follows the UNIX philosophy and simply reads a news feed from stdin and creates or updates a filesystem representation of that feed.
No command line parameters, no user interface, not even networking.
news
- 2011-06-26 agg-0.3.0 released
- 2011-05-11 agg-0.2.1 released
- 2011-05-10 agg-0.2.0 released
- 2011-04-16 agg-0.1.1 released
- 2011-04-08 agg-0.1.0 released
- 2011-04-01 development started
changelog
2011-06-26 agg-0.3.0
- When items have conflicting names, the one with the most recent publication date will now be stored.
- Items are now allowed to be ordered arbitrarily.
- Properties of items are now allowed to be ordered arbitrarily.
- Fixed minor bugs in handling broken feeds.
2011-05-11 agg-0.2.1
- Adjusted documentation.
- Fixed install target of makefile.
2011-05-10 agg-0.2.0
- Tests and refactoring.
- New output format, no HTML output anymore.
- Now requiring that title or description of items come first, and title has to come before description.
- Made nomtime work from outside of the feed directory.
2011-04-16 agg-0.1.1
- Included proper README.
- Included nomtime in make targets.
2011-04-08 agg-0.1.0
- Initial release.
dependencies
- libexpat
install
make test install
For configuration see Make.config.
Please run the test suites. They were written for you and take only a few seconds, even on a 500 MHz CPU.
faq
Writing file names that are specified in the feed? What about security?
agg removes all slashes from file and directory names before they are written, so everything ends up where it belongs. You should run it in a dedicated directory, though.
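To illustrate what slash removal does to a name taken from a feed, here is a minimal shell sketch. agg does this internally; the function name `sanitize_name` is hypothetical and not part of agg.

```shell
#!/bin/sh
# Hypothetical helper, not part of agg: delete every slash from a
# feed-supplied name, as the answer above describes.
sanitize_name() {
    printf '%s' "$1" | tr -d '/'
}

# A title that tries to climb out of the feed directory collapses into
# one plain file name.
sanitize_name '../../etc/passwd'   # prints ....etcpasswd
```

A name consisting only of dots is not changed by this step, which is one more reason to follow the advice about a dedicated directory.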
But a malicious feed could use up all space/inodes.
That depends on your operating system and its configuration. It's not the job of a news aggregator to enforce quotas.
Why no download mechanism?
Because it's a news aggregator, not a download-and-news-aggregation-program.
But do I have to download the feed by hand?
wget "$URL" -O - | agg
But it only works on a single feed!
for feed in $(cat feeds); do
    (wget "$feed" -qO - | agg) &
done
You get the point.
Why no user interface?
Because it's a news aggregator, not a download-and-news-aggregation-and-news-reader-program. The file system hierarchy is perfectly usable with various Unix tools.
The sky is the limit. Feel free to write your own frontend; you should be able to find mine on my blog.
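As a rough sketch of what "using Unix tools" can look like, the two functions below list the newest items and search item texts. They assume that every item is a regular file below the current directory and that its mtime reflects the publication date; the function names are made up for this example.

```shell
#!/bin/sh
# Assumption: items are regular files below the current directory and
# their mtime reflects the publication date.

# Print the N most recently published items (default: 10).
# With very many files, find may call ls more than once, so the order
# is then only correct within each batch.
newest_items() {
    find . -type f -exec ls -t {} + | head -n "${1:-10}"
}

# Print the paths of all items whose text mentions a keyword,
# ignoring case.
items_matching() {
    find . -type f -exec grep -li -- "$1" {} +
}
```

For example, `newest_items 5` shows the five latest items, and `items_matching kernel` lists every item that mentions "kernel".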
How to fetch only new items from feeds that don't use publication dates?
Not supported by agg itself, since it would require a second-level storage that contains (hashes of) everything the agg directory ever contained -- including items you explicitly deleted. You can easily build such functionality on top using a few lines of shell code.
Again, it's a news aggregator, not a caching program.
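One possible shape for those few lines of shell is sketched below. It is not part of agg. It assumes that items are regular files below the current directory, that file names contain no newlines, and that a re-fetched item has the same content as before. `SEEN`, `list_items` and `run_filtered` are hypothetical names, and `mktemp` is assumed to be available.

```shell
#!/bin/sh
# Sketch only: keeps checksums of every item ever created in a hash
# store, and removes items that reappear after you deleted them.
SEEN=${SEEN:-.agg-seen}

# List all regular files below the current directory, except the store.
list_items() {
    find . -type f ! -name "${SEEN##*/}" | sort
}

# Run the given command, then inspect the files it created.
run_filtered() {
    before=$(mktemp) && after=$(mktemp) || return 1
    touch "$SEEN"
    list_items > "$before"
    "$@"
    list_items > "$after"
    # comm -13 prints lines found only in the second list: new files.
    comm -13 "$before" "$after" | while IFS= read -r item; do
        sum=$(cksum < "$item")
        if grep -qxF -- "$sum" "$SEEN"; then
            rm -f "$item"                  # seen on an earlier run
        else
            printf '%s\n' "$sum" >> "$SEEN"
        fi
    done
    rm -f "$before" "$after"
}

# Usage (URL is a placeholder):
#   run_filtered sh -c 'wget "$1" -qO - | agg' sh "$URL"
```

Items that are still on disk are left alone, because only files that appeared during the run are checked against the store.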
bugs to be fixed
- Supports only RSS.
- Currently only tested on GNU/Linux.
- Uses fixed-size buffers to simplify the code. This may lead to cut-off news texts. The chances of this happening are rather low, and the consequences are minor (you can always follow the link). If you encounter a link that is longer than 8 KiB, let me know.
- Assumes items only change if their publication date changes. Again, for simplicity.
- Creates a "sub-feed" directory if the channel contains an element that has a title tag but is not an item.
- Supports only dates whose time zone is formatted as +xxxx, not as an abbreviation.
- Item titles may conflict, especially if they were too long and have been truncated. In this case, the item with the most recent publication date will be stored on the disk.
- The default mtime for items without pubDate should be the current time.
- Sometimes, the mtime of the feed directory is set to the current time. This seems to happen only when a "new" item is not already stored locally. If it is, the mtime is not modified.
authors
- Andreas Waidler arandes@programmers.at
repo
- git://repo.or.cz/agg.git
- http://www.repo.or.cz/w/agg.git
homepage
download
- http://programmers.at/work/on/agg/agg-0.3.0.tar.gz
- http://programmers.at/work/on/agg/agg-0.2.1.tar.gz
- http://programmers.at/work/on/agg/agg-0.2.0.tar.gz
- http://programmers.at/work/on/agg/agg-0.1.1.tar.gz
- http://programmers.at/work/on/agg/agg-0.1.0.tar.gz
license
Copyright (C) 2011 Andreas Waidler arandes@programmers.at
Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.