Archive for April, 2011
Dear ghod I wish so much to be able to type:
man js.Math or man
Perhaps I should write them as a perl module on cpan that’s just pure pod :-D
I just wanted to post a note here about the fact that there’s a couple of ways to run regexps in JS.
- Via the string itself: str.match(regexp)
- Via the regexp: regexp.exec(str)
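For a regexp without the g flag, both of those methods return the same thing (an array of stuff, I won’t bore with the details here). With the g flag they differ: str.match() returns every match at once, while regexp.exec() returns one match per call.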
But one way has a distinct advantage IMHO: If you run regexp.exec() it will call toString() on the str if it is an object that isn’t a string. This is more like the perl way of doing things. Whereas if you have an object that can stringify, you can’t call object.match() on it and expect it to work.
Also you don’t have to have a regexp object there – JS supports perl’s shorthand notation: /foo/i.exec(…)
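A quick sketch of the difference described above. The sender object and its fields are made up purely for illustration:

```javascript
// Hypothetical object that stringifies to an email address.
var sender = {
    user: 'matt',
    host: 'example.com',
    toString: function () { return this.user + '@' + this.host; }
};

// exec() converts its argument to a string before matching.
var viaExec = /@(.+)$/.exec(sender);

// A plain object has no match() method, so calling it throws a TypeError.
var matchError = null;
try {
    sender.match(/@(.+)$/);
} catch (err) {
    matchError = err;
}

// Stringifying explicitly makes match() usable again.
var viaMatch = String(sender).match(/@(.+)$/);
```

In both working cases the captured group at index 1 is the host part of the address.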
The one thing I find staggering is that there’s no quotemeta function of any sort. There’s no safe/secure way to create a regexp out of an external variable without rolling your own quotemeta(). That’s unbelievably dumb to me. I hope the ECMA guys realise this and put it into the RegExp object for the next major revision.
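For what it’s worth, a roll-your-own version can be quite short. This is only a sketch: the name quotemeta is borrowed from perl, and the list of characters to escape is my own assumption about what needs protecting in a JS regexp.

```javascript
// Hypothetical helper: backslash-escape every character that has a
// special meaning inside a JS regexp (plus "/" and "-" to be safe).
function quotemeta(str) {
    return String(str).replace(/[\\^$.*+?()[\]{}|\/-]/g, '\\$&');
}

// Build a regexp from external input without letting it inject syntax.
var userInput = 'a.b*c';
var re = new RegExp('^' + quotemeta(userInput) + '$');

var matchesLiteral = re.test('a.b*c');   // the literal text matches
var matchesPattern = re.test('aXbbbc');  // what the unescaped pattern would have matched
```

Without the escaping, the second test string would match because "." and "*" would be treated as regexp operators.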
So it turns out, reading the source code, that node’s “fs” module does expose fsync. It’s simply undocumented…
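A minimal sketch of how that might be used. The helper name and the temp file path are made up for the example, and it uses the synchronous variants only to keep things short. In a server you would want the callback form, fs.fsync(fd, callback).

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical helper: returns only after fsync has been called on the file.
function writeDurably(file, data) {
    var fd = fs.openSync(file, 'w');
    try {
        fs.writeSync(fd, data);
        fs.fsyncSync(fd);   // ask the OS to flush its buffers for this file to disk
    } finally {
        fs.closeSync(fd);
    }
}

// Assumed location for the example; a real queue would live somewhere permanent.
var queueFile = path.join(os.tmpdir(), 'queue-example-' + process.pid + '.eml');
writeDurably(queueFile, 'Subject: test\r\n\r\nhello\r\n');
// Only at this point would it be reasonable to reply "250 Queued".
```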
Having been in the Node community for about 2 months now it’s very clear that 99% of users of node are using it to build some sort of web site or framework. This is obviously one of Node’s strong points, due to the stellar HTTP support built into the API.
So then I built Haraka, a very capable SMTP server written in Node that uses plugins for all of its functionality. Those who have used Qpsmtpd or Lamson will find this concept familiar. The key idea with these is to be a front-end to an email delivery system such as Postfix or Qmail, and provide easier configuration and hackability in a language that’s much easier to code in.
Node’s strong point is network servers. Unfortunately that does mean its weak point is command line tools. There’s no “getoptions” parsing provided in the Node API, which is a big downer. Furthermore, a number of libraries use console.log() for debug information, which by default on Node goes to stdout. That’s bad if you’re piping tools together and get unexpected output on stdout. There have been various discussions about this on the mailing list recently. My basic feeling is that libraries should do no logging by default, but have a way to turn debugging on, and that output should go to stderr (probably via util.debug()). If you find any libraries that don’t do this then patch them.
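Roughly what I mean for libraries, as a sketch. The makeDebug name and the MYLIB_DEBUG environment variable are invented for the example, and it writes to process.stderr directly rather than going through util.debug().

```javascript
// Hypothetical library-side logger: silent unless debugging is switched on,
// and when it is on, output goes to stderr so stdout stays clean for pipes.
function makeDebug(name, enabled) {
    return function debug(msg) {
        if (!enabled) return false;
        process.stderr.write('[' + name + '] ' + msg + '\n');
        return true;
    };
}

// Off by default; the (assumed) MYLIB_DEBUG variable turns it on.
var debug = makeDebug('mylib', process.env.MYLIB_DEBUG === '1');
debug('parsed 3 headers');
```

Running a tool with MYLIB_DEBUG=1 then shows the debug lines on the terminal without them ending up in whatever stdout is piped to.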
There’s another problem I’m coming across just now – there’s no fsync exposed in Node. There’s callbacks to write() which get called “when your data is written” according to the docs, but I’m pretty sure that just means it’s in the hardware buffers, not synced to disk. There’s a callback to close() which *might* be enough, but I remain unconvinced. This stuff is important when writing an SMTP server as you can’t return “250 Queued” to the sender until you can guarantee that the data is on the disk.
I guess overall this post was meant as a callout to anyone looking at writing network-y type stuff regardless of whether that’s for the web or not. Node.js is great for writing web stuff, but it’s also pretty damn good at writing non-web stuff too. More importantly it’s really fast for a dynamic language. Current benchmarks seem to show it is second only to Lua (with JIT) (and maybe PyPy, but that only seems faster in microbenchmarks). But nobody seems to write anything major with Lua, apart from using it as a language to embed in some application.
I’ve just created a nice little plugin which blocks mail (via FROM address) based on mails I forward to a certain address (and no, I’m not telling you what it is!).
It takes advantage of one of the nice features of Haraka – that config files are automatically reloaded when they change (if you don’t want this to happen, load your config in exports.register() into the transaction object or a plugin-global variable). So by simply appending to the relevant config file it auto-updates the blocklist configuration and is used immediately without a restart.
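The general idea, sketched in plain Node rather than with Haraka’s own config loader. ReloadingList, the file name and the addresses are all made up for the example, and Haraka’s real reload mechanism may well work differently.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical reloadable list: re-reads the file whenever it has changed.
function ReloadingList(file) {
    this.file = file;
    this.stamp = null;
    this.entries = Object.create(null);
}

ReloadingList.prototype.has = function (address) {
    var stat = fs.statSync(this.file);
    var stamp = stat.mtimeMs + ':' + stat.size;   // changes when the file is appended to
    if (stamp !== this.stamp) {
        var entries = Object.create(null);
        fs.readFileSync(this.file, 'utf8').split(/\r?\n/).forEach(function (line) {
            line = line.trim().toLowerCase();
            if (line && line.charAt(0) !== '#') entries[line] = true;
        });
        this.entries = entries;
        this.stamp = stamp;
    }
    return String(address).toLowerCase() in this.entries;
};

var listFile = path.join(os.tmpdir(), 'blocklist-example-' + process.pid + '.txt');
fs.writeFileSync(listFile, 'spammer@example.com\n');
var blocklist = new ReloadingList(listFile);

var before = blocklist.has('another@example.net');    // not listed yet
fs.appendFileSync(listFile, 'another@example.net\n'); // what the plugin would do
var after = blocklist.has('another@example.net');     // picked up without a restart
```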
I’m hoping this will help deal with my foreign spam problem. Most of it seems to come from the same addresses over and over.
Currently the only spam that makes it into my inbox (aside from the odd 419 here and there) is foreign spam.
It seems like these spams don’t end up in SURBL/URIBL or in SpamHaus SBL, possibly because of low visibility by those places.
I imagine that creating some sort of general rule for these is going to be pretty hard. They all tend to be in fairly normal character sets (ISO-8859-1 or one of the Windows-* types) and so I’m going to have to do some level of language analysis.
One thing that seems fairly consistent in Spanish/Portuguese spams is the use of “e” as a word on its own. However given the number of geek mailing lists I’m on that might come up as a variable name (or the mathematical constant), so that alone won’t be enough.
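To make that concrete, here is a toy version of the check. The function name is invented and any threshold you would put on the result is pure guesswork.

```javascript
// Hypothetical scoring: what fraction of the words are a standalone "e"?
function standaloneERatio(text) {
    var words = text.toLowerCase().split(/\s+/).filter(Boolean);
    if (words.length === 0) return 0;
    var hits = words.filter(function (w) { return w === 'e'; }).length;
    return hits / words.length;
}

var spamLike = standaloneERatio('Promoções e ofertas e descontos');
var geekMail = standaloneERatio('for e in list');
```

The second string is the sort of thing a geek mailing list produces, and it scores well above zero too, which is exactly the false positive problem.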
Another option is some sort of heuristic language detection like “TextCat”, but that won’t work terribly well for these as a lot of them are mostly images.
Any suggestions here would be most welcome.
Haraka has been sitting there doing all my inbound mail receiving overnight, and no hiccups so far.
It’s using 10M of RAM (the outbound server is using just 1.7M), which is a LOT less than the equivalent Qpsmtpd. I guess this is due to the fact that V8 actually garbage collects properly and releases memory to the system (which perl doesn’t – it just keeps hold of memory).
So last night I got enough done on Haraka that I could replace Qpsmtpd for my inbound mail. It’s sitting on my server running quite nicely now, and you can see real-time graphs of what rejects what at http://www.sergeant.org/graphs/.
Please try it out and let me know what you think!
I’ve got mail parsing implemented in Haraka now.
And the signatures plugin fell into place immediately after that…
I’ve got URIBL working up to extracting the URLs… Now I just need to get it sending DNS queries. Then we’re done!
This weekend was my birthday so I didn’t make a whole lot of progress on Haraka, but I got mail header parsing working (mostly – I have yet to do the character set decoding) and that allowed me to implement a bunch of plugins which look for specific headers.
I really only have two plugins left to implement before I can replace sergeant.org’s inbound SMTP server with this.
The first one is a simple signatures plugin. My current implementation of that in Qpsmtpd doesn’t decode the email so this should be trivial.
The second one presents more work: it’s the URIBL plugin (which checks URLs in emails against databases like SURBL). To do this I need to decode the email (which means handling multipart and MIME-encoded emails and doing the right thing), and I need to write a URL extractor and an HTML parser. All a fair bit of work.
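The URL extractor part, at its very simplest, might look something like this. It’s a sketch that assumes the body has already been decoded to plain text, the function name is made up, and it ignores HTML and obfuscated URLs entirely.

```javascript
// Hypothetical extractor: pull the unique hostnames out of http(s) URLs.
function extractHosts(text) {
    var re = /https?:\/\/([a-z0-9.-]+)/gi;
    var hosts = [];
    var m;
    // With the g flag, exec() returns one match per call until it hits null.
    while ((m = re.exec(text)) !== null) {
        var host = m[1].toLowerCase().replace(/\.+$/, '');  // drop trailing dots
        if (host && hosts.indexOf(host) === -1) hosts.push(host);
    }
    return hosts;
}

var hosts = extractHosts(
    'Buy at http://Spam.Example.com/pills and https://other.example.org. ' +
    'Or http://spam.example.com/again'
);
```

The resulting list of hostnames is what would then be looked up against the URI blocklists.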