Recently by Seth Schroeder

Emacs is one of those polarizing topics among software developers. Maybe it's tripolar: Camp Emacs spars with Camp Vi to the great amusement of Camp Everyone Else. The heckling is especially raucous from Camp Java/C# IDE whose devotees ritually launch The Environment to press the '.' and ';' keys all day.

To make up for that rotten attempt at humor I decided to spend some time explaining why and how I use Emacs. This blog post covers the background, the why and when I switched. The next post will cover how I use/depend on it in my daily workflow.

When MacOS X first came out I was a certified Metrowerks CodeWarrior fanatic. They had the coolest booths at MacWorld, a killer logo, and everyone drooled over the new CWDev tools CD arrived. That powerful, graceful IDE killed Apple's Macintosh Programmers' Workshop (MPW). Now that IDE was a strange beast:

  • Ugly, ugly, ugly, and really unintuitive; its icon had binary on it (decimal value 13 I think? 01101?)
  • A C compiler with a good sense of humor
  • Regular expressions with bizzaro 8-bit characters in the syntax
  • Projector, the unlovable source code pseudo-control software
  • Locked in a mortal battle with Jasik's Debugger, which
    1. Was even uglier and less intuitive than Macsbug.
    2. Was the only game in town for source level debugging of system extensions.
    3. Had "shortcut" commands which used multiple meta keys at once
    4. Was so uncool it was trés cool

Metrowerks really ruled the Classic/Carbon Mac development roost for a while. It was really good software. On awesome days I got use MWPro, Jasik, Resorcerer, and Installer VISE.

The fate of these tools was sad but clear when MacOS X arrived. Apple shed everything except its logo. Lots of good reasons to be sure but the process was messy. All the cool tools on my nerd belt could never keep up with the free, bundled tools built with asymmetric knowledge of Darwin and dyld(1).

At this point my current project was written in C++ and built on several platforms. It used imake to build all platforms from a single imakefile... except the Mac build which was a maintenance drag. MacOS X was Unix, right, so all we had to do was port imake to a new platform (arggggggghhhhh). All the imake work in Terminal.app needed a good text editor. XCode wasn't it. CodeWarrior was doomed. I needed something else.

15 seconds with vi was quite enough, thank you.

I remembered this program called Emacs. Sure enough, it was installed. And boy howdy was it weird. Seriously, I had to press control-x and then control-s to save a file???! control-x and control-c to quit?!!! escape-w to paste??? ... for just one brief moment vi started to look better...

... but then Jasik's Debugger came to mind. It used multiple meta keys at once and that was okay, despite them appearing with Ω characters in the menu bar.

That transition to accepting Emacs as probably worth learning happened 5-6 years ago. Emacs is the only program I have used literally almost every work day since then (even using it right now :-). Initially I just used it to flit between a bunch of files quickly and keep a shell prompt up all day. Just in the last year Emacs has become even much more important as a task organizer and scheduler. But more on that in the next post!

Emacs is just one of the horses for a plethora of courses. I prefer vi for quick edits / wimpy *nix installations, and Eclipse for Java coding. Also, while GNU Emacs is a good distribution Carbon Emacs is not to be missed if you are running a Mac. Grab a copy and have some fun with...

  • M-x tetris
  • M-x list-colors-display
  • M-x animate-birthday-present
  • M-x glasses-mode (BestAppliedToCamelCaseText)
  • M-x artist-mode, then M-x artist-select-op-spray-can, then click and drag

Task organization, SQL, and VCS chores are the killer apps within Emacs for me. I'll be discussing those in the next blog post. Until then please respond with questions and comments and (good) jokes about Emacs. If by any chance your curiousity was piqued then run, don't walk to the Emacs user community wiki.

(This article was previously posted to my personal blog.)

What's a Bloom filter?

A Bloom filter is a memory and cpu efficient way to remember which things have been seen and which haven't. They are more familiar than programmers realize; the key set of a hash map is loosely a simple Bloom filter. I haven't had a need for this level of efficiency, but it's a neat tool to keep in mind.

A hash map works by applying a hash function to data, and associating the output with the input data. Considering a simple hash map which stores only values, and not buckets of values:

Traits of a hash map key set

  1. After an item has been stored, its existance will be known. It will never go missing. There is no chance of saying an item has not been stored after it has been stored.
  2. After an item has been stored, it may be identified as a different item. This happens when the hash function generates the same value for distinct data. So there is a risk of falsely confirming item B has been stored after only storing item A.
  3. The space required to hold these keys will be the product of the key size and key count. While key size is probably fixed, key count is variable and will lead to more space needed as more items are added.

Introducing Bloom filters

A Bloom filter shares traits 1 and 2 of a hash map key set, but not trait 3. The space required by a Bloom filter does not increase as more items are stored. Rather than processing data with one hash function and using the result as a lookup value, a Bloom filter uses multiple hash functions to identify booleans in a fixed size array.

Bloom filter traits

  1. Same as hash map key set trait 1
  2. Same as hash map key set trait 2
  3. Storage space is small, but depends on the desired false positive rate
  4. Lookup time depends on number of hash functions; can be run in parallel
  5. Items cannot be removed
These traits are thoroughly discussed by the articles in the reference section. They are required reading for serious implementations of a Bloom filter. That's outside the scope of this blog post -- so onto a trivial implementation:

Code

This tacky, easily improvable code was released to the public domain and promises nothing.
 0: class MediocreBloomFilter
 1:   M = 16                # number of boolean fields in filter
 2:   K = 3                 # number of hash functions to find boolean fields
 3: 
 4:   attr_accessor :v      # the filter, a vector of boolean fields
 5: 
 6:   def initialize()
 7:     @v = Array.new(M, 0)
 8:   end
 9: 
10:   def track_item(item)
11:     indicies_for?(item).each do |i|
12:       @v[i] = 1
13:     end
14:   end
15: 
16:   def item_tracked?(item)
17:     hits = 0
18:     indicies_for?(item).each do |i|
19:       hits = hits + @v[i]
20:     end
21:     hits == K
22:   end
23: 
24:   def indicies_for?(item)
25:     indicies = []
26:     md5 = Digest::MD5.new
27:     hash = ""
28: 
29:     md5.update(item.to_s)
30:     hash = md5.hexdigest
31: 
32:     K.times do |k|
33:       indicies << HEX.index(hash[k].chr.to_s.upcase)
34:     end
35: 
36:     indicies
37:   end
38: 
39:   HEX = [ "0", "1", "2", "3", "4", "5", "6", "7",
40:           "8", "9", "A", "B", "C", "D", "E", "F" ]
41: end
      
  • Line 7 sets the filter storage to 16 integers. It could have been two bytes if I wanted to do bitwise operations.
  • Line 24 runs the hash functions against the target data. Since this is tacky sample code only one hash function is executed (line 30), and multiple slices are taken from the value and converted into array indicies (at line 33).
  • Using integers rather than booleans has a nice side effect. It permits removal of items by using the integers as reference counts. I haven't tested it, but minor changes should be needed around lines 12 and 19, and a removal function to decrement the counts.
So that's all well and good, but let's take a quick look at the filter as it goes through some testing. Here is the filter immediately after creation: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] Now let's add the string "abc" and inspect things:
  • item: "abc"
  • md5: 900150983cd24fb0d6963f7d28e17f72
  • indicies: [9, 0, 0]
  • filter: [1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
Adding "abd"...
  • item: "abd"
  • md5: 4911e516e5aa21d327512e0c8b197616
  • indicies: [4, 9, 1]
  • filter: [1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
Finally, adding my name:
  • item: "seth schroeder"
  • md5: 2c38f5fd13873c139567551ca1b7496e
  • indicies: [2, 12, 3]
  • filter: [1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
This code is obviously weak, but enough to demonstrate the approach. One thing that stands out is how many hash functions should be used? False positives will be high when too few are used, and when too many are used. Fortunately smarter people have thought this out -- that's my final plea to please peruse the references.

References

  1. Using Bloom Filters
  2. Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol
  3. Burton Bloom, Space/time trade-offs in hash coding with allowable errors, CACM, 13(7):422-426, July 1970.
  4. Paul E. Black, "Bloom filter", in Dictionary of Algorithms and Data Structures [online], Paul E. Black, ed., U.S. National Institute of Standards and Technology. 16 May 2008. Available from: http://www.nist.gov/dads/HTML/bloomFilter.html

P.S.: If you've read this far THANK YOU. This entry has also been posted to my personal blog.

Problem 10 of Project Euler is to sum the prime numbers less than two million. Solutions are supposed to take no more than one minute, so a decent prime number detector is mandatory. I'm sure lots of free, high quality solutions are available, but I have this problem with DIY / NIH syndrome. Somehow I found the sieve of Eratosthenes and a quick, tacky swipe at it was good enough.

This process is to make a list of numbers from 2 through some limit. Start with the lowest number in the list, and remove all multiples from the list. Next time through, pick the next smallest number that hasn't been crossed off yet and eliminate its multiples. Stop when your current number meets the square root of the limit. After that, the remaining numbers are prime:

def eratosthenes_sieve(ceiling)
  vals = (0..ceiling).to_a
  vals[0..1] = [nil, nil]

  Math.sqrt(ceiling).floor.times do |val|
    next unless vals[val]
    
    ((val ** 2)..ceiling).step(val) do |mult|
      vals[mult] = nil
    end
  end
  
  vals
end

A big hash like this is useful for O(1) checking for prime/not prime, but the solution requires the sum of the primes:

def primes_through(ceiling)
  eratosthenes_sieve(ceiling).compact
end

sum = 0
primes_through(ARGV[0].to_i).each { |val| sum = sum + val }
puts sum

I'm sure there's a more Ruby-esque way to write that. I wanted to use Python for xrange goodness, but I'm still not at peace with its quirks. Groovy would have been nice, but the free editor support is still quite young. Ruby fit right in the gap.

Updated 2008-05-02: new Python implementation using sum() -- thanks Pete!

My blog post about the first problem in Project Euler went strange places. I was just trying to learn Python by solving a barely non-trivial problem. Maybe it was a full moon, not sure, but it wound up in a tangle of gcc, groovyc, and .elc. Happily, some good performance improvements were suggested and I've tried to incorporate them below.  

This is a really long post. Here's the skeleton:

  1. Runtime environment
  2. Results
  3. Code (alphabetically by language)
    1. C
    2. Groovy
    3. Lisp
    4. Perl
    5. Python
    6. Ruby
  4. Big O(rly)?
  5. Wrap up
  6. Extras

 

Runtime environment

  • Machine: Powerbook Pro, 10.5.2, 2x 2.16 GHz Core 2 Duo
  • Userspace programs: Finder, Terminal
  • Battery conservation mode: Better Performance (plugged in)
  • C: GCC 4.0.1 (Apple Inc. build 5465)
  • Groovy: 1.5.5 JVM: 1.5.0_13-119
  • Lisp: GNU Emacs 22.1.1
  • Perl: v5.8.8, built for darwin-thread-multi-2level
  • Python: 2.5.1
  • Ruby: 1.8.6 (2007-09-24 patchlevel 111) [universal-darwin9.0]
(top)  

Results

No cheesy graphs this time, just better organized numbers. Each row has one implementation of the problem. The 10**x columns show how many milliseconds it took the implementation to sum the multiples of three and five less than that number. Each implementation was run seven times for each column, and the average runtime is shown below. A test harness was written to execute the scripts automatically; I disabled the screensaver, started the tests, and went to bed.

Code 10**1 10**2 10**3 10**4 10**5 10**6 10**7 Comments
first.c55551165648
first.groovy928929106516432155799852732
second.groovy9201004118913531806324319620
third.groovy91488093611811421254512835
first.pl881026192188618845
second.pl88811363783062
first.py22222335160150614939
second.py147222124534224242
third.py22222224514054081
fourth.py21222122261561580Credit goes to Pete below
first.rb98920134142014288
second.rb98917100107510818
first.el2014110^1...6 not available
second.el8172Actually first.elc
instant.rb8888888

 

Code

 

C

#include <stdio.h>
#include <math.h>

#define multiple_of(a, b) ((a == 0 || b == 0) ? 0 : a % b == 0)

int main(int argc, char** argv)
{
    int exp = atoi(argv[1]);
    double sum = 0;
    int i;

    for (i = 3; i < pow(10, exp); i++)
        if (multiple_of(i, 3) || multiple_of(i, 5))
            sum += i;

    printf("%0.0f\n", sum);

    return 0;
}

 

Groovy

 
// first.groovy
exp = Integer.parseInt(args[0])

mof = { val, test -> val && test && test % val == 0 }
mof3 = mof.curry(3)
mof5 = mof.curry(5)

sum = 0 as BigDecimal

(3..<(10**exp)).each {
    if (mof3(it) || mof5(it)) {
        sum += it
    }
}

println sum

 
// second.groovy
def exp = Integer.parseInt(args[0])

def mof = { val, test -> val && test && test % val == 0 }
def mof3 = mof.curry(3)
def mof5 = mof.curry(5)

def sum = 0 as BigInteger

(3..<(10**exp)).each {
    if (mof3(it) || mof5(it)) {
        sum += it
    }
}

println sum

 
// third.groovy
def exp = Integer.parseInt(args[0])

def sum = 0 as BigInteger

for (int i = 3; i < 10 ** exp; i++)
    if (i % 3 == 0 || i % 5 == 0)
        sum += i

println sum
 

Lisp

; first.el
(defun multiple_of (a b)
  (= 0 (% a b))
)

(defun val-to-add (val)
  (cond
   ((multiple_of val 3) val)
   ((multiple_of val 5) val)
   (0)
   )
  )

(defun sum-of-3or5-multiples (exp)
  (let ((num 3) (sum 0.0))
    (while (< num (expt 10 exp))
      (setq sum (+ sum (val-to-add num)))
      (setq num (1+ num))
      )
    sum
    )
  )

(sum-of-3or5-multiples 7)
 

Perl

 
#!/usr/bin/env perl -w
# first.pl
use strict;

my $exp = $ARGV[0];
my $sum = 0;

sub multiple_of {
    my ($val, $tgt) = @_;
    return $val % $tgt == 0;
}

for my $i (3 ... 10 ** $exp - 1) {
    if (multiple_of($i, 3) || multiple_of($i, 5)) {
        $sum += $i
    }
}

print "$sum\n"

 
#!/usr/bin/env perl -w
# second.pl

use strict;

my $exp = $ARGV[0];
my $sum = 0;

for my $i (3 ... 10 ** $exp - 1) {
    if ($i % 3 == 0 || $i % 5 == 0) {
        $sum += $i
    }
}

print "$sum\n"
 

Python

 
# first.py
import sys

def multiple_checker(val):
    return lambda test: test and val and test % val == 0

mof3 = multiple_checker(3)
mof5 = multiple_checker(5)

sum = 0

for i in range(10 ** int(sys.argv[1])):
    if mof3(i) or mof5(i):
        sum = sum + i

print sum

 
# second.py
import sys

exp = int(sys.argv[1])
sum = 0

for mof3or5 in set(range(3, 10**exp, 3)) | set(range(5, 10**exp, 5)):
    sum = sum + mof3or5

print sum

 
# third.py
import sys

exp = int(sys.argv[1])
sum = 0

for mof3or5 in set(xrange(3, 10**exp, 3)) | set(xrange(5, 10**exp, 5)):
    sum = sum + mof3or5

print sum

 
# fourth.py
import sys

exp = int(sys.argv[1])
total = sum(xrange(3, 10**exp, 3)) + sum(xrange(5, 10**exp, 5)) - sum(xrange(15, 10**exp, 15))

print total
 

Ruby

 
# first.rb
exp = ARGV[0].to_i
sum = 0

def is_mult(large, small)
  large % small == 0
end

for i in 3..(10 ** exp - 1)
  next unless is_mult(i, 3) or is_mult(i, 5)
  sum = sum + i
end

puts sum

 
# second.rb
exp = ARGV[0].to_i
sum = 0

for i in 3..(10 ** exp - 1)
  next unless i % 3 == 0 or i % 5 == 0
  sum = sum + i
end

puts sum
(top)

 

Big O(rly)?

Well... sigh... all of the code above works correctly but is slow and naïve. All of the approaches above take more time as the largest multiple increases. I was ignorantly, happily tweaking code down this road until fortunately Bob clued me in. The best approach takes the same amount of time as the largest multiple increases. Why? There's a simple equation for this, calculating an arithmetic series.  

# instant.rb
ceil = 10 ** ARGV[0].to_i - 1

def series_of_multiples(val, ceiling)
  diff = val
  vals = ceiling / val
  last_val = val + diff * (vals - 1)
  (val + last_val) / 2.0 * vals
end

def som(v, c) series_of_multiples(v, c).to_i end

puts som(3, ceil) + som(5, ceil) - som(15, ceil)
 

Wrap up

  • I got a lot more out of this than a quick glance at Python. It was good to calculate a series or two, but much better to build a little test harness.
  • Groovy performance tested out well here. I would avoid using some of its features in tight loops. I'm confident that Monsieur LaForge et son amis will optimize that concern away eventually.
  • I thought I could love it, I really wanted to love it, it's loved by many for many good reasons, but Python started to chafe me. It's so hard to nail down exactly why... it's chock full of goodness, but just seems to need more Kool-Aid than Groovy and especially Ruby. Perl's Kool-Aid threshold is a well-known constant, at least until the Parrot changes things drastically.
  • Cosine similarity + arithmetic series = it's good to work with a friendly, helpful PhD. in math. Thanks Bob :-)

Extras

The Python test harness. Probably needs tweaking for Wintel boxes.

import sys
import re
from datetime import timedelta, datetime
from os import system

#
# prefix used to run scripts. Trailing whitespace matters!
#
vms = { 
        'el':           'emacs --script ',
        'elc':          'emacs --script ',
        'groovy':       'groovy ',
        'out':          './',
        'pl':           'perl ',
        'py':           'python ',
        'rb':           'ruby '
        }

# Configurable parameters
# 
DEFAULT_MIN = 1
DEFAULT_MAX = 7
min = DEFAULT_MIN       # min "max" val = 10 ** min
max = DEFAULT_MAX       # max "max" val = 10 ** max
reps = 7                # times to repeat each test, then average the results


# Helper method
def td_to_ms(delta):
    val = delta.microseconds / 1000.0
    val = val + delta.seconds * 1000
    val = val + delta.days * 24 * 60 * 60 * 1000
    return val

#
# "main"
#
for script in sys.argv[1:]:
    ext = script.split('.').pop()
    if not ext in vms:
        print 'ignoring', script, 'unrecognized extension'
        continue

    vm = vms[ext]

    # the lisp files hardcode the max values :(
    if re.search('^el', ext):
        max = min
    else: # reset to defaults, needed when running multiple types of tests
        max = DEFAULT_MAX
        min = DEFAULT_MIN

    # gotta like list comprehensions
    runs = [0 for x in range(max - min + 1)]

    for rep in xrange(reps):
        for exp in xrange(min, max + 1):
            start = datetime.now()
            system('%s%s %s >/dev/null' % (vm, script, exp))
            end = datetime.now()
            runs[exp - min] = runs[exp - min] + td_to_ms(end - start)

    for i in xrange(len(runs)):
        runs[i] = int(runs[i] / reps)

    print script.split('/').pop(), runs

    # see, isn't it strange how python leaves you hanging at the end
    # of the last scope in the file?

Nice, Python.

| | Comments (4) | TrackBacks (0)

Edit 2008-04-29: more and better results available here.

I've been waffling about glancing at Python. It comes off as a little ... dry? Indentation drives scope? I was afraid it would be another Objective-C; popular, modeled after successful languages, but just gets stuck in my mental parser. Even Visual Basic and Pascal (q.v. Inside Macintosh) passed that gate.

Anyways, Project Euler seemed like a good way to learn a new language. Problem #1 is summing multiples of 3 or 5 less than one thousand. Some basic iteration handles the task pretty easily -- might as well be all trendy and mix in lambda functions. And, while the hood is open, why not try it in a couple of languages and generate an unscientific performance graph?

So back to the implementation -- the iterative approach is easy. But Python 2.5-ish natively supports set operations like union and intersection. Combine that with a built-in method to generate numeric sequences and the problem is basically solved. For the win -- it outperformed the iterative approach! Oh, and every language tested except Python showed negative numbers when the sum scrolled past 2**31.

Groovy, Python, C, and Emacs Lisp competed for best performance. And by competed, I mean quick and dirty code (esp. Lisp) run a few times on my MacBook Pro. Graphics first, then the code.


perf graph


perf stats

The lisp numbers are the sketchiest -- (current-time) only spit out seconds.

Basic info: Groovy 1.5.5 binary release, Python 2.5.1 shipped with Leopard, GCC 4.0.1, Carbon Emacs Spring 2008


Python, iterative

import sys

def multiple_checker(val):
    return lambda test: test and val and test % val == 0

mof3 = multiple_checker(3)
mof5 = multiple_checker(5)
sum = 0

for i in range(10 ** int(sys.argv[1])):
    if mof3(i) or mof5(i):
        sum = sum + i

print sum

Python, set oriented
import sys

exp = int(sys.argv[1])
sum = 0

for mof3or5 in set(range(3, 10**exp, 3)) | set(range(5, 10**exp, 5)):
    sum = sum + mof3or5

print sum

Groovy, iterative
exp = Integer.parseInt(args[0])

mof = { val, test -> val && test && test % val == 0 }
mof3 = mof.curry(3)
mof5 = mof.curry(5)

sum = 0 as BigDecimal

(3..<(10**exp)).each {
    if (mof3(it) || mof5(it)) {
        sum += it
    }
}

println sum

Emacs Lisp, iterative (be nice, I'm really weak here)
(defun multiple_of (a b)
  (= 0 (% a b))
)

(defun val-to-add (val)
  (cond
   ((multiple_of val 3) val)
   ((multiple_of val 5) val)
   (0)
   )
  )

(defun sum-of-3or5-multiples (exp)
  (let ((num 3) (sum 0.0))
    (while (< num (expt 10 exp))
      (setq sum (+ sum (val-to-add num)))
      (setq num (1+ num))
      )
    sum
    )
  )

C, iterative. Not very much code here, would have probably taken more lines of Java!
#include <stdio.h>
#include <math.h>

#define multiple_of(a, b) ((a == 0 || b == 0) ? 0 : a % b == 0)

int main(int argc, char** argv)
{
    int exp = atoi(argv[1]);
    double sum = 0;
    int i;

    for (i = 3; i < pow(10, exp); i++)
        if (multiple_of(i, 3) || multiple_of(i, 5))
            sum += i;

    printf("%0.0f\n", sum);

    return 0;
}