Recently by Seth Schroeder
Emacs is one of those polarizing topics among software developers. Maybe it's tripolar: Camp Emacs spars with Camp Vi to the great amusement of Camp Everyone Else. The heckling is especially raucous from Camp Java/C# IDE whose devotees ritually launch The Environment to press the '.' and ';' keys all day.
To make up for that rotten attempt at humor I decided to spend some time explaining why and how I use Emacs. This blog post covers the background, the why and when I switched. The next post will cover how I use/depend on it in my daily workflow.
When MacOS X first came out I was a certified Metrowerks CodeWarrior fanatic. They had the coolest booths at MacWorld, a killer logo, and everyone drooled over the new CWDev tools CD arrived. That powerful, graceful IDE killed Apple's Macintosh Programmers' Workshop (MPW). Now that IDE was a strange beast:
- Ugly, ugly, ugly, and really unintuitive; its icon had binary on it (decimal value 13 I think? 01101?)
- A C compiler with a good sense of humor
- Regular expressions with bizzaro 8-bit characters in the syntax
- Projector, the unlovable source code pseudo-control software
- Locked in a mortal battle with Jasik's Debugger, which
- Was even uglier and less intuitive than Macsbug.
- Was the only game in town for source level debugging of system extensions.
- Had "shortcut" commands which used multiple meta keys at once
- Was so uncool it was trés cool
Metrowerks really ruled the Classic/Carbon Mac development roost for a while. It was really good software. On awesome days I got use MWPro, Jasik, Resorcerer, and Installer VISE.
The fate of these tools was sad but clear when MacOS X arrived. Apple shed everything except its logo. Lots of good reasons to be sure but the process was messy. All the cool tools on my nerd belt could never keep up with the free, bundled tools built with asymmetric knowledge of Darwin and dyld(1).
At this point my current project was written in C++ and built on several platforms. It used imake to build all platforms from a single imakefile... except the Mac build which was a maintenance drag. MacOS X was Unix, right, so all we had to do was port imake to a new platform (arggggggghhhhh). All the imake work in Terminal.app needed a good text editor. XCode wasn't it. CodeWarrior was doomed. I needed something else.
15 seconds with vi was quite enough, thank you.
I remembered this program called Emacs. Sure enough, it was installed. And boy howdy was it weird. Seriously, I had to press control-x and then control-s to save a file???! control-x and control-c to quit?!!! escape-w to paste??? ... for just one brief moment vi started to look better...
... but then Jasik's Debugger came to mind. It used multiple meta keys at once and that was okay, despite them appearing with Ω characters in the menu bar.
That transition to accepting Emacs as probably worth learning happened 5-6 years ago. Emacs is the only program I have used literally almost every work day since then (even using it right now :-). Initially I just used it to flit between a bunch of files quickly and keep a shell prompt up all day. Just in the last year Emacs has become even much more important as a task organizer and scheduler. But more on that in the next post!
Emacs is just one of the horses for a plethora of courses. I prefer vi for quick edits / wimpy *nix installations, and Eclipse for Java coding. Also, while GNU Emacs is a good distribution Carbon Emacs is not to be missed if you are running a Mac. Grab a copy and have some fun with...
- M-x tetris
- M-x list-colors-display
- M-x animate-birthday-present
- M-x glasses-mode (BestAppliedToCamelCaseText)
- M-x artist-mode, then M-x artist-select-op-spray-can, then click and drag
Task organization, SQL, and VCS chores are the killer apps within Emacs for me. I'll be discussing those in the next blog post. Until then please respond with questions and comments and (good) jokes about Emacs. If by any chance your curiousity was piqued then run, don't walk to the Emacs user community wiki.
(This article was previously posted to my personal blog.)
What's a Bloom filter?
A hash map works by applying a hash function to data, and associating the output with the input data. Considering a simple hash map which stores only values, and not buckets of values:
Traits of a hash map key set
- After an item has been stored, its existance will be known. It will never go missing. There is no chance of saying an item has not been stored after it has been stored.
- After an item has been stored, it may be identified as a different item. This happens when the hash function generates the same value for distinct data. So there is a risk of falsely confirming item B has been stored after only storing item A.
- The space required to hold these keys will be the product of the key size and key count. While key size is probably fixed, key count is variable and will lead to more space needed as more items are added.
Introducing Bloom filters
A Bloom filter shares traits 1 and 2 of a hash map key set, but not trait 3. The space required by a Bloom filter does not increase as more items are stored. Rather than processing data with one hash function and using the result as a lookup value, a Bloom filter uses multiple hash functions to identify booleans in a fixed size array.Bloom filter traits
- Same as hash map key set trait 1
- Same as hash map key set trait 2
- Storage space is small, but depends on the desired false positive rate
- Lookup time depends on number of hash functions; can be run in parallel
- Items cannot be removed
Code
This tacky, easily improvable code was released to the public domain and promises nothing.0: class MediocreBloomFilter 1: M = 16 # number of boolean fields in filter 2: K = 3 # number of hash functions to find boolean fields 3: 4: attr_accessor :v # the filter, a vector of boolean fields 5: 6: def initialize() 7: @v = Array.new(M, 0) 8: end 9: 10: def track_item(item) 11: indicies_for?(item).each do |i| 12: @v[i] = 1 13: end 14: end 15: 16: def item_tracked?(item) 17: hits = 0 18: indicies_for?(item).each do |i| 19: hits = hits + @v[i] 20: end 21: hits == K 22: end 23: 24: def indicies_for?(item) 25: indicies = [] 26: md5 = Digest::MD5.new 27: hash = "" 28: 29: md5.update(item.to_s) 30: hash = md5.hexdigest 31: 32: K.times do |k| 33: indicies << HEX.index(hash[k].chr.to_s.upcase) 34: end 35: 36: indicies 37: end 38: 39: HEX = [ "0", "1", "2", "3", "4", "5", "6", "7", 40: "8", "9", "A", "B", "C", "D", "E", "F" ] 41: end
- Line 7 sets the filter storage to 16 integers. It could have been two bytes if I wanted to do bitwise operations.
- Line 24 runs the hash functions against the target data. Since this is tacky sample code only one hash function is executed (line 30), and multiple slices are taken from the value and converted into array indicies (at line 33).
- Using integers rather than booleans has a nice side effect. It permits removal of items by using the integers as reference counts. I haven't tested it, but minor changes should be needed around lines 12 and 19, and a removal function to decrement the counts.
- item: "abc"
- md5: 900150983cd24fb0d6963f7d28e17f72
- indicies: [9, 0, 0]
- filter: [1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
- item: "abd"
- md5: 4911e516e5aa21d327512e0c8b197616
- indicies: [4, 9, 1]
- filter: [1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
- item: "seth schroeder"
- md5: 2c38f5fd13873c139567551ca1b7496e
- indicies: [2, 12, 3]
- filter: [1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
References
- Using Bloom Filters
- Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol
- Burton Bloom, Space/time trade-offs in hash coding with allowable errors, CACM, 13(7):422-426, July 1970.
- Paul E. Black, "Bloom filter", in Dictionary of Algorithms and Data Structures [online], Paul E. Black, ed., U.S. National Institute of Standards and Technology. 16 May 2008. Available from: http://www.nist.gov/dads/HTML/bloomFilter.html
P.S.: If you've read this far THANK YOU. This entry has also been posted to my personal blog.
Problem 10 of Project Euler is to sum the prime numbers less than two million. Solutions are supposed to take no more than one minute, so a decent prime number detector is mandatory. I'm sure lots of free, high quality solutions are available, but I have this problem with DIY / NIH syndrome. Somehow I found the sieve of Eratosthenes and a quick, tacky swipe at it was good enough.
This process is to make a list of numbers from 2 through some limit. Start with the lowest number in the list, and remove all multiples from the list. Next time through, pick the next smallest number that hasn't been crossed off yet and eliminate its multiples. Stop when your current number meets the square root of the limit. After that, the remaining numbers are prime:
def eratosthenes_sieve(ceiling)
vals = (0..ceiling).to_a
vals[0..1] = [nil, nil]
Math.sqrt(ceiling).floor.times do |val|
next unless vals[val]
((val ** 2)..ceiling).step(val) do |mult|
vals[mult] = nil
end
end
vals
end
A big hash like this is useful for O(1) checking for prime/not prime, but the solution requires the sum of the primes:
def primes_through(ceiling)
eratosthenes_sieve(ceiling).compact
end
sum = 0
primes_through(ARGV[0].to_i).each { |val| sum = sum + val }
puts sum
I'm sure there's a more Ruby-esque way to write that. I wanted to use Python for xrange goodness, but I'm still not at peace with its quirks. Groovy would have been nice, but the free editor support is still quite young. Ruby fit right in the gap.
Updated 2008-05-02: new Python implementation using sum() -- thanks Pete!
My blog post about the first problem in Project Euler went strange places. I was just trying to learn Python by solving a barely non-trivial problem. Maybe it was a full moon, not sure, but it wound up in a tangle of gcc, groovyc, and .elc. Happily, some good performance improvements were suggested and I've tried to incorporate them below.
This is a really long post. Here's the skeleton:
- Runtime environment
- Results
- Code (alphabetically by language)
- Big O(rly)?
- Wrap up
- Extras
Runtime environment
- Machine: Powerbook Pro, 10.5.2, 2x 2.16 GHz Core 2 Duo
- Userspace programs: Finder, Terminal
- Battery conservation mode: Better Performance (plugged in)
- C: GCC 4.0.1 (Apple Inc. build 5465)
- Groovy: 1.5.5 JVM: 1.5.0_13-119
- Lisp: GNU Emacs 22.1.1
- Perl: v5.8.8, built for darwin-thread-multi-2level
- Python: 2.5.1
- Ruby: 1.8.6 (2007-09-24 patchlevel 111) [universal-darwin9.0]
Results
No cheesy graphs this time, just better organized numbers. Each row has one implementation of the problem. The 10**x columns show how many milliseconds it took the implementation to sum the multiples of three and five less than that number. Each implementation was run seven times for each column, and the average runtime is shown below. A test harness was written to execute the scripts automatically; I disabled the screensaver, started the tests, and went to bed.
| Code | 10**1 | 10**2 | 10**3 | 10**4 | 10**5 | 10**6 | 10**7 | Comments |
|---|---|---|---|---|---|---|---|---|
| first.c | 5 | 5 | 5 | 5 | 11 | 65 | 648 | |
| first.groovy | 928 | 929 | 1065 | 1643 | 2155 | 7998 | 52732 | |
| second.groovy | 920 | 1004 | 1189 | 1353 | 1806 | 3243 | 19620 | |
| third.groovy | 914 | 880 | 936 | 1181 | 1421 | 2545 | 12835 | |
| first.pl | 8 | 8 | 10 | 26 | 192 | 1886 | 18845 | |
| second.pl | 8 | 8 | 8 | 11 | 36 | 378 | 3062 | |
| first.py | 22 | 22 | 23 | 35 | 160 | 1506 | 14939 | |
| second.py | 147 | 22 | 21 | 24 | 53 | 422 | 4242 | |
| third.py | 22 | 22 | 22 | 24 | 51 | 405 | 4081 | |
| fourth.py | 21 | 22 | 21 | 22 | 26 | 156 | 1580 | Credit goes to Pete below |
| first.rb | 9 | 8 | 9 | 20 | 134 | 1420 | 14288 | |
| second.rb | 9 | 8 | 9 | 17 | 100 | 1075 | 10818 | |
| first.el | 20141 | 10^1...6 not available | ||||||
| second.el | 8172 | Actually first.elc | ||||||
| instant.rb | 8 | 8 | 8 | 8 | 8 | 8 | 8 |
Code
C
#include <stdio.h>
#include <math.h>
#define multiple_of(a, b) ((a == 0 || b == 0) ? 0 : a % b == 0)
int main(int argc, char** argv)
{
int exp = atoi(argv[1]);
double sum = 0;
int i;
for (i = 3; i < pow(10, exp); i++)
if (multiple_of(i, 3) || multiple_of(i, 5))
sum += i;
printf("%0.0f\n", sum);
return 0;
}
Groovy
// first.groovy
exp = Integer.parseInt(args[0])
mof = { val, test -> val && test && test % val == 0 }
mof3 = mof.curry(3)
mof5 = mof.curry(5)
sum = 0 as BigDecimal
(3..<(10**exp)).each {
if (mof3(it) || mof5(it)) {
sum += it
}
}
println sum
// second.groovy
def exp = Integer.parseInt(args[0])
def mof = { val, test -> val && test && test % val == 0 }
def mof3 = mof.curry(3)
def mof5 = mof.curry(5)
def sum = 0 as BigInteger
(3..<(10**exp)).each {
if (mof3(it) || mof5(it)) {
sum += it
}
}
println sum
// third.groovy
def exp = Integer.parseInt(args[0])
def sum = 0 as BigInteger
for (int i = 3; i < 10 ** exp; i++)
if (i % 3 == 0 || i % 5 == 0)
sum += i
println sum
Lisp
; first.el
(defun multiple_of (a b)
(= 0 (% a b))
)
(defun val-to-add (val)
(cond
((multiple_of val 3) val)
((multiple_of val 5) val)
(0)
)
)
(defun sum-of-3or5-multiples (exp)
(let ((num 3) (sum 0.0))
(while (< num (expt 10 exp))
(setq sum (+ sum (val-to-add num)))
(setq num (1+ num))
)
sum
)
)
(sum-of-3or5-multiples 7)
Perl
#!/usr/bin/env perl -w
# first.pl
use strict;
my $exp = $ARGV[0];
my $sum = 0;
sub multiple_of {
my ($val, $tgt) = @_;
return $val % $tgt == 0;
}
for my $i (3 ... 10 ** $exp - 1) {
if (multiple_of($i, 3) || multiple_of($i, 5)) {
$sum += $i
}
}
print "$sum\n"
#!/usr/bin/env perl -w
# second.pl
use strict;
my $exp = $ARGV[0];
my $sum = 0;
for my $i (3 ... 10 ** $exp - 1) {
if ($i % 3 == 0 || $i % 5 == 0) {
$sum += $i
}
}
print "$sum\n"
Python
# first.py
import sys
def multiple_checker(val):
return lambda test: test and val and test % val == 0
mof3 = multiple_checker(3)
mof5 = multiple_checker(5)
sum = 0
for i in range(10 ** int(sys.argv[1])):
if mof3(i) or mof5(i):
sum = sum + i
print sum
# second.py
import sys
exp = int(sys.argv[1])
sum = 0
for mof3or5 in set(range(3, 10**exp, 3)) | set(range(5, 10**exp, 5)):
sum = sum + mof3or5
print sum
# third.py
import sys
exp = int(sys.argv[1])
sum = 0
for mof3or5 in set(xrange(3, 10**exp, 3)) | set(xrange(5, 10**exp, 5)):
sum = sum + mof3or5
print sum
# fourth.py import sys exp = int(sys.argv[1]) total = sum(xrange(3, 10**exp, 3)) + sum(xrange(5, 10**exp, 5)) - sum(xrange(15, 10**exp, 15)) print total
Ruby
# first.rb exp = ARGV[0].to_i sum = 0 def is_mult(large, small) large % small == 0 end for i in 3..(10 ** exp - 1) next unless is_mult(i, 3) or is_mult(i, 5) sum = sum + i end puts sum
# second.rb exp = ARGV[0].to_i sum = 0 for i in 3..(10 ** exp - 1) next unless i % 3 == 0 or i % 5 == 0 sum = sum + i end puts sum(top)
Big O(rly)?
Well... sigh... all of the code above works correctly but is slow and naïve. All of the approaches above take more time as the largest multiple increases. I was ignorantly, happily tweaking code down this road until fortunately Bob clued me in. The best approach takes the same amount of time as the largest multiple increases. Why? There's a simple equation for this, calculating an arithmetic series.
# instant.rb ceil = 10 ** ARGV[0].to_i - 1 def series_of_multiples(val, ceiling) diff = val vals = ceiling / val last_val = val + diff * (vals - 1) (val + last_val) / 2.0 * vals end def som(v, c) series_of_multiples(v, c).to_i end puts som(3, ceil) + som(5, ceil) - som(15, ceil)
Wrap up
- I got a lot more out of this than a quick glance at Python. It was good to calculate a series or two, but much better to build a little test harness.
- Groovy performance tested out well here. I would avoid using some of its features in tight loops. I'm confident that Monsieur LaForge et son amis will optimize that concern away eventually.
- I thought I could love it, I really wanted to love it, it's loved by many for many good reasons, but Python started to chafe me. It's so hard to nail down exactly why... it's chock full of goodness, but just seems to need more Kool-Aid than Groovy and especially Ruby. Perl's Kool-Aid threshold is a well-known constant, at least until the Parrot changes things drastically.
- Cosine similarity + arithmetic series = it's good to work with a friendly, helpful PhD. in math. Thanks Bob :-)
Extras
The Python test harness. Probably needs tweaking for Wintel boxes.
import sys
import re
from datetime import timedelta, datetime
from os import system
#
# prefix used to run scripts. Trailing whitespace matters!
#
vms = {
'el': 'emacs --script ',
'elc': 'emacs --script ',
'groovy': 'groovy ',
'out': './',
'pl': 'perl ',
'py': 'python ',
'rb': 'ruby '
}
# Configurable parameters
#
DEFAULT_MIN = 1
DEFAULT_MAX = 7
min = DEFAULT_MIN # min "max" val = 10 ** min
max = DEFAULT_MAX # max "max" val = 10 ** max
reps = 7 # times to repeat each test, then average the results
# Helper method
def td_to_ms(delta):
val = delta.microseconds / 1000.0
val = val + delta.seconds * 1000
val = val + delta.days * 24 * 60 * 60 * 1000
return val
#
# "main"
#
for script in sys.argv[1:]:
ext = script.split('.').pop()
if not ext in vms:
print 'ignoring', script, 'unrecognized extension'
continue
vm = vms[ext]
# the lisp files hardcode the max values :(
if re.search('^el', ext):
max = min
else: # reset to defaults, needed when running multiple types of tests
max = DEFAULT_MAX
min = DEFAULT_MIN
# gotta like list comprehensions
runs = [0 for x in range(max - min + 1)]
for rep in xrange(reps):
for exp in xrange(min, max + 1):
start = datetime.now()
system('%s%s %s >/dev/null' % (vm, script, exp))
end = datetime.now()
runs[exp - min] = runs[exp - min] + td_to_ms(end - start)
for i in xrange(len(runs)):
runs[i] = int(runs[i] / reps)
print script.split('/').pop(), runs
# see, isn't it strange how python leaves you hanging at the end
# of the last scope in the file?
Edit 2008-04-29: more and better results available here.
I've been waffling about glancing at Python. It comes off as a little ... dry? Indentation drives scope? I was afraid it would be another Objective-C; popular, modeled after successful languages, but just gets stuck in my mental parser. Even Visual Basic and Pascal (q.v. Inside Macintosh) passed that gate.
Anyways, Project Euler seemed like a good way to learn a new language. Problem #1 is summing multiples of 3 or 5 less than one thousand. Some basic iteration handles the task pretty easily -- might as well be all trendy and mix in lambda functions. And, while the hood is open, why not try it in a couple of languages and generate an unscientific performance graph?
So back to the implementation -- the iterative approach is easy. But Python 2.5-ish natively supports set operations like union and intersection. Combine that with a built-in method to generate numeric sequences and the problem is basically solved. For the win -- it outperformed the iterative approach! Oh, and every language tested except Python showed negative numbers when the sum scrolled past 2**31.
Groovy, Python, C, and Emacs Lisp competed for best performance. And by competed, I mean quick and dirty code (esp. Lisp) run a few times on my MacBook Pro. Graphics first, then the code.
The lisp numbers are the sketchiest -- (current-time) only spit out seconds.
Basic info: Groovy 1.5.5 binary release, Python 2.5.1 shipped with Leopard, GCC 4.0.1, Carbon Emacs Spring 2008
Python, iterative
import sys
def multiple_checker(val):
return lambda test: test and val and test % val == 0
mof3 = multiple_checker(3)
mof5 = multiple_checker(5)
sum = 0
for i in range(10 ** int(sys.argv[1])):
if mof3(i) or mof5(i):
sum = sum + i
print sum
Python, set oriented
import sys
exp = int(sys.argv[1])
sum = 0
for mof3or5 in set(range(3, 10**exp, 3)) | set(range(5, 10**exp, 5)):
sum = sum + mof3or5
print sum
Groovy, iterative
exp = Integer.parseInt(args[0])
mof = { val, test -> val && test && test % val == 0 }
mof3 = mof.curry(3)
mof5 = mof.curry(5)
sum = 0 as BigDecimal
(3..<(10**exp)).each {
if (mof3(it) || mof5(it)) {
sum += it
}
}
println sum
Emacs Lisp, iterative (be nice, I'm really weak here)
(defun multiple_of (a b)
(= 0 (% a b))
)
(defun val-to-add (val)
(cond
((multiple_of val 3) val)
((multiple_of val 5) val)
(0)
)
)
(defun sum-of-3or5-multiples (exp)
(let ((num 3) (sum 0.0))
(while (< num (expt 10 exp))
(setq sum (+ sum (val-to-add num)))
(setq num (1+ num))
)
sum
)
)
C, iterative. Not very much code here, would have probably taken more lines of Java!
#include <stdio.h>
#include <math.h>
#define multiple_of(a, b) ((a == 0 || b == 0) ? 0 : a % b == 0)
int main(int argc, char** argv)
{
int exp = atoi(argv[1]);
double sum = 0;
int i;
for (i = 3; i < pow(10, exp); i++)
if (multiple_of(i, 3) || multiple_of(i, 5))
sum += i;
printf("%0.0f\n", sum);
return 0;
}

