A Grails plugin for fuzzy string matching
By Seth Schroeder
Feb 27, 2008
Long ago, at a company which shall not be named, I made the mistake of shunning contemporary technology. Never again! Since Near Infinity honors its training commitment, I was able to attend and enjoy the Groovy / Grails NFJS conference.
Apache Commons Codec implements Soundex, Double Metaphone, Base64, and more. In the spirit of Grails, why not stitch them into java.lang.String?
import org.apache.commons.codec.language.*
import org.apache.commons.codec.net.*

class FuzzstrGrailsPlugin {
    def version = 0.1
    def dependsOn = [core: grails.util.GrailsUtil.getGrailsVersion()]

    def doWithDynamicMethods = {
        // 'it' is the curried codec; 'delegate' is the String the method is called on
        def encodingClosure = { it.encode(delegate) }
        def decodingClosure = { it.decode(delegate) }

        // Phonetic encoders
        String.metaClass.toSoundex = encodingClosure.curry(new Soundex())
        String.metaClass.toRefinedSoundex = encodingClosure.curry(new RefinedSoundex())
        String.metaClass.toMetaphone = encodingClosure.curry(new Metaphone())
        String.metaClass.toDoubleMetaphone = encodingClosure.curry(new DoubleMetaphone())

        // Reversible codecs. BCodec is the RFC 1522 "B" encoding, so toBase64()
        // yields an =?charset?B?...?= encoded word rather than bare Base64
        String.metaClass.toBase64 = encodingClosure.curry(new BCodec())
        String.metaClass.fromBase64 = decodingClosure.curry(new BCodec())
        String.metaClass.toQPrintable = encodingClosure.curry(new QCodec())
        String.metaClass.fromQPrintable = decodingClosure.curry(new QCodec())
        String.metaClass.toQuotedPrintable = encodingClosure.curry(new QuotedPrintableCodec())
        String.metaClass.fromQuotedPrintable = decodingClosure.curry(new QuotedPrintableCodec())
        String.metaClass.toUrlEncoded = encodingClosure.curry(new URLCodec())
        String.metaClass.fromUrlEncoded = decodingClosure.curry(new URLCodec())

        // Cosine is not part of Commons Codec; presumably a helper class in the plugin (not shown)
        String.metaClass.similarTo = { Cosine.stringSimilarity(delegate, it) }
        String.metaClass.mostSimilarTo = { Cosine.mostSimilar(delegate, it) }
        String.metaClass.rankedSimilarity = { Cosine.horseShoes(delegate, it) }
    }
}
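The curry-plus-delegate trick in the listing can be tried without Grails or Commons Codec on the classpath. The following is a minimal sketch, not the plugin's code: ReverseEncoder, UpperEncoder, toReversed and toShouted are hypothetical stand-ins invented for illustration, and the only thing they share with the real codecs is an encode(String) method.

```groovy
// Hypothetical stand-ins for Commons Codec encoders: anything with encode(String) will do.
class ReverseEncoder {
    String encode(String s) { s.reverse() }
}

class UpperEncoder {
    String encode(String s) { s.toUpperCase() }
}

// 'it' will be the curried encoder; 'delegate' is the String the method is invoked on.
def encodingClosure = { it.encode(delegate) }

// curry() fixes the encoder argument, leaving a zero-argument closure
// that can be installed as an instance method on String.
String.metaClass.toReversed = encodingClosure.curry(new ReverseEncoder())
String.metaClass.toShouted = encodingClosure.curry(new UpperEncoder())

assert "literal".toReversed() == "laretil"
assert "literal".toShouted() == "LITERAL"
```

One shared closure serves every encoder because the encoder is bound by curry() while the target string is supplied by the metaclass at call time through delegate.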
So what? Well, see for yourself:
Loading with installed plug-ins: ["fuzzstr"] ...
Groovy Shell (1.5.4, JVM: 1.5.0_13-119)
Type 'help' or '\h' for help.
-------------------------------------------------------------------------------
groovy:000> "literal".toSoundex()
===> L364
groovy:000> "literal".toDoubleMetaphone()
===> LTRL
groovy:000> "literal".toBase64().fromBase64()
===> literal
groovy:000> "literal".similarTo("litteral")
===> 93
groovy:000> "litteral".mostSimilarTo(['literature', 'litter', 'literal'])
===> literal
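The similarTo and mostSimilarTo calls in the session above rely on a Cosine class whose source does not appear in this post. As a rough sketch of how such a score could be computed, the code below assumes cosine similarity over character-bigram counts, scaled to a percentage. The bigram choice, the helper names bigramCounts and cosinePercent, and the rounding are all assumptions rather than the plugin's actual implementation.

```groovy
// Count overlapping character bigrams, e.g. "abc" -> [ab:1, bc:1].
Map<String, Integer> bigramCounts(String s) {
    Map<String, Integer> counts = [:]
    for (int i = 0; i + 1 < s.length(); i++) {
        String gram = s.substring(i, i + 2)
        counts[gram] = (counts[gram] ?: 0) + 1
    }
    return counts
}

// Cosine of the angle between the two bigram-count vectors, as a rounded percentage.
// (Assumed scoring scheme; the plugin's Cosine class may differ.)
int cosinePercent(String a, String b) {
    def va = bigramCounts(a)
    def vb = bigramCounts(b)
    double dot = 0
    va.each { gram, n -> dot += n * (vb[gram] ?: 0) }
    double normA = Math.sqrt(va.values().collect { it * it }.sum(0))
    double normB = Math.sqrt(vb.values().collect { it * it }.sum(0))
    if (normA == 0 || normB == 0) return 0   // strings shorter than two characters
    return Math.round(100 * dot / (normA * normB)) as int
}

String.metaClass.similarTo = { String other -> cosinePercent(delegate, other) }
String.metaClass.mostSimilarTo = { Collection<String> candidates ->
    String self = delegate   // capture now: inside the nested closure, 'delegate' is no longer the String
    candidates.max { cosinePercent(self, it) }
}

assert "literal".similarTo("litteral") == 93
assert "litteral".mostSimilarTo(['literature', 'litter', 'literal']) == 'literal'
```

Under these assumptions "literal" and "litteral" share 6 bigrams out of 6 and 7 respectively, giving 6 / sqrt(42), or about 0.926, which rounds to the 93 shown in the session. That agreement is suggestive but does not confirm how the plugin actually scores strings.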
The plugin isn't ready for release yet, but hopefully it will be within a few weeks; localization and caching still need some love. I'm also pondering whether GORM might benefit from some of these methods.
seth _dot_ schroeder _at_ nearinfinity _dot_ com