Testing data -- get everything you want, none of what you don't

I believe that most software tests favor code over data. It might be an issue of tools: code-testing tools are easy to use, have aggressive IDE and framework support, and produce coverage metrics. Maybe better data-testing tools would increase data coverage. Here's some code to demonstrate one need when testing lots of data: catching both wrong results and absent results.

One approach to testing data involves these parts:

  1. Define the input data
  2. Define the expected output data
  3. Implement code to process the input data
  4. Generate the actual output data by evaluating #1 with #3
  5. Compare the results of #4 with #2

Easy as pie, right?
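
The five parts above can be sketched in a handful of lines. This is only a minimal illustration, not the test class shown later: the `process` closure is a stand-in for whatever code is under test, and the sample data is made up.

```groovy
// 1 and 2: input data paired with expected output (input -> expected).
def expected = [a: 'A', b: 'B', c: 'C']

// 3: the code under test; a hypothetical stand-in for illustration.
def process = { String input -> input.toUpperCase() }

// 4: generate the actual output by running every input through the code.
def actual = expected.collectEntries { input, want -> [(input): process(input)] }

// 5: compare the actual output with the expected output.
assert actual == expected
```

A bare `assert` like this only says whether the two maps differ, not how, which is where the rest of this post comes in.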

I think #5 is more interesting than it seems. The challenge is to report every difference between the expected output and the actual output. Detecting wrong results is easy. Detecting absent results is harder.

Picture the expected and actual results as two overlapping sets, color-coded. Blue results were found, but not expected. Yellow results were expected, but not found. Green results were found and expected. I expected this to work out easily as differences and intersections of sets, but hit a little snag in the tool implementation.
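
The set arithmetic behind those three regions might look like the sketch below. The `expected` and `actual` maps are made-up sample data and every variable name is illustrative; none of it comes from the test class that follows.

```groovy
// Hypothetical sample data: input -> output.
def expected = [a: 'A', b: 'B', c: 'C']
def actual   = [a: 'A', b: 'X', d: 'D']

// Yellow: expected, but not found.
def absent = expected.keySet() - actual.keySet()

// Blue: found, but not expected.
def surprise = actual.keySet() - expected.keySet()

// Inputs that appear on both sides.
def shared = expected.keySet().intersect(actual.keySet())

// Green: found and expected, with matching output.
def correct = shared.findAll { actual[it] == expected[it] }

// Found for an expected input, but with the wrong output.
def wrong = shared.findAll { actual[it] != expected[it] }

println "absent: ${absent}, unexpected: ${surprise}"
println "correct: ${correct}, wrong: ${wrong}"
```

With this data, `c` is absent, `d` is unexpected, `a` is correct, and `b` is wrong. Note that the last two steps lean on `==`, so they are only as good as the `equals` method of the values being compared.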

About this huge chunk of code:

  • Lines 6-8 define the input and expected output. The values are stored in a map, where the keys are the input and the values the output.
  • Thank Groovy! It makes passing code around so easy. Line 12 defines the test logic inline.
  • test1 demonstrates the all-good case: no wrong or absent results
  • test2 demonstrates wrong results
  • test3 demonstrates absent results
  • The code is licensed under the Apache 2 license

And putting the cart before the horse, here are the results of the code:

--Output from test1--
--Output from test2--
a: wanted A, got B
--Output from test3--
caught exception java.lang.Exception: c!?!?
c was ignored!

Enough, here's the code:

 1: class TestStuff extends GroovyTestCase {
 2:
 3:     def inputOutput = [:]
 4:  
 5:     void setUp() {
 6:         inputOutput['a'] = 'A'
 7:         inputOutput['b'] = 'B'
 8:         inputOutput['c'] = 'C'
 9:     }
10:    
11:     void test1() {
12:         assert verifyAll(inputOutput, { it.toUpperCase() } )
13:     }
14: 
15:     void test2() {
16:         assert verifyAll(inputOutput, {
17:             if (it == 'a') ++it   // String.next(): 'a' becomes 'b'
18:             it.toUpperCase()
19:         })
20:     }
21: 
22:     void test3() {
23:         assert verifyAll(inputOutput, {
24:             if (it == 'c') throw new Exception("c!?!?")
25:             it.toUpperCase()
26:         })
27:     }
28: 
29:     def verifyAll(expected, method) {
30:         def actual = runTests(expected, method)
31:         def missed = diffSets(expected.keySet(), actual.keySet())
32:         def unexpected = diffMaps(actual, expected)
33: 
34:         missed.each { println("${it} was ignored!") }
35: 
36:         unexpected.each { key, val ->
37:             println("${key}: wanted ${expected[key]}, got ${val}")
38:         }
39: 
40:         return actual == expected
41:     }
42: 
43:     def runTests(control, method) {
44:         def test = [:]
45: 
46:         control.each { input, value ->
47:             try { test[input] = method(input) }
48:             catch (e) { println ("caught exception ${e}") }
49:         }
50:             
51:         return test
52:     }
53: 
54:     def diffSets(a, b) {
55:         def diff = []
56:         a.each { if (!b.contains(it)) diff << it }
57:         return diff
58:     }
59: 
60:     def diffMaps(a, b) {
61:         def diff = [:]
62:         a.each { key, val -> if (a[key] != b[key]) diff[key] = val }
63:         return diff
64:     }

Wait wait wait -- real programs are built with classes!

// ... imagine Ahhbject.groovy (say it with a Boston accent :)
class Ahhbject {
    Object val
    String toString() { "${this.class.name}@${hashCode()} ${val}" }
}

65:     void test4() {
66:         inputOutput.a = new Ahhbject(val:'A')
67:         inputOutput.b = new Ahhbject(val:'B')
68:         inputOutput.c = new Ahhbject(val:'C')
69: 
70:         assert verifyAll(inputOutput, {
71:             new Ahhbject(val: it.toUpperCase())
72:         })
73:     }
74: }

--Output from test4--
a: wanted Ahhbject@3260319 A, got Ahhbject@1806345 A
b: wanted Ahhbject@9259311 B, got Ahhbject@12565457 B
c: wanted Ahhbject@12823094 C, got Ahhbject@14416801 C
    

Curses! It doesn't just work. The cause is one of my pet peeves -- a class that can't recognize distinct but equivalent instances. Without its own equals, Ahhbject inherits identity comparison from Object, so two instances holding the same val never compare equal. Grails makes this easy to fix by bundling the Apache Commons Lang library. HashCodeBuilder and EqualsBuilder make quick work of implementing hashCode and equals:

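    import org.apache.commons.lang.builder.EqualsBuilder   // org.apache.commons.lang3.builder in Commons Lang 3
    import org.apache.commons.lang.builder.HashCodeBuilder
    int hashCode() { HashCodeBuilder.reflectionHashCode(this) }
    boolean equals(Object obj) { EqualsBuilder.reflectionEquals(this, obj) }

Add those two methods to Ahhbject (with the imports at the top of the file) and ta-da! Fit for data testing!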
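
If Commons Lang isn't on the classpath, a similar result may be possible with Groovy's own `@EqualsAndHashCode` AST transformation. The sketch below assumes Groovy 1.8 or later, where `groovy.transform.EqualsAndHashCode` is available; it is an alternative to the builders above, not what the original test used.

```groovy
import groovy.transform.EqualsAndHashCode

// Generates equals() and hashCode() from the class's properties (here: val).
@EqualsAndHashCode
class Ahhbject {
    Object val
    String toString() { "${this.class.name}@${hashCode()} ${val}" }
}

def first  = new Ahhbject(val: 'A')
def second = new Ahhbject(val: 'A')

assert !first.is(second)                      // distinct instances...
assert first == second                        // ...that now compare equal
assert first.hashCode() == second.hashCode()
assert first != new Ahhbject(val: 'B')

// Maps of equivalent objects compare equal too, which is what verifyAll relies on.
assert [a: first] == [a: second]
```

Either way, the point is the same: map and set comparisons are only meaningful once the values define equality by content rather than by identity.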
