13 August 2017

Generate testing data using Python

In a previous article, I talked about generating random data using Python. Let's try to use this capability for a practical purpose: to generate testing data for another program.

Example program

We will test a program from Chapter 4 of K.N. King's C Programming - A Modern Approach. The program reads Universal Product Codes (UPCs) and verifies the check digit of each code. Input UPCs are expected in the following format:

0 13800 15173 5

A UPC is a 12-digit code: the first 11 digits identify a product, and the last digit (5 in this example) is the check digit, which should be consistent with the first 11 digits. The example program reads a series of UPC codes and verifies whether each check digit is consistent:

#include <stdio.h>

void skip_line()
{
    int c;
    while ((c = getchar()) != EOF && c != '\n')
        ;
}

int main(void)
{
    int d, i1, i2, i3, i4, i5,
        j1, j2, j3, j4, j5,
        check0;
    while (1) {
        int ret = scanf("%1d %1d%1d%1d%1d%1d %1d%1d%1d%1d%1d %1d",
                        &d, &i1, &i2, &i3, &i4, &i5,
                        &j1, &j2, &j3, &j4, &j5, &check0);
        if (ret != 12) break;
        skip_line();
 
        int first_sum = d + i2 + i4 + j1 + j3 + j5;
        int second_sum = i1 + i3 + i5 + j2 + j4;
        int total = 3 * first_sum + second_sum;
        int check1 = 9 - ((total - 1) % 10);
        if (check1 != check0) {
            fprintf(stderr, "*** ");
            fprintf(stderr, "%1d %1d%1d%1d%1d%1d %1d%1d%1d%1d%1d %1d: "
                "inconsistent with computed check digit (%d)\n",
                d, i1, i2, i3, i4, i5,
                j1, j2, j3, j4, j5, check0, check1);
        } else {
            printf("%1d %1d%1d%1d%1d%1d %1d%1d%1d%1d%1d %1d\n",
                d, i1, i2, i3, i4, i5,
                j1, j2, j3, j4, j5, check0);
        }
    }
}

Consistent UPC codes are output to stdout, while inconsistent ones are output to stderr. Given appropriate test data, we can feed the program with input and examine its stdout and stderr.
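To make the check-digit rule easier to follow, here is a minimal Python sketch of the same arithmetic as the C program. The function name upc_check_digit is my own choice for illustration and is not part of the program under test.

```python
def upc_check_digit(digits):
    """Return the check digit for the first 11 digits of a UPC."""
    if len(digits) != 11:
        raise ValueError("expected 11 digits")
    first_sum = sum(digits[0::2])   # 1st, 3rd, 5th, 7th, 9th and 11th digits
    second_sum = sum(digits[1::2])  # 2nd, 4th, 6th, 8th and 10th digits
    total = 3 * first_sum + second_sum
    return 9 - ((total - 1) % 10)


# The example code 0 13800 15173 5 from above, without its check digit
print(upc_check_digit([0, 1, 3, 8, 0, 0, 1, 5, 1, 7, 3]))  # prints 5
```

One difference is worth knowing: when every digit is 0, total is 0, and (total - 1) % 10 evaluates to 9 in Python but to -1 in C99, so the two versions disagree on that single input.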

Generate test data

Using Python's randint function from the random module, a basic approach is to generate 12 random digits, which represent a candidate UPC code (possibly valid, possibly invalid):

import random

code = list(range(12))
for i in range(len(code)):
    # randint includes both endpoints, so 0..9 yields exactly one digit
    code[i] = str(random.randint(0, 9))
scode = "".join([ str(d) for d in code ])
print("%s %s %s %s" % (scode[0], scode[1:6], scode[6:11], scode[11]))

I create a list of length 12 using list(range(12)) and then fill each position in that list with a random digit using the for loop. The line

scode = "".join([ str(d) for d in code ])

converts the list of digits into a string of characters of length 12. Finally, the format string prints the result in the expected format, e.g.

0 13800 15173 5

By running the above code in an outer for loop, I can generate any desired number of random candidate UPC codes:

$ python gen_upc_codes.py 3
0 04190 53913 3
2 04060 34911 8
0 10629 83826 5

$
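The complete script is not listed in this article, so the following is only a sketch of how gen_upc_codes.py might be organised, assuming the number of codes is passed as the first command-line argument as in the invocation above. The names random_candidate and main, and the default of 10 codes, are assumptions made for illustration.

```python
import random
import sys


def random_candidate(rng):
    """Return one candidate UPC as a string such as '0 13800 15173 5'."""
    # randint includes both endpoints, so 0..9 gives a single digit
    scode = "".join(str(rng.randint(0, 9)) for _ in range(12))
    return "%s %s %s %s" % (scode[0], scode[1:6], scode[6:11], scode[11])


def main(argv):
    # Hypothetical default: generate 10 codes when no count is given
    count = int(argv[1]) if len(argv) > 1 and argv[1].isdigit() else 10
    rng = random.Random()
    for _ in range(count):
        print(random_candidate(rng))


if __name__ == "__main__":
    main(sys.argv)
```

Passing a random.Random instance into the generator function makes it possible to seed it in a test, for example random_candidate(random.Random(42)), and get reproducible codes.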

If desired, the output of this generator could be piped into the upc program to exercise it. However, a more convenient option is to use a Makefile with a target for generating the test data:

codes.in.txt: gen_upc_codes.py
        python gen_upc_codes.py 1000 > codes.in.txt

Now, when asked to generate codes.in.txt, Make will run the generator script and save the output in codes.in.txt. For convenience, I also add a test target that builds the program under test (upc), generates the test data, and runs the program with codes.in.txt as input:

test: upc codes.in.txt
        ./upc < codes.in.txt > upc.out.txt 2> upc.eout.txt

Now I can run make test to build and test the program:

$ make test
gcc -std=gnu99 -Wall [...] -lm  upc.c   -o upc
python gen_upc_codes.py 1000 > codes.in.txt
./upc < codes.in.txt > upc.out.txt 2> upc.eout.txt

$

The standard output saved in upc.out.txt should contain every generated code whose check digit happened to be consistent. For a random draw, I would expect this about 1 time in 10, since exactly one of the ten possible check digits 0, 1, ..., 9 matches any given first 11 digits:

$ wc -l upc.out.txt
91 upc.out.txt

$ wc -l upc.eout.txt
909 upc.eout.txt
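Whether 91 out of 1000 is plausible for a 1-in-10 chance can be checked with a little binomial arithmetic. This sketch assumes each digit is drawn uniformly and independently from 0..9; the function name binomial_summary is hypothetical.

```python
import math


def binomial_summary(n, p, observed):
    """Return (expected count, standard deviation, z-score of observed)."""
    expected = n * p
    std_dev = math.sqrt(n * p * (1 - p))
    return expected, std_dev, (observed - expected) / std_dev


# 1000 codes, a 1-in-10 chance each, 91 consistent codes observed
expected, std_dev, z = binomial_summary(1000, 0.1, 91)
print("expected %.0f, std dev %.2f, z-score %.2f" % (expected, std_dev, z))
# prints: expected 100, std dev 9.49, z-score -0.95
```

The observed count lies within one standard deviation of the expected 100, so the result is unremarkable for a random draw.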


About 90% of the generated codes were rejected and written via standard error to upc.eout.txt, while 91 codes happened to have consistent check digits and were saved in upc.out.txt. Since upc.out.txt should contain only consistent UPC codes, feeding its contents back into upc should reproduce the same output:

$ cat upc.out.txt | ./upc | wc -l
91
$
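Feeding the program its own output only shows that it agrees with itself. As an independent cross-check, the candidate codes can also be partitioned in Python and the counts compared with the wc -l results. This is a sketch only: is_consistent and partition are hypothetical helper names, and the input is assumed to be lines in the codes.in.txt format (not the annotated lines in upc.eout.txt).

```python
def is_consistent(line):
    """Return True if the 12th digit matches the check digit of the first 11."""
    digits = [int(ch) for ch in line if ch.isdigit()]
    if len(digits) != 12:
        raise ValueError("expected 12 digits: %r" % line)
    total = 3 * sum(digits[0:11:2]) + sum(digits[1:11:2])
    return digits[11] == 9 - ((total - 1) % 10)


def partition(lines):
    """Split lines into (consistent, inconsistent) lists, skipping blank lines."""
    good, bad = [], []
    for line in lines:
        line = line.strip()
        if line:
            (good if is_consistent(line) else bad).append(line)
    return good, bad


sample = ["0 13800 15173 5", "0 13800 15173 6", "0 51500 24128 8"]
good, bad = partition(sample)
print("%d %d" % (len(good), len(bad)))  # prints 2 1
```

With a real input file, something like partition(open("codes.in.txt")) would give two lists whose lengths should match the line counts of upc.out.txt and upc.eout.txt, apart from the all-zero code, where Python's % operator behaves differently from C's.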

Files

upc_test-0.0.1.zip

References

K.N. King - C Programming - A Modern Approach [goodreads.com]

Python Random Module

