debuggable

 
 

Quickly generate tons of test data

Posted on 23/1/09 by Felix Geisendörfer

I am lazy. I think it's because I was raised a programmer and grew up imagining a world where computers would do all the work for me.

Unfortunately however, our binary offspring is still asking for a great amount of our attention and time before we can retire and hand the work over to our fabulous creations.

But that doesn't mean we can't make the kids clean our dishes, wash our clothes or generate our test data! So at a point earlier last year I was working on my own test data generator, but it turns out it is quite a ton of work to aggregate good data from various places and create a nice generator based upon it.

No worries, the story has a happy end. The universe is kind to the lazy and, hearing my cry of need, created a whole array of products and web services to address the problem!

Hello generatedata.com

Of the solutions I have tried so far, generatedata.com, made by the friendly folks at Black Sheep Web Software, has worked the best for me.

However, it took me a little time to streamline my workflow when using it, so here are a few tips you should definitely know about when using the service:

Tip 1: Generate UUIDs

If you are using UUIDs in your application and you want to generate random UUIDs, there is currently no option for that at generatedata.com. However, if you select the type "Alpha-Numeric" and leave the drop-down at "Please select", you can enter your own pattern for the string to be created. Here is the one I came up with for UUIDs:

lxlxlxlx-lxlx-lxlx-lxlx-lxlxlxlxlxlx

Tip 2: Generate custom strings

If you need even more customized strings, you can use the placeholders provided by the service. As you can see I simplified UUID generation above by simply combining random lowercase letters with random numbers - not even close to perfect UUIDs, but very kick-ass for test data ; ).

L	An uppercase Letter.
l	A lowercase letter.
D	A letter (upper or lower).
C	An uppercase Consonant.
c	A lowercase consonant.
E	A consonant (upper or lower).
V	An uppercase Vowel.
v	A lowercase vowel.
F	A vowel (upper or lower).
x	Any number, 0-9.
X	Any number, 1-9.
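
To get a feel for how these placeholders behave, here is a rough Python sketch that expands a pattern the same way the table describes. It is not the service's actual code: the names `expand` and `POOLS` are made up for illustration, and it assumes that "y" counts as a consonant and that any character without a placeholder meaning (like the dashes in the UUID pattern) is copied through unchanged.

```python
import random
import string

VOWELS = "aeiou"
# Assumption: every lowercase letter that is not a, e, i, o, u is a consonant.
CONSONANTS = "".join(c for c in string.ascii_lowercase if c not in VOWELS)

# Placeholder -> pool of characters to pick from, following the table above.
POOLS = {
    "L": string.ascii_uppercase,
    "l": string.ascii_lowercase,
    "D": string.ascii_letters,
    "C": CONSONANTS.upper(),
    "c": CONSONANTS,
    "E": CONSONANTS + CONSONANTS.upper(),
    "V": VOWELS.upper(),
    "v": VOWELS,
    "F": VOWELS + VOWELS.upper(),
    "x": "0123456789",
    "X": "123456789",
}


def expand(pattern, rng=random):
    """Replace each placeholder with a random character; copy everything else as-is."""
    return "".join(rng.choice(POOLS[ch]) if ch in POOLS else ch for ch in pattern)


if __name__ == "__main__":
    # The UUID-shaped pattern from Tip 1: groups of 8-4-4-4-12 characters.
    print(expand("lxlxlxlx-lxlx-lxlx-lxlx-lxlxlxlxlxlx"))
    # A made-up pattern: consonant, vowel, consonant, vowel, digit, dash, three non-zero digits.
    print(expand("Cvcvx-XXX"))
```

Because "l" can produce any letter from a to z rather than only the hex digits a to f, the result has the shape of a UUID but is usually not a valid one, which is fine for test data.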

Tip 3: Connect to foreign keys

The whole generator is kind of nice and all, but if the table you are generating data for has foreign keys that point to different tables you need those links to work, right? After all what do you need test data for if not to see the parts of your app working together ; ).

Anyway, it turns out this is not a particularly difficult problem either. From the data type drop-down select "Custom List" and again leave the "Please select" option in the second drop-down where it is. Now you can provide your own list of strings separated by "|" (the pipe character) to be used for this field.

How do you get a good list that matches your foreign keys? Easy. Let's say you have a key "user_id" in your table and you need a good list for that. Simply run this MySQL query:

SELECT GROUP_CONCAT(id SEPARATOR '|') FROM users;

This will return a perfectly formatted list, ready to be copied and pasted into the list field! And as you can see, you can get very fancy at this point by adding conditions and other things to get the data you want! This can also be handy if you already have a set of data (like paths to uploaded images <- sry, you'll still have to do this manually) that you want to re-use.
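
One thing to watch out for: MySQL cuts off GROUP_CONCAT results at group_concat_max_len, which is 1024 bytes by default, so a long list of ids may come back truncated. If that happens, you can raise the limit or build the list on the client side instead. Here is a minimal Python sketch of the second option. The helper name `custom_list` is made up for illustration, and an in-memory SQLite database with a hypothetical `active` column stands in for the real MySQL `users` table so that the example runs on its own.

```python
import sqlite3


def custom_list(values):
    """Join values into the pipe-separated format used by the "Custom List" field."""
    items = [str(v) for v in values]
    # A value containing the separator would silently split into two list entries.
    bad = [item for item in items if "|" in item]
    if bad:
        raise ValueError(f"values must not contain '|': {bad}")
    return "|".join(items)


if __name__ == "__main__":
    # Stand-in database; in practice you would query your MySQL server instead.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, active INTEGER)")
    conn.executemany(
        "INSERT INTO users (id, active) VALUES (?, ?)",
        [(1, 1), (2, 0), (3, 1)],
    )
    rows = conn.execute("SELECT id FROM users WHERE active = 1 ORDER BY id")
    print(custom_list(row[0] for row in rows))  # prints 1|3
```

The WHERE clause plays the role of the "conditions and other things" mentioned above: only the ids you actually want to link to end up in the list.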

Tip 4: Quickly getting the records into your table

For me, the quickest way to get the records into the table so far has been to select "SQL" as the result type at the top, enter my table name at the bottom and disable the "create table" option. Then, after hitting generate, I hit Command+A to select everything, copy it, and paste the whole thing into my MySQL terminal (you can also use a GUI client like phpMyAdmin for that).

If you are on OS X you can also pipe your clipboard into mysql directly like this:

pbpaste | mysql my_db

Start using big sets of test data now

There is no point in always testing a sophisticated app with a set of < 10 records you entered by hand. Especially if those are full of profanity and can't be handed off to a client anyway ; ).

For example, I am currently working on a very nice dashboard for our client to help analyze some click data we are tracking. I started out with just a handful of clicks I generated manually, but realized I needed 1000+ records to see if my SQL queries are good enough and to see the charts become all pretty. Needless to say, I caught several issues that only became apparent with that amount of data.

So if you develop a system, even a very small one, generate some good test data for it today. Nothing is a worse productivity killer than not being able to test something properly due to a lack of data.

-- Felix Geisendörfer aka the_undefined

PS: If you are re-doing an existing system or otherwise have the chance to use "actual" data to test with, spend some time on getting that going. Real records are always better (read: worse, but in a good way) for testing your application.

 

Abhimanyu Grover said on Jan 23, 2009:

Nice post. Also, you can use the Faker class to generate tons of test data locally: http://github.com/caius/php-faker/tree/master

SayB said on Jan 25, 2009:

This is a great script! Thanks for sharing. All this needs now is a cool plugin for Cake :P
