The generate program creates a random data set in the following way. It first creates random noise points. It then creates clusters by choosing points to serve as the centres of circles of a given radius. Finally, for each cluster point being created, it randomly chooses a cluster and then a point within that cluster's circle.
No input is required on standard input, and the unclustered dataset is written to standard output.
Each point is annotated with the number of the cluster it belongs to, or -1 for outliers. This makes it much easier to evaluate how well a given clustering algorithm performs on the dataset.
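
The procedure described above can be sketched roughly as follows. This is only an illustration, not the generate program itself. The function name, its parameters (`n_noise`, `n_clusters`, `n_cluster_points`, `radius`, `seed`), the unit-square domain, the uniform sampling within each circle, and the "x y label" output format are all assumptions made for the example.

```python
import math
import random
import sys


def generate(n_noise, n_clusters, n_cluster_points, radius, seed=None):
    """Return (points, centres); each point is (x, y, label), label -1 for noise.

    All parameter names and the unit-square domain are hypothetical.
    """
    rng = random.Random(seed)

    # Noise points: drawn uniformly from the unit square (assumed domain).
    points = [(rng.random(), rng.random(), -1) for _ in range(n_noise)]

    # Cluster centres: also drawn uniformly from the unit square.
    centres = [(rng.random(), rng.random()) for _ in range(n_clusters)]

    for _ in range(n_cluster_points):
        # Randomly choose a cluster, then a point inside its circle.
        label = rng.randrange(n_clusters)
        cx, cy = centres[label]
        # The square root makes the point uniform over the disc's area
        # (an assumption; the original distribution is not specified).
        r = radius * math.sqrt(rng.random())
        theta = rng.uniform(0.0, 2.0 * math.pi)
        points.append((cx + r * math.cos(theta), cy + r * math.sin(theta), label))

    return points, centres


if __name__ == "__main__":
    # No input is read; the annotated points go to standard output.
    pts, _ = generate(n_noise=20, n_clusters=3, n_cluster_points=100,
                      radius=0.05, seed=0)
    for x, y, label in pts:
        sys.stdout.write(f"{x:.6f} {y:.6f} {label}\n")
```

Because every point carries its true label, the output of a clustering algorithm can be compared directly against the last column, with -1 marking the points that should be treated as outliers.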