In choosing the datasets to be used as the basis for the comparison, several factors had to be evaluated. Obtaining existing results for the standard ML algorithms was of considerable importance, to avoid the peripheral task of sourcing implementations of standard ML algorithms and running them on the datasets. Fortunately, [Lim et al, 1999] provides an excellent and current summary of the performance of numerous standard ML algorithms (twenty-two decision tree, nine statistical and two neural network algorithms) on sixteen datasets from [Blake et al, 1998, the UCI ML dataset repository], both with and without added noise.
The other main criterion for the datasets was a restriction on their size and complexity. This is because the human constructing the fuzzy rules will ultimately examine all of the training data, and the scale of this project was limited. Thus, datasets were preferred to have a small to medium number of training instances (no more than, say, 1000), covering as much of the attribute space as possible; a small to medium number of attributes (no more than, say, 20); and discrete attributes and classes, each with a small number of possible values (no more than, say, 10). The reasons behind some of these requirements will be discussed in Section 3.2.1.
The datasets were split sequentially into a training set consisting of the first 70% of the instances and a validation set consisting of the last 30%. The reasons for and ramifications of this choice will be discussed in Section 3.2.2.
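The sequential split described above can be sketched as follows. This is an illustrative sketch only; the representation of instances as a Python list is an assumption, not part of the original experimental code.

```python
# Sequential 70/30 split, as described above: the first 70% of instances
# form the training set and the remaining 30% the validation set.
# No shuffling is performed -- the split is purely positional.

def sequential_split(instances, train_fraction=0.7):
    """Split a list of instances sequentially into (train, validation)."""
    cut = int(len(instances) * train_fraction)
    return instances[:cut], instances[cut:]

# Example with 435 instances (the size of the voting-records dataset);
# the placeholder integer instances stand in for real records.
instances = list(range(435))
train, validation = sequential_split(instances)
print(len(train), len(validation))  # 304 131
```

Note that because the split is positional rather than randomised, any ordering present in the source file carries over into the two sets, which is part of what Section 3.2.2 discusses.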
The dataset chosen as the best for this work was the 1984 United States Congressional Voting Records Database (voting-records) from [Blake et al, 1998, the UCI ML dataset repository]. This dataset records the votes of 435 US Congressmen on 16 key questions, where each attribute (question) can have the value ``yea'', ``nay'' or ``abstained'', and each Congressman is classified as a Democrat or Republican. Further, results for this dataset are reported in [Lim et al, 1999].
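The structure of a voting-records instance can be illustrated with a short parsing sketch. The comma-separated layout assumed here (party label first, followed by 16 votes coded ``y'', ``n'' and ``?'') follows the UCI repository's file format, and the sample line is a hypothetical record, not one taken from the dataset.

```python
# Map the UCI file's vote codes to the attribute values used in the text.
ATTRIBUTE_VALUES = {"y": "yea", "n": "nay", "?": "abstained"}

def parse_record(line):
    """Parse one comma-separated voting record into (class, attributes)."""
    fields = line.strip().split(",")
    party, votes = fields[0], fields[1:]
    assert len(votes) == 16  # one vote per key question
    return party, [ATTRIBUTE_VALUES[v] for v in votes]

# Hypothetical sample record for illustration.
sample = "republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y"
party, votes = parse_record(sample)
print(party, votes[:3])  # republican ['nay', 'yea', 'nay']
```

Each record thus yields one class value and 16 discrete attribute values, matching the small, fully discrete form of dataset preferred above.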
Unfortunately, an initial underestimate of the time required to create the fuzzy rules, together with the generally limited scale of the project, precluded other datasets from being tested. The most desirable datasets for further testing were the Mushroom Database from [Blake et al, 1998, the UCI ML dataset repository] (difficult because of its large number of instances and attributes) and the Car Evaluation Database from the same repository, which would have required one or more standard ML algorithms (such as C4.5) to be run on it to obtain comparison results.