A command-line tool to help generate various types of data. (Most of the datasets it generates are for testing manifold learning algorithms. I add them as I need them.) Here's the usage information:
Full Usage Information
[Square brackets] are used to indicate required arguments.
<Angled brackets> are used to indicate optional arguments.

waffles_generate [command]
   Generate certain useful datasets
   crane <options>
      Generate a dataset where each row represents a ray-traced image of a
      crane with a ball.
      <options>
         -saveimage [filename]
            Save an image showing all the frames.
         -ballradius [size]
            Specify the size of the ball. The default is 0.3.
         -frames [horiz] [vert]
            Specify the number of frames to render.
         -size [wid] [hgt]
            Specify the size of each frame.
         -blur [radius]
            Blur the images. A good starting value might be 5.0.
         -gray
            Use a single grayscale value for every pixel instead of three
            (red, green, blue) channel values.
   cube [n]
      Generate data evenly distributed on the surface of a unit cube. Each
      side is sampled with [n]x[n] points. The total number of points in
      the dataset will be 6*[n]*[n]-12*[n]+8.
   entwinedspirals [points] <options>
      Generate points that lie on an entwined spirals manifold.
      [points]
         The number of points with which to sample the manifold.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
         -reduced
            Generate intrinsic values instead of extrinsic values. (This
            might be useful to empirically measure the accuracy of a
            manifold learner.)
   fishbowl [n] <options>
      Generate samples on the surface of a fish-bowl manifold.
      [n]
         The number of samples to draw.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
         -opening [size]
            Specify the size of the opening. (0.0 = no opening. 0.25 =
            default. 1.0 = half of the sphere.)
   gridrandomwalk [arff-file] [width] [samples] <options>
      Generate a sequence of action-observation pairs by randomly walking
      around on a grid of observation vectors. Assumes there are four
      possible actions consisting of up, down, left, right.
      [arff-file]
         The filename of an arff file containing observation vectors
         arranged in a grid.
      [width]
         The width of the grid.
      [samples]
         The number of samples to take.
         In other words, the length of the random walk.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
         -start [x] [y]
            Specify the starting state. The default is to start in the
            center of the grid.
         -obsfile [filename]
            Specify the filename for the observation sequence data. The
            default is observations.arff.
         -actionfile [filename]
            Specify the filename for the actions data. The default is
            actions.arff.
   imagetranslatedovernoise [png-file] <options>
      Sample a manifold by translating an image over a background of noise.
      [png-file]
         The filename of a png image.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
         -reduced
            Generate intrinsic values instead of extrinsic values. (This
            might be useful to empirically measure the accuracy of a
            manifold learner.)
   manifold [samples] <options> [equations]
      Generate sample points randomly distributed on the surface of a
      manifold.
      [samples]
         The number of points with which to sample the manifold.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
      [equations]
         A set of equations that define the manifold. The equations that
         define the manifold must be named y1, y2, ..., but helper
         equations may be included. The manifold-defining equations must
         all have the same number of parameters. The parameters will be
         drawn from a uniform distribution (from 0 to 1). Usually it is a
         good idea to wrap the equations in quotes. Example:
         "y1(x1,x2)=x1;y2(x1,x2)=sqrt(x1*x2);h(x)=sqrt(1-x);y3(x1,x2)=x2*x2-h(x1)"
   noise [rows] <options>
      Generate random data by sampling from a distribution.
      [rows]
         The number of patterns to generate.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
         -dist [distribution]
            Specify the distribution. The default is normal 0 1.
            beta [alpha] [beta]
            binomial [n] [p]
            categorical 3 [p0] [p1] [p2]
               A categorical distribution with 3 classes. [p0], [p1], and
               [p2] specify the probabilities of each of the 3 classes.
               (This is just an example.
               Other values besides 3 may be used for the number of
               classes.)
            cauchy [median] [scale]
            chisquare [t]
            exponential [beta]
            f [t] [u]
            gamma [alpha] [beta]
            gaussian [mean] [deviation]
            geometric [p]
            logistic [mu] [s]
            lognormal [mu] [sigma]
            normal [mean] [deviation]
            poisson [mu]
            softimpulse [s]
            spherical [dims] [radius]
            student [t]
            uniform [a] [b]
            weibull [gamma]
   randomsequence [length] <options>
      Generates a sequential list of integer values, shuffles them
      randomly, and then prints the shuffled list to stdout.
      [length]
         The number of values in the random sequence.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
         -start [value]
            Specify the smallest value in the sequence.
   scalerotate [png-file] <options>
      Generate a dataset where each row represents an image that has been
      scaled and rotated by various amounts. Thus, these images form an
      open-cylinder (although somewhat cone-shaped) manifold.
      [png-file]
         The filename of a PNG image.
      <options>
         -saveimage [filename]
            Save a composite image showing all the frames in a grid.
         -frames [rotate-frames] [scale-frames]
            Specify the number of frames. The default is 40 15.
         -arc [radians]
            Specify the rotation amount. If not specified, the default is
            6.2831853... (2*PI).
   scurve [points] <options>
      Generate points that lie on an s-curve manifold.
      [points]
         The number of points with which to sample the manifold.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
         -reduced
            Generate intrinsic values instead of extrinsic values. (This
            might be useful to empirically measure the accuracy of a
            manifold learner.)
   selfintersectingribbon [points] <options>
      Generate points that lie on a self-intersecting ribbon manifold.
      [points]
         The number of points with which to sample the manifold.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
   swissroll [points] <options>
      Generate points that lie on a swiss roll manifold.
      [points]
         The number of points with which to sample the manifold.
      <options>
         -seed [value]
            Specify a seed for the random number generator.
         -reduced
            Generate intrinsic values instead of extrinsic values. (This
            might be useful to empirically measure the accuracy of a
            manifold learner.)
         -cutoutstar
            Don't sample within a star-shaped region on the manifold.
   windowedimage [png-file] <options>
      Sample a manifold by translating a window over an image. Each pattern
      represents the windowed portion of the image.
      [png-file]
         The filename of the png image from which to generate the data.
      <options>
         -reduced
            Generate intrinsic values instead of extrinsic values. (This
            might be useful to empirically measure the accuracy of a
            manifold learner.)
         -stepsizes [horiz] [vert]
            Specify the horizontal and vertical step sizes. (How many
            pixels to move the window between samples.)
         -windowsize [width] [height]
            Specify the size of the window. The default is half the width
            and height of [png-file].
   usage
      Print usage information.
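
The point count quoted for the cube command, 6*[n]*[n]-12*[n]+8, can be checked with a short Python sketch. This is not the tool's own implementation. It assumes that "evenly distributed" means a regular lattice with [n] points along each edge, so that points on shared edges and corners appear only once. The function names cube_surface_points and expected_count are hypothetical.

```python
from itertools import product


def cube_surface_points(n):
    """Return the distinct lattice points on the surface of the unit cube.

    Assumption (not confirmed by the usage text): each face carries an
    n x n grid whose border points are shared with neighboring faces.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span the unit cube")
    last = n - 1
    points = []
    for i, j, k in product(range(n), repeat=3):
        # A lattice point lies on the surface when at least one of its
        # integer indices is at the minimum (0) or maximum (n - 1).
        if 0 in (i, j, k) or last in (i, j, k):
            points.append((i / last, j / last, k / last))
    return points


def expected_count(n):
    """Point count stated in the usage text for `cube [n]`."""
    return 6 * n * n - 12 * n + 8


if __name__ == "__main__":
    for n in (2, 3, 10):
        print(n, len(cube_surface_points(n)), expected_count(n))
```

The formula follows from inclusion-exclusion: six faces contribute 6*n*n points, each of the 12 edges is counted twice and so 12*n is subtracted, and the 8 corners (removed entirely by that subtraction) are added back once.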
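
To illustrate what the example equations for the manifold command compute, the following Python sketch evaluates them directly. It is only an illustration, not how waffles_generate parses or evaluates equations. It assumes the parameters x1 and x2 are drawn uniformly from the stated range of 0 to 1, and the function name sample_example_manifold is hypothetical.

```python
import math
import random


def sample_example_manifold(samples, seed=0):
    """Evaluate the example equations on randomly drawn parameters.

    Equations (from the usage text):
        y1(x1,x2) = x1
        y2(x1,x2) = sqrt(x1*x2)
        h(x)      = sqrt(1-x)        (helper, not an output dimension)
        y3(x1,x2) = x2*x2 - h(x1)
    Assumption: x1 and x2 are uniform in [0, 1).
    """
    rng = random.Random(seed)

    def h(x):
        return math.sqrt(1.0 - x)

    rows = []
    for _ in range(samples):
        x1, x2 = rng.random(), rng.random()
        y1 = x1
        y2 = math.sqrt(x1 * x2)
        y3 = x2 * x2 - h(x1)
        rows.append((y1, y2, y3))
    return rows


if __name__ == "__main__":
    for row in sample_example_manifold(5, seed=42):
        print(row)
```

Each row has three values because three equations are named y1, y2, and y3. The helper h contributes to y3 but does not add an output dimension of its own.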