All the clustering algorithms we will use require input that describes the dependencies between system entities. To get started, download this RSF file that contains such information for the TAB2PS system.
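Before converting the file, it can help to check which kinds of dependencies it contains. The sketch below assumes the usual RSF layout of three whitespace-separated tokens per line (relation type, source entity, target entity). The function name rsf_relation_counts is only illustrative and is not part of any of the tools used here.

```shell
# Count how many facts of each relation type an RSF file contains.
# Assumes each line looks like: <relation> <source> <target>
rsf_relation_counts() {
    awk 'NF >= 3 { count[$1]++ }
         END { for (r in count) print r, count[r] }' "$1" | sort
}
```

Calling rsf_relation_counts tab2ps.rsf prints one line per relation type followed by the number of facts of that type.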
The hierarchical algorithms take their input in market-basket data (MBD) form. To transform an RSF file to an MBD one, use something like
unitrans tab2ps.rsf tab2ps.mbd
You can then try various hierarchical algorithms. Simply give
aa -h
to see the usage information.
An example that clusters using a cut-height of 0.1, the Jaccard Coefficient, and the Complete Linkage Algorithm:
aa tab2ps.mbd tab2ps.contain..rsf -c0.1 -s0 -a1
After you download this file (howtocut.txt), you can give the following to produce several different clusterings:
aa tab2ps.mbd tab2ps.contain..rsf -Chowtocut.txt -s0 -a1
You can see the usage for ACDC by typing
acdc -h
However, one rarely needs to provide options. You can cluster the TAB2PS system with ACDC using
acdc tab2ps.rsf tab2ps.contain.acdc.rsf
To see a graphical representation of the results, add the -t option as in:
acdc tab2ps.rsf tab2ps.contain.acdc.rsf -t
Bunch accepts input in a format that is exactly like RSF except that the first token is missing, i.e. Bunch does not differentiate between different types of dependencies. To transform an RSF file in this way, give
cut -f2,3 < tab2ps.rsf > tab2ps.2rsf
The above will only work if tokens are separated by TABs. Add the following -d option if spaces are used for token separation.
cut -f2,3 -d" " < tab2ps.rsf > tab2ps.2rsf
You can run Bunch by simply typing
bunch
You might want to run Bunch in the background to avoid blocking your terminal:
bunch &
On the Basic tab, press Select... to choose an input file. Navigate and select tab2ps.2rsf. Bunch may give a warning concerning reflexive edges. You can avoid this warning by using a simple grok script to remove reflexive edges from the input.
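If grok is not at hand, reflexive edges can also be filtered out with standard tools. The following is a minimal sketch and not the grok script mentioned above. It assumes the two-token format produced by the cut commands, and the function name drop_reflexive_edges and the output file name used below are only illustrative.

```shell
# Drop reflexive edges (source equals target) from a two-token file.
# awk splits on any run of spaces or TABs, so either separator works.
# Appending "" forces a string comparison even for numeric-looking names.
drop_reflexive_edges() {
    awk 'NF >= 2 && ($1 "") != ($2 "")' "$1"
}
```

For example, drop_reflexive_edges tab2ps.2rsf > tab2ps.noself.2rsf writes a filtered copy, which you would then select in Bunch instead of tab2ps.2rsf.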
Click Run to run Bunch with default options. When finished select View Graph.
Change the output format to Text and run again. The output will be in tab2ps.2rsf.bunch in a format called SIL. To transform SIL to RSF:
bunch2rsf tab2ps.2rsf.bunch
The output will be in tab2ps.2rsf.bunch.rsf. Contrast this output with that of ACDC and the hierarchical algorithms.
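One quick way to start contrasting the results is to look at how many clusters each tool produced and how large they are. The sketch below assumes that every output file consists of lines of the form contain <cluster> <entity>, which the file names suggest but which is not confirmed here, so check a few lines of each file first. The function name cluster_sizes is only illustrative.

```shell
# Print each cluster with its number of direct members, largest first.
# Assumes lines of the form: contain <cluster> <entity>
cluster_sizes() {
    awk '$1 == "contain" { size[$2]++ }
         END { for (c in size) print size[c], c }' "$1" |
        sort -k1,1nr -k2,2
}
```

Running cluster_sizes tab2ps.contain.acdc.rsf and cluster_sizes tab2ps.2rsf.bunch.rsf side by side gives a first impression of how the decompositions differ. Piping either call through wc -l gives the number of clusters.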
You can try changing the Options for the Hill Climbing algorithm or selecting a different clustering method like Genetic Algorithms.