Set up

All the clustering algorithms we will use require input that describes the dependencies between system entities. To get started, download this RSF file, which contains such information for the TAB2PS system.
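Before clustering, it can help to check what kinds of dependencies the file actually holds. The following is a minimal sketch, assuming each line of the RSF file is a whitespace-separated triple whose first token names the dependency type; rsf_summary is a hypothetical helper name, not part of any of the tools used here.

```shell
# Hypothetical helper: count the triples of each dependency type in an RSF file.
# Assumes one whitespace-separated triple per line: type, source, target.
rsf_summary() {
    awk 'NF >= 3 { count[$1]++ } END { for (r in count) print r, count[r] }' "$1" | sort
}
# Usage: rsf_summary tab2ps.rsf
```

Each output line shows a dependency type followed by the number of triples of that type.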

Hierarchical Algorithms

Input to these algorithms is in market-basket data (MBD) form. To transform an RSF file into an MBD file, use something like

unitrans tab2ps.rsf tab2ps.mbd

You can then try various hierarchical algorithms. Simply give

aa -h

to see the usage information.

An example that clusters using a cut-height of 0.1, the Jaccard Coefficient, and the Complete Linkage Algorithm:

aa tab2ps.mbd tab2ps.contain..rsf -c0.1 -s0 -a1

After you download this file (howtocut.txt in the command below), you can give the following to produce several different clusterings:

aa tab2ps.mbd tab2ps.contain..rsf -Chowtocut.txt -s0 -a1
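If that file is not at hand, a similar set of clusterings might be approximated by running aa once per cut height. This is only a sketch: it reuses the -c, -s0 and -a1 options from the example above, while the function name cluster_at_heights, the list of heights, and the output file naming are assumptions made for illustration.

```shell
# Hypothetical helper: run aa once per cut height, one output file per height.
# Uses only the -c, -s0 and -a1 options shown earlier; the output file names
# (e.g. tab2ps.contain.0.2.rsf) are an illustrative choice.
cluster_at_heights() (
    mbd=$1
    shift
    for h in "$@"; do
        aa "$mbd" "${mbd%.mbd}.contain.$h.rsf" "-c$h" -s0 -a1
    done
)
# Usage: cluster_at_heights tab2ps.mbd 0.1 0.2 0.3
```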

ACDC

You can see the usage for ACDC by typing

acdc -h

However, one rarely needs to provide options. You can cluster the TAB2PS system with ACDC using

acdc tab2ps.rsf tab2ps.contain.acdc.rsf

To see a graphical representation of the results, add the -t option as in:

acdc tab2ps.rsf tab2ps.contain.acdc.rsf -t

Bunch

Bunch accepts input in a format that is exactly like RSF except that the first token is missing, i.e. Bunch does not differentiate between different types of dependencies. To transform an RSF file in this way, give

cut -f2,3 < tab2ps.rsf > tab2ps.2rsf

The above will only work if tokens are separated by TABs. If spaces are used to separate tokens instead, add the -d option as follows:

cut -f2,3 -d" " < tab2ps.rsf > tab2ps.2rsf
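If you are not sure which separator the file uses, awk splits on any run of spaces or TABs, so one command can cover both cases. The sketch below assumes that entity names contain no whitespace; rsf_to_2rsf is a hypothetical helper name. The output uses a single space between the two tokens, like the second cut command above.

```shell
# Hypothetical helper: drop the first token of every RSF triple,
# whether tokens are separated by TABs or by spaces.
# Assumes entity names contain no whitespace.
rsf_to_2rsf() {
    awk 'NF >= 3 { print $2, $3 }' "$1"
}
# Usage: rsf_to_2rsf tab2ps.rsf > tab2ps.2rsf
```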

You can run Bunch by simply typing

bunch

You might want to run Bunch in the background to avoid blocking your terminal:

bunch &

On the Basic tab, press Select... to choose an input file. Navigate to tab2ps.2rsf and select it. Bunch may give a warning concerning reflexive edges. You can avoid this warning by using a simple grok script to remove reflexive edges from the input.
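If grok is not available, reflexive edges can also be filtered out with awk. This is a sketch of an alternative, not the grok script itself; it assumes every line of the input holds two whitespace-separated tokens, and drop_reflexive is a hypothetical helper name.

```shell
# Hypothetical helper: keep only edges whose two endpoints differ.
# Appending "" forces awk to compare the tokens as strings, not as numbers.
drop_reflexive() {
    awk '($1 "") != ($2 "")' "$1"
}
# Usage: drop_reflexive tab2ps.2rsf > tab2ps.clean.2rsf
```

You would then select the filtered file (tab2ps.clean.2rsf in the usage line, an arbitrary name) as the input to Bunch.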

Click Run to run Bunch with default options. When it has finished, select View Graph.

Change the output format to Text and run again. The output will be in tab2ps.2rsf.bunch in a format called SIL. To transform SIL to RSF:

bunch2rsf tab2ps.2rsf.bunch

The output will be in tab2ps.2rsf.bunch.rsf. Contrast this output to that of ACDC and the hierarchical algorithms.
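One simple way to start the comparison is to look at how many entities each algorithm puts in each cluster. The sketch below assumes that every line of a result file is a triple in which the second token is the cluster and the third is a contained entity; that layout is an assumption about the output files, and cluster_sizes is a hypothetical helper name.

```shell
# Hypothetical helper: list clusters by size, largest first.
# Assumes each line is a triple: relation, cluster, contained entity.
cluster_sizes() {
    awk 'NF >= 3 { size[$2]++ } END { for (c in size) print size[c], c }' "$1" | sort -rn
}
# Usage:
#   cluster_sizes tab2ps.contain.acdc.rsf
#   cluster_sizes tab2ps.2rsf.bunch.rsf
```

Running it on each result file gives a quick view of whether an algorithm produced a few large clusters or many small ones.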

You can try changing the Options for the Hill Climbing algorithm or selecting a different clustering method like Genetic Algorithms.