

java.lang.Object
  weka.classifiers.rules.RuleStats
public class RuleStats
This class implements the statistics functions used in the propositional rule learner, from simple ones, such as counting true/false positives/negatives and filtering data based on the ruleset, to more sophisticated ones, such as MDL calculation and rule variant generation for each rule in the ruleset.
These functions need specific data and a specific ruleset, which are given in order to instantiate an object of this class.
Constructor Summary  

RuleStats()
Default constructor 

RuleStats(Instances data,
FastVector rules)
Constructor that provides ruleset and data 
Method Summary  

void 
addAndUpdate(Rule lastRule)
Add a rule to the ruleset and update the stats 
void 
cleanUp()
Frees up memory after classifier has been built. 
double 
combinedDL(double expFPRate,
double predicted)
Compute the combined DL of the ruleset in this class.
void 
countData()
Filter the data according to the ruleset and compute the basic stats: coverage/uncoverage, true/false positive/negatives of each rule 
void 
countData(int index,
Instances uncovered,
double[][] prevRuleStats)
Count data from the position index in the ruleset, assuming that the given data are not covered by the rules in positions 0...(index-1) and that the statistics of these rules are provided. This procedure is typically useful when a temporary RuleStats object is constructed in order to efficiently calculate the relative DL of the rule in position index, so no other statistics are needed.
static double 
dataDL(double expFPOverErr,
double cover,
double uncover,
double fp,
double fn)
The description length of data given the parameters of the data based on the ruleset. 
Instances 
getData()
Get the data of the stats 
double[] 
getDistributions(int index)
Get the class distribution predicted by the rule in given position 
Instances[] 
getFiltered(int index)
Get the data after filtering the given rule 
java.lang.String 
getRevision()
Returns the revision string. 
FastVector 
getRuleset()
Get the ruleset of the stats 
int 
getRulesetSize()
Get the size of the ruleset in the stats 
double[] 
getSimpleStats(int index)
Get the simple stats of one rule, comprising 6 values: 0: coverage; 1: uncoverage; 2: true positives; 3: true negatives; 4: false positives; 5: false negatives
double 
minDataDLIfDeleted(int index,
double expFPRate,
boolean checkErr)
Compute the minimal data description length of the ruleset if the rule in the given position is deleted. The min_data_DL_if_deleted = data_DL_if_deleted - potential
double 
minDataDLIfExists(int index,
double expFPRate,
boolean checkErr)
Compute the minimal data description length of the ruleset if the rule in the given position is NOT deleted. The min_data_DL_if_n_deleted = data_DL_if_n_deleted - potential
static double 
numAllConditions(Instances data)
Compute the number of all possible conditions that could appear in a rule for the given data.
static Instances[] 
partition(Instances data,
int numFolds)
Partition the data into 2 parts, the first of which has (numFolds-1)/numFolds of the data and the second 1/numFolds of the data
double 
potential(int index,
double expFPOverErr,
double[] rulesetStat,
double[] ruleStat,
boolean checkErr)
Calculate the potential to decrease the DL of the ruleset.
void 
reduceDL(double expFPRate,
boolean checkErr)
Try to reduce the DL of the ruleset by testing the removal of the rules one by one in reverse order, updating all the stats
double 
relativeDL(int index,
double expFPRate,
boolean checkErr)
The description length (DL) of the ruleset relative to the case where the rule in the given position is deleted, which is obtained by: MDL if the rule exists - MDL if the rule does not exist. Note that the minimal possible DL of the ruleset is calculated.
void 
removeLast()
Remove the last rule in the ruleset as well as its stats.
static Instances 
rmCoveredBySuccessives(Instances data,
FastVector rules,
int index)
Static utility function to count the data covered by the rules after the given index in the given rules, and then remove them. 
void 
setData(Instances data)
Set the data of the stats, overwriting the old one if any 
void 
setMDLTheoryWeight(double weight)
Set the weight of theory in MDL calculation
void 
setNumAllConds(double total)
Set the number of all conditions that could appear in a rule in this RuleStats object. If the number set is smaller than 0 (typically -1), it is calculated from the stored data
void 
setRuleset(FastVector rules)
Set the ruleset of the stats, overwriting the old one if any 
static Instances 
stratify(Instances data,
int folds,
java.util.Random rand)
Stratify the given data into the given number of bags based on the class values. 
static double 
subsetDL(double t,
double k,
double p)
Subset description length: S(t,k,p) = -k*log2(p) - (t-k)*log2(1-p). For details, see Quinlan: "MDL and categorical theories (Continued)", ML95
double 
theoryDL(int index)
The description length of the theory for a given rule. 
Methods inherited from class java.lang.Object 

equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait 
Constructor Detail 

public RuleStats()
public RuleStats(Instances data, FastVector rules)
data - the data
rules - the ruleset
Method Detail 

public void cleanUp()
public void setNumAllConds(double total)
total - the set number
public void setData(Instances data)
data - the data to be set
public Instances getData()
public void setRuleset(FastVector rules)
rules - the set of rules to be set
public FastVector getRuleset()
public int getRulesetSize()
public double[] getSimpleStats(int index)
index - the index of the rule
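The array returned by getSimpleStats is indexed as 0: coverage, 1: uncoverage, 2: true positives, 3: true negatives, 4: false positives, 5: false negatives. The following is a minimal sketch of reading such an array through named constants. It is not part of RuleStats: the class SimpleStatsSketch, its constants, and the precision helper are hypothetical names, and the sample numbers are made up for illustration.

```java
public class SimpleStatsSketch {
    // Positions in the array returned by getSimpleStats(int index), as documented.
    public static final int COVERAGE = 0;
    public static final int UNCOVERAGE = 1;
    public static final int TRUE_POS = 2;
    public static final int TRUE_NEG = 3;
    public static final int FALSE_POS = 4;
    public static final int FALSE_NEG = 5;

    /** Hypothetical helper: TP / (TP + FP), or NaN when both counts are zero. */
    public static double precision(double[] stats) {
        double predictedPositive = stats[TRUE_POS] + stats[FALSE_POS];
        return predictedPositive > 0 ? stats[TRUE_POS] / predictedPositive : Double.NaN;
    }

    public static void main(String[] args) {
        // Made-up statistics for a single rule, in the documented order.
        double[] stats = {10, 90, 8, 85, 2, 5};
        System.out.println("precision = " + precision(stats));
    }
}
```

Named constants keep call sites readable and avoid confusing index 3 (true negatives) with index 4 (false positives).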
public Instances[] getFiltered(int index)
index - the index of the rule
public double[] getDistributions(int index)
index - the position index of the rule
public void setMDLTheoryWeight(double weight)
weight - the weight to be set
public static double numAllConditions(Instances data)
data - the given data
public void countData()
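countData() filters the data through the ruleset and records basic counts for each rule. The sketch below illustrates only the sequential filtering idea, assuming that each rule sees just the data left uncovered by the rules before it (consistent with the three-argument countData). Rules are modelled as java.util.function.Predicate, only coverage and uncoverage are counted, and every name here is hypothetical rather than taken from Weka.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class CountDataSketch {
    /**
     * For each rule, returns {coverage, uncoverage} measured on the data
     * that none of the earlier rules covered.
     */
    public static <T> double[][] countCoverage(List<T> data, List<Predicate<T>> rules) {
        double[][] stats = new double[rules.size()][2];
        List<T> remaining = new ArrayList<>(data);
        for (int i = 0; i < rules.size(); i++) {
            List<T> uncovered = new ArrayList<>();
            for (T item : remaining) {
                if (rules.get(i).test(item)) {
                    stats[i][0]++;        // covered by rule i
                } else {
                    stats[i][1]++;        // passed on to the next rule
                    uncovered.add(item);
                }
            }
            remaining = uncovered;
        }
        return stats;
    }
}
```

Passing only the uncovered items forward is what makes the per-rule counts depend on rule order.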
public void countData(int index, Instances uncovered, double[][] prevRuleStats)
index - the given position
uncovered - the data not covered by rules before index
prevRuleStats - the provided stats of previous rules
public void addAndUpdate(Rule lastRule)
lastRule - the rule to be added
public static double subsetDL(double t, double k, double p)
t - the number of elements in a known set
k - the number of elements in a subset
p - the expected proportion of subset known by recipient
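The subset description length S(t,k,p) = -k*log2(p) - (t-k)*log2(1-p) can be sketched directly from the three parameters above. This is an illustration, not Weka's source: the class name SubsetDlSketch is hypothetical, and skipping a term whose coefficient is zero (so that 0*log2(0) counts as 0) is an assumption of this sketch.

```java
public class SubsetDlSketch {
    private static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /**
     * S(t,k,p) = -k*log2(p) - (t-k)*log2(1-p), in bits.
     * t: elements in the known set, k: elements in the subset,
     * p: expected proportion of the subset.
     */
    public static double subsetDL(double t, double k, double p) {
        double bits = 0.0;
        if (k > 0) {
            bits -= k * log2(p);               // cost of the k subset members
        }
        if (t - k > 0) {
            bits -= (t - k) * log2(1.0 - p);   // cost of the t-k non-members
        }
        return bits;
    }

    public static void main(String[] args) {
        // 2 of 4 elements in the subset with p = 0.5: one bit per element.
        System.out.println(subsetDL(4, 2, 0.5));
    }
}
```

The result is smallest when p matches the actual proportion k/t, which is why a good estimate of p shortens the encoding.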
public double theoryDL(int index)
For details, see Quinlan: "MDL and categorical theories (Continued)", ML95
index - the index of the given rule (assuming correct)
public static double dataDL(double expFPOverErr, double cover, double uncover, double fp, double fn)
For details, see Quinlan: "MDL and categorical theories (Continued)", ML95
expFPOverErr - expected FP/(FP+FN)
cover - coverage
uncover - uncoverage
fp - False Positive
fn - False Negative
public double potential(int index, double expFPOverErr, double[] rulesetStat, double[] ruleStat, boolean checkErr)
This procedure is copied from the original RIPPER implementation and is quite bizarre: when testing each rule it no longer updates the stats of the following rules recursively, which means it assumes that after deletion no data is covered by the following rules (or it regards the deleted rule as the last rule). Is this a reasonable assumption?
index - the index of the rule in m_Ruleset to be deleted
expFPOverErr - expected FP/(FP+FN)
rulesetStat - the simple statistics of the ruleset, updated if the rule should be deleted
ruleStat - the simple statistics of the rule to be deleted
checkErr - whether to check if error rate >= 0.5
public double minDataDLIfDeleted(int index, double expFPRate, boolean checkErr)
index - the index of the rule in question
expFPRate - expected FP/(FP+FN), used in dataDL calculation
checkErr - whether to check if error rate >= 0.5
public double minDataDLIfExists(int index, double expFPRate, boolean checkErr)
index - the index of the rule in question
expFPRate - expected FP/(FP+FN), used in dataDL calculation
checkErr - whether to check if error rate >= 0.5
public double relativeDL(int index, double expFPRate, boolean checkErr)
index - the given position of the rule in question (assuming correct)
expFPRate - expected FP/(FP+FN), used in dataDL calculation
checkErr - whether to check if error rate >= 0.5
public void reduceDL(double expFPRate, boolean checkErr)
expFPRate - expected FP/(FP+FN), used in dataDL calculation
checkErr - whether to check if error rate >= 0.5
public void removeLast()
public static Instances rmCoveredBySuccessives(Instances data, FastVector rules, int index)
data - the data to be processed
rules - the ruleset
index - the given index
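As a rough illustration of what rmCoveredBySuccessives describes, the sketch below drops every item covered by a rule positioned after the given index. It assumes "after the given index" means positions index+1 onward, models rules as java.util.function.Predicate, and uses hypothetical names throughout; it does not reproduce Weka's implementation or its counting of covered data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class RemoveCoveredSketch {
    /** Keeps only the items that no rule at a position greater than index covers. */
    public static <T> List<T> removeCoveredBySuccessors(
            List<T> data, List<Predicate<T>> rules, int index) {
        List<T> kept = new ArrayList<>();
        for (T item : data) {
            boolean covered = false;
            for (int j = index + 1; j < rules.size(); j++) {
                if (rules.get(j).test(item)) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                kept.add(item);
            }
        }
        return kept;
    }
}
```

With index pointing at the last rule there are no successors, so the data come back unchanged.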
public static final Instances stratify(Instances data, int folds, java.util.Random rand)
Stratify the given data into the given number of bags based on the class values. It differs from Instances.stratify(int fold) in that before stratification it sorts the instances according to the class order in the header file. It assumes no missing values in the class.
data - the given data
folds - the given number of folds
rand - the random object used to randomize the instances
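One common way to stratify is to group items by class, shuffle within each group, and then take a strided pass so that consecutive chunks receive a similar class mix. The sketch below shows that approach on a generic list. It is not Weka's code: ascending integer class values stand in for "the class order in the header file", and StratifySketch and classOf are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;
import java.util.function.ToIntFunction;

public class StratifySketch {
    public static <T> List<T> stratify(
            List<T> data, ToIntFunction<T> classOf, int folds, Random rand) {
        // 1. One bag per class value; TreeMap keeps the bags in class order.
        Map<Integer, List<T>> bags = new TreeMap<>();
        for (T item : data) {
            bags.computeIfAbsent(classOf.applyAsInt(item), c -> new ArrayList<>()).add(item);
        }
        // 2. Randomize inside each bag, then concatenate the bags.
        List<T> sorted = new ArrayList<>(data.size());
        for (List<T> bag : bags.values()) {
            Collections.shuffle(bag, rand);
            sorted.addAll(bag);
        }
        // 3. Strided pass: every folds-th item lands in the same output chunk.
        List<T> result = new ArrayList<>(data.size());
        for (int start = 0; start < folds; start++) {
            for (int j = start; j < sorted.size(); j += folds) {
                result.add(sorted.get(j));
            }
        }
        return result;
    }
}
```

For example, with six items of class 0, three of class 1, and folds = 3, each consecutive block of three output items holds two items of class 0 and one of class 1.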
public double combinedDL(double expFPRate, double predicted)
expFPRate - expected FP/(FP+FN), used in dataDL calculation
predicted - the default classification if ruleset covers null
public static final Instances[] partition(Instances data, int numFolds)
data - the given data
numFolds - the given number of folds
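The split into (numFolds-1)/numFolds and 1/numFolds of the data can be sketched on a plain list as follows. This is a minimal illustration with hypothetical names; rounding the split point down with integer division is an assumption of the sketch, not a documented property of partition.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionSketch {
    /** Returns two lists: the first (numFolds-1)/numFolds of the data, then the rest. */
    public static <T> List<List<T>> partition(List<T> data, int numFolds) {
        // Integer division rounds the split point down (assumption of this sketch).
        int split = data.size() * (numFolds - 1) / numFolds;
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(data.subList(0, split)));
        parts.add(new ArrayList<>(data.subList(split, data.size())));
        return parts;
    }
}
```

With 10 items and numFolds = 5, the first part holds 8 items and the second holds 2.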
public java.lang.String getRevision()
Specified by: getRevision in interface RevisionHandler

