A Selected List of Machine Learning Jargon

[latexpage]

Chapter 1 Introduction

  • Data set
  • Instance
  • Attribute/Feature
  • Attribute Value
  • Attribute Space/Sample Space/Input Space
  • Dimensionality (of attribute space)
  • Learning/Training
    • Training Data
    • Training Set
    • Hypothesis
      • vs. Ground Truth
  • Prediction
  • Label
  • Label Space
  • Classification/Regression
    • differ in whether the predicted output is discrete or continuous
  • Testing Sample
  • Clustering
    • Unlabeled training data
  • Supervised/Unsupervised Learning
  • Generalization
  • Distribution
  • Concept Learning/Black-Box Model
  • Learning Process
    • Searching the hypothesis space for hypotheses that fit the training set
  • Version Space
    • The set of hypotheses that fit the training data (code sketch after this list)
  • Inductive Bias
    • Used to select a unique hypothesis from the version space
    • Occam’s Razor
    • Needs a priori information about the specific problem; often determines the effectiveness of the algorithm
  • No Free Lunch Theorem
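
To make the version space idea above concrete, here is a minimal Python sketch that enumerates a tiny hypothesis space (conjunctions over two made-up attributes, with "*" as a wildcard) and keeps only the hypotheses consistent with a toy training set. The attributes, values and the predict helper are illustrative assumptions, not anything from the book.

from itertools import product

# Toy training set: each instance is (color, root), label = "good melon?".
# The attributes and values here are hypothetical, just for illustration.
training_set = [
    (("green", "curly"), True),
    (("black", "curly"), True),
    (("green", "stiff"), False),
]

# Hypothesis space: conjunctions "color = a AND root = b",
# where '*' means "any value is acceptable".
colors = ["green", "black", "*"]
roots  = ["curly", "stiff", "*"]

def predict(hypothesis, instance):
    """A hypothesis predicts True iff every non-wildcard attribute matches."""
    return all(h == "*" or h == v for h, v in zip(hypothesis, instance))

# Version space = all hypotheses consistent with every training example.
version_space = [
    h for h in product(colors, roots)
    if all(predict(h, x) == y for x, y in training_set)
]

print(version_space)   # here only [('*', 'curly')] survives

An inductive bias is then whatever extra preference (e.g. Occam's Razor) the learner uses to pick one hypothesis out of this set.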

Chapter 2 Model Evaluation & Selection

  • Accuracy/Error rate
  • Training Error = Empirical Error
  • Generalization Error
  • Overfitting/Underfitting
    • Overfitting cannot be completely avoided, provided that P$\neq$NP
  • Testing Error
    • Used as an approximation of the generalization error
  • Method of Evaluation (code sketch after this list)
    • Hold-out
      • Stratified sampling
    • k-fold cross validation
    • Leave-one-out ($k=m$, the number of samples)
    • Bootstrapping
      • Bootstrap sampling
      • Estimation bias: the sampled training set no longer follows the distribution of the original dataset
  • Parameter Tuning
  • Validation Set
  • Performance Measure (of generalization ability; code sketches after this list)
    • Mean Squared Error $E(f;D)=\frac{1}{m}\sum^{m}_{i=1}(f(x_i)-y_i)^{2}$
    • Error rate $E(f;D)=\frac{1}{m}\sum^{m}_{i=1}I(f(x_i)\neq y_i)$
    • Accuracy $acc(f;D)=1-E(f;D)$
    • Precision $P=\frac{TruePositive}{TruePositive+FalsePositive}$
    • Recall $R=\frac{TruePositive}{TruePositive+FalseNegative}$
    • P-R graph
    • Break-Even Point $P=R$
    • F1 Measure $F1=\frac{2\times P\times R}{P+R}=\frac{2\times TruePos}{m+TruePos-TrueNeg}$, where $m$ is the total number of samples
    • $F_{\beta}$ measure, macro-{P, R, F1}, micro-{P, R, F1}, etc.
  • Threshold/Cut point
  • ROC: Receiver Operating Characteristic curve
    • $TruePosRate=\frac{TruePos}{TruePos+FalseNeg}$
    • $FalsePosRate=\frac{FalsePos}{TrueNeg+FalsePos}$
    • AUC = Area Under the ROC Curve (code sketch after this list)
  • Unequal cost, Cost matrix, Total cost
  • Cost-sensitive Error Rate (for binary classification; code sketch after this list)
    \begin{align*}
    E(f;D;cost)=\frac{1}{m}\Big(\sum_{x_i \in D^{+}} I(f(x_i)\neq y_i)\times cost_{01} \\
    +\sum_{x_i \in D^{-}} I(f(x_i)\neq y_i)\times cost_{10}\Big)
    \end{align*}
  • Cost Curve*
  • Hypothesis Test
    • statistical analysis of learner’s generalization ability
    • Skipped here; see pages 37-44 of the book
  • Bias-variance Decomposition
    • $E(f;D)=bias^2(x)+var(x)+\varepsilon^2$
    • see page 45; the definitions are spelled out after this list
    • bias-variance dilemma
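
The evaluation methods above (hold-out, k-fold cross validation, leave-one-out, bootstrapping), sketched with NumPy index manipulation only. The dataset size m = 100, the 70/30 hold-out split and k = 10 are arbitrary choices for illustration; no actual learner is trained here.

import numpy as np

rng = np.random.default_rng(0)
m = 100                                   # number of examples in dataset D
indices = rng.permutation(m)

# Hold-out: split D into disjoint training and test sets (e.g. 70/30).
# Stratified sampling would additionally keep the class ratio equal in both parts.
train_idx, test_idx = indices[:70], indices[70:]

# k-fold cross validation: k disjoint folds, each used once as the test set;
# the k test results are averaged.  Leave-one-out is the special case k = m.
k = 10
folds = np.array_split(rng.permutation(m), k)
for i in range(k):
    test_fold = folds[i]
    train_folds = np.concatenate([folds[j] for j in range(k) if j != i])
    # train on train_folds, evaluate on test_fold ...

# Bootstrapping: draw m examples *with replacement*; the examples never drawn
# (about 36.8% of D) serve as the test set.  Because of the resampling, the
# training set no longer follows the distribution of D (estimation bias).
boot_idx = rng.integers(0, m, size=m)
oob_idx = np.setdiff1d(np.arange(m), boot_idx)
print(f"out-of-bag fraction: {len(oob_idx) / m:.2f}")   # roughly 0.37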
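
Error rate, precision, recall and F1 from the list above, computed from raw confusion-matrix counts. The labels and predictions are hypothetical; 1 marks the positive class.

import numpy as np

def binary_metrics(y_true, y_pred):
    """Error rate, precision, recall and F1 from confusion-matrix counts (1 = positive)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))     # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))     # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))     # false negatives
    error = np.mean(y_pred != y_true)              # error rate; accuracy = 1 - error
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return error, precision, recall, f1

# Hypothetical predictions on six test samples.
e, p, r, f1 = binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
print(f"error={e:.3f}  P={p:.3f}  R={r:.3f}  F1={f1:.3f}")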
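
The ROC curve and AUC: rank the test samples by the learner's score, lower the cut point one sample at a time, and record (FPR, TPR) after each step; AUC is the area under the resulting curve. The scores and labels below are made up.

import numpy as np

def roc_auc(y_true, scores):
    """(FPR, TPR) points from sweeping the cut point over sorted scores, plus AUC."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(scores))        # indices by descending score
    pos, neg = np.sum(y_true == 1), np.sum(y_true == 0)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i in order:                                # admit one more sample as "positive"
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        tpr.append(tp / pos)
        fpr.append(fp / neg)
    # Trapezoidal rule over the (FPR, TPR) points.
    auc = sum((fpr[j + 1] - fpr[j]) * (tpr[j + 1] + tpr[j]) / 2
              for j in range(len(fpr) - 1))
    return fpr, tpr, auc

fpr, tpr, auc = roc_auc([1, 0, 1, 1, 0], [0.9, 0.8, 0.7, 0.3, 0.2])
print(f"AUC = {auc:.2f}")                          # 0.67 for this toy example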
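
The cost-sensitive error rate above, as a function of the two misclassification costs. The values cost01 = 1 and cost10 = 5 are hypothetical; in the book's notation, $cost_{01}$ weights errors on positive examples ($D^{+}$) and $cost_{10}$ weights errors on negative examples ($D^{-}$).

import numpy as np

def cost_sensitive_error(y_true, y_pred, cost01=1.0, cost10=5.0):
    """Cost-sensitive error rate for binary labels (1 = positive class)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    m = len(y_true)
    miss_pos = (y_true == 1) & (y_pred != y_true)   # errors on D+, weighted by cost_01
    miss_neg = (y_true == 0) & (y_pred != y_true)   # errors on D-, weighted by cost_10
    return (np.sum(miss_pos) * cost01 + np.sum(miss_neg) * cost10) / m

print(cost_sensitive_error([1, 1, 0, 0], [1, 0, 1, 0]))   # (1*1 + 1*5) / 4 = 1.5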
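
Finally, the bias-variance decomposition with its quantities written out, following the definitions the formula above relies on (page 45 of the book). Here $f(x;D)$ is the prediction of a model trained on $D$, $\bar{f}(x)=\mathbb{E}_D[f(x;D)]$ its average over training sets, $y$ the ground-truth label and $y_D$ the possibly noisy label observed in the data; the last equality assumes the noise has zero mean:

\begin{align*}
bias^2(x) &= (\bar{f}(x)-y)^2 \\
var(x) &= \mathbb{E}_D\big[(f(x;D)-\bar{f}(x))^2\big] \\
\varepsilon^2 &= \mathbb{E}_D\big[(y_D-y)^2\big] \\
E(f;D) &= \mathbb{E}_D\big[(f(x;D)-y_D)^2\big] = bias^2(x)+var(x)+\varepsilon^2
\end{align*}

The bias-variance dilemma: making the model more flexible lowers the bias term but raises the variance term, and vice versa.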

 

To be continued…?
