Meaning of alpha and beta parameters in function makeFeatSelControlSequential (MLR library in R)

























For deterministic forward or backward search, I'm used to giving thresholds on the p-values associated with the coefficients of individual features. In the documentation of makeFeatSelControlSequential in R/mlr (https://www.rdocumentation.org/packages/mlr/versions/2.13/topics/FeatSelControl), the alpha and beta parameters are described as follows:



  • alpha
    (numeric(1)): Parameter of the sequential feature selection. Minimal required value of improvement difference for a forward / adding step. Default is 0.01.

  • beta
    (numeric(1)): Parameter of the sequential feature selection. Minimal required value of improvement difference for a backward / removing step. Negative values imply that you allow a slight decrease for the removal of a feature. Default is -0.001.

It is, however, not clear what "improvement difference" means here. In the example below, I gave 0 as the threshold for backward selection (the beta parameter). If this parameter related to a threshold on p-values, I would expect to end up with a model without any features, but that is not the case: I get an AUC of 0.9886302 instead of 0.5.



# 1. Generate a synthetic dataset for supervised learning (two classes)
###################################################################

library(mlbench)

# generate 1000 rows with 21 quantitative candidate predictors and 1 target variable
p <- mlbench.waveform(1000)

# convert the list into a data frame
dataset <- as.data.frame(p)

# drop the third class to get 2 classes
dataset2 <- subset(dataset, classes != 3)
dataset2 <- droplevels(dataset2)


# 2. Perform cross-validation with embedded feature selection using logistic regression
##########################################################################################

library(BBmisc)
library(mlr)

# seed the L'Ecuyer RNG for reproducibility
set.seed(21, "L'Ecuyer")

# Choice of data
mCT <- makeClassifTask(data = dataset2, target = "classes")

# Choice of algorithm
mL <- makeLearner("classif.logreg", predict.type = "prob")

# Choice of cross-validation for the outer folds
outer <- makeResampleDesc("CV", iters = 10, stratify = TRUE)

# Choice of feature selection method: sequential backward search with beta = 0
ctrl <- makeFeatSelControlSequential(method = "sbs", maxit = NA, beta = 0)

# Choice of train/test split within each fold
inner <- makeResampleDesc("Holdout", stratify = TRUE)

lrn <- makeFeatSelWrapper(mL, resampling = inner, control = ctrl)
r <- resample(lrn, mCT, outer, extract = getFeatSelResult,
              measures = list(mlr::auc, mlr::acc, mlr::brier), models = TRUE)
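
For completeness, the selection outcome can be inspected afterwards (using mlr's accessors as I understand them: r$extract should hold one FeatSelResult per outer fold, and analyzeFeatSelResult should print the search path):

# aggregated performance over the outer folds (auc, acc, brier)
print(r$aggr)

# number of features kept in each outer fold; res$x holds the selected names
sapply(r$extract, function(res) length(res$x))

# trace the backward search of the first fold step by step
analyzeFeatSelResult(r$extract[[1]])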









r cross-validation feature-selection mlr

asked Nov 15 '18 at 15:20 by Chris

1 Answer

          The parameters control what difference in performance (for whatever performance measure you choose) is acceptable to proceed with a step along a forward or backward search. mlr doesn't compute any p-values, and no p-values are used in this process.



As the parameters only control what happens in a single step, they don't directly control the final outcome. What happens under the hood is that, e.g. for forward search, mlr computes the performance of every feature set that extends the current one by a single feature and chooses the best one, as long as it provides at least the improvement specified by alpha (or beta for backward steps). This procedure repeats until all features (forward search) or no features (backward search) are present, or until no step achieves the minimum improvement specified by the parameters.
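
To make "improvement difference" concrete, here is a minimal sketch of that backward-search loop; this is not mlr's actual code, and evaluate() is a hypothetical stand-in for the inner-resampling performance estimate of a feature set, with a measure to maximize (such as AUC) assumed.

# sketch of sequential backward search ("sbs"); evaluate() is a hypothetical
# stand-in for the inner-resampling performance estimate of a feature set
backward_search <- function(features, evaluate, beta = -0.001) {
  current_perf <- evaluate(features)
  while (length(features) > 0) {
    # score every feature set obtained by dropping one remaining feature
    candidate_perfs <- sapply(features, function(f) evaluate(setdiff(features, f)))
    best <- which.max(candidate_perfs)
    # the "improvement difference": best candidate minus current performance
    improvement <- candidate_perfs[[best]] - current_perf
    # take the removal step only if the improvement is at least beta;
    # beta < 0 tolerates a slight drop, beta = 0 requires no drop at all
    if (improvement < beta) break
    features <- setdiff(features, names(candidate_perfs)[best])
    current_perf <- candidate_perfs[[best]]
  }
  features
}

With beta = 0, a feature is removed whenever the best reduced set performs at least as well as the current one on the inner resampling, so the search typically stops well before the empty feature set; it never forces all features out, which is consistent with the AUC of 0.9886302 observed in the question.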






answered Nov 15 '18 at 15:54 by Lars Kotthoff

• In my example with multiple performance measures, AUC is used as the criterion for this process, as it is the first metric mentioned in the list. Right?
  – Chris, Nov 15 '18 at 16:13











• That is correct.
  – Lars Kotthoff, Nov 15 '18 at 16:23
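
To avoid relying on list order implicitly, the optimized measure can be passed to the wrapper directly; as far as I know, makeFeatSelWrapper accepts a measures argument analogous to the one of resample(), and the first measure listed is the one the search optimizes:

# make the optimized measure explicit instead of relying on list order
# (assumes makeFeatSelWrapper's measures argument, mirroring resample()'s)
lrn <- makeFeatSelWrapper(mL, resampling = inner, measures = list(mlr::auc), control = ctrl)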









