Python package training parameters

Several parameters have aliases. For example, the iterations parameter has the following synonyms: num_boost_round, n_estimators, num_trees. Simultaneous usage of different names of one parameter raises an error.

Training on GPU requires NVIDIA Driver of version 418.xx or higher.


Parameter	Type	Description	Default value	Supported processing units
Common parameters
loss_function Alias: objective	string object	The metric to use in training. The specified value also determines the machine learning problem to solve. Some metrics support optional parameters (see the Objectives and metrics section for details on each metric). Format: `<Metric>[:<parameter 1>=<value>;..;<parameter N>=<value>]` Supported metrics A custom python object can also be set as the value of this parameter (see an example). For example, use the following construction to calculate the value of Quantile with the coefficient : `Quantile:alpha=0.1`	Depends on the class	CPU and GPU
custom_metric	string list of strings	Metric values to output during training. These functions are not optimized and are displayed for informational purposes only. Some metrics support optional parameters (see the Objectives and metrics section for details on each metric).. Format: `<Metric>[:<parameter 1>=<value>;..;<parameter N>=<value>]` Supported metrics Examples: Calculate the value of CrossEntropy: `CrossEntropy` Calculate the value of в with the coefficient `Quantile:alpha=0.1` Calculate the values of Logloss and AUC: `['Logloss', 'AUC']` Values of all custom metrics for learn and validation datasets are saved to the Metric output files (learn_error.tsv and test_error.tsv respectively). The directory for these files is specified in the --train-dir (train_dir) parameter. Use the visualization tools to see a live chart with the dynamics of the specified metrics.	None	CPU and GPU
eval_metric	string object	The metric used for overfitting detection (if enabled) and best model selection (if enabled). Some metrics support optional parameters (see the Objectives and metrics section for details on each metric). Format: `<Metric>[:<parameter 1>=<value>;..;<parameter N>=<value>]` Supported metrics A user-defined function can also be set as the value (see an example). Examples: `R2`	Optimized objective is used	CPU and GPU
iterations Aliases: num_boost_round n_estimators num_trees	int	The maximum number of trees that can be built when solving machine learning problems. When using other parameters that limit the number of iterations, the final number of trees may be less than the number specified in this parameter.	1000	CPU and GPU
learning_rate Alias: eta	float	The learning rate. Used for reducing the gradient step.	The default value is defined automatically for Logloss, MultiClass & RMSE loss functions depending on the number of iterations if none of these parameters is set. In this case, the selected learning rate is printed to stdout and saved in the model. In other cases, the default value is 0.03.	CPU and GPU
random_seed Alias: random_state	int	The random seed used for training.	None (0)	CPU and GPU
l2_leaf_reg Alias: reg_lambda	float	Coefficient at the L2 regularization term of the cost function. Any positive value is allowed.	3.0	CPU and GPU
bootstrap_type	string	Bootstrap type. Defines the method for sampling the weights of objects. Supported methods: Bayesian Bernoulli MVS Poisson (supported for GPU only) No	The default value depends on the selected mode and processing unit type	CPU and GPU
bagging_temperature	float	Defines the settings of the Bayesian bootstrap. It is used by default in classification and regression modes. Use the Bayesian bootstrap to assign random weights to objects. The weights are sampled from exponential distribution if the value of this parameter is set to “1”. All weights are equal to 1 if the value of this parameter is set to “0”. Possible values are in the range . The higher the value the more aggressive the bagging is. This parameter can be used if the selected bootstrap type is Bayesian.	1	CPU and GPU
subsample	float	Sample rate for bagging. This parameter can be used if one of the following bootstrap types is selected: Poisson Bernoulli MVS	The default value depends on the dataset size and the bootstrap type: Datasets with less than 100 objects — 1 Datasets with 100 objects or more: Poisson, Bernoulli — 0.66 MVS — 0.8	CPU and GPU
sampling_frequency	string	Frequency to sample weights and objects when building trees. Supported values: PerTree — Before constructing each new tree PerTreeLevel — Before choosing each new split of a tree	PerTreeLevel	CPU
sampling_unit	String	The sampling scheme. Possible values: Object — The weight of the i-th object is used for sampling the corresponding object. Group — The weight of the group is used for sampling each object from the group .	Object	CPU and GPU
mvs_reg	float	Affects the weight of the denominator and can be used for balancing between the importance and Bernoulli sampling (setting it to 0 implies importance sampling and to - Bernoulli). Note. This parameter is supported only for the MVS sampling method (the bootstrap_type parameter must be set to MVS).	The value is set based on the gradient distribution on the current iteration	CPU
random_strength	float	The amount of randomness to use for scoring splits when the tree structure is selected. Use this parameter to avoid overfitting the model. The value of this parameter is used when selecting splits. On every iteration each possible split gets a score (for example, the score indicates how much adding this split will improve the loss function for the training dataset). The split with the highest score is selected. The scores have no randomness. A normally distributed random variable is added to the score of the feature. It has a zero mean and a variance that decreases during the training. The value of this parameter is the multiplier of the variance. Note. This parameter is not supported for the following loss functions: QueryCrossEntropy YetiRankPairwise PairLogitPairwise	1	CPU
use_best_model	bool	If this parameter is set, the number of trees that are saved in the resulting model is defined as follows: Build the number of trees defined by the training parameters. Use the validation dataset to identify the iteration with the optimal value of the metric specified in --eval-metric (eval_metric). No trees are saved after this iteration. This option requires a validation dataset to be provided.	True if a validation set is input (the eval_set parameter is defined) and at least one of the label values of objects in this set differs from the others. False otherwise.	CPU and GPU
best_model_min_trees	int	The minimal number of trees that the best model should have. If set, the output model contains at least the given number of trees even if the best model is located within these trees. Should be used with the use_best_model parameter.	None (The minimal number of trees for the best model is not set)	CPU and GPU
depth Alias: max_depth	int	Depth of the tree. The range of supported values depends on the processing unit type and the type of the selected loss function: CPU — Any integer up to 16. GPU — Any integer up to 8 pairwise modes (YetiRank, PairLogitPairwise and QueryCrossEntropy) and up to 16 for all other loss functions.	6 (16 if the growing policy is set to Lossguide)	CPU and GPU
grow_policy	string	The tree growing policy. Defines how to perform greedy tree construction. Possible values: SymmetricTree —A tree is built level by level until the specified depth is reached. On each iteration, all leaves from the last tree level are split with the same condition. The resulting tree structure is always symmetric. Depthwise — A tree is built level by level until the specified depth is reached. On each iteration, all non-terminal leaves from the last tree level are split. Each leaf is split by condition with the best loss improvement. Note. Models with this growing policy can not be analyzed using the PredictionDiff feature importance and can be exported only to json and cbm. Lossguide — A tree is built leaf by leaf until the specified maximum number of leaves is reached. On each iteration, non-terminal leaf with the best loss improvement is split. Note. Models with this growing policy can not be analyzed using the PredictionDiff feature importance and can be exported only to json and cbm.	SymmetricTree	CPU and GPU
min_data_in_leaf Alias: min_child_samples	int	The minimum number of training samples in a leaf. CatBoost does not search for new splits in leaves with samples count less than the specified value. Can be used only with the Lossguide and Depthwise growing policies.	1	CPU and GPU
max_leaves Alias: num_leaves	int	The maximum number of leafs in the resulting tree. Can be used only with the Lossguide growing policy. Tip. It is not recommended to use values greater than 64, since it can significantly slow down the training process.	31	CPU and GPU
ignored_features	list	Feature indices or names to exclude from the training. It is assumed that all passed values are feature names if at least one of the passed values can not be converted to a number or a range of numbers. Otherwise, it is assumed that all passed values are feature indices. Specifics: Non-negative indices that do not match any features are successfully ignored. For example, if five features are defined for the objects in the dataset and this parameter is set to “42”, the corresponding non-existing feature is successfully ignored. The identifier corresponds to the feature's index. Feature indices used in train and feature importance are numbered from 0 to featureCount – 1. If a file is used as input data then any non-feature column types are ignored when calculating these indices. For example, each row in the input file contains data in the following order: `cat feature<\t>label value<\t>num feature`. So for the row `rock<\t>0<\t>42`, the identifier for the “rock” feature is 0, and for the “42” feature it's 1. The addition of a non-existing feature name raises an error. For example, use the following construction if features indexed 1, 2, 7, 42, 43, 44, 45, should be ignored: `[1,2,7,42,43,44,45]`	None	CPU and GPU
one_hot_max_size	int	Use one-hot encoding for all categorical features with a number of different values less than or equal to the given parameter value. Ctrs are not calculated for such features. See details.	The default value depends on various conditions: N/A if training is performed on CPU in Pairwise scoring mode 255 if training is performed on GPU and the selected Ctr types require target data that is not available during the training 10 if training is performed in Ranking mode 2 if none of the conditions above is met	CPU and GPU
has_time	bool	Use the order of objects in the input data (do not perform random permutations during the Transforming categorical features to numerical features and Choosing the tree structure stages). The Timestamp column type is used to determine the order of objects if specified in the input data.	False (not used; generates random permutations)	CPU and GPU
rsm Alias: colsample_bylevel	float (0;1]	Random subspace method. The percentage of features to use at each split selection, when features are selected over again at random. The value must be in the range (0;1].	None (set to 1)	CPU
nan_mode	string	The method for processing missing values in the input dataset. Possible values: “Forbidden” — Missing values are not supported, their presence is interpreted as an error. “Min” — Missing values are processed as the minimum value (less than all other values) for the feature. It is guaranteed that a split that separates missing values from all other values is considered when selecting trees. “Max” — Missing values are processed as the maximum value (greater than all other values) for the feature. It is guaranteed that a split that separates missing values from all other values is considered when selecting trees. Using the Min or Max value of this parameter guarantees that a split between missing values and other values is considered when selecting a new split in the tree. Note. The method for processing missing values can be set individually for each feature in the Custom quantization borders and missing value modes input file. Such values override the ones specified in this parameter.	Min	CPU and GPU
input_borders	string	Load Custom quantization borders and missing value modes from a file (do not generate them). Borders are automatically generated before training if this parameter is not set.	None	CPU and GPU
output_borders	string	Save quantization borders for the current dataset to a file. Refer to the file format description.	None	CPU and GPU
fold_permutation_block	int	Objects in the dataset are grouped in blocks before the random permutations. This parameter defines the size of the blocks. The smaller is the value, the slower is the training. Large values may result in quality degradation.	1	CPU and GPU
leaf_estimation_method	string	The method used to calculate the values in leaves. Possible values: Newton Gradient Exact	Depends on the mode and the selected loss function: Regression with Quantile or MAE loss functions — One Exact iteration. Regression with any loss function but Quantile or MAE – One Gradient iteration. Classification mode – Ten Newton iterations. Multiclassification mode – One Newton iteration.	The Exact method is available only on CPU All other methods are available on both CPU and GPU
leaf_estimation_iterations	int	CatBoost might calculate leaf values using several gradient or newton steps instead of a single one. This parameter regulates how many steps are done in every tree when calculating leaf values.	None (Depends on the training objective)	CPU and GPU
leaf_estimation_backtracking	string	When the value of the leaf_estimation_iterations parameter is greater than 1, CatBoost makes several gradient or newton steps when calculating the resulting leaf values of a tree. The behaviour differs depending on the value of this parameter: No — Every next step is a regular gradient or newton step: the gradient step is calculated and added to the leaf. Any other value —Backtracking is used. In this case, before adding a step, a condition is checked. If the condition is not met, then the step size is reduced (divided by 2), otherwise the step is added to the leaf. When leaf_estimation_iterations is set to n, the leaf estimation iterations are calculated as follows: each iteration is either an addition of the next step to the leaf value, or it's a scaling of the leaf value. Scaling counts as a separate iteration. Thus, it is possible that instead of having n gradient steps, the algorithm makes a single gradient step that is reduced n times, which means that it is divided by times. Possible values: No — Do not use backtracking. Supported on CPU and GPU. AnyImprovement — Reduce the descent step up to the point when the loss function value is smaller than it was on the previous step. The trial reduction factors are 2, 4, 8, and so on. Supported on CPU and GPU. Armijo — Reduce the descent step until the Armijo condition is met. Supported only on GPU.	AnyImprovement	Depends on the selected value
fold_len_multiplier	float	Coefficient for changing the length of folds. The value must be greater than 1. The best validation result is achieved with minimum values. With values close to 1 (for example, ), each iteration takes a quadratic amount of memory and time for the number of objects in the iteration. Thus, low values are possible only when there is a small number of objects.	2	CPU and GPU
approx_on_full_history	bool	The principles for calculating the approximated values. Possible values: “False” — Use only а fraction of the fold for calculating the approximated values. The size of the fraction is calculated as follows: , where X is the specified coefficient for changing the length of folds. This mode is faster and in rare cases slightly less accurate “True” — Use all the preceding rows in the fold for calculating the approximated values. This mode is slower and in rare cases slightly more accurate.	False	CPU
class_weights	list dict collections.OrderedDict	Class weights. The values are used as multipliers for the object weights. This parameter can be used for solving binary classification and multiclassification problems. Tip. For imbalanced datasets with binary classification the weight multiplier can be set to 1 for class 0 and to for class 1. For example, `class_weights=[0.1, 4]` multiplies the weights of objects from class 0 by 0.1 and the weights of objects from class 1 by 4. If class labels are not standard consecutive integers [0, 1 ... class_count-1], use the dict or collections.OrderedDict type with label to weight mapping. For example, `class_weights={'a': 1.0, 'b': 0.5, 'c': 2.0}` multiplies the weights of objects with class label `a` by 1.0, the weights of objects with class label `b` by 0.5 and the weights of objects with class label `c` by 2.0. The dictionary form can also be used with standard consecutive integers class labels for additional readability. For example: `class_weights={0: 1.0, 1: 0.5, 2: 2.0}`. Note. Class labels are extracted from dictionary keys for the following types of class_weights: dict collections.OrderedDict (when the order of classes in the model is important) The class_names parameter can be skipped when using these types. Restriction. Do not use this parameter with auto_class_weights and scale_pos_weight.	None (the weight for all classes is set to 1)	CPU and GPU
class_names	list of strings	Classes names. Allows to redefine the default values when using the MultiClass and Logloss metrics. If the upper limit for the numeric class label is specified, the number of classes names should match this value. Attention. The quantity of classes names must match the quantity of classes weights specified in the --class-weights parameter and the number of classes specified in the --classes-count parameter. Format: `<name for class 1>,..,<name for class N>` For example: `smartphone,touchphone,tablet`	None	CPU and GPU
auto_class_weights	string	Automatically calculate class weights based either on the total weight or the total number of objects in each class. The values are used as multipliers for the object weights. Supported values: None — All class weights are set to 1 Balanced: SqrtBalanced: Restriction. Do not use this parameter with class_weights and scale_pos_weight.	None — All class weights are set to 1	CPU and GPU
scale_pos_weight	float	The weight for class 1 in binary classification. The value is used as a multiplier for the weights of objects from class 1. Tip. For imbalanced datasets, the weight multiplier can be set to Restriction. Do not use this parameter with auto_class_weights and class_weights.	1.0	CPU and GPU
boosting_type	string	Boosting scheme. Possible values: Ordered — Usually provides better quality on small datasets, but it may be slower than the Plain scheme. Plain — The classic gradient boosting scheme.	Depends on the processing unit type, the number of objects in the training dataset and the selected learning mode	CPU and GPU Only the Plain mode is supported for the MultiClass loss on GPU
boost_from_average	bool	Initialize approximate values by best constant value for the specified loss function. Sets the value of bias to the initial best constant value. Available for the following loss functions: RMSE Logloss CrossEntropy Quantile MAE MAPE	Depends on the selected loss function: True for RMSE, Quantile, MAE, MAPE False for all other loss functions	CPU and GPU
langevin	bool	Enables the Stochastic Gradient Langevin Boosting mode. Refer to the SGLB: Stochastic Gradient Langevin Boosting paper for details.	False	CPU
diffusion_temperature	float	The diffusion temperature of the Stochastic Gradient Langevin Boosting mode. Only non-negative values are supported.	10000	CPU
posterior_sampling	bool	If this parameter is set several options are specified as follows and model parameters are checked to obtain uncertainty predictions with good theoretical properties. Specifies options: `Langevin`: true, `DiffusionTemperature`: objects in learn pool count, `ModelShrinkRate`: 1 / (2. * objects in learn pool count)	False	CPU only
allow_const_label	bool	Use it to train models with datasets that have equal label values for all objects.	False	CPU and GPU
score_function	string	The score type used to select the next split during the tree construction. Possible values: Cosine (do not use this score type with the Lossguide tree growing policy) L2 NewtonCosine (do not use this score type with the Lossguide tree growing policy) NewtonL2	Cosine	The supported score functions vary depending on the processing unit type: GPU — All score types CPU — Cosine, L2
monotone_constraints	list of strings string dict list	Impose monotonic constraints on numerical features. Possible values: “1” — Increasing constraint on the feature. The algorithm forces the model to be a non-decreasing function of this features. “-1” — Decreasing constraint on the feature. The algorithm forces the model to be a non-increasing function of this features. “0” — constraints are disabled. Supported formats for setting the value of this parameter (all feature indices are zero-based): Set constraints individually for each feature as a string (the number of features is n). Format Zero constraints for features at the end of the list may be dropped. In this example an increasing constraint is set on the first feature and a decreasing one on the third. Constraints are disabled for all other features. Set constraints individually for each explicitly specified feature as a string (the number of features is n). Format These examples are identical, given that the name of the feature index 2 is “Feature2” and the name of the feature indexed 4 is “Feature4”. Set constraints individually for each required feature as an array or a dictionary (the number of features is n). Format Array examples. These dictionary examples are identical, given that the name of the feature indexed 2 is “Feature2” and the name of the feature indexed 4 is “Feature4”.	None	CPU
feature_weights	list numpy.ndarray string dict	Per-feature multiplication weights used when choosing the best split. The score of each candidate is multiplied by the weights of features from the current split. Non-negative float values are supported for each weight. Supported formats for setting the value of this parameter: Set the multiplication weight for each feature as a string (the number of features is n). Format Values should be passed as a parenthesized string of comma-separated values. Multiplication weights equal to 1 at the end of the list may be dropped. In this example the multiplication weight is set to 0.1, 1 and 3 for the first, second and third features respectively. The multiplication weight for all other features is set to 1. Set the multiplication weight individually for each explicitly specified feature as a string (the number of features is n). Format These examples are identical, given that the name of the feature indexed 2 is “Feature2” and the name of the feature indexed 4 is “Feature4”. Set the multiplication weight individually for each required feature as an array or a dictionary (the number of features is n). Format Array examples. These dictionary examples are identical, given that the name of the feature indexed 2 is “Feature2” and the name of the feature indexed 4 is “Feature4”.	1 for all features	CPU
first_feature_use_penalties	list numpy.ndarray string dict	Per-feature penalties for the first occurrence of the feature in the model. The given value is subtracted from the score if the current candidate is the first one to include the feature in the model. Refer to the Per-object and per-feature penalties section for details on applying different score penalties. Non-negative float values are supported for each penalty. Set the penalty for each feature as a string (the number of features is n). Format Values should be passed as a parenthesized string of comma-separated values. Penalties equal to 0 at the end of the list may be dropped. In this example the multiplication weight is set to 0.1, 1 and 3 for the first, second and third features respectively. The multiplication weight for all other features is set to 1. Set the penalty individually for each explicitly specified feature as a string (the number of features is n). Format These examples are identical, given that the name of the feature indexed 2 is “Feature2” and the name of the feature indexed 4 is “Feature4”. Set the penalty individually for each required feature as an array or a dictionary (the number of features is n). Format Array examples. These dictionary examples are identical, given that the name of the feature indexed 2 is “Feature2” and the name of the feature indexed 4 is “Feature4”.	0 for all features	CPU
penalties_coefficient	float	A single-value common coefficient to multiply all penalties. Non-negative values are supported.	1	CPU
per_object_feature_penalties	list numpy.ndarray string dict	Per-object penalties for the first use of the feature for the object. The given value is multiplied by the number of objects that are divided by the current split and use the feature for the first time. Refer to the Per-object and per-feature penalties section for details on applying different score penalties. Non-negative float values are supported for each penalty. Set the penalty for each feature as a string (the number of features is n). Format Values should be passed as a parenthesized string of comma-separated values. Penalties equal to 0 at the end of the list may be dropped. In this example the multiplication weight is set to 0.1, 1 and 3 for the first, second and third features respectively. The multiplication weight for all other features is set to 1. Set the penalty individually for each explicitly specified feature as a string (the number of features is n). Format These examples are identical, given that the name of the feature indexed 2 is “Feature2” and the name of the feature indexed 4 is “Feature4”. Set the penalty individually for each required feature as an array or a dictionary (the number of features is n). Format Array examples. These dictionary examples are identical, given that the name of the feature indexed 2 is “Feature2” and the name of the feature indexed 4 is “Feature4”.	0 for all objects	CPU
model_shrink_rate	float	The constant used to calculate the coefficient for multiplying the model on each iteration. The actual model shrinkage coefficient calculated at each iteration depends on the value of the model_shrink_mode parameter. The resulting value of the coefficient should be always in the range (0, 1].	The default value depends on the values of the following parameters: model_shrink_mode monotone_constraints	CPU
model_shrink_mode	string	Determines how the actual model shrinkage coefficient is calculated at each iteration. Possible values: Constant: is the value of the model_shrink_rate parameter. is the value of the learning_rate parameter Decreasing: is the value of the model_shrink_rate parameter. is the identifier of the iteration.	Constant	CPU
Text processing parameters
tokenizers	list of json	Tokenizers used to preprocess Text type feature columns before creating the dictionary. Format: `[{ 'TokenizerId1': <value>, 'option_name_1': <value>, .. 'option_name_N': <value>,}]` TokenizerId — The unique name of the tokenizer. option_name — One of the supported tokenizer options. Note. This parameter works with dictionaries and feature_calcers parameters. For example, if a single tokenizer, three dictionaries and two feature calcers are given, a total of 6 new groups of features are created for each original text feature (). Usage example.	–	GPU
dictionaries	list of json	Dictionaries used to preprocess Text type feature columns. Format: `[{ 'dictionaryId1': <value>, 'option_name_1': <value>, .. 'option_name_N': <value>,}]` DictionaryId — The unique name of dictionary. option_name — One of the supported dictionary options. Note. This parameter works with tokenizers and feature_calcers parameters. For example, if a single tokenizer, three dictionaries and two feature calcers are given, a total of 6 new groups of features are created for each original text feature (). Usage example.	–	GPU
feature_calcers	list of strings	Feature calcers used to calculate new features based on preprocessed Text type feature columns. Format: `['FeatureCalcerName[:option_name=option_value], ]` FeatureCalcerName — The required feature calcer. option_name — Additional options for feature calcers. Refer to the list of supported calcers for details on options available for each of them. Note. This parameter works with tokenizers and dictionaries parameters. For example, if a single tokenizer, three dictionaries and two feature calcers are given, a total of 6 new groups of features are created for each original text feature (). Usage example.	–	GPU
text_processing	json	A JSON specification of tokenizers, dictionaries and feature calcers, which determine how text features are converted into a list of float features. Example Refer to the description of the following parameters for details on supported values: tokenizers dictionaries feature_calcers Restriction. Do not use this parameter with the following ones: tokenizers dictionaries feature_calcers	Default value	GPU
Overfitting detection settings
early_stopping_rounds	int	Sets the overfitting detector type to Iter and stops the training after the specified number of iterations since the iteration with the optimal metric value.	False	CPU and GPU
od_type	string	The type of the overfitting detector to use. Possible values: IncToDec Iter	IncToDec	CPU and GPU
od_pval	float	The threshold for the IncToDec overfitting detector type. The training is stopped when the specified value is reached. Requires that a validation dataset was input. For best results, it is recommended to set a value in the range . The larger the value, the earlier overfitting is detected. Restriction. Do not use this parameter with the Iter overfitting detector type.	0 (the overfitting detection is turned off)	CPU and GPU
od_wait	int	The number of iterations to continue the training after the iteration with the optimal metric value. The purpose of this parameter differs depending on the selected overfitting detector type: IncToDec — Ignore the overfitting detector when the threshold is reached and continue learning for the specified number of iterations after the iteration with the optimal metric value. Iter — Consider the model overfitted and stop training after the specified number of iterations since the iteration with the optimal metric value.	20	CPU and GPU
Quantization settings
target_border	float	If set, defines the border for converting target values to 0 and 1. Depending on the specified value: the target is converted to 0 the target is converted to 1	None	CPU and GPU
border_count Alias: max_bin	int	The number of splits for numerical features. Allowed values are integers from 1 to 65535 inclusively.	The default value depends on the processing unit type and other parameters: CPU: 254 GPU in PairLogitPairwise and YetiRankPairwise modes: 32 GPU in all other modes: 128	CPU and GPU
feature_border_type	string	The quantization mode for numerical features. Possible values: Median Uniform UniformAndQuantiles MaxLogSum MinEntropy GreedyLogSum	GreedyLogSum	CPU and GPU
per_float_feature_quantization	string list of strings	The quantization description for the specified feature or list of features. Description format for a single feature: `FeatureId[:border_count=BorderCount][:nan_mode=BorderType][:border_type=border_selection_method]` Examples: `per_float_feature_quantization='0:border_count=1024'` In this example, the feature indexed 0 has 1024 borders. `per_float_feature_quantization=['0:border_count=1024', '1:border_count=1024']` In this example, features indexed 0 and 1 have 1024 borders.	None	CPU and GPU
Multiclassification settings
classes_count	int	The upper limit for the numeric class label. Defines the number of classes for multiclassification. Only non-negative integers can be specified. The given integer should be greater than any of the label values. If this parameter is specified the labels for all classes in the input dataset should be smaller than the given value	None. Calculation principles	CPU and GPU
Performance settings
thread_count	int	The number of threads to use during the training. For CPU Optimizes the speed of execution. This parameter doesn't affect results. For GPU The given value is used for reading the data from the hard drive and does not affect the training. During the training one main thread and one thread for each GPU are used.	-1 (the number of threads is equal to the number of processor cores)	CPU and GPU
used_ram_limit	int	Attempt to limit the amount of used CPU RAM. Restriction. This option affects only the CTR calculation memory usage. In some cases it is impossible to limit the amount of CPU RAM used in accordance with the specified value. Format: `<size><measure of information>` Supported measures of information (non case-sensitive): MB KB GB For example: `2gb`	None (memory usage is no limited)	CPU
gpu_ram_part	float	How much of the GPU RAM to use for training.	0.95	GPU
pinned_memory_size	int	How much pinned (page-locked) CPU RAM to use per GPU. The value should be a positive integer or `inf`. Measure of information can be defined for integer values. Format: `<size><measure of information>` Supported measures of information (non case-sensitive): MB KB GB For example: `2gb`	1073741824	GPU
gpu_cat_features_storage	string	The method for storing the categorical features' values. Possible values: CpuPinnedMemory GpuRam Tip. Use the CpuPinnedMemory value if feature combinations are used and the available GPU RAM is not sufficient.	None (set to GpuRam)	GPU
data_partition	string	The method for splitting the input dataset between multiple workers. Possible values: FeatureParallel — Split the input dataset by features and calculate the value of each of these features on a certain GPU. For example: GPU0 is used to calculate the values of features indexed 0, 1, 2 GPU1 is used to calculate the values of features indexed 3, 4, 5, etc. DocParallel — Split the input dataset by objects and calculate all features for each of these objects on a certain GPU. It is recommended to use powers of two as the value for optimal performance. For example: GPU0 is used to calculate all features for objects indexed `object_1`, `object_2` GPU1 is used to calculate all features for objects indexed `object_3`, `object_4`, etc.	Depends on the learning mode and the input dataset	GPU
Processing unit settings
task_type	string	The processing unit type to use for training. Possible values: CPU GPU	CPU	CPU and GPU
devices	string	IDs of the GPU devices to use for training (indices are zero-based). Format `<unit ID>` for one device (for example, `3`) `<unit ID1>:<unit ID2>:..:<unit IDN>` for multiple devices (for example, `devices='0:1:3'`) `<unit ID1>-<unit IDN>` for a range of devices (for example, `devices='0-3'`)	NULL (all GPU devices are used if the corresponding processing unit type is selected)	GPU
Visualization settings
name	string	The experiment name to display in visualization tools.	experiment	CPU and GPU
Output settings
logging_level	string	The logging level to output to stdout. Possible values: Silent — Do not output any logging information to stdout. Verbose — Output the following data to stdout: optimized metric elapsed time of training remaining time of training Info — Output additional information and the number of trees. Debug — Output debugging information.	None (corresponds to the Verbose logging level)	CPU and GPU
metric_period	int	The frequency of iterations to calculate the values of objectives and metrics. The value should be a positive integer. The usage of this parameter speeds up the training. Note. It is recommended to increase the value of this parameter to maintain training speed if a GPU processing unit type is used.	1	CPU and GPU
verbose Alias: verbose_eval	bool int	The purpose of this parameter depends on the type of the given value: bool — Defines the logging level: “True” corresponds to the Verbose logging level “False” corresponds to the Silent logging level int — Use the Verbose logging level and set the logging period to the value of this parameter. Restriction. Do not use this parameter with the logging_level parameter.	1	CPU and GPU
train_final_model	bool	If specified, then the model with selected features will be trained after features selection.	True	CPU and GPU
train_dir	string	The directory for storing the files generated during training.	catboost_info	CPU and GPU
model_size_reg	float	The model size regularization coefficient. The larger the value, the smaller the model size. Refer to the Model size regularization coefficient section for details. Possible values are in the range . This regularization is needed only for models with categorical features (other models are small). Models with categorical features might weight tens of gigabytes or more if categorical features have a lot of values. If the value of the regularizer differs from zero, then the usage of categorical features or feature combinations with a lot of values has a penalty, so less of them are used in the resulting model. Note that the resulting quality of the model can be affected. Set the value to 0 to turn off the model size optimization option.	None (Turned on and set to 0.5)	CPU and GPU
allow_writing_files	bool	Allow to write analytical and snapshot files during training. If set to “False”, the snapshot and data visualization tools are unavailable.	True	CPU and GPU
save_snapshot	bool	Enable snapshotting for restoring the training progress after an interruption. If enabled, the default period for making snapshots is 600 seconds. Use the snapshot_interval parameter to change this period. Note. This parameter is not supported in the params parameter of the cv function.	None	CPU and GPU
snapshot_file	string	The name of the file to save the training progress information in. This file is used for recovering training after an interruption. Depending on whether the specified file exists in the file system: Missing — Write information about training progress to the specified file. Exists — Load data from the specified file and continue training from where it left off. Note. This parameter is not supported in the params parameter of the cv function.	experiment...	CPU and GPU
snapshot_interval	int	The interval between saving snapshots in seconds. The first snapshot is taken after the specified number of seconds since the start of training. Every subsequent snapshot is taken after the specified number of seconds since the previous one. The last snapshot is taken at the end of the training. Note. This parameter is not supported in the params parameter of the cv function.	600	CPU and GPU
roc_file	string	The name of the output file to save the ROC curve points to. This parameter can only be set in cross-validation mode if the Logloss loss function is selected. The ROC curve points are calculated for the test fold. The output file is saved to the catboost_info directory.	None (the file is not saved)	CPU and GPU
CTR settings
simple_ctr	string	Quantization settings for simple categorical features. Use this parameter to specify the principles for defining the class of the object for regression tasks. By default, it is considered that an object belongs to the positive class if its' label value is greater than the median of all label values of the dataset. Format: `['CtrType[:TargetBorderCount=BorderCount][:TargetBorderType=BorderType][:CtrBorderCount=Count][:CtrBorderType=Type][:Prior=num_1/denum_1]..[:Prior=num_N/denum_N]', 'CtrType[:TargetBorderCount=BorderCount][:TargetBorderType=BorderType][:CtrBorderCount=Count][:CtrBorderType=Type][:Prior=num_1/denum_1]..[:Prior=num_N/denum_N]', ...]` Components: `CtrType` — The method for transforming categorical features to numerical features. Supported methods for training on CPU: Borders Buckets BinarizedTargetMeanValue Counter Supported methods for training on GPU: Borders Buckets FeatureFreq FloatTargetMeanValue `TargetBorderCount` — The number of borders for label value quantization. Only used for regression problems. Allowed values are integers from 1 to 255 inclusively. The default value is 1. This option is available for training on CPU only. `TargetBorderType` — The quantization type for the label value. Only used for regression problems. Possible values: Median Uniform UniformAndQuantiles MaxLogSum MinEntropy GreedyLogSum By default, MinEntropy. This option is available for training on CPU only. `CtrBorderCount` — The number of splits for categorical features. Allowed values are integers from 1 to 255 inclusively. `CtrBorderType` — The quantization type for categorical features. Supported values for training on CPU: Uniform Supported values for training on GPU: Median Uniform UniformAndQuantiles MaxLogSum MinEntropy GreedyLogSum `Prior` — Use the specified priors during training (several values can be specified). Possible formats: One number — Adds the value to the numerator. Two slash-delimited numbers (for GPU only) — Use this format to set a fraction. The number is added to the numerator and the second is added to the denominator. Examples		CPU and GPU
combinations_ctr	string	Quantization settings for combinations of categorical features. `['CtrType[:TargetBorderCount=BorderCount][:TargetBorderType=BorderType][:CtrBorderCount=Count][:CtrBorderType=Type][:Prior=num_1/denum_1]..[:Prior=num_N/denum_N]', 'CtrType[:TargetBorderCount=BorderCount][:TargetBorderType=BorderType][:CtrBorderCount=Count][:CtrBorderType=Type][:Prior=num_1/denum_1]..[:Prior=num_N/denum_N]', ...]` Components: `CtrType` — The method for transforming categorical features to numerical features. Supported methods for training on CPU: Borders Buckets BinarizedTargetMeanValue Counter Supported methods for training on GPU: Borders Buckets FeatureFreq FloatTargetMeanValue `TargetBorderCount` — The number of borders for label value quantization. Only used for regression problems. Allowed values are integers from 1 to 255 inclusively. The default value is 1. This option is available for training on CPU only. `TargetBorderType` — The quantization type for the label value. Only used for regression problems. Possible values: Median Uniform UniformAndQuantiles MaxLogSum MinEntropy GreedyLogSum By default, MinEntropy. This option is available for training on CPU only. `CtrBorderCount` — The number of splits for categorical features. Allowed values are integers from 1 to 255 inclusively. `CtrBorderType` — The quantization type for categorical features. Supported values for training on CPU: Uniform Supported values for training on GPU: Uniform Median `Prior` — Use the specified priors during training (several values can be specified). Possible formats: One number — Adds the value to the numerator. Two slash-delimited numbers (for GPU only) — Use this format to set a fraction. The number is added to the numerator and the second is added to the denominator.		CPU and GPU
per_feature_ctr	string	Per-feature quantization settings for categorical features. `['FeatureId:CtrType:[:TargetBorderCount=BorderCount][:TargetBorderType=BorderType][:CtrBorderCount=Count][:CtrBorderType=Type][:Prior=num_1/denum_1]..[:Prior=num_N/denum_N]', 'FeatureId:CtrType:[:TargetBorderCount=BorderCount][:TargetBorderType=BorderType][:CtrBorderCount=Count][:CtrBorderType=Type][:Prior=num_1/denum_1]..[:Prior=num_N/denum_N]', ...]` Components: `FeatureId` — A zero-based feature identifier. `CtrType` — The method for transforming categorical features to numerical features. Supported methods for training on CPU: Borders Buckets BinarizedTargetMeanValue Counter Supported methods for training on GPU: Borders Buckets FeatureFreq FloatTargetMeanValue `TargetBorderCount` — The number of borders for label value quantization. Only used for regression problems. Allowed values are integers from 1 to 255 inclusively. The default value is 1. This option is available for training on CPU only. `TargetBorderType` — The quantization type for the label value. Only used for regression problems. Possible values: Median Uniform UniformAndQuantiles MaxLogSum MinEntropy GreedyLogSum By default, MinEntropy. This option is available for training on CPU only. `CtrBorderCount` — The number of splits for categorical features. Allowed values are integers from 1 to 255 inclusively. `CtrBorderType` — The quantization type for categorical features. Supported values for training on CPU: Uniform Supported values for training on GPU: Median Uniform UniformAndQuantiles MaxLogSum MinEntropy GreedyLogSum `Prior` — Use the specified priors during training (several values can be specified). Possible formats: One number — Adds the value to the numerator. Two slash-delimited numbers (for GPU only) — Use this format to set a fraction. The number is added to the numerator and the second is added to the denominator.		CPU and GPU
ctr_target_border_count	int	The maximum number of borders to use in target quantization for categorical features that need it. Allowed values are integers from 1 to 255 inclusively. The value of the `TargetBorderCount` component overrides this parameter if it is specified for one of the following parameters: simple_ctr combinations_ctr per_feature_ctr	Number_of_classes - 1 for Multiclassification problems when training on CPU, 1 otherwise	CPU and GPU
counter_calc_method	string	The method for calculating the Counter CTR type. Possible values: SkipTest — Objects from the validation dataset are not considered at all Full — All objects from both learn and validation datasets are considered	None (Full is used)	CPU and GPU
max_ctr_complexity	int	The maximum number of features that can be combined. Each resulting combination consists of one or more categorical features and can optionally contain binary features in the following form: “numeric feature > value”.	The default value depends on the processing unit type, combined features' type and the selected mode: GPU for categorical features in MultiClass and MultiClassOneVsAll modes: 1 In all other cases: 4	CPU and GPU
ctr_leaf_count_limit	int	The maximum number of leaves with categorical features. If the quantity exceeds the specified value a part of leaves is discarded. The leaves to be discarded are selected as follows: The leaves are sorted by the frequency of the values. The top N leaves are selected, where N is the value specified in the parameter. All leaves starting from N+1 are discarded. This option reduces the resulting model size and the amount of memory required for training. Note that the resulting quality of the model can be affected.	None The number of different category values is not limited	CPU
store_all_simple_ctr	bool	Ignore categorical features, which are not used in feature combinations, when choosing candidates for exclusion. There is no point in using this parameter without the ctr_leaf_count_limit parameter.	None (set to False) Both simple features and feature combinations are taken in account when limiting the number of leafs with categorical features	CPU
final_ctr_computation_mode	string	Final CTR computation mode. Possible values: Default — Compute final CTRs for learn and validation datasets. Skip — Do not compute final CTRs for learn and validation datasets. In this case, the resulting model can not be applied. This mode decreases the size of the resulting model. It can be useful for research purposes when only the metric values have to be calculated.	Default	CPU and GPU


Parameter	Type	Description	Default value	Supported processing units
Common parameters
loss_function Alias: objective	string object	The metric to use in training. The specified value also determines the machine learning problem to solve. Some metrics support optional parameters (see the Objectives and metrics section for details on each metric). Format: `<Metric>[:<parameter 1>=<value>;..;<parameter N>=<value>]` Supported metrics A custom python object can also be set as the value of this parameter (see an example). For example, use the following construction to calculate the value of Quantile with the coefficient : `Quantile:alpha=0.1`	Depends on the class	CPU and GPU
custom_metric	string list of strings	Metric values to output during training. These functions are not optimized and are displayed for informational purposes only. Some metrics support optional parameters (see the Objectives and metrics section for details on each metric).. Format: `<Metric>[:<parameter 1>=<value>;..;<parameter N>=<value>]` Supported metrics Examples: Calculate the value of CrossEntropy: `CrossEntropy` Calculate the value of в with the coefficient `Quantile:alpha=0.1` Calculate the values of Logloss and AUC: `['Logloss', 'AUC']` Values of all custom metrics for learn and validation datasets are saved to the Metric output files (learn_error.tsv and test_error.tsv respectively). The directory for these files is specified in the --train-dir (train_dir) parameter. Use the visualization tools to see a live chart with the dynamics of the specified metrics.	None	CPU and GPU
eval_metric	string object	The metric used for overfitting detection (if enabled) and best model selection (if enabled). Some metrics support optional parameters (see the Objectives and metrics section for details on each metric). Format: `<Metric>[:<parameter 1>=<value>;..;<parameter N>=<value>]` Supported metrics A user-defined function can also be set as the value (see an example). Examples: `R2`	Optimized objective is used	CPU and GPU
iterations Aliases: num_boost_round n_estimators num_trees	int	The maximum number of trees that can be built when solving machine learning problems. When using other parameters that limit the number of iterations, the final number of trees may be less than the number specified in this parameter.	1000	CPU and GPU
learning_rate Alias: eta	float	The learning rate. Used for reducing the gradient step.	The default value is defined automatically for Logloss, MultiClass & RMSE loss functions depending on the number of iterations if none of these parameters is set. In this case, the selected learning rate is printed to stdout and saved in the model. In other cases, the default value is 0.03.	CPU and GPU
random_seed Alias: random_state	int	The random seed used for training.	None (0)	CPU and GPU
l2_leaf_reg Alias: reg_lambda	float	Coefficient at the L2 regularization term of the cost function. Any positive value is allowed.	3.0	CPU and GPU
bootstrap_type	string	Bootstrap type. Defines the method for sampling the weights of objects. Supported methods: Bayesian Bernoulli MVS Poisson (supported for GPU only) No	The default value depends on the selected mode and processing unit type	CPU and GPU
bagging_temperature	float	Defines the settings of the Bayesian bootstrap. It is used by default in classification and regression modes. Use the Bayesian bootstrap to assign random weights to objects. The weights are sampled from exponential distribution if the value of this parameter is set to “1”. All weights are equal to 1 if the value of this parameter is set to “0”. Possible values are in the range . The higher the value the more aggressive the bagging is. This parameter can be used if the selected bootstrap type is Bayesian.	1	CPU and GPU
subsample	float	Sample rate for bagging. This parameter can be used if one of the following bootstrap types is selected: Poisson Bernoulli MVS	The default value depends on the dataset size and the bootstrap type: Datasets with less than 100 objects — 1 Datasets with 100 objects or more: Poisson, Bernoulli — 0.66 MVS — 0.8	CPU and GPU
sampling_frequency	string	Frequency to sample weights and objects when building trees. Supported values: PerTree — Before constructing each new tree PerTreeLevel — Before choosing each new split of a tree	PerTreeLevel	CPU
sampling_unit	String	The sampling scheme. Possible values: Object — The weight of the i-th object is used for sampling the corresponding object. Group — The weight of the group is used for sampling each object from the group .	Object	CPU and GPU
mvs_reg	float	Affects the weight of the denominator and can be used for balancing between the importance and Bernoulli sampling (setting it to 0 implies importance sampling and to - Bernoulli). Note. This parameter is supported only for the MVS sampling method (the bootstrap_type parameter must be set to MVS).	The value is set based on the gradient distribution on the current iteration	CPU
random_strength	float	The amount of randomness to use for scoring splits when the tree structure is selected. Use this parameter to avoid overfitting the model. The value of this parameter is used when selecting splits. On every iteration each possible split gets a score (for example, the score indicates how much adding this split will improve the loss function for the training dataset). The split with the highest score is selected. The scores have no randomness. A normally distributed random variable is added to the score of the feature. It has a zero mean and a variance that decreases during the training. The value of this parameter is the multiplier of the variance. Note. This parameter is not supported for the following loss functions: QueryCrossEntropy YetiRankPairwise PairLogitPairwise	1	CPU
use_best_model	bool	If this parameter is set, the number of trees that are saved in the resulting model is defined as follows: Build the number of trees defined by the training parameters. Use the validation dataset to identify the iteration with the optimal value of the metric specified in --eval-metric (eval_metric). No trees are saved after this iteration. This option requires a validation dataset to be provided.	True if a validation set is input (the eval_set parameter is defined) and at least one of the label values of objects in this set differs from the others. False otherwise.	CPU and GPU
best_model_min_trees	int	The minimal number of trees that the best model should have. If set, the output model contains at least the given number of trees even if the best model is located within these trees. Should be used with the use_best_model parameter.	None (The minimal number of trees for the best model is not set)	CPU and GPU
depth Alias: max_depth	int	Depth of the tree. The range of supported values depends on the processing unit type and the type of the selected loss function: CPU — Any integer up to 16. GPU — Any integer up to 8 pairwise modes (YetiRank, PairLogitPairwise and QueryCrossEntropy) and up to 16 for all other loss functions.	6 (16 if the growing policy is set to Lossguide)	CPU and GPU
grow_policy	string	The tree growing policy. Defines how to perform greedy tree construction. Possible values: SymmetricTree —A tree is built level by level until the specified depth is reached. On each iteration, all leaves from the last tree level are split with the same condition. The resulting tree structure is always symmetric. Depthwise — A tree is built level by level until the specified depth is reached. On each iteration, all non-terminal leaves from the last tree level are split. Each leaf is split by condition with the best loss improvement. Note. Models with this growing policy can not be analyzed using the PredictionDiff feature importance and can be exported only to json and cbm. Lossguide — A tree is built leaf by leaf until the specified maximum number of leaves is reached. On each iteration, non-terminal leaf with the best loss improvement is split. Note. Models with this growing policy can not be analyzed using the PredictionDiff feature importance and can be exported only to json and cbm.	SymmetricTree	CPU and GPU
min_data_in_leaf Alias: min_child_samples	int	The minimum number of training samples in a leaf. CatBoost does not search for new splits in leaves with samples count less than the specified value. Can be used only with the Lossguide and Depthwise growing policies.	1	CPU and GPU
max_leaves Alias: num_leaves	int	The maximum number of leafs in the resulting tree. Can be used only with the Lossguide growing policy. Tip. It is not recommended to use values greater than 64, since it can significantly slow down the training process.	31	CPU and GPU
ignored_features	list	Feature indices or names to exclude from the training. It is assumed that all passed values are feature names if at least one of the passed values can not be converted to a number or a range of numbers. Otherwise, it is assumed that all passed values are feature indices. Specifics: Non-negative indices that do not match any features are successfully ignored. For example, if five features are defined for the objects in the dataset and this parameter is set to “42”, the corresponding non-existing feature is successfully ignored. The identifier corresponds to the feature's index. Feature indices used in train and feature importance are numbered from 0 to featureCount – 1. If a file is used as input data then any non-feature column types are ignored when calculating these indices. For example, each row in the input file contains data in the following order: `cat feature<\t>label value<\t>num feature`. So for the row `rock<\t>0<\t>42`, the identifier for the “rock” feature is 0, and for the “42” feature it's 1. The addition of a non-existing feature name raises an error. For example, use the following construction if features indexed 1, 2, 7, 42, 43, 44, 45, should be ignored: `[1,2,7,42,43,44,45]`	None	CPU and GPU
one_hot_max_size	int	Use one-hot encoding for all categorical features with a number of different values less than or equal to the given parameter value. Ctrs are not calculated for such features. See details.	The default value depends on various conditions: N/A if training is performed on CPU in Pairwise scoring mode 255 if training is performed on GPU and the selected Ctr types require target data that is not available during the training 10 if training is performed in Ranking mode 2 if none of the conditions above is met	CPU and GPU
has_time	bool	Use the order of objects in the input data (do not perform random permutations during the Transforming categorical features to numerical features and Choosing the tree structure stages). The Timestamp column type is used to determine the order of objects if specified in the input data.	False (not used; generates random permutations)	CPU and GPU
rsm Alias: colsample_bylevel	float (0;1]	Random subspace method. The percentage of features to use at each split selection, when features are selected over again at random. The value must be in the range (0;1].	None (set to 1)	CPU
nan_mode	string	The method for processing missing values in the input dataset. Possible values: “Forbidden” — Missing values are not supported, their presence is interpreted as an error. “Min” — Missing values are processed as the minimum value (less than all other values) for the feature. It is guaranteed that a split that separates missing values from all other values is considered when selecting trees. “Max” — Missing values are processed as the maximum value (greater than all other values) for the feature. It is guaranteed that a split that separates missing values from all other values is considered when selecting trees. Using the Min or Max value of this parameter guarantees that a split between missing values and other values is considered when selecting a new split in the tree. Note. The method for processing missing values can be set individually for each feature in the Custom quantization borders and missing value modes input file. Such values override the ones specified in this parameter.	Min	CPU and GPU
input_borders	string	Load Custom quantization borders and missing value modes from a file (do not generate them). Borders are automatically generated before training if this parameter is not set.	None	CPU and GPU
output_borders	string	Save quantization borders for the current dataset to a file. Refer to the file format description.	None	CPU and GPU
fold_permutation_block	int	Objects in the dataset are grouped in blocks before the random permutations. This parameter defines the size of the blocks. The smaller is the value, the slower is the training. Large values may result in quality degradation.	1	CPU and GPU
leaf_estimation_method	string	The method used to calculate the values in leaves. Possible values: Newton Gradient Exact	Depends on the mode and the selected loss function: Regression with Quantile or MAE loss functions — One Exact iteration. Regression with any loss function but Quantile or MAE – One Gradient iteration. Classification mode – Ten Newton iterations. Multiclassification mode – One Newton iteration.	The Exact method is available only on CPU All other methods are available on both CPU and GPU
leaf_estimation_iterations	int	CatBoost might calculate leaf values using several gradient or newton steps instead of a single one. This parameter regulates how many steps are done in every tree when calculating leaf values.	None (Depends on the training objective)	CPU and GPU
leaf_estimation_backtracking	string	When the value of the leaf_estimation_iterations parameter is greater than 1, CatBoost makes several gradient or newton steps when calculating the resulting leaf values of a tree. The behaviour differs depending on the value of this parameter: No — Every next step is a regular gradient or newton step: the gradient step is calculated and added to the leaf. Any other value —Backtracking is used. In this case, before adding a step, a condition is checked. If the condition is not met, then the step size is reduced (divided by 2), otherwise the step is added to the leaf. When leaf_estimation_iterations is set to n, the leaf estimation iterations are calculated as follows: each iteration is either an addition of the next step to the leaf value, or it's a scaling of the leaf value. Scaling counts as a separate iteration. Thus, it is possible that instead of having n gradient steps, the algorithm makes a single gradient step that is reduced n times, which means that it is divided by times. Possible values: No — Do not use backtracking. Supported on CPU and GPU. AnyImprovement — Reduce the descent step up to the point when the loss function value is smaller than it was on the previous step. The trial reduction factors are 2, 4, 8, and so on. Supported on CPU and GPU. Armijo — Reduce the descent step until the Armijo condition is met. Supported only on GPU.	AnyImprovement	Depends on the selected value
fold_len_multiplier	float	Coefficient for changing the length of folds. The value must be greater than 1. The best validation result is achieved with minimum values. With values close to 1 (for example, ), each iteration takes a quadratic amount of memory and time for the number of objects in the iteration. Thus, low values are possible only when there is a small number of objects.	2	CPU and GPU
approx_on_full_history	bool	The principles for calculating the approximated values. Possible values: “False” — Use only а fraction of the fold for calculating the approximated values. The size of the fraction is calculated as follows: , where X is the specified coefficient for changing the length of folds. This mode is faster and in rare cases slightly less accurate “True” — Use all the preceding rows in the fold for calculating the approximated values. This mode is slower and in rare cases slightly more accurate.	False	CPU
class_weights	list dict collections.OrderedDict	Class weights. The values are used as multipliers for the object weights. This parameter can be used for solving binary classification and multiclassification problems. Tip. For imbalanced datasets with binary classification the weight multiplier can be set to 1 for class 0 and to for class 1. For example, `class_weights=[0.1, 4]` multiplies the weights of objects from class 0 by 0.1 and the weights of objects from class 1 by 4. If class labels are not standard consecutive integers [0, 1 ... class_count-1], use the dict or collections.OrderedDict type with label to weight mapping. For example, `class_weights={'a': 1.0, 'b': 0.5, 'c': 2.0}` multiplies the weights of objects with class label `a` by 1.0, the weights of objects with class label `b` by 0.5 and the weights of objects with class label `c` by 2.0. The dictionary form can also be used with standard consecutive integers class labels for additional readability. For example: `class_weights={0: 1.0, 1: 0.5, 2: 2.0}`. Note. Class labels are extracted from dictionary keys for the following types of class_weights: dict collections.OrderedDict (when the order of classes in the model is important) The class_names parameter can be skipped when using these types. Restriction. Do not use this parameter with auto_class_weights and scale_pos_weight.	None (the weight for all classes is set to 1)	CPU and GPU
class_names	list of strings	Classes names. Allows to redefine the default values when using the MultiClass and Logloss metrics. If the upper limit for the numeric class label is specified, the number of classes names should match this value. Attention. The quantity of classes names must match the quantity of classes weights specified in the --class-weights parameter and the number of classes specified in the --classes-count parameter. Format: `<name for class 1>,..,<name for class N>` For example: `smartphone,touchphone,tablet`	None	CPU and GPU
auto_class_weights	string	Automatically calculate class weights based either on the total weight or the total number of objects in each class. The values are used as multipliers for the object weights. Supported values: None — All class weights are set to 1 Balanced: SqrtBalanced: Restriction. Do not use this parameter with class_weights and scale_pos_weight.	None — All class weights are set to 1	CPU and GPU
scale_pos_weight	float	The weight for class 1 in binary classification. The value is used as a multiplier for the weights of objects from class 1. Tip. For imbalanced datasets, the weight multiplier can be set to Restriction. Do not use this parameter with auto_class_weights and class_weights.	1.0	CPU and GPU
boosting_type	string	Boosting scheme. Possible values: Ordered — Usually provides better quality on small datasets, but it may be slower than the Plain scheme. Plain — The classic gradient boosting scheme.	Depends on the processing unit type, the number of objects in the training dataset and the selected learning mode	CPU and GPU Only the Plain mode is supported for the MultiClass loss on GPU
boost_from_average	bool	Initialize approximate values by best constant value for the specified loss function. Sets the value of bias to the initial best constant value. Available for the following loss functions: RMSE Logloss CrossEntropy Quantile MAE MAPE	Depends on the selected loss function: True for RMSE, Quantile, MAE, MAPE False for all other loss functions	CPU and GPU
langevin	bool	Enables the Stochastic Gradient Langevin Boosting mode. Refer to the SGLB: Stochastic Gradient Langevin Boosting paper for details.	False	CPU
diffusion_temperature	float	The diffusion temperature of the Stochastic Gradient Langevin Boosting mode. Only non-negative values are supported.	10000	CPU
posterior_sampling	bool	If this parameter is set several options are specified as follows and model parameters are checked to obtain uncertainty predictions with good theoretical properties. Specifies options: `Langevin`: true, `DiffusionTemperature`: objects in learn pool count, `ModelShrinkRate`: 1 / (2. * objects in learn pool count)	False	CPU only
allow_const_label	bool	Use it to train models with datasets that have equal label values for all objects.	False	CPU and GPU
score_function	string	The score type used to select the next split during the tree construction. Possible values: Cosine (do not use this score type with the Lossguide tree growing policy) L2 NewtonCosine (do not use this score type with the Lossguide tree growing policy) NewtonL2	Cosine	The supported score functions vary depending on the processing unit type: GPU — All score types CPU — Cosine, L2
monotone_constraints	list of strings string dict list	Impose monotonic constraints on numerical features. Possible values: “1” — Increasing constraint on the feature. The algorithm forces the model to be a non-decreasing function of this features. “-1” — Decreasing constraint on the feature. The algorithm forces the model to be a non-increasing function of this features. “0” — constraints are disabled. Supported formats for setting the value of this parameter (all feature indices are zero-based): Set constraints individually for each feature as a string (the number of features is n). Format Zero constraints for features at the end of the list may be dropped. In this example an increasing constraint is set on the first feature and a decreasing one on the third. Constraints are disabled for all other features. Set constraints individually for each explicitly specified feature as a string (the number of features is n). Format These examples are identical, given that the name of the feature index 2 is “Feature2” and the name of the feature indexed 4 is “Feature4”. Set constraints individually for each required feature as an array or a dictionary (the number of features is n). Format Array examples. These dictionary examples are identical, given that the name of the feature indexed 2 is “Feature2” and the name of the feature indexed 4 is “Feature4”.	None	CPU
feature_weights	list numpy.ndarray string dict	Per-feature multiplication weights used when choosing the best split. The score of each candidate is multiplied by the weights of features from the current split. Non-negative float values are supported for each weight. Supported formats for setting the value of this parameter: Set the multiplication weight for each feature as a string (the number of features is n). Format Values should be passed as a parenthesized string of comma-separated values. Multiplication weights equal to 1 at the end of the list may be dropped. In this example the multiplication weight is set to 0.1, 1 and 3 for the first, second and third features respectively. The multiplication weight for all other features is set to 1. Set the multiplication weight individually for each explicitly specified feature as a string (the number of features is n). Format These examples are identical, given that the name of the feature indexed 2 is “Feature2” and the name of the feature indexed 4 is “Feature4”. Set the multiplication weight individually for each required feature as an array or a dictionary (the number of features is n). Format Array examples. These dictionary examples are identical, given that the name of the feature indexed 2 is “Feature2” and the name of the feature indexed 4 is “Feature4”.	1 for all features	CPU
first_feature_use_penalties	list numpy.ndarray string dict	Per-feature penalties for the first occurrence of the feature in the model. The given value is subtracted from the score if the current candidate is the first one to include the feature in the model. Refer to the Per-object and per-feature penalties section for details on applying different score penalties. Non-negative float values are supported for each penalty. Set the penalty for each feature as a string (the number of features is n). Format Values should be passed as a parenthesized string of comma-separated values. Penalties equal to 0 at the end of the list may be dropped. In this example the multiplication weight is set to 0.1, 1 and 3 for the first, second and third features respectively. The multiplication weight for all other features is set to 1. Set the penalty individually for each explicitly specified feature as a string (the number of features is n). Format These examples are identical, given that the name of the feature indexed 2 is “Feature2” and the name of the feature indexed 4 is “Feature4”. Set the penalty individually for each required feature as an array or a dictionary (the number of features is n). Format Array examples. These dictionary examples are identical, given that the name of the feature indexed 2 is “Feature2” and the name of the feature indexed 4 is “Feature4”.	0 for all features	CPU
penalties_coefficient	float	A single-value common coefficient to multiply all penalties. Non-negative values are supported.	1	CPU
per_object_feature_penalties	list numpy.ndarray string dict	Per-object penalties for the first use of the feature for the object. The given value is multiplied by the number of objects that are divided by the current split and use the feature for the first time. Refer to the Per-object and per-feature penalties section for details on applying different score penalties. Non-negative float values are supported for each penalty. Set the penalty for each feature as a string (the number of features is n). Format Values should be passed as a parenthesized string of comma-separated values. Penalties equal to 0 at the end of the list may be dropped. In this example the multiplication weight is set to 0.1, 1 and 3 for the first, second and third features respectively. The multiplication weight for all other features is set to 1. Set the penalty individually for each explicitly specified feature as a string (the number of features is n). Format These examples are identical, given that the name of the feature indexed 2 is “Feature2” and the name of the feature indexed 4 is “Feature4”. Set the penalty individually for each required feature as an array or a dictionary (the number of features is n). Format Array examples. These dictionary examples are identical, given that the name of the feature indexed 2 is “Feature2” and the name of the feature indexed 4 is “Feature4”.	0 for all objects	CPU
model_shrink_rate	float	The constant used to calculate the coefficient for multiplying the model on each iteration. The actual model shrinkage coefficient calculated at each iteration depends on the value of the model_shrink_mode parameter. The resulting value of the coefficient should be always in the range (0, 1].	The default value depends on the values of the following parameters: model_shrink_mode monotone_constraints	CPU
model_shrink_mode	string	Determines how the actual model shrinkage coefficient is calculated at each iteration. Possible values: Constant: is the value of the model_shrink_rate parameter. is the value of the learning_rate parameter Decreasing: is the value of the model_shrink_rate parameter. is the identifier of the iteration.	Constant	CPU
Text processing parameters
tokenizers	list of json	Tokenizers used to preprocess Text type feature columns before creating the dictionary. Format: `[{ 'TokenizerId1': <value>, 'option_name_1': <value>, .. 'option_name_N': <value>,}]` TokenizerId — The unique name of the tokenizer. option_name — One of the supported tokenizer options. Note. This parameter works with dictionaries and feature_calcers parameters. For example, if a single tokenizer, three dictionaries and two feature calcers are given, a total of 6 new groups of features are created for each original text feature (). Usage example.	–	GPU
dictionaries	list of json	Dictionaries used to preprocess Text type feature columns. Format: `[{ 'dictionaryId1': <value>, 'option_name_1': <value>, .. 'option_name_N': <value>,}]` DictionaryId — The unique name of dictionary. option_name — One of the supported dictionary options. Note. This parameter works with tokenizers and feature_calcers parameters. For example, if a single tokenizer, three dictionaries and two feature calcers are given, a total of 6 new groups of features are created for each original text feature (). Usage example.	–	GPU
feature_calcers	list of strings	Feature calcers used to calculate new features based on preprocessed Text type feature columns. Format: `['FeatureCalcerName[:option_name=option_value], ]` FeatureCalcerName — The required feature calcer. option_name — Additional options for feature calcers. Refer to the list of supported calcers for details on options available for each of them. Note. This parameter works with tokenizers and dictionaries parameters. For example, if a single tokenizer, three dictionaries and two feature calcers are given, a total of 6 new groups of features are created for each original text feature (). Usage example.	–	GPU
text_processing	json	A JSON specification of tokenizers, dictionaries and feature calcers, which determine how text features are converted into a list of float features. Example Refer to the description of the following parameters for details on supported values: tokenizers dictionaries feature_calcers Restriction. Do not use this parameter with the following ones: tokenizers dictionaries feature_calcers	Default value	GPU
Overfitting detection settings
early_stopping_rounds	int	Sets the overfitting detector type to Iter and stops the training after the specified number of iterations since the iteration with the optimal metric value.	False	CPU and GPU
od_type	string	The type of the overfitting detector to use. Possible values: IncToDec Iter	IncToDec	CPU and GPU
od_pval	float	The threshold for the IncToDec overfitting detector type. The training is stopped when the specified value is reached. Requires that a validation dataset was input. For best results, it is recommended to set a value in the range . The larger the value, the earlier overfitting is detected. Restriction. Do not use this parameter with the Iter overfitting detector type.	0 (the overfitting detection is turned off)	CPU and GPU
od_wait	int	The number of iterations to continue the training after the iteration with the optimal metric value. The purpose of this parameter differs depending on the selected overfitting detector type: IncToDec — Ignore the overfitting detector when the threshold is reached and continue learning for the specified number of iterations after the iteration with the optimal metric value. Iter — Consider the model overfitted and stop training after the specified number of iterations since the iteration with the optimal metric value.	20	CPU and GPU
Quantization settings
target_border	float	If set, defines the border for converting target values to 0 and 1. Depending on the specified value: the target is converted to 0 the target is converted to 1	None	CPU and GPU
border_count Alias: max_bin	int	The number of splits for numerical features. Allowed values are integers from 1 to 65535 inclusively.	The default value depends on the processing unit type and other parameters: CPU: 254 GPU in PairLogitPairwise and YetiRankPairwise modes: 32 GPU in all other modes: 128	CPU and GPU
feature_border_type	string	The quantization mode for numerical features. Possible values: Median Uniform UniformAndQuantiles MaxLogSum MinEntropy GreedyLogSum	GreedyLogSum	CPU and GPU
per_float_feature_quantization	string list of strings	The quantization description for the specified feature or list of features. Description format for a single feature: `FeatureId[:border_count=BorderCount][:nan_mode=BorderType][:border_type=border_selection_method]` Examples: `per_float_feature_quantization='0:border_count=1024'` In this example, the feature indexed 0 has 1024 borders. `per_float_feature_quantization=['0:border_count=1024', '1:border_count=1024']` In this example, features indexed 0 and 1 have 1024 borders.	None	CPU and GPU
Multiclassification settings
classes_count	int	The upper limit for the numeric class label. Defines the number of classes for multiclassification. Only non-negative integers can be specified. The given integer should be greater than any of the label values. If this parameter is specified the labels for all classes in the input dataset should be smaller than the given value	None. Calculation principles	CPU and GPU
Performance settings
thread_count	int	The number of threads to use during the training. For CPU Optimizes the speed of execution. This parameter doesn't affect results. For GPU The given value is used for reading the data from the hard drive and does not affect the training. During the training one main thread and one thread for each GPU are used.	-1 (the number of threads is equal to the number of processor cores)	CPU and GPU
used_ram_limit	int	Attempt to limit the amount of used CPU RAM. Restriction. This option affects only the CTR calculation memory usage. In some cases it is impossible to limit the amount of CPU RAM used in accordance with the specified value. Format: `<size><measure of information>` Supported measures of information (non case-sensitive): MB KB GB For example: `2gb`	None (memory usage is no limited)	CPU
gpu_ram_part	float	How much of the GPU RAM to use for training.	0.95	GPU
pinned_memory_size	int	How much pinned (page-locked) CPU RAM to use per GPU. The value should be a positive integer or `inf`. Measure of information can be defined for integer values. Format: `<size><measure of information>` Supported measures of information (non case-sensitive): MB KB GB For example: `2gb`	1073741824	GPU
gpu_cat_features_storage	string	The method for storing the categorical features' values. Possible values: CpuPinnedMemory GpuRam Tip. Use the CpuPinnedMemory value if feature combinations are used and the available GPU RAM is not sufficient.	None (set to GpuRam)	GPU
data_partition	string	The method for splitting the input dataset between multiple workers. Possible values: FeatureParallel — Split the input dataset by features and calculate the value of each of these features on a certain GPU. For example: GPU0 is used to calculate the values of features indexed 0, 1, 2 GPU1 is used to calculate the values of features indexed 3, 4, 5, etc. DocParallel — Split the input dataset by objects and calculate all features for each of these objects on a certain GPU. It is recommended to use powers of two as the value for optimal performance. For example: GPU0 is used to calculate all features for objects indexed `object_1`, `object_2` GPU1 is used to calculate all features for objects indexed `object_3`, `object_4`, etc.	Depends on the learning mode and the input dataset	GPU
Processing unit settings
task_type	string	The processing unit type to use for training. Possible values: CPU GPU	CPU	CPU and GPU
devices	string	IDs of the GPU devices to use for training (indices are zero-based). Format `<unit ID>` for one device (for example, `3`) `<unit ID1>:<unit ID2>:..:<unit IDN>` for multiple devices (for example, `devices='0:1:3'`) `<unit ID1>-<unit IDN>` for a range of devices (for example, `devices='0-3'`)	NULL (all GPU devices are used if the corresponding processing unit type is selected)	GPU
Visualization settings
name	string	The experiment name to display in visualization tools.	experiment	CPU and GPU
Output settings
logging_level	string	The logging level to output to stdout. Possible values: Silent — Do not output any logging information to stdout. Verbose — Output the following data to stdout: optimized metric elapsed time of training remaining time of training Info — Output additional information and the number of trees. Debug — Output debugging information.	None (corresponds to the Verbose logging level)	CPU and GPU
metric_period	int	The frequency of iterations to calculate the values of objectives and metrics. The value should be a positive integer. The usage of this parameter speeds up the training. Note. It is recommended to increase the value of this parameter to maintain training speed if a GPU processing unit type is used.	1	CPU and GPU
verbose Alias: verbose_eval	bool int	The purpose of this parameter depends on the type of the given value: bool — Defines the logging level: “True” corresponds to the Verbose logging level “False” corresponds to the Silent logging level int — Use the Verbose logging level and set the logging period to the value of this parameter. Restriction. Do not use this parameter with the logging_level parameter.	1	CPU and GPU
train_final_model	bool	If specified, then the model with selected features will be trained after features selection.	True	CPU and GPU
train_dir	string	The directory for storing the files generated during training.	catboost_info	CPU and GPU
model_size_reg	float	The model size regularization coefficient. The larger the value, the smaller the model size. Refer to the Model size regularization coefficient section for details. Possible values are in the range . This regularization is needed only for models with categorical features (other models are small). Models with categorical features might weight tens of gigabytes or more if categorical features have a lot of values. If the value of the regularizer differs from zero, then the usage of categorical features or feature combinations with a lot of values has a penalty, so less of them are used in the resulting model. Note that the resulting quality of the model can be affected. Set the value to 0 to turn off the model size optimization option.	None (Turned on and set to 0.5)	CPU and GPU
allow_writing_files	bool	Allow to write analytical and snapshot files during training. If set to “False”, the snapshot and data visualization tools are unavailable.	True	CPU and GPU
save_snapshot	bool	Enable snapshotting for restoring the training progress after an interruption. If enabled, the default period for making snapshots is 600 seconds. Use the snapshot_interval parameter to change this period. Note. This parameter is not supported in the params parameter of the cv function.	None	CPU and GPU
snapshot_file	string	The name of the file to save the training progress information in. This file is used for recovering training after an interruption. Depending on whether the specified file exists in the file system: Missing — Write information about training progress to the specified file. Exists — Load data from the specified file and continue training from where it left off. Note. This parameter is not supported in the params parameter of the cv function.	experiment...	CPU and GPU
snapshot_interval	int	The interval between saving snapshots in seconds. The first snapshot is taken after the specified number of seconds since the start of training. Every subsequent snapshot is taken after the specified number of seconds since the previous one. The last snapshot is taken at the end of the training. Note. This parameter is not supported in the params parameter of the cv function.	600	CPU and GPU
roc_file	string	The name of the output file to save the ROC curve points to. This parameter can only be set in cross-validation mode if the Logloss loss function is selected. The ROC curve points are calculated for the test fold. The output file is saved to the catboost_info directory.	None (the file is not saved)	CPU and GPU
CTR settings
simple_ctr	string	Quantization settings for simple categorical features. Use this parameter to specify the principles for defining the class of the object for regression tasks. By default, it is considered that an object belongs to the positive class if its' label value is greater than the median of all label values of the dataset. Format: `['CtrType[:TargetBorderCount=BorderCount][:TargetBorderType=BorderType][:CtrBorderCount=Count][:CtrBorderType=Type][:Prior=num_1/denum_1]..[:Prior=num_N/denum_N]', 'CtrType[:TargetBorderCount=BorderCount][:TargetBorderType=BorderType][:CtrBorderCount=Count][:CtrBorderType=Type][:Prior=num_1/denum_1]..[:Prior=num_N/denum_N]', ...]` Components: `CtrType` — The method for transforming categorical features to numerical features. Supported methods for training on CPU: Borders Buckets BinarizedTargetMeanValue Counter Supported methods for training on GPU: Borders Buckets FeatureFreq FloatTargetMeanValue `TargetBorderCount` — The number of borders for label value quantization. Only used for regression problems. Allowed values are integers from 1 to 255 inclusively. The default value is 1. This option is available for training on CPU only. `TargetBorderType` — The quantization type for the label value. Only used for regression problems. Possible values: Median Uniform UniformAndQuantiles MaxLogSum MinEntropy GreedyLogSum By default, MinEntropy. This option is available for training on CPU only. `CtrBorderCount` — The number of splits for categorical features. Allowed values are integers from 1 to 255 inclusively. `CtrBorderType` — The quantization type for categorical features. Supported values for training on CPU: Uniform Supported values for training on GPU: Median Uniform UniformAndQuantiles MaxLogSum MinEntropy GreedyLogSum `Prior` — Use the specified priors during training (several values can be specified). Possible formats: One number — Adds the value to the numerator. Two slash-delimited numbers (for GPU only) — Use this format to set a fraction. The number is added to the numerator and the second is added to the denominator. Examples		CPU and GPU
combinations_ctr	string	Quantization settings for combinations of categorical features. `['CtrType[:TargetBorderCount=BorderCount][:TargetBorderType=BorderType][:CtrBorderCount=Count][:CtrBorderType=Type][:Prior=num_1/denum_1]..[:Prior=num_N/denum_N]', 'CtrType[:TargetBorderCount=BorderCount][:TargetBorderType=BorderType][:CtrBorderCount=Count][:CtrBorderType=Type][:Prior=num_1/denum_1]..[:Prior=num_N/denum_N]', ...]` Components: `CtrType` — The method for transforming categorical features to numerical features. Supported methods for training on CPU: Borders Buckets BinarizedTargetMeanValue Counter Supported methods for training on GPU: Borders Buckets FeatureFreq FloatTargetMeanValue `TargetBorderCount` — The number of borders for label value quantization. Only used for regression problems. Allowed values are integers from 1 to 255 inclusively. The default value is 1. This option is available for training on CPU only. `TargetBorderType` — The quantization type for the label value. Only used for regression problems. Possible values: Median Uniform UniformAndQuantiles MaxLogSum MinEntropy GreedyLogSum By default, MinEntropy. This option is available for training on CPU only. `CtrBorderCount` — The number of splits for categorical features. Allowed values are integers from 1 to 255 inclusively. `CtrBorderType` — The quantization type for categorical features. Supported values for training on CPU: Uniform Supported values for training on GPU: Uniform Median `Prior` — Use the specified priors during training (several values can be specified). Possible formats: One number — Adds the value to the numerator. Two slash-delimited numbers (for GPU only) — Use this format to set a fraction. The number is added to the numerator and the second is added to the denominator.		CPU and GPU
per_feature_ctr	string	Per-feature quantization settings for categorical features. `['FeatureId:CtrType:[:TargetBorderCount=BorderCount][:TargetBorderType=BorderType][:CtrBorderCount=Count][:CtrBorderType=Type][:Prior=num_1/denum_1]..[:Prior=num_N/denum_N]', 'FeatureId:CtrType:[:TargetBorderCount=BorderCount][:TargetBorderType=BorderType][:CtrBorderCount=Count][:CtrBorderType=Type][:Prior=num_1/denum_1]..[:Prior=num_N/denum_N]', ...]` Components: `FeatureId` — A zero-based feature identifier. `CtrType` — The method for transforming categorical features to numerical features. Supported methods for training on CPU: Borders Buckets BinarizedTargetMeanValue Counter Supported methods for training on GPU: Borders Buckets FeatureFreq FloatTargetMeanValue `TargetBorderCount` — The number of borders for label value quantization. Only used for regression problems. Allowed values are integers from 1 to 255 inclusively. The default value is 1. This option is available for training on CPU only. `TargetBorderType` — The quantization type for the label value. Only used for regression problems. Possible values: Median Uniform UniformAndQuantiles MaxLogSum MinEntropy GreedyLogSum By default, MinEntropy. This option is available for training on CPU only. `CtrBorderCount` — The number of splits for categorical features. Allowed values are integers from 1 to 255 inclusively. `CtrBorderType` — The quantization type for categorical features. Supported values for training on CPU: Uniform Supported values for training on GPU: Median Uniform UniformAndQuantiles MaxLogSum MinEntropy GreedyLogSum `Prior` — Use the specified priors during training (several values can be specified). Possible formats: One number — Adds the value to the numerator. Two slash-delimited numbers (for GPU only) — Use this format to set a fraction. The number is added to the numerator and the second is added to the denominator.		CPU and GPU
ctr_target_border_count	int	The maximum number of borders to use in target quantization for categorical features that need it. Allowed values are integers from 1 to 255 inclusively. The value of the `TargetBorderCount` component overrides this parameter if it is specified for one of the following parameters: simple_ctr combinations_ctr per_feature_ctr	Number_of_classes - 1 for Multiclassification problems when training on CPU, 1 otherwise	CPU and GPU
counter_calc_method	string	The method for calculating the Counter CTR type. Possible values: SkipTest — Objects from the validation dataset are not considered at all Full — All objects from both learn and validation datasets are considered	None (Full is used)	CPU and GPU
max_ctr_complexity	int	The maximum number of features that can be combined. Each resulting combination consists of one or more categorical features and can optionally contain binary features in the following form: “numeric feature > value”.	The default value depends on the processing unit type, combined features' type and the selected mode: GPU for categorical features in MultiClass and MultiClassOneVsAll modes: 1 In all other cases: 4	CPU and GPU
ctr_leaf_count_limit	int	The maximum number of leaves with categorical features. If the quantity exceeds the specified value a part of leaves is discarded. The leaves to be discarded are selected as follows: The leaves are sorted by the frequency of the values. The top N leaves are selected, where N is the value specified in the parameter. All leaves starting from N+1 are discarded. This option reduces the resulting model size and the amount of memory required for training. Note that the resulting quality of the model can be affected.	None The number of different category values is not limited	CPU
store_all_simple_ctr	bool	Ignore categorical features, which are not used in feature combinations, when choosing candidates for exclusion. There is no point in using this parameter without the ctr_leaf_count_limit parameter.	None (set to False) Both simple features and feature combinations are taken in account when limiting the number of leafs with categorical features	CPU
final_ctr_computation_mode	string	Final CTR computation mode. Possible values: Default — Compute final CTRs for learn and validation datasets. Skip — Do not compute final CTRs for learn and validation datasets. In this case, the resulting model can not be applied. This mode decreases the size of the resulting model. It can be useful for research purposes when only the metric values have to be calculated.	Default	CPU and GPU