Machine learning - Decision tree with a high-cardinality attribute
I want to learn a decision tree for a discrete target attribute with a reasonable number of values (5 possible values). However, some of the input attributes are discrete with very high cardinality (thousands of different possible string values), and I wonder whether it makes sense to include them. Is there a rule for the maximum cardinality an attribute should have before it is included when training a decision tree?
There is no maximum cardinality, no. Of course, you can omit values that do not appear in the data.
You do have to use an RDF implementation that handles multi-valued categorical features directly, rather than one that converts them into a series of binary indicator features.
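To see why the representation matters, here is a minimal sketch that counts the split rules available at a single tree node. It assumes that a one-hot (binary indicator) encoding lets a node test only one indicator, i.e. "x == v" versus everything else, while native categorical handling can test "x in S" for any nonempty proper subset S. The function names `indicator_rules` and `subset_rules` are hypothetical, chosen for illustration only.

```python
from itertools import combinations


def indicator_rules(values):
    """Rules reachable at one node after one-hot encoding:
    each indicator column x_is_v gives a single "x == v" split (hypothetical helper)."""
    return [frozenset([v]) for v in values]


def subset_rules(values):
    """Rules reachable at one node with native categorical splits:
    "x in S" for every nonempty proper subset S (hypothetical helper)."""
    vals = list(values)
    rules = []
    for k in range(1, len(vals)):  # subset sizes 1 .. n-1
        rules.extend(frozenset(c) for c in combinations(vals, k))
    return rules


values = ["a", "b", "c", "d"]
print(len(indicator_rules(values)), len(subset_rules(values)))
```

With four category values this prints `4 14`: indicators can only peel off one value per node, while subset splits can group values that behave alike in a single test.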
For a categorical feature with n values there are 2^n - 2 possible decision rules on that feature, far too many to consider. The heuristic I have used is to compute the entropy of the target within each of the n groups obtained by dividing the data on the feature's values, order the values by that entropy, and evaluate only the n-1 rules formed by the proper prefixes of that ordered list.
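The heuristic above can be sketched as follows. This is a minimal illustration, not the answerer's actual code: it assumes Shannon entropy in bits, scores each candidate split by the size-weighted entropy of its two branches (lower is better), and uses hypothetical helper names (`ordered_prefix_splits`, `best_split`). Only values observed in the data are considered, so unseen categories never enter the ordering.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def ordered_prefix_splits(xs, ys):
    """Group targets by category value, sort the values by the entropy of the
    target within each group, and return the proper prefixes of that order as
    candidate left-branch subsets (hypothetical helper)."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    ordered = sorted(groups, key=lambda v: entropy(groups[v]))  # stable for ties
    return [set(ordered[:k]) for k in range(1, len(ordered))]


def weighted_entropy(left, right):
    """Size-weighted entropy of the two branches produced by a split."""
    n = len(left) + len(right)
    return len(left) / n * entropy(left) + len(right) / n * entropy(right)


def best_split(xs, ys):
    """Evaluate only the prefix splits; return (subset, score) or None."""
    best = None
    for subset in ordered_prefix_splits(xs, ys):
        left = [y for x, y in zip(xs, ys) if x in subset]
        right = [y for x, y in zip(xs, ys) if x not in subset]
        score = weighted_entropy(left, right)
        if best is None or score < best[1]:
            best = (subset, score)
    return best


xs = ["a", "a", "b", "b", "c", "c", "d"]
ys = [0, 0, 1, 1, 0, 1, 2]
print(best_split(xs, ys))
```

On this toy data the four observed values yield three candidate rules instead of 2^4 - 2 = 14, and the best of them sends `{"a"}` to the left branch. Sorting costs O(n log n) per feature, so this stays cheap even with thousands of distinct string values.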