Kamil Mysiak
Oct 31, 2020


Technically, tree-based models don't require categorical features to be encoded. That said, sklearn's estimators only accept numeric input, so we are forced to encode. I typically try 4-6 different encoders to see whether any of them improves performance. I also almost always group categories that appear less than 1% or 2% of the time into a single 'rare' category. This simplifies the encoding and removes some 'noise' from the data, which helps reduce overfitting.
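As a rough sketch of the 'rare' grouping step, here is one way it could be done with pandas. The helper name `group_rare_categories`, its `min_freq` and `rare_label` parameters, and the toy data are all made up for illustration and are not part of sklearn or any other library.

```python
import pandas as pd


def group_rare_categories(series, min_freq=0.01, rare_label="rare"):
    """Replace categories whose share of rows is below min_freq with rare_label.

    Hypothetical helper for illustration; not part of any library.
    """
    # Relative frequency of each category (missing values are ignored).
    freqs = series.value_counts(normalize=True)
    # Categories that fall below the threshold.
    rare = freqs[freqs < min_freq].index
    # Keep values that are not rare; overwrite the rest with the rare label.
    return series.where(~series.isin(rare), rare_label)


# Toy column: two frequent categories and two that each appear in 1% of rows.
colors = pd.Series(["red"] * 60 + ["blue"] * 38 + ["teal", "plum"])
grouped = group_rare_categories(colors, min_freq=0.02)
print(grouped.value_counts())
```

The grouped column can then be passed to whichever encoder is being tried. In practice the rare categories should probably be determined on the training split only and then applied to the validation and test data, so the threshold does not leak information.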


Written by Kamil Mysiak

Data Scientist | I/O Psychologist | Motorcycle Enthusiast | On a Search for my Personal Legend/ https://www.linkedin.com/in/kamil-mysiak-b789a614/
