How to extract the decision rules from a scikit-learn decision tree?

Question

Can I extract the underlying decision rules (or 'decision paths') from a trained tree in a decision tree as a textual list?

Something like:

    if A>0.4 then if B<0.2 then if C>0.8 then class='X'

Thanks for your help.

User · Answer

From this answer, you get a readable and efficient representation: https://stackoverflow.com/a/65939892/3746632

Output looks like this. X is a 1d vector that represents a single instance's features.

    from numba import jit, njit

    @njit
    def predict(X):
        ret = 0
        if X[0] <= 0.5:  # if w_pizza<=0.5
            if X[1] <= 0.5:  # if w_mexico<=0.5
                if X[2] <= 0.5:  # if w_reusable<=0.5
                    ret += 1
                else:  # if w_reusable>0.5
                    pass
            else:  # if w_mexico>0.5
                ret += 1
        else:  # if w_pizza>0.5
            pass
        if X[0] <= 0.5:  # if w_pizza<=0.5
            if X[1] <= 0.5:  # if w_mexico<=0.5
                if X[2] <= 0.5:  # if w_reusable<=0.5
                    ret += 1
                else:  # if w_reusable>0.5
                    pass
            else:  # if w_mexico>0.5
                pass
        else:  # if w_pizza>0.5
            ret += 1
        if X[0] <= 0.5:  # if w_pizza<=0.5
            if X[1] <= 0.5:  # if w_mexico<=0.5
                if X[2] <= 0.5:  # if w_reusable<=0.5
                    ret += 1
                else:  # if w_reusable>0.5
                    ret += 1
            else:  # if w_mexico>0.5
                ret += 1
        else:  # if w_pizza>0.5
            pass
        if X[0] <= 0.5:  # if w_pizza<=0.5
            if X[1] <= 0.5:  # if w_mexico<=0.5
                if X[2] <= 0.5:  # if w_reusable<=0.5
                    ret += 1
                else:  # if w_reusable>0.5
                    ret += 1
            else:  # if w_mexico>0.5
                pass
        else:  # if w_pizza>0.5
            ret += 1
        if X[0] <= 0.5:  # if w_pizza<=0.5
            if X[1] <= 0.5:  # if w_mexico<=0.5
                if X[2] <= 0.5:  # if w_reusable<=0.5
                    ret += 1
                else:  # if w_reusable>0.5
                    pass
            else:  # if w_mexico>0.5
                pass
        else:  # if w_pizza>0.5
            pass
        if X[0] <= 0.5:  # if w_pizza<=0.5
            if X[1] <= 0.5:  # if w_mexico<=0.5
                if X[2] <= 0.5:  # if w_reusable<=0.5
                    ret += 1
                else:  # if w_reusable>0.5
                    pass
            else:  # if w_mexico>0.5
                ret += 1
        else:  # if w_pizza>0.5
            ret += 1
        if X[0] <= 0.5:  # if w_pizza<=0.5
            if X[1] <= 0.5:  # if w_mexico<=0.5
                if X[2] <= 0.5:  # if w_reusable<=0.5
                    ret += 1
                else:  # if w_reusable>0.5
                    pass
            else:  # if w_mexico>0.5
                pass
        else:  # if w_pizza>0.5
            ret += 1
        if X[0] <= 0.5:  # if w_pizza<=0.5
            if X[1] <= 0.5:  # if w_mexico<=0.5
                if X[2] <= 0.5:  # if w_reusable<=0.5
                    ret += 1
                else:  # if w_reusable>0.5
                    pass
            else:  # if w_mexico>0.5
                pass
        else:  # if w_pizza>0.5
            pass
        if X[0] <= 0.5:  # if w_pizza<=0.5
            if X[1] <= 0.5:  # if w_mexico<=0.5
                if X[2] <= 0.5:  # if w_reusable<=0.5
                    ret += 1
                else:  # if w_reusable>0.5
                    pass
            else:  # if w_mexico>0.5
                pass
        else:  # if w_pizza>0.5
            pass
        if X[0] <= 0.5:  # if w_pizza<=0.5
            if X[1] <= 0.5:  # if w_mexico<=0.5
                if X[2] <= 0.5:  # if w_reusable<=0.5
                    ret += 1
                else:  # if w_reusable>0.5
                    pass
            else:  # if w_mexico>0.5
                pass
        else:  # if w_pizza>0.5
            pass
        return ret/10

User · Answer

Scikit-learn introduced a delicious new method called export_text in version 0.21 (May 2019) to extract the rules from a tree. Documentation here. It's no longer necessary to create a custom function.

Once you've fit your model, you just need two lines of code. First, import export_text:

    from sklearn.tree import export_text

Second, create an object that will contain your rules. To make the rules look more readable, use the feature_names argument and pass a list of your feature names. For example, if your model is called model and your features are named in a dataframe called X_train, you could create an object called tree_rules:

    tree_rules = export_text(model, feature_names=list(X_train.columns))

Then just print or save tree_rules. Your output will look like this:

    |--- Age <= 0.63
    |   |--- EstimatedSalary <= 0.61
    |   |   |--- Age <= -0.16
    |   |   |   |--- class: 0
    |   |   |--- Age >  -0.16
    |   |   |   |--- EstimatedSalary <= -0.06
    |   |   |   |   |--- class: 0
    |   |   |   |--- EstimatedSalary >  -0.06
    |   |   |   |   |--- EstimatedSalary <= 0.40
    |   |   |   |   |   |--- EstimatedSalary <= 0.03
    |   |   |   |   |   |   |--- class: 1

User · Answer

The code below is my approach under Anaconda Python 2.7, plus a package named "pydot-ng", to make a PDF file with decision rules. I hope it is helpful.

    from sklearn import tree

    clf = tree.DecisionTreeClassifier(max_leaf_nodes=n)
    clf_ = clf.fit(X, data_y)

    feature_names = X.columns
    class_name = clf_.classes_.astype(int).astype(str)

    def output_pdf(clf_, name):
        from sklearn import tree
        # sklearn.externals.six was removed in scikit-learn 0.23;
        # on newer versions use `from io import StringIO` instead.
        from sklearn.externals.six import StringIO
        import pydot_ng as pydot
        dot_data = StringIO()
        tree.export_graphviz(clf_, out_file=dot_data,
                             feature_names=feature_names,
                             class_names=class_name,
                             filled=True, rounded=True,
                             special_characters=True,
                             node_ids=1)
        graph = pydot.graph_from_dot_data(dot_data.getvalue())
        graph.write_pdf("%s.pdf" % name)

    output_pdf(clf_, name='filename%s' % n)

[Image: the resulting tree graph]

User · Answer

Here is a function that generates Python code from a decision tree by converting the output of export_text:

    import string
    from sklearn.tree import export_text

    def export_py_code(tree, feature_names, max_depth=100, spacing=4):
        if spacing < 2:
            raise ValueError('spacing must be > 1')

        # Clean up feature names (for correctness)
        nums = string.digits
        alnums = string.ascii_letters + nums
        clean = lambda s: ''.join(c if c in alnums else '_' for c in s)
        features = [clean(x) for x in feature_names]
        features = ['_' + x if x[0] in nums else x for x in features if x]
        if len(set(features)) != len(feature_names):
            raise ValueError('invalid feature names')

        # First: export tree to text
        res = export_text(tree, feature_names=features,
                          max_depth=max_depth,
                          decimals=6,
                          spacing=spacing-1)

        # Second: generate Python code from the text
        skip, dash = ' ' * spacing, '-' * (spacing-1)
        code = 'def decision_tree({}):\n'.format(', '.join(features))
        for line in repr(tree).split('\n'):
            code += skip + "# " + line + '\n'
        for line in res.split('\n'):
            line = line.rstrip().replace('|', ' ')
            if '<' in line or '>' in line:
                line, val = line.rsplit(maxsplit=1)
                line = line.replace(' ' + dash, 'if')
                line = '{} {:g}:'.format(line, float(val))
            else:
                line = line.replace(' {} class:'.format(dash), 'return')
            code += skip + line + '\n'

        return code

Sample usage:

    res = export_py_code(tree, feature_names=names, spacing=4)
    print(res)

Sample output:

    def decision_tree(f1, f2, f3):
        # DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=3,
        #                        max_features=None, max_leaf_nodes=None,
        #                        min_impurity_decrease=0.0, min_impurity_split=None,
        #                        min_samples_leaf=1, min_samples_split=2,
        #                        min_weight_fraction_leaf=0.0, presort=False,
        #                        random_state=42, splitter='best')
        if f1 <= 12.5:
            if f2 <= 17.5:
                if f1 <= 10.5:
                    return 2
                if f1 > 10.5:
                    return 3
            if f2 > 17.5:
                if f2 <= 22.5:
                    return 1
                if f2 > 22.5:
                    return 1
        if f1 > 12.5:
            if f1 <= 17.5:
                if f3 <= 23.5:
                    return 2
                if f3 > 23.5:
                    return 3
            if f1 > 17.5:
                if f1 <= 25:
                    return 1
                if f1 > 25:
                    return 2

The above example is generated with names = ['f' + str(j+1) for j in range(NUM_FEATURES)].

One handy feature is that it can generate a smaller file with reduced spacing. Just set spacing=2.

User · Answer

Here is my approach to extract the decision rules in a form that can be used directly in SQL, so the data can be grouped by node. (Based on the approaches of previous posters.)

The result will be subsequent CASE clauses that can be copied to an SQL statement, e.g.:

    SELECT COALESCE(
        CASE WHEN <conditions> THEN <NodeA>,
        CASE WHEN <conditions> THEN <NodeB>,
        ....
    ) NodeName, ...
    FROM <table or view>

The code:

    import numpy as np

    import pickle
    feature_names = ...  # list of feature names (elided in the original post)
    features = [feature_names[i] for i in range(len(feature_names))]
    clf = pickle.loads(trained_model)
    impurity = clf.tree_.impurity
    importances = clf.feature_importances_
    SqlOut = ""

    global Conts
    global ContsNode
    global Path
    Conts = []
    ContsNode = []
    Path = []
    global Results
    Results = []

    def print_decision_tree(tree, feature_names, offset_unit='    '):
        left      = tree.tree_.children_left
        right     = tree.tree_.children_right
        threshold = tree.tree_.threshold
        value = tree.tree_.value

        if feature_names is None:
            features = ['f%d' % i for i in tree.tree_.feature]
        else:
            features = [feature_names[i] for i in tree.tree_.feature]

        def recurse(left, right, threshold, features, node, depth=0, ParentNode=0, IsElse=0):
            global Conts
            global ContsNode
            global Path
            global Results
            global LeftParents
            LeftParents = []
            global RightParents
            RightParents = []
            for i in range(len(left)):  # This is just to tell you how to create a list.
                LeftParents.append(-1)
                RightParents.append(-1)
                ContsNode.append("")
                Path.append("")

            for i in range(len(left)):  # i is node
                if (left[i] == -1 and right[i] == -1):
                    # leaf: finish its path and emit one CASE clause
                    if LeftParents[i] >= 0:
                        if Path[LeftParents[i]] > " ":
                            Path[i] = Path[LeftParents[i]] + " AND " + ContsNode[LeftParents[i]]
                        else:
                            Path[i] = ContsNode[LeftParents[i]]
                    if RightParents[i] >= 0:
                        if Path[RightParents[i]] > " ":
                            Path[i] = Path[RightParents[i]] + " AND not " + ContsNode[RightParents[i]]
                        else:
                            Path[i] = " not " + ContsNode[RightParents[i]]
                    Results.append(" case when  " + Path[i] + "  then '" + "{:4d}".format(i) + " " + "{:2.2f}".format(impurity[i]) + " " + Path[i][0:180] + "'")

                else:
                    # split node: extend the path and register it as parent of its children
                    if LeftParents[i] >= 0:
                        if Path[LeftParents[i]] > " ":
                            Path[i] = Path[LeftParents[i]] + " AND " + ContsNode[LeftParents[i]]
                        else:
                            Path[i] = ContsNode[LeftParents[i]]
                    if RightParents[i] >= 0:
                        if Path[RightParents[i]] > " ":
                            Path[i] = Path[RightParents[i]] + " AND not " + ContsNode[RightParents[i]]
                        else:
                            Path[i] = " not " + ContsNode[RightParents[i]]
                    if (left[i] != -1):
                        LeftParents[left[i]] = i
                    if (right[i] != -1):
                        RightParents[right[i]] = i
                    ContsNode[i] = "( " + features[i] + " <= " + str(threshold[i]) + " ) "

        recurse(left, right, threshold, features, 0, 0, 0, 0)

    print_decision_tree(clf, features)
    SqlOut = ""
    for i in range(len(Results)):
        SqlOut = SqlOut + Results[i] + " end," + chr(13) + chr(10)
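
The same idea can also be sketched more compactly by walking the tree recursively and emitting one WHEN branch per leaf. This is only an illustrative sketch, not the poster's code: the function name tree_to_sql_case and the column names are hypothetical, and the iris data stands in for a real table.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def tree_to_sql_case(estimator, column_names):
    """Build a single SQL CASE expression mapping a row to its leaf node id.

    `column_names` (hypothetical) must list the SQL column for each feature,
    in the same order the features were passed to fit().
    """
    tree_ = estimator.tree_
    branches = []

    def walk(node, conditions):
        left = tree_.children_left[node]
        right = tree_.children_right[node]
        if left == right:  # leaf: both child pointers hold the same sentinel
            predicate = " AND ".join(conditions) if conditions else "1 = 1"
            branches.append("WHEN %s THEN %d" % (predicate, node))
            return
        column = column_names[tree_.feature[node]]
        threshold = float(tree_.threshold[node])
        # scikit-learn sends samples with value <= threshold to the left child
        walk(left, conditions + ["%s <= %r" % (column, threshold)])
        walk(right, conditions + ["%s > %r" % (column, threshold)])

    walk(0, [])
    return "CASE\n  " + "\n  ".join(branches) + "\nEND"


iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)
# Hypothetical SQL-safe column names standing in for the iris features.
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
sql = tree_to_sql_case(clf, columns)
print("SELECT " + sql + " AS node_id FROM my_table")
```

Because every row reaches exactly one leaf, a single CASE with one WHEN per leaf is enough here, and the COALESCE wrapper is not needed in this variant.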

User · Answer

Just because everyone was so helpful, I'll just add a modification to Zelazny7 and Daniele's beautiful solutions. This one is for Python 2.7, with tabs to make it more readable:

    def get_code(tree, feature_names, tabdepth=0):
        left      = tree.tree_.children_left
        right     = tree.tree_.children_right
        threshold = tree.tree_.threshold
        features  = [feature_names[i] for i in tree.tree_.feature]
        value = tree.tree_.value

        def recurse(left, right, threshold, features, node, tabdepth=0):
            if (threshold[node] != -2):
                print '\t' * tabdepth,
                print "if ( " + features[node] + " <= " + str(threshold[node]) + " ) {"
                if left[node] != -1:
                    recurse(left, right, threshold, features, left[node], tabdepth+1)
                print '\t' * tabdepth,
                print "} else {"
                if right[node] != -1:
                    recurse(left, right, threshold, features, right[node], tabdepth+1)
                print '\t' * tabdepth,
                print "}"
            else:
                print '\t' * tabdepth,
                print "return " + str(value[node])

        recurse(left, right, threshold, features, 0)

User · Answer

You can also make it more informative by distinguishing which class a leaf belongs to, or even by mentioning its output value.

    def print_decision_tree(tree, feature_names, offset_unit='    '):
        left      = tree.tree_.children_left
        right     = tree.tree_.children_right
        threshold = tree.tree_.threshold
        value = tree.tree_.value
        if feature_names is None:
            features = ['f%d' % i for i in tree.tree_.feature]
        else:
            features = [feature_names[i] for i in tree.tree_.feature]

        def recurse(left, right, threshold, features, node, depth=0):
            offset = offset_unit * depth
            if (threshold[node] != -2):
                print(offset + "if ( " + features[node] + " <= " + str(threshold[node]) + " ) {")
                if left[node] != -1:
                    recurse(left, right, threshold, features, left[node], depth+1)
                print(offset + "} else {")
                if right[node] != -1:
                    recurse(left, right, threshold, features, right[node], depth+1)
                print(offset + "}")
            else:
                print(offset, value[node])

                # The original post split str(value[node]) in half and parsed the
                # digits back into two integers. Reading the two class totals
                # directly gives the same comparison (binary problem assumed:
                # first entry is the "no" class, second is the "yes" class).
                val_no, val_yes = value[node][0]

                if val_yes > val_no:
                    print(offset, '\033[1m', "YES")
                    print('\033[0m')
                elif val_no > val_yes:
                    print(offset, '\033[1m', "NO")
                    print('\033[0m')
                else:
                    print(offset, '\033[1m', "Tie")
                    print('\033[0m')

        recurse(left, right, threshold, features, 0, 0)

User · Answer

I created my own function to extract the rules from the decision trees created by sklearn:

    import pandas as pd
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    # dummy data:
    df = pd.DataFrame({'col1': [0, 1, 2, 3], 'col2': [3, 4, 5, 6], 'dv': [0, 1, 0, 1]})

    # create decision tree
    dt = DecisionTreeClassifier(max_depth=5, min_samples_leaf=1)
    # the original used df.ix, which has since been removed from pandas
    dt.fit(df.iloc[:, :2], df.dv)

This function first starts with the leaf nodes (identified by -1 in the child arrays) and then recursively finds the parents. I call this a node's 'lineage'. Along the way, I grab the values I need to create if/then/else SAS logic:

    def get_lineage(tree, feature_names):
        left      = tree.tree_.children_left
        right     = tree.tree_.children_right
        threshold = tree.tree_.threshold
        features  = [feature_names[i] for i in tree.tree_.feature]

        # get ids of child nodes
        idx = np.argwhere(left == -1)[:, 0]

        def recurse(left, right, child, lineage=None):
            if lineage is None:
                lineage = [child]
            if child in left:
                parent = np.where(left == child)[0].item()
                split = 'l'
            else:
                parent = np.where(right == child)[0].item()
                split = 'r'

            lineage.append((parent, split, threshold[parent], features[parent]))

            if parent == 0:
                lineage.reverse()
                return lineage
            else:
                return recurse(left, right, parent, lineage)

        for child in idx:
            for node in recurse(left, right, child):
                print(node)

The sets of tuples below contain everything I need to create SAS if/then/else statements. I do not like using do blocks in SAS, which is why I create logic describing a node's entire path. The single integer after the tuples is the ID of the terminal node in a path. All of the preceding tuples combine to create that node.

    In [1]: get_lineage(dt, df.columns)
    (0, 'l', 0.5, 'col1')
    1
    (0, 'r', 0.5, 'col1')
    (2, 'l', 4.5, 'col2')
    3
    (0, 'r', 0.5, 'col1')
    (2, 'r', 4.5, 'col2')
    (4, 'l', 2.5, 'col1')
    5
    (0, 'r', 0.5, 'col1')
    (2, 'r', 4.5, 'col2')
    (4, 'r', 2.5, 'col1')
    6

User · Answer

I believe that this answer is more correct than the other answers here:

    from sklearn.tree import _tree

    def tree_to_code(tree, feature_names):
        tree_ = tree.tree_
        feature_name = [
            feature_names[i] if i != _tree.TREE_UNDEFINED else "undefined!"
            for i in tree_.feature
        ]
        print("def tree({}):".format(", ".join(feature_names)))

        def recurse(node, depth):
            indent = "  " * depth
            if tree_.feature[node] != _tree.TREE_UNDEFINED:
                name = feature_name[node]
                threshold = tree_.threshold[node]
                print("{}if {} <= {}:".format(indent, name, threshold))
                recurse(tree_.children_left[node], depth + 1)
                print("{}else:  # if {} > {}".format(indent, name, threshold))
                recurse(tree_.children_right[node], depth + 1)
            else:
                print("{}return {}".format(indent, tree_.value[node]))

        recurse(0, 1)

This prints out a valid Python function. Here's an example output for a tree that is trying to return its input, a number between 0 and 10.

    def tree(f0):
      if f0 <= 6.0:
        if f0 <= 1.5:
          return [[ 0.]]
        else:  # if f0 > 1.5
          if f0 <= 4.5:
            if f0 <= 3.5:
              return [[ 3.]]
            else:  # if f0 > 3.5
              return [[ 4.]]
          else:  # if f0 > 4.5
            return [[ 5.]]
      else:  # if f0 > 6.0
        if f0 <= 8.5:
          if f0 <= 7.5:
            return [[ 7.]]
          else:  # if f0 > 7.5
            return [[ 8.]]
        else:  # if f0 > 8.5
          return [[ 9.]]

Here are some stumbling blocks that I see in other answers:

1. Using tree_.threshold == -2 to decide whether a node is a leaf isn't a good idea. What if it's a real decision node with a threshold of -2? Instead, you should look at tree.feature or tree.children_*.
2. The line features = [feature_names[i] for i in tree_.feature] crashes with my version of sklearn, because some values of tree.tree_.feature are -2 (specifically for leaf nodes).
3. There is no need to have multiple if statements in the recursive function, just one is fine.

User · Answer

I needed a more human-friendly format of rules from the Decision Tree. I'm building an open-source AutoML Python package, and many times MLJAR users want to see the exact rules from the tree.

That's why I implemented a function based on paulkernfeld's answer.

    import numpy as np
    from sklearn.tree import _tree

    def get_rules(tree, feature_names, class_names):
        tree_ = tree.tree_
        feature_name = [
            feature_names[i] if i != _tree.TREE_UNDEFINED else "undefined!"
            for i in tree_.feature
        ]

        paths = []
        path = []

        def recurse(node, path, paths):

            if tree_.feature[node] != _tree.TREE_UNDEFINED:
                name = feature_name[node]
                threshold = tree_.threshold[node]
                p1, p2 = list(path), list(path)
                p1 += [f"({name} <= {np.round(threshold, 3)})"]
                recurse(tree_.children_left[node], p1, paths)
                p2 += [f"({name} > {np.round(threshold, 3)})"]
                recurse(tree_.children_right[node], p2, paths)
            else:
                path += [(tree_.value[node], tree_.n_node_samples[node])]
                paths += [path]

        recurse(0, path, paths)

        # sort by samples count
        samples_count = [p[-1][1] for p in paths]
        ii = list(np.argsort(samples_count))
        paths = [paths[i] for i in reversed(ii)]

        rules = []
        for path in paths:
            rule = "if "

            for p in path[:-1]:
                if rule != "if ":
                    rule += " and "
                rule += str(p)
            rule += " then "
            if class_names is None:
                rule += "response: " + str(np.round(path[-1][0][0][0], 3))
            else:
                classes = path[-1][0][0]
                l = np.argmax(classes)
                rule += f"class: {class_names[l]} (proba: {np.round(100.0*classes[l]/np.sum(classes), 2)}%)"
            rule += f" | based on {path[-1][1]:,} samples"
            rules += [rule]

        return rules

The rules are sorted by the number of training samples assigned to each rule. For each rule, there is information about the predicted class name and probability of prediction for classification tasks. For the regression task, only information about the predicted value is printed.

Example

    from sklearn import datasets
    from sklearn.tree import DecisionTreeRegressor
    from sklearn import tree

    # Prepare the data
    # (load_boston was removed in scikit-learn 1.2; this needs an older version)
    boston = datasets.load_boston()
    X = boston.data
    y = boston.target

    # Fit the regressor, set max_depth = 3
    regr = DecisionTreeRegressor(max_depth=3, random_state=1234)
    model = regr.fit(X, y)

    # Print rules
    rules = get_rules(regr, boston.feature_names, None)
    for r in rules:
        print(r)

The printed rules:

    if (RM <= 6.941) and (LSTAT <= 14.4) and (DIS > 1.385) then response: 22.905 | based on 250 samples
    if (RM <= 6.941) and (LSTAT > 14.4) and (CRIM <= 6.992) then response: 17.138 | based on 101 samples
    if (RM <= 6.941) and (LSTAT > 14.4) and (CRIM > 6.992) then response: 11.978 | based on 74 samples
    if (RM > 6.941) and (RM <= 7.437) and (NOX <= 0.659) then response: 33.349 | based on 43 samples
    if (RM > 6.941) and (RM > 7.437) and (PTRATIO <= 19.65) then response: 45.897 | based on 29 samples
    if (RM <= 6.941) and (LSTAT <= 14.4) and (DIS <= 1.385) then response: 45.58 | based on 5 samples
    if (RM > 6.941) and (RM <= 7.437) and (NOX > 0.659) then response: 14.4 | based on 3 samples
    if (RM > 6.941) and (RM > 7.437) and (PTRATIO > 19.65) then response: 21.9 | based on 1 samples

I've summarized the ways to extract rules from the Decision Tree in my article: Extract Rules from Decision Tree in 3 Ways with Scikit-Learn and Python.

User · Answer

Here is a function that prints the rules of a scikit-learn decision tree under Python 3, with offsets for conditional blocks to make the structure more readable:

    def print_decision_tree(tree, feature_names=None, offset_unit='    '):
        '''Plots textual representation of rules of a decision tree
        tree: scikit-learn representation of tree
        feature_names: list of feature names. They are set to f1,f2,f3,... if not specified
        offset_unit: a string of offset of the conditional block'''

        left      = tree.tree_.children_left
        right     = tree.tree_.children_right
        threshold = tree.tree_.threshold
        value = tree.tree_.value
        if feature_names is None:
            features = ['f%d' % i for i in tree.tree_.feature]
        else:
            features = [feature_names[i] for i in tree.tree_.feature]

        def recurse(left, right, threshold, features, node, depth=0):
            offset = offset_unit * depth
            if (threshold[node] != -2):
                print(offset + "if ( " + features[node] + " <= " + str(threshold[node]) + " ) {")
                if left[node] != -1:
                    recurse(left, right, threshold, features, left[node], depth+1)
                print(offset + "} else {")
                if right[node] != -1:
                    recurse(left, right, threshold, features, right[node], depth+1)
                print(offset + "}")
            else:
                print(offset + "return " + str(value[node]))

        recurse(left, right, threshold, features, 0, 0)

User · Answer

I modified the code submitted by Zelazny7 to print some pseudocode:

    def get_code(tree, feature_names):
        left      = tree.tree_.children_left
        right     = tree.tree_.children_right
        threshold = tree.tree_.threshold
        features  = [feature_names[i] for i in tree.tree_.feature]
        value = tree.tree_.value

        def recurse(left, right, threshold, features, node):
            if (threshold[node] != -2):
                print("if ( " + features[node] + " <= " + str(threshold[node]) + " ) {")
                if left[node] != -1:
                    recurse(left, right, threshold, features, left[node])
                print("} else {")
                if right[node] != -1:
                    recurse(left, right, threshold, features, right[node])
                print("}")
            else:
                print("return " + str(value[node]))

        recurse(left, right, threshold, features, 0)

If you call get_code(dt, df.columns) on the same example, you will obtain:

    if ( col1 <= 0.5 ) {
    return [[ 1.  0.]]
    } else {
    if ( col2 <= 4.5 ) {
    return [[ 0.  1.]]
    } else {
    if ( col1 <= 2.5 ) {
    return [[ 1.  0.]]
    } else {
    return [[ 0.  1.]]
    }
    }
    }

User · Answer

This is the code you need.

I have modified the top-liked code to indent correctly in a Jupyter notebook with Python 3.

    import numpy as np
    from sklearn.tree import _tree

    def tree_to_code(tree, feature_names):
        tree_ = tree.tree_
        feature_name = [feature_names[i]
                        if i != _tree.TREE_UNDEFINED else "undefined!"
                        for i in tree_.feature]
        print("def tree({}):".format(", ".join(feature_names)))

        def recurse(node, depth):
            indent = "    " * depth
            if tree_.feature[node] != _tree.TREE_UNDEFINED:
                name = feature_name[node]
                threshold = tree_.threshold[node]
                print("{}if {} <= {}:".format(indent, name, threshold))
                recurse(tree_.children_left[node], depth + 1)
                print("{}else:  # if {} > {}".format(indent, name, threshold))
                recurse(tree_.children_right[node], depth + 1)
            else:
                print("{}return {}".format(indent, np.argmax(tree_.value[node])))

        recurse(0, 1)

User · Answer

Apparently, a long time ago somebody already tried to add the following function to the official scikit-learn tree export functions (which basically only support export_graphviz):

    def export_dict(tree, feature_names=None, max_depth=None):
        """Export a decision tree in dict format."""

Here is his full commit:

https://github.com/scikit-learn/scikit-learn/blob/79bdc8f711d0af225ed6be9fdb708cea9f98a910/sklearn/tree/export.py

I'm not exactly sure what happened to this commit, but you could also try to use that function.

I think this warrants a serious documentation request to the good people of scikit-learn to properly document the sklearn.tree.Tree API, which is the underlying tree structure that DecisionTreeClassifier exposes as its attribute tree_.

User · Answer

I've been going through this, but I needed the rules to be written in this format:

    if A>0.4 then if B<0.2 then if C>0.8 then class='X'

So I adapted the answer of @paulkernfeld (thanks), which you can customize to your needs:

    from sklearn.tree import _tree

    def tree_to_code(tree, feature_names, Y):
        tree_ = tree.tree_
        feature_name = [
            feature_names[i] if i != _tree.TREE_UNDEFINED else "undefined!"
            for i in tree_.feature
        ]
        pathto = dict()

        global k
        k = 0
        def recurse(node, depth, parent):
            global k
            indent = "  " * depth

            if tree_.feature[node] != _tree.TREE_UNDEFINED:
                name = feature_name[node]
                threshold = tree_.threshold[node]
                s = "{} <= {} ".format(name, threshold, node)
                if node == 0:
                    pathto[node] = s
                else:
                    pathto[node] = pathto[parent] + ' & ' + s

                recurse(tree_.children_left[node], depth + 1, node)
                s = "{} > {}".format(name, threshold)
                if node == 0:
                    pathto[node] = s
                else:
                    pathto[node] = pathto[parent] + ' & ' + s
                recurse(tree_.children_right[node], depth + 1, node)
            else:
                k = k + 1
                print(k, ')', pathto[parent], tree_.value[node])

        recurse(0, 1, 0)

User · Answer

There is a new DecisionTreeClassifier method, decision_path, in the 0.18.0 release. The developers provide an extensive (well-documented) walkthrough.

The first section of code in the walkthrough that prints the tree structure seems to be OK. However, I modified the code in the second section to interrogate one sample. My changes are denoted with # <--

Edit: The changes marked by # <-- in the code below have since been updated in the walkthrough link after the errors were pointed out in pull requests #8653 and #10951. It's much easier to follow along now.

    sample_id = 0
    node_index = node_indicator.indices[node_indicator.indptr[sample_id]:
                                        node_indicator.indptr[sample_id + 1]]

    print('Rules used to predict sample %s: ' % sample_id)
    for node_id in node_index:

        if leave_id[sample_id] == node_id:  # <-- changed != to ==
            #continue # <-- comment out
            print("leaf node {} reached, no decision here".format(leave_id[sample_id]))  # <--

        else:  # <-- added else to iterate through decision nodes
            if (X_test[sample_id, feature[node_id]] <= threshold[node_id]):
                threshold_sign = "<="
            else:
                threshold_sign = ">"

            print("decision id node %s : (X[%s, %s] (= %s) %s %s)"
                  % (node_id,
                     sample_id,
                     feature[node_id],
                     X_test[sample_id, feature[node_id]],  # <-- changed i to sample_id
                     threshold_sign,
                     threshold[node_id]))

    Rules used to predict sample 0:
    decision id node 0 : (X[0, 3] (= 2.4) > 0.800000011921)
    decision id node 2 : (X[0, 2] (= 5.1) > 4.94999980927)
    leaf node 4 reached, no decision here

Change the sample_id to see the decision paths for other samples. I haven't asked the developers about these changes; they just seemed more intuitive when working through the example.
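
The snippet in this answer relies on variables defined earlier in the walkthrough (node_indicator, leave_id, feature, threshold, X_test). A self-contained sketch that sets them up could look like the following; the wrapper function explain_sample is a name made up for this example, and the iris split and max_leaf_nodes=3 setting are assumptions chosen to resemble the walkthrough.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)
clf = DecisionTreeClassifier(max_leaf_nodes=3, random_state=0)
clf.fit(X_train, y_train)


def explain_sample(estimator, X, sample_id):
    """Return one text line per node visited by sample `sample_id`."""
    feature = estimator.tree_.feature
    threshold = estimator.tree_.threshold
    node_indicator = estimator.decision_path(X)  # sparse matrix: samples x nodes
    leaf_id = estimator.apply(X)                 # leaf reached by each sample

    # Row `sample_id` of the CSR matrix lists the visited nodes, root first.
    start = node_indicator.indptr[sample_id]
    stop = node_indicator.indptr[sample_id + 1]

    lines = []
    for node_id in node_indicator.indices[start:stop]:
        if leaf_id[sample_id] == node_id:
            lines.append("leaf node %s reached, no decision here" % node_id)
            continue
        value = X[sample_id, feature[node_id]]
        sign = "<=" if value <= threshold[node_id] else ">"
        lines.append("decision id node %s : (X[%s, %s] (= %s) %s %s)"
                     % (node_id, sample_id, feature[node_id],
                        value, sign, threshold[node_id]))
    return lines


rules = explain_sample(clf, X_test, 0)
print("Rules used to predict sample 0:")
for line in rules:
    print(line)
```

Wrapping the loop in a function makes it easy to call it for several sample ids without re-running the setup.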

User · Answer

Here is a way to translate the whole tree into a single (not necessarily too human-readable) Python expression using the SKompiler library:

    from skompiler import skompile
    skompile(dtree.predict).to('python/code')

User · Answer

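    from io import StringIO  # Python 3; on Python 2 this was "from StringIO import StringIO"
    from sklearn import tree

    out = StringIO()
    # export_graphviz writes the DOT source into the handle passed as out_file
    tree.export_graphviz(clf, out_file=out)
    print(out.getvalue())

You can see a digraph Tree. Then, clf.tree_.feature and clf.tree_.value are the array of the nodes' splitting features and the array of the nodes' values, respectively. You can refer to more details from this GitHub source.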

User · Answer

Now you can use export_text.

    from sklearn.tree import export_text

    r = export_text(loan_tree, feature_names=(list(X_train.columns)))
    print(r)

A complete example from sklearn:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.tree import export_text

    iris = load_iris()
    X = iris['data']
    y = iris['target']
    decision_tree = DecisionTreeClassifier(random_state=0, max_depth=2)
    decision_tree = decision_tree.fit(X, y)
    r = export_text(decision_tree, feature_names=iris['feature_names'])
    print(r)
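export_text also takes a few optional keyword arguments that change the report. The sketch below uses `decimals` and `show_weights`, with values picked only for illustration, on the same iris tree.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
decision_tree = DecisionTreeClassifier(random_state=0, max_depth=2)
decision_tree.fit(iris.data, iris.target)

# decimals controls threshold rounding; show_weights adds per-class weights at each leaf
report = export_text(
    decision_tree,
    feature_names=list(iris.feature_names),
    decimals=3,
    show_weights=True,
)
print(report)
```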

User · Answer

Just use the function from sklearn.tree like this:

    from sklearn.tree import export_graphviz

    export_graphviz(tree,
                    out_file="tree.dot",
                    feature_names=tree.columns)  # or just ["petal length", "petal width"]

Then look in your project folder for the file tree.dot, copy ALL of its content, paste it at http://www.webgraphviz.com/ and generate your graph.
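If you would rather not go through a file, export_graphviz returns the DOT source as a string when `out_file=None`. A minimal sketch, assuming an iris classifier named `clf`:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0, max_depth=2).fit(iris.data, iris.target)

# With out_file=None the DOT text is returned instead of being written to disk
dot_source = export_graphviz(clf, out_file=None, feature_names=iris.feature_names)
print(dot_source)
```

The printed text can be pasted into any Graphviz viewer in the same way as the contents of tree.dot.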

User · Answer

Thanks for the wonderful solution of @paulkernfeld. On top of his solution, for all those who want a serialized version of the tree, just use tree.threshold, tree.children_left, tree.children_right, tree.feature and tree.value. Since the leaves have no splits, and hence no feature names or children, their placeholders in tree.feature and tree.children_* are _tree.TREE_UNDEFINED and _tree.TREE_LEAF. Every split is assigned a unique index by depth-first search.

Notice that tree.value is of shape [n, 1, 1] for a single-output regression tree; in general its shape is [n_nodes, n_outputs, max_n_classes].
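To make the serialized layout concrete, here is a small sketch that stores those five arrays as JSON and predicts from them without scikit-learn. The 5-node tree is hand-written for illustration, not exported from a real model, and `predict_one` is a hypothetical helper name. The constants -1 and -2 are the values of _tree.TREE_LEAF and _tree.TREE_UNDEFINED.

```python
import json

TREE_LEAF = -1       # marks "no child" in children_left / children_right
TREE_UNDEFINED = -2  # marks "no split" in feature / threshold

# Hypothetical tree, nodes numbered depth-first:
# node 0 splits on feature 0, node 2 splits on feature 1, nodes 1, 3, 4 are leaves.
serialized = {
    "children_left":  [1, TREE_LEAF, 3, TREE_LEAF, TREE_LEAF],
    "children_right": [2, TREE_LEAF, 4, TREE_LEAF, TREE_LEAF],
    "feature":        [0, TREE_UNDEFINED, 1, TREE_UNDEFINED, TREE_UNDEFINED],
    "threshold":      [0.5, -2.0, 1.5, -2.0, -2.0],
    # shape [n_nodes, n_outputs, n_classes], here [5, 1, 3]
    "value": [[[4, 3, 3]], [[4, 0, 0]], [[0, 3, 3]], [[0, 3, 0]], [[0, 0, 3]]],
}


def predict_one(tree, x):
    """Walk from the root to a leaf and return the index of the majority class."""
    node = 0
    while tree["children_left"][node] != TREE_LEAF:
        if x[tree["feature"][node]] <= tree["threshold"][node]:
            node = tree["children_left"][node]
        else:
            node = tree["children_right"][node]
    counts = tree["value"][node][0]
    return counts.index(max(counts))


# Round-trip through JSON to show the structure survives serialization
restored = json.loads(json.dumps(serialized))
print(predict_one(restored, [0.2, 9.0]))
print(predict_one(restored, [0.9, 1.0]))
print(predict_one(restored, [0.9, 2.0]))
```

With a fitted model, the same dictionary could be filled from `clf.tree_` by calling `.tolist()` on each array.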

User · Answer

Modified Zelazny7's code to fetch SQL from the decision tree.

    # SQL from decision tree
    import numpy as np

    def get_lineage(tree, feature_names):
        left = tree.tree_.children_left
        right = tree.tree_.children_right
        threshold = tree.tree_.threshold
        features = [feature_names[i] for i in tree.tree_.feature]
        le = '<='
        g = '>'

        # get ids of leaf nodes (nodes without a left child)
        idx = np.argwhere(left == -1)[:, 0]

        def recurse(left, right, child, lineage=None):
            # climb from a leaf up to the root, recording each split on the way
            if lineage is None:
                lineage = [child]
            if child in left:
                parent = np.where(left == child)[0].item()
                split = 'l'
            else:
                parent = np.where(right == child)[0].item()
                split = 'r'
            lineage.append((parent, split, threshold[parent], features[parent]))
            if parent == 0:
                lineage.reverse()
                return lineage
            else:
                return recurse(left, right, parent, lineage)

        print('case ')
        for j, child in enumerate(idx):
            clause = ' when '
            for node in recurse(left, right, child):
                # skip the leaf id itself; only the (parent, split, threshold, feature)
                # tuples describe a condition
                if not isinstance(node, tuple):
                    continue
                i = node
                if i[1] == 'l':
                    sign = le
                else:
                    sign = g
                clause = clause + i[3] + sign + str(i[2]) + ' and '
            clause = clause[:-4] + ' then ' + str(j)
            print(clause)
        print('else 99 end as clusters')

User · Answer

This builds on @paulkernfeld's answer. If you have a dataframe X with your features and a target dataframe y with your responses, and you want to get an idea of which y value ended up in which node (and also want to plot it accordingly), you can do the following:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import matplotlib.cm as cm

    def tree_to_code(tree, feature_names):
        from sklearn.tree import _tree
        # build the source of a function get_cat(X_tmp) line by line
        codelines = []
        codelines.append('def get_cat(X_tmp):\n')
        codelines.append('   catout = []\n')
        codelines.append('   for codelines in range(0,X_tmp.shape[0]):\n')
        codelines.append('      Xin = X_tmp.iloc[codelines]\n')
        tree_ = tree.tree_
        feature_name = [
            feature_names[i] if i != _tree.TREE_UNDEFINED else "undefined!"
            for i in tree_.feature
        ]
        # print("def tree({}):".format(", ".join(feature_names)))

        def recurse(node, depth):
            indent = "      " * depth
            if tree_.feature[node] != _tree.TREE_UNDEFINED:
                name = feature_name[node]
                threshold = tree_.threshold[node]
                codelines.append('{}if Xin["{}"] <= {}:\n'.format(indent, name, threshold))
                recurse(tree_.children_left[node], depth + 1)
                codelines.append('{}else:  # if Xin["{}"] > {}\n'.format(indent, name, threshold))
                recurse(tree_.children_right[node], depth + 1)
            else:
                # leaf: remember the id of the node the sample ended up in
                codelines.append('{}mycat = {}\n'.format(indent, node))

        recurse(0, 1)
        codelines.append('      catout.append(mycat)\n')
        codelines.append('   return pd.DataFrame(catout,index=X_tmp.index,columns=["category"])\n')
        codelines.append('node_ids = get_cat(X)\n')
        return codelines

    mycode = tree_to_code(clf, X.columns.values)

    # now execute the function and obtain the dataframe with all nodes
    exec(''.join(mycode))
    node_ids = [int(x[0]) for x in node_ids.values]
    node_ids2 = pd.DataFrame(node_ids)

    print('make plot')
    colors = cm.rainbow(np.linspace(0, 1, 1 + max(list(set(node_ids)))))
    # plt.figure(figsize=cm2inch(24, 21))  # cm2inch is a helper that is not defined in this answer
    for i in list(set(node_ids)):
        plt.plot(y[node_ids2.values == i], 'o', color=colors[i], label=str(i))
    mytitle = 'y colored by node'
    plt.title(mytitle, fontsize=14)
    plt.xlabel('my xlabel')
    plt.ylabel(tagname)  # tagname: your own axis label, not defined in this answer
    plt.xticks(rotation=70)
    plt.legend(loc='upper center', bbox_to_anchor=(0.5, 1.00), shadow=True, ncol=9)
    plt.tight_layout()
    plt.show()
    plt.close()

Not the most elegant version, but it does the job.
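If only the node each sample ends in is needed, the estimator's own `apply` method returns the leaf id per sample directly, without generating and executing code. This is a different route from the answer above, sketched here on iris with assumed variable names:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target
clf = DecisionTreeClassifier(random_state=0, max_depth=2).fit(X, y)

# apply() returns, for every sample, the index of the leaf it ends up in
node_ids = clf.apply(X)

for leaf in np.unique(node_ids):
    members = y[node_ids == leaf]
    # number of samples in this leaf and how they split across the three classes
    print(leaf, len(members), np.bincount(members, minlength=3))
```

The resulting `node_ids` array can be used to color a plot in the same way as `node_ids2` above.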
