A simple explanation of Naive Bayes Classification

Question

I am finding it hard to understand the process of Naive Bayes, and I was wondering if someone could explain it with a simple step-by-step process in English. I understand it takes comparisons by times occurred as a probability, but I have no idea how the training data is related to the actual dataset.

Please give me an explanation of what role the training set plays. I am giving a very simple example for fruits here, like banana for example:

training set---
round-red
round-orange
oblong-yellow
round-red

dataset----
round-red
round-orange
round-red
round-orange
oblong-yellow
round-red
round-orange
oblong-yellow
oblong-yellow
round-red

User · Answer

Naive Bayes

Naive Bayes comes under supervised machine learning and is used to classify data sets. It predicts things based on prior knowledge and independence assumptions.

It is called naive because its assumptions (it assumes that all of the features in the dataset are equally important and independent) are really optimistic and rarely true in most real-world applications.

It is a classification algorithm that makes decisions about unknown data. It is based on Bayes' Theorem, which describes the probability of an event based on prior knowledge.

The diagram below shows how Naive Bayes works:

Formula to predict NB:

How to use the Naive Bayes Algorithm

Let's take an example of how N.B. works.

Step 1: First, build the likelihood table, which shows the probability of Yes or No in the diagram below.

Step 2: Find the posterior probability of each class.

Problem: Find out the probability that the player plays in Rainy conditions.

$$P(\text{Yes} \mid \text{Rainy}) = \frac{P(\text{Rainy} \mid \text{Yes}) \cdot P(\text{Yes})}{P(\text{Rainy})}$$

P(Rainy | Yes) = 2/9 = 0.222, P(Yes) = 9/14 = 0.64, P(Rainy) = 5/14 = 0.36.

Now, P(Yes | Rainy) = 0.222 * 0.64 / 0.36 = 0.39, which is a low probability, meaning the chance that the match is played is low.

For more reference, refer to these blogs.

Refer to the GitHub repository Naive-Bayes-Examples.
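To check the arithmetic, here is a small Python sketch using exact fractions. The counts are an assumption reconstructed from the ratios above (14 days, 9 of them Yes, 5 of them Rainy, and 2 days that are both Rainy and Yes, since P(Rainy | Yes) = 2/9); the likelihood table itself was an image. Computed exactly, the posterior is 2/5 = 0.4; the 0.39 above comes from rounding the intermediate values.

```python
from fractions import Fraction

# ASSUMED counts, reconstructed from the ratios in the example (14 days in total)
total_days = 14
yes_days = 9          # days on which the match was played
rainy_days = 5        # rainy days
rainy_and_yes = 2     # rainy days on which the match was played

p_rainy_given_yes = Fraction(rainy_and_yes, yes_days)   # 2/9
p_yes = Fraction(yes_days, total_days)                   # 9/14
p_rainy = Fraction(rainy_days, total_days)               # 5/14

# Bayes' theorem: P(Yes | Rainy) = P(Rainy | Yes) * P(Yes) / P(Rainy)
p_yes_given_rainy = p_rainy_given_yes * p_yes / p_rainy
print(p_yes_given_rainy, float(p_yes_given_rainy))  # prints: 2/5 0.4
```

Using `Fraction` keeps every step exact, so the result does not depend on how the intermediate probabilities are rounded.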

User · Answer

I realize that this is an old question, with an established answer. The reason I'm posting is that the accepted answer has many elements of k-NN (k-nearest neighbors), a different algorithm.

Both k-NN and NaiveBayes are classification algorithms. Conceptually, k-NN uses the idea of "nearness" to classify new entities. In k-NN, 'nearness' is modeled with ideas such as Euclidean Distance or Cosine Distance. By contrast, in NaiveBayes, the concept of 'probability' is used to classify new entities.

Since the question is about Naive Bayes, here's how I'd describe the ideas and steps to someone. I'll try to do it with as few equations and in plain English as much as possible.

First, Conditional Probability & Bayes' Rule

Before someone can understand and appreciate the nuances of Naive Bayes, they need to know a couple of related concepts first, namely, the idea of Conditional Probability, and Bayes' Rule. (If you are familiar with these concepts, skip to the section titled Getting to Naive Bayes.)

Conditional Probability in plain English: What is the probability that something will happen, given that something else has already happened?

Let's say that there is some Outcome O and some Evidence E. From the way these probabilities are defined: the probability of having both the Outcome O and Evidence E is (the probability of O occurring) multiplied by (the probability of E given that O happened).

One example to understand Conditional Probability:

Let's say we have a collection of US Senators. Senators could be Democrats or Republicans. They are also either male or female.

If we select one senator completely randomly, what is the probability that this person is a female Democrat? Conditional Probability can help us answer that.

Probability of (Democrat and Female Senator) = Prob(Senator is Democrat) multiplied by the conditional probability of being Female given that they are a Democrat:

$$P(\text{Democrat \& Female}) = P(\text{Democrat}) \cdot P(\text{Female} \mid \text{Democrat})$$

We could compute the
exact same thing, the reverse way:

$$P(\text{Democrat \& Female}) = P(\text{Female}) \cdot P(\text{Democrat} \mid \text{Female})$$

Understanding Bayes' Rule

Conceptually, this is a way to go from P(Evidence | Known Outcome) to P(Outcome | Known Evidence). Often, we know how frequently some particular evidence is observed, given a known outcome. We have to use this known fact to compute the reverse: the chance of that outcome happening, given the evidence.

P(Outcome given that we know some Evidence) = P(Evidence given that we know the Outcome) times Prob(Outcome), scaled by P(Evidence).

The classic example to understand Bayes' Rule:

$$P(\text{Disease D} \mid \text{Test positive}) = \frac{P(\text{Test is positive} \mid \text{Disease}) \cdot P(\text{Disease})}{P(\text{Testing positive, with or without the disease})}$$

Now, all this was just preamble, to get to Naive Bayes.

Getting to Naive Bayes

So far, we have talked only about one piece of evidence. In reality, we have to predict an outcome given multiple pieces of evidence. In that case, the math gets very complicated. To get around that complication, one approach is to 'uncouple' multiple pieces of evidence, and to treat each piece of evidence as independent. This approach is why this is called naive Bayes.

$$P(\text{Outcome} \mid \text{Multiple Evidence}) = \frac{P(\text{Evidence}_1 \mid \text{Outcome}) \cdot P(\text{Evidence}_2 \mid \text{Outcome}) \cdots P(\text{Evidence}_N \mid \text{Outcome}) \cdot P(\text{Outcome})}{P(\text{Multiple Evidence})}$$

Many people choose to remember this as:

$$P(\text{outcome} \mid \text{evidence}) = \frac{P(\text{Likelihood of Evidence}) \cdot \text{Prior prob of outcome}}{P(\text{Evidence})}$$

Notice a few things about this equation. If P(evidence | outcome) is 1, then we are just multiplying by 1. If P(some particular evidence | outcome) is 0, then the whole probability becomes 0: if you see contradicting evidence, we can rule out that outcome. Since we divide everything by P(Evidence),
we can even get away without calculating it.

The intuition behind multiplying by the prior is that we give high probability to more common outcomes, and low probability to unlikely outcomes. These are also called base rates, and they are a way to scale our predicted probabilities.

How to Apply NaiveBayes to Predict an Outcome

Just run the formula above for each possible outcome. Since we are trying to classify, each outcome is called a class and it has a class label. Our job is to look at the evidence, to consider how likely it is to be this class or that class, and assign a label to each entity. Again, we take a very simple approach: the class that has the highest probability is declared the "winner", and that class label gets assigned to that combination of evidence.

Fruit Example

Let's try it out on an example to increase our understanding. The OP asked for a 'fruit' identification example.

Let's say that we have data on 1000 pieces of fruit. They happen to be Banana, Orange or some Other Fruit. We know 3 characteristics about each fruit: whether it is Long, whether it is Sweet, and whether its color is Yellow.

This is our 'training set.' We will use this to predict the type of any new fruit we encounter.

| Type        | Long | Not Long | Sweet | Not Sweet | Yellow | Not Yellow | Total |
|-------------|-----:|---------:|------:|----------:|-------:|-----------:|------:|
| Banana      |  400 |      100 |   350 |       150 |    450 |         50 |   500 |
| Orange      |    0 |      300 |   150 |       150 |    300 |          0 |   300 |
| Other Fruit |  100 |      100 |   150 |        50 |     50 |        150 |   200 |
| Total       |  500 |      500 |   650 |       350 |    800 |        200 |  1000 |

We can pre-compute a lot of things about our fruit collection.

First, the so-called "Prior" probabilities. If we
didn't know any of the fruit attributes, this would be our guess. These are our base rates: P(Banana) = 0.5 (500/1000), P(Orange) = 0.3, P(Other Fruit) = 0.2.

Probability of "Evidence": P(Long) = 0.5, P(Sweet) = 0.65, P(Yellow) = 0.8.

Probability of "Likelihood": P(Long | Banana) = 0.8, P(Long | Orange) = 0 (Oranges are never long in all the fruit we have seen), ..., P(Yellow | Other Fruit) = 50/200 = 0.25, P(Not Yellow | Other Fruit) = 0.75.

Given a Fruit, how to classify it?

Let's say that we are given the properties of an unknown fruit, and asked to classify it. We are told that the fruit is Long, Sweet and Yellow. Is it a Banana? Is it an Orange? Or is it some Other Fruit?

We can simply run the numbers for each of the 3 outcomes, one by one. Then we choose the highest probability and 'classify' our unknown fruit as belonging to the class that had the highest probability based on our prior evidence (our 1000-fruit training set):

$$P(\text{Banana} \mid \text{Long, Sweet, Yellow}) = \frac{P(\text{Long} \mid \text{Banana}) \cdot P(\text{Sweet} \mid \text{Banana}) \cdot P(\text{Yellow} \mid \text{Banana}) \cdot P(\text{Banana})}{P(\text{Long}) \cdot P(\text{Sweet}) \cdot P(\text{Yellow})} = \frac{0.8 \times 0.7 \times 0.9 \times 0.5}{P(\text{evidence})} = \frac{0.252}{P(\text{evidence})}$$

$$P(\text{Orange} \mid \text{Long, Sweet, Yellow}) = 0$$

$$P(\text{Other Fruit} \mid \text{Long, Sweet, Yellow}) = \frac{P(\text{Long} \mid \text{Other}) \cdot P(\text{Sweet} \mid \text{Other}) \cdot P(\text{Yellow} \mid \text{Other}) \cdot P(\text{Other Fruit})}{P(\text{evidence})} = \frac{(100/200) \times (150/200) \times (50/200) \times (200/1000)}{P(\text{evidence})} = \frac{0.01875}{P(\text{evidence})}$$

By an overwhelming margin (0.252 >> 0.01875), we classify this Sweet/Long/Yellow fruit as likely to be a Banana.

Why is the Bayes Classifier so popular?

Look at what it eventually comes down to: just some counting and multiplication. We can pre-compute all these terms, and so classifying becomes
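easy, quick and efficient.

Let z = 1 / P(evidence). Now we quickly compute the following three quantities:

$$\begin{aligned}
P(\text{Banana} \mid \text{evidence}) &= z \cdot P(\text{Banana}) \cdot P(\text{Evidence}_1 \mid \text{Banana}) \cdot P(\text{Evidence}_2 \mid \text{Banana}) \cdots \\
P(\text{Orange} \mid \text{evidence}) &= z \cdot P(\text{Orange}) \cdot P(\text{Evidence}_1 \mid \text{Orange}) \cdot P(\text{Evidence}_2 \mid \text{Orange}) \cdots \\
P(\text{Other} \mid \text{evidence}) &= z \cdot P(\text{Other}) \cdot P(\text{Evidence}_1 \mid \text{Other}) \cdot P(\text{Evidence}_2 \mid \text{Other}) \cdots
\end{aligned}$$

Assign the class label of whichever is the highest number, and you are done.

Despite the name, Naive Bayes turns out to be excellent in certain applications. Text classification is one area where it really shines.

Hope that helps in understanding the concepts behind the Naive Bayes algorithm.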
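The fruit walkthrough can be sketched in a few lines of Python. This is a minimal illustration, not a general library: `COUNTS` and `naive_bayes_scores` are hypothetical names, and only the positive features (Long, Sweet, Yellow) from the table are encoded, since those are the ones the example asks about. Instead of dividing by P(Long) * P(Sweet) * P(Yellow) as above, the sketch normalizes by the sum of the class scores so the three posteriors add up to 1; either way the predicted class is the same, because the denominator is shared by all classes.

```python
from math import prod

# Counts from the 1000-fruit training table: for each fruit type, the total
# number seen and how many of them were Long, Sweet and Yellow.
COUNTS = {
    "Banana":      {"total": 500, "Long": 400, "Sweet": 350, "Yellow": 450},
    "Orange":      {"total": 300, "Long": 0,   "Sweet": 150, "Yellow": 300},
    "Other Fruit": {"total": 200, "Long": 100, "Sweet": 150, "Yellow": 50},
}
N = sum(c["total"] for c in COUNTS.values())  # 1000 fruits in the training set


def naive_bayes_scores(evidence):
    """Return P(class) * product of P(feature | class) for every class.

    This is the numerator of Bayes' rule; P(evidence) is left out because it
    is the same for every class and does not change which class wins.
    """
    scores = {}
    for fruit, c in COUNTS.items():
        prior = c["total"] / N                                   # e.g. P(Banana) = 0.5
        likelihoods = [c[feature] / c["total"] for feature in evidence]
        scores[fruit] = prior * prod(likelihoods)                # naive independence
    return scores


scores = naive_bayes_scores(["Long", "Sweet", "Yellow"])
total = sum(scores.values())
for fruit, s in scores.items():
    print(f"{fruit:12s} score={s:.5f}  normalized={s / total:.3f}")
print("Prediction:", max(scores, key=scores.get))
```

The scores match the hand calculation (0.252 for Banana, 0 for Orange, 0.01875 for Other Fruit). `math.prod` requires Python 3.8 or later.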

User · Answer

I'll try to explain the Bayes rule with an example.

What is the chance that a random person selected from the society is a smoker?

You may reply 10%, and let's assume that's right.

Now, what if I say that the random person is a man and is 15 years old?

You may say 15% or 20%, but why?

In fact, we try to update our initial guess with new pieces of evidence (P(smoker) vs. P(smoker | evidence)). The Bayes rule is a way to relate these two probabilities:

$$P(\text{smoker} \mid \text{evidence}) = P(\text{smoker}) \cdot \frac{P(\text{evidence} \mid \text{smoker})}{P(\text{evidence})}$$

Each piece of evidence may increase or decrease this chance. For example, the fact that he is a man may increase the chance, provided that this percentage (being a man) among non-smokers is lower. In other words, being a man must be an indicator of being a smoker rather than a non-smoker. Therefore, if a piece of evidence is an indicator of something, it increases the chance.

But how do we know that this is an indicator?

For each feature, you can compare the commonness (probability) of that feature under the given conditions with its commonness alone (P(f | x) vs. P(f)):

$$P(\text{smoker} \mid \text{evidence}) = P(\text{smoker}) \cdot \frac{P(\text{evidence} \mid \text{smoker})}{P(\text{evidence})}$$

For example, if we know that 90% of smokers are men, it's still not enough to say whether being a man is an indicator of being a smoker or not. If the probability of being a man in the society is also 90%, then knowing that someone is a man doesn't help us (90% / 90% = 1). But if men make up 40% of the society but 90% of the smokers, then knowing that someone is a man increases the chance of being a smoker (90% / 40% = 2.25), so it multiplies the initial guess (10%) by 2.25, resulting in 22.5%. However, if the probability of being a man in the society was 95%, then despite the fact that the percentage of men among smokers is high (90%), the evidence that someone is a man decreases the chance of him being a smoker (90% / 95% ≈ 0.95).

So we have:

$$P(\text{smoker} \mid f_1, f_2, f_3, \ldots) = P(\text{smoker}) \cdot \text{contribution of } f_1 \cdot \text{contribution of } f_2 \cdots = P(\text{smoker}) \cdot \frac{P(\text{being a man} \mid \text{smoker})}{P(\text{being a man})} \cdot \frac{P(\text{under 20} \mid \text{smoker})}{P(\text{under 20})}$$

Note that in this formula we assumed that being a man and being under 20 are independent features, so we multiplied them. It means that knowing that someone is under 20 has no effect on guessing whether he is a man or a woman. But it may not be true; for example, maybe most adolescents in a society are men...

To use this formula in a classifier:

The classifier is given some features (being a man and being under 20) and it must decide whether he is a smoker or not (these are the two classes). It uses the above formula to calculate the probability of each class under the evidence (features), and it assigns the class with the highest probability to the input. To provide the required probabilities (90%, 10%, 80%, ...) it uses the training set. For example, it counts the people in the training set who are smokers and finds they make up 10% of the sample. Then, for smokers, it checks how many of them are men or women, how many are above 20 or under 20, and so on. In other words, it tries to build the probability distribution of the features for each class based on the training data.
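The "contribution" idea can be written as a tiny Python function. `naive_update` is a hypothetical helper: it multiplies the prior by P(f | smoker) / P(f) for each feature, which is exactly the independence assumption described above. The percentages are the ones from the smoker example; a second feature such as "under 20" would simply be another (P(f | smoker), P(f)) pair in the list. Because each factor is a ratio rather than a normalized probability, the result is only trustworthy when the independence assumption roughly holds.

```python
def naive_update(prior, evidence):
    """Update P(class) with independent pieces of evidence.

    evidence is a list of (P(f | class), P(f)) pairs; each pair contributes
    the factor P(f | class) / P(f) to the estimate.
    """
    result = prior
    for p_f_given_class, p_f in evidence:
        result *= p_f_given_class / p_f
    return result


p_smoker = 0.10  # initial guess: 10% of the society smokes

# Men are 90% of smokers and 90% of the society: factor 1, no change
print(naive_update(p_smoker, [(0.90, 0.90)]))
# Men are 90% of smokers but only 40% of the society: 10% * 2.25 = 22.5%
print(naive_update(p_smoker, [(0.90, 0.40)]))
# Men are 90% of smokers but 95% of the society: factor below 1, the chance drops
print(naive_update(p_smoker, [(0.90, 0.95)]))
```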

User · Answer

Ram Narasimhan explained the concept very nicely here. Below is an alternative explanation through a code example of Naive Bayes in action. It uses an example problem from this book, on page 351.

This is the data set that we will be using:

In the above dataset, if we give the hypothesis {"Age": '<=30', "Income": "medium", "Student": 'yes', "Creadit_Rating": 'fair'}, then what is the probability that he will buy or will not buy a computer? The code below answers exactly that question.

Just create a file named new_dataset.csv and paste the following content:

```
Age,Income,Student,Creadit_Rating,Buys_Computer
<=30,high,no,fair,no
<=30,high,no,excellent,no
31-40,high,no,fair,yes
>40,medium,no,fair,yes
>40,low,yes,fair,yes
>40,low,yes,excellent,no
31-40,low,yes,excellent,yes
<=30,medium,no,fair,no
<=30,low,yes,fair,yes
>40,medium,yes,fair,yes
<=30,medium,yes,excellent,yes
31-40,medium,no,excellent,yes
31-40,high,yes,fair,yes
>40,medium,no,excellent,no
```

Here is the code; the comments explain everything we are doing here (updated for Python 3):

```python
from functools import reduce  # reduce is no longer a builtin in Python 3
import pprint

import pandas as pd


class Classifier:
    def __init__(self, filename=None, class_attr=None):
        self.data = pd.read_csv(filename, sep=',', header=0)
        self.class_attr = class_attr
        # Per-instance dicts (class-level dicts would be shared by every Classifier)
        self.priori = {}
        self.cp = {}
        self.hypothesis = None

    def calculate_priori(self):
        '''
            probability(class) =    How many times it appears in column
                                 __________________________________________
                                      count of all class attribute
        '''
        class_values = list(set(self.data[self.class_attr]))
        class_data = list(self.data[self.class_attr])
        for i in class_values:
            self.priori[i] = class_data.count(i) / float(len(class_data))
        print("Priori Values: ", self.priori)

    def get_cp(self, attr, attr_type, class_value):
        '''
            Here we calculate the individual probabilities P(attr == attr_type | class_value)
            used in:
            P(outcome|evidence) =   P(Likelihood of Evidence) x Prior prob of outcome
                                   ___________________________________________
                                                        P(Evidence)
        '''
        data_attr = list(self.data[attr])
        class_data = list(self.data[self.class_attr])
        # NOTE: counting starts at 1, so every count gets +1 (a crude smoothing that
        # avoids zero probabilities but does not adjust the denominator).
        # The output below reflects this; start at 0 for plain relative frequencies.
        total = 1
        for i in range(0, len(data_attr)):
            if class_data[i] == class_value and data_attr[i] == attr_type:
                total += 1
        return total / float(class_data.count(class_value))

    def calculate_conditional_probabilities(self, hypothesis):
        '''
            Here we calculate the Likelihood of Evidence for every class; classify()
            then multiplies all individual probabilities with the priori:
            P(Outcome|Multiple Evidence) = P(Evidence1|Outcome) x P(Evidence2|Outcome) x ... x P(EvidenceN|Outcome) x P(Outcome)
            scaled by P(Multiple Evidence)
        '''
        for i in self.priori:
            self.cp[i] = {}
            for j in hypothesis:
                # keyed by the attribute value, e.g. '<=30' (assumes values are distinct across attributes)
                self.cp[i].update({hypothesis[j]: self.get_cp(j, hypothesis[j], i)})
        print("\nCalculated Conditional Probabilities: \n")
        pprint.pprint(self.cp)

    def classify(self):
        print("Result: ")
        for i in self.cp:
            # product of the likelihoods times the prior; P(evidence) is skipped
            # because it is the same for every class
            print(i, " ==> ", reduce(lambda x, y: x * y, self.cp[i].values()) * self.priori[i])


if __name__ == "__main__":
    c = Classifier(filename="new_dataset.csv", class_attr="Buys_Computer")
    c.calculate_priori()
    c.hypothesis = {"Age": '<=30', "Income": "medium", "Student": 'yes', "Creadit_Rating": 'fair'}
    c.calculate_conditional_probabilities(c.hypothesis)
    c.classify()
```

Output:

```
Priori Values:  {'yes': 0.6428571428571429, 'no': 0.35714285714285715}

Calculated Conditional Probabilities: 

{'no': {'<=30': 0.8, 'fair': 0.6, 'medium': 0.6, 'yes': 0.4},
 'yes': {'<=30': 0.3333333333333333,
         'fair': 0.7777777777777778,
         'medium': 0.5555555555555556,
         'yes': 0.7777777777777778}}

Result: 
yes  ==>  0.0720164609053
no  ==>  0.0411428571429
```

Hope it helps in better understanding the problem. Peace.
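Because `get_cp` starts counting at 1, the numbers printed above are not the plain relative frequencies (for example, it reports P(Age <= 30 | yes) as 3/9 although only 2 of the 9 'yes' rows have Age <= 30). As a cross-check, here is a minimal pure-Python sketch; `score`, `ROWS` and `HYPOTHESIS` are hypothetical names, and the rows are copied from new_dataset.csv above. With `smoothing=0` it gives the unsmoothed estimates (about 0.028 for yes and 0.0069 for no); with `smoothing=1` it applies standard Laplace (add-one) smoothing, which also enlarges the denominator by the number of distinct values of the attribute. In every variant the prediction is still 'yes'.

```python
from collections import Counter

COLUMNS = ("Age", "Income", "Student", "Creadit_Rating", "Buys_Computer")
ROWS = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31-40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31-40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31-40", "medium", "no", "excellent", "yes"),
    ("31-40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]
HYPOTHESIS = {"Age": "<=30", "Income": "medium", "Student": "yes", "Creadit_Rating": "fair"}


def score(hypothesis, rows, smoothing=0):
    """Unnormalized naive Bayes score P(class) * prod P(attr = value | class) per class.

    smoothing=0 uses plain relative frequencies; smoothing=1 is Laplace (add-one)
    smoothing: +1 on every count and +(number of distinct values) on the denominator.
    """
    class_idx = len(COLUMNS) - 1
    class_counts = Counter(row[class_idx] for row in rows)
    scores = {}
    for cls, cls_count in class_counts.items():
        p = cls_count / len(rows)  # prior P(class)
        for attr, value in hypothesis.items():
            idx = COLUMNS.index(attr)
            matches = sum(1 for row in rows if row[class_idx] == cls and row[idx] == value)
            n_values = len({row[idx] for row in rows})  # distinct values of this attribute
            p *= (matches + smoothing) / (cls_count + smoothing * n_values)
        scores[cls] = p
    return scores


print(score(HYPOTHESIS, ROWS))               # unsmoothed
print(score(HYPOTHESIS, ROWS, smoothing=1))  # Laplace smoothing
```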

User · Answer

Your question, as I understand it, is divided into two parts: part one being that you need a better understanding of the Naive Bayes classifier, and part two being the confusion surrounding the training set.

In general, all Machine Learning algorithms need to be trained for supervised learning tasks like classification and prediction, or for unsupervised learning tasks like clustering.

During the training step, the algorithms are taught with a particular input dataset (the training set) so that later on we may test them on unknown inputs (which they have never seen before), which they may classify or predict (in the case of supervised learning) based on their learning. This is what most Machine Learning techniques, like Neural Networks, SVM, Bayesian methods, etc., are based upon.

So in a general Machine Learning project, you basically have to divide your input set into a Development Set (Training Set + Dev-Test Set) and a Test Set (or Evaluation Set). Remember that your basic objective is for the system to learn and classify new inputs that it has never seen before in either the Dev set or the test set.

The test set typically has the same format as the training set. However, it is very important that the test set be distinct from the training corpus: if we simply reused the training set as the test set, then a model that simply memorized its input, without learning how to generalize to new examples, would receive misleadingly high scores.

In general, for example, 70% of our data can be used as training set cases. Also remember to partition the original set into the training and test sets randomly.

Now I come to your other question, about Naive Bayes.

To demonstrate the concept of Naïve Bayes Classification, consider the example given below:

As indicated, the objects can be classified as either GREEN or RED. Our task is to classify new cases as they arrive, i.e., decide to which class label they belong, based on the currently existing objects.

Since there are twice
as many GREEN objects as RED, it is reasonable to believe that a new case (which hasn't been observed yet) is twice as likely to have membership GREEN rather than RED. In the Bayesian analysis, this belief is known as the prior probability. Prior probabilities are based on previous experience, in this case the percentage of GREEN and RED objects, and are often used to predict outcomes before they actually happen.

Thus, we can write:

Prior probability of GREEN = number of GREEN objects / total number of objects

Prior probability of RED = number of RED objects / total number of objects

Since there is a total of 60 objects, 40 of which are GREEN and 20 RED, our prior probabilities for class membership are:

Prior probability for GREEN = 40 / 60

Prior probability for RED = 20 / 60

Having formulated our prior probability, we are now ready to classify a new object X (the WHITE circle in the diagram below). Since the objects are well clustered, it is reasonable to assume that the more GREEN (or RED) objects in the vicinity of X, the more likely it is that the new case belongs to that particular color. To measure this likelihood, we draw a circle around X which encompasses a number (to be chosen a priori) of points irrespective of their class labels. Then we calculate the number of points in the circle belonging to each class label. From this we calculate the likelihood.

From the illustration above, it is clear that the likelihood of X given GREEN is smaller than the likelihood of X given RED, since the circle encompasses 1 GREEN object and 3 RED ones.

Although the prior probabilities indicate that X may belong to GREEN (given that there are twice as many GREEN compared to RED), the likelihood indicates otherwise: that the class membership of X is RED (given that there are more RED objects in the vicinity of X than GREEN). In the Bayesian analysis, the final classification is produced by combining both sources of information, i.e., the prior and the likelihood, to form a
posterior probability using the so-called Bayes' rule (named after Rev. Thomas Bayes, 1702-1761).

Finally, we classify X as RED since its class membership achieves the largest posterior probability.
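The GREEN/RED calculation can be reproduced with exact fractions. The likelihood formulas were images in the original, so this sketch assumes the usual definition for this example: the likelihood of X given a class is the number of that class's objects inside the circle divided by the total number of objects of that class (1/40 for GREEN, 3/20 for RED). Multiplying each likelihood by its prior gives scores proportional to the posterior probabilities.

```python
from fractions import Fraction

# Numbers from the example: 60 objects (40 GREEN, 20 RED);
# the circle drawn around X contains 1 GREEN and 3 RED objects.
totals = {"GREEN": 40, "RED": 20}
in_circle = {"GREEN": 1, "RED": 3}
n = sum(totals.values())

posterior_scores = {}
for color in totals:
    prior = Fraction(totals[color], n)                      # 40/60 and 20/60
    likelihood = Fraction(in_circle[color], totals[color])  # ASSUMED definition: 1/40 and 3/20
    posterior_scores[color] = prior * likelihood            # proportional to the posterior

print(posterior_scores)  # {'GREEN': Fraction(1, 60), 'RED': Fraction(1, 20)}
print(max(posterior_scores, key=posterior_scores.get))  # RED
```

Even though GREEN has the larger prior, RED's much larger likelihood wins, which matches the conclusion above.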
