What are advantages of Artificial Neural Networks over Support Vector Machines?

Question

ANN (Artificial Neural Networks) and SVM (Support Vector Machines) are two popular strategies for supervised machine learning and classification. It's not often clear which method is better for a particular project, and I'm certain the answer is always "it depends." Often, a combination of both along with Bayesian classification is used.

These questions on Stack Overflow have already been asked regarding ANN vs. SVM: "ANN and SVM classification", "what the difference among ANN, SVM and KNN in my classification question", and "Support Vector Machine or Artificial Neural Network for text processing?"

In this question, I'd like to know specifically what aspects of an ANN (specifically, a Multilayer Perceptron) might make it desirable to use over an SVM. The reason I ask is because it's easy to answer the opposite question: Support Vector Machines are often superior to ANNs because they avoid two major weaknesses of ANNs:

1. ANNs often converge on local minima rather than global minima, meaning that they are essentially "missing the big picture" sometimes (or missing the forest for the trees).
2. ANNs often overfit if training goes on too long, meaning that for any given pattern, an ANN might start to consider the noise as part of the pattern.

SVMs don't suffer from either of these two problems. However, it's not readily apparent that SVMs are meant to be a total replacement for ANNs. So what specific advantage(s) does an ANN have over an SVM that might make it applicable for certain situations? I've listed specific advantages of an SVM over an ANN; now I'd like to see a list of ANN advantages, if any.

User · Answer

One answer I'm missing here: a multi-layer perceptron is able to find relations between features. For example, this is necessary in computer vision when a raw image is provided to the learning algorithm and no sophisticated features are calculated beforehand. Essentially, the intermediate layers can calculate new, previously unknown features.
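A minimal sketch of that idea, using the classic XOR problem as a toy stand-in for raw image inputs (an assumption for illustration). The two hidden units below use hypothetical, hand-set weights rather than trained ones; they compute OR and AND of the raw inputs, and the output unit only has to combine those new features. The grid search at the end checks that no single linear threshold unit over the raw inputs reproduces XOR.

```python
import itertools

def step(z):
    # Threshold activation: 1 if the weighted sum is positive, else 0
    return 1 if z > 0 else 0

def hidden_layer(x1, x2):
    # Hypothetical hand-set weights; backpropagation would normally learn these
    h_or = step(x1 + x2 - 0.5)   # new feature: "at least one input is on"
    h_and = step(x1 + x2 - 1.5)  # new feature: "both inputs are on"
    return h_or, h_and

def mlp_xor(x1, x2):
    # The output unit is linear in the hidden features: OR and not AND
    h_or, h_and = hidden_layer(x1, x2)
    return step(h_or - h_and - 0.5)

def single_unit_can_fit_xor():
    # Search a grid of weights for one linear threshold unit on the raw inputs
    grid = [i / 2 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
    for w1, w2, b in itertools.product(grid, repeat=3):
        if all(step(w1 * a + w2 * c + b) == (a ^ c)
               for a, c in itertools.product((0, 1), repeat=2)):
            return True
    return False

if __name__ == "__main__":
    for a, c in itertools.product((0, 1), repeat=2):
        print(a, c, "->", mlp_xor(a, c))
    print("single unit fits XOR:", single_unit_can_fit_xor())
```

The search always reports `False`, since XOR is not linearly separable; only the hidden-layer features make the problem solvable by a linear output unit.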

User · Answer

One thing to note is that the two are actually very closely related. Linear SVMs are equivalent to single-layer NNs (i.e., perceptrons), and multi-layer NNs can be expressed in terms of SVMs. See here for some details.
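To make the linear case concrete, here is a sketch assuming a tiny hypothetical 2-D dataset with labels in {-1, +1}. Both models below compute the same single-layer decision function, sign(w·x + b); they differ only in how each training example updates w and b. The linear SVM is trained by stochastic subgradient descent on the L2-regularized hinge loss (a Pegasos-style update chosen for brevity instead of a QP solver), and the step sizes `lr` and `lam` are arbitrary illustrative values.

```python
import random

def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    # Shared single-layer decision function for both models: sign(w . x + b)
    return 1 if score(w, b, x) > 0 else -1

def perceptron_step(w, b, x, y, lr=1.0):
    # Perceptron rule: change the weights only when the example is misclassified
    if y * score(w, b, x) <= 0:
        w = [wi + lr * y * xi for wi, xi in zip(w, x)]
        b += lr * y
    return w, b

def linear_svm_step(w, b, x, y, lr=0.1, lam=0.01):
    # Subgradient step on lam/2 * ||w||^2 + max(0, 1 - y * (w . x + b))
    margin = y * score(w, b, x)
    w = [(1 - lr * lam) * wi for wi in w]  # apply the L2 regularizer (shrinks w)
    if margin < 1:  # hinge term is active inside the margin, not only on errors
        w = [wi + lr * y * xi for wi, xi in zip(w, x)]
        b += lr * y
    return w, b

def train(update, data, epochs=20, seed=0):
    # Online training: one update per example, visiting examples in shuffled order
    rng = random.Random(seed)
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x, y in order:
            w, b = update(w, b, x, y)
    return w, b

# Hypothetical, linearly separable toy data with labels in {-1, +1}
DATA = [
    ([2.0, 2.0], 1), ([3.0, 1.0], 1), ([2.0, 3.0], 1),
    ([-2.0, -1.0], -1), ([-3.0, -2.0], -1), ([-1.0, -3.0], -1),
]

if __name__ == "__main__":
    for name, update in [("perceptron", perceptron_step), ("linear SVM", linear_svm_step)]:
        w, b = train(update, DATA)
        print(name, [predict(w, b, x) for x, _ in DATA])
```

The only structural difference is the update condition (`<= 0` versus `< 1`) plus the weight shrinkage, which is where the SVM's margin maximization comes from.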

User · Answer

One obvious advantage of artificial neural networks over support vector machines is that artificial neural networks may have any number of outputs, while support vector machines have only one. The most direct way to create an n-ary classifier with support vector machines is to create n support vector machines and train each of them one by one. On the other hand, an n-ary classifier with neural networks can be trained in one go. Additionally, the neural network will make more sense because it is one whole, whereas the support vector machines are isolated systems. This is especially useful if the outputs are inter-related.

For example, if the goal was to classify hand-written digits, ten support vector machines would do. Each support vector machine would recognize exactly one digit and fail to recognize all others. Since each handwritten digit is not meant to carry any information beyond its class, it makes no sense to try to solve this with an artificial neural network.

However, suppose the goal was to model a person's hormone balance (for several hormones) as a function of easily measured physiological factors such as time since last meal, heart rate, etc. Since these factors are all inter-related, artificial neural network regression makes more sense than support vector machine regression.
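A sketch of the "n machines trained one by one" construction (one-vs-rest), assuming a hypothetical three-cluster 2-D dataset. Each binary machine is a linear classifier trained with hinge-loss subgradient steps; with the default `lam=0.0` it is a simplified, unregularized stand-in for a linear SVM, and setting `lam > 0` adds the SVM's L2 regularizer. One machine per class is trained in isolation, and prediction picks the class whose machine returns the largest score. A neural network would instead fit all n outputs jointly in a single optimization.

```python
import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_binary_svm(data, epochs=50, lr=0.1, lam=0.0, seed=0):
    # Hinge-loss subgradient steps; lam > 0 turns on the L2 term of the SVM objective
    rng = random.Random(seed)
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x, y in order:
            margin = y * (dot(w, x) + b)
            w = [(1 - lr * lam) * wi for wi in w]  # regularizer shrink (no-op if lam == 0)
            if margin < 1:                           # hinge loss is active
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def train_one_vs_rest(samples, classes):
    # n isolated machines, each trained on its own: "class c" vs "everything else"
    models = {}
    for c in classes:
        relabeled = [(x, 1 if label == c else -1) for x, label in samples]
        models[c] = train_binary_svm(relabeled)
    return models

def predict(models, x):
    # Choose the class whose machine gives the highest score w . x + b
    return max(models, key=lambda c: dot(models[c][0], x) + models[c][1])

# Hypothetical toy data: three well-separated clusters in 2-D
CLASSES = ["A", "B", "C"]
SAMPLES = [
    ([5.0, 0.0], "A"), ([6.0, 1.0], "A"), ([5.0, -1.0], "A"),
    ([-3.0, 4.0], "B"), ([-4.0, 5.0], "B"), ([-2.0, 5.0], "B"),
    ([-3.0, -4.0], "C"), ([-4.0, -5.0], "C"), ([-2.0, -5.0], "C"),
]

if __name__ == "__main__":
    models = train_one_vs_rest(SAMPLES, CLASSES)
    print([predict(models, x) for x, _ in SAMPLES])
```

Note that the n machines never see each other's scores during training; the argmax that ties them together is applied only afterwards.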

User · Answer

If you want to use a kernel SVM, you have to guess the kernel. However, ANNs are universal approximators, and the only guessing to be done is the width (approximation accuracy) and height, i.e. depth (approximation efficiency). If you design the optimization problem correctly, you do not over-fit (please see the bibliography on over-fitting). It also depends on whether the training examples scan the search space correctly and uniformly. Width and depth discovery is the subject of integer programming.

Suppose you have bounded functions $f(\cdot)$ and bounded universal approximators $U(\cdot, a)$ on $I = [0,1]$, with range again $I = [0,1]$ for example, that are parametrized by a real sequence $a$ of compact support, with the property that there exists a sequence of sequences $a^{(k)}$ with

$$\lim_{k \to \infty} \sup_{x \in I} \left| f(x) - U\big(x, a^{(k)}\big) \right| = 0,$$

and you draw examples and tests $(x, y)$ with a distribution $D$ on $I \times I$.

For a prescribed support, what you do is find the best $a$ such that

$$\sum_{l=1}^{N} \big( y_l - U(x_l, a) \big)^2$$

is minimal. Let this $a = \hat{a}$, which is a random variable! The over-fitting is then the average, using $D$ and $D^N$, of $\big( y - U(x, \hat{a}) \big)^2$:

$$\mathbb{E}_{\{(x_l, y_l)\}_{l=1}^{N} \sim D^N} \; \mathbb{E}_{(x, y) \sim D} \Big[ \big( y - U(x, \hat{a}) \big)^2 \Big].$$

Let me explain why: if you select $\hat{a}$ such that the training error is minimized, then for a rare set of values you have a perfect fit. However, since they are rare, the average is never 0. You want to minimize the second quantity, although you only have a discrete approximation to $D$. And keep in mind that the support length is free.
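The two quantities in this answer can be estimated numerically. The sketch below makes several assumptions not in the answer: the distribution D is a continuous target f(x) = 0.5 + 0.4·sin(2πx) plus Gaussian noise (clipped to I), and U(·, a) is a piecewise-constant approximator with k equal-width bins, which converges uniformly to any continuous f as k grows but is not an ANN. For each k, the least-squares fit â (the mean of y in each bin) is computed from N training draws, and the average of (y - U(x, â))² under D is approximated with fresh draws. Sample sizes and the noise level are arbitrary.

```python
import math
import random

def target(x):
    # Hypothetical bounded, continuous f mapping [0, 1] into [0.1, 0.9]
    return 0.5 + 0.4 * math.sin(2 * math.pi * x)

def draw(rng, n, noise=0.05):
    # Draw n pairs (x, y) from the assumed distribution D on I x I
    pairs = []
    for _ in range(n):
        x = rng.random()
        y = min(1.0, max(0.0, target(x) + rng.gauss(0.0, noise)))
        pairs.append((x, y))
    return pairs

def bin_index(x, k):
    return min(int(x * k), k - 1)

def fit_piecewise_constant(samples, k):
    # Least-squares fit of U(x, a) with a = k bin heights: the mean of y in each bin
    sums, counts = [0.0] * k, [0] * k
    for x, y in samples:
        b = bin_index(x, k)
        sums[b] += y
        counts[b] += 1
    fallback = sum(y for _, y in samples) / len(samples)  # used for empty bins
    return [sums[b] / counts[b] if counts[b] else fallback for b in range(k)]

def mse(a, samples):
    return sum((y - a[bin_index(x, len(a))]) ** 2 for x, y in samples) / len(samples)

def run(n_train=50, n_test=5000, seed=0):
    rng = random.Random(seed)
    train = draw(rng, n_train)
    test = draw(rng, n_test)  # fresh draws approximate the average under D
    results = []
    for k in (1, 2, 4, 8, 16, 32, 64, 128, 256):
        a_hat = fit_piecewise_constant(train, k)
        results.append((k, mse(a_hat, train), mse(a_hat, test)))
    return results

if __name__ == "__main__":
    for k, train_err, test_err in run():
        print(f"k={k:4d}  training MSE={train_err:.4f}  estimated expected MSE={test_err:.4f}")
```

Because the bins are nested as k doubles, the training error can only go down as the approximator gets finer, while the estimate of the expected error under D does not have to follow it; that gap is what the answer calls over-fitting.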

User · Answer

We should also consider that the SVM system can be applied directly to non-metric spaces, such as the set of labeled graphs or strings. In fact, the internal kernel function can be generalized properly to virtually any kind of input, provided that the positive definiteness requirement of the kernel is satisfied. On the other hand, to be able to use an ANN on a set of labeled graphs, explicit embedding procedures must be considered.
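As an illustration for strings, here is a sketch of the p-spectrum kernel, which counts shared substrings of length p. It is positive semi-definite because it is an inner product of explicit count vectors. To keep the example dependency-free, the kernel is plugged into a kernel perceptron rather than an SVM solver (a swapped-in technique); the same kernel values could instead be fed to an SVM that accepts precomputed Gram matrices, such as scikit-learn's `SVC(kernel="precomputed")`. The training strings and labels are hypothetical.

```python
from collections import Counter

def spectrum_features(s, p=2):
    # Explicit feature map: counts of every contiguous substring of length p
    return Counter(s[i:i + p] for i in range(len(s) - p + 1))

def spectrum_kernel(s, t, p=2):
    # Inner product of the two count vectors, so the kernel is positive semi-definite
    fs, ft = spectrum_features(s, p), spectrum_features(t, p)
    return sum(count * ft[g] for g, count in fs.items())

def decision(alpha, strings, labels, kernel, s):
    # Kernel expansion over training strings: sum_i alpha_i * y_i * k(s_i, s)
    return sum(a * y * kernel(si, s) for a, si, y in zip(alpha, strings, labels) if a)

def train_kernel_perceptron(strings, labels, kernel, epochs=10):
    # Dual perceptron: alpha_i counts the mistakes made on training string i
    alpha = [0] * len(strings)
    for _ in range(epochs):
        mistakes = 0
        for i, (s, y) in enumerate(zip(strings, labels)):
            if y * decision(alpha, strings, labels, kernel, s) <= 0:
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alpha

# Hypothetical toy data: label +1 for "ab"-heavy strings, -1 for "cd"-heavy ones
STRINGS = ["abab", "ababab", "aab", "cdcd", "cdcdcd", "ccd"]
LABELS = [1, 1, 1, -1, -1, -1]

if __name__ == "__main__":
    alpha = train_kernel_perceptron(STRINGS, LABELS, spectrum_kernel)
    for s in ["abba", "dcdc"]:
        print(s, decision(alpha, STRINGS, LABELS, spectrum_kernel, s))
```

The strings are never embedded into a fixed-length vector by hand; the learner only ever sees kernel values. This run should print 3 for "abba" and -4 for "dcdc".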

User · Answer

Judging from the examples you provide, I'm assuming that by ANNs you mean multilayer feed-forward networks (FF nets for short), such as multilayer perceptrons, because those are in direct competition with SVMs.

One specific benefit that these models have over SVMs is that their size is fixed: they are parametric models, while SVMs are non-parametric. That is, in an ANN you have a bunch of hidden layers with sizes h1 through hn depending on the number of features, plus bias parameters, and those make up your model. By contrast, an SVM (at least a kernelized one) consists of a set of support vectors, selected from the training set, with a weight for each. In the worst case, the number of support vectors is exactly the number of training samples (though that mainly occurs with small training sets or in degenerate cases), and in general its model size scales linearly with the training set size. In natural language processing, SVM classifiers with tens of thousands of support vectors, each having hundreds of thousands of features, are not unheard of.

Also, online training of FF nets is very simple compared to online SVM fitting, and predicting can be quite a bit faster.

EDIT: all of the above pertains to the general case of kernelized SVMs. Linear SVMs are a special case in that they are parametric and allow online learning with simple algorithms such as stochastic gradient descent.
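The size argument is simple arithmetic. The sketch below assumes a hypothetical 1000-100-10 network and, for the kernel SVM, dense storage of each support vector (real NLP implementations usually store sparse vectors, so actual memory depends on the number of non-zero features). It takes the worst case from the answer, where every training sample becomes a support vector.

```python
def mlp_param_count(layer_sizes):
    # Weights between consecutive layers plus one bias per non-input unit;
    # depends only on the architecture, not on how many training samples were seen
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def kernel_svm_stored_numbers(n_support_vectors, n_features):
    # Each support vector is stored as a dense feature vector, plus one dual
    # weight per support vector and a single bias term
    return n_support_vectors * n_features + n_support_vectors + 1

if __name__ == "__main__":
    mlp = mlp_param_count([1000, 100, 10])  # hypothetical 1000-100-10 network
    for n_train in (1_000, 10_000, 100_000):
        # Worst case from the answer: every training sample becomes a support vector
        svm = kernel_svm_stored_numbers(n_train, 1000)
        print(f"n_train={n_train:>7,}: MLP {mlp:,} numbers, kernel SVM up to {svm:,}")
```

The MLP stays at 101,110 parameters no matter how much data it is trained on, while the worst-case kernel SVM grows from 1,001,001 to 100,100,001 stored numbers as the training set grows from 1,000 to 100,000 samples.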
