Using Keras & Tensorflow with AMD GPU

Question

I'm starting to learn Keras, which I believe is a layer on top of Tensorflow and Theano. However, I only have access to AMD GPUs such as the AMD R9 280X. How can I set up my Python environment such that I can make use of my AMD GPUs through Keras/Tensorflow support for OpenCL? I'm running on OSX.

User · Answer

If you have access to other AMD GPUs, please see here: https://github.com/ROCmSoftwarePlatform/hiptensorflow/tree/hip/rocm_docs

This should get you going in the right direction for Tensorflow on the ROCm platform, but Selly's post about https://rocm.github.io/hardware.html points at the catch with this route: the supported-hardware list. That page is not an exhaustive list. I found out on my own that the Xeon E5 v2 Ivy Bridge works fine with ROCm even though they list v3 or newer. Graphics cards, however, are a bit more picky: gfx8 or newer with a few small exceptions, Polaris and maybe others as time goes on.

UPDATE: It looks like hiptensorflow has an option for OpenCL support during configure. I would say investigate the link even if you don't have a gfx8+ or Polaris GPU, in case the OpenCL implementation works. It is a long-winded process, but an hour or three (depending on hardware) following well-written instructions isn't too much to lose to find out.

User · Answer

The original question on this post was: how to get Keras and Tensorflow to run with an AMD GPU. The answer to this question is as follows:

1. Keras will work if you can make Tensorflow work correctly (optionally within your virtual/conda environment).

2. To get Tensorflow to work on an AMD GPU, as others have stated, one way this could work is to compile Tensorflow to use OpenCL. To do so, read the link below. But for brevity I will summarize the required steps here:

- You will need AMD's proprietary drivers. These are currently only available on Ubuntu 14.04 (the version before Ubuntu decided to change the way the UI is rendered). Support for Ubuntu 16.04 is, at the writing of this post, limited to a few GPUs through AMDProDrivers. Readers who want to do deep learning on AMD GPUs should be aware of this.
- Compiling Tensorflow with OpenCL support also requires you to obtain and install the following prerequisites: OpenCL headers, ComputeCpp.
- After the prerequisites are fulfilled, configure your build. Note that there are 3 options for compiling Tensorflow: Std Tensorflow (stable), Benoit Steiner's Tensorflow-opencl (developmental), and Luke Iwanski's Tensorflow-opencl (highly experimental), which you can pull from GitHub. Also note that if you decide to build from any of the opencl versions, the question to use opencl will be missing because it is assumed that you are using it. Conversely, this means that if you configure from the standard Tensorflow, you will need to select "Yes" when the configure script asks you to use opencl and "NO" for CUDA.
- Then run tests like so:

    bazel test --config=sycl -k --test_timeout 1600 -- //tensorflow/... -//tensorflow/contrib/... -//tensorflow/java/... -//tensorflow/compiler/...

Update: Doing this on my setup takes exceedingly long. The part that takes long is all the tests running. I am not sure what this means, but a lot of my tests are timing out at 1600 seconds. The duration can probably be shortened at the expense of more tests timing out. Alternatively, you can just build Tensorflow without tests. At the time of this writing, running the tests has taken 2 days already.

Or just build the pip package like so:

    bazel build --local_resources 2048,.5,1.0 -c opt --config=sycl //tensorflow/tools/pip_package:build_pip_package

Please actually read the blog post over at Codeplay: Luke Iwanski posted a comprehensive tutorial on how to get Tensorflow to work with OpenCL on March 30th, 2017, so this is a very recent post. There are also some details which I did not write about here.

As indicated in the many posts above, little bits of information are spread throughout the interwebs. What Luke's post adds in terms of value is that all the information was put together into one place, which should make setting up Tensorflow and OpenCL a bit less daunting. I will only provide a link here:

https://www.codeplay.com/portal/03-30-17-setting-up-tensorflow-with-opencl-using-sycl

A slightly more complete walk-through has been posted here:

http://deep-beta.co.uk/setting-up-tensorflow-with-opencl-using-sycl

It differs mainly by explicitly telling the user that he/she needs to:

- create symlinks to a subfolder
- and then actually install Tensorflow via the "python setup.py develop" command.

Note that an alternative approach was mentioned above using tensorflow-cl:

https://github.com/hughperkins/tensorflow-cl

I am unable to discern which approach is better at this time, though it appears that this approach is less active. Fewer issues are posted, and fewer conversations to resolve those issues are happening. There was a major push last year. Additional pushes have ebbed off since November 2016, although Hugh seems to have pushed some updates a few days ago as of the writing of this post. (Update: If you read some of the documentation/readme, this version of Tensorflow now only relies on community support, as the main developer is busy with life.)

UPDATE (2017-04-25): I have some notes based on testing tensorflow-opencl below.

- The future user of this package should note that using opencl means that all the heavy lifting in terms of computing is shifted to the GPU. I mention this because I was personally thinking that the compute workload would be shared between my CPU and iGPU. This means that the power of your GPU is very important (specifically, bandwidth and available VRAM).

Following are some numbers for calculating 1 epoch using the CIFAR10 data set for MY SETUP (A10-7850 with iGPU). Your mileage will almost certainly vary!

- Tensorflow (via pip install): ~1700 s/epoch
- Tensorflow (w/ SSE + AVX): ~1100 s/epoch
- Tensorflow (w/ opencl & iGPU): ~5800 s/epoch

You can see that in this particular case performance is worse. I attribute this to the following factors:

- The iGPU only has 1 GB. This leads to a lot of copying back and forth between CPU and GPU. (OpenCL 1.2 does not have the ability to pass data via pointers yet; instead data has to be copied back and forth.)
- The iGPU only has 512 stream processors (and 32 Gb/s memory bandwidth), which in this case is slower than 4 CPUs using SSE4 + AVX instruction sets.
- The development of tensorflow-opencl is in its beginning stages, and a lot of optimizations in SYCL etc. have not been done yet.

If you are using an AMD GPU with more VRAM and more stream processors, you are certain to get much better performance numbers. I would be interested to read what numbers people are achieving, to know what's possible.

I will continue to maintain this answer if/when updates get pushed.

3. An alternative way is currently being hinted at, which is using AMD's ROCm initiative and the MIOpen (cuDNN equivalent) library. These are/will be open-source libraries that enable deep learning. The caveat is that ROCm support currently only exists for Linux, and that MIOpen has not been released to the wild yet, but Raja (AMD GPU head) has said in an AMA that, using the above, it should be possible to do deep learning on AMD GPUs. In fact, support is planned for not only Tensorflow, but also Caffe2, Caffe, Torch7 and MxNet.
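To put the CIFAR10 timings quoted in this answer side by side, here is a small sketch that computes each configuration's epoch time relative to the stock pip build. The numbers are the approximate figures from the answer (A10-7850 with iGPU); the dictionary keys and the helper name `relative_to` are made up for illustration.

```python
# Approximate seconds per epoch on CIFAR10, as reported in the answer.
SECONDS_PER_EPOCH = {
    "pip install": 1700,
    "SSE + AVX build": 1100,
    "OpenCL + iGPU": 5800,
}


def relative_to(baseline, timings):
    """Return each timing divided by the baseline timing (hypothetical helper)."""
    base = timings[baseline]
    return {name: seconds / base for name, seconds in timings.items()}


ratios = relative_to("pip install", SECONDS_PER_EPOCH)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the pip-install epoch time")
```

On these figures the OpenCL + iGPU run takes roughly 3.4 times as long per epoch as the pip build, and roughly 5.3 times as long as the SSE + AVX build.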

User · Answer

Theano does have support for OpenCL, but it is still in its early stages. The Theano developers themselves are not interested in OpenCL, so it relies on community support.

Most of the operations are already implemented, and it is mostly a matter of tuning and optimizing the given operations.

To use the OpenCL backend you have to build libgpuarray yourself.

From personal experience I can tell you that you will get CPU performance if you are lucky. The memory allocation seems to be very naively implemented, therefore computation will be slow (and will crash when it runs out of memory). But I encourage you to try, and maybe even optimize the code or help by reporting bugs.
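A minimal sketch of how one might point Theano at an OpenCL device once libgpuarray is built. It assumes the libgpuarray device naming `opencl<platform>:<device>` (for example `opencl0:0`); the helper name `theano_opencl_flags` is invented here, and the right platform and device indices depend on your machine. The flags have to be set before Theano is imported.

```python
import os


def theano_opencl_flags(platform=0, device=0, floatx="float32"):
    """Build a THEANO_FLAGS string selecting an OpenCL device (hypothetical helper)."""
    return "device=opencl{}:{},floatX={}".format(platform, device, floatx)


# Must happen before `import theano`, which reads the variable at import time.
os.environ["THEANO_FLAGS"] = theano_opencl_flags()
print(os.environ["THEANO_FLAGS"])

# import theano  # uncomment on a machine with Theano and an OpenCL-enabled libgpuarray
```

If the import fails or Theano falls back to the CPU, the device string is the first thing to check.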

User · Answer

Technically you can if you use something like OpenCL, but Nvidia's CUDA is much better, and OpenCL requires other steps that may or may not work. If you have an AMD GPU, I would recommend using something like Google Colab, where they provide a free Nvidia GPU you can use when coding.

User · Answer

Tensorflow 1.3 has been supported on the AMD ROCm stack:

https://github.com/ROCmSoftwarePlatform/tensorflow

A pre-built docker image has also been posted publicly:

https://hub.docker.com/r/rocm/tensorflow
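Once a ROCm build of Tensorflow is installed (or the docker image is running), a quick sanity check is to ask Tensorflow which devices it can see. This sketch uses `device_lib.list_local_devices()` from `tensorflow.python.client`; the wrapper `visible_devices` is a name chosen here, and it returns an empty list when Tensorflow is not installed so that the snippet runs anywhere. Whether an AMD card actually shows up depends on your ROCm setup.

```python
import importlib.util


def visible_devices():
    """Return (name, device_type) pairs for the devices Tensorflow can see.

    Returns an empty list if Tensorflow is not installed.
    """
    if importlib.util.find_spec("tensorflow") is None:
        return []
    from tensorflow.python.client import device_lib
    return [(d.name, d.device_type) for d in device_lib.list_local_devices()]


devices = visible_devices()
for name, kind in devices:
    print(kind, name)
if not any(kind == "GPU" for _, kind in devices):
    print("No GPU device reported by Tensorflow")
```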

User · Answer

One can use an AMD GPU via the PlaidML Keras backend.

- Fastest: PlaidML is often 10x faster (or more) than popular platforms (like TensorFlow CPU) because it supports all GPUs, independent of make and model. PlaidML accelerates deep learning on AMD, Intel, NVIDIA, ARM, and embedded GPUs.
- Easiest: PlaidML is simple to install and supports multiple frontends (Keras and ONNX currently).
- Free: PlaidML is completely open source and doesn't rely on any vendor libraries with proprietary and restrictive licenses.

For most platforms, getting started with accelerated deep learning is as easy as running a few commands (assuming you have Python (v2 or v3) installed):

    virtualenv plaidml
    source plaidml/bin/activate
    pip install plaidml-keras plaidbench

Choose which accelerator you'd like to use (many computers, especially laptops, have multiple):

    plaidml-setup

Next, try benchmarking MobileNet inference performance:

    plaidbench keras mobilenet

Or, try training MobileNet:

    plaidbench --batch-size 16 keras --train mobilenet

To use it with Keras, set:

    os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"

For more information:

https://github.com/plaidml/plaidml

https://github.com/rstudio/keras/issues/205#issuecomment-348336284
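The backend switch mentioned above has to happen before Keras is imported, because multi-backend Keras reads `KERAS_BACKEND` at import time. A minimal sketch, assuming `plaidml-keras` was installed as described; the function name `use_plaidml_backend` is made up here, and the Keras import is skipped when PlaidML is not present.

```python
import importlib.util
import os


def use_plaidml_backend():
    """Select PlaidML as the Keras backend; return True if PlaidML is installed."""
    os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"
    return importlib.util.find_spec("plaidml") is not None


if use_plaidml_backend():
    import keras  # imported only after the backend has been chosen
    print("Keras backend:", keras.backend.backend())
else:
    print("PlaidML not installed; run `pip install plaidml-keras` first")
```

Putting the environment variable in a helper keeps the ordering explicit: any module that imports Keras earlier in the program would lock in a different backend.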

User · Answer

I'm writing an OpenCL 1.2 backend for Tensorflow at https://github.com/hughperkins/tensorflow-cl

This fork of Tensorflow for OpenCL has the following characteristics:

- it targets any/all OpenCL 1.2 devices. It doesn't need OpenCL 2.0, doesn't need SPIR-V or SPIR, doesn't need Shared Virtual Memory, and so on.
- it's based on an underlying library called "cuda-on-cl", https://github.com/hughperkins/cuda-on-cl
  - cuda-on-cl aims to be able to take any NVIDIA CUDA source code and compile it for OpenCL 1.2 devices. It's a very general goal, and a very general compiler.
- for now, the following functionalities are implemented:
  - per-element operations, using Eigen over OpenCL (more info at https://bitbucket.org/hughperkins/eigen/src/eigen-cl/unsupported/test/cuda-on-cl/?at=eigen-cl)
  - blas / matrix multiplication, using Cedric Nugteren's CLBlast, https://github.com/cnugteren/CLBlast
  - reductions, argmin, argmax, again using Eigen, as per earlier info and links
  - learning, trainers, gradients. At least, the StochasticGradientDescent trainer is working, and the others are committed but not yet tested.
- it is developed on Ubuntu 16.04 (using Intel HD5500 and NVIDIA GPUs) and Mac Sierra (using Intel HD 530 and Radeon Pro 450)

This is not the only OpenCL fork of Tensorflow available. There is also a fork being developed by Codeplay (https://www.codeplay.com), using ComputeCpp, https://www.codeplay.com/products/computesuite/computecpp. Their fork has stronger requirements than my own, as far as I know, in terms of which specific GPU devices it works on. You would need to check the Platform Support Notes (at the bottom of the ComputeCpp page) to determine whether your device is supported. The Codeplay fork is actually an official Google fork, which is here: https://github.com/benoitsteiner/tensorflow-opencl

User · Answer

This is an old question, but since I spent the last few weeks trying to figure it out on my own:

OpenCL support for Theano is hit and miss. They added a libgpuarray back-end which appears to still be buggy (i.e., the process runs on the GPU but the answer is wrong, like 8% accuracy on MNIST for a DL model that gets 95+% accuracy on CPU or nVidia CUDA). Also, because 50-80% of the performance boost on the nVidia stack comes from the CUDNN libraries now, OpenCL will just be left in the dust. (SEE BELOW!)

ROCM appears to be very cool, but the documentation (and even a clear declaration of what ROCM is and what it does) is hard to understand. They're doing their best, but they're 4+ years behind. It does NOT NOT NOT work on an RX550 (as of this writing). So don't waste your time (this is where 1 of the weeks went).

At first, it appears ROCM is a new addition to the driver set (replacing AMDGPU-Pro, or augmenting it), but it is in fact a kernel module and set of libraries that essentially replace AMDGPU-Pro. (Think of this as the equivalent of the Nvidia-381 driver + CUDA + some libraries, kind of.) https://rocm.github.io/dl.html

Honestly, I still haven't tested the performance or tried to get it to work with more recent Mesa drivers yet. I will do that sometime.

Add MiOpen to ROCM, and that is essentially CUDNN. They also have some pretty clear guides for migrating. But better yet, they created "HIP", which is an automagical translator from CUDA/CUDNN to MiOpen. It seems to work pretty well, since they lined the APIs up directly to be translatable. There are concepts that aren't perfect maps, but in general it looks good.

Now, finally, after 3-4 weeks of trying to figure out OpenCL etc., I found this tutorial to help you get started quickly. It is a step-by-step for getting hipCaffe up and running. Unlike with nVidia though, please ensure you have supported hardware! https://rocm.github.io/hardware.html

Think you can get it working without their supported hardware? Good luck. You've been warned. Once you have ROCM up and running (AND RUN THE VERIFICATION TESTS), here is the hipCaffe tutorial. If you got ROCM up, you'll be doing an MNIST validation test within 10 minutes. Sweet! https://rocm.github.io/ROCmHipCaffeQuickstart.html
