It had been a while, so I decided to upgrade my ML box to CUDA 9.0. Man, that was fun: lots of googling, multiple visits to the Ubuntu and NVIDIA forums, and reading several blog posts and Stack Overflow answers. Almost at the end of a long day, I am running CUDA 9.0, cuDNN 7 and GPU-enabled TensorFlow 1.5, with models built in Keras 2.1.x.

The short version: almost 80% of the problems came from lingering packages and changes made to the machine during the last install. So the key is to roll back and remove the old packages cleanly before proceeding. The final step is actually very simple. Good job, NVIDIA!

First, remove all the old packages that are installed.

sudo apt-get purge 'nvidia-*' -y
sudo apt-get purge 'cuda-*' -y
sudo apt-get purge 'libcuda*' -y
sudo apt-get purge 'libcudnn*' -y
sudo apt-get autoremove -y
sudo apt-get autoclean -y
sudo apt-get update
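
A note on the wildcard patterns in the purge commands: an unquoted nvidia-* is first offered to the shell, which expands it against files in the current directory before apt-get ever sees it. The sketch below shows the difference in a scratch directory; the file name nvidia-notes.txt is a made-up example, not something from my machine.

```shell
# Demonstration only: work in a scratch directory so no real files are touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch nvidia-notes.txt            # hypothetical stray file that matches the pattern

unquoted=$(echo nvidia-*)         # the shell expands the glob to the file name
quoted=$(echo 'nvidia-*')         # the pattern is passed through untouched

echo "unquoted: $unquoted"
echo "quoted:   $quoted"

# Clean up the scratch directory.
cd - >/dev/null && rm -r "$demo_dir"
```

Quoting the pattern means apt-get always receives the wildcard itself, whatever directory you happen to be in.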

Then remove any repos that you have added, for example:

sudo rm /etc/apt/sources.list.d/nvidia-diag-driver-local-384.66.list
sudo rm /etc/apt/sources.list.d/graphics-drivers-ubuntu-ppa-xenial.list

Then make sure there is nothing left over.

sudo dpkg --list | grep nvidia
sudo dpkg --list | grep cuda
sudo dpkg --list | grep libcudnn

If you find any packages, use dpkg to remove them, for example:

sudo dpkg --purge libcudnn5
sudo dpkg --purge cuda-repo-ubuntu1604
sudo dpkg --purge cuda-cudart-8-0 cuda-cudart-dev-8-0 cuda-cufft-8-0 cuda-curand-8-0 cuda-cusolver-8-0 cuda-cusparse-8-0 cuda-npp-8-0 cuda-nvgraph-8-0 cuda-nvrtc-8-0 cuda-toolkit-9-0
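
The three grep checks and the manual purge can be folded into one small filter. This is only a sketch: leftover_packages is a hypothetical helper name, and it assumes the usual dpkg --list layout where the second column is the package name.

```shell
# Hypothetical helper: read `dpkg --list` output on stdin and print the names
# of any packages that look like NVIDIA, CUDA or cuDNN leftovers.
leftover_packages() {
    # Column 2 of each package row is the package name; header rows never match.
    awk '$2 ~ /^(nvidia|cuda|libcuda|libcudnn)/ { print $2 }'
}

# Usage on a real machine (review the list before purging anything):
#   dpkg --list | leftover_packages
#   dpkg --list | leftover_packages | xargs -r sudo dpkg --purge
```

An empty result means there is nothing left to clean up.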

Revert gcc and g++ to version 5, as the latest Theano and TensorFlow have been updated.

sudo ln -sf /usr/bin/gcc-5 /usr/bin/gcc
sudo ln -sf /usr/bin/g++-5 /usr/bin/g++

Now reboot the machine and, once it is back up, make sure there are no old packages left and that the nvidia kernel module is not loaded.

lsmod | grep nvidia
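
If you would rather have a yes/no answer than eyeball the grep output, something like the following works. It is a minimal sketch; nvidia_module_loaded is a hypothetical helper that only looks at the first column of lsmod-style text.

```shell
# Hypothetical helper: succeeds (exit 0) if the lsmod-style text on stdin
# lists a module whose name starts with "nvidia", and fails otherwise.
nvidia_module_loaded() {
    # The first column of lsmod output is the module name.
    awk '$1 ~ /^nvidia/ { found = 1 } END { exit !found }'
}

# Usage on a real machine:
#   if lsmod | nvidia_module_loaded; then
#       echo "nvidia module is loaded"
#   else
#       echo "nvidia module is not loaded"
#   fi
```

At this point in the process you want "not loaded"; after the CUDA install and the second reboot you want the opposite.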

Now install the CUDA repo package and add the CUDA GPG keys before installing the CUDA meta package.

sudo dpkg -i cuda-repo-ubuntu1604_9.1.85-1_amd64.deb
sudo apt-key adv --fetch-keys
sudo apt-get update
sudo apt-get install cuda-9-0 -y

This option seems to have been significantly improved. It automatically installed the correct NVIDIA drivers (390.30) via the cuda-drivers package, along with the BLAS package (cuda-cublas-9-0), without any mucking around from the user. It does take a while, though.

Once it's complete, go ahead and reboot the machine. Once it's back up, you should have the nvidia module loaded.

lsmod | grep nvidia
nvidia_uvm            761856  4
nvidia_drm             40960  0
nvidia_modeset       1093632  1 nvidia_drm
drm_kms_helper        155648  1 nvidia_drm
drm                   364544  3 drm_kms_helper,nvidia_drm
nvidia              14327808  494 nvidia_modeset,nvidia_uvm
ipmi_msghandler        49152  3 ipmi_ssif,nvidia,ipmi_si

Also run nvidia-smi and nvcc -V to confirm the driver and toolkit versions.

Fri Mar  2 00:20:04 2018
| NVIDIA-SMI 390.30                 Driver Version: 390.30                    |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  GeForce GTX 1070    Off  | 00000000:03:00.0 Off |                  N/A |
|  0%   37C    P0    33W / 166W |      0MiB /  8119MiB |      0%      Default |
|   1  GeForce GTX 1070    Off  | 00000000:04:00.0 Off |                  N/A |
|  0%   39C    P5    15W / 166W |      0MiB /  8119MiB |      2%      Default |

| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|  No running processes found                                                 |

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
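
To check the toolkit version from a script instead of reading it off the screen, the release number can be pulled out of the nvcc -V output. A small sketch, assuming the "release X.Y," wording shown above; cuda_release is a hypothetical helper name.

```shell
# Hypothetical helper: read `nvcc -V` output on stdin and print the release
# number (for example 9.0) taken from the "release X.Y," part of the last line.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Usage on a real machine:
#   if [ "$(nvcc -V | cuda_release)" = "9.0" ]; then
#       echo "CUDA 9.0 toolkit is on the PATH"
#   fi
```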

If any of the above does not work, remember to update the PATH variables in .bashrc to point to the CUDA 9.0 folder.

export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/include${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME="/usr/local/cuda"
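
The ${PATH:+:${PATH}} part of those exports looks odd at first. It expands to a colon plus the old value only when the variable is already set and non-empty, so you never end up with a stray trailing colon. Here is a small sketch of the same expansion; prepend_dir is a hypothetical function written just for illustration.

```shell
# Hypothetical helper: print a search path with directory $1 placed in front
# of the existing value $2, which may be empty.
prepend_dir() {
    # ${2:+:$2} expands to ":$2" only when $2 is set and non-empty.
    printf '%s\n' "$1${2:+:$2}"
}

prepend_dir /usr/local/cuda-9.0/lib64 ""            # no old value, so no trailing colon
prepend_dir /usr/local/cuda-9.0/bin /usr/bin:/bin   # old value is appended after a colon
```

A trailing colon matters more than it looks: an empty entry in PATH or LD_LIBRARY_PATH is treated as the current directory.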

If you have come this far, installing Theano or TensorFlow is pretty trivial these days thanks to the Anaconda Python distribution. In my case I use the Miniconda installer and then install the required packages and dependencies.

wget --quiet -O ~/
/bin/bash ~/ -b -p /opt/conda
export PATH="/opt/conda/bin:$PATH"
conda install --quiet --yes keras tensorflow theano ipython pandas scipy scikit-learn mkl-service

MKL_THREADING_LAYER is only needed for Theano.
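
Once the packages are in, it is worth confirming that TensorFlow actually sees the GPUs. The sketch below counts GPU entries in the device list that TensorFlow 1.x prints. It assumes the printed list contains a device_type: "GPU" line per GPU, which is how the output looked to me but is not guaranteed across versions; count_gpu_devices is a hypothetical helper name.

```shell
# Hypothetical helper: read the printed TensorFlow device list on stdin and
# print how many entries have device_type "GPU".
count_gpu_devices() {
    awk '/device_type: "GPU"/ { n++ } END { print n + 0 }'
}

# Usage on a real machine (needs the conda environment from above on the PATH):
#   python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())" \
#       | count_gpu_devices
```

On this box the expected answer would be 2, one per GTX 1070.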

One response to “GTX 1070 on Ubuntu 16.04 with Cuda 9.0, Keras, Theano and Tensorflow”

  1. eslam fouda

    thanks for your great post
    I would like you to help me with this message:
    eslam@eslam-GA-MA74GMT-S2:~$ /bin/bash ~/ -b -p /opt/conda
    mkdir: cannot create directory ‘/opt/conda’: Permission denied
    ERROR: Could not create directory: ‘/opt/conda’
    thanks in advance
