Since it's been a while, I decided to upgrade my ML box to CUDA 9.0. Man, that was fun: lots of googling, multiple visits to the Ubuntu and NVIDIA forums, and reading through several blog posts and Stack Overflow articles. Almost at the end of a long day I am running CUDA 9.0, cuDNN 7, and GPU-enabled TensorFlow 1.5, with models running on Keras 2.1.x.

The short version: almost 80% of the problems were from lingering packages and changes made to the machine during the last install. So the key is to make sure you roll back and remove the old packages cleanly before proceeding. The final step is actually very simple. Good job, NVIDIA!

First we need to remove all the old packages that were installed:

sudo apt-get purge nvidia-* -y 
sudo apt-get purge cuda-* -y
sudo apt-get purge libcuda* -y
sudo apt-get purge libcudnn* -y
sudo apt-get autoremove -y
sudo apt-get autoclean -y
sudo apt-get update

Then remove any repos that you have added:

sudo rm /etc/apt/sources.list.d/nvidia-diag-driver-local-384.66.list
sudo rm /etc/apt/sources.list.d/graphics-drivers-ubuntu-ppa-xenial.list

Then make sure there is nothing left over.

sudo dpkg --list | grep nvidia
sudo dpkg --list | grep cuda
sudo dpkg --list | grep libcudnn

If you find any packages, use dpkg to remove them, e.g.:

sudo dpkg --purge libcudnn5
sudo dpkg --purge cuda-repo-ubuntu1604
sudo dpkg --purge cuda-cudart-8-0 cuda-cudart-dev-8-0 cuda-cufft-8-0 cuda-curand-8-0 cuda-cusolver-8-0 cuda-cusparse-8-0 cuda-npp-8-0 cuda-nvgraph-8-0 cuda-nvrtc-8-0 cuda-toolkit-9-0
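If the grep above still shows a long tail of leftovers, the loop below purges everything whose name matches nvidia, cuda, or cudnn. The package-name pattern is an assumption on my part, so review the grep output before running it.

# purge every installed package matching nvidia/cuda/cudnn (review the list first!)
dpkg --list | awk '/^ii/ && /nvidia|cuda|cudnn/ {print $2}' | xargs -r sudo dpkg --purge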

Revert gcc and g++ to version 5, as the latest Theano and TensorFlow have been updated to work with it:

sudo ln -s /usr/bin/gcc-5 /usr/bin/gcc -f
sudo ln -s /usr/bin/g++-5 /usr/bin/g++ -f
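If you would rather not overwrite the symlinks by hand, update-alternatives can manage the same switch; the priority values below are arbitrary choices, not anything required.

# register both compilers and select the version-5 toolchain (priorities are arbitrary)
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 50
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-5 50
sudo update-alternatives --set gcc /usr/bin/gcc-5
sudo update-alternatives --set g++ /usr/bin/g++-5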

Now reboot the machine. Once it comes back up, make sure there are no old packages left and that the nvidia kernel module is not loaded:

lsmod | grep nvidia

Now install the CUDA repo package and add the CUDA GPG keys before installing the cuda meta package:

sudo dpkg -i cuda-repo-ubuntu1604_9.1.85-1_amd64.deb
sudo apt-key adv --fetch-keys
sudo apt-get update
sudo apt-get install cuda-9-0 -y

This part of the process seems to have been significantly improved: it automatically installed the correct NVIDIA driver (390.30) via the cuda-drivers package and the BLAS library (cuda-cublas-9-0) without any mucking around from the user. It does take a while, though.
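If you are curious which driver and library packages the meta package resolves to, a dry run is harmless and lists them without installing anything:

# simulate the install and show the driver/library packages it would pull in
sudo apt-get install --dry-run cuda-9-0 | grep -E 'cuda-drivers|nvidia-|cublas'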

Once it's complete, go ahead and reboot the machine. Once it's back up, you should have the nvidia module loaded:

lsmod | grep nvidia
nvidia_uvm            761856  4
nvidia_drm             40960  0
nvidia_modeset       1093632  1 nvidia_drm
drm_kms_helper        155648  1 nvidia_drm
drm                   364544  3 drm_kms_helper,nvidia_drm
nvidia              14327808  494 nvidia_modeset,nvidia_uvm
ipmi_msghandler        49152  3 ipmi_ssif,nvidia,ipmi_si

Also run nvidia-smi:

Fri Mar  2 00:20:04 2018
| NVIDIA-SMI 390.30                 Driver Version: 390.30                    |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  GeForce GTX 1070    Off  | 00000000:03:00.0 Off |                  N/A |
|  0%   37C    P0    33W / 166W |      0MiB /  8119MiB |      0%      Default |
|   1  GeForce GTX 1070    Off  | 00000000:04:00.0 Off |                  N/A |
|  0%   39C    P5    15W / 166W |      0MiB /  8119MiB |      2%      Default |

| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|  No running processes found                                                 |

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

If any of the above does not work, remember to update the PATH variables in .bashrc to point at the CUDA 9.0 folders:

export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME="/usr/local/cuda"
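After editing .bashrc, reload it in the current shell and confirm the toolchain now resolves to the 9.0 install:

# pick up the new variables and check that nvcc comes from the CUDA 9.0 toolkit
source ~/.bashrc
which nvcc    # expect /usr/local/cuda-9.0/bin/nvcc
nvcc -V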

I will talk about installing theano/tensorflow and keras in another post.

I finally upgraded from my previous GTX 980 Ti to a GTX 1070 last week; unfortunately, that meant revisiting some of my previous issues with Ubuntu and various incompatibilities among the graphics drivers and CUDA components. In any case, I decided that this time I would document some of this stuff more cleanly so I can refer to it later.

My Setup:

My previous primary desktop, currently re-purposed for machine learning and Docker experiments:


  • Intel(R) Core(TM) i7-3770K CPU
  • ASUS MAXIMUS IV GENE-Z Motherboard
  • Nvidia GTX 1070


  • Ubuntu 16.04
  • Nvidia driver 367.35
  • CUDA 8.0 RC
  • Anaconda/Theano/Keras native and as docker containers


After replacing the 980 Ti card with the 1070, I rebooted the machine and it just went into a crash-and-backtrace loop, as the previously installed driver (nvidia-352) did not support the GTX 1070.

Step one was recovering the system: boot into recovery mode, load networking (optional), and drop into a root shell.

sudo apt-get purge nvidia-*
sudo apt-get autoremove
sudo reboot

This will remove the previous nvidia drivers and dependencies and allow you to do a fresh install of the drivers.

If you haven't done so already, make sure you are running gcc 4.9 to avoid compile errors with Theano:

sudo apt-get install gcc-4.9 g++-4.9
sudo ln -s  /usr/bin/gcc-4.9 /usr/bin/gcc -f
sudo ln -s  /usr/bin/g++-4.9 /usr/bin/g++ -f
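A quick check that the symlinks took effect:

gcc --version | head -n 1    # should report 4.9.x
g++ --version | head -n 1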

Download the CUDA 8.0 RC runfile (local installer). When installing CUDA 8.0, decline the bundled NVIDIA drivers, then reboot.
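A sketch of the runfile invocation, assuming the download is named cuda_8.0.xx_linux.run (substitute the exact filename you downloaded); answer no when the installer offers to install the bundled driver:

# filename is a placeholder; decline the driver prompt, accept the toolkit
sudo sh cuda_8.0.xx_linux.run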

Install the NVIDIA 367.35 drivers:

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-367
sudo reboot

At this point you should be able to run nvidia-smi and get some results like this

Thu Aug 4 01:19:40 2016
| NVIDIA-SMI 367.35 Driver Version: 367.35 |
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| 0 GeForce GTX 1070 Off | 0000:01:00.0 Off | N/A |
| 0% 40C P8 11W / 166W | 103MiB / 8112MiB | 0% Default |

| Processes: GPU Memory |
| GPU PID Type Process name Usage |
| 0 4921 C ...y/anaconda3/envs/keras104_py27/bin/python 101MiB |

At this point, make sure the binary and library paths are set up correctly and that nvcc is working fine; adding the following to your .bashrc should do the trick.

export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

nvcc -V should give you

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Wed_May__4_21:01:56_CDT_2016
Cuda compilation tools, release 8.0, V8.0.26

You should then be able to run the example here and get "Used the gpu" as output.
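If you just want a quick smoke test from the shell instead, asking Theano which device it is configured for works too (this assumes Theano is installed in the active environment and .theanorc or THEANO_FLAGS points it at the GPU):

# prints the configured device; expect "gpu" if Theano is set up for the GPU backend
python -c "import theano; print(theano.config.device)"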

My several hours of research in a 10min post 🙂

I was re-purposing my old desktop for machine learning with GPU support for Theano and Keras. I ran into several issues and ended up writing some code to work around some of them and make others easier and more manageable. Someday I will write a more detailed series of articles on how I did that and what I learned in the process, but today I just wanted to document the last steps I ran into when trying to convert my Anaconda-based ad hoc IPython notebook server into a persistent service.

The initial logic came from this blog post. However, I ran into several issues: first, the config is for native IPython, not Anaconda-based IPython, and it did not pull in the environment variables Theano needs to find nvcc for the compiler optimizations required for GPU support.

Here is the final config of the file /etc/systemd/system/ipython-nb-srv.service

Description=Jupyter Notebook Server


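A minimal sketch of what such a unit can look like; the user name, Anaconda env name, notebook profile, and CUDA paths below are all placeholders to adjust for your own setup, not my exact config. The important part is exporting PATH and LD_LIBRARY_PATH so Theano can find nvcc when the service starts.

# write a sketch unit file; the user, env name, and every path are placeholders
sudo tee /etc/systemd/system/ipython-nb-srv.service > /dev/null <<'EOF'
[Unit]
Description=Jupyter Notebook Server
After=network.target

[Service]
Type=simple
User=mluser
# expose the Anaconda env and the CUDA toolkit so Theano can find nvcc
Environment=PATH=/home/mluser/anaconda3/envs/keras_py27/bin:/usr/local/cuda-8.0/bin:/usr/bin:/bin
Environment=LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64
ExecStart=/home/mluser/anaconda3/envs/keras_py27/bin/ipython notebook --profile=nbserver
WorkingDirectory=/home/mluser/notebooks
Restart=always

[Install]
WantedBy=multi-user.target
EOF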

After this you do

systemctl daemon-reload
systemctl enable ipython-nb-srv
systemctl start ipython-nb-srv
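If the service does not come up, check its status and the journal; path and environment problems show up there:

systemctl status ipython-nb-srv
journalctl -u ipython-nb-srv -n 50    # last 50 lines of the unit's log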

Some of the previous high-level steps are

  1. Install cuda packages for ubuntu
  2. Install Anaconda
  3. Create a py env
  4. Install the required packages (Theano, keras, numpy, scipy, ipython-notebook etc)
  5. Create .theanorc to make sure Theano uses the GPU (a minimal sketch follows this list)
  6. Create .ipython notebook profile to run as server
  7. Create ipython notebook server service
  8. Enjoy your ipython notebooks from a Chromebook or Windows machine 🙂
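For step 5, here is a minimal ~/.theanorc sketch; the device name and CUDA root are assumptions for the old Theano GPU backend and should be adjusted to your install:

# write a minimal .theanorc (device and CUDA root are assumptions)
cat > ~/.theanorc <<'EOF'
[global]
device = gpu
floatX = float32

[cuda]
root = /usr/local/cuda-8.0
EOF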

I was doing an upgrade on my VNX5200 at work, and halfway through, the RDP session got disconnected. When I connected back, the Unisphere client was hung. I knew from the screen that it had gone all the way to the end and was waiting for me to do a post-install commit of the code, but I couldn't, and had to kill the runaway Java client. Now I didn't know how to commit the code; I tried restarting the upgrade process, and it rightfully said there was nothing to upgrade and exited. Then I searched for a few minutes before running into this article on the EMC community.

Here is the answer updated to my setup:

1) Log into Unisphere Manager
2) Right-click on array icon (by default, it is the serial number of the array)
3) Choose “Properties”
4) Select the “Software” tab
5) Highlight the package “VNX-Block-Operating-Environment”
6) Click on the “Commit” button


There was some confusion on how to specify multiple DNS server IP addresses or domain search names with the Set-VMHostNetwork cmdlet. It turns out it is a simple comma-separated list that gets treated as a parameter array. Here is an example.

Connect-VIServer vCenterServerFQDNorIP
$ESXiHosts = Get-VMHost
foreach ($esx in $ESXiHosts) {
     # domainName and the dnsAddress values are placeholders for your own settings
     Get-VMHostNetwork -VMHost $esx | Set-VMHostNetwork -DomainName domainName -DnsAddress dnsAddress1,dnsAddress2
}

Or as a one-liner:

Get-VMHost | Get-VMHostNetwork | Set-VMHostNetwork -DomainName domainName -DnsAddress dnsAddress1,dnsAddress2