I finally upgraded from my previous GTX 980 Ti to a GTX 1070 last week. Unfortunately, that meant revisiting some of my old issues with Ubuntu and various incompatibilities among the graphics drivers and CUDA components. In any case, I decided that this time I would document some of this more cleanly so I can refer to it later.

My Setup:

My previous primary desktop, currently re-purposed for machine learning, Docker experiments, etc.

Hardware:

  • Intel(R) Core(TM) i7-3770K CPU
  • ASUS MAXIMUS IV GENE-Z Motherboard
  • Nvidia GTX 1070

Software:

  • Ubuntu 16.04
  • Nvidia driver 367.35
  • CUDA 8.0 RC
  • Anaconda/Theano/Keras native and as docker containers

Steps:

After replacing the 980 Ti with the 1070, I reloaded the machine and it went straight into a crash-and-backtrace loop, because the previous driver, nvidia-352, did not support the GTX 1070.

Step one was recovering the system: boot into recovery mode, enable networking (optional), and drop into a root shell.

sudo apt-get purge nvidia-*
sudo apt-get autoremove
sudo reboot

This removes the previous NVIDIA drivers and their dependencies and lets you do a fresh install of the drivers.

If you haven't already, make sure you are running gcc 4.9 to avoid compile errors with Theano:

sudo apt-get install gcc-4.9 g++-4.9
sudo ln -s  /usr/bin/gcc-4.9 /usr/bin/gcc -f
sudo ln -s  /usr/bin/g++-4.9 /usr/bin/g++ -f

Download CUDA 8.0 RC and grab the runfile (local) installer. When installing CUDA 8.0, decline the bundled NVIDIA driver, then reboot.
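
For reference, the runfile install looks roughly like this; the filename below is a placeholder, use whatever the download page actually gave you.

# Filename is a placeholder for the runfile you downloaded
sudo sh cuda_8.0*_linux.run
# Decline the bundled NVIDIA driver when prompted, accept the CUDA 8.0 toolkit
sudo reboot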

Install the NVIDIA 367.35 drivers:

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-367
sudo reboot

At this point you should be able to run nvidia-smi and see output like this:

Thu Aug 4 01:19:40 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.35                 Driver Version: 367.35                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070     Off | 0000:01:00.0     Off |                  N/A |
|  0%   40C    P8    11W / 166W |    103MiB /  8112MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      4921    C   ...y/anaconda3/envs/keras104_py27/bin/python   101MiB |
+-----------------------------------------------------------------------------+

At this point, make sure the binary and library paths are set up correctly and that nvcc works; adding the following to your .bashrc should do the trick.

export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Running nvcc -V should give you:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Wed_May__4_21:01:56_CDT_2016
Cuda compilation tools, release 8.0, V8.0.26

and you should be able to run the GPU test example from the Theano docs and get "Used the gpu" as output.
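
A rough adaptation of that script is below; the THEANO_FLAGS values are an assumption, matching the old device=gpu backend that shipped with Theano 0.8.x.

cat > /tmp/gpu_check.py <<'EOF'
from theano import function, shared
import theano.tensor as T
import numpy
import time

# A largish elementwise op so the GPU speedup is visible
x = shared(numpy.random.rand(10 * 30 * 768).astype('float32'))
f = function([], T.exp(x))

t0 = time.time()
for i in range(1000):
    r = f()
print("Looping 1000 times took %f seconds" % (time.time() - t0))

# If any node in the compiled graph is still a CPU Elemwise op, the GPU was not used
if numpy.any([isinstance(node.op, T.Elemwise) for node in f.maker.fgraph.toposort()]):
    print("Used the cpu")
else:
    print("Used the gpu")
EOF

THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python /tmp/gpu_check.py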

My several hours of research, compressed into a 10-minute post 🙂

I was re-purposing my old desktop for machine learning with GPU support for Theano and Keras. I ran into several issues and ended up writing some code to work around some of them and make others easier and more manageable. Someday I will write a more detailed series of articles on how I did that and what I learnt in the process, but today I just wanted to document the last steps I ran into when trying to convert my Anaconda-based ad hoc ipython notebook server into a persistent service.

The initial logic came from this blog post. However, I ran into several issues: first, the config is for a native ipython install, not an Anaconda-based one, and it did not pull in the environment variables Theano needs to find nvcc and enable the compiler optimizations required for GPU support.

Here is the final config in /etc/systemd/system/ipython-nb-srv.service:

[Unit]
Description=Jupyter Notebook Server

[Service]
Type=simple
Environment="PATH=/home/ipynbusr/anaconda3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"
ExecStart=/home/ipynbusr/anaconda3/bin/jupyter-notebook
User=ipynbusr
Group=ipynbusr
WorkingDirectory=/home/ipynbusr

[Install]
WantedBy=multi-user.target
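
One thing to double-check: the Environment line above does not include the CUDA paths from the .bashrc exports earlier. If Theano running under this service cannot find nvcc, adding something along these lines to the [Service] section may help (treat it as an untested sketch rather than part of the config above):

Environment="PATH=/usr/local/cuda-8.0/bin:/home/ipynbusr/anaconda3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
Environment="LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64"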

After this, run:

systemctl daemon-reload
systemctl enable ipython-nb-srv
systemctl start ipython-nb-srv
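
If the notebook doesn't come up, the usual systemd checks apply:

systemctl status ipython-nb-srv
journalctl -u ipython-nb-srv -f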

For reference, the earlier high-level steps were:

  1. Install the CUDA packages for Ubuntu
  2. Install Anaconda
  3. Create a Python env
  4. Install the required packages (Theano, Keras, numpy, scipy, ipython-notebook, etc.)
  5. Create a .theanorc to make sure Theano uses the GPU (see the sketch after this list)
  6. Create an ipython notebook profile to run it as a server
  7. Create the ipython notebook server service
  8. Enjoy your ipython notebooks from a Chromebook or Windows machine 🙂
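
For step 5, a minimal .theanorc looks something like this (device=gpu is the old Theano backend name, and the cuda root assumes the install path from the earlier steps, so adjust both to your setup):

[global]
floatX = float32
device = gpu

[nvcc]
fastmath = True

[cuda]
root = /usr/local/cuda-8.0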

I was doing an upgrade on my VNX5200 at work, and halfway through, the RDP session got disconnected. When I connected back, the Unisphere client was hung. I could tell from the screen that the upgrade had gone all the way to the end and was waiting for me to do a post-install commit of the code, but I couldn't, and I had to kill the runaway Java client. Now I didn't know how to commit the code. I tried restarting the upgrade process; it rightfully said there was nothing to upgrade and exited. Then I searched for a few minutes before running into this article on the EMC community: https://community.emc.com/thread/123829?start=0&tstart=0

Here is the answer updated to my setup:

1) Log into Unisphere Manager
2) Right-click on array icon (by default, it is the serial number of the array)
3) Choose “Properties”
4) Select the “Software” tab
5) Highlight the package “VNX-Block-Operating-Environment”
6) Click on the “Commit” button

[Screenshot: committing the VNX Block Operating Environment package in Unisphere]

There was some confusion about how to specify multiple DNS server IP addresses or domain search names with the Set-VMHostNetwork cmdlet. It turns out to be a simple comma-separated list that gets treated as a parameter array. Here is an example.

Connect-VIServer vCenterServerFQDNorIP
$ESXiHosts = Get-VMHost
foreach ($esx in $ESXiHosts) {
     # Pipe the current host in, otherwise the cmdlet touches every host on each pass
     $esx | Get-VMHostNetwork | Set-VMHostNetwork -DomainName eng.example.com -DnsAddress dnsAddress1,dnsAddress2
}

Or as a one-liner:

Get-VMHost | Get-VMHostNetwork | Set-VMHostNetwork -DomainName eng.example.com -DnsAddress dnsAddress1,dnsAddress2

Here's a quick how-to for adding iSCSI send targets on all hosts in your vCenter:

Connect-VIServer vCenterServerFQDNorIP
$targets = "StorageTargetIP1", "StorageTargetIP2"
$ESXiHosts  = Get-VMHost
foreach ($esx in $ESXiHosts) {
  $hba = $esx | Get-VMHostHba -Type iScsi | Where {$_.Model -eq "iSCSI Software Adapter"}
  foreach ($target in $targets) {
     # Check to see if the SendTarget exists; if not, add it
     if (Get-IScsiHbaTarget -IScsiHba $hba -Type Send | Where {$_.Address -cmatch $target}) {
        Write-Host "The target $target already exists on $esx" -ForegroundColor Green
     }
     else {
        Write-Host "The target $target doesn't exist on $esx" -ForegroundColor Red
        Write-Host "Creating $target on $esx ..." -ForegroundColor Yellow
        New-IScsiHbaTarget -IScsiHba $hba -Address $target       
     }
  }
}

Now present the LUNs from the storage array and rescan all HBAs to see the new storage on the hosts:

Get-VMHost | Get-VMHostStorage -RescanAllHba -RescanVmfs