Connecting RStudio on OS X to SQL Azure

I've been on a mission to learn R lately. And while I've been thoroughly enjoying the ride, there's nothing like a tangible objective to really turn the learning up to 11, right? Well, I just happen to have some data in SQL Azure that's begging to be mined. This post describes what I did to connect to it from RStudio.

For prosterity, I'm using version 3.1.3 of R inside of RStudio v0.98.1103 on OS X Yosemite (10.10.2). Oh, and I'm a big fan of homebrew, so I'm using it to install some of the important bits.

No DSNs

It's probably worth noting that I'm not using a DSN to connect. I'm just using a normal connection string that lives in the code. This meets my needs better since I'm not deploying to another machine. I also prefer to keep the configuration all together rather than having files scattered in obscure (or even obvious) locations on the filesystem.

Overview

At a high level, this process involves the following steps:

  • Install unixODBC using homebrew
  • Install FreeTDS using homebrew
  • Install RODBC from source using R
  • Connecting to the database

I'm assuming your SQL Azure database is already running and configured. You will need to know the password of the user account that's in the ODBC connection strings provided by Azure. I don't know how to get the password from the Azure interface, or if it's even possible.

Installing unixODBC

This updates homebrew and its formulae, installs wget (you can remove it if you already have it installed), and then installs unixODBC.

brew update && brew install wget unixodbc

Installing FreeTDS

This installs FreeTDS and wires it up to unixODBC.

brew install freetds --with-unixodbc

Now we need to tell unixODBC where it can find the FreeTDS driver. Add the following lines to the empty file at /usr/local/Cellar/unixodbc/[version]/etc/odbcinst.ini

[FreeTDS]
Description=FreeTDS Driver for Linux & MSSQL on Win32
Driver=/usr/local/Cellar/freetds/0.91_2/lib/libtdsodbc.so
Setup=/usr/local/Cellar/freetds/0.91_2/lib/libtdsodbc.so
UsageCount=1

Save and close the file.

Installing RODBC

This installs RODBC v1.3-11. Head over here to be sure you're installing the latest version (or at least the latest 1.x) and update the 2 locations in this command as necessary.

wget "http://cran.r-project.org/src/contrib/RODBC_1.3-11.tar.gz" && R CMD INSTALL RODBC_1.3-11.tar.gz

Connecting to the database

Because I'm not using a DSN, there isn't anything to configure. You just need to run a couple of R commands to get the data. First you'll need to grab the ODBC connection string from the Azure portal.

Get the connection string

  • If you're using the old portal, sign in, make sure you're on the right subscription (subscription link at the top of the page), select SQL Databases on the left, select your database name in the list, select Dashboard, and then select the link in the menu on the right side of the page labeled Show connection strings. Copy the ODBC one.

  • If you're using the new portal, sign in, make sure you're on the right subscription (top/right link from the home screen), select Browse in the left menu, select SQL databases, select the name of your database, select Properties, and then select the link labeled Show database connection strings. Select the icon to copy the ODBC connection string to the clipboard.

Getting results

Here are the commands you need to connect and retrieve data.

Let your program/script know that you're using RODBC:

library(RODBC)

Create a variable that contains the connection string (replace the driver in the connection string you copied with "FreeTDS" and add your password):

connection.string <- "Driver={FreeTDS};Server=[your_host].database.windows.net,1433;Database=[your_db];Uid=[your_username]@[your_host];Pwd=[your_password];Encrypt=yes;Connection Timeout=30;" 

Create a database connection:

db <- odbcDriverConnect(connection.string)

Get some data (replace your_table):

data <- sqlQuery(db, "SELECT * FROM [your_table]")

And don't forget to close your connection once you have all the data you need:

odbcClose(db)

Be sure to check out the RODBC vignette (pdf) and the package docs (pdf) for more information about the R commands available.