Managing R Packages via SaltStack
At Belvedere Trading we have applications that use R. These applications often depend on one or more R packages, and managing R packages has distinct challenges. CRAN (http://cran.r-project.org/) provides a central place to get (most) R packages, but R doesn't provide an easy way to ensure that a specific version of a package is installed. This can become quite messy when you are working with many different packages on many servers, some of which run different versions of the same package. Installing from source makes it possible to pin a specific version of an R package, but managing this by hand across many servers is quite difficult, and it doesn't handle spinning up new servers or 'local' servers (we use Vagrant a lot at Belvedere) very well.
SaltStack provides a good set of tools to make handling R packages much easier. Out of the box, Salt can use its built-in 'cmd.run' function to install R packages on multiple machines:
salt -L machineA,machineB,machineC cmd.run "/usr/bin/R --silent -e \"install.packages('NewRPackage')\""
This installs the latest version from CRAN. It would be better if we installed a specific version so that we could ensure that all of our servers are using the same version. This is possible but will take multiple commands. First we'll have to get the source for the desired package version, and then install it from source.
salt -L machineA,machineB,machineC cmd.run '/usr/bin/wget http://cran.r-project.org/src/contrib/package.tar.gz -O /tmp/package.tar.gz'
salt -L machineA,machineB,machineC cmd.run '/usr/bin/R CMD INSTALL /tmp/package.tar.gz'
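The two commands above use a placeholder tarball name. As a minimal sketch of how the pinned-version URL and the follow-up commands might be generated, the Python below assumes CRAN's usual layout, where the current release sits directly under src/contrib and older releases sit under src/contrib/Archive/<package>/. The function names, the /tmp destination, and the package name and version in the usage note are hypothetical.

```python
# Hypothetical helpers, not part of Salt or R.
CRAN_BASE = "http://cran.r-project.org/src/contrib"


def cran_source_urls(package, version, base=CRAN_BASE):
    """Return candidate tarball URLs: current-release location first, then the archive."""
    tarball = f"{package}_{version}.tar.gz"
    return [
        f"{base}/{tarball}",
        f"{base}/Archive/{package}/{tarball}",
    ]


def install_commands(package, version, url, tmp_dir="/tmp"):
    """Return the download and install commands for one pinned package version."""
    dest = f"{tmp_dir}/{package}_{version}.tar.gz"
    return [
        f"/usr/bin/wget {url} -O {dest}",
        f"/usr/bin/R CMD INSTALL {dest}",
    ]
```

For example, `cran_source_urls("NewRPackage", "1.2.3")` yields two URLs to try in order, and each command returned by `install_commands` could be passed to `cmd.run` as shown above.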
Running these commands by hand quickly becomes problematic across many machines and many packages, or when new machines are built or dynamic environments such as Vagrant are in use. SaltStack has 'States' that can be applied to machines. Applying a state ensures that the machine has the configuration the state specifies, so states can be used to ensure that all of the required R packages are at the correct versions whenever the state is applied. A state can be written using 'cmd.run' directly, running commands similar to the ones used to install packages manually. An 'unless' or 'onlyif' parameter can be used to ensure that the commands aren't run every time the state is applied (i.e. they run only when they need to).
install_R_package:
  cmd.run:
    - name: /usr/bin/R --silent -e "install.packages('NewRPackage')"
    # packageVersion() fails (non-zero exit) when the package is missing, so the install only runs then
    - unless: /usr/bin/R --silent -e "packageVersion('NewRPackage')"
While Jinja can be used to add more logic to the above code, this gets messy quickly with anything requiring complex commands. As an example, if you were downgrading an R package, you would first have to remove the existing package and then install the new version. It is especially bad if more than a single command is required, as you end up with either complex Jinja conditionals (hard to read, and hard to validate) or chained cmd.run blocks that are called conditionally. It gets harder still when several sets of circumstances have to be handled: different operating systems, architectures, and so on. This works better if the actual installation is handled by a custom execution module and a custom state module is used in states. Python can be used to handle complex multi-step tasks, including working out which steps actually need to be taken to accomplish the desired task. This allows for much more readable states:
install_R_package:
  rpkg.installed:
    - name: NewRPackage
install_specific_R_package:
  rpkg.installed:
    - name: NewSpecificRPackage==specificVersion
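To illustrate the kind of decision logic that moves out of Jinja and into Python, here is a minimal sketch. It is not the actual rpackage module: every function name is hypothetical, and the installed version is passed in as an argument, whereas a real state module would look it up through the execution module (via Salt's `__salt__` dictionary). The remove-then-install step for downgrades follows the note above. The return dictionary uses the `name`, `changes`, `result`, and `comment` keys that Salt expects from state functions.

```python
import re


def parse_spec(spec):
    """Split 'name==version' into (name, version); version is None if unpinned."""
    name, sep, version = spec.partition("==")
    return name.strip(), (version.strip() if sep else None)


def version_key(version):
    """Map an R-style version such as '1.2.3' or '1.2-3' to a tuple of ints."""
    return tuple(int(part) for part in re.split(r"[.-]", version))


def plan_actions(spec, installed_version):
    """Return the ordered steps needed to satisfy spec.

    installed_version is the version currently on the machine, or None
    when the package is not installed at all.
    """
    _, wanted = parse_spec(spec)
    if installed_version is None:
        return ["install"]
    if wanted is None or version_key(wanted) == version_key(installed_version):
        return []  # unpinned and present, or already at the pinned version
    if version_key(installed_version) > version_key(wanted):
        return ["remove", "install"]  # downgrade: remove first, then install
    return ["install"]  # upgrade to the pinned version


def state_result(spec, installed_version):
    """Build a state-style return dictionary describing what would be done."""
    name, wanted = parse_spec(spec)
    actions = plan_actions(spec, installed_version)
    changes = {}
    if actions:
        changes = {name: {"old": installed_version, "new": wanted or "latest"}}
    comment = "Steps: " + ", ".join(actions) if actions else "Already in the desired state"
    return {"name": spec, "changes": changes, "result": True, "comment": comment}
```

Keeping the planning step as a pure function makes it easy to unit test each case (missing, correct, too new, too old) without touching R or Salt.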
The custom state decides which actions need to be taken and uses the execution module to carry them out. This lets the state properly handle the cases where the package is already installed, is at the wrong version, needs to be removed, and so on. Jinja is still helpful with the custom states. With the assistance of some basic Pillar data we can very easily manage large numbers of packages, with or without version info. A pillar can be created that looks like this:
Rpackages:
  - packagea
  - packageb==1.2.3
  - packagec==1.2.5
  - packaged
And state code can look like this:
{% for package in salt['pillar.get']('Rpackages') %}
install_{{package}}:
  rpkg.installed:
    - name: {{package}}
{% endfor %}
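To make the effect of the loop concrete, the sketch below builds in plain Python the data structure that the rendered YAML would parse to: one state ID per pillar entry, each calling `rpkg.installed` with the entry as its name. The `render_states` function is hypothetical and exists only for illustration; Salt does this rendering itself.

```python
def render_states(packages):
    """Mirror the Jinja loop: one 'install_<entry>' state per pillar entry."""
    states = {}
    for package in packages:
        # Same shape as the YAML: ID -> {"rpkg.installed": [{"name": entry}]}
        states[f"install_{package}"] = {"rpkg.installed": [{"name": package}]}
    return states


# Pillar data taken from the example above
rpackages = ["packagea", "packageb==1.2.3", "packagec==1.2.5", "packaged"]
rendered = render_states(rpackages)
```

Note that a pinned entry such as `packageb==1.2.3` ends up in both the state ID and the name argument, so each pillar entry maps to exactly one uniquely named state.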
Driving the states from a pillar list allows R packages to be managed strictly via the pillar data. This is especially useful given that you can assign different pillars to different hosts based on each host's needs. The actual execution and state modules that we use are located here: rpackage. See README.md for instructions on how to install and use them. We use the custom modules, pillar data similar to the above, and an internal CRAN server to manage all of our R packages.