Seven Easy Graphs to Visualize Correlation Matrices in R

By James Marquez, April 15, 2017

I want to share seven insightful correlation matrix visualizations that are beautiful and simple to build with only one line of code each. Each graph also has many customization options for power users to explore. We'll use the built-in mtcars dataset, which records fuel consumption and 10 aspects of automobile design, such as number of cylinders, horsepower, and engine displacement, for 32 automobiles. We'll start by saving five variables to a new object called mydata, which we'll use in nearly all our examples.

In [4]:
mydata <- mtcars[, c('mpg', 'cyl', 'disp', 'hp', 'carb')]
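
Before plotting, it can help to glance at the numbers that every graph below encodes. Here is a minimal sketch using only base R's cor(), which computes Pearson correlations by default. The cor_matrix name is just an illustrative choice.

```r
# Same five-variable subset used throughout the article
mydata <- mtcars[, c("mpg", "cyl", "disp", "hp", "carb")]

# Pearson correlation matrix (the default method of cor())
cor_matrix <- cor(mydata)

# Round to two decimals for easier reading
print(round(cor_matrix, 2))
```

Each visualization that follows is a different way of drawing this same matrix.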

PerformanceAnalytics Package

We'll start with the best implementation, in my opinion, from the PerformanceAnalytics package. This graph provides the following information:

  1. Correlation coefficient (r) - The strength of the relationship.
  2. Significance - Stars mark the p-value of each correlation: '***' for p < 0.001, '**' for p < 0.01, '*' for p < 0.05, '.' for p < 0.1, and no symbol otherwise.
  3. Histogram with kernel density estimation and rug plot.
  4. Scatter plot with a smoothed fit line.

In [55]:
library("PerformanceAnalytics")

chart.Correlation(mydata, histogram=TRUE, pch=19)
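
The stars in the upper panels follow the significance codes listed above. As a rough sketch of that mapping for a single pair of variables, here is a version that uses only base R's cor.test() and symnum(). It illustrates the idea and is not meant as a copy of the package's own code; the test and stars names are illustrative.

```r
mydata <- mtcars[, c("mpg", "cyl", "disp", "hp", "carb")]

# Test the correlation between one pair of variables
test <- cor.test(mydata$mpg, mydata$carb)

# Map the p-value onto the usual significance codes
stars <- symnum(test$p.value, corr = FALSE, na = FALSE,
                cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                symbols = c("***", "**", "*", ".", " "))

# Show the coefficient, the p-value, and the resulting code
cat("r =", round(test$estimate, 2),
    " p =", signif(test$p.value, 2),
    " code:", as.character(stars), "\n")
```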

corrr Package

I found this next graph particularly interesting and enjoy the different approach its author took to visualizing correlations between variables. We'll call it using the pipe operator %>%: we feed our dataset into the correlate() function, and its output is then fed into the network_plot() function.

In [82]:
library(corrr)

mydata %>% correlate() %>% network_plot(min_cor=0.6)

# It can also be called using the traditional nested form
# network_plot(correlate(mydata), min_cor=0.6)

This plot arranges the variables so that it is easy to see which ones are closely correlated. The closer two variables sit to each other, the stronger their relationship, while widely spaced variables are only weakly related. The color of each line represents the direction of the correlation, while its shade and thickness represent the strength of the relationship. The min_cor parameter is the minimum absolute correlation coefficient required to display a line between two variables.
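
To check which lines should appear for a given min_cor, the same filter can be sketched in base R. This only illustrates the threshold on the absolute correlation and is not the package's own code. The linked data frame is a hypothetical name.

```r
mydata <- mtcars[, c("mpg", "cyl", "disp", "hp", "carb")]
cor_matrix <- cor(mydata)
min_cor <- 0.6

# Keep each pair once (upper triangle) if |r| meets the threshold
idx <- which(upper.tri(cor_matrix) & abs(cor_matrix) >= min_cor,
             arr.ind = TRUE)

# List the surviving pairs with their coefficients
linked <- data.frame(
  x = rownames(cor_matrix)[idx[, "row"]],
  y = colnames(cor_matrix)[idx[, "col"]],
  r = round(cor_matrix[idx], 2)
)
print(linked)
```

Pairs missing from this list, such as disp and carb, are the ones drawn without a connecting line.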

psych Package

This third plot is from the psych package and is similar to the PerformanceAnalytics plot. The scale parameter scales the text size of each correlation coefficient by its absolute value, so stronger correlations are printed larger. This graph provides the following information:

  1. Correlation coefficient (r) - The strength of the relationship.
  2. Histogram with kernel density estimation and rug plot.
  3. Scatter plot with fitted line and ellipses to display the strength of the relationship.

In [53]:
library(psych)

pairs.panels(mydata, scale=TRUE)

corrplot Package

This next plot is simple, but has many customization options, which are described in the vignette An Introduction to corrplot Package. corrplot.mixed() shows the coefficients as numbers in the lower triangle and as circles in the upper triangle. The size and shade of each circle represent the strength of each relationship, while the color represents the direction, either negative or positive.

In [27]:
library(corrplot)

corrplot.mixed(cor(mydata), order="hclust", tl.col="black")
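
The order="hclust" argument groups related variables next to each other. Here is a sketch of the idea in base R, assuming a distance of 1 - r between variables and hclust()'s default complete linkage. Both are assumptions made for illustration, not a description of corrplot's internals, so the resulting order may differ from the plot.

```r
mydata <- mtcars[, c("mpg", "cyl", "disp", "hp", "carb")]
cor_matrix <- cor(mydata)

# Treat highly correlated variables as "close": distance = 1 - r
var_dist <- as.dist(1 - cor_matrix)

# Hierarchical clustering of the variables (complete linkage by default)
hc <- hclust(var_dist)

# Reorder the matrix so clustered variables sit side by side
reordered <- cor_matrix[hc$order, hc$order]
print(round(reordered, 2))
```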

GGally Package

This next plot is built on ggplot2, if you like its style. The GGally package also has many more chart types, which you can explore in its documentation, GGally - Extension to 'ggplot2'.

In [6]:
library(GGally)

ggpairs(mydata)

This next plot, ggcorr(), is also from GGally. It is very simple, but actually requires the most arguments.

In [34]:
ggcorr(mydata, nbreaks=8, palette='RdGy', label=TRUE, label_size=5, label_color='white')

ggcorrplot Package

This next plot is like GGally in that it is also built on ggplot2. The package has many more options, which are described in its documentation, ggcorrplot: Visualization of a correlation matrix using ggplot2. In this example, we're going to use the entire mtcars dataset to demonstrate how insignificant correlation coefficients are displayed. You must first call the cor() function on your dataset and then pass the result of the cor_pmat() function to the p.mat parameter, which marks the insignificant coefficients with an 'X'. You can also blank them out instead by setting insig='blank'.

In [25]:
library(ggcorrplot)

ggcorrplot(cor(mtcars), p.mat = cor_pmat(mtcars), hc.order=TRUE, type='lower')
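
To see which cells get crossed out, here is a base R sketch that builds a comparable p-value matrix with cor.test() and lists the pairs above the 0.05 level. The p_mat and insignificant names are illustrative, and the 0.05 cutoff is an assumption chosen to match the conventional significance threshold.

```r
vars <- names(mtcars)
n <- length(vars)

# Matrix of p-values, initialised to 0 (the diagonal stays 0)
p_mat <- matrix(0, n, n, dimnames = list(vars, vars))

# Fill both triangles with the p-value of each pairwise test
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    p <- cor.test(mtcars[[i]], mtcars[[j]])$p.value
    p_mat[i, j] <- p
    p_mat[j, i] <- p
  }
}

# Pairs whose correlation is not significant at the assumed 0.05 level
idx <- which(upper.tri(p_mat) & p_mat > 0.05, arr.ind = TRUE)
insignificant <- data.frame(
  x = vars[idx[, "row"]],
  y = vars[idx[, "col"]],
  p = round(p_mat[idx], 3)
)
print(insignificant)
```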

That's it. Please leave a comment if you have any questions, spot any errors, or know of any other packages or graphs for displaying correlation matrices. You can grab the notebook, correlation_matrices_in_r.ipynb, from my GitHub. Thanks for reading!