Understanding `scale` in R

I'm trying to understand what R's `scale` function does. I have data (`mydata`) that I want to draw as a heat map, and it has a VERY strong positive skew. I've created a heatmap with a dendrogram for both `scale(mydata)` and `log(mydata)`, and the two dendrograms are different. Why? What does it mean to scale my data, versus log transform my data? And which would be more appropriate if I want to look at the dendrogram illustrating the relationship between the columns of my data?

Thank you for any help! I've read the definitions but they are going way over my head.


The answer is


`scale` provides nothing more than a standardization of the data: by default, each column has its mean subtracted and is then divided by its standard deviation. The values it creates are known under several different names, one of them being z-scores ("Z" because the standard normal distribution is also known as the "Z distribution").
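As a rough check of what standardization means, the sketch below compares `scale()` with a by-hand calculation. The matrix `m` and its columns are made-up data, assumed purely for illustration.

```r
# Hypothetical example data: three made-up columns on very different scales
set.seed(1)
m <- cbind(a = rexp(20, rate = 0.1),
           b = rnorm(20, mean = 50, sd = 5),
           c = runif(20, min = 0, max = 1000))

# Manual standardization: subtract each column's mean, then divide by its sd
z_manual <- sweep(sweep(m, 2, colMeans(m), "-"), 2, apply(m, 2, sd), "/")

# scale() with its defaults (center = TRUE, scale = TRUE)
z_scale <- scale(m)

# The values agree; scale() additionally stores the centers and sds as attributes
same_values <- isTRUE(all.equal(z_manual, z_scale, check.attributes = FALSE))
col_means <- colMeans(z_scale)    # numerically zero for every column
col_sds <- apply(z_scale, 2, sd)  # one for every column
```

Standardization is a linear shift and rescale of each column, so it leaves the shape of a skewed distribution unchanged, whereas `log` is nonlinear and compresses large values.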

More can be found here:

http://en.wikipedia.org/wiki/Standard_score


I thought I would contribute by providing a concrete example of the practical use of the scale function. Say you have 3 test scores (Math, Science, and English) that you want to compare. You may even want to generate a composite score based on the 3 tests for each observation. Your data could look like this:

student_id <- seq(1,10)
math <- c(502,600,412,358,495,512,410,625,573,522)
science <- c(95,99,80,82,75,85,80,95,89,86)
english <- c(25,22,18,15,20,28,15,30,27,18)
df <- data.frame(student_id,math,science,english)

Obviously it would not make sense to compare the means of these 3 scores, as the scales of the scores are vastly different. By scaling them, however, you get more comparable scoring units:

z <- scale(df[,2:4],center=TRUE,scale=TRUE)

You could then use these scaled results to create a composite score. For instance, average the values and assign a grade based on the percentiles of this average. Hope this helped!
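One possible way to sketch that composite score is shown here. The quartile cut points, the letter grades, and the column names `score` and `grade` are my own assumptions for illustration, not something prescribed by `scale`.

```r
# Data repeated from the example above so this block runs on its own
math <- c(502, 600, 412, 358, 495, 512, 410, 625, 573, 522)
science <- c(95, 99, 80, 82, 75, 85, 80, 95, 89, 86)
english <- c(25, 22, 18, 15, 20, 28, 15, 30, 27, 18)
df <- data.frame(student_id = 1:10, math, science, english)

# Standardize the three test columns
z <- scale(df[, 2:4])

# Composite score: mean z-score across the three tests for each student
df$score <- rowMeans(z)

# Assumed grading rule: quartiles of the composite score map to D, C, B, A
cuts <- quantile(df$score, probs = c(0.25, 0.5, 0.75))
df$grade <- cut(df$score,
                breaks = c(-Inf, cuts, Inf),
                labels = c("D", "C", "B", "A"))
```

Because every test contributes on the same z-score scale, no single test dominates the average just because its raw numbers are larger.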

Note: I borrowed this example from the book "R In Action". It's a great book! Would definitely recommend.


This is a late addition, but I was looking for information on the scale function myself and thought it might help somebody else as well.

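To qualify the response from Ricardo Saporta a little: when `center = FALSE`, scaling is not done using the standard deviation, at least not in version 3.6.1 of R. Each column is instead divided by its root mean square, sqrt(sum(x^2)/(n-1)). With the default `center = TRUE` the columns are centered first, so this divisor coincides with the standard deviation. I base this on "Becker, R. (2018). The new S language. CRC Press." and my own experimentation.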

# X is assumed to be a numeric vector without missing values
X.man.scaled <- X/sqrt(sum(X^2)/(length(X)-1))
X.aut.scaled <- scale(X, center = FALSE)

These two lines produce exactly the same values. I show it without centering for simplicity.
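To make the comparison reproducible, here is a small sketch with a made-up vector `X` (the values are an assumption, chosen only for illustration). It checks the divisor used without centering and, for contrast, the divisor used with the default centering.

```r
# Hypothetical vector, chosen only for illustration
X <- c(2, 4, 4, 7, 9, 10)
n <- length(X)

# center = FALSE: the divisor is the root mean square, sqrt(sum(X^2) / (n - 1))
rms <- sqrt(sum(X^2) / (n - 1))
uncentered <- as.vector(scale(X, center = FALSE))
rms_matches <- isTRUE(all.equal(uncentered, X / rms))

# center = TRUE (the default): the divisor is the standard deviation of X
centered <- as.vector(scale(X))
sd_matches <- isTRUE(all.equal(centered, (X - mean(X)) / sd(X)))

# The two divisors differ unless mean(X) is zero
divisors_differ <- !isTRUE(all.equal(rms, sd(X)))
```

Here `rms_matches`, `sd_matches`, and `divisors_differ` all come out `TRUE`, so which divisor `scale` uses depends on whether the data were centered first.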

I would respond in a comment but did not have enough reputation.

