One of the things that used to perplex me as a newbie to R was how to format a number as a percentage for printing.
For example, display 0.12345 as 12.345%. I have a number of workarounds for this, but none of them seems to be "newbie friendly". For example:
set.seed(1)
m <- runif(5)
paste(round(100*m, 2), "%", sep="")
[1] "26.55%" "37.21%" "57.29%" "90.82%" "20.17%"
sprintf("%1.2f%%", 100*m)
[1] "26.55%" "37.21%" "57.29%" "90.82%" "20.17%"
Question: Is there a base R function to do this? Alternatively, is there a widely used package that provides a convenient wrapper?
Despite searching for something like this in ?format, ?formatC and ?prettyNum, I have yet to find a suitably convenient wrapper in base R. ??"percent" didn't yield anything useful. library(sos); findFn("format percent") returns 1250 hits, so again not useful. ggplot2 has a function percent, but this gives no control over rounding accuracy.
I did some benchmarking for speed on these answers and was surprised to see percent in the scales package so touted, given its sluggishness. I imagine the advantage is its automatic detection of the proper formatting, but if you know what your data looks like, it seems clearly worth avoiding.
Here are the results from formatting a vector of 100,000 proportions in (0,1) as percentages with 2 decimal places:
library(microbenchmark)
x = runif(1e5)
microbenchmark(times = 100L, andrie1(), andrie2(), richie(), krlmlr())
# Unit: milliseconds
# expr min lq mean median uq max
# 1 andrie1() 91.08811 95.51952 99.54368 97.39548 102.75665 126.54918 #paste(round())
# 2 andrie2() 43.75678 45.56284 49.20919 47.42042 51.23483 69.10444 #sprintf()
# 3 richie() 79.35606 82.30379 87.29905 84.47743 90.38425 112.22889 #paste(formatC())
# 4 krlmlr() 243.19699 267.74435 304.16202 280.28878 311.41978 534.55904 #scales::percent()
So sprintf emerges as a clear winner when we want to add a percent sign. On the other hand, if we only want to multiply the number and round (go from proportion to percent without "%"), then round() is fastest:
# Unit: milliseconds
# expr min lq mean median uq max
# 1 andrie1() 4.43576 4.514349 4.583014 4.547911 4.640199 4.939159 # round()
# 2 andrie2() 42.26545 42.462963 43.229595 42.960719 43.642912 47.344517 # sprintf()
# 3 richie() 64.99420 65.872592 67.480730 66.731730 67.950658 96.722691 # formatC()
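The benchmark calls above do not show the function bodies. A plausible reconstruction, based only on the trailing comments in the first results table, might look like the sketch below. The exact arguments (two decimal places, `format = "f"`) are assumptions, and the `scales::percent()` variant is left as a comment so the sketch runs with base R alone.

```r
# Hypothetical reconstructions of the benchmarked helpers; the bodies are
# inferred from the comments in the results table, not taken from the source.
set.seed(1)
x <- runif(1e5)

andrie1 <- function() paste(round(100 * x, 2), "%", sep = "")   # paste(round())
andrie2 <- function() sprintf("%1.2f%%", 100 * x)               # sprintf()
richie  <- function() paste0(formatC(100 * x, format = "f", digits = 2), "%")  # paste(formatC())
# krlmlr <- function() scales::percent(x)   # needs the scales package

# Rough base R timing as a stand-in for microbenchmark()
system.time(for (i in 1:10) andrie2())
```

Note that the `paste(round())` variant drops trailing zeros (for example "26.5%" rather than "26.50%"), so its output is not always identical to the other two.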
I much prefer to use sprintf, which is available in base R.
sprintf("%0.1f%%", .7293827 * 100)
[1] "72.9%"
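If you want the number of decimals to be adjustable, the same call can be wrapped in a small helper. This is only a sketch: `format_pct` and its `digits` argument are hypothetical names, not a base R function, and the choice to return `NA` for missing input (instead of the "NA%" string that `sprintf` would produce) is an assumption about what you want.

```r
# Hypothetical helper (not part of base R): format proportions as percentages.
format_pct <- function(x, digits = 1) {
  fmt <- paste0("%.", digits, "f%%")   # e.g. "%.1f%%" when digits = 1
  out <- sprintf(fmt, 100 * x)
  out[is.na(x)] <- NA_character_       # keep missing values missing
  out
}

format_pct(c(0.7293827, NA, 0.5))
format_pct(0.7293827, digits = 3)
```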
I especially like sprintf because you can also insert strings.
sprintf("People who prefer %s over %s: %0.4f%%",
        "Coke Classic",
        "New Coke",
        .999999 * 100)
[1] "People who prefer Coke Classic over New Coke: 99.9999%"
It's especially useful to use sprintf with things like database configurations; you just read in a yaml file, then use sprintf to populate a template without a bunch of nasty paste0 calls.
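As a rough illustration of the configuration idea, the sketch below fills a connection-string template from a list. The list stands in for a parsed YAML file (something like what `yaml::read_yaml()` might return), and every field name, value, and the template itself are made up for the example.

```r
# Stand-in for a parsed YAML file; all field names and values are hypothetical.
config <- list(user = "analyst", host = "db.example.com",
               port = 5432L, dbname = "reports")

# %s inserts strings, %d inserts integers
conn_string <- sprintf("postgresql://%s@%s:%d/%s",
                       config$user, config$host, config$port, config$dbname)
conn_string
```

Compared with `paste0()`, the template keeps the overall shape of the string visible in one place.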
This pattern is especially useful for rmarkdown reports, when you have a lot of text and a lot of values to aggregate.
Setup / aggregation:
library(data.table) ## for aggregate
library(magrittr)   ## for %>%
approval <- data.table(year = trunc(time(presidents)),
                       pct = as.numeric(presidents) / 100,
                       president = c(rep("Truman", 32),
                                     rep("Eisenhower", 32),
                                     rep("Kennedy", 12),
                                     rep("Johnson", 20),
                                     rep("Nixon", 24)))
approval_agg <- approval[i = TRUE,
                         j = .(ave_approval = mean(pct, na.rm = TRUE)),
                         by = president]
approval_agg
# president ave_approval
# 1: Truman 0.4700000
# 2: Eisenhower 0.6484375
# 3: Kennedy 0.7075000
# 4: Johnson 0.5550000
# 5: Nixon 0.4859091
Using sprintf with vectors of text and numbers, outputting to cat just for newlines:
approval_agg[, sprintf("%s approval rating: %0.1f%%",
                       president,
                       ave_approval * 100)] %>%
  cat(., sep = "\n")
#
# Truman approval rating: 47.0%
# Eisenhower approval rating: 64.8%
# Kennedy approval rating: 70.8%
# Johnson approval rating: 55.5%
# Nixon approval rating: 48.6%
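For readers without data.table, roughly the same report can be sketched in base R. This assumes the same president-to-quarter mapping as above and swaps in `tapply()` for the data.table aggregation; the variable names are illustrative.

```r
# Base R only; 'presidents' is the quarterly approval series from the datasets package
president <- c(rep("Truman", 32), rep("Eisenhower", 32), rep("Kennedy", 12),
               rep("Johnson", 20), rep("Nixon", 24))
pct <- as.numeric(presidents) / 100

# Mean approval per president, keeping the original order of appearance
ave_approval <- tapply(pct, factor(president, levels = unique(president)),
                       mean, na.rm = TRUE)

# sprintf() is vectorised over both the names and the values
report <- sprintf("%s approval rating: %0.1f%%",
                  names(ave_approval), as.numeric(ave_approval) * 100)
writeLines(report)
```

`writeLines()` plays the role of `cat(., sep = "\n")` here, printing one element per line.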
Finally, for my own selfish reference, since we're talking about formatting, this is how I do commas with base R:
30298.78 %>% round %>% prettyNum(big.mark = ",")
[1] "30,299"
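The same result can be had without the pipe. The sketch below shows two base R spellings that I believe are equivalent for this input; the variable names are only for illustration.

```r
x <- 30298.78

# format = "f" with digits = 0 rounds to a whole number; big.mark adds the separator
with_formatC <- formatC(x, format = "f", digits = 0, big.mark = ",")

# Alternatively, round first and let format() insert the separator
with_format <- format(round(x), big.mark = ",")

with_formatC
with_format
```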
The tidyverse version is this:
> library(dplyr)
> library(scales)
> set.seed(1)
> m <- runif(5)
> dt <- as.data.frame(m)
> dt %>% mutate(perc=percent(m,accuracy=0.001))
m perc
1 0.2655087 26.551%
2 0.3721239 37.212%
3 0.5728534 57.285%
4 0.9082078 90.821%
5 0.2016819 20.168%
Looks tidy as usual.
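This function converts the values in the selected columns to percentages of each column's total:
percent.colmns = function(base, columnas = 1:ncol(base), filas = 1:nrow(base)){
  base2 = base
  for(j in columnas){
    # total of column j, used as the denominator
    suma.c = sum(base[,j])
    for(i in filas){
      # each cell becomes its share of the column total, in percent
      base2[i,j] = base[i,j]*100/suma.c
    }
  }
  return(base2)
}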
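When every column is numeric and all rows are wanted, the double loop can be replaced by a vectorised version. This is a sketch under that assumption; `percent_columns` is a hypothetical name and it does not support the row and column subsetting of the function above.

```r
# Hypothetical vectorised counterpart for all-numeric data frames
percent_columns <- function(base) {
  # base[] <- keeps the data frame structure while replacing every column
  base[] <- lapply(base, function(col) col * 100 / sum(col))
  base
}

df <- data.frame(a = c(1, 3), b = c(2, 8))
percent_columns(df)
```

Each column of the result sums to 100.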
Here's my solution for defining a new function (mostly so I can play around with Curry and Compose :-) ):
library(roxygen)  # Curry() and Compose() are also provided by the 'functional' package
printpct <- Compose(function(x) x*100, Curry(sprintf, fmt="%1.2f%%"))
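If you would rather not depend on a package for Curry and Compose, the same idea can be sketched with plain closures. `compose2` is a hypothetical helper written for this example, and it applies its first function before its second.

```r
# Base R only: a hand-rolled stand-in for the composition above
compose2 <- function(f, g) function(x) g(f(x))   # hypothetical helper: apply f, then g

printpct <- compose2(function(x) x * 100,
                     function(x) sprintf("%1.2f%%", x))

printpct(c(0.1234, 0.5))
```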
Check out the percent function from the formattable package:
library(formattable)
x <- c(0.23, 0.95, 0.3)
percent(x)
[1] 23.00% 95.00% 30.00%
Try this:
data_format <- function(data, digit = 2, type = '%') {
  # 'd' is treated as fixed-point with zero decimals
  if (type == 'd') {
    type <- 'f'
    digit <- 0
  }
  switch(type,
    '%' = {format <- paste("%.", digit, "f%", type, sep = ''); num <- 100},
    'f' = {format <- paste("%.", digit, type, sep = ''); num <- 1},
    stop(type, " is not a recognized type")
  )
  sprintf(format, num * data)
}
You can use the scales package just for this operation (without loading it with require or library)
scales::percent(m)
Check out the scales package. It used to be a part of ggplot2, I think.
library('scales')
percent((1:10) / 100)
# [1] "1%" "2%" "3%" "4%" "5%" "6%" "7%" "8%" "9%" "10%"
The built-in logic for detecting the precision should work well enough for most cases.
percent((1:10) / 1000)
# [1] "0.1%" "0.2%" "0.3%" "0.4%" "0.5%" "0.6%" "0.7%" "0.8%" "0.9%" "1.0%"
percent((1:10) / 100000)
# [1] "0.001%" "0.002%" "0.003%" "0.004%" "0.005%" "0.006%" "0.007%" "0.008%"
# [9] "0.009%" "0.010%"
percent(sqrt(seq(0, 1, by=0.1)))
# [1] "0%" "32%" "45%" "55%" "63%" "71%" "77%" "84%" "89%" "95%"
# [11] "100%"
percent(seq(0, 0.1, by=0.01) ** 2)
# [1] "0.00%" "0.01%" "0.04%" "0.09%" "0.16%" "0.25%" "0.36%" "0.49%" "0.64%"
# [10] "0.81%" "1.00%"
Source: Stackoverflow.com