Disclaimer: this is just a general outline of things I've found useful.
- Data structures
- Numbers
- Strings
- General
- Numerical summaries
- Plotting
- Modelling data
- Probability distributions
- Useful functions
1. DATA STRUCTURES
Data frame
data.frame(a, b, ...)
Vector
vector() # empty or c(a, b, ...) # specific values or c(a:b) # integer sequence
Matrix
matrix(data, nrow = y, ncol = x)
List
list(a, b, ...)
Display structure
str(data)
... or length
length(data)
Convert data structure
as.vector(data) /matrix, etc.
Rename table column
names(data)[index] <- name
Add row
rbind(data, c(a, b, ...))
... or column
cbind(data, c(a, b, ...))
Append to specific location
append(data, c(a, b, ...), i) # where i is the position before append
Note on indexing
Starts from 1 as opposed to 0;
variable[row, column] (e.g. list first ten rows of matrix data[1:10, ] or first ten columns data[ , 1:10])
2. NUMBERS
Sequence
seq(min, max, increment)
Random numbers (uniform distribution)
runif(n, min, max) # where n is the quantity of values to be generated
Rounding
round(value, dp)
Transpose
t(data)
Cumulative sum
cumsum(data)
3. STRINGS
Concatenate vectors to character strings
paste(x, y, ..., sep = " ", collapse = NULL)
Convert characters from lower to uppercase and vice versa
toupper(data)
tolower(data)
casefold(x, upper = TRUE/FALSE) # alternative method
chartr(old = "example", new = "eXaMpLe", data) # specify new character pattern
Convert string to expression
parse(text = data)
4. GENERAL
Define variable
x <- vector() # or x <- a; y <- b; z <- c for a series of expressions
Loops
for (i in seq(min, max)) {
do something
}
while (condition) {
do something
}
repeat {
do something
if (condition) {
break
}
}
Conditionals
if (condition) {
do a
} else if (condition) {
do b
} else {
do c
}
Functions
function(arguments) {
return(something)
}
(Note that curly brackets are not needed for single-expression functions.)
Manipulate function body
body(f) <- as.call(c(as.name("{"), e)) # where f is an arbitrary function
Manipulate function arguments
formals(f) <- alist(a = , b = , ...) # where f is an arbitrary function; a and b new arguments
Operators
Errors
to do: message warning, stop
Return n number of first or last values
head(data, n) or tail(data, n)
Look for object
exists("variable")
Check for duplicates in data
duplicated(data)
# Returns a vector of Boolean values.
# If input is a vector, compares consecutive values;
# and if input is a matrix, compares consecutive row vectors.
Remove duplicates
unique(data)
Read clipboard data (MacOS)
read.table(pipe("pbpaste"), sep = "\t", header = F)
Read line from terminal
readline(prompt = "Enter something: ") # disable quoting in strings by including quote = ""
Substitute values (e.g., decimal commas to dots)
scan(text = x, dec = ",", sep = ".") or as.numeric(sub(",", ".", sub(".", "", var, fixed = TRUE), fixed = TRUE))
Check if argument value was specified to a function
missing(data)
e.g.,
a <- function(x, y) {
if (missing(y)) {
do something
}
}
5. NUMERICAL SUMMARIES
Summary
summary(data)
Measures of location
Mean
mean(data)
Weighted mean
weighted.mean(data, weights) # where the two input variables are arbitrary vectors
(Note that if data contains NAs include argument na.rm = TRUE.)
Median
median(data)
Mode
mode <- function(x) {
ux <- unique(x)
ux[which.max(tabulate(match(x, ux)))]
}
mode(data)
Extreme upper
max(data)
Extreme lower
min(data)
Measures of spread
Range
range(data)
IQR
IQR(data)
Variance
var(data)
Standard deviation
sqrt(var(data))
6. PLOTTING
Saving images of plots (in current directory)
png("filename.png")
x() # where x() is an arbitrary function for producing graphical representations
dev.off()
Stemplot
stem(data, scale = 1)
Simple bar chart
barplot(data, xlab = "", ylab = "", main = "", names.arg = c(), col = "")
Side-by-side bar chart
... add argument beside = TRUE;
modify width of bars with width = 0-1;
and input data may be a table or individual vectors.
Add legend
legend("position", title = "", legend = sort(unique(data$groups)), fill = c(), box.lty = 1, box.lwd = 1, box.col = "")
# where position is defined as coordinates x, y, or right, left, etc.;
# data$groups represents a discrete random variable that describes each group;
# lty refers to type, lwd to width, and col colour.
Frequency histogram
hist(data, xlim = c(), right = FALSE, breaks = c(), main = "", xlab = "", col = "")
(Note that right = FALSE transforms bins from right-closed left-open (a, b] to left-closed right-open [a, b), e.g., will include 5 in bin 5-10 instead of 0-5.)
Unit-area histogram
... add argument freq = FALSE
Boxplot
boxplot(vector, data_frame, horizontal = TRUE, main = "", col = "")
(Note that horizontal = TRUE results in horizontal boxplot.)
Comparative boxplot
boxplot(values ~ groups, data,
xlab = "",
ylab = "",
ylim = c(),
main = "",
names = c(),
horizontal = TRUE,
notch = FALSE, # confidence interval around median
varwidth = TRUE, # box width (/height in horizontal) proportionate to sample size
col = c()
)
Scatterplot (to fit a line see below)
plot(x, y,
xlab = "",
ylab = "",
xlim = c(),
ylim = c(),
main = ""
)
7. MODELLING DATA
Linear regression
model <- lm(dependent variable ~ independent variable, data = data.frame)
plot(dependent variable ~ independent variable, data = data.frame)
abline(model)
Normal probability plot
qqnorm(data)
qqline(data)
8. PROBABILITY DISTRIBUTIONS
to do
d - for density - density/mass function
p - for probability - cumulative distribution function
q - for quantile - inverse c.d.f.
r - for random - sampling function
Distributions:
Binomial - binom
Geometric - geom
Poisson - pois
Uniform - unif
Normal - norm
9. USEFUL FUNCTIONS
Check out scripts/ folder in the repository.