Fine Print

I am not an expert in any of the software covered in this document, although I use all of it frequently.
If you mess up your files while using R, the RStudio command prompt, git, etc., or crash/burn/blow up your computer, please be aware that I do not accept any responsibility.
Please use the information contained here at your own risk!

Important Notes

The entire R mini BootCamp workshop is version controlled by Git.
All the files of this workshop are hosted on Bitbucket in a remote public git repository which can be found here.
- To download the remote git public repository, please click here.
- For the detailed software installation, please see Installation_and_Software_Notes.html file.
- Please feel free to use and share any of the content without the permission of the repository owner.
This document is prepared with RStudio using R and R Markdown.
- Please see here for the options for building an HTML document with R Markdown, such as this one.
All comments, suggestions, and other correspondence should be sent to Omer Kara.

1 Introduction

This document intends to introduce you to the basic information, concepts, and tools in R, such as
- information about R and related software
- R basics
- R objects
- R packages
- data frames and data tidying
- descriptive statistics of tidy data
- exploratory data analysis
- linear regression
- reproducible research and related software
At the end of this workshop, you will
- know how to get help about R
- have a basic knowledge of R programming
- perform statistical analysis
- create powerful graphics
- write your own R functions
- import/download/scrape, load, transform, and tidy data
- conduct basic exploratory data and regression analyses
- create reproducible research materials
In developing this document and its examples, I do not assume any background in computer programming. Only a little knowledge of statistics, matrix algebra, and data structures is required.

2 R and RStudio

What is R?

R is a sequentially interpreted object-oriented programming language for statistical computing, data mining, web scraping, graphics and more.
Sequential interpretation means that R cannot handle two procedures at the same time. As a consequence, the code you write is read starting from the first line, then the second, and so on until the last line.
In R, you can perform simple calculations, vector and matrix operations, data manipulation, create your own functions and procedures, and do almost anything you want with data in an easy and ordered way.
R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand in 1993.
R is named partly after the first names of the first two R authors and partly as a play on the name of the S programming language.
R is currently developed by the R Development Core Team and supported by The R Foundation.

Why Use R?

Some advantages of using R are as follows
- In its latest versions, R matches or exceeds most of the features of the currently available statistical packages.
- R is open-source and completely free in the sense of monetary cost.
- It gives you high flexibility and freedom in the sense of coding options. You can start your coding from scratch or use some built-in functions from R packages.
- In R, you can write code and save it for replication, debugging and modification. You cannot do that when using commands from menus or dialog boxes of some econometric programs.
- The programming language used in R is very similar across methods. So, if you know other programming languages, your job will be relatively easy.
Some disadvantages of R are as follows
- R has a very steep learning curve.
- Its default graphical user interface (GUI) is not considered user-friendly. For this reason, we will be using RStudio, an integrated development environment (IDE) for R.
- R compels you to type in commands for every task. However, it is good practice to get used to typing your own commands from scratch.

RStudio

RStudio is the best IDE available for R.
It is open-source and free.
It makes R easier to use.
It includes a console and a syntax-highlighting editor that supports direct code execution, as well as tools for data viewing, plotting, history checking, debugging, workspace management, R packages, and git.
The features of RStudio that I like most are
- syntax highlighting, code completion, and smart indentation
- quickly jumping to function definitions
- integrated R help and documentation
- R Project feature for easily managing multiple working directories
- workspace browser and data viewer
- authoring with Sweave and R Markdown

Note that we will use R as our main programming software, but to make coding easier and more fun we will use RStudio as our IDE, which will use R in the background.
If you want to see my RStudio preference settings in detail, please see the R vs. RStudio section in the Installation_and_Software_Notes.html file.
- Note that I am using Mac and the settings might be slightly different for PC users.
- Also, see here for customizing RStudio preferences.
Finally, please take a look at the following screenshots to compare R and RStudio in terms of user functionality.

R General View

RStudio General View

My RStudio View

Downloading and Installing

First, as a starting point for becoming a creative R programmer, you should check The R Project for Statistical Computing web page.
- On this website, you will find basic and advanced information about R, help guides and manuals, along with the latest distribution of R for different platforms (Windows, Mac, Linux) hosted by the Comprehensive R Archive Network (CRAN).
- CRAN is a collection of sites. These sites carry identical materials consisting of R distributions, contributed extensions, documentation for R, and binaries.
Second, you need to download and install R.
- For Mac users: The necessary R installation files can be found here. In the files section, use the installation package matching your operating system.
- For PC users: The necessary R installation files can be found here.
- Once R is installed, you can open it to see how it looks, but you don’t need to since we will be using RStudio.
Third, after R is installed, you need to download and install RStudio.
- RStudio can be downloaded from here.
- Note that you should download the RStudio Desktop version, not RStudio Server.
- At the bottom of that page you will see the installation files. Please use the appropriate link matching your operating system.
Finally, once RStudio is installed, open the software and start exploring.
- To open the R mini BootCamp R project directly, go to the main file directory of the workshop downloaded to your computer from the remote git repository.
- Then, open the R mini BootCamp.Rproj file.
- Opening RStudio directly from the R mini BootCamp.Rproj file sets your working directory to the main file directory of the workshop, loads some pre-defined paths and functions, and even installs and loads some R packages.
- To see these pre-defined items, please see the hidden .Rprofile file in the main file directory of the workshop.
- Also, see the R Session Info section for the R version used to build this document.
For the details of software installation, please see Installation_and_Software_Notes.html file.

Getting Help

In getting help about R, your best friends will be
- Google
- Stack Overflow
- Rseek
- Advanced R by Hadley Wickham
- Use R!
For online learning tools such as tutorials, cheat sheets, articles, and examples to help you learn R and its extensions, please see
For example code, small tutorials, exploring what others are doing, and even publishing your R code, please see
- R Views
- R-bloggers
- RPubs
Finally, see the Notes folder for R cheat sheets, R manuals from the R Development Core Team, notes from the Data Science Specialization online class by Xing Su, and some important articles.
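
R also ships with built-in help that works without an internet connection. The short sketch below is an illustration added to complement the resources listed above; it is not taken from the workshop files. It uses only functions from base R and the utils package, and the calls that open help pages are run only in an interactive session.

```r
# Search for objects whose names match a pattern.
hits <- apropos("^read\\.csv") ## The pattern is a regular expression.
hits

# Show the arguments of a function without opening its help page.
args(seq_len)

if (interactive()) {
    help("mean")              ## Opens the help page of a function.
    ?mean                     ## Shortcut for the line above.
    help.search("regression") ## Searches the installed documentation.
    example("mean")           ## Runs the examples from the help page.
}
```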

3 R Basics

This section covers some of the important R basics such as
- console and editor
- R objects such as vectors, factors, logicals, matrices, arrays, lists, and data frames
- working directory, workspace, packages, and more

Console vs. Editor

The first things you see after opening RStudio are the console and the editor.
Let’s see how we can make use of both of them.

Console

After starting RStudio, the console shows some basic information about the program’s version, license and citation information.
You will see a > symbol (prompt) and a blinking cursor which indicates that RStudio is waiting for some input.
Use the console only for short pieces of code and for quickly checking results, error messages, warning messages, etc.
To start, let’s use it as a calculator.

> 10 + 5
#> [1] 15
> 
> (2 * ((10 + 5 - 3) * 2 - 9)) / 2^1
#> [1] 15

After each command, you need to hit the enter key to get a result.
In this document, the printed results are right under the command and start with the #> symbol.
Please note the [1] at the beginning of each result.
- It shows the index of the first element printed on that line.
- For example, [1] 15 tells us that 15 is the first (and here the only) element of the result.
To make copying code from this document easy, the prompt sign > will be dropped after this point.

Editor

The console is an easy and fast way to run a couple of lines of code; however, you cannot save that code for future use and need to re-write it every time you want to run it.
As an alternative, R commands can be run from the editor, which allows you to save your code, modify it, and run it whenever necessary.
In the editor, commands are not run automatically even if you press the enter key.
- You need to run your code either line by line, by region, or all at the same time.
- In this type of execution, command lines and results are displayed in the console.
In my opinion, if you want to be a programmer, the editor is the way you should go.
Let’s see how we can use editor as a calculator.

15 + 5
#> [1] 20

((10 - 2)/4)^2
#> [1] 4

log(10) ## Takes the natural logarithm of the input.
#> [1] 2.3025851

log(exp(5)) ## Note the exponential function written as "exp()".
#> [1] 5

sin(pi/6) ## Sine function and π.
#> [1] 0.5

In R, the # sign is used to comment out code, text, etc.
Everything after the # sign on a line is ignored by R and is simply shown as text.
The convention used in this document is # for a full comment line and ## for a comment at the end of a code line.
If the project I am dealing with is long and has complicated code, then I prefer to use # for sections and ## for subsections to ease the reading.

# This is a comment line which can be very long if you wish.
3 * 3 ## This is a code line with a comment at the end.
#> [1] 9

# This is a comment line for a section.

## This is a comment for the subsection.

R Objects

R regards everything as objects.
A number, string, vector, matrix, data set, result of a regression, plot, function, and etc. are all R objects.
This section covers some of the basic R objects such as vectors, factors, logicals, matrices, arrays, data frames, and lists.

Creating Objects

Before the details of different R objects, let’s see how to create a simple R object.
There are 3 approaches to creating an R object, which are shown in the code below.

# Approach 1: try to use this approach.
x <- 6
print(x) ## Prints the object. I rarely use this function.
#> [1] 6
x ## Also prints the object.
#> [1] 6


z <- "R mini BootCamp" ## A character string. Note that strings in R are enclosed in quotes (double quotes are the usual choice).
z
#> [1] "R mini BootCamp"

r <- 1:10 ## ":" operator generates regular sequences.
r
#>  [1]  1  2  3  4  5  6  7  8  9 10

# Approach 2: try to avoid this approach.
y = 4
y
#> [1] 4

w = "Go Wolfpack"
w
#> [1] "Go Wolfpack"

# Approach 3: try to avoid this approach.
assign("ncsu", "Go Wolfpack") ## Assigns the "Go Wolfpack" character string to the ncsu object.
ncsu
#> [1] "Go Wolfpack"

The R programming language is case sensitive.
Unlike some other programming languages, R does not have a single agreed-upon naming convention for objects. However, in general I recommend the following coding style.
- You should use new.object instead of new_object. Note that new-object is not a valid name at all, since R reads - as the subtraction operator.
- Names starting with a digit (e.g., 2b) are not accepted by R.
- You should avoid using object names that are the same as R functions.
  - The code conflicts(detail = TRUE)$.GlobalEnv checks if a user-created R object conflicts with built-in objects supplied from R packages.
  - All user-created R objects listed by that code should be removed in order to use the built-in objects with the same name. Otherwise, you might get unexpected results.
You can overwrite an R object by creating a different R object with the same name.

x <- 6 ## With lower case name.
x
#> [1] 6

X <- "NCSU" ## With capital letter name.
X
#> [1] "NCSU"

new.object <- 2:6 ## Don't use new_object or new-object.
new.object
#> [1] 2 3 4 5 6

conflicts(detail = TRUE)$.GlobalEnv
#> [1] "income" "x"      "y"      "year"   "lines"  "data"   "c"

y <- "R mini BootCamp" ## Creating an object.
y
#> [1] "R mini BootCamp"
y <- 10:15 ## Overwriting it with different values.
y
#> [1] 10 11 12 13 14 15
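
To illustrate why reusing built-in names is risky, here is a minimal sketch (my own example, not from the workshop files) in which a user-created object masks the built-in constant pi and is then removed with rm so that the built-in value becomes visible again.

```r
pi <- 3 ## A user-created object that masks the built-in constant.
pi
#> [1] 3
pi == base::pi ## Compares the user-created object with the built-in one.
#> [1] FALSE

rm(pi) ## Removes the user-created object.
pi == base::pi ## The built-in constant is visible again.
#> [1] TRUE
```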

Vectors

Vectors are one of the most used objects of the R language.
They consist of one or more values of the same type.
- Vectors are a one-dimensional data structure.
- To create an empty vector, you can use the vector function.
- The easiest way to create a vector is to use the concatenation function c.
To check the length of a vector, use the length function.
You can use class, str and some specific control functions to check the class and structure of vectors.
Use the following code to create vectors containing numeric, integer, complex, character, and logical values.

# Creating an empty vector.
x <- vector("numeric", length = 10) ## Defines the class and length of a vector. You can use other classes that we will see in detail later.
x
#>  [1] 0 0 0 0 0 0 0 0 0 0

# Vector with numeric value.
a <- c(5, 6, 7, 8, 9) ## Concatenate function which creates a vector with numeric values.
a
#> [1] 5 6 7 8 9
length(a) ## Gives the length of a vector.
#> [1] 5
dim(a) ## There is no dimension for vectors.
#> NULL
is.numeric(a) ## Checks whether the vector is numeric.
#> [1] TRUE
is.double(a) ## Numeric class in R is also called "double". So you can use "is.double" function as well.
#> [1] TRUE
class(a) ## Gives the class of an object in a character string.
#> [1] "numeric"
str(a) ## Gives the details of object structure (class of the object and its values). Try to use it frequently, it is very useful.
#>  num [1:5] 5 6 7 8 9

seq(from = 1, to = 10, by = 1) ## seq function generates regular sequences.
#>  [1]  1  2  3  4  5  6  7  8  9 10
seq(from = 1, to = 10, by = 4)
#> [1] 1 5 9
rep(1, each = 4) ## Replicates each value 4 times.
#> [1] 1 1 1 1
rep(c(2:5), times = 3) ## Replicates the whole vector 3 times.
#>  [1] 2 3 4 5 2 3 4 5 2 3 4 5
rep(c(2:5), each = 3) ## Replicates each value 3 times.
#>  [1] 2 2 2 3 3 3 4 4 4 5 5 5

# Vector with integer values.
x <- c(1L, 2L, 3L, 4L) ## To create an integer in R use "L" after the numeric value.
x
#> [1] 1 2 3 4
is.integer(x) ## Checks whether the vector is integer.
#> [1] TRUE
class(x)
#> [1] "integer"
str(x)
#>  int [1:4] 1 2 3 4

# Vector with complex values.
a <- c(1 + 0i, 2 + 4i) ## Vector with complex values.
a
#> [1] 1+0i 2+4i
is.complex(a) ## Checks whether the vector is complex.
#> [1] TRUE
class(a)
#> [1] "complex"
str(a)
#>  cplx [1:2] 1+0i 2+4i

# Vector with character value.
a <- c("NCSU", "Wolfpack")
a
#> [1] "NCSU"     "Wolfpack"
is.character(a) ## Checks whether the vector is character.
#> [1] TRUE
class(a)
#> [1] "character"
str(a)
#>  chr [1:2] "NCSU" "Wolfpack"

# Vector with logical values.
x <- c(TRUE, FALSE) ## Logical vector. See Logicals section for more details.
x
#> [1]  TRUE FALSE
is.logical(x) ## Checks whether the vector is logical.
#> [1] TRUE
class(x)
#> [1] "logical"
str(x)
#>  logi [1:2] TRUE FALSE

# Vector with names.
a <- c(1:3)
str(a)
#>  int [1:3] 1 2 3
attr(a, "names") <- c("First", "Second", "Third") ## A new attribute and description is added.
a ## Vector with names.
#>  First Second  Third 
#>      1      2      3
str(a) ## Named int.
#>  Named int [1:3] 1 2 3
#>  - attr(*, "names")= chr [1:3] "First" "Second" "Third"

b <- c(First = 1, Second = 2, Third = 3, Fourth = 4, Fifth = 5)
b ## Vector with names.
#>  First Second  Third Fourth  Fifth 
#>      1      2      3      4      5
str(b) ## Named num.
#>  Named num [1:5] 1 2 3 4 5
#>  - attr(*, "names")= chr [1:5] "First" "Second" "Third" "Fourth" ...
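
Names are useful mainly because they let you read and change elements by label. The following sketch is an added illustration, using a shortened version of the named vector above; names, the [[ ]] selector, and unname are all base R functions.

```r
b <- c(First = 1, Second = 2, Third = 3)
names(b) ## Gives the names of a vector.
#> [1] "First"  "Second" "Third"
b[["Second"]] ## Selects one element by its name.
#> [1] 2

names(b)[3] <- "Last" ## Renames the third element.
names(b)
#> [1] "First"  "Second" "Last"
unname(b) ## Drops the names and keeps the values.
#> [1] 1 2 3
```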

Note that a vector can contain only elements of the same type.
- Mixing numeric and character values in the same vector creates a character vector.
- Mixing logical and character values in the same vector creates a character vector.
- Mixing logical and numeric values in the same vector creates a numeric vector.
Also note that a numeric vector is not an integer vector, but an integer vector is also numeric.

# Vector with numeric and character values.
a <- c(5, 6, 7, 8, "d") ## Note that the last element in vector a is a character value.
str(a) ## Note the class of vector a.
#>  chr [1:5] "5" "6" "7" "8" "d"

# Vector with logical and character values.
b <- c("a", TRUE) ## Character vector.
str(b)
#>  chr [1:2] "a" "TRUE"

# Vector with logical and numeric values.
x <- c(TRUE, 2) ## Numeric (TRUE will be converted into number 1). See Logicals section for more details.
str(x)
#>  num [1:2] 1 2
str(c(FALSE, 2)) ## Numeric (FALSE will be converted into number 0).
#>  num [1:2] 0 2

# Numeric vs. Integer values.
a <- c(1, 2)
is.integer(a)
#> [1] FALSE
is.numeric(a)
#> [1] TRUE

b <- c(1:2)
is.integer(b)
#> [1] TRUE
is.numeric(b)
#> [1] TRUE
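
The conversions above happen automatically, but you can also convert between types explicitly. The sketch below is an added example using the base R as.* functions; the values are arbitrary and chosen only for illustration.

```r
as.numeric(c("5", "6")) ## Character to numeric.
#> [1] 5 6
as.integer(3.9) ## Truncates toward zero; it does not round.
#> [1] 3
as.character(c(TRUE, FALSE)) ## Logical to character.
#> [1] "TRUE"  "FALSE"
as.logical(c(0, 1, 2)) ## Zero is FALSE, any other number is TRUE.
#> [1] FALSE  TRUE  TRUE

# "d" cannot be converted to a number, so it becomes NA (a missing value).
# R issues a warning in this case, which is suppressed here.
x <- suppressWarnings(as.numeric(c("5", "d")))
x
#> [1]  5 NA
```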

You can create a new vector by combining two or more vectors.

a <- c(5:15)
b <- c(10:20)
c <- c(25:30)
x <- c(b, a) ## Combining two vectors.
x 
#>  [1] 10 11 12 13 14 15 16 17 18 19 20  5  6  7  8  9 10 11 12 13 14 15
y <- c(a, b, c) ## Combining three vectors.
y
#>  [1]  5  6  7  8  9 10 11 12 13 14 15 10 11 12 13 14 15 16 17 18 19 20 25
#> [24] 26 27 28 29 30

Vectors are very useful to perform simultaneous operations (vectorized operations).

a <- c(1:10)
2 + a ## Vectorized operation.
#>  [1]  3  4  5  6  7  8  9 10 11 12
1/a
#>  [1] 1.00000000 0.50000000 0.33333333 0.25000000 0.20000000 0.16666667
#>  [7] 0.14285714 0.12500000 0.11111111 0.10000000

When a vectorized operation uses a single vector, or several vectors of the same length, the operation is applied element by element.
- Consider the vectors a and b defined as below.
If you perform vectorized operations with vectors of different lengths, then the shorter one will be recycled to match the length of the longer one.
- Consider the vectors x and y defined as below.

# Vectors with same length.
a <- seq(from = 1, to = 10, by = 2)
a
#> [1] 1 3 5 7 9
b <- seq(from = 1, to = 15, by = 3)
b
#> [1]  1  4  7 10 13
a/b
#> [1] 1.00000000 0.75000000 0.71428571 0.70000000 0.69230769

# Vectors with different length.
x <- seq(from = 1, to = 10, by = 2)
x
#> [1] 1 3 5 7 9
y <- seq(from = 1, to = 10, by = 3)
y
#> [1]  1  4  7 10

x/y ## Note the last value in this vector.
#> Warning in x/y: longer object length is not a multiple of shorter object
#> length
#> [1] 1.00000000 0.75000000 0.71428571 0.70000000 9.00000000
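
For comparison, here is a small added sketch of recycling when the longer length is a multiple of the shorter one. In that case the shorter vector is recycled a whole number of times and R gives no warning. The vectors are arbitrary examples.

```r
x <- 1:6
y <- c(10, 20)
z <- x + y ## y is recycled three times: 10 20 10 20 10 20.
z
#> [1] 11 22 13 24 15 26
length(x) %% length(y) ## The remainder is zero, so no warning is issued.
#> [1] 0
```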

There are some operations that are specific to vectors.
For example, some basic statistical tools are given by the following code.

a <- rnorm(n = 10000, mean = 0, sd = 1) ## Random number generator for the normal distribution with the specified mean and standard deviation. This is standard normal distribution.
head(x = a, n = 5) ## Prints the first 5 elements of a vector.
#> [1] -1.20706575  0.27742924  1.08444118 -2.34569770  0.42912469
tail(x = a, n = 5) ## Prints the last 5 elements of a vector.
#> [1]  0.019739024 -2.126745287 -0.050222009 -0.238174080  0.776405308
mean(a) ## Mean.
#> [1] 0.006115893
var(a) ## Variance.
#> [1] 0.97521426
sum((a - mean(a))^2) / (length(a) - 1) ## Same as above.
#> [1] 0.97521426
sd(a) ## Standard deviation.
#> [1] 0.98752937
sqrt(var(a)) ## Same as above. sqrt function is for square root.
#> [1] 0.98752937
min(a) ## Minimum value.
#> [1] -3.3960635
max(a) ## Maximum value.
#> [1] 3.6181065
sum(a) ## Total of all elements in a vector.
#> [1] 61.15893

b <- c(seq(from = 10, to = 20, by = 4), seq(from = 10, to = 20, by = 2))
b
#> [1] 10 14 18 10 12 14 16 18 20
sqrt(b) ## Square root.
#> [1] 3.1622777 3.7416574 4.2426407 3.1622777 3.4641016 3.7416574 4.0000000
#> [8] 4.2426407 4.4721360
log(b)
#> [1] 2.3025851 2.6390573 2.8903718 2.3025851 2.4849066 2.6390573 2.7725887
#> [8] 2.8903718 2.9957323
sort(b, decreasing = FALSE, na.last = TRUE) ## Sorts the values of a vector in increasing order. na.last puts the missing values at the end. We will see the missing values later.
#> [1] 10 10 12 14 14 16 18 18 20
unique(b) ## Gives you the unique values in a vector.
#> [1] 10 14 18 12 16 20
sort(unique(b)) ## Unique values are sorted.
#> [1] 10 12 14 16 18 20
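
Because rnorm draws random numbers, the values printed above will differ each time the code is run. If you need the same draws again, one option is to fix the random seed with set.seed before generating them. This is an added sketch; the seed value 123 is arbitrary and is not the one used to build this document.

```r
set.seed(123) ## Fixes the state of the random number generator.
a1 <- rnorm(n = 5, mean = 0, sd = 1)

set.seed(123) ## Resetting the same seed reproduces the same draws.
a2 <- rnorm(n = 5, mean = 0, sd = 1)

identical(a1, a2)
#> [1] TRUE
```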

When the vectors are long, the numbers in square brackets help you to identify the position of the elements in the vector.
The exact display will depend on the width of your console.

a <- seq(from = 1, to = 200, by = 3)
a
#>  [1]   1   4   7  10  13  16  19  22  25  28  31  34  37  40  43  46  49
#> [18]  52  55  58  61  64  67  70  73  76  79  82  85  88  91  94  97 100
#> [35] 103 106 109 112 115 118 121 124 127 130 133 136 139 142 145 148 151
#> [52] 154 157 160 163 166 169 172 175 178 181 184 187 190 193 196 199

For example, the 18th element in vector a is 52.
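
The same positions can be used to pull elements out of the vector with square brackets. The following added sketch rebuilds the vector above and selects elements by position.

```r
a <- seq(from = 1, to = 200, by = 3)
a[18] ## The 18th element.
#> [1] 52
a[c(1, 18, 35, 52)] ## The first element on each printed line above.
#> [1]   1  52 103 154
a[length(a)] ## The last element.
#> [1] 199
which(a == 52) ## The position of the value 52.
#> [1] 18
```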

Factors

Factors are used to represent categorical data.
- They are a one-dimensional data structure like vectors.
- One can think of a factor as an integer vector where each integer has a label (level).
Factors can be unordered or ordered.
- The order of the levels can be set using the levels argument in the factor function.
- This can be important in linear modelling since the first level is used as the baseline level.
- Without the levels statement the baseline level is determined by alphabetic order.

a <- c("yes", "yes", "no", "yes", "no") ## A character vector.
str(a)
#>  chr [1:5] "yes" "yes" "no" "yes" "no"

b <- factor(x = a) ## Creates the factor and gives you the "Levels".
b
#> [1] yes yes no  yes no 
#> Levels: no yes
is.factor(b) ## Checks whether the object is a factor.
#> [1] TRUE
class(b)
#> [1] "factor"
str(b) ## Note that the levels are automatically identified by alphabetic order.
#>  Factor w/ 2 levels "no","yes": 2 2 1 2 1
attributes(b) ## Gives the object's attributes.
#> $levels
#> [1] "no"  "yes"
#> 
#> $class
#> [1] "factor"
levels(b) ## Gives the levels.
#> [1] "no"  "yes"
table(b) ## Gives the number of observations for each level.
#> b
#>  no yes 
#>   2   3
unclass(b) ## Shows the factors in numbers (no:1, yes:2).
#> [1] 2 2 1 2 1
#> attr(,"levels")
#> [1] "no"  "yes"

b <- factor(x = a, levels = c("yes", "no"))
b ## Note that the levels are identified with levels argument.
#> [1] yes yes no  yes no 
#> Levels: yes no
table(b) ## Gives the number of observations for each level.
#> b
#> yes  no 
#>   3   2
unclass(b) ## Shows the factors in numbers (yes:1, no:2).
#> [1] 1 1 2 1 2
#> attr(,"levels")
#> [1] "yes" "no"

attr(b, "levels") <- c("Aye", "Nay") ## Changing the levels by using the attr function.
b
#> [1] Aye Aye Nay Aye Nay
#> Levels: Aye Nay

# Factor with names
a <- c(1:3)
attr(a, "names") <- c("First", "Second", "Third") ## A new attribute and description is added.
b <- factor(x = a)
b ## Note that the names are kept and the levels are taken from the values of the vector.
#>  First Second  Third 
#>      1      2      3 
#> Levels: 1 2 3
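
The examples above all create unordered factors. As an added sketch (the data are made up for illustration), an ordered factor can be created with the ordered argument of factor, and the baseline level of an unordered factor can be changed with relevel.

```r
# An ordered factor: the levels have a meaningful order.
a <- c("low", "high", "medium", "low")
b <- factor(x = a, levels = c("low", "medium", "high"), ordered = TRUE)
is.ordered(b) ## Checks whether the factor is ordered.
#> [1] TRUE
levels(b)
#> [1] "low"    "medium" "high"
b < "high" ## Comparisons are meaningful for ordered factors.
#> [1]  TRUE FALSE  TRUE  TRUE

# Changing the baseline (first) level of an unordered factor.
u <- factor(x = c("yes", "no", "yes"))
levels(u) ## Alphabetic order by default.
#> [1] "no"  "yes"
levels(relevel(u, ref = "yes")) ## "yes" is now the first level.
#> [1] "yes" "no"
```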

You can also use the gl function for easily creating factors in R.

gl(n = 2, k = 4, labels = c("yes", "no")) ## Creates a factor object with 2 levels, each replicated 4 times.
#> [1] yes yes yes yes no  no  no  no 
#> Levels: yes no

Sometimes a numeric variable needs to be redefined as a factor variable with appropriate levels that correspond to various intervals.
This can be done by using the cut function.

a <- c(1:15)
a.factor <- cut(x = a, breaks = c(min(a), 6, 12, max(a))) ## Note that first value is not included in the interval.
table(a.factor)
#> a.factor
#>   (1,6]  (6,12] (12,15] 
#>       5       6       3
a.factor <- cut(x = a, breaks = c(min(a) - 1, 12, max(a))) ## Now all values are included.
table(a.factor)
#> a.factor
#>  (0,12] (12,15] 
#>      12       3

a.factor <- cut(x = a, breaks = c(6, 12, max(a)), include.lowest = TRUE) ## With include.lowest = TRUE, the lowest break (6) is included in the first interval. Values below the lowest break are not counted (they become NA).
table(a.factor)
#> a.factor
#>  [6,12] (12,15] 
#>       7       3

a.factor <- cut(x = a, breaks = c(min(a) - 1, 6, 12, max(a)), labels = c("1st Group", "2nd Group", "3rd Group")) ## Note that we also defined the labels.
table(a.factor)
#> a.factor
#> 1st Group 2nd Group 3rd Group 
#>         6         6         3
str(a.factor)
#>  Factor w/ 3 levels "1st Group","2nd Group",..: 1 1 1 1 1 1 2 2 2 2 ...
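
By default cut builds intervals that are open on the left and closed on the right. As an added sketch, the right argument of cut switches this, so that each interval includes its lower break instead. With right = FALSE, include.lowest = TRUE makes the last interval include the highest break.

```r
a <- c(1:15)
a.factor <- cut(x = a, breaks = c(1, 6, 12, 15), right = FALSE, include.lowest = TRUE)
levels(a.factor) ## Intervals are closed on the left.
#> [1] "[1,6)"   "[6,12)"  "[12,15]"
as.vector(table(a.factor)) ## Number of values in each interval.
#> [1] 5 6 4
```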

Logicals

Another key element in the R language is logical objects.
- Logical objects in R are a one-dimensional data structure like vectors.
- They can take only two values, which are TRUE and FALSE.
You can define logicals with names, but this is rarely done.
You can create a logical object by explicitly assigning it to a variable, as in the code below.

a <- TRUE
a
#> [1] TRUE
is.logical(a) ## Checks whether the vector is logical.
#> [1] TRUE
class(a)
#> [1] "logical"
str(a)
#>  logi TRUE

b <- FALSE
str(b)
#>  logi FALSE

x <- "TRUE"
str(x)
#>  chr "TRUE"

y <- c(TRUE, FALSE, FALSE)
str(y)
#>  logi [1:3] TRUE FALSE FALSE

In most cases, logical objects are created while performing element comparison using boolean operators.
Boolean operators are fundamental in any programming language.
The comparison and boolean operators available in R include equal (==), not equal (!=), negation (!), and (&, &&), and or (|, ||).
- The shorter forms of and and or evaluate your code element by element in the same way as arithmetic operators.
- The longer forms of and and or examine only a single element on each side, and evaluation proceeds only until the result is determined. Older versions of R silently used the first element of a longer vector; since R 4.3.0, giving the longer forms a vector of length greater than one is an error.
- The longer form is appropriate for programming control-flow and typically preferred in if clauses.
Also, %in% can be used to logically check whether there is a match in the right object for the elements of the left object.
Let’s consider the following example codes for boolean operators in R.

36 == 36 ## Checks equality of two numeric objects.
#> [1] TRUE
6 * 6 == 30 + 6 ## Checks the equality of two separate calculations.
#> [1] TRUE
TRUE == FALSE ## Checks the equality of two logical objects.
#> [1] FALSE
"NCSU" == "ncsu" ## Checks equality of two character objects.
#> [1] FALSE

36 != 36 ## Checks non-equality.
#> [1] FALSE
36 != 6 ## Checks non-equality.
#> [1] TRUE
TRUE != FALSE
#> [1] TRUE

1 < 0 ## Smaller than.
#> [1] FALSE
1 > 0 ## Bigger than.
#> [1] TRUE

1 <= 0 ## Smaller than or equal to.
#> [1] FALSE
1 >= 0 ## Bigger than or equal to.
#> [1] TRUE

-2:2 >= 0 ## Elementwise evaluation. Note that the shorter object is recycled fully over the longer one since the longer object length is a multiple of shorter object length.
#> [1] FALSE FALSE  TRUE  TRUE  TRUE

(-2:2 >= 0) & (-2:2 <= 0) ## Elementwise evaluation. Note that the lengths of two vectors are same.
#> [1] FALSE FALSE  TRUE FALSE FALSE

((-2:2) >= 0) && ((-2:2) <= 0) ## Only evaluates the first element of each vector. In R 4.3.0 and later this line is an error, because both sides have length greater than one.
#> [1] FALSE

(-3:3 >= 0) & (-2:2 <= 0) ## Elementwise evaluation. Note that the shorter object cannot be recycled fully over the longer one since the longer object length is not a multiple of shorter object length.
#> Warning in (-3:3 >= 0) & (-2:2 <= 0): longer object length is not a
#> multiple of shorter object length
#> [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
length(-3:3) %% length(-2:2) ## The remainder.
#> [1] 2

1:6 %in% 3:10 ## Each element of the left object is checked for a match among the elements of the right object.
#> [1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE
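
The statement that the longer forms stop as soon as the result is determined can be seen in a small added sketch. Here the right-hand side of && would cause trouble if it were evaluated, but it never is, because the left-hand side is already FALSE. The functions any and all, also from base R, reduce a whole logical vector to a single value, which is often what an if clause needs.

```r
a <- NULL ## An empty object used for illustration.
length(a) > 0 && a[1] > 0 ## The right side is never evaluated.
#> [1] FALSE

x <- 5
(x > 0) || stop("This message is never shown.") ## The left side is TRUE, so evaluation stops.
#> [1] TRUE

v <- -2:2
any(v >= 0) ## Is at least one element TRUE?
#> [1] TRUE
all(v >= 0) ## Are all elements TRUE?
#> [1] FALSE
```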

R gives a numeric interpretation to logical objects.
- FALSE is equivalent to scalar 0.
- TRUE is equivalent to scalar 1.

a <- 0
class(a) ## Object "a" is a numeric object.
#> [1] "numeric"
a == FALSE ## But it is still considered as FALSE since its value is "0".
#> [1] TRUE

b <- 1
class(b) ## Object "b" is a numeric object.
#> [1] "numeric"
b == TRUE ## But it is still considered as TRUE since its value is "1".
#> [1] TRUE

x <- 2
class(x)
#> [1] "numeric"
x == TRUE
#> [1] FALSE
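
A practical consequence of this numeric interpretation is that you can count or average logical values directly. The added sketch below uses an arbitrary numeric vector.

```r
x <- c(2, 5, 8, 1)
x > 3 ## A logical vector.
#> [1] FALSE  TRUE  TRUE FALSE
sum(x > 3) ## Number of elements bigger than 3.
#> [1] 2
mean(x > 3) ## Share of elements bigger than 3.
#> [1] 0.5
TRUE + TRUE ## Logicals are treated as 1 in arithmetic.
#> [1] 2
```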

You can use logical values to create control structures.
We will see the details of control structures in the Control Structures section.

a <- c(1:4, 2:5)
a == 3 ## Note the TRUE values.
#> [1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE

a <- c(5:15) ## First vector.
b <- c(10:20) ## Second vector.
x <- c(b, a) ## First and second vector are combined.
x
#>  [1] 10 11 12 13 14 15 16 17 18 19 20  5  6  7  8  9 10 11 12 13 14 15

sort(unique(x)) ## Unique values are sorted.
#>  [1]  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

sort(unique(x)) == sort(unique(c(a, b))) ## Checking if sorting the unique values worked. Note the order of vectors (here it is c(a, b) but the x vector is defined as c(b, a)).
#>  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> [15] TRUE TRUE
sum(!(sort(unique(x)) == sort(unique(c(a, b))))) == 0 ## A quick control structure using the logical object and vectorized operations. So, there is no need to check it item by item.
#> [1] TRUE
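
The same check can be written more directly with other base R functions. This added sketch redefines the two vectors so that it runs on its own; all, identical, and setequal are offered as alternatives, not as the approach used in the workshop.

```r
a <- c(5:15)
b <- c(10:20)

all(sort(unique(c(b, a))) == sort(unique(c(a, b)))) ## TRUE only if every comparison is TRUE.
#> [1] TRUE
identical(sort(unique(c(b, a))), sort(unique(c(a, b)))) ## Compares the two objects as a whole.
#> [1] TRUE
setequal(c(b, a), c(a, b)) ## Compares the unique values regardless of order.
#> [1] TRUE
```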

You can make your code perform different calculations based on a statement being TRUE or FALSE.
We will see the details of if statements in the Control Structures section.

x <- TRUE
if (x == TRUE) { ## We will see the details of IF statements later.
    print("My first IF statement.")
}
#> [1] "My first IF statement."

if (x) {
    print("IF statement result is TRUE, so it will be printed.")
}
#> [1] "IF statement result is TRUE, so it will be printed."

if (!x) {
    print("IF statement result is FALSE, so it won't be printed.")
} else {
    print("IF statement result is TRUE, so it will be printed.")
}
#> [1] "IF statement result is TRUE, so it will be printed."
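
An if statement expects a single TRUE or FALSE. When you want a different result for each element of a vector, the base R function ifelse applies the test element by element. The sketch below is an added example with made-up values.

```r
x <- c(-2, 0, 3)
ifelse(test = x > 0, yes = "positive", no = "not positive") ## One result per element.
#> [1] "not positive" "not positive" "positive"
```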

Matrices

Matrices are frequently used in statistics, and so they play an important role in R.
Unlike vectors, matrices have 2 dimensions, which are rows and columns.
To create a matrix you can use the function matrix as shown below.

matrix(data = 1:6, nrow = 2, ncol = 3, byrow = FALSE) ## Default is to fill the matrix by column.
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6
matrix(data = 1:6, nrow = 2) ## You can define the length of one dimension.
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6
matrix(data = 1:2, nrow = 2, ncol = 3) ## If there are not enough elements, the data vector will be recycled to fill the whole matrix.
#>      [,1] [,2] [,3]
#> [1,]    1    1    1
#> [2,]    2    2    2
matrix(0, 2, 3) ## Creates a zero matrix. Note that value 0 recycles.
#>      [,1] [,2] [,3]
#> [1,]    0    0    0
#> [2,]    0    0    0

matrix(data = 1:6, nrow = 2, ncol = 3, byrow = TRUE) ## Filled by row.
#>      [,1] [,2] [,3]
#> [1,]    1    2    3
#> [2,]    4    5    6

a <- matrix(data = 1:6, nrow = 2, ncol = 3, dimnames = list(c("row1", "row2"), c("col1", "col2", "col3"))) ## We can also define the dimension names. Note that you need to use list() function. We will see the details of lists later.
a
#>      col1 col2 col3
#> row1    1    3    5
#> row2    2    4    6
is.matrix(a) ## Checks whether the object is a matrix.
#> [1] TRUE
class(a)
#> [1] "matrix"
attributes(a) ## Gives the object's attributes.
#> $dim
#> [1] 2 3
#> 
#> $dimnames
#> $dimnames[[1]]
#> [1] "row1" "row2"
#> 
#> $dimnames[[2]]
#> [1] "col1" "col2" "col3"
str(a)
#>  int [1:2, 1:3] 1 2 3 4 5 6
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : chr [1:2] "row1" "row2"
#>   ..$ : chr [1:3] "col1" "col2" "col3"

x <- c(1:6) ## You can also create matrix by first creating a vector then assigning its dimension.
dim(x) <- c(2, 3) ## Note that the matrix is filled by columns.
x
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6

The below section introduces some functions which can be used for matrices in R.
Note that a matrix with dimension names has very similar characteristics to R data frames.
- So, most of the functions can also be used for data frames.
- The details of data frames will be covered in the Data Frames and Data sections.

x <- matrix(data = 9:4, nrow = 3, ncol = 2, dimnames = list(c("row1", "row2", "row3"), c("col1", "col2")))
x
#>      col1 col2
#> row1    9    6
#> row2    8    5
#> row3    7    4

dim(x) ## Dimensions of a matrix.
#> [1] 3 2
dimnames(x) ## Dimension names.
#> [[1]]
#> [1] "row1" "row2" "row3"
#> 
#> [[2]]
#> [1] "col1" "col2"
nrow(x) ## Number of rows.
#> [1] 3
ncol(x) ## Number of columns.
#> [1] 2
rownames(x) ## Gives the row names.
#> [1] "row1" "row2" "row3"
colnames(x) ## Gives the column names.
#> [1] "col1" "col2"

rowSums(x) ## Row sums.
#> row1 row2 row3 
#>   15   13   11
colSums(x) ## Column sums.
#> col1 col2 
#>   24   15
rowMeans(x) ## Row means.
#> row1 row2 row3 
#>  7.5  6.5  5.5
colMeans(x) ## Column means.
#> col1 col2 
#>    8    5

diag(x = 2, nrow = 3, ncol = 3) ## Extract or replace the diagonal of a matrix, or construct a diagonal matrix. Here it creates a 3 x 3 diagonal matrix with 2 on the diagonal (use x = 1 for an identity matrix).
#>      [,1] [,2] [,3]
#> [1,]    2    0    0
#> [2,]    0    2    0
#> [3,]    0    0    2
diag(x = 3, nrow = 2, ncol = 2) ## Creates a diagonal matrix with values of 3.
#>      [,1] [,2]
#> [1,]    3    0
#> [2,]    0    3
diag(x = 3, nrow = 2, ncol = 3) ## One additional column.
#>      [,1] [,2] [,3]
#> [1,]    3    0    0
#> [2,]    0    3    0
diag(x = 3, nrow = 4, ncol = 3) ## One additional row.
#>      [,1] [,2] [,3]
#> [1,]    3    0    0
#> [2,]    0    3    0
#> [3,]    0    0    3
#> [4,]    0    0    0

y <- diag(x = 3, nrow = 5, ncol = 5)
diag(y) ## Extracts the diagonal.
#> [1] 3 3 3 3 3

z <- matrix(0 , nrow = 3, ncol = 3)
diag(z) <- 1:3 ## Assigns the given values to diagonal.
z
#>      [,1] [,2] [,3]
#> [1,]    1    0    0
#> [2,]    0    2    0
#> [3,]    0    0    3

lower.tri(z, diag = FALSE) ## Gives the lower triangle of a matrix in logical values. You can use the result of this function to subset the lower triangle.
#>       [,1]  [,2]  [,3]
#> [1,] FALSE FALSE FALSE
#> [2,]  TRUE FALSE FALSE
#> [3,]  TRUE  TRUE FALSE
upper.tri(z, diag = FALSE) ## Gives the upper triangle of a matrix.
#>       [,1]  [,2]  [,3]
#> [1,] FALSE  TRUE  TRUE
#> [2,] FALSE FALSE  TRUE
#> [3,] FALSE FALSE FALSE
upper.tri(z, diag = TRUE) ## Diagonal is included.
#>       [,1]  [,2] [,3]
#> [1,]  TRUE  TRUE TRUE
#> [2,] FALSE  TRUE TRUE
#> [3,] FALSE FALSE TRUE
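
As a small sketch of the subsetting mentioned in the comments above, the logical matrix returned by lower.tri or upper.tri can be used as an index. The matrix m below is a made-up example and not an object from the workshop.

```r
m <- matrix(data = 1:9, nrow = 3, ncol = 3) ## Hypothetical example matrix.
m[lower.tri(m)] ## Extracts the elements below the diagonal, column by column.
#> [1] 2 3 6
m[upper.tri(m)] <- 0 ## Replaces the elements above the diagonal with 0.
m
#>      [,1] [,2] [,3]
#> [1,]    1    0    0
#> [2,]    2    5    0
#> [3,]    3    6    9
```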

The below section introduces the matrix operations in R.

x <- matrix(data = 1:6, nrow = 2, ncol = 3)
y <- matrix(data = 6:11, nrow = 3, ncol = 2)
z <- matrix(data = 10:15, nrow = 2, ncol = 3)
w <- matrix(data = 10:13, nrow = 2, ncol = 2)

t(x) ## Transpose.
#>      [,1] [,2]
#> [1,]    1    2
#> [2,]    3    4
#> [3,]    5    6
det(w) ## Determinant.
#> [1] -2
solve(w) ## Inverse.
#>      [,1] [,2]
#> [1,] -6.5    6
#> [2,]  5.5   -5
eigen(w) ## Eigenvalues and eigenvectors.
#> $values
#> [1] 23.086630226 -0.086630226
#> 
#> $vectors
#>             [,1]        [,2]
#> [1,] -0.67584466 -0.76549652
#> [2,] -0.73704409  0.64344003

x + 1 ## Scalar summation.
#>      [,1] [,2] [,3]
#> [1,]    2    4    6
#> [2,]    3    5    7
x + z ## Matrix summation.
#>      [,1] [,2] [,3]
#> [1,]   11   15   19
#> [2,]   13   17   21
x / z ## Division by element.
#>            [,1]       [,2]       [,3]
#> [1,] 0.10000000 0.25000000 0.35714286
#> [2,] 0.18181818 0.30769231 0.40000000

2 * x ## Scalar multiplication.
#>      [,1] [,2] [,3]
#> [1,]    2    6   10
#> [2,]    4    8   12
c(2, 10) * x ## Element-wise multiplication. The vector is recycled down the columns, so row 1 is multiplied by 2 and row 2 by 10.
#>      [,1] [,2] [,3]
#> [1,]    2    6   10
#> [2,]   20   40   60
c(1, 2, 3) * y ## Same idea: each row of y is multiplied by the matching element of the vector.
#>      [,1] [,2]
#> [1,]    6    9
#> [2,]   14   20
#> [3,]   24   33

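x %% z ## Modulo (element-wise remainder). Note that this is not matrix multiplication.
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6
x %*% y ## Matrix multiplication. The number of columns of x must equal the number of rows of y.
#>      [,1] [,2]
#> [1,]   67   94
#> [2,]   88  124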

crossprod(x) ## Cross product, which is the same as t(x) %*% x.
#>      [,1] [,2] [,3]
#> [1,]    5   11   17
#> [2,]   11   25   39
#> [3,]   17   39   61
kronecker(x, y) ## Kronecker product.
#>      [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,]    6    9   18   27   30   45
#> [2,]    7   10   21   30   35   50
#> [3,]    8   11   24   33   40   55
#> [4,]   12   18   24   36   36   54
#> [5,]   14   20   28   40   42   60
#> [6,]   16   22   32   44   48   66
c(1:5) %o% c(1:5) ## Outer product
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    2    3    4    5
#> [2,]    2    4    6    8   10
#> [3,]    3    6    9   12   15
#> [4,]    4    8   12   16   20
#> [5,]    5   10   15   20   25
matrix(1:5, 5, 1) %*% matrix(1:5, 1, 5) ## Same result as above.
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    2    3    4    5
#> [2,]    2    4    6    8   10
#> [3,]    3    6    9   12   15
#> [4,]    4    8   12   16   20
#> [5,]    5   10   15   20   25
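
A quick way to convince yourself that solve really gives the inverse is to multiply the result with the original matrix. The sketch below reuses the w matrix from above. The vector b is a made-up right-hand side, added only to show that solve can also solve a linear system directly.

```r
w <- matrix(data = 10:13, nrow = 2, ncol = 2) ## Same w as above.
w.inv <- solve(w) ## Inverse of w.
round(w %*% w.inv, digits = 10) ## Rounded to hide floating point noise.
#>      [,1] [,2]
#> [1,]    1    0
#> [2,]    0    1

b <- c(1, 2) ## Hypothetical right-hand side of the system w %*% v = b.
v <- solve(w, b) ## Solves the linear system without forming the inverse explicitly.
all.equal(as.vector(w %*% v), b)
#> [1] TRUE
```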

You can merge matrices using the rbind and cbind functions.
Note that vector or scalar arguments are recycled to fit the matrix, whereas matrices whose dimensions do not match produce an error.

x <- matrix(data = 1:6, nrow = 2, ncol = 3)
y <- matrix(data = 6:11, nrow = 3, ncol = 2)
z <- matrix(data = 10:15, nrow = 2, ncol = 3)

rbind(x, z)
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6
#> [3,]   10   12   14
#> [4,]   11   13   15
rbind(x, 1, 0) ## Recycling by columns.
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6
#> [3,]    1    1    1
#> [4,]    0    0    0

cbind(x, z)
#>      [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,]    1    3    5   10   12   14
#> [2,]    2    4    6   11   13   15
cbind(x, t(y))
#>      [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,]    1    3    5    6    7    8
#> [2,]    2    4    6    9   10   11
cbind(x, 1, 0) ## Recycling by rows.
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    3    5    1    0
#> [2,]    2    4    6    1    0
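
Recycling in rbind and cbind applies to vectors and single values, not to whole matrices. The following sketch illustrates the difference. The tryCatch call is used only so that the example keeps running after the error, and the returned string "error" is my own placeholder.

```r
x <- matrix(data = 1:6, nrow = 2, ncol = 3) ## 2 x 3 matrix.
y <- matrix(data = 6:11, nrow = 3, ncol = 2) ## 3 x 2 matrix.

rbind(x, 7:9) ## A vector of length 3 fits as a new row.
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6
#> [3,]    7    8    9

result <- tryCatch(rbind(x, y), error = function(e) "error") ## Matrices with different numbers of columns are not recycled.
result
#> [1] "error"
```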

Arrays

Unlike vectors, arrays can have multiple dimensions.
Note the subtle difference between matrices and arrays.
- All matrices are arrays, but not all arrays are matrices.
- An array is simply a vector with a dimension attribute and, optionally, dimension names.
- A matrix is a two-dimensional array, but you can create arrays of higher dimension.
To create an array, use the function array as shown below.

array(data = 1:6, dim = c(2, 3), dimnames = NULL) ## Two-dimensional array.
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6
matrix(data = 1:6, nrow = 2, ncol = 3, dimnames = NULL) ## Same as above.
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6

x <- array(data = 1:12, dim = c(3, 2, 2)) ## 3-dimensional arrays.
x
#> , , 1
#> 
#>      [,1] [,2]
#> [1,]    1    4
#> [2,]    2    5
#> [3,]    3    6
#> 
#> , , 2
#> 
#>      [,1] [,2]
#> [1,]    7   10
#> [2,]    8   11
#> [3,]    9   12
dim(x)
#> [1] 3 2 2
is.array(x) ## Checks whether the object is an array.
#> [1] TRUE
class(x)
#> [1] "array"
str(x)
#>  int [1:3, 1:2, 1:2] 1 2 3 4 5 6 7 8 9 10 ...
attributes(x) ## Gives the object's attributes.
#> $dim
#> [1] 3 2 2

y <- array(data = 1:12, dim = c(3, 2, 2), dimnames = list(c("Row.1", "Row.2", "Row.3"), c("Col.1", "Col.2"), c("Dim.1", "Dim.2"))) ## Dimension names should be in a list form. See the Lists section first and check this function again.
y
#> , , Dim.1
#> 
#>       Col.1 Col.2
#> Row.1     1     4
#> Row.2     2     5
#> Row.3     3     6
#> 
#> , , Dim.2
#> 
#>       Col.1 Col.2
#> Row.1     7    10
#> Row.2     8    11
#> Row.3     9    12
str(y)
#>  int [1:3, 1:2, 1:2] 1 2 3 4 5 6 7 8 9 10 ...
#>  - attr(*, "dimnames")=List of 3
#>   ..$ : chr [1:3] "Row.1" "Row.2" "Row.3"
#>   ..$ : chr [1:2] "Col.1" "Col.2"
#>   ..$ : chr [1:2] "Dim.1" "Dim.2"
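
Elements of an array are reached with one index per dimension. The sketch below reuses the 3-dimensional array from above and also shows the apply function, which I use here only as one possible way of summarizing an array along a dimension.

```r
x <- array(data = 1:12, dim = c(3, 2, 2)) ## Same array as above.
x[2, 1, 2] ## Row 2, column 1 of the second slice.
#> [1] 8
x[, , 1] ## The whole first slice, returned as a matrix.
#>      [,1] [,2]
#> [1,]    1    4
#> [2,]    2    5
#> [3,]    3    6
apply(X = x, MARGIN = 3, FUN = sum) ## Sum of each slice along the third dimension.
#> [1] 21 57
```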

Data Frames

A data frame is a series of records represented by rows, each containing values in several fields, which are called columns.
- Data frames are the most common way of storing data in R.
- They are not matrices, but they share many characteristics of matrices.
- Actually, data frames are lists of equal-length vectors with some additional structure.
There are many types of data and ways to load them into the R workspace as objects.
- Most of the time you will have big data sets that need to be loaded into R and typing them one by one is not a good idea.
- However, in this section we will create data frames from scratch and cover only the basics of data frames.
- The details of data frames will be covered in the Data section.
An empty data frame can be created by calling data.frame() without any arguments.

my.data <- data.frame() ## Creates an empty data frame.
is.data.frame(my.data) ## Checks whether the object is a data frame.
#> [1] TRUE
class(my.data)
#> [1] "data.frame"
str(my.data)
#> 'data.frame':    0 obs. of  0 variables

You can create a non-empty data frame by supplying named vectors as input in the data.frame function.

sample.size <- 30 ## Defining the sample size for later use.
column.1 <- round(rnorm(n = sample.size, mean = 5, sd = 1), digits = 2) ## rnorm generates random numbers from the normal distribution with the specified mean and standard deviation. The round function rounds the numeric values.
column.2 <- sample(x = c(-50:50, NA), size = sample.size, replace = TRUE, prob = NULL) ## The sample function takes a sample of the specified size from the elements of x, either with or without replacement. Creates an integer class.
column.3 <- sample(x = c("NCSU", "CALS", "Economics"), size = sample.size, replace = TRUE, prob = NULL)
column.4 <- factor(sample(x = c("Yes", "No"), size = sample.size, replace = TRUE, prob = NULL))
column.5 <- sample(x = c(TRUE, FALSE), size = sample.size, replace = TRUE, prob = NULL)

my.data <- data.frame(Column.1 = column.1, Column.2 = column.2, Column.3 = column.3, Column.4 = column.4, Column.5 = column.5) ## Creating data frame from scratch.
my.data

is.data.frame(my.data)
#> [1] TRUE
class(my.data)
#> [1] "data.frame"
str(my.data) ## Note that Column.3 is a factor variable but we wanted a character class.
#> 'data.frame':    30 obs. of  5 variables:
#>  $ Column.1: num  3.18 5.63 5.52 5.14 6.46 4.51 2.88 4.87 4.57 5.09 ...
#>  $ Column.2: int  NA -21 -33 32 6 9 34 21 -42 -9 ...
#>  $ Column.3: Factor w/ 3 levels "CALS","Economics",..: 1 2 2 2 3 2 2 1 2 1 ...
#>  $ Column.4: Factor w/ 2 levels "No","Yes": 2 2 1 2 2 2 2 2 2 2 ...
#>  $ Column.5: logi  FALSE FALSE TRUE FALSE TRUE TRUE ...
attributes(my.data) ## Gives the object's attributes.
#> $names
#> [1] "Column.1" "Column.2" "Column.3" "Column.4" "Column.5"
#> 
#> $row.names
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
#> [24] 24 25 26 27 28 29 30
#> 
#> $class
#> [1] "data.frame"

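Note that in R versions before 4.0.0, data.frame’s default behavior turns character strings into factors, as in the output above.
Using the stringsAsFactors = FALSE argument suppresses this behavior. Since R 4.0.0, this is already the default.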

new.data <- data.frame(Column.1 = column.1, Column.2 = column.2, Column.3 = column.3, Column.4 = column.4, Column.5 = column.5, stringsAsFactors = FALSE) ## Creating data frame from scratch.
new.data

is.data.frame(new.data)
#> [1] TRUE
class(new.data)
#> [1] "data.frame"
str(new.data) ## Note that Column.3 is of character class, which is what we wanted.
#> 'data.frame':    30 obs. of  5 variables:
#>  $ Column.1: num  3.18 5.63 5.52 5.14 6.46 4.51 2.88 4.87 4.57 5.09 ...
#>  $ Column.2: int  NA -21 -33 32 6 9 34 21 -42 -9 ...
#>  $ Column.3: chr  "CALS" "Economics" "Economics" "Economics" ...
#>  $ Column.4: Factor w/ 2 levels "No","Yes": 2 2 1 2 2 2 2 2 2 2 ...
#>  $ Column.5: logi  FALSE FALSE TRUE FALSE TRUE TRUE ...
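
If a column has already been stored as a factor, it can also be converted afterwards. The following is a minimal sketch with a made-up data frame called small.data. The stringsAsFactors = TRUE argument is set explicitly so that the example behaves the same in every R version.

```r
small.data <- data.frame(School = c("NCSU", "CALS", "NCSU"), stringsAsFactors = TRUE) ## Hypothetical data. Factor conversion is requested explicitly.
class(small.data$School)
#> [1] "factor"
small.data$School <- as.character(small.data$School) ## Converts the factor column back to character.
class(small.data$School)
#> [1] "character"
small.data$School
#> [1] "NCSU" "CALS" "NCSU"
```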

The below section introduces some functions which can be used for data frames in R.
Note that a data frame has very similar characteristics to a matrix with dimension names.
- So, most of the functions can also be used for matrices.
- The details of matrices were covered in the Matrices section.

data.1 <- data.frame(c(1:5), c(6:10), c(11:15), c(16:20), stringsAsFactors = FALSE) ## Creating data frame from scratch without specific column names.
data.1


dim(data.1) ## Dimensions of a data frame.
#> [1] 5 4
dimnames(data.1) ## Dimension names.
#> [[1]]
#> [1] "1" "2" "3" "4" "5"
#> 
#> [[2]]
#> [1] "c.1.5."   "c.6.10."  "c.11.15." "c.16.20."
nrow(data.1) ## Number of rows.
#> [1] 5
ncol(data.1) ## Number of columns.
#> [1] 4

rownames(data.1) ## Gives the row names.
#> [1] "1" "2" "3" "4" "5"
colnames(data.1) ## Gives the column names.
#> [1] "c.1.5."   "c.6.10."  "c.11.15." "c.16.20."
names(data.1) ## Gives the column names.
#> [1] "c.1.5."   "c.6.10."  "c.11.15." "c.16.20."

column.names <- paste("Column", ".", 1:ncol(data.1), sep = "") ## Creating the generic column names automatically. paste function pastes the supplied values with a given string using vectorized operations.
column.names <- paste0("Column", ".", 1:ncol(data.1)) ## Same result as above. paste0 function pastes the supplied values with nothing using vectorized operations.
column.names
#> [1] "Column.1" "Column.2" "Column.3" "Column.4"
colnames(data.1) <- column.names ## Assigns the column names to the data frame by using colnames.

# View(data.1) ## View the data frame in a new tab in interactive R session.
head(x = data.1, n = 2) ## Prints the first 2 rows of a data frame.

tail(x = data.1, n = 2) ## Prints the last 2 rows of a data frame.


rowSums(data.1) ## Row sums.
#> [1] 34 38 42 46 50
colSums(data.1) ## Column sums.
#> Column.1 Column.2 Column.3 Column.4 
#>       15       40       65       90
rowMeans(data.1) ## Row means.
#> [1]  8.5  9.5 10.5 11.5 12.5
colMeans(data.1) ## Column means.
#> Column.1 Column.2 Column.3 Column.4 
#>        3        8       13       18
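
Since a data frame is a list of equal-length vectors, the usual list tools also work on it. The sketch below uses a made-up data frame called data.2 to show this.

```r
data.2 <- data.frame(Column.1 = 1:3, Column.2 = c("a", "b", "c"), stringsAsFactors = FALSE) ## Hypothetical data frame.
is.list(data.2) ## A data frame is also a list.
#> [1] TRUE
length(data.2) ## Length of the list, which is the number of columns.
#> [1] 2
data.2$Column.1 ## Extracts a column as a vector by name.
#> [1] 1 2 3
data.2[["Column.2"]] ## Same idea using double brackets.
#> [1] "a" "b" "c"
sapply(data.2, class) ## Class of each column.
#>    Column.1    Column.2 
#>   "integer" "character"
```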

Lists

A list is a generic vector that can contain different types of objects.
- Unlike atomic vectors, lists can hold values of different classes in a single object.
- Lists are similar to vectors, except that each entry can be any R object, even another list.
Other than vectors, matrices and arrays, the main object for holding data in R is a list.

# List without names.
a <- list(1, "a", TRUE, 1 + 4i)
a
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] "a"
#> 
#> [[3]]
#> [1] TRUE
#> 
#> [[4]]
#> [1] 1+4i
is.list(a) ## Checks whether the object is a list.
#> [1] TRUE
class(a)
#> [1] "list"
str(a)
#> List of 4
#>  $ : num 1
#>  $ : chr "a"
#>  $ : logi TRUE
#>  $ : cplx 1+4i

# List with names.
a <- list(Numeric = 1, Character = "a", Logical = TRUE, Complex = 1 + 4i)
a
#> $Numeric
#> [1] 1
#> 
#> $Character
#> [1] "a"
#> 
#> $Logical
#> [1] TRUE
#> 
#> $Complex
#> [1] 1+4i
names(a) ## Gives a character vector of all the names of objects in a list.
#> [1] "Numeric"   "Character" "Logical"   "Complex"

# List inside of a list.
a <- list(c(2:4), "k", TRUE, list(rep(1, 3), rep(2, 4)))
a
#> [[1]]
#> [1] 2 3 4
#> 
#> [[2]]
#> [1] "k"
#> 
#> [[3]]
#> [1] TRUE
#> 
#> [[4]]
#> [[4]][[1]]
#> [1] 1 1 1
#> 
#> [[4]][[2]]
#> [1] 2 2 2 2
is.list(a) ## Checks whether the object is a list.
#> [1] TRUE
str(a)
#> List of 4
#>  $ : int [1:3] 2 3 4
#>  $ : chr "k"
#>  $ : logi TRUE
#>  $ :List of 2
#>   ..$ : num [1:3] 1 1 1
#>   ..$ : num [1:4] 2 2 2 2

# List consists of different objects.
x <- c(1:2) ## Numeric vector.
y <- c("NCSU","Wolfpack", "Economics") ## Character vector.
z <- c(TRUE, FALSE, TRUE, FALSE, FALSE) ## Logical vector.
w <- factor(c("yes", "no", "no", "yes")) ## Factor vector.
v <- c(1 + 4i, 4 + 6i, 3 + 3i, 2 + 5i) ## Vector for complex values.
a <- matrix(data = 1:4, nrow = 2, ncol = 2, byrow = FALSE) ## Matrix.
b <- array(1:8, dim = c(2, 2, 2), dimnames = NULL) ## Array
data.1 <- data.frame(Column.1 = c(1:3), Column.2 = c(4:6), Column.3 = c(7:9), stringsAsFactors = FALSE)
my.list <- list(3, x, y, z, w, v, a, b, data.1) ## The list contains different classes of objects.
my.list
#> [[1]]
#> [1] 3
#> 
#> [[2]]
#> [1] 1 2
#> 
#> [[3]]
#> [1] "NCSU"      "Wolfpack"  "Economics"
#> 
#> [[4]]
#> [1]  TRUE FALSE  TRUE FALSE FALSE
#> 
#> [[5]]
#> [1] yes no  no  yes
#> Levels: no yes
#> 
#> [[6]]
#> [1] 1+4i 4+6i 3+3i 2+5i
#> 
#> [[7]]
#>      [,1] [,2]
#> [1,]    1    3
#> [2,]    2    4
#> 
#> [[8]]
#> , , 1
#> 
#>      [,1] [,2]
#> [1,]    1    3
#> [2,]    2    4
#> 
#> , , 2
#> 
#>      [,1] [,2]
#> [1,]    5    7
#> [2,]    6    8
#> 
#> 
#> [[9]]
#>   Column.1 Column.2 Column.3
#> 1        1        4        7
#> 2        2        5        8
#> 3        3        6        9
str(my.list)
#> List of 9
#>  $ : num 3
#>  $ : int [1:2] 1 2
#>  $ : chr [1:3] "NCSU" "Wolfpack" "Economics"
#>  $ : logi [1:5] TRUE FALSE TRUE FALSE FALSE
#>  $ : Factor w/ 2 levels "no","yes": 2 1 1 2
#>  $ : cplx [1:4] 1+4i 4+6i 3+3i ...
#>  $ : int [1:2, 1:2] 1 2 3 4
#>  $ : int [1:2, 1:2, 1:2] 1 2 3 4 5 6 7 8
#>  $ :'data.frame':    3 obs. of  3 variables:
#>   ..$ Column.1: int [1:3] 1 2 3
#>   ..$ Column.2: int [1:3] 4 5 6
#>   ..$ Column.3: int [1:3] 7 8 9
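
Once a list is created, its elements can be reached in several ways. The following sketch uses a made-up named list b to show the difference between single brackets, double brackets and the dollar sign.

```r
b <- list(Numeric = 1, Character = "a", Logical = TRUE) ## Hypothetical named list.
b[["Character"]] ## Double brackets return the element itself.
#> [1] "a"
b$Logical ## The dollar sign works for named elements.
#> [1] TRUE
b[1] ## Single brackets return a list containing the selected elements.
#> $Numeric
#> [1] 1
class(b[1])
#> [1] "list"
class(b[[1]])
#> [1] "numeric"
```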

Working with RStudio

In this section, you will learn the basics of
- working directory concept.
- workspace and file system in RStudio.
- getting help directly in RStudio session.
- R packages.
- citing R software and R packages.

Working Directory

R always points at a directory on your computer's file system.
- This directory is called the working directory.
- While loading and writing data sets or any kind of R objects, R uses this working directory as the base path for file operations.
If you are not working on a specific project and you want R to start in a certain directory, you have to specify a path in your computer's file system as your default working directory.
- This can be easily done in RStudio > Preferences > General > Default working directory (when not in a project).
- Setting the default working directory allows R to start each session in this pre-specified path.
If you are working on a project like this one, it is a good idea to create a directory for the project and start R from there.
- This makes it easy to save your work and find it in later R sessions.
- For this purpose, you can and should use the RStudio Project feature.
- For more information about the RStudio Project feature, see RStudio.

In the R console or editor, you can
- check and change the working directory of the interactive R session.
- also assign the working directory to a character string.
Note that on Windows, paths can be written with forward slashes. If you use backslashes instead, they must be doubled, because a single backslash is the escape character in R strings.

# R code chunk is not evaluated.

R.home() ## Gives you the home directory of R software itself.

getwd() ## Gives the current working directory.
my.current.dir <- getwd() ## Assigns the current working directory to an object.

setwd("Path of Working Directory") ## Sets the working directory to a new one. Note that this can be a relative path or a full path.
setwd(my.current.dir) ## Using the assigned object, setting the working directory.

setwd("~") ## Changes the working directory to home directory.
setwd("../") ## Double dots are used for moving up in the folder hierarchy.
setwd("./") ## A single dot represents the current directory itself.
setwd("/") ## Forward slash changes the working directory to the root.
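
As a small sketch of the ideas above, the code below changes the working directory temporarily and then restores it. It also builds a path with file.path, which avoids typing the slashes by hand. The folder and file names ("data", "raw", "file.csv") are made up for illustration, and the temporary directory is used only because it exists in every R session.

```r
old.dir <- getwd() ## Remembers the current working directory.
setwd(tempdir()) ## Moves to the temporary directory of the R session.
my.path <- file.path("data", "raw", "file.csv") ## Builds a relative path from its parts (hypothetical names).
my.path
#> [1] "data/raw/file.csv"
setwd(old.dir) ## Returns to the original working directory.
```

Storing the old directory first is a simple habit that keeps a script from leaving your session in an unexpected folder.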

Workspace

The workspace is your current R working environment and includes any user-created objects such as vectors, matrices, data frames, lists, and functions.
In the R console or editor, you can check the class, structure, and attributes of the R objects saved in your workspace.

x <- "NCSU"
class(x)  ## Gives the class of an object in a character string.
#> [1] "character"
str(x) ## Gives the details of object structure (class of the object and its values). Try to use it, very useful.
#>  chr "NCSU"

attributes(x) ## This object does not have any attributes yet.
#> NULL
attr(x, "Awesomeness Level") <- "Top Notch" ## A new attribute and description is added.
attributes(x) ## Now it has a user-assigned attribute.
#> $`Awesomeness Level`
#> [1] "Top Notch"
structure(x, new.attribute = "This is a new attribute") ## Returns a new object with modified attributes.
#> [1] "NCSU"
#> attr(,"Awesomeness Level")
#> [1] "Top Notch"
#> attr(,"new.attribute")
#> [1] "This is a new attribute"

attributes(cars) ## cars is a dataset from the datasets package in R.
#> $names
#> [1] "speed" "dist" 
#> 
#> $row.names
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
#> [24] 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
#> [47] 47 48 49 50
#> 
#> $class
#> [1] "data.frame"
class(cars)
#> [1] "data.frame"
str(cars)
#> 'data.frame':    50 obs. of  2 variables:
#>  $ speed: num  4 4 7 7 8 9 10 10 10 11 ...
#>  $ dist : num  2 10 4 22 16 10 18 26 34 17 ...

In R console or editor, you can
- print all user created R objects in a character vector.
- print all user created R objects along with their structure.
- remove R objects from your workspace.

# R code chunk is not evaluated.

ls() ## Shows the user created objects in your workspace.
objects() ## Shows the user created objects in your workspace.
ls.str() ## Shows the details of all objects in your workspace.

rm(x) ## Removes the object from your workspace.
rm(x, y) ## Removes multiple objects at the same time from your workspace. Note that rm(c(x, y)) gives an error.
rm(list = ls()) ## Removes all objects from your workspace.
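
If you want to try these functions without touching your real workspace, one option is to practice in a separate environment. The sketch below assumes a throwaway environment called my.env with three made-up objects. The same calls work on the workspace when the envir argument is left out.

```r
my.env <- new.env() ## A separate environment, so the real workspace is left untouched.
assign("x", 1, envir = my.env)
assign("y", "a", envir = my.env)
assign("z", TRUE, envir = my.env)
ls(my.env)
#> [1] "x" "y" "z"
rm(list = c("x", "y"), envir = my.env) ## Object names are passed as a character vector.
ls(my.env)
#> [1] "z"
rm(list = ls(my.env), envir = my.env) ## Removes everything that is left.
length(ls(my.env))
#> [1] 0
```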

In R console or editor, you can
- view current option settings of your R session.
- change option settings of your R session.
- display the command history of your R session.
- use q() function to end your R session.

# R code chunk is not evaluated.

help(options)
options() ## View current option settings.
options(digits = 3) ## Change an option setting. Number of digits to print on output.

history() ## Displays the last 25 commands.
history(max.show = Inf) ## Displays all previous commands.

q() ## Ends R session. You will be prompted to save the workspace.
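
When you change an option only for a short time, it can be handy to restore the old value afterwards. The following is a minimal sketch that relies on options() returning the previous values of the settings it changes. The object name old.options is my own choice.

```r
old.options <- options(digits = 3) ## Changes the setting and keeps the previous value.
print(pi)
#> [1] 3.14
options(old.options) ## Restores the previous setting.
getOption("digits") ## Shows the restored value, which is 7 unless it was changed earlier in the session.
```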

File System

In the R console or editor, you can perform file related operations such as creating, deleting, and modifying files.
You can do much more, but the code below will be sufficient most of the time.
Note that on Windows, paths can be written with forward slashes. If you use backslashes instead, they must be doubled, because a single backslash is the escape character in R strings.

# R code chunk is not evaluated.

dir() ## Shows the files and folders in the working directory.
dir.create("./folder") ## Creates the directory if it doesn't exist (gives a warning otherwise).
file.exists("./RStudi_Setup.R") ## Checks whether the file exists.
file.remove("./file.csv") ## Deletes the file at the given path.
unlink("./data.R") ## Deletes the file(s) or directories specified.

list.files("./") ## Lists the files in the given directory.
list.files(pattern = "\\.Rmd$") ## Lists the files in the working directory whose names end with ".Rmd". Note that the pattern is a regular expression.

if (!dir.exists("./folder")) {
    dir.create("./folder") ## Checks if the folder exists, if not then creates it.
}
file.remove("./file.xlsx") ## Deletes the file at the given path.

unzip("./file.zip") ## Extract the file from a zip archive.

?files ## See the help file for more information such as renaming, appending and copying files.

fake.data <- rnorm(n = 1e6, mean = 0, sd = 1) ## Creates fake data for size checking.
object.size(fake.data) ## Gives the size of the R object in bytes.
print(object.size(fake.data), units = "Mb") ## Gives the size of the R object in MB.
file.info("../../R mini BootCamp.Rproj") ## Gives the file information of a file.
file.info("../../R mini BootCamp.Rproj")$size ## Size of the file.
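
To see several of these functions working together, here is a small sketch that can be run safely because it only touches the temporary directory of the R session. The folder name "bootcamp_example" and the file name "file.txt" are made up for illustration.

```r
my.folder <- file.path(tempdir(), "bootcamp_example") ## Hypothetical folder inside the temporary directory.
if (!dir.exists(my.folder)) {
    dir.create(my.folder) ## Creates the folder only if it is missing.
}
my.file <- file.path(my.folder, "file.txt")
writeLines(text = c("first line", "second line"), con = my.file) ## Writes a small text file.
file.exists(my.file)
#> [1] TRUE
list.files(my.folder)
#> [1] "file.txt"
file.info(my.file)$size > 0 ## The file is not empty.
#> [1] TRUE
unlink(my.folder, recursive = TRUE) ## Deletes the folder together with its contents.
dir.exists(my.folder)
#> [1] FALSE
```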

If you want to save the code written in the editor so that you can execute, debug, and modify it in the future, you can use script files.
These are plain text files that contain the commands you want to execute, one after another.
To create a new script, go to the menu File > New File > R Script and an empty script will appear.
- Then, type in all the commands you want to execute.
- You can put as many commands as needed in a script file.
When you are ready, you can save the file, which will end up with the .R extension.
If you want to load and run all the code in this script, you can use the source function as shown below.
- Loading a script with the source function executes all the code lines in the script but does not echo the commands in the console.
- You need to make sure that the desired script is located in the current working directory; otherwise, you will need to provide the full path, or at least the relative path, as the argument of the source function.

# R code chunk is not evaluated.

source("my_script.R") ## Loads "my_script.R" file from the current working directory.
source("./my_script.R") ## Loads "my_script.R" file from the current working directory.
source("../A Folder/my_script.R") ## Loads "my_script.R" file from another folder. See File System section for more information about file paths in R.
source("FULL PATH of my_script.R file") ## Loads "my_script.R" with the full path.
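
The following sketch shows the whole cycle in a form you can run as it is: it writes a tiny script to a temporary file, sources it, and then uses the objects the script created. The script content and the object names (a.value, a.text) are made up for illustration.

```r
script.file <- tempfile(fileext = ".R") ## Hypothetical script stored in the temporary directory.
writeLines(text = c("a.value <- 2 + 3", "a.text <- paste(\"Result is\", a.value)"), con = script.file)
source(script.file) ## Runs both lines. Nothing is printed in the console.
a.text
#> [1] "Result is 5"
source(script.file, echo = TRUE) ## With echo = TRUE each command is also printed as it is executed.
unlink(script.file) ## Cleans up the temporary script.
```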

The easiest way to save and load R objects is by using the .RData file format.
- The .RData file format allows you to save an R object with its current state in your workspace.
- Loading an .RData file restores the R object with the attributes it had when it was saved.
- To save and load any R object in the RData format, you should use the save and load commands.
- Note that the load function restores the R objects under exactly the same names they had when they were saved.
- You can use the save.image() command to save your current workspace (all R objects) as a hidden .RData file.
- You can use the saveRDS and readRDS functions for saving a single R object and loading it with a different name.
Another way to save and load R objects is to use the function pairs dump - source and dput - dget.
- The functions dump and dput produce a text representation of the R objects and save it in an R script file (.R).
- These two functions basically write R code in a .R file which can recreate the R object in the future.
- The functions source and dget read the code in a .R file and create the R object.

# R code chunk is not evaluated.

x <- rbinom(n = 10, size = 1, prob = 0.5) ## Random number generator for the binomial distribution with parameters size and prob. n = number of observations, size = number of trials, prob = probability of success on each trial.
y <- c("NCSU", "Wolfpack")
save(x, file = "./x_object.RData") ## Saves the given objects to the RData file.
save(x, y, file = "./xy_object.RData") ## You can save multiple objects in an RData file at the same time.
rm(x, y) ## Removes x and y objects.
load("./xy_object.RData") ## Loads the RData file. Note that it overwrites the existing x and y objects if they are still in your workspace.

save.image() ## Saves the current workspace as .RData file. Note that it is saved as a hidden file.

saveRDS(object = x, file = "./x_object.rds") ## Saves a single R object to a file.
new.x <- readRDS(file = "./x_object.rds", refhook = NULL) ## Loads the x object as a new object.
new.x <- load("./x_object.RData") ## Does not rename the object. load() restores x under its saved name and returns the object names, so new.x is just the character string "x".

dump(c("x", "y"), file = "./data.R") ## Dump can be used for multiple objects.
rm(x)
source("./data.R") ## Loads and runs the R code in the script.

dput(x, file = "./data.R") ## Dput can be used on single R objects.
rm(x)
new.x <- dget("./data.R") ## Loads and runs the R code in the script, then assigns a new name.
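
The difference between the two approaches is easier to see side by side. The sketch below assumes a small made-up vector x and stores it in temporary files, so it can be run without leaving anything behind in your working directory.

```r
x <- c(1, 0, 1) ## Hypothetical object to store.
rdata.file <- tempfile(fileext = ".RData")
rds.file <- tempfile(fileext = ".rds")

save(x, file = rdata.file) ## Stores the object together with its name.
saveRDS(object = x, file = rds.file) ## Stores only the object.

loaded.names <- load(rdata.file) ## load() recreates x and returns the names of the loaded objects.
loaded.names
#> [1] "x"
new.x <- readRDS(file = rds.file) ## readRDS() returns the object itself, so any name can be used.
identical(new.x, x)
#> [1] TRUE
unlink(c(rdata.file, rds.file)) ## Cleans up the temporary files.
```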

Getting Help

To get help for the functions and data sets in R, use help() or ?.
The related help information will be shown in the “Help” tab in RStudio.
Note that the package that contains the function or data set you seek help about should be installed and loaded.
In my opinion, the built-in “Help” tab in RStudio is very useful for learning what a function does, its arguments, detailed explanations, examples and more. You can even find related functions, which will help you learn even more functions and code.

# R code chunk is not evaluated.

help(lm) ## Opens the help page for "lm" function which is for fitting linear models.
?lm
?"lm"
??lm ## Gives the search results for word "lm".
??errorsarlm ## If the package that contains the function is installed but not loaded, then you should use "??".
?":" ## Help for operator.
?"%in%"
help.start() ## Opens the main page for R Help.
help.search("covariance") ## Gives the search results for word "covariance".
RSiteSearch("vecm") ## Opens your browser and searches for "vecm" on http://search.r-project.org.

find("lm") ## Tells you what package the function is in.
apropos("lm") ## Returns a character vector giving the names of all objects in the search list that match your query.

args(lm) ## Presents the arguments of the function.
example(lm) ## Presents an example of the searched function.
demo(graphics) ## Gives a user-friendly interface to run some demonstration R scripts.
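
Some of these helpers return ordinary R objects, so their results can also be used in code. Here is a short sketch with the lm function, assuming a fresh R session in which only the default packages are attached.

```r
find("lm") ## lm is in the stats package.
#> [1] "package:stats"
"lm" %in% apropos("^lm") ## apropos accepts a regular expression. Here "^lm" matches the names starting with "lm".
#> [1] TRUE
names(formals(lm))[1:2] ## First two argument names of lm, taken from its definition.
#> [1] "formula" "data"
```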

Packages

The capabilities of R are extended through user-created packages, which allow specialized statistical techniques, graphical devices, import/export capabilities, reporting tools, etc.
R packages are collections of R functions, data, and compiled code in a well-defined format.
Packages are the fundamental units of reproducible R code.
- A core set of packages is included with the installation of R.
- One of the great things about R is the thousands of user-written packages that solve specific problems in various disciplines.
- There are 11752 packages on CRAN alone as of December 04, 2017.
- More packages are available in other sources such as Bioconductor, Omegahat, GitHub, and other repositories.
For more information about all available R packages, see the following:
- Contributed packages page for all available packages in CRAN.
- CRAN Task View for packages categorized by areas such as Finance, Genetics, High Performance Computing, Machine Learning, Medical Imaging, Social Sciences and Spatial Statistics.
- RDocumentation and Crantastic for package usage statistics, manuals, example code, details and more.
- Awesome R and RStudio for recommended packages.

To use an R package, you first need to install it, which needs to be done just once.
Then, load the package, which needs to be done every time you start R or RStudio.
The code below shows the most basic methods of installing and loading R packages.

# R code chunk is not evaluated.

install.packages("tidyr") ## Installs single package.
library("tidyr") ## Loads single package.

install.packages(c("RColorBrewer", "stringr")) ## Installs multiple packages.
lapply(c("RColorBrewer", "stringr"), library, character.only = TRUE) ## Loads multiple packages.

packageVersion("tidyr") ## Current Version of the package.
detach("package:RColorBrewer", unload = TRUE) ## Unloads package.
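
Before installing a package, you may want to check whether it is already available on your computer. A minimal sketch is given below. The helper name is.available and the non-existing package name are made up for illustration, and the stats package is used because it comes with every R installation.

```r
is.available <- function(package.name) {
    ## Hypothetical helper: TRUE if the package is installed and can be loaded, FALSE otherwise.
    isTRUE(requireNamespace(package.name, quietly = TRUE))
}
is.available("stats") ## stats is part of every R installation.
#> [1] TRUE
is.available("aPackageThatDoesNotExist") ## Made-up name used only for illustration.
#> [1] FALSE
stats::median(c(1, 3, 5)) ## The :: operator calls a function without attaching its package.
#> [1] 3
```

With such a check, install.packages only needs to be called when the helper returns FALSE.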

In some circumstances, you might want to use advanced techniques for installing, loading and updating R packages.
The code below shows some examples.

# R code chunk is not evaluated.

# Check the CRAN mirror. 
getOption("repos")

# Lists the packages attached in your current R session.
(.packages())

# Some packages need to be installed from the source.
install.packages("rgdal", type = "source")

# Installing a package from GitHub repository.
install.packages("devtools") ## devtools package is necessary to install packages from GitHub repositories.
devtools::install_github("tidyverse/ggplot2") ## user.name/package.name
library("ggplot2")
# devtools::install_github("hadley/devtools")

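# Installing a package from the Bioconductor website.
install.packages("BiocManager") ## The older biocLite.R script has been retired. Current Bioconductor releases are installed with the BiocManager package.
BiocManager::install("rhdf5")
library("rhdf5")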

# Installing CRAN Task Views.
## CRAN Task View (https://cran.r-project.org/web/views/) gives you collections of packages grouped by area, for example spatial, econometrics, graphics, etc. To automatically install these views, the "ctv" package needs to be installed, and then the views can be installed via the "install.views" or "update.views" (which first assesses which of the packages are already installed and up-to-date) functions.
install.packages("ctv") ## Installing the necessary package to install views.
library("ctv") ## Loading the package.

available.views(repos = NULL) ## Gives you all the views available.
install.views("Econometrics") ## Installing the "Econometrics" view.
update.views("Econometrics") ## Updating the "Econometrics" view.

# Updating packages from the editor or console.
update.packages() ## Updates all packages from CRAN.

devtools::install_github("hrbrmstr/dtupdate") ## The dtupdate package is used to check for updates to packages installed from GitHub.
library("dtupdate")
github_update() ## See which packages are available to update.

# Some other tools with packages.
find.package("devtools") ## Shows where (file location) the package is installed on your computer.

search() ## Displays the search path: the global environment and all attached packages.
utils::installed.packages() ## Displays all the packages that are installed on your computer.
available.packages() ## Displays all the R packages that are available.
head(rownames(available.packages()), 3) ## Shows the names of the first 3 packages which are available.

# Package help.
install.packages("sp") ## "sp" package is for spatial analysis.
library("sp")
vignette("sp") ## Opens the vignette for selected package if available.

Installing and then loading many packages one by one takes a lot of effort and code.
To avoid this, you can automate the process with a user-written function.
The function below first installs each package (only if it is not already installed) and then loads it.
In writing this function, I was inspired by other people's code and modified it to suit my needs. I recommend you do the same. It works quite well for me; however, for some packages you might need to use the conventional way.
For details of user-written R functions, please see the Creating R Functions section.

# R code chunk is not evaluated.

Load.Install <- function(package_names) {
    is_installed <- function(mypkg) is.element(mypkg, utils::installed.packages()[ ,1])
    for (package_name in package_names) {
        if (!is_installed(package_name)) {
            utils::install.packages(package_name, dependencies = TRUE)
        }
        suppressMessages(library(package_name, character.only = TRUE, quietly = TRUE, verbose = FALSE))
    }
}

Load.Install(c("plyr", "dplyr", "tidyr", "sp"))
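
As a variation on the function above, the sketch below only reports which packages are missing, so that you can decide yourself whether to install them. The helper name missing_packages is hypothetical (it is not part of the workshop code), and it assumes that requireNamespace is an acceptable way to test for an installed package.

```r
# Hypothetical helper: report which of the requested packages are not installed.
missing_packages <- function(package_names) {
    # requireNamespace() returns TRUE or FALSE without attaching the package.
    installed <- vapply(package_names, requireNamespace, logical(1), quietly = TRUE)
    package_names[!installed]
}

to_install <- missing_packages(c("stats", "utils"))
to_install ## Both packages ship with R, so nothing should be missing.

# Not evaluated here: install only what is missing.
# if (length(to_install) > 0) install.packages(to_install)
```

Compared with calling utils::installed.packages() inside a loop, this checks one package at a time, which tends to be faster when many packages are installed.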

Citation

To cite the R software itself and R packages, use the following code.

# Cite R software.
citation()
#> 
#> To cite R in publications use:
#> 
#>   R Core Team (2017). R: A language and environment for
#>   statistical computing. R Foundation for Statistical Computing,
#>   Vienna, Austria. URL https://www.R-project.org/.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {R: A Language and Environment for Statistical Computing},
#>     author = {{R Core Team}},
#>     organization = {R Foundation for Statistical Computing},
#>     address = {Vienna, Austria},
#>     year = {2017},
#>     url = {https://www.R-project.org/},
#>   }
#> 
#> We have invested a lot of time and effort in creating R, please
#> cite it when using it for data analysis. See also
#> 'citation("pkgname")' for citing R packages.

# Cite R packages.
citation("ggplot2")
#> 
#> To cite ggplot2 in publications, please use:
#> 
#>   H. Wickham. ggplot2: Elegant Graphics for Data Analysis.
#>   Springer-Verlag New York, 2009.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Book{,
#>     author = {Hadley Wickham},
#>     title = {ggplot2: Elegant Graphics for Data Analysis},
#>     publisher = {Springer-Verlag New York},
#>     year = {2009},
#>     isbn = {978-0-387-98140-6},
#>     url = {http://ggplot2.org},
#>   }

Operators

This section tabulates some of the arithmetic and logical operators, along with some others.

Arithmetic Operations
Operator	Description
+	Addition
-	Subtraction
*	Multiplication
/	Division
^ or **	Exponentiation
%%	Remainder (modulo)
%/%	Integer division (quotient)

Logical Operators
Operator	Description
<	Less than
<=	Less than or equal to
>	Greater than
>=	Greater than or equal to
==	Exactly equal to
!=	Not equal to
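| and ||	OR (element-wise and single-value forms)
& and &&	AND (element-wise and single-value forms)
%in%	Value matching (is each left-hand element found in the right-hand vector?)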

Other Operators
Operator	Description
:	Generates regular sequences
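
The short sketch below illustrates a few of the operators tabulated above. The variable names (x, in_range, scalar_check) are chosen only for this illustration.

```r
7 %/% 2  ## Integer division (quotient).
7 %% 2   ## Remainder.
(7 %/% 2) * 2 + (7 %% 2) == 7 ## Quotient and remainder rebuild the original number.
2 ^ 3 == 2 ** 3               ## Both forms of exponentiation agree.

x <- c(1, 5, 10)
in_range <- x > 2 & x < 8 ## Element-wise AND: one result per element.
in_range

scalar_check <- length(x) == 3 && all(x > 0) ## && combines single TRUE/FALSE values.
scalar_check

c(5, 99) %in% x ## Is each left-hand value found in x?
1:5             ## Regular sequence from 1 to 5.
```

The element-wise forms (& and |) are the ones to use inside subsetting conditions, while && and || are meant for single conditions such as those in if statements.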

4 R Details

This section covers some important details of R programming language such as
- missing values
- subsetting R objects
- explicit coercion, dealing with date and time in R, and more

Missing Values

Like other statistical software, R is capable of handling missing values.
Missing values in R appear as NA.
- It is an indicator of missingness.
- NA is not a string or a numeric value.
We can create vectors, factors, logicals, matrices, arrays, data frames, and lists with missing values.
- You can use the is.na function to check, element by element, whether values are missing.
- You can use the complete.cases function to check which cases (elements or rows) have no missing values.

Vectors

In vectors, the is.na function checks for missingness element by element.
In character vectors, "NA" in quotes is an ordinary character string, not a missing value. To mark a value as missing, use NA without quotes.

x <- c(1:3, NA, 5:7, NA) ## Numeric vector.
x
#> [1]  1  2  3 NA  5  6  7 NA
str(x) ## Fourth and eighth values are missing.
#>  int [1:8] 1 2 3 NA 5 6 7 NA
is.na(x) ## Checks the missing values.
#> [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
complete.cases(x) ## Checks the non-missing values.
#> [1]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE
is.na(x) == !complete.cases(x) ## Same functions.
#> [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
(!is.na(x)) == complete.cases(x) ## Same result. The parentheses matter because ! has lower precedence than ==.
#> [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
sum(is.na(x)) ## Gives the number of missing values.
#> [1] 2
any(is.na(x)) ## Is there a missing value?
#> [1] TRUE

y <- c("a", "b", "c", NA, "NA") ## Character vector.
y
#> [1] "a"  "b"  "c"  NA   "NA"
str(y)
#>  chr [1:5] "a" "b" "c" NA "NA"
is.na(y) ## Note that "NA" is not missing; it is the character string "NA".
#> [1] FALSE FALSE FALSE  TRUE FALSE

Factors

In factors, is.na function checks the missingness element by element like in vectors.
A missing value in a factor object is displayed as <NA> rather than just NA.

a <- factor(x = c("yes", NA, "no", "yes", NA)) ## Creates the factor and gives you the 'Levels'.
a
#> [1] yes  <NA> no   yes  <NA>
#> Levels: no yes
str(a)
#>  Factor w/ 2 levels "no","yes": 2 NA 1 2 NA
is.na(a)
#> [1] FALSE  TRUE FALSE FALSE  TRUE

Logicals

In logicals, is.na function checks the missingness element by element like in vectors.

a <- c(TRUE, NA, FALSE, NA)
a
#> [1]  TRUE    NA FALSE    NA
str(a)
#>  logi [1:4] TRUE NA FALSE NA
is.na(a)
#> [1] FALSE  TRUE FALSE  TRUE

Matrices

In matrices, is.na function checks the missingness element by element like in vectors.
- This procedure is performed for every row and column of the matrix.
- Therefore, the result of is.na function is a logical matrix with the same dimension attributes.

a <- rep(c(3, NA, 2), each = 2)
b <- matrix(data = a, nrow = 2, ncol = 3)
b
#>      [,1] [,2] [,3]
#> [1,]    3   NA    2
#> [2,]    3   NA    2
str(b)
#>  num [1:2, 1:3] 3 3 NA NA 2 2
is.na(b)
#>       [,1] [,2]  [,3]
#> [1,] FALSE TRUE FALSE
#> [2,] FALSE TRUE FALSE

na.check <- is.na(b)
class(na.check)
#> [1] "matrix"
str(na.check)
#>  logi [1:2, 1:3] FALSE FALSE TRUE TRUE FALSE FALSE

Arrays

In arrays, is.na function checks the missingness element by element like in vectors.
- This procedure is performed for every dimension of the array.
- Therefore, the result of is.na function is a logical array with the same dimension attributes.

a <- sample(x = c(1:3, NA), size = 12, replace = TRUE, prob = NULL) ## The sample function draws a sample of the specified size from the elements of x, with or without replacement. Here it returns an integer vector.
b <- array(data = a, dim = c(2, 3, 2)) ## Three-dimensional array.
b
#> , , 1
#> 
#>      [,1] [,2] [,3]
#> [1,]    2    2    3
#> [2,]    3    3    2
#> 
#> , , 2
#> 
#>      [,1] [,2] [,3]
#> [1,]    1    2   NA
#> [2,]    1    1   NA
str(b)
#>  int [1:2, 1:3, 1:2] 2 3 2 3 3 2 1 1 2 1 ...
is.na(b)
#> , , 1
#> 
#>       [,1]  [,2]  [,3]
#> [1,] FALSE FALSE FALSE
#> [2,] FALSE FALSE FALSE
#> 
#> , , 2
#> 
#>       [,1]  [,2] [,3]
#> [1,] FALSE FALSE TRUE
#> [2,] FALSE FALSE TRUE

na.check <- is.na(b)
class(na.check)
#> [1] "array"
str(na.check)
#>  logi [1:2, 1:3, 1:2] FALSE FALSE FALSE FALSE FALSE FALSE ...

Data Frames

NA can arise when you load a data set with empty cells.
Note that data frames are very similar to matrices so the below code applies to data frames.
The details of NA values in data frames will be covered in the Data section.

x <- data.frame(c(NA, 1:3, NA), c(NA, 4, NA, 5:6), c(7:9, NA, NA), c(10:14), stringsAsFactors = FALSE)  ## Creating data frame from scratch without specific column names.
colnames(x) <- paste0("Column", ".", 1:ncol(x)) ## Assigns the column names to the data frame by using colnames.
x

str(x)
#> 'data.frame':    5 obs. of  4 variables:
#>  $ Column.1: int  NA 1 2 3 NA
#>  $ Column.2: num  NA 4 NA 5 6
#>  $ Column.3: int  7 8 9 NA NA
#>  $ Column.4: int  10 11 12 13 14
is.na(x)
#>      Column.1 Column.2 Column.3 Column.4
#> [1,]     TRUE     TRUE    FALSE    FALSE
#> [2,]    FALSE    FALSE    FALSE    FALSE
#> [3,]    FALSE     TRUE    FALSE    FALSE
#> [4,]    FALSE    FALSE     TRUE    FALSE
#> [5,]     TRUE    FALSE     TRUE    FALSE

sum(is.na(x)) ## Gives number of the missing values
#> [1] 6
any(is.na(x)) ## Is there a missing value?
#> [1] TRUE
colSums(is.na(x)) ## Missing values by columns.
#> Column.1 Column.2 Column.3 Column.4 
#>        2        2        2        0
rowSums(is.na(x)) ## Missing values by rows.
#> [1] 2 0 1 1 2

Lists

In lists, the is.na function checks for missingness element by element, as in vectors.
- However, unlike for the other objects, the check is performed only on the top level.
- For example, if an element of a list is exactly NA, it is considered missing.
- For example, if an element of a list is the vector “NA, NA, NA”, it is considered non-missing.
To check the missingness of all lower-level elements, you should use the unlist function.
- This function flattens all lower-level elements of a list into a single atomic vector (a character vector in the example below, because one of the elements is a string).

a <- list(NA, c(2:4), "k", rep(NA, 2), list(rep(NA, 3)))
a
#> [[1]]
#> [1] NA
#> 
#> [[2]]
#> [1] 2 3 4
#> 
#> [[3]]
#> [1] "k"
#> 
#> [[4]]
#> [1] NA NA
#> 
#> [[5]]
#> [[5]][[1]]
#> [1] NA NA NA
is.na(a) ## Returns 5 logical results, one per top-level element. Only the first is TRUE because is.na treats c(2:4), rep(NA, 2) and the nested list as single elements.
#> [1]  TRUE FALSE FALSE FALSE FALSE
str(a)
#> List of 5
#>  $ : logi NA
#>  $ : int [1:3] 2 3 4
#>  $ : chr "k"
#>  $ : logi [1:2] NA NA
#>  $ :List of 1
#>   ..$ : logi [1:3] NA NA NA

b <- unlist(a)
b
#>  [1] NA  "2" "3" "4" "k" NA  NA  NA  NA  NA
str(b)
#>  chr [1:10] NA "2" "3" "4" "k" NA NA NA NA NA
is.na(b) ## Unlisting gives the correct result.
#>  [1]  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
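
As an alternative to unlist, the sketch below checks the lower-level elements without first coercing everything to character. It assumes that the base functions anyNA (with recursive = TRUE) and rapply fit this purpose; the name leaf_na is used only for this illustration.

```r
a <- list(NA, c(2:4), "k", rep(NA, 2), list(rep(NA, 3)))

anyNA(a, recursive = TRUE) ## Is there a missing value at any level of the list?

# Apply is.na to every non-list element and flatten the results.
leaf_na <- rapply(a, function(v) is.na(v), how = "unlist")
leaf_na
sum(leaf_na) ## Number of missing values across all levels.
```

The result has one logical value per lower-level element, in the same order as unlist(a).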

Miscellaneous

You will also see NA when you try certain operations that are illegal or don’t make sense.
Almost every operation performed on an NA produces an NA.
- Using arithmetic operators with NA always produces an NA.
- Some R functions offer a way to exclude missing values from their calculations. na.rm is a logical argument of functions such as sum and mean, while the others listed below are functions that can be called directly or passed as the na.action argument of modelling functions. The code below shows na.rm used as an argument.
  - na.rm: Remove the missing values before the calculation.
  - na.fail: Stop if any missing values are encountered.
  - na.omit: Drop any rows that contain missing values.
  - na.exclude: Drop rows with missing values, but keep track of where they were.
  - na.pass: Take no action.
Although NA indicates a missing value, it is still considered a value by R.
- Use the length function to see that.

var(10) ## Variance of a number which returns NA.
#> [1] NA
sd(8) ## Standard deviation of a number which returns NA.
#> [1] NA
c(1, NA) == NA
#> [1] NA NA

NA + 1
#> [1] NA
NA * 2
#> [1] NA

x <- c(4, 5, NA)
x < 10
#> [1] TRUE TRUE   NA
x + 2
#> [1]  6  7 NA
x * 2
#> [1]  8 10 NA
sum(x)
#> [1] NA
sum(x, na.rm = TRUE) ## Only the missing values are removed.
#> [1] 9
mean(x)
#> [1] NA
mean(x, na.rm = TRUE)
#> [1] 4.5
var(x)
#> [1] NA
var(x, na.rm = TRUE)
#> [1] 0.5

a <- rep(c(3, NA, 2), each = 2)
b <- matrix(data = a, nrow = 2, ncol = 3)
b
#>      [,1] [,2] [,3]
#> [1,]    3   NA    2
#> [2,]    3   NA    2
sum(b, na.rm = TRUE) ## Only the missing values are removed.
#> [1] 10
sum(b, na.omit = TRUE) ## sum has no na.omit argument; TRUE is simply added as one more value, so the NA values remain and the result is NA.
#> [1] NA

length(x)
#> [1] 3
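
The sketch below illustrates how na.omit, na.exclude and na.fail differ in practice. The data frame df and its columns are made up for this illustration, and lm is used only because it accepts an na.action argument.

```r
df <- data.frame(x = c(1, 2, NA, 4, 5), y = c(2, 4, 6, NA, 10))

clean <- na.omit(df)     ## Drops rows 3 and 4, which contain NA.
nrow(clean)
attr(clean, "na.action") ## Records which rows were dropped.

fit_omit <- lm(y ~ x, data = df, na.action = na.omit)
fit_exclude <- lm(y ~ x, data = df, na.action = na.exclude)
length(residuals(fit_omit))    ## One residual per complete row.
length(residuals(fit_exclude)) ## Padded with NA back to the original number of rows.

# na.fail raises an error when NA values are present; try() catches it here.
failed <- inherits(try(na.fail(df), silent = TRUE), "try-error")
failed
```

na.exclude is handy when residuals or fitted values need to be put back next to the original rows.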

Undefined numeric results (e.g., 0/0, Inf - Inf) are represented by NaN (not a number). Note that 1/0 is not NaN; it evaluates to Inf.
- is.na treats NaN as missing, but is.nan does not treat NA as NaN.
- You can use the is.nan function to check whether a value is NaN.

x <- c(NA, NaN) ## Numeric vector.
str(x)
#>  num [1:2] NA NaN
is.na(x) ## Both NA and NaN are considered as NA.
#> [1] TRUE TRUE
is.nan(x) ## Only NaN is flagged; NA is not considered NaN.
#> [1] FALSE  TRUE
is.nan(0/0)
#> [1] TRUE
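
To separate NaN from infinite values, the sketch below compares a few divisions. In R, dividing a non-zero number by zero gives Inf or -Inf, while 0/0 and Inf - Inf give NaN.

```r
1 / 0     ## Inf, not NaN.
-1 / 0    ## -Inf.
0 / 0     ## NaN.
Inf - Inf ## NaN.

v <- c(1, Inf, NA, NaN)
is.finite(v)   ## TRUE only for ordinary numbers.
is.infinite(v) ## TRUE only for Inf and -Inf.
is.nan(v)      ## TRUE only for NaN.
```

is.finite is a convenient single check when you want to exclude NA, NaN and infinite values at once.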

Subsetting

Frequently, you will need to operate on some of the elements contained in an R object, but not all of them. In such cases the subsetting operators come in handy.
Simplifying vs. Preserving subsetting
- Simplifying subsetting returns the simplest possible object structure that can represent the output, without preserving the input object structure.
- Preserving subsetting keeps the structure of the output the same as that of the input.
- Simplifying and preserving subsetting differ for different R objects.
In R, you can use three different subsetting operators to select an individual element or a group of elements.
- Square brackets [ ].
- Double square brackets [[ ]].
- Dollar sign $ for simplifying subsetting.
In general, to select a particular element with [ ] and [[ ]] you have to put the position number (index) of the element into the brackets.
- For example, [index], [[index]], [index.1, index.2], [index.1, index.2, index.3] and so on (depending on the dimensions of the R object).
- [ ] can select several elements at once, whereas [[ ]] returns a single element.
- For all dimensions of an R object, indexing starts from 1.
- Negative indices can be used to specify elements that should not be selected.
Index names can also be used for subsetting.
- Index names can be used with any of the subsetting operators.
- When using index names with [ ] and [[ ]], you need to specify the names with quotes.
The code below introduces subsetting in vectors, factors, logicals, matrices, arrays, data frames, and lists.

Vectors

Since vectors have only one dimension, you should use only one index.
In general for vectors, use
- [index] for preserving subsetting.
- [[index]] for simplifying subsetting.
For simplifying subsetting
- When a vector has no names, [index] and [[index]] give the same result.
- With named vectors, however, you need to use [[index]] to drop the name.
You can also subset a previously subsetted element, for instance [index.1][index.2].
To drop the names of an R object, use the unname function.

# Vector with numeric values.
a <- c(1:10)
a
#>  [1]  1  2  3  4  5  6  7  8  9 10

a[3] ## Selecting the 3rd element. Preserving subsetting with unnamed vectors.
#> [1] 3
a[[3]] ## Use it for vectors with names. Simplifying subsetting with unnamed vectors. Same result as above.
#> [1] 3

a[c(2, 3, 5)] ## Selecting the 2nd, 3rd and the 5th elements.
#> [1] 2 3 5
a[1:3]
#> [1] 1 2 3
a[10:5]
#> [1] 10  9  8  7  6  5
a[3:length(a)]
#> [1]  3  4  5  6  7  8  9 10
a[c(seq(1, 10, 2))]
#> [1] 1 3 5 7 9
a[c(2, 3, 5, 6)][c(1, 2)] ## Subsetting two times.
#> [1] 2 3
a[c(2, 3, 5, 6)][c(1, 2)][1] ## Subsetting three times.
#> [1] 2

# Negative indexing
a <- c(1:10)
a[-1]
#> [1]  2  3  4  5  6  7  8  9 10
a[-c(1:4)]
#> [1]  5  6  7  8  9 10
a[-c(1, 4)]
#> [1]  2  3  5  6  7  8  9 10

# Vector with names.
b <- c(First = 1, Second = 2, Third = 3, Fourth = 4, Fifth = 5)
b
#>  First Second  Third Fourth  Fifth 
#>      1      2      3      4      5
b[1] ## Preserving subsetting with named vectors.
#> First 
#>     1
b[[1]] ## Simplifying subsetting with named vectors. Compared to above result, they are not the same.
#> [1] 1

b["First"]
#> First 
#>     1
b[["First"]] ## Simplifying subsetting.
#> [1] 1
b[[1]] ## Same as above.
#> [1] 1
b[c("First", "Third")]
#> First Third 
#>     1     3
b[c(1, 3)] ## Same as above.
#> First Third 
#>     1     3
# b[[c("First", "Third")]] ## Error. You cannot select multiple elements with "[[ ]]". Instead, use the commands below.
unname(b[c("First", "Third")]) ## You can use the "unname" function to drop the names.
#> [1] 1 3
c(b[["First"]], b[["Third"]])
#> [1] 1 3
b[c(names(b)[c(3, 5)])]
#> Third Fifth 
#>     3     5
b[c(names(b)[-c(3, 5)])]
#>  First Second Fourth 
#>      1      2      4

Factors

Subsetting factors is very similar to subsetting vectors.
While subsetting factors, you can use the drop = TRUE argument to drop the unused levels.

a <- factor(x = c("yes", NA, "no", "yes", NA))  ## Creates the factor and gives you the 'Levels'.
a
#> [1] yes  <NA> no   yes  <NA>
#> Levels: no yes
a[3] ## Selecting the 3rd element. Preserving subsetting with unnamed factors.
#> [1] no
#> Levels: no yes
a[[3]] ## Use it for factors with names. Simplifying subsetting with unnamed factors. Same result as above.
#> [1] no
#> Levels: no yes

a[c(1:3)]
#> [1] yes  <NA> no  
#> Levels: no yes
a[1, drop = TRUE] ## Drops the unused levels.
#> [1] yes
#> Levels: yes

a <- factor(x = c(First = "yes", Second = NA, Third = "no", Fourth = "yes", Fifth = NA))
a
#>  First Second  Third Fourth  Fifth 
#>    yes   <NA>     no    yes   <NA> 
#> Levels: no yes
a[1] ## Preserving subsetting with named factors.
#> First 
#>   yes 
#> Levels: no yes
a[[1]] ## Simplifying subsetting with named factors. Compared to above result, they are not the same.
#> [1] yes
#> Levels: no yes
unname(a[1]) ## Same as above.
#> [1] yes
#> Levels: no yes
a["First"] ## You can also use names.
#> First 
#>   yes 
#> Levels: no yes

a[1, drop = TRUE] ## Preserving subsetting. Drops the unused levels.
#> First 
#>   yes 
#> Levels: yes
a[[1]][ , drop = TRUE] ## Simplifying subsetting. Drops the unused levels.
#> [1] yes
#> Levels: yes

Logicals

Subsetting logicals is very similar to subsetting vectors.
Note that named logicals are not covered here since they are rarely used.

y <- sample(x = c(TRUE, FALSE, NA), size = 5, replace = TRUE)
y
#> [1] TRUE   NA   NA   NA TRUE
y[1]
#> [1] TRUE
y[[1]]
#> [1] TRUE
y[c(2:4)]
#> [1] NA NA NA

# Named logicals are dropped since they are rarely used.

Matrices

Since matrices are two-dimensional R objects, you can use up to two indices for subsetting.
For matrices, use
- [index.1, , drop = FALSE] or [, index.2, drop = FALSE] for preserving subsetting.
- [index.1, ] or [, index.2] for simplifying subsetting.
Note that index.1 is for rows, and index.2 is for columns.
You can use [index] to subset a single element in a matrix.
- Note that elements are counted column by column: the element index starts at [1, 1] and continues as [2, 1], …, [nrow, 1], [1, 2], …, [nrow, ncol].
You can also use the unname function to drop the names.

a <- matrix(data = 1:9, nrow = 3, ncol = 3, dimnames = list(c("row1", "row2", "row3"), c("col1", "col2", "col3")))
a
#>      col1 col2 col3
#> row1    1    4    7
#> row2    2    5    8
#> row3    3    6    9

a[2] ## Gives the second element in a matrix. Note that elements are indexed column by column, so the second element is in the second row of the first column.
#> [1] 2
a[[2]] ## Same result.
#> [1] 2

a[1, 1] ## Simplifying subsetting. First row and first column.
#> [1] 1
a[1, 1, drop = FALSE] ## Preserving subsetting.
#>      col1
#> row1    1
a[1:2, 1:2]
#>      col1 col2
#> row1    1    4
#> row2    2    5
a["row1", "col1"] ## You can also use names with quotes.
#> [1] 1

a[ , 1] ## Simplifying subsetting. First column.
#> row1 row2 row3 
#>    1    2    3
a[ , 1, drop = FALSE] ## Preserving subsetting.
#>      col1
#> row1    1
#> row2    2
#> row3    3
unname(a[, 1])
#> [1] 1 2 3
a[, 2:3]
#>      col2 col3
#> row1    4    7
#> row2    5    8
#> row3    6    9

a[1, ] ## Simplifying subsetting. First row.
#> col1 col2 col3 
#>    1    4    7
a[1, , drop = FALSE] ## Preserving subsetting.
#>      col1 col2 col3
#> row1    1    4    7
unname(a[1, ])
#> [1] 1 4 7
a[2:3, ]
#>      col1 col2 col3
#> row2    2    5    8
#> row3    3    6    9

a[-1, ] ## Negative subsetting
#>      col1 col2 col3
#> row2    2    5    8
#> row3    3    6    9
a[, -1]
#>      col2 col3
#> row1    4    7
#> row2    5    8
#> row3    6    9
a[-c(2:3), -1]
#> col2 col3 
#>    4    7
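
Because single-index subsetting counts elements column by column, it can help to translate between a single index and a [row, column] pair. The sketch below uses the base functions arrayInd and which (with arr.ind = TRUE) for that; the variable pos is introduced only for this illustration.

```r
a <- matrix(data = 1:9, nrow = 3, ncol = 3)

a[4] == a[1, 2]     ## The 4th element sits in row 1, column 2.
arrayInd(4, dim(a)) ## Converts the single index 4 into a row and column index.

pos <- (2 - 1) * nrow(a) + 1 ## Single index of [1, 2]: (column - 1) * nrow + row.
pos

hits <- which(a > 7, arr.ind = TRUE) ## Row and column of every element larger than 7.
hits
```

The same column-by-column logic extends to arrays, where the first dimension varies fastest.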

Arrays

Subsetting arrays is very similar to subsetting matrices.
In arrays, you might have more than two indices to subset.
- Note that index.1 is for rows, and index.2 is for columns.
- index.3 to index.K indicates the other dimensions.

a <- array(data = 1:12, dim = c(3, 2, 2)) ## 3-dimensional array.
a
#> , , 1
#> 
#>      [,1] [,2]
#> [1,]    1    4
#> [2,]    2    5
#> [3,]    3    6
#> 
#> , , 2
#> 
#>      [,1] [,2]
#> [1,]    7   10
#> [2,]    8   11
#> [3,]    9   12

a[12] ## Gives the 12th element.
#> [1] 12
a[[12]] ## Same result.
#> [1] 12

# a[1, 1] ## Incorrect number of dimensions
a[1, 1, 1] ## Simplifying subsetting.
#> [1] 1
a[1, 1, 1, drop = FALSE] ## Preserving subsetting.
#> , , 1
#> 
#>      [,1]
#> [1,]    1
a[1:2, 1:2, 1:2]
#> , , 1
#> 
#>      [,1] [,2]
#> [1,]    1    4
#> [2,]    2    5
#> 
#> , , 2
#> 
#>      [,1] [,2]
#> [1,]    7   10
#> [2,]    8   11

a[1, , ] ## Simplifying subsetting.
#>      [,1] [,2]
#> [1,]    1    7
#> [2,]    4   10
a[1, , , drop = FALSE] ## Preserving subsetting.
#> , , 1
#> 
#>      [,1] [,2]
#> [1,]    1    4
#> 
#> , , 2
#> 
#>      [,1] [,2]
#> [1,]    7   10
a[1:2, , ]
#> , , 1
#> 
#>      [,1] [,2]
#> [1,]    1    4
#> [2,]    2    5
#> 
#> , , 2
#> 
#>      [,1] [,2]
#> [1,]    7   10
#> [2,]    8   11

a[, 1, ] ## Simplifying subsetting.
#>      [,1] [,2]
#> [1,]    1    7
#> [2,]    2    8
#> [3,]    3    9
a[, 1, , drop = FALSE] ## Preserving subsetting.
#> , , 1
#> 
#>      [,1]
#> [1,]    1
#> [2,]    2
#> [3,]    3
#> 
#> , , 2
#> 
#>      [,1]
#> [1,]    7
#> [2,]    8
#> [3,]    9
a[, 1:2, ]
#> , , 1
#> 
#>      [,1] [,2]
#> [1,]    1    4
#> [2,]    2    5
#> [3,]    3    6
#> 
#> , , 2
#> 
#>      [,1] [,2]
#> [1,]    7   10
#> [2,]    8   11
#> [3,]    9   12

a[ , , 1] ## Simplifying subsetting.
#>      [,1] [,2]
#> [1,]    1    4
#> [2,]    2    5
#> [3,]    3    6
a[ , , 1, drop = FALSE] ## Preserving subsetting.
#> , , 1
#> 
#>      [,1] [,2]
#> [1,]    1    4
#> [2,]    2    5
#> [3,]    3    6
a[, , 1]
#>      [,1] [,2]
#> [1,]    1    4
#> [2,]    2    5
#> [3,]    3    6

a[-1, , ] ## Negative subsetting
#> , , 1
#> 
#>      [,1] [,2]
#> [1,]    2    5
#> [2,]    3    6
#> 
#> , , 2
#> 
#>      [,1] [,2]
#> [1,]    8   11
#> [2,]    9   12
a[, -1, ]
#>      [,1] [,2]
#> [1,]    4   10
#> [2,]    5   11
#> [3,]    6   12
a[-c(2:3), , -1]
#> [1]  7 10

Data Frames

Since data frames are two-dimensional R objects, you can use up to two indices for subsetting.
For subsetting columns in data frames, use
- [, index.2, drop = FALSE] or [index.2] for preserving subsetting.
- [, index.2], [[index.2]] and $ with names for simplifying subsetting.
The $ subsetting operator is generally used to select a named component of an R object, such as a column of a data frame, by its name, with or without quotes.
You can use [index.1, index.2] for subsetting rows and columns simultaneously.
- Note that index.1 is for rows, and index.2 is for columns.
You can also use the unname function to drop the names.

x <- data.frame(c(1:5), c(6:10), c(11:15), c(16:20), stringsAsFactors = FALSE)  ## Creating data frame from scratch without specific column names.
colnames(x) <- paste0("Column", ".", 1:ncol(x)) ## Assigns the column names to the data frame by using colnames.
x


# Subsetting columns.
x[, 1] ## Simplifying subsetting for columns. Column 1 values only.
#> [1] 1 2 3 4 5
x[[1]] ## Same as above. Note that it subsets the columns only.
#> [1] 1 2 3 4 5
x[["Column.1"]]
#> [1] 1 2 3 4 5
x[, "Column.1"]
#> [1] 1 2 3 4 5
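x["Column.1"] ## Preserving subsetting: unlike the calls above, this returns a one-column data frame.

x$Column.1 ## Simplifying subsetting. Same result as x[["Column.1"]].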
#> [1] 1 2 3 4 5
x$"Column.1" ## Same as above.
#> [1] 1 2 3 4 5
x[2:4, 1]
#> [1] 2 3 4

x[, 1, drop = FALSE] ## Preserving subsetting for columns. Column 1 values only.

x[1] ## Note that it subsets the columns only.

x[2:4, 1, drop = FALSE]


# Subsetting rows.
x[1, ] ## Structure of the data is preserved

unname(as.matrix(x[1, ])[1, ]) ## This is the simplified subsetting for rows. Note that the as.matrix function coerces the subsetted data frame into a matrix. We will see the details of coercion later.
#> [1]  1  6 11 16

# Subsetting row and columns.
x[1, 1]
#> [1] 1
x[2:4, 1]
#> [1] 2 3 4
x[c(1, 3), c(2, 4)]


# Negative subsetting
x[-1, -1] ## First row and first column is deleted.

x[-c(2:4), ] ## Row 2, 3, 4 are deleted.
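
A quick way to see whether a subsetting call simplifies or preserves is to check the class of the result. The sketch below does this for a small data frame that is made up for the illustration.

```r
x <- data.frame(Column.1 = 1:5, Column.2 = 6:10)

class(x[, 1])               ## Simplified to an integer vector.
class(x[[1]])               ## Simplified to an integer vector.
class(x$Column.1)           ## Simplified to an integer vector.
class(x[1])                 ## Preserved as a data frame.
class(x[, 1, drop = FALSE]) ## Preserved as a data frame.
class(x[1, ])               ## Rows stay a data frame; they are not simplified.
```

Code that must always receive a data frame is safer with [index.2] or drop = FALSE, because [, index.2] silently returns a vector when a single column is selected.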

Lists

Subsetting lists is very similar to subsetting vectors.
You can also use $ for simplifying subsetting with named lists.

a <- list(c(1:5), c("a", "b"), c(TRUE, FALSE), list(c(6:10), c("c, d")))
a
#> [[1]]
#> [1] 1 2 3 4 5
#> 
#> [[2]]
#> [1] "a" "b"
#> 
#> [[3]]
#> [1]  TRUE FALSE
#> 
#> [[4]]
#> [[4]][[1]]
#> [1]  6  7  8  9 10
#> 
#> [[4]][[2]]
#> [1] "c, d"

a[[1]] ## Simplifying subsetting.
#> [1] 1 2 3 4 5
a[1] ## Preserving subsetting. Note that the result is still a list.
#> [[1]]
#> [1] 1 2 3 4 5

a[[2]][1] ## [[ ]] extracts the second element of the list. [ ] then subsets that element.
#> [1] "a"

a[[4]][[1]]
#> [1]  6  7  8  9 10
a[[4]][[1]][2]
#> [1] 7

a <- list(Numeric = c(1:3), Character = c("a", "b"), Logical = c(TRUE, FALSE))
a
#> $Numeric
#> [1] 1 2 3
#> 
#> $Character
#> [1] "a" "b"
#> 
#> $Logical
#> [1]  TRUE FALSE
a[["Numeric"]]
#> [1] 1 2 3
a$Numeric
#> [1] 1 2 3
a$"Numeric"
#> [1] 1 2 3
a$Numeric[2]
#> [1] 2

Advanced Subsetting

This section covers only some of the advanced subsetting techniques for vectors, matrices and data frames.
I think subsetting is very important for these R objects, and you need a lot of practice to become experienced in it.
For more details on subsetting in R, please see the Subsetting section of Advanced R by Hadley Wickham.

Conditional Subsetting

In R, there is one other method for subsetting, which is called conditional subsetting.
In this method
- First, you use relational and logical operators (such as >, == and &) to create a logical vector with the same length as the object.
- Then, using this logical vector, you can subset any R object with your condition.
- TRUE and FALSE values in the logical vector determine which elements are selected.

# Vectors
x <- c(1:10)
x
#>  [1]  1  2  3  4  5  6  7  8  9 10
a <- x > 4 ## This is our condition.
x[a]
#> [1]  5  6  7  8  9 10
x[!a]
#> [1] 1 2 3 4
x[x < 2 | x > 8]
#> [1]  1  9 10

# Matrices
a <- matrix(data = 1:9, nrow = 3, ncol = 3, dimnames = list(c("row1", "row2", "row3"), c("col1", "col2", "col3")))
a
#>      col1 col2 col3
#> row1    1    4    7
#> row2    2    5    8
#> row3    3    6    9
a > 4 ## Condition.
#>       col1  col2 col3
#> row1 FALSE FALSE TRUE
#> row2 FALSE  TRUE TRUE
#> row3 FALSE  TRUE TRUE
a[a > 4] ## Condition is applied to all matrix elements.
#> [1] 5 6 7 8 9
b <- unname(a[, 2, drop = TRUE]) ## Second column.
b
#> [1] 4 5 6
b[b > 4] ## Condition on the second column.
#> [1] 5 6

# Data frames
x <- data.frame(c(1:5), c(6:10), c(11:15), c(16:20), stringsAsFactors = FALSE)  ## Creating data frame from scratch without specific column names.
colnames(x) <- paste0("Column", ".", 1:ncol(x)) ## Assigns the column names to the data frame by using colnames.
x

x[x > 4] ## Condition is applied to all data frame elements.
#>  [1]  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
x[x$Column.1 > 2, ]

x[x$Column.1 > 2 & x$Column.3 > 13, ]

x[x$Column.1 > 2 & x$Column.3 > 13, ]$Column.4
#> [1] 19 20
x[x$Column.1 > 2 & x$Column.3 > 13, c("Column.2", "Column.3")]
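
For data frames, the base function subset offers a more compact way to write the same kind of condition. The sketch below rebuilds a small data frame for the illustration and compares both styles; by_bracket and by_subset are names chosen only for this example.

```r
x <- data.frame(Column.1 = 1:5, Column.2 = 6:10, Column.3 = 11:15, Column.4 = 16:20)

# Bracket style, as used above.
by_bracket <- x[x$Column.1 > 2 & x$Column.3 > 13, c("Column.2", "Column.3")]

# subset() style: columns can be referred to without the x$ prefix.
by_subset <- subset(x, Column.1 > 2 & Column.3 > 13, select = c(Column.2, Column.3))

all.equal(by_bracket, by_subset) ## Both approaches select the same rows and columns.
```

subset is convenient for interactive work, while the bracket style is usually preferred inside functions because it does not rely on non-standard evaluation.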

Missing Values Problem

Any comparison applied to NA value(s) creates a logical vector with NA value(s).
A logical vector with NA value(s) produces unexpected results in conditional subsetting: each NA in the condition returns an NA element (or a row of NA values in a data frame).
Therefore, you need to be careful when dealing with NA values.

# Vectors
x <- sample(x = c(1:10, rep(NA, 5)), size = 10, replace = TRUE, prob = NULL)
x
#>  [1]  7  6  8 NA 10  2 10 NA  9 NA
a <- x > 4 ## When there is NA the condition produces NA.
a
#>  [1]  TRUE  TRUE  TRUE    NA  TRUE FALSE  TRUE    NA  TRUE    NA
b <- x[!is.na(x)] ## NA values are excluded.
b
#> [1]  7  6  8 10  2 10  9
b[b > 4] ## Values larger than 4.
#> [1]  7  6  8 10 10  9
x[which(x > 4)] ## You can use this directly. The which function omits the NA values automatically.
#> [1]  7  6  8 10 10  9

# Matrices
x <- sample(x = c(1:10, rep(NA, 5)), size = 9, replace = TRUE, prob = NULL)
a <- matrix(data = x, nrow = 3, ncol = 3, dimnames = list(c("row1", "row2", "row3"), c("col1", "col2", "col3")))
a
#>      col1 col2 col3
#> row1   NA   10    9
#> row2   NA    9   10
#> row3    6   NA    4
a > 4 ## When there is NA the condition produces NA.
#>      col1 col2  col3
#> row1   NA TRUE  TRUE
#> row2   NA TRUE  TRUE
#> row3 TRUE   NA FALSE
a[which(a > 4)] ## Condition is applied to all matrix elements.
#> [1]  6 10  9  9 10
b <- unname(a[, 2, drop = TRUE]) ## Second column.
b
#> [1] 10  9 NA
b[which(b > 4)] ## Condition on the second column.
#> [1] 10  9

# Data frames
x <- data.frame(c(NA, 1:3, NA), c(NA, 4, 10, 5:6), c(7:9, NA, NA), c(10:14), stringsAsFactors = FALSE)  ## Creating data frame from scratch without specific column names.
colnames(x) <- paste0("Column", ".", 1:ncol(x)) ## Assigns the column names to the data frame by using colnames.
x

x > 4 ## When there is NA the condition produces NA.
#>      Column.1 Column.2 Column.3 Column.4
#> [1,]       NA       NA     TRUE     TRUE
#> [2,]    FALSE    FALSE     TRUE     TRUE
#> [3,]    FALSE     TRUE     TRUE     TRUE
#> [4,]    FALSE     TRUE       NA     TRUE
#> [5,]       NA     TRUE       NA     TRUE
is.na(x)
#>      Column.1 Column.2 Column.3 Column.4
#> [1,]     TRUE     TRUE    FALSE    FALSE
#> [2,]    FALSE    FALSE    FALSE    FALSE
#> [3,]    FALSE    FALSE    FALSE    FALSE
#> [4,]    FALSE    FALSE     TRUE    FALSE
#> [5,]     TRUE    FALSE     TRUE    FALSE
complete.cases(x) ## TRUE for the rows in which all values are non-NA.
#> [1] FALSE  TRUE  TRUE FALSE FALSE
a <- x[complete.cases(x), ] ## Rows with non missing elements.
a ## Non-NA data frame.

a[a$Column.1 > 1, ] ## Apply the condition.

a[a$Column.1 > 1, c("Column.2", "Column.4")]


x[which(x$Column.1 > 1), ]

x[which(x$Column.1 > 1 & x$Column.3 > 7), ]

x[which(x$Column.1 > 2 & x$Column.4 > 11), ]$Column.4
#> [1] 13
x[which(x$Column.1 > 1 & x$Column.3 > 8), c("Column.2", "Column.3")]
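
To make the problem concrete, the sketch below uses a small fixed vector (made up for this illustration) and compares a plain condition with two NA-safe alternatives.

```r
x <- c(7, 6, NA, 2, NA, 9)

x[x > 4]             ## Each NA in the condition returns an NA in the result.
x[which(x > 4)]      ## which keeps only the positions where the condition is TRUE.
x[!is.na(x) & x > 4] ## Explicitly requiring non-missing values gives the same result.
```

Both alternatives drop the NA values; which one to use is mostly a matter of style, although the explicit !is.na(x) makes the intent easier to read.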

Assignment by Subsetting

This section covers how to assign values to subsetted parts of R objects.

# Vectors
x <- c(1:10)
x
#>  [1]  1  2  3  4  5  6  7  8  9 10
x[x > 4] <- NA
x
#>  [1]  1  2  3  4 NA NA NA NA NA NA
x[is.na(x)] <- 0

# Matrices
a <- matrix(data = 1:9, nrow = 3, ncol = 3, dimnames = list(c("row1", "row2", "row3"), c("col1", "col2", "col3")))
a
#>      col1 col2 col3
#> row1    1    4    7
#> row2    2    5    8
#> row3    3    6    9
a[1, c(1:3)] <- NA
a
#>      col1 col2 col3
#> row1   NA   NA   NA
#> row2    2    5    8
#> row3    3    6    9

# Data frames
x <- data.frame(c(1:5), c(6:10), c(11:15), c(16:20), stringsAsFactors = FALSE)  ## Creating data frame from scratch without specific column names.
colnames(x) <- paste0("Column", ".", 1:ncol(x)) ## Assigns the column names to the data frame by using colnames.
x

x[3, 4] <- 10000
x

x[x$Column.1 > 2, ] <- NA
x

Miscellaneous

This section presents some other important topics and concepts in R programming language.

Coercion

In R, you can convert the class of some objects into other classes by explicit coercion.
For explicit coercion, you can use the following functions.

x <- c(0:6)
class(x) ## The class of x is integer.
#> [1] "integer"

as.numeric(x) ## Coerces x as a numeric.
#> [1] 0 1 2 3 4 5 6
as.character(x) ## Coerces x as a character.
#> [1] "0" "1" "2" "3" "4" "5" "6"
as.complex(x) ## Coerces x as a complex.
#> [1] 0+0i 1+0i 2+0i 3+0i 4+0i 5+0i 6+0i
as.factor(x) ## Coerces x as a factor.
#> [1] 0 1 2 3 4 5 6
#> Levels: 0 1 2 3 4 5 6
as.logical(x) ## Coerces x as a logical (0 is FALSE and every other number is TRUE).
#> [1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
as.matrix(x) ## Coerces x as a matrix.
#>      [,1]
#> [1,]    0
#> [2,]    1
#> [3,]    2
#> [4,]    3
#> [5,]    4
#> [6,]    5
#> [7,]    6
as.array(x) ## Coerces x as an array.
#> [1] 0 1 2 3 4 5 6
as.data.frame(x) ## Coerces x as a data frame.

as.list(x) ## Coerces x as a list.
#> [[1]]
#> [1] 0
#> 
#> [[2]]
#> [1] 1
#> 
#> [[3]]
#> [1] 2
#> 
#> [[4]]
#> [1] 3
#> 
#> [[5]]
#> [1] 4
#> 
#> [[6]]
#> [1] 5
#> 
#> [[7]]
#> [1] 6
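
Explicit coercion does not always succeed. The sketch below, which uses made-up example vectors, shows two cases worth knowing: a value that cannot be converted becomes NA (with a warning), and a factor has to be converted to character before it is converted to numeric.

```r
# Coercion that cannot be done yields NA, with a warning.
y <- c("1", "2", "three")
z <- suppressWarnings(as.numeric(y)) ## "three" cannot be converted.
z

# A factor stores integer codes, so convert it to character first.
f <- factor(c("10", "20", "10"))
as.numeric(f) ## Integer codes of the levels, not the original values.
as.numeric(as.character(f)) ## The original values.
```

The suppressWarnings call is only there to keep the output short; without it R prints a warning that NAs were introduced by coercion.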

Dates and Times

R has a special representation of dates and times.
- Dates are represented by Date class.
- Dates are stored internally as the number of days since 1970-01-01.
- Times are represented by POSIXct or POSIXlt class.
- Times are stored internally as the number of seconds since 1970-01-01.

d1 <- base::date() ## Current date and time.
d1
#> [1] "Tue Feb  6 02:35:10 2018"
class(d1) ## Class is character
#> [1] "character"

d2 <- Sys.Date() ## System date.
d2
#> [1] "2018-02-06"
class(d2) ## Class is "Date".
#> [1] "Date"

d3 <- Sys.time() ## System time.
d3
#> [1] "2018-02-06 02:35:10.170 EST"
str(d3) ## Class is "POSIXct"
#>  POSIXct[1:1], format: "2018-02-06 02:35:10.170"

In R, times are represented using the POSIXct or the POSIXlt class.
- POSIXct is just a very large number under the hood (the seconds since 1970-01-01); it is a useful class when you want to store times in something like a data frame.
- POSIXlt is a list underneath and it stores a bunch of other useful information like the day of the week, day of the year, month and day of the month.
Times can be coerced from a character string using the as.POSIXlt or as.POSIXct functions.

x <- Sys.time()
x
#> [1] "2018-02-06 02:35:10.187 EST"
str(x) ## Class is "POSIXct"
#>  POSIXct[1:1], format: "2018-02-06 02:35:10.187"

a <- as.POSIXlt(x) ## Coerced to as.POSIXlt.
str(a) ## Class is "POSIXlt"
#>  POSIXlt[1:1], format: "2018-02-06 02:35:10.187"
names(unclass(a)) ## Gives the names, after it is unclassed.
#>  [1] "sec"    "min"    "hour"   "mday"   "mon"    "year"   "wday"  
#>  [8] "yday"   "isdst"  "zone"   "gmtoff"
a$sec
#> [1] 10.187761
a$yday
#> [1] 36
a$mday
#> [1] 6

as.POSIXct(a) ## Coerced to as.POSIXct.
#> [1] "2018-02-06 02:35:10.187 EST"

Dates can be coerced from a character string using the as.Date function.
Using the unclass function, Date class objects can be coerced to numeric objects which indicate the number of days since the first date (1970-01-01).
You can also coerce numeric values to Date class.

x <- as.Date("1970-01-01")
str(x) ## Class is "Date".
#>  Date[1:1], format: "1970-01-01"

unclass(x) ## Note that the starting date is 1970-01-01 for date class.
#> [1] 0
unclass(as.Date("1970-01-02"))
#> [1] 1
unclass(as.Date("1969-12-31")) ## Dates before the starting date are represented by negative numbers.
#> [1] -1
unclass(as.Date(Sys.Date())) ## Number of days since the first date.
#> [1] 17568

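as.Date(0, origin = "1970-01-01") ## The first date. Older versions of R require the origin argument for numeric input.
#> [1] "1970-01-01"
as.Date(c(-2:2), origin = "1970-01-01")
#> [1] "1969-12-30" "1969-12-31" "1970-01-01" "1970-01-02" "1970-01-03"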

In R, there are a number of generic functions that work on dates and times.
You can also re-format the default date representation by using format, strptime, and strftime functions.
In re-formatting the date representation you need to use conversion specifications. The commonly used ones are
- %d: day of the month as number (01-31).
- %a: abbreviated weekday.
- %A: unabbreviated weekday.
- %m: month as number (01-12).
- %b: abbreviated month.
- %B: unabbreviated month.
- %y: 2 digit year.
- %Y: four digit year.
- %H: hours as decimal number (00-23).
- %M: minute as decimal number (00-59).
- %S: second as integer (00-61).
- %T: equivalent to %H:%M:%S.

x <- Sys.Date() ## System date.

weekdays(x, abbreviate = FALSE) ## The weekday.
#> [1] "Tuesday"
months(x, abbreviate = FALSE) ## The month.
#> [1] "February"
julian(x) ## Gives the number of the days since the origin and the origin is given in the result.
#> [1] 17568
#> attr(,"origin")
#> [1] "1970-01-01"

format(x, "%a %b %d") ## Formats the date class object with the desired representation.
#> [1] "Tue Feb 06"
strftime(x, origin = "1970-01-01", tz = "UTC", format = "%B %d, %Y %H:%M") ## Similar to above.
#> [1] "February 06, 2018 00:00"
strftime(x, origin = "1970-01-01", tz = "UTC", format = "%A %B %Y") ## Similar to above.
#> [1] "Tuesday February 2018"
format(as.POSIXct("Feb 03, 2017 09:12 PM", tz = "UTC", format = "%b %d, %Y %I:%M %p"), "%Y%m%d%H%M")
#> [1] "201702032112"

date.string <- c("January 10, 2012 10:40", "December 9, 2011 09:10:00") ## A date string with a specific representation.
a <- strptime(date.string, format = "%B %d, %Y %H:%M") ## Note that the format needs to match your date string format.
a
#> [1] "2012-01-10 10:40:00 EST" "2011-12-09 09:10:00 EST"
str(a)
#>  POSIXlt[1:2], format: "2012-01-10 10:40:00" "2011-12-09 09:10:00"
as.Date(a)
#> [1] "2012-01-10" "2011-12-09"
# ?strptime ## Check the arg of strptime function.
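
Because dates are stored as numbers, ordinary arithmetic and comparison also work on them. The following is a minimal sketch with two arbitrarily chosen dates (start and end are illustrative names, not objects used elsewhere in this document).

```r
start <- as.Date("2018-01-15")
end <- as.Date("2018-03-01")

end - start ## Difference between two dates as a difftime object.
as.numeric(difftime(end, start, units = "weeks")) ## The same gap in weeks.
start + 30 ## Adding days to a date.
seq(start, by = "month", length.out = 3) ## A monthly sequence of dates.
end > start ## Dates can be compared.
```

The difference is 45 days, and adding 30 days to the start date gives 2018-02-14.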

I think R is very powerful in date and time operations.
For example, consider the following questions:
- How can we find the last Tuesday of every month in a given year?
- How can we find the last Saturday of every month in a given year?

year <- "2017"
d <- as.Date(paste0(year, "-01-01"))
tuesdays <- d + seq(by = 7, (2 - as.POSIXlt(d)$wday) %% 7, 364 + (months(d + 30 + 29) == "February")) ## Note that the day number starts on Sunday with 0 and ends on Saturday with 6.
tuesdays[tapply(seq_along(tuesdays), as.POSIXlt(tuesdays)$mon, max)]
#>  [1] "2017-01-31" "2017-02-28" "2017-03-28" "2017-04-25" "2017-05-30"
#>  [6] "2017-06-27" "2017-07-25" "2017-08-29" "2017-09-26" "2017-10-31"
#> [11] "2017-11-28" "2017-12-26"

d <- as.Date(paste0(year, "-01-01"))
saturdays <- d + seq(by = 7, (6 - as.POSIXlt(d)$wday) %% 7, 364 + (months(d + 30 + 29) == "February"))
saturdays[tapply(seq_along(saturdays), as.POSIXlt(saturdays)$mon, max)]
#>  [1] "2017-01-28" "2017-02-25" "2017-03-25" "2017-04-29" "2017-05-27"
#>  [6] "2017-06-24" "2017-07-29" "2017-08-26" "2017-09-30" "2017-10-28"
#> [11] "2017-11-25" "2017-12-30"
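
The same idea can be wrapped in a reusable function. The sketch below is an alternative formulation rather than the approach used above: it lists every day of the year, keeps the requested weekday, and then keeps the last match within each month. The function name last.weekday and its arguments are hypothetical.

```r
# Hypothetical helper: the last given weekday of every month in a year.
# wday follows as.POSIXlt: 0 is Sunday, 1 is Monday, ..., 6 is Saturday.
last.weekday <- function(year, wday) {
    days <- seq(as.Date(paste0(year, "-01-01")), as.Date(paste0(year, "-12-31")), by = "day")
    hits <- days[as.POSIXlt(days)$wday == wday] ## Keep only the requested weekday.
    month <- as.POSIXlt(hits)$mon
    hits[!duplicated(month, fromLast = TRUE)] ## Last match within each month.
}

last.weekday(2017, 2) ## Last Tuesdays of 2017.
last.weekday(2017, 6) ## Last Saturdays of 2017.
```

Since the days are generated up to December 31, leap years need no special treatment in this version.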

5 R Functions

This section will introduce you to
- how to use built-in R functions.
- how to write your own R function.
For more details on R functions, please see the Functions and Functional Programming sections of Advanced R by Hadley Wickham.

Built-in R Functions

Everything performed in R is done by functions.
R functions
- are created using the function command.
- are stored as R objects just like anything else, they are R objects of class function.
- are just like mathematical functions.
- have some inputs called arguments, and an output called return value.
- can return only one object.
- can be passed as arguments to other functions.
- can be nested, so that you can define a function inside of another function.
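
As a small illustration of the points above, the sketch below stores a function in a variable and passes functions as arguments to sapply. The name square is made up for this example.

```r
# A function is an object, so it can be stored in a variable.
square <- function(x) x^2
class(square)

# A function can be passed as an argument to another function.
sapply(1:4, square)

# A function can also be passed without naming it first (an anonymous function).
sapply(1:4, function(x) x + 10)
```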

Function Call

The below code shows how functions are called in R.

log(10) ## Takes the natural logarithm of the input.
#> [1] 2.3025851
is.function(log) ## Checks whether the object is a function.
#> [1] TRUE
log(exp(1)) ## Note the exponential function written as "exp()".
#> [1] 1
c(1, 2, 3, 4) ## Concatenate function which creates a vector.
#> [1] 1 2 3 4

Function Parts

If you type a function’s name in the console and hit enter, you can see its source code.
There are three important parts of a function
- The formals: the list of arguments which controls how you can call the function, which is shown in function(x, y).
- The body: the code inside the function, which is shown between the curly braces {}.
- The environment: the map of the location of the function’s variables.
When you print a function in R, it shows you these three important components. If the environment is not displayed, it means that the function was created in the global environment.
The below example shows you how to find the source code of a function in the editor.

str(paste)
#> function (..., sep = " ", collapse = NULL)
formals(paste) ## Prints the arguments of a function.
#> $...
#> 
#> 
#> $sep
#> [1] " "
#> 
#> $collapse
#> NULL
body(paste) ## Prints the body of a function.
#> .Internal(paste(list(...), sep, collapse))
environment(paste) ## Prints the environment of a function.
#> <environment: namespace:base>

getMethod("log") ## Shows the source code of one of the functions with the same name in the global environment. If this function does not work, try the below functions.
#> function (x, base = exp(1))  .Primitive("log")
str(log) ## str function gives the structure of the function with its arguments. I rarely use this function for functions but it might be useful to reveal the full structure of a function.
#> function (x, base = exp(1))

getAnywhere(log) ## Shows the information of matching function names in all packages.
#> A single object matching 'log' was found
#> It was found in the following places
#>   package:base
#>   namespace:base
#> with value
#> 
#> function (x, base = exp(1))  .Primitive("log")
getAnywhere(log)[1] ## Selecting the first function.
#> function (x, base = exp(1))  .Primitive("log")

getAnywhere(paste) ## Since there is only one function with the matching name, the function's source code is revealed immediately.
#> A single object matching 'paste' was found
#> It was found in the following places
#>   package:base
#>   namespace:base
#> with value
#> 
#> function (..., sep = " ", collapse = NULL) 
#> .Internal(paste(list(...), sep, collapse))
#> <bytecode: 0x1049ad070>
#> <environment: namespace:base>

getAnywhere(Head.Tail) ## This is a user-written R function. Note the curly braces which represents the body of the function.
#> A single object matching 'Head.Tail' was found
#> It was found in the following places
#>   .GlobalEnv
#> with value
#> 
#> function(x, Select) {
#>     if (Select %% 1 != 0)
#>         stop("Invalid Select. Please choose a whole number as Select.\n")
#> 
#>     rbind(head(x, Select), tail(x, Select))
#> }
formals(Head.Tail)
#> $x
#> 
#> 
#> $Select
body(Head.Tail)
#> {
#>     if (Select%%1 != 0) 
#>         stop("Invalid Select. Please choose a whole number as Select.\n")
#>     rbind(head(x, Select), tail(x, Select))
#> }
environment(Head.Tail)
#> <environment: R_GlobalEnv>

# edit(log) ## Use "edit" function to open the source code of a function in a small editor window in RStudio.

Function Arguments

When an R function is called, it takes the information in the arguments, applies the code in the body, and then returns the final expression (i.e., return value) in the function.
Most of the R functions have named arguments.
- The formal arguments are the arguments included in the function definition.
- The args and formals functions return all the formal arguments of a function with its usage.
- Not every function call in R makes use of all the formal arguments.
- In some R functions, arguments can be missing or might have default values.

args(c) ## No argument exists.
#> NULL
args(log) ## Has two arguments and the second one has a default value.
#> function (x, base = exp(1)) 
#> NULL
args(setdiff) ## Has two arguments.
#> function (x, y) 
#> NULL
args("+") ## Has two arguments.
#> function (e1, e2) 
#> NULL
args(mean) ## At least one argument. "..." means that some other arguments can be passed to other functions.
#> function (x, ...) 
#> NULL
formals(mean)
#> $x
#> 
#> 
#> $...
args(nb2mat) ## nb2mat is from the spdep package. Has 4 arguments and the last three have default values.
#> function (neighbours, glist = NULL, style = "W", zero.policy = NULL) 
#> NULL
formals(nb2mat)
#> $neighbours
#> 
#> 
#> $glist
#> NULL
#> 
#> $style
#> [1] "W"
#> 
#> $zero.policy
#> NULL
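
An argument without a default value can still be left out of the call, as long as the function checks for it before using it. The sketch below uses the base function missing inside a hypothetical function named add.or.double.

```r
# Hypothetical function: y has no default value and may be left out.
add.or.double <- function(x, y) {
    if (missing(y)) { ## missing() is TRUE when the caller did not supply y.
        return(2 * x)
    }
    x + y
}
add.or.double(5) ## y is missing, so x is doubled.
add.or.double(5, 1) ## y is supplied, so x and y are added.
```

In most cases a default value is easier to read than a missing check, so this pattern is best kept for arguments that have no sensible default.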

Argument Matching

R function arguments can be matched by argument order or by argument name.
- If the function is called only by argument names, then the order is not important.
- If the function is not called by argument names, you have to give the inputs by argument order.
- If the function is called with some of the argument names but not all, then it will use the ordering to deduce the others.
I recommend using the argument names if possible, at least for large functions.
Also, I don’t recommend mixing the order of the arguments too much, since it might lead to some errors.

a <- c(1, 2, 3, 4) ## Vector 1.
b <- c(1, 2, 5, 9) ## Vector 2.

# setdiff: Everything in "x" and not in "y".
getAnywhere(setdiff)[3]
#> standardGeneric for "setdiff" defined from package "base"
#> 
#> function (x, y) 
#> standardGeneric("setdiff")
#> <environment: 0x10e61d678>
#> Methods may be defined for arguments: x, y
#> Use  showMethods("setdiff")  for currently available ones.
str(setdiff)
#> Formal class 'standardGeneric' [package "methods"] with 8 slots
#>   ..@ .Data     :function (x, y)  
#>   ..@ generic   : atomic [1:1] setdiff
#>   .. ..- attr(*, "package")= chr "base"
#>   ..@ package   : chr "base"
#>   ..@ group     : list()
#>   ..@ valueClass: chr(0) 
#>   ..@ signature : chr [1:2] "x" "y"
#>   ..@ default   :Formal class 'derivedDefaultMethod' [package "methods"] with 4 slots
#>   .. .. ..@ .Data  :function (x, y)  
#>   .. .. ..@ target :Formal class 'signature' [package "methods"] with 3 slots
#>   .. .. .. .. ..@ .Data  : chr "ANY"
#>   .. .. .. .. ..@ names  : chr "x"
#>   .. .. .. .. ..@ package: chr "methods"
#>   .. .. ..@ defined:Formal class 'signature' [package "methods"] with 3 slots
#>   .. .. .. .. ..@ .Data  : chr "ANY"
#>   .. .. .. .. ..@ names  : chr "x"
#>   .. .. .. .. ..@ package: chr "methods"
#>   .. .. ..@ generic: atomic [1:1] setdiff
#>   .. .. .. ..- attr(*, "package")= chr "base"
#>   ..@ skeleton  : language (structure(function (x, y)  { ...

setdiff(x = a, y = b) ## With argument names.
#> [1] 3 4
setdiff(y = b, x = a) ## With argument names in a different order.
#> [1] 3 4
setdiff(a, b) ## Without argument names (matched by order).
#> [1] 3 4
setdiff(b, x = a) ## With some argument names; b is matched to y by order.
#> [1] 3 4

setdiff(b, a)
#> [1] 5 9
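
The matching rules can also be tried on a function of your own. The sketch below defines a hypothetical function named power; every call returns the same value. The last call relies on partial matching, where R accepts an unambiguous abbreviation of an argument name.

```r
# Hypothetical function with two named arguments.
power <- function(base, exponent) {
    base^exponent
}
power(2, 3) ## Matched by order: 2^3.
power(exponent = 3, base = 2) ## Matched by name, so the order does not matter.
power(3, base = 2) ## base is matched by name; 3 fills the remaining argument.
power(b = 2, e = 3) ## Partial matching of names works but is harder to read.
```

Partial matching is convenient at the console, but full argument names are safer in scripts because an abbreviation can become ambiguous if a function gains new arguments.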

Creating R Functions

Sometimes you may want to
- perform certain operations that are not available as built-in R functions.
- execute the same script many times changing some of its values.
- shorten your coding length.
- automate some operations depending on selected values.
In such cases it is convenient to write your own function.

Syntax

The general syntax for declaring functions is shown below.
- The user defined functions are declared using the function command.
- The function arguments should be separated by commas.
- The function commands should be grouped using curly braces {} unless there is only one line of coding.

# R function syntax.
function.syntax <- function(arguments) {
    ## Do something interesting
}
str(function.syntax)
#> function (arguments)  
#>  - attr(*, "srcref")=Class 'srcref'  atomic [1:8] 2 20 4 1 20 1 2 4
#>   .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x121ba53b8>

function.syntax(arguments) ## Calling the function.
#> NULL
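
To illustrate the note on curly braces, here is a minimal sketch of the same function written with and without them. The names half and half.braces are made up for this example.

```r
# A function with a single expression, written without curly braces.
half <- function(x) x / 2
half(10)

# The same function written with curly braces.
half.braces <- function(x) {
    x / 2
}
half.braces(10)
```

Both versions return the same value; the braces only become necessary once the body has more than one expression.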

My R Functions

Let’s create our first R functions, starting with one that has no arguments.
Remember, the last expression will be returned as the output of the function.
- You can also use the return function to return the value you want.

# Function with no arguments.
myfunction <- function() {
    x <- rnorm(100)
    mean(x) ## The last expression will be returned.
}
myfunction()
#> [1] 0.047333324
formals(myfunction)
#> NULL
body(myfunction)
#> {
#>     x <- rnorm(100)
#>     mean(x)
#> }
environment(myfunction)
#> <environment: R_GlobalEnv>

# Function with one argument.
my.cube.func <- function(x) {
    x^3
}
my.cube.func(2)
#> [1] 8
formals(my.cube.func)
#> $x
body(my.cube.func)
#> {
#>     x^3
#> }
environment(my.cube.func)
#> <environment: R_GlobalEnv>

# Function with one argument and return function.
my.variance <- function(x) { ## Sample variance.
    a <- (sum(x^2) - length(x) * mean(x)^2) / (length(x) - 1)
    return(a) ## You can also use return function to return a specific value.
}
my.variance(c(1:5))
#> [1] 2.5
var(c(1:5)) ## Built-in R function gives the same result.
#> [1] 2.5
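
Since a function can return only one object, several results are usually bundled into a single list. The sketch below shows one way to do this; the function name my.summary and the element names are hypothetical.

```r
# Hypothetical function returning several results in a single list.
my.summary <- function(x) {
    x <- x[!is.na(x)] ## Drop the NA values.
    list(n = length(x), mean = mean(x), variance = var(x))
}
result <- my.summary(c(1:5, NA))
result$n
result$mean
result$variance
```

The elements of the returned list can then be extracted by name with the $ operator.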

Note that R has separate namespaces for functions and non-functions. Thus, it’s possible to have an object named c and a function named c. However, I do not recommend that.

# R code chunk is not evaluated.

# Naming a user created R function with a known built-in R function name.
c <- function(x) {
    a <- x^x
    return(a)
}
c(4)
class(c)
rm(c) ## The created c function is removed.
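
The opposite situation, a non-function object that shares its name with a function, can be sketched as follows. When a name is used in a function call, R skips objects that are not functions while looking it up. The object names are chosen only for illustration.

```r
sum <- 10 ## A non-function object that shares its name with the built-in sum function.
called <- sum(1:3) ## Used in a call, so R skips the number and finds the function.
valued <- sum + 1 ## Used as a value, so the name refers to the number.
called
valued
rm(sum) ## Remove the object to avoid confusion.
```

The code works, but reusing the names of built-in functions makes scripts harder to read, which is why it is not recommended.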

Default Values

Let’s create an R function with some default values.
- For some arguments, you can assign default values.
- Default values can be from any R class.
In addition to specifying a default value, you can also set an argument value to NULL.
- A NULL value in R is mainly used to represent a list with zero length.
- A NULL value is not a missing value.

# Function with default values.
sum.of.squares <- function(x, About = mean(x)) { ## Sum of squares.
    x <- x[!is.na(x)]
    a <- sum((x - About)^2)
    return(a)
}
sum.of.squares(x = c(-2:2))
#> [1] 10
sum.of.squares(x = c(-2:2), About = 0)
#> [1] 10
sum.of.squares(x = c(-2:2), About = 1)
#> [1] 15

# Function with default values including NULL value.
my.function <- function(x, Power = 2, Addition = 10, Remove.NA = NULL) { ## Just a function.
    a <- sum(x + Addition, na.rm = Remove.NA)^Power
    return(a)
}
my.function(x = c(1:5), Power = 3, Addition = 1, Remove.NA = TRUE)
#> [1] 8000
my.function(x = c(1:5, NA), Power = 3, Addition = 1, Remove.NA = FALSE)
#> [1] NA
my.function(x = c(1:5, NA), Power = 3, Addition = 1, Remove.NA = NULL)
#> [1] 8000
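
A NULL default is often used to mean that an optional step should be skipped, and the function tests for it with is.null. The following is a minimal sketch under that assumption; my.mean and Digits are hypothetical names.

```r
# Hypothetical function: Digits = NULL means "do not round".
my.mean <- function(x, Digits = NULL) {
    a <- mean(x, na.rm = TRUE)
    if (!is.null(Digits)) { ## Round only when the caller supplies Digits.
        a <- round(a, Digits)
    }
    return(a)
}
my.mean(c(1, 2, 2)) ## Not rounded.
my.mean(c(1, 2, 2), Digits = 2) ## Rounded to two digits.
```

Checking the NULL value explicitly keeps the behavior of the function clear, instead of passing NULL on to another function.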

Nested Functions 1

Let’s see how we can use a function inside of another function.

# First function.
my.square <- function(x) {
    x <- x[!is.na(x)] ## NA values are subsetted.
    a <- x^2
    return(a)
}

# Second function.
my.cube <- function(x) {
    x <- x[!is.na(x)] ## NA values are subsetted.
    a <- x^3
    return(a)
}

# The main function which nests the first and second functions.
sum.square.cube <- function(x) {
    a <- sum(my.square(x))
    b <- sum(my.cube(x))
    c <- a + b
    return(c)
}
sum.square.cube(1)
#> [1] 2
sum.square.cube(2)
#> [1] 12
sum.square.cube(c(1:5))
#> [1] 280
sum.square.cube(c(1:5, rep(NA, 3)))
#> [1] 280

Nested Functions 2

Now, let’s see how we can create a function inside of a function.
You can use the ls, environment, and get functions to list the objects inside a function and their values.

# Function which creates functions as output.
make.power <- function(power) {
    power.func <- function(base) {
        return(base^power)
        }
    return(power.func)
}
make.power(2)
#> function(base) {
#>         return(base^power)
#>         }
#> <environment: 0x1270ca8e8>

# New function 1.
square.func <- make.power(2) ## Created a square function.
square.func(3) ## Takes the square of 3.
#> [1] 9

# New function 2.
cube.func <- make.power(3) ## Created a cube function.
cube.func(3) ## Takes the cube of 3.
#> [1] 27

# What's in a function's environment?
ls(environment(cube.func)) ## Gives the objects defined in the environment of cube.func.
#> [1] "power"      "power.func"
get("power", environment(cube.func)) ## Gives the value of "power" in cube.func.
#> [1] 3
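
A function that creates functions can also be combined with lapply to build several functions at once. The sketch below uses a separate, hypothetical factory named power.factory so that the example is self-contained; the call to force is a precaution that evaluates the argument when the factory is called.

```r
# Hypothetical function factory, similar in spirit to make.power.
power.factory <- function(power) {
    force(power) ## Evaluate the argument now rather than lazily.
    function(base) {
        base^power
    }
}

# A list of three functions: identity, square, and cube.
power.funcs <- lapply(1:3, power.factory)
power.funcs[[2]](4) ## Square of 4.
sapply(power.funcs, function(f) f(2)) ## 2^1, 2^2, and 2^3.
```

Each function in the list keeps its own value of power in its own environment.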

Scoping

The scoping rules of a language describe how the values of variables are determined.
R has two types of scoping: lexical scoping, implemented automatically at the language level, and dynamic scoping, used in select functions to save typing during interactive analysis.
This section introduces only the lexical scoping since it is very important in function creation.
- Lexical scoping looks up symbol values based on how functions were nested when they were created, not how they are nested when they are called.
- You just need to look at the definition of the function.
- Generally, if a function func.1 is defined within a function func.2, the variables in func.2 are visible in func.1.
Let’s see how lexical scoping works in two different cases.

Case Study 1

Let’s see how lexical scoping works in Case 1.
- func.1 and func.2 are created in the global environment.

# Case Study 1
y <- 10 ## A variable defined in global environment not inside of a function.

func.1 <- function(x) { ## Function 1.
    x*y
}

func.2 <- function(x) { ## Function 2.
    y <- 2 ## A variable defined in the environment of func.2 function.
    y^2 + func.1(x)
}

## Which y values do func.1 and func.2 use?
func.2(3) ## Check the result.
#> [1] 34
### With lexical scoping, the value of y in func.1 is looked up in the environment in which the function was created, in this case the global environment, so the value of y is 10.
### For func.2, the value of y is 2, which is defined inside the body of func.2.

Case Study 2

Let’s see how lexical scoping works in Case 2.
- func.2 is created in the global environment.
- func.1 is created inside of the func.2, which means it is created in func.2’s environment.

# Case Study 2
y <- 10 ## A variable defined in global environment not inside of a function.

func.2 <- function(x) { ## Function 2.
    y <- 2 ## A variable defined in the environment of func.2 function.

    func.1 <- function(x) { ## Function 1.
        x*y
    }
    
    y^2 + func.1(x)
}

## Which y values do func.1 and func.2 use?
func.2(3) ## Check the result.
#> [1] 10
### This time, since func.1 is created inside func.2, func.1 uses 2 as the y value.
### func.2 also uses the value 2 for y.
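
One more detail is worth a short sketch: lexical scoping decides where R looks for a value, but the lookup itself happens when the function is run. The names rate and add.tax below are hypothetical.

```r
rate <- 0.10 ## Defined outside of the function.
add.tax <- function(price) {
    price * (1 + rate) ## rate is not defined here, so R looks in the enclosing environment.
}
first <- add.tax(100)

rate <- 0.20 ## The value changes after the function was created.
second <- add.tax(100) ## The function sees the new value.
c(first, second)
```

The two results differ (110 and 120) even though the function itself was not changed, which is one reason to pass such values as arguments instead.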

Important R Functions

6 Control Structures

Control structures in R
- contain conditionals and loop statements like any other programming language.
- allow you to control the execution flow of an R program.
- are generally used while writing functions, longer expressions and codes for repetitive executions.
Common control structures in R are
- if, else if and else: Executes the code if a condition is TRUE.
- for: Executes a loop for a fixed number of times.
- while: Executes a loop while a condition is TRUE.
- repeat: Executes an infinite loop.
- break: Breaks the execution of a loop.
- next: Skips an iteration of a loop.
- functions in R: For the details of functions please see the R Functions section.
- return: Exits a function and returns a given value. For the details, please see the My R Functions section.
For more information about control structures, please run the code ?Control in R.

Conditional Statements

The conditional statements in R allow the functions and programs to perform different calculations according to the value of a logical object.
In R, conditional statements are performed by using the if, else if, and else statements.

Syntax

The general and well-written syntax for a conditional statement starts with an if statement and
- might end without having other conditional statements.
- might end with else statement
- might continue with single or multiple else if statements, and end with else statement.
You can also nest conditional statements in other conditional statements.
It is important to note that R evaluates conditional statements in the order they are written.
- If the condition is TRUE, then R executes the code just below the conditional statement and ignores the rest of the conditional statements.
- If the condition is FALSE, then R skips to the next conditional statement and repeats the previous process.
Note that the code below the conditional statements should be in curly braces {}.

# R code chunk is not evaluated.

# Syntax 1
if(conditional.statement) {
    ## Executes the code if the conditional.statement is TRUE.
}

# Syntax 2
if(conditional.statement) {
    ## Executes the code if the conditional.statement is TRUE.
} else {
    ## Executes the code if the conditional.statement is FALSE.
}

# Syntax 3
if(conditional.statement.1) {
    ## Executes the code if the "conditional.statement.1" is TRUE.
} else if (conditional.statement.2) {
    ## Executes the code if the "conditional.statement.1" is FALSE but "conditional.statement.2" is TRUE.
} else {
    ## Executes the code if the "conditional.statement.1" and "conditional.statement.2" are FALSE.
}

# Syntax 4
if(conditional.statement.1) {
    ## Executes the code if the "conditional.statement.1" is TRUE.
    if (conditional.statement.2) {
        ## Executes the code if the "conditional.statement.1" and "conditional.statement.2" are TRUE.
        if (conditional.statement.3) {
            ## Executes the code if the "conditional.statement.1", "conditional.statement.2" and "conditional.statement.3" are TRUE.
        }
    }
}

Case 1

The below examples present a conditional statement with a single if statement.
In the coding part of conditional statements, you can use
- message function to give informational notes to the reader
- warning function to warn the coder about unusual results
- stop function to stop the execution of conditional statements.

# Simple if (single) statement.
x <- -8
if (x < 0) {
   print("Input is a negative number.")
}
#> [1] "Input is a negative number."

## Simple if (multiple) statements with message function.
x <- sample(x = c(-1000:1000), size = 1, replace = TRUE, prob = NULL)
if (x < 0) {
    print("Input is a negative number.")
    message(paste0("Your input is ", x)) ## You can use "message" function to give informational note on the console.
    warning("Something unusual is going on.")
}
#> [1] "Input is a negative number."
#> Your input is -998
#> Warning: Something unusual is going on.
if (x > 0) {
    print("Input is a positive number.")
    message(paste0("Your input is ", x))
    warning("Something unusual is going on.")
}

# Simple if (single) statement with stop function.
if (!("$" %in% letters)) { ## "$" is not one of the lowercase letters, so the condition is TRUE.
    stop("Invalid letter.") ## The stop function ends the execution with an error message.
}
#> Error in eval(expr, envir, enclos): Invalid letter.
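
The three functions are typically used together inside a function of your own. The sketch below is one possible arrangement; safe.sqrt is a hypothetical name and the messages are made up.

```r
# Hypothetical function that uses message, warning, and stop.
safe.sqrt <- function(x) {
    if (!is.numeric(x)) {
        stop("Input must be numeric.") ## Execution ends here with an error.
    }
    if (x < 0) {
        warning("Negative input; using its absolute value.")
        x <- abs(x)
    }
    message(paste0("Taking the square root of ", x))
    sqrt(x)
}
safe.sqrt(16) ## Prints a message and returns 4.
# safe.sqrt(-16) ## Gives a warning and a message, then returns 4.
# safe.sqrt("a") ## Stops with an error.
```

The last two calls are commented out so that the chunk runs without a warning or an error; remove the comment signs to try them.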

Case 2

The below examples present a conditional statement that ends with an else statement.

# Simple if and else statements.
x <- sample(x = c(-1000:1000), size = 1, replace = TRUE, prob = NULL)
if (x < 0) {
   print(paste0("Input, ", x , ", is a negative number."))
} else {
   print(paste0("Input, ", x , ", is a non-negative number."))
}
#> [1] "Input, 154, is a non-negative number."

# Simple if and else statement with value assigning inside the conditional statement.
a <- 10
if (a > 3) {
    b <- 10 ## Creating a new variable.
} else {
    b <- 0
}
b
#> [1] 10

# Simple if and else statement with value assigning.
x <- 50
y <- if (x == 50) {
    0
} else {
    1
}
y
#> [1] 0

Case 3

The below examples present a conditional statement that has single or multiple else if statements and ends with an else statement.

# If, else if (single) and else statements.
x <- 5
if (x < 0) {
    print("x is a negative number.")
} else if (x == 0) {
    print("x is zero.")
} else {
    print("x is a positive number.")
}
#> [1] "x is a positive number."

# If, else if (multiple) and else statements.
grade <- 100
if (grade < 70) {
    print("Keep studying!!!")
} else if (grade < 80) {
    print("Average")
} else if (grade < 90) {
    print("Good")
} else if (grade < 100) {
    print("Very Good")
} else {
    print("Excellent")
}
#> [1] "Excellent"

Case 4

The below examples present a conditional statement that is nested in another conditional statement.

# If-else statements are nested in if-else statements.
## This conditional statement yields the same answer.
grade <- 100
if (grade < 100) {
    if (grade < 90) {
        if (grade < 80) {
            if (grade < 70) {
                print("Keep studying!!!")
            } else {
                print("Average")
            }
        } else {
            print("Good")
        }
    } else {
        print("Very Good")
    }
} else {
    print("Excellent")
}
#> [1] "Excellent"

# If statements are nested in if statements.
## This conditional statement yields the same answer.
grade <- 100
if (grade < 100) {
    if (grade < 90) {
        if (grade < 80) {
            if (grade < 70) {
                print("Keep studying!!!")
            } 
            if (grade >= 70) {
                print("Average")
            }
        } 
        if (grade >= 80) {
            print("Good")
        }
    } 
    if (grade >= 90) {
        print("Very Good")
    }
} else {
    print("Excellent")
}
#> [1] "Excellent"

ifelse Function

Note that the if - else statement is not vectorized.
For vectorized if - else statements, you should use the ifelse function.
- The ifelse function is a vectorized if statement which checks all the elements of a given R object individually.
- If the condition is TRUE for an element, ifelse uses the first value; if not, it uses the second value.

# If statement with a vectorized object. Note that R 4.2.0 and later versions give an error here instead of the warning shown below.
x <- 1:10
if (x < 5) {
    x <- 0
}
#> Warning in if (x < 5) {: the condition has length > 1 and only the first
#> element will be used
x
#> [1] 0

# ifelse function for vectorized object.
x <- 1:10
y <- ifelse(x < 5, 0, 1) 
y
#>  [1] 0 0 0 0 1 1 1 1 1 1
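
Calls to ifelse can be nested to handle more than two outcomes. The sketch below applies the grading rules from the earlier conditional statement examples to a whole vector; the vector grades and its values are made up.

```r
grades <- c(65, 72, 88, 95, 100)
labels <- ifelse(grades < 70, "Keep studying!!!",
          ifelse(grades < 80, "Average",
          ifelse(grades < 90, "Good",
          ifelse(grades < 100, "Very Good", "Excellent"))))
labels
```

Each element of grades is checked individually, so the result has one label per grade.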

Loops

The most common way to execute a block of code multiple times is with loops.
In R, you can use three different loops.
- for loop to execute the code a fixed number of times.
- while loop to execute the code as long as the tested condition is true.
- repeat loop to execute the code an infinite number of times.
You can also use the next and break statements to skip some iterations and to terminate the loop.

for Loops

for loops take a counter variable, assign it successive values of a sequence or vector, and execute the loop a fixed number of times.
They are commonly used for iterating over the elements of an object such as a vector or a list.
In R, the general syntax of a for loop starts with the for statement and
- continues with the counter information i in x for some vector (or list) x, where the counter i takes the successive values of x,
- and then continues with executing the code body, which should be in curly braces {}.
Most commonly, x is a vector of n natural numbers.
Also note that i is a dummy variable and can be named whatever you like, though it retains its last value after the loop ends.

# R code chunk is not evaluated.

# The general syntax of a for loop
for (counter in sequence) {
    ## Executes the code for each iteration of counter.
}

# The most common syntax of a for loop
for (i in x) {
    ## Executes the code for each iteration of counter i in x.
}

Let’s see how simple for loops work in R.

# Simple for loop.
## This loop takes the variable "i" and for each iteration of the loop 1, 2, 3, ..., 10, are assigned to it. After the last iteration the loop exits.
for (i in 1:10) { ## Counter is "i".
    print(i)
}
#> [1] 1
#> [1] 2
#> [1] 3
#> [1] 4
#> [1] 5
#> [1] 6
#> [1] 7
#> [1] 8
#> [1] 9
#> [1] 10

# Simple for loop with a different counter.
x <- c("a", "b", "c")
for (NCSU in 1:length(x)) { ## Counter is "NCSU"
    print(x[NCSU])
}
#> [1] "a"
#> [1] "b"
#> [1] "c"
NCSU
#> [1] 3

# Simple for loop with seq_along function.
sample.size <- sample(x = c(5:10), size = 1)
x <- sample(x = c(-10:10), size = sample.size, replace = TRUE, prob = NULL)
for (j in seq_along(x)) { ## Counter is "j"
    print(x[j])
}
#> [1] -1
#> [1] 10
#> [1] 10
#> [1] 4
#> [1] 5
j
#> [1] 5
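
The difference between 1:length(x) and seq_along(x) matters when the object is empty. The sketch below uses a hypothetical empty object to show why seq_along is the safer choice for the counter.

```r
empty <- c() ## An object with zero length.
1:length(empty) ## Counts down from 1 to 0, so a loop would run twice.
seq_along(empty) ## Has zero length, so a loop would not run at all.

runs <- 0
for (k in seq_along(empty)) {
    runs <- runs + 1
}
runs ## The loop body was never executed.
```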

for loops can be nested in other loops.
Nesting beyond 3-4 levels of for loop is often very difficult to read and understand. Thus, be careful with nesting loops.

# Simple nested for loops.
x <- matrix(data = c(1:6), nrow = 2, ncol = 3)
x
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6
for (i in 1:nrow(x)) { ## Looping over rows.
    for (j in 1:ncol(x)) { ## Looping over columns.
        print(x[i, j])
    }
}
#> [1] 1
#> [1] 3
#> [1] 5
#> [1] 2
#> [1] 4
#> [1] 6
i ## Last value of i, which equals the number of rows.
#> [1] 2
j ## Last value of j, which equals the number of columns.
#> [1] 3

The next statement can be used to skip an iteration of a loop.
The break statement can be used to terminate any loop.

# Next statement.
for (i in 1:7) {
    if (i <= 5) {
        next ## Skip the first 5 iterations.
    }
    print(i)
}
#> [1] 6
#> [1] 7

# Break statement.
for (i in 1:7) {
    if (i > 5) {
        break ## Terminates the loop on the 6th iteration.
    }
    print(i)
}
#> [1] 1
#> [1] 2
#> [1] 3
#> [1] 4
#> [1] 5

while Loops

while loops are used to execute a code body repeatedly as long as the tested condition is true.
In R, the general syntax of a while loop starts with the while statement and
- continues with a condition to be tested,
- if the condition is TRUE, then it executes the code body, which should be in curly braces {},
- once the code body is executed, the condition is tested again, and so forth.
while loops can potentially result in infinite loops if not written properly. Use with care!

# Simple while loop
count <- 0 ## The "count" variable is initialized to 0.
while (count < 10) {
    print(count)
    count <- count + 1 ## The count variable is updated and the condition is tested again.
}
#> [1] 0
#> [1] 1
#> [1] 2
#> [1] 3
#> [1] 4
#> [1] 5
#> [1] 6
#> [1] 7
#> [1] 8
#> [1] 9

# Sometimes there will be more than one condition in the test.
x <- 5
while (x >= 3 && x <= 10) { ## "&&" is the scalar AND, which is the appropriate operator inside a condition.
    print(x)
    coin <- rbinom(n = 1, size = 1, prob = 0.5) ## Flips a fair coin. 0 means fail, 1 means success.
    if (coin == 1) { ## random walk
        x <- x + 1
    } else {
        x <- x - 1
    }
}
#> [1] 5
#> [1] 6
#> [1] 7
#> [1] 8
#> [1] 9
#> [1] 10
#> [1] 9
#> [1] 10
#> [1] 9
#> [1] 10
#> [1] 9
#> [1] 10
#> [1] 9
#> [1] 10
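
Because the condition is tested before the code body is executed, the body of a while loop may not run at all. A minimal sketch, with a made-up starting value that already violates the condition:

```r
# Made-up starting value that already violates the condition.
x <- 20
n.runs <- 0 # Counts how many times the body is executed.
while (x < 10) {
    n.runs <- n.runs + 1
    x <- x + 1
}
n.runs # 0, because the condition was FALSE at the first test.
```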

repeat Loops

repeat loops are used to execute a code body indefinitely.
A repeat loop has no test condition of its own, so a break statement inside the body is needed to terminate it.
Infinite loops are not commonly used in statistical applications but they do have their uses.
Infinite loops should generally be avoided, even if they are theoretically correct.

# Simple repeat loop.
x <- 1 ## Initial value.
repeat {
   print(x)
   if (x == 6) {
       break ## If the condition is TRUE, then stop.
   } else {
       x <- x + 1 ## If the condition is FALSE, then run this code.
   }
}
#> [1] 1
#> [1] 2
#> [1] 3
#> [1] 4
#> [1] 5
#> [1] 6

repeat loops are dangerous since there is no guarantee that they will stop.
You need to be very careful when writing repeat loops.

# R code chunk is not evaluated.

# Simple repeat loop which does not stop.
x <- 1 ## Initial value.
repeat {
   print(x)
   if (x > Inf) { ## This condition can never be TRUE.
       break ## If the condition is TRUE, then stop.
   } else {
       x <- x + 1 ## If the condition is FALSE, then run this code.
   }
}
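
One common use of repeat is iterating until some convergence criterion is met. The sketch below is only an illustration (Newton's iteration for the square root of 2); the tolerance tol and the cap max.iter are assumed values, and the cap is what protects the loop from running forever if convergence never happens.

```r
# Newton's iteration for the square root of 2, with a safety cap.
target <- 2
x <- 1          # Initial guess.
tol <- 1e-10    # Assumed convergence tolerance.
max.iter <- 100 # Assumed upper bound on the number of iterations.
iter <- 0
repeat {
    iter <- iter + 1
    x.new <- (x + target / x) / 2
    converged <- abs(x.new - x) < tol
    x <- x.new
    if (converged || iter >= max.iter) {
        break ## Stop on convergence or when the cap is reached.
    }
}
x    # Approximation of sqrt(2).
iter # Number of iterations actually used.
```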

Loop Functions

for loops are primarily useful for writing programs, but they are not particularly convenient when working interactively on the command line (coding in the console).
For command-line interactive work, the loop functions are more useful.
There are some functions which implement looping to make life easier.
- lapply: Loops over a list and evaluates a function on each element.
- sapply: Same as lapply but tries to simplify the result.
- apply: Apply a function over the margins of an array.
- tapply: Apply a function over subsets of a vector.
- mapply: Multivariate version of sapply.

lapply

lapply takes three arguments
- a list X,
- a function FUN (or the name of a function),
- other arguments (...) which are passed on to FUN.
If X is not a list, it will be coerced to a list using the as.list function.
lapply always returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X.

# lapply function with a list.
x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))
lapply(X = x, FUN = mean) ## Gives the mean of each element of a list.
#> $a
#> [1] 2.5
#> 
#> $b
#> [1] -0.31572065
#> 
#> $c
#> [1] 1.0130227
#> 
#> $d
#> [1] 5.0642876

# lapply function with a numeric vector
## "runif" generates uniform random numbers; its first argument is the number of values to generate.
## lapply therefore returns runif(1), runif(2), runif(3), and runif(4).
## The other arguments of "runif" are left at their defaults, i.e., uniform between 0 and 1.
x <- c(1:4)
lapply(x, runif)
#> [[1]]
#> [1] 0.46511458
#> 
#> [[2]]
#> [1] 0.50077008 0.97716812
#> 
#> [[3]]
#> [1] 0.038445408 0.202400500 0.308886211
#> 
#> [[4]]
#> [1] 0.73275655 0.31960197 0.64804636 0.66449284

# lapply function with a numeric vector and extra arguments passed on to FUN.
x <- c(1:4)
lapply(x, runif, min = 0, max = 10) ## "min" and "max" arguments are passed on to the "runif" function.
#> [[1]]
#> [1] 1.8824985
#> 
#> [[2]]
#> [1] 6.1161249 7.9408740
#> 
#> [[3]]
#> [1] 9.25222353 0.53183536 7.33753542
#> 
#> [[4]]
#> [1] 5.3858203 5.6780073 6.7352234 1.6439398

You can also use anonymous functions in loop functions.
Let’s see an example of an anonymous function in lapply.

# lapply function with an anonymous function.
x <- list(a = matrix(1:8, 4, 2), b = matrix(1:12, 3, 4))
lapply(x, function(col) col[, 1]) ## An anonymous function for extracting the first column of each matrix. Here "col" is only the name of the argument; the function itself has no name and is discarded once lapply finishes.
#> $a
#> [1] 1 2 3 4
#> 
#> $b
#> [1] 1 2 3
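
Anonymous functions can also receive the extra arguments that lapply passes on to FUN. In the sketch below, the argument names m and j are made up for illustration; j is supplied once and reused for every element of the list.

```r
x <- list(a = matrix(1:8, 4, 2), b = matrix(1:12, 3, 4))
# "m" is each matrix in turn; "j" is supplied once, through lapply's "..." argument.
second.cols <- lapply(x, function(m, j) m[, j], j = 2)
second.cols # The second column of each matrix.
```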

An auxiliary function split is also useful, particularly in conjunction with lapply.
- split function takes a vector or another object and splits it into groups determined by a factor or a list of factors.
- It always returns a list.
- split function is not a loop function, but it is very useful in conjunction with loop functions.
Actually, using the split and lapply functions together does essentially the same thing as the tapply function. We will see the details of the tapply function later.

# Using split and lapply functions together.
x <- c(rnorm(5), runif(5), rnorm(5, 1))
f <- gl(n = 3, k = 5) ## Generates factor levels.
split(x, f)
#> $`1`
#> [1]  1.264886486 -0.030771014  0.879678098 -1.785263360 -1.334145395
#> 
#> $`2`
#> [1] 0.67387138 0.88993253 0.43541792 0.20668692 0.77929156
#> 
#> $`3`
#> [1]  1.34543996  1.54950796  1.08142072 -0.52866736  0.54534641
lapply(split(x, f), mean)
#> $`1`
#> [1] -0.20112304
#> 
#> $`2`
#> [1] 0.59704006
#> 
#> $`3`
#> [1] 0.79860954
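
split is not limited to vectors. As a small sketch with a made-up data frame, the same split and lapply pattern can be used to compute a statistic for each group of rows.

```r
# Hypothetical data frame with a grouping column and a numeric column.
df <- data.frame(group = c("a", "b", "a", "b"), value = c(1, 2, 3, 4))
pieces <- split(df, df$group) # A list with one data frame per group.
group.means <- lapply(pieces, function(d) mean(d$value))
group.means # Mean of "value" within each group.
```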

sapply

sapply tries to simplify the result of lapply if possible.
- If the result is a list where every element is length 1, then a vector is returned.
- If the result is a list where every element is a vector of the same length (> 1), a matrix is returned.
- If it can’t figure things out, a list is returned.

# sapply function with a list.
x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))
lapply(X = x, FUN = mean) ## List format.
#> $a
#> [1] 2.5
#> 
#> $b
#> [1] -0.39509565
#> 
#> $c
#> [1] 1.0345496
#> 
#> $d
#> [1] 4.9814706
sapply(X = x, FUN = mean, simplify = FALSE) ## Same as lapply.
#> $a
#> [1] 2.5
#> 
#> $b
#> [1] -0.39509565
#> 
#> $c
#> [1] 1.0345496
#> 
#> $d
#> [1] 4.9814706
sapply(X = x, FUN = mean) ## Vector format.
#>           a           b           c           d 
#>  2.50000000 -0.39509565  1.03454962  4.98147062
mean(x) ## Note that mean function cannot handle list objects.
#> Warning in mean.default(x): argument is not numeric or logical: returning
#> NA
#> [1] NA
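
The example above only shows the case where sapply returns a vector. The following sketch, with made-up inputs, illustrates the other two simplification rules: a matrix when every result has the same length greater than 1, and a list when the lengths differ.

```r
x <- list(a = 1:4, b = 5:10)
r <- sapply(x, range) # Each result has length 2, so a 2 x 2 matrix is returned.
r
s <- sapply(1:3, seq_len) # Results have lengths 1, 2, and 3, so a list is returned.
s
```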

apply

apply is used to evaluate a function, often an anonymous one, over the margins of an array X.
- If X is not an array but has a dimension attribute, apply attempts to coerce it to an array via the as.matrix function if it is two-dimensional (e.g., data frames) or via the as.array function otherwise.
- MARGIN is an integer vector indicating which margins should be retained (for a matrix, 1 indicates rows and 2 indicates columns).
- FUN is the function to be applied.
- As in lapply and sapply, additional arguments can be passed on to FUN.
apply can return different outputs.
- If each call to FUN returns a vector of length n and if n > 1, then apply returns an array of dimension c(n, dim(X)[MARGIN]).
- If n equals 1, apply returns a vector if MARGIN has length 1, and an array of dimension dim(X)[MARGIN] otherwise.
It is most often used to apply a function to the rows or columns of a matrix or data frame.
It can be used with general arrays, e.g., taking the average of an array of matrices.
It is not really faster than writing a loop, but it works in one line!
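
A point in the return rules above that is easy to miss: when FUN returns more than one value, the results are stored in the columns of the output, even when looping over rows. A minimal sketch with a small, made-up matrix:

```r
m <- matrix(1:6, nrow = 2, ncol = 3)
row.sums <- apply(m, 1, sum)     # One value per row.
col.sums <- apply(m, 2, sum)     # One value per column.
row.ranges <- apply(m, 1, range) # FUN returns 2 values per row, so the result is 2 x nrow(m).
row.ranges # Column 1 holds the range of row 1, column 2 the range of row 2.
```

If a result with one row per original row is preferred, the output can be transposed with t().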

# apply function on a matrix which returns a vector.
x <- matrix(rnorm(200), 20, 10)
y <- apply(X = x, MARGIN = 2, FUN = mean) ## Means of columns.
y
#>  [1]  0.1281966691  0.0547078269 -0.1353001075 -0.0749594454  0.3316059745
#>  [6]  0.0032045774 -0.2222750870  0.3328635026  0.1081437661 -0.0403008476
class(y)
#> [1] "numeric"
str(y)
#>  num [1:10] 0.1282 0.0547 -0.1353 -0.075 0.3316 ...

apply(x, 1, sum) ## Calculates the sum of each row.
#>  [1]  1.876708309 -1.274228306  3.320315421  3.311382123  3.733855170
#>  [6]  0.500747121  0.813166962 -6.339688707  2.166718702 -0.044351288
#> [11]  3.412550506 -1.443729958  2.260549728 -2.354827026  1.356863014
#> [16] -1.100915952  0.640174366 -1.862292631  0.745788668 -0.001049639

# apply function on a matrix which returns a matrix.
x <- matrix(rnorm(200), 20, 10)
y <- apply(X = x, MARGIN = 2, FUN = quantile, probs = c(0.25, 0.75)) ## Gives the first and the third quartiles of each column.
y
#>            [,1]        [,2]        [,3]        [,4]       [,5]        [,6]
#> 25% -0.19656913 -0.57894620 -0.58522113 -0.14105238 -0.5541239 -0.74813513
#> 75%  0.90453170  0.53543041  0.64705173  0.85289385  1.2061409  0.48785013
#>             [,7]       [,8]        [,9]       [,10]
#> 25% -0.038155669 -1.1057168 -0.26771426 -0.60151437
#> 75%  0.774704143  1.0904228  0.75291972  0.35112934
class(y)
#> [1] "matrix"
str(y)
#>  num [1:2, 1:10] -0.197 0.905 -0.579 0.535 -0.585 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : chr [1:2] "25%" "75%"
#>   ..$ : NULL

# apply function on an array which returns a matrix.
## Gives averages of an array in a matrix format.
x <- array(rnorm(2 * 2 * 10), c(2, 2, 10)) ## This array has 3 dimensions: 2 rows, 2 columns, and a 3rd dimension of length 10.
apply(x, c(1, 2), mean) ## Takes the mean over the 3rd dimension while keeping the 1st and 2nd dimensions. In other words, the 3rd dimension is collapsed, so the result is a 2x2 matrix of means.
#>              [,1]        [,2]
#> [1,] -0.025901501 -0.23329858
#> [2,] -0.015702859  0.45235361
rowMeans(x, dims = 2) ## Gives the same result as above. "dims = 2" means that the first 2 dimensions are preserved.
#>              [,1]        [,2]
#> [1,] -0.025901501 -0.23329858
#> [2,] -0.015702859  0.45235361

# apply function on an array which returns an array.
x <- array(rnorm(2 * 2 * 10), c(2, 2, 10)) 
apply(x, c(2, 3), mean) ## Means for the 2nd and the 3rd dimensions.
#>             [,1]        [,2]       [,3]        [,4]        [,5]
#> [1,] 1.032033819 -0.22886939 1.15934540 -1.05876132  0.32213331
#> [2,] 0.076930345  0.29861014 0.39098475  0.55558119 -0.41934945
#>             [,6]      [,7]        [,8]         [,9]      [,10]
#> [1,]  0.42438482 1.0297013  0.41704257 -0.088439608 -0.0419970
#> [2,] -0.97151164 2.8643074 -0.20719924  0.879466661  1.4425146

For sums and means of matrix dimensions, you can use some shortcut functions.
The shortcut functions are much faster, but you will not notice unless you are using a large matrix.

x <- matrix(rnorm(15), 3, 5)
apply(x, 1, sum) ## Same as "rowSums(x)" function.
#> [1] 1.15104386 0.52906972 1.09875316
rowSums(x)
#> [1] 1.15104386 0.52906972 1.09875316

apply(x, 1, mean) ## Same as "rowMeans(x)" function.
#> [1] 0.23020877 0.10581394 0.21975063
rowMeans(x)
#> [1] 0.23020877 0.10581394 0.21975063

apply(x, 2, sum) ## Same as "colSums(x)" function.
#> [1]  0.35154985  1.84621859 -0.74730656  0.61891238  0.70949248
colSums(x)
#> [1]  0.35154985  1.84621859 -0.74730656  0.61891238  0.70949248

apply(x, 2, mean) ## Same as "colMeans(x)" function.
#> [1]  0.11718328  0.61540620 -0.24910219  0.20630413  0.23649749
colMeans(x)
#> [1]  0.11718328  0.61540620 -0.24910219  0.20630413  0.23649749

tapply

tapply is used to apply a function over subsets of a vector X, where each subset is given by a unique combination of the levels of certain factors.
- X is typically a vector.
- INDEX is a list of factors, each of same length as X. Its elements are coerced to factors by as.factor function.
- FUN is the function to be applied.
- As in the other loop functions, additional arguments can be passed on to FUN.
If FUN returns a single value for each group and the simplify argument is TRUE (the default), tapply returns an array; otherwise it returns a list.

# tapply function (simple).
x <- c(rnorm(10), runif(10), rnorm(10, 1))
f <- gl(n = 3, k = 10) ## Generates factor levels.
tapply(X = x, INDEX = f, FUN = mean) ## A factor level is assigned to each value in x in order.
#>           1           2           3 
#> 0.096512364 0.492764916 1.358479278
tapply(x, f, range) ## Gives the min and max within the subset of x.
#> $`1`
#> [1] -1.4331798  1.0072600
#> 
#> $`2`
#> [1] 0.0038729776 0.9113824219
#> 
#> $`3`
#> [1] -0.88013112  2.81199630

tapply(x, f, mean, simplify = FALSE) ## The result is in a list.
#> $`1`
#> [1] 0.096512364
#> 
#> $`2`
#> [1] 0.49276492
#> 
#> $`3`
#> [1] 1.3584793
lapply(split(x, f), mean) ## Same as above.
#> $`1`
#> [1] 0.096512364
#> 
#> $`2`
#> [1] 0.49276492
#> 
#> $`3`
#> [1] 1.3584793

# tapply function (complex).
x <- c(rnorm(5), rnorm(5, 1), rnorm(5, 2), rnorm(5, 3)) ## Our values.
f1 <- factor(rep(1:2, each = 10)) ## First factor.
f2 <- factor(rep(rep(3:4, each = 5), times = 2)) ## Second factor.
f <- list(f1, f2) ## List of factors.
tapply(X = x, INDEX = f, FUN = mean)
#>            3          4
#> 1 0.13168967 0.97117196
#> 2 2.07530546 3.21447994
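
Since INDEX is coerced to a factor, a plain character vector of group labels also works. A small sketch with made-up values, where the group means are easy to check by hand:

```r
scores <- c(10, 20, 30, 40, 50, 60)     # Made-up values.
team <- c("a", "b", "a", "b", "a", "b") # Made-up grouping variable.
team.means <- tapply(scores, team, mean)
team.means # Mean score of each team.
unlist(lapply(split(scores, team), mean)) # Same values via split and lapply.
```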

mapply

mapply is a multivariate version of sapply function.
- FUN is a function to apply.
- ... contains arguments to apply over.
- MoreArgs is a list of other arguments to FUN.
- SIMPLIFY indicates whether the result should be simplified.
mapply applies FUN to the first elements of each ... argument, the second elements, the third elements, and so on. Arguments are recycled if necessary.

# mapply function (simple).
list(rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1)) ## Instead of typing this, we can use mapply as below.
#> [[1]]
#> [1] 1 1 1 1
#> 
#> [[2]]
#> [1] 2 2 2
#> 
#> [[3]]
#> [1] 3 3
#> 
#> [[4]]
#> [1] 4
mapply(rep, 1:4, 4:1) ## mapply function takes the arguments element by element, in order: rep(1, 4), rep(2, 3), and so on.
#> [[1]]
#> [1] 1 1 1 1
#> 
#> [[2]]
#> [1] 2 2 2
#> 
#> [[3]]
#> [1] 3 3
#> 
#> [[4]]
#> [1] 4

# mapply function (complex).
noise <- function(n, mean, sd) { ## A function for n, mean and sd.
    rnorm(n, mean, sd)
}
noise(5, 1, 2)
#> [1] 5.77172396 6.75199012 0.33407791 1.98091386 5.99155326
noise(1:5, 1:5, 2) ## This does not work as intended for a set of n's and means: because n has length 5, rnorm draws only 5 values, one per mean.
#> [1] -1.38768901  1.96256096 -0.68194949  5.54666240  4.82433828
mapply(noise, 1:5, 1:5, 2) ## With mapply, noise is called once for each pair of n and mean.
#> [[1]]
#> [1] -0.045964864
#> 
#> [[2]]
#> [1] 4.0227480 1.5404014
#> 
#> [[3]]
#> [1] 2.27326866 3.36000422 0.53698866
#> 
#> [[4]]
#> [1] -0.76509722  2.36799894 -0.33937480  3.08212633
#> 
#> [[5]]
#> [1] 5.1811054 7.4741680 6.3593198 6.2523982 7.9988633
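
The MoreArgs and SIMPLIFY arguments listed above are not used in the examples so far. The sketch below uses a made-up helper, scaled.sum, to show a constant argument supplied through MoreArgs and the effect of SIMPLIFY = FALSE.

```r
# Hypothetical helper: scaled sum of two numbers.
scaled.sum <- function(x, y, scale) {
    (x + y) * scale
}
res <- mapply(scaled.sum, 1:3, 4:6, MoreArgs = list(scale = 2))
res # (1 + 4) * 2, (2 + 5) * 2, (3 + 6) * 2, simplified to a vector.
res.list <- mapply(scaled.sum, 1:3, 4:6, MoreArgs = list(scale = 2), SIMPLIFY = FALSE)
res.list # Same values, kept as a list.
```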

Case Study

In the case study below, let’s see how conditional statements, loops, and R functions make coding easier.
Suppose that, for a given income, you want to calculate the income tax and net income (income after tax) for single family households.
- The 2017 tax brackets for single family households (for taxes due on April 17, 2018) can be found here.
- Let’s further assume that you want to calculate these measures for each employee (n = 10) of a company.

n <- 10 ## Number of employees.
all.incomes <- sample(x = c(0:(5*10^5)), size = n, replace = FALSE, prob = NULL) ## Random sample for income.
all.incomes ## Income values.
#>  [1] 193763 295560 496736 334982 157274 394491 494968 207406  45176  55453

Conditional Statement

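Let’s calculate these measures with a conditional statement.
- Note that the conditional statement below is for only one employee.
- To calculate these measures for other employees, change the index in the all.incomes object below.
- For simplicity, the rate of the bracket that the income falls into is applied to the whole income; actual income tax is computed marginally, bracket by bracket.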

# Calculating for one employee.
income <- all.incomes[1] ## Income for the first employee.
if (income <= 0) {
    tax <- 0
} else if (income <= 9325) {
    tax <- income * 0.1
} else if (income <= 37950) {
    tax <- income * 0.15
} else if (income <= 91900) {
    tax <- income * 0.25
} else if (income <= 191650) {
    tax <- income * 0.28
} else if (income <= 416700) {
    tax <- income * 0.33
} else if (income <= 418400) {
    tax <- income * 0.35
} else {
    tax <- income * 0.396
}
c("Income" = income, "Tax" = tax, "Net Income" = income - tax)
#>     Income        Tax Net Income 
#>  193763.00   63941.79  129821.21

Loop

Let’s calculate these measures with a loop.
- Note that the loop below is for only one employee.
- To calculate these measures for other employees, change the index in the all.incomes object below.
Also, let’s add the tax rate to our results.

# Calculating for one employee.
tax.brackets <- list(c(0, 0.1), c(9325, 0.15), c(37950, 0.25), c(91900, 0.28), c(191650, 0.33), c(416700, 0.35), c(418400, 0.396)) ## Tax brackets in a list.
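income <- all.incomes[1]
tax.rate <- 0 ## Default rate, used when the income is not above any bracket threshold (e.g., income is 0).
tax <- 0
for (i in seq_along(tax.brackets)) {
    if (tax.brackets[[i]][1] < income) { ## Each later bracket that applies overwrites the earlier one.
        tax.rate <- tax.brackets[[i]][2]
        tax <- income * tax.rate
    }
}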
c("Income" = income, "Tax Rate" = tax.rate, "Tax" = tax, "Net Income" = income - tax)
#>     Income   Tax Rate        Tax Net Income 
#>  193763.00       0.33   63941.79  129821.21

You can add an additional loop to calculate these measures for all employees.

# Calculating for all employees.
tax.brackets <- list(c(0, 0.1), c(9325, 0.15), c(37950, 0.25), c(91900, 0.28), c(191650, 0.33), c(416700, 0.35), c(418400, 0.396)) ## Tax brackets in a list.

for (j in seq_along(all.incomes)) {
    income <- all.incomes[j]
    tax.rate <- 0 ## Reset for each employee so that values from the previous employee are not reused.
    tax <- 0
    for (i in seq_along(tax.brackets)) {
        if (tax.brackets[[i]][1] < income) {
            tax.rate <- tax.brackets[[i]][2]
            tax <- income * tax.rate
        }
    }
    if (j == 1) {
        results <- c(income, tax.rate, tax, income - tax)
    } else {
        temp <- c(income, tax.rate, tax, income - tax)
        results <- rbind(results, temp)
    }
}
results <- as.data.frame(results, row.names = paste("Employee", seq_along(all.incomes)), stringsAsFactors = FALSE)
colnames(results) <- c("Income", "Tax Rate", "Tax", "Net Income")
results

Function

Let’s write an R function which calculates these measures with any pre-specified tax brackets for all employees.

# Calculating with any pre-specified tax brackets for all employees.
## Pre-specified tax brackets in a list.
tax.brackets <- list(c(0, 0.1), c(9325, 0.15), c(37950, 0.25), c(91900, 0.28), c(191650, 0.33), c(416700, 0.35), c(418400, 0.396))

## Function with Income and Tax.Brackets arguments.
tax.func <- function(Income, Tax.Brackets) {
    for (j in seq_along(Income)) {
        tax.rate <- 0 ## Reset for each employee so that values from the previous employee are not reused.
        tax <- 0
        for (i in seq_along(Tax.Brackets)) {
            if (Tax.Brackets[[i]][1] < Income[j]) {
                tax.rate <- Tax.Brackets[[i]][2]
                tax <- Income[j] * tax.rate
            }
        }
        if (j == 1) {
            results <- c(Income[j], tax.rate, tax, Income[j] - tax)
        } else {
            temp <- c(Income[j], tax.rate, tax, Income[j] - tax)
            results <- rbind(results, temp)
        }
    }
    results <- as.data.frame(results, row.names = paste("Employee", seq_along(Income)), stringsAsFactors = FALSE)
    colnames(results) <- c("Income", "Tax Rate", "Tax", "Net Income")
    return(results)
}

tax.func(Income = all.incomes, Tax.Brackets = tax.brackets)

Now, let’s try made-up tax brackets.

# Calculating with any pre-specified tax brackets for all employees.
## Pre-specified tax brackets in a list.
tax.brackets <- list(c(0, 0.1), c(50000, 0.25), c(100000, 0.30), c(150000, 0.35), c(200000, 0.40), c(300000, 0.45), c(400000, 0.50))

tax.func(Income = all.incomes, Tax.Brackets = tax.brackets)
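
As an alternative to the explicit loops, the bracket lookup can also be vectorized. The sketch below is not the approach used in this workshop; it swaps in the base R function findInterval, and the object names lower, rate, and incomes are made up for illustration. An income that is not above any threshold (e.g., 0) is assumed to get a rate of 0, as in the conditional statement above.

```r
# 2017 thresholds and rates copied from the tax.brackets list above.
lower <- c(0, 9325, 37950, 91900, 191650, 416700, 418400)
rate <- c(0.10, 0.15, 0.25, 0.28, 0.33, 0.35, 0.396)
incomes <- c(0, 9325, 9326, 193763) # Made-up test incomes.

# left.open = TRUE gives intervals (lower, upper], matching the "<" test in the loop.
idx <- findInterval(incomes, lower, left.open = TRUE)
tax.rate <- c(0, rate)[idx + 1] # idx is 0 when income is not above any threshold.
tax <- incomes * tax.rate
data.frame(Income = incomes, Tax.Rate = tax.rate, Tax = tax, Net.Income = incomes - tax)
```

All incomes are handled in one call, so no counter variables are needed.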

7 Data

In this section I will show only the most used alternatives.

Importing

Data in R

Local Data

RData

csv

Excel

txt

Delimited

SAS

SPSS

Others

Data Downloading

Data Scraping

Exporting

8 Descriptive Statistics

9 Exploratory Data Analysis

10 Linear Regression

11 Simulations

12 Reproducible Research

The term reproducible research refers to the idea that the ultimate product of academic research can be recreated by an independent investigator using the full computational environment used to produce the results in the paper, such as the original code, original data, etc.
The goal of reproducible research is to tie specific instructions to data analysis and experimental data so that the study can be recreated, better understood and verified.
Reproducibility is important because it is the only thing that an investigator can guarantee about a study.
Although scholars are inclined to use the terms Replication and Reproducibility interchangeably, there is a distinction between them in the context of scientific verification.
- Replication is done by independent people using new data and even new code.
- Reproducibility is done by independent people using the same data, code and computational environment.
See CRAN Task View: Reproducible Research for all available R packages related to reproducible research techniques.
See here for reproducibility in economics.

Markdown

Markdown is a text-to-HTML conversion tool for web writers.
Markdown allows you to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML).
For more information about Markdown, please see its wiki page.

R Markdown

R Markdown allows you to create documents (PDF, beamer slides, markdown, and HTML) that serve as a neat record of your text and code together with its output (graphs, tables, etc.).
R Markdown is a wonderful tool for reproducible research.
See R Markdown from RStudio for the details and tutorial of R Markdown.
Interested users should see R Markdown Gallery for the range of outputs and formats you can create using R Markdown.
Finally, if you are still hungry for more information, see bookdown by Yihui Xie for preparing books with R Markdown.
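
To give a rough idea of the workflow, the sketch below writes a tiny R Markdown file and renders it to HTML. The file content is made up for illustration, and the rendering step assumes that the rmarkdown package and pandoc are available on your machine; otherwise it is skipped.

```r
# Write a tiny, made-up R Markdown file to a temporary location.
fence <- strrep("`", 3) # Three backticks, which delimit a code chunk.
rmd.file <- tempfile(fileext = ".Rmd")
writeLines(c(
    "---",
    "title: \"A minimal example\"",
    "output: html_document",
    "---",
    "",
    "Some text, followed by a code chunk.",
    "",
    paste0(fence, "{r}"),
    "summary(cars)",
    fence
), rmd.file)

# Render only if rmarkdown and pandoc are available on this machine.
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
    html.file <- rmarkdown::render(rmd.file, quiet = TRUE)
    file.exists(html.file)
}
```

In RStudio, the same result is usually obtained by opening the .Rmd file and clicking the Knit button.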

Knitr

To compile/render R Markdown files you need knitr, which is an engine for dynamic report generation with R.
For the details of knitr, see knitr page prepared by Yihui Xie.
R Markdown and knitr come pre-installed with RStudio so there is no need for further action.
If you are planning to use LaTeX to generate reports in PDF via R Markdown, it is better to install the MacTeX distribution for Mac and the MiKTeX distribution for PC.

R Session Info

It’s a good idea to end with some information about the packages you used, their versions, and even the versions of R and RStudio that you used.
The sessionInfo() function provides this information. Even better is to install the devtools package and use devtools::session_info().

R.version.string ## Returns the R version in a string.

#> [1] "R version 3.3.3 (2017-03-06)"

sessionInfo() ## From utils package.

#> R version 3.3.3 (2017-03-06)
#> Platform: x86_64-apple-darwin13.4.0 (64-bit)
#> Running under: OS X Mavericks 10.9.5
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] gvlma_1.0.0.2        psych_1.7.8          pastecs_1.3-18      
#>  [4] boot_1.3-20          gapminder_0.3.0      car_2.1-6           
#>  [7] leaflet_1.1.0        spdep_0.7-4          spData_0.2.6.7      
#> [10] Matrix_1.2-11        sp_1.2-5             zoo_1.8-0           
#> [13] NCmisc_1.1.5         magrittr_1.5         rvest_0.3.2         
#> [16] xml2_1.1.1           lubridate_1.7.1      dygraphs_1.1.1.4    
#> [19] plotly_4.7.1         ggplot2_2.2.1        DT_0.2              
#> [22] tibble_1.3.4         kableExtra_0.6.1     stargazer_5.2       
#> [25] xtable_1.8-2         stringr_1.2.0        XLConnect_0.2-13    
#> [28] XLConnectJars_0.2-13 ctv_0.8-3            knitr_1.17          
#> [31] rmarkdown_1.8        devtools_1.13.4      openssl_0.9.9       
#> [34] checkpoint_0.4.3    
#> 
#> loaded via a namespace (and not attached):
#>  [1] nlme_3.1-131        pbkrtest_0.4-7      gmodels_2.16.2     
#>  [4] httr_1.3.1          rprojroot_1.2       tools_3.3.3        
#>  [7] backports_1.1.1     R6_2.2.2            lazyeval_0.2.1     
#> [10] mgcv_1.8-17         colorspace_1.3-2    nnet_7.3-12        
#> [13] withr_2.1.0         mnormt_1.5-5        quantreg_5.34      
#> [16] SparseM_1.77        expm_0.999-2        scales_0.5.0       
#> [19] readr_1.1.1         digest_0.6.12       foreign_0.8-69     
#> [22] minqa_1.2.4         pkgconfig_2.0.1     htmltools_0.3.6    
#> [25] lme4_1.1-14         highr_0.6           htmlwidgets_0.9    
#> [28] rlang_0.1.4         rstudioapi_0.7      shiny_1.0.5        
#> [31] bindr_0.1           jsonlite_1.5        crosstalk_1.0.0    
#> [34] gtools_3.5.0        dplyr_0.7.4         Rcpp_0.12.14       
#> [37] munsell_0.4.3       stringi_1.1.6       yaml_2.1.16        
#> [40] MASS_7.3-47         plyr_1.8.4          grid_3.3.3         
#> [43] parallel_3.3.3      gdata_2.18.0        deldir_0.1-14      
#> [46] lattice_0.20-34     splines_3.3.3       hms_0.4.0          
#> [49] LearnBayes_2.15     glue_1.2.0          evaluate_0.10.1    
#> [52] data.table_1.10.4-3 nloptr_1.0.4        httpuv_1.3.5       
#> [55] MatrixModels_0.4-1  gtable_0.2.0        purrr_0.2.4        
#> [58] tidyr_0.7.2         assertthat_0.2.0    mime_0.5           
#> [61] coda_0.19-1         viridisLite_0.2.0   rJava_0.9-9        
#> [64] proftools_0.99-2    memoise_1.1.0       bindrcpp_0.2

devtools::session_info() ## From devtools package.

#> Session info -------------------------------------------------------------

#>  setting  value                       
#>  version  R version 3.3.3 (2017-03-06)
#>  system   x86_64, darwin13.4.0        
#>  ui       RStudio (1.1.419)           
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  tz       America/New_York            
#>  date     2018-02-06

#> Packages -----------------------------------------------------------------

#>  package       * version  date       source        
#>  assertthat      0.2.0    2017-04-11 CRAN (R 3.3.2)
#>  backports       1.1.1    2017-09-25 CRAN (R 3.3.2)
#>  base          * 3.3.3    2017-03-07 local         
#>  bindr           0.1      2016-11-13 CRAN (R 3.3.2)
#>  bindrcpp        0.2      2017-06-17 CRAN (R 3.3.2)
#>  boot          * 1.3-20   2017-07-30 CRAN (R 3.3.2)
#>  car           * 2.1-6    2017-11-19 CRAN (R 3.3.2)
#>  checkpoint    * 0.4.3    2017-12-19 CRAN (R 3.3.2)
#>  coda            0.19-1   2016-12-08 CRAN (R 3.3.2)
#>  colorspace      1.3-2    2016-12-14 CRAN (R 3.3.2)
#>  crosstalk       1.0.0    2016-12-21 CRAN (R 3.3.2)
#>  ctv           * 0.8-3    2017-10-07 CRAN (R 3.3.2)
#>  data.table      1.10.4-3 2017-10-27 CRAN (R 3.3.2)
#>  datasets      * 3.3.3    2017-03-07 local         
#>  deldir          0.1-14   2017-04-22 CRAN (R 3.3.2)
#>  devtools      * 1.13.4   2017-11-09 CRAN (R 3.3.2)
#>  digest          0.6.12   2017-01-27 CRAN (R 3.3.2)
#>  dplyr           0.7.4    2017-09-28 CRAN (R 3.3.2)
#>  DT            * 0.2      2016-08-09 CRAN (R 3.3.0)
#>  dygraphs      * 1.1.1.4  2017-01-04 CRAN (R 3.3.2)
#>  evaluate        0.10.1   2017-06-24 CRAN (R 3.3.2)
#>  expm            0.999-2  2017-03-29 CRAN (R 3.3.2)
#>  foreign         0.8-69   2017-06-21 CRAN (R 3.3.2)
#>  gapminder     * 0.3.0    2017-10-31 CRAN (R 3.3.2)
#>  gdata           2.18.0   2017-06-06 CRAN (R 3.3.2)
#>  ggplot2       * 2.2.1    2016-12-30 CRAN (R 3.3.2)
#>  glue            1.2.0    2017-10-29 CRAN (R 3.3.2)
#>  gmodels         2.16.2   2015-07-22 CRAN (R 3.3.0)
#>  graphics      * 3.3.3    2017-03-07 local         
#>  grDevices     * 3.3.3    2017-03-07 local         
#>  grid            3.3.3    2017-03-07 local         
#>  gtable          0.2.0    2016-02-26 CRAN (R 3.3.0)
#>  gtools          3.5.0    2015-05-29 CRAN (R 3.3.0)
#>  gvlma         * 1.0.0.2  2014-01-21 CRAN (R 3.3.0)
#>  highr           0.6      2016-05-09 CRAN (R 3.3.0)
#>  hms             0.4.0    2017-11-23 CRAN (R 3.3.2)
#>  htmltools       0.3.6    2017-04-28 CRAN (R 3.3.2)
#>  htmlwidgets     0.9      2017-07-10 CRAN (R 3.3.2)
#>  httpuv          1.3.5    2017-07-04 CRAN (R 3.3.2)
#>  httr            1.3.1    2017-08-20 CRAN (R 3.3.2)
#>  jsonlite        1.5      2017-06-01 CRAN (R 3.3.2)
#>  kableExtra    * 0.6.1    2017-11-01 CRAN (R 3.3.2)
#>  knitr         * 1.17     2017-08-10 CRAN (R 3.3.2)
#>  lattice         0.20-34  2016-09-06 CRAN (R 3.3.3)
#>  lazyeval        0.2.1    2017-10-29 CRAN (R 3.3.2)
#>  leaflet       * 1.1.0    2017-02-21 CRAN (R 3.3.2)
#>  LearnBayes      2.15     2014-05-29 CRAN (R 3.3.0)
#>  lme4            1.1-14   2017-09-27 CRAN (R 3.3.2)
#>  lubridate     * 1.7.1    2017-11-03 CRAN (R 3.3.2)
#>  magrittr      * 1.5      2014-11-22 CRAN (R 3.3.0)
#>  MASS            7.3-47   2017-04-21 CRAN (R 3.3.2)
#>  Matrix        * 1.2-11   2017-08-16 CRAN (R 3.3.2)
#>  MatrixModels    0.4-1    2015-08-22 CRAN (R 3.3.0)
#>  memoise         1.1.0    2017-04-21 CRAN (R 3.3.2)
#>  methods       * 3.3.3    2017-03-07 local         
#>  mgcv            1.8-17   2017-02-08 CRAN (R 3.3.3)
#>  mime            0.5      2016-07-07 CRAN (R 3.3.0)
#>  minqa           1.2.4    2014-10-09 CRAN (R 3.3.0)
#>  mnormt          1.5-5    2016-10-15 CRAN (R 3.3.0)
#>  munsell         0.4.3    2016-02-13 CRAN (R 3.3.0)
#>  NCmisc        * 1.1.5    2017-01-03 CRAN (R 3.3.2)
#>  nlme            3.1-131  2017-02-06 CRAN (R 3.3.3)
#>  nloptr          1.0.4    2014-08-04 CRAN (R 3.3.0)
#>  nnet            7.3-12   2016-02-02 CRAN (R 3.3.3)
#>  openssl       * 0.9.9    2017-11-10 CRAN (R 3.3.2)
#>  parallel        3.3.3    2017-03-07 local         
#>  pastecs       * 1.3-18   2014-03-02 CRAN (R 3.3.0)
#>  pbkrtest        0.4-7    2017-03-15 CRAN (R 3.3.2)
#>  pkgconfig       2.0.1    2017-03-21 CRAN (R 3.3.2)
#>  plotly        * 4.7.1    2017-07-29 CRAN (R 3.3.2)
#>  plyr            1.8.4    2016-06-08 CRAN (R 3.3.0)
#>  proftools       0.99-2   2016-01-13 CRAN (R 3.3.0)
#>  psych         * 1.7.8    2017-09-09 CRAN (R 3.3.3)
#>  purrr           0.2.4    2017-10-18 CRAN (R 3.3.2)
#>  quantreg        5.34     2017-10-25 CRAN (R 3.3.2)
#>  R6              2.2.2    2017-06-17 CRAN (R 3.3.2)
#>  Rcpp            0.12.14  2017-11-23 CRAN (R 3.3.2)
#>  readr           1.1.1    2017-05-16 CRAN (R 3.3.2)
#>  rJava           0.9-9    2017-10-12 CRAN (R 3.3.2)
#>  rlang           0.1.4    2017-11-05 CRAN (R 3.3.2)
#>  rmarkdown     * 1.8      2017-11-17 CRAN (R 3.3.2)
#>  rprojroot       1.2      2017-01-16 CRAN (R 3.3.2)
#>  rstudioapi      0.7      2017-09-07 CRAN (R 3.3.2)
#>  rvest         * 0.3.2    2016-06-17 CRAN (R 3.3.0)
#>  scales          0.5.0    2017-08-24 CRAN (R 3.3.2)
#>  shiny           1.0.5    2017-08-23 CRAN (R 3.3.2)
#>  sp            * 1.2-5    2017-06-29 CRAN (R 3.3.2)
#>  SparseM         1.77     2017-04-23 CRAN (R 3.3.2)
#>  spData        * 0.2.6.7  2017-11-28 CRAN (R 3.3.2)
#>  spdep         * 0.7-4    2017-11-22 CRAN (R 3.3.2)
#>  splines         3.3.3    2017-03-07 local         
#>  stargazer     * 5.2      2015-07-14 CRAN (R 3.3.0)
#>  stats         * 3.3.3    2017-03-07 local         
#>  stringi         1.1.6    2017-11-17 CRAN (R 3.3.2)
#>  stringr       * 1.2.0    2017-02-18 CRAN (R 3.3.2)
#>  tibble        * 1.3.4    2017-08-22 CRAN (R 3.3.2)
#>  tidyr           0.7.2    2017-10-16 CRAN (R 3.3.2)
#>  tools           3.3.3    2017-03-07 local         
#>  utils         * 3.3.3    2017-03-07 local         
#>  viridisLite     0.2.0    2017-03-24 CRAN (R 3.3.2)
#>  withr           2.1.0    2017-11-01 CRAN (R 3.3.2)
#>  XLConnect     * 0.2-13   2017-05-14 CRAN (R 3.3.2)
#>  XLConnectJars * 0.2-13   2017-05-14 CRAN (R 3.3.2)
#>  xml2          * 1.1.1    2017-01-24 CRAN (R 3.3.2)
#>  xtable        * 1.8-2    2016-02-05 CRAN (R 3.3.0)
#>  yaml            2.1.16   2017-12-12 CRAN (R 3.3.2)
#>  zoo           * 1.8-0    2017-04-12 CRAN (R 3.3.2)
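A comparable listing can be produced for your own session. The following is a minimal sketch that uses only base R (`sessionInfo()` and `packageVersion()`) rather than whichever function generated the table above; the object names `attached`, `loaded`, `base.pkgs`, and `pkg.table` are illustrative assumptions. The `attached` column plays the role of the `*` marker in the table.

```r
# Sketch (base R only): list the packages in the current session with versions.
info <- sessionInfo()

attached  <- names(info$otherPkgs)   # packages attached with library(); may be NULL
loaded    <- names(info$loadedOnly)  # namespaces loaded but not attached; may be NULL
base.pkgs <- info$basePkgs           # base packages attached by default

pkgs <- sort(unique(c(attached, loaded, base.pkgs)))

# Look up the installed version of each package as a character string.
versions <- vapply(pkgs,
                   function(p) as.character(packageVersion(p)),
                   character(1))

pkg.table <- data.frame(package  = pkgs,
                        version  = versions,
                        attached = pkgs %in% c(attached, base.pkgs),
                        row.names = NULL,
                        stringsAsFactors = FALSE)
print(pkg.table)
```

The versions and packages printed will differ from the table above, since they reflect whatever is installed and loaded on your machine.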

13 Used R Functions

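This section lists, in alphabetical order, every R function used in this document: base R functions, package functions (e.g., kable, session_info), and the user-defined functions created in the examples (e.g., my.square, tax.func).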

#>   [1] "any"             "apply"           "args"           
#>   [4] "array"           "as.array"        "as.character"   
#>   [7] "as.complex"      "as.data.frame"   "as.Date"        
#>  [10] "as.factor"       "as.list"         "as.logical"     
#>  [13] "as.matrix"       "as.numeric"      "as.POSIXct"     
#>  [16] "as.POSIXlt"      "assign"          "attr"           
#>  [19] "attributes"      "body"            "c"              
#>  [22] "cbind"           "citation"        "class"          
#>  [25] "colMeans"        "colnames"        "colSums"        
#>  [28] "complete.cases"  "conflicts"       "crossprod"      
#>  [31] "cube.func"       "cut"             "data.frame"     
#>  [34] "date"            "det"             "diag"           
#>  [37] "dim"             "dimnames"        "eigen"          
#>  [40] "environment"     "exp"             "factor"         
#>  [43] "file.exists"     "formals"         "format"         
#>  [46] "func.1"          "func.2"          "function.syntax"
#>  [49] "get"             "getAnywhere"     "getMethod"      
#>  [52] "gl"              "head"            "ifelse"         
#>  [55] "is.array"        "is.character"    "is.complex"     
#>  [58] "is.data.frame"   "is.double"       "is.factor"      
#>  [61] "is.function"     "is.integer"      "is.list"        
#>  [64] "is.logical"      "is.matrix"       "is.na"          
#>  [67] "is.nan"          "is.numeric"      "julian"         
#>  [70] "kable"           "kronecker"       "lapply"         
#>  [73] "length"          "levels"          "list"           
#>  [76] "log"             "lower.tri"       "ls"             
#>  [79] "make.power"      "mapply"          "matrix"         
#>  [82] "max"             "mean"            "message"        
#>  [85] "min"             "months"          "my.cube"        
#>  [88] "my.cube.func"    "my.function"     "my.square"      
#>  [91] "my.variance"     "myfunction"      "names"          
#>  [94] "ncol"            "noise"           "nrow"           
#>  [97] "paste"           "paste0"          "print"          
#> [100] "rbind"           "rbinom"          "readRDS"        
#> [103] "rep"             "return"          "rnorm"          
#> [106] "round"           "rowMeans"        "rownames"       
#> [109] "rowSums"         "runif"           "sample"         
#> [112] "sapply"          "sd"              "seq"            
#> [115] "seq_along"       "session_info"    "sessionInfo"    
#> [118] "setdiff"         "sin"             "solve"          
#> [121] "sort"            "split"           "sqrt"           
#> [124] "square.func"     "stop"            "str"            
#> [127] "strftime"        "strptime"        "structure"      
#> [130] "sum"             "sum.of.squares"  "sum.square.cube"
#> [133] "Sys.Date"        "Sys.time"        "t"              
#> [136] "table"           "tail"            "tapply"         
#> [139] "tax.func"        "unclass"         "unique"         
#> [142] "unlist"          "unname"          "upper.tri"      
#> [145] "var"             "vector"          "warning"        
#> [148] "weekdays"        "which"
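A list like the one above can be generated programmatically. The sketch below shows one possible approach, not necessarily the one used for this document: it parses R code and collects every name that appears in a function call. The snippet stored in `example.code` is made up for illustration and stands in for the workshop source.

```r
# Sketch: collect the names of all functions called in a piece of R code.
# `example.code` is a hypothetical snippet, not taken from the workshop files.
example.code <- "
my.square <- function(x) x^2
y <- sapply(c(1, 2, 3), my.square)
z <- my.square(4)
print(round(mean(y), 2))
"

# keep.source = TRUE is needed so that the parse data are recorded.
parsed <- parse(text = example.code, keep.source = TRUE)
tokens <- utils::getParseData(parsed)

# The token type SYMBOL_FUNCTION_CALL marks a name used as the function in a call.
called <- tokens$text[tokens$token == "SYMBOL_FUNCTION_CALL"]

used.functions <- sort(unique(called))
print(used.functions)
```

Because this approach looks only at call sites, it picks up user-defined functions such as `my.square` alongside base functions such as `mean`, which is why names of both kinds appear together in the list above.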

R mini BootCamp

Introductory R Workshop

Omer Kara

Tuesday, February 06, 2018
