R Cheat Sheet: The One You Should Keep Handy

Last updated:
12th Jun, 2023

Introduction

R has grown from a language built for statistical analysis into a capable all-round tool. Its user base has grown over the past few years as well, and it is now used by a host of programmers, scholars, and practitioners. To make the most of any programming language, knowing how to get help is essential, because errors are bound to happen.

So, alongside a knowledge of syntax, knowing how to access the R help files and find help from other sources is critical for success as an R programmer. This is where an R cheat sheet comes in handy: it collects the vital functions and their calls in one place for easy reference.


Getting help with the programming language R

Even the best introductory books on R are not enough on their own. Sooner or later you need to consult the R help files. These help files document in detail how to use every built-in function, and each help page also includes code examples that show the function in use.

To look something up in the R help files, use one of the functions listed below:

1. ?: A single question mark displays the help page for the function you name. For example, ?data.frame opens the page that documents the function data.frame().

2. ??: A double question mark searches the help files for a string. So, if you want to find help pages whose names or titles contain the word "list", run ??list.

3. RSiteSearch(): As its name suggests, this function runs an online search for the query passed to it. So, RSiteSearch("linear models") searches the R site search engine for the string "linear models" and shows the results in your web browser.
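
The three lookups above can be tried from the console. The sketch below uses help() and help.search(), which are the function forms behind ? and ??. Restricting the search to the base package is only an assumption made here to keep it quick, and the RSiteSearch() call is guarded because it needs a browser and network access.

```r
# help("data.frame") is the function form of ?data.frame.
# Printing the returned object is what displays the help page.
page <- help("data.frame")

# help.search() is the function form of ??.
# package = "base" narrows the search; drop it to search every installed package.
hits <- help.search("data frame", package = "base")
nrow(hits$matches)  # number of matching help entries

# RSiteSearch() opens a browser, so only run it in an interactive session.
if (interactive()) {
  RSiteSearch("linear models")
}
```

Typing page or hits at the console prints the object, which is what opens the help page or the list of search results.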

If the built-in documentation is not enough, there are add-on packages you can install to get more help with R. One of them is "sos", which is available from CRAN. It provides clear and concise functions for searching the help files available through the R site search.

Installing the package is straightforward. Run install.packages("sos") in the R console, then load the package with library("sos").

With "sos" loaded, you have access to the function findFn(). It takes a search string as its argument and returns the matching help pages, often hundreds of them. For example, running findFn("regression") in your R console opens a web page containing a lot of information.

The results link to functions that have the word regression in their name, as well as functions that mention regression in their help text.


Data Transformation Cheat Sheet

The Data Transformation in R cheat sheet covers how to use the dplyr package to manipulate tidy data. Tidy data is a data format where each variable is a column, each observation is a row, and each value is a cell. The dplyr package provides a set of functions that make it easy to perform common data manipulation tasks, such as filtering, arranging, selecting, mutating, summarizing, and joining data frames. The cheat sheet groups these functions as follows:

Single-Table Verbs

This section shows how to use the six core functions of dplyr to manipulate a single data frame. These functions are filter(), arrange(), select(), mutate(), summarise(), and group_by(). Each function takes a data frame as the first argument and returns a modified data frame as the output. For example, the filter() function can be used to subset rows based on a condition, the arrange() function can be used to sort rows by one or more variables, and the select() function can be used to choose or rename variables.
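
As a rough illustration of these verbs, here is a minimal sketch that assumes the dplyr package is installed. The scores data frame and its column names are made up for the example; group_by() is left for the grouped operations section.

```r
library(dplyr)

# Small made-up data frame used only for illustration.
scores <- data.frame(
  student = c("Ana", "Ben", "Cy", "Dee"),
  class   = c("A", "A", "B", "B"),
  score   = c(82, 67, 91, 74)
)

passed  <- filter(scores, score >= 70)                  # keep rows meeting a condition
ordered <- arrange(scores, desc(score))                 # sort rows, highest score first
narrow  <- select(scores, name = student, score)        # choose columns, renaming one
scaled  <- mutate(scores, pct = score / 100)            # add a derived column
overall <- summarise(scores, mean_score = mean(score))  # collapse to a single row
```

Each call returns a new data frame and leaves scores itself unchanged, which is why the results are assigned to new names.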

Two-Table Verbs

This section shows how to use the four join functions of dplyr to combine two data frames based on a common variable. These functions are inner_join(), left_join(), right_join(), and full_join(). Each function takes two data frames as the first two arguments and returns a joined data frame as the output. For example, the inner_join() function can be used to keep only the rows that match in both data frames. The left_join() function can be used to keep all the rows from the first data frame and add columns from the second data frame, and the full_join() function can be used to keep all the rows from both data frames and fill in missing values with NA.
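
The difference between the joins is easiest to see on two tiny tables. The sketch below assumes dplyr is installed; the people and orders tables and their shared id column are invented for illustration.

```r
library(dplyr)

# Two made-up tables that share the key column "id".
people <- data.frame(id = c(1, 2, 3), name = c("Ana", "Ben", "Cy"))
orders <- data.frame(id = c(2, 3, 4), amount = c(20, 35, 50))

both     <- inner_join(people, orders, by = "id")  # ids 2 and 3 only
left     <- left_join(people, orders, by = "id")   # all people; amount is NA for id 1
right    <- right_join(people, orders, by = "id")  # all orders; name is NA for id 4
everyone <- full_join(people, orders, by = "id")   # ids 1 to 4, NA where unmatched
```

Naming the key with by = "id" is optional when the tables share a column name, but stating it makes the join explicit.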

Grouped Mutates & Filters

This section shows how to use the group_by() function in combination with other dplyr functions to perform operations on grouped data. The group_by() function groups a data frame by one or more variables and returns a grouped data frame. A grouped data frame looks like a regular one, but the dplyr verbs applied to it work on each group separately. For example, the summarise() function can be used to calculate summary statistics for each group, the mutate() function can be used to create new variables based on group values, and the filter() function can be used to subset rows using a condition that is evaluated within each group.
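
A short sketch of grouped operations, again assuming dplyr is installed and using a made-up scores table:

```r
library(dplyr)

scores <- data.frame(
  class = c("A", "A", "B", "B"),
  score = c(82, 67, 91, 74)
)

by_class <- group_by(scores, class)

# One row per class, holding that class's mean score.
class_means <- summarise(by_class, mean_score = mean(score))

# mean(score) is computed within each class, not over the whole table.
centered <- mutate(by_class, diff_from_mean = score - mean(score))

# Keep the best-scoring row of each class.
top_each <- filter(by_class, score == max(score))
```

Calling ungroup() on a result removes the grouping when later steps should treat the table as a whole again.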

Other Useful Functions

This section shows how to use other useful functions from dplyr or related packages to help with data transformation. These functions include rename(), relocate(), slice(), pull(), across(), if_else(), case_when(), and more. Each function has a different purpose and syntax, but they all work with tidy data and follow the same logic as dplyr functions. For example, the rename() function can be used to rename variables in a data frame, the relocate() function can be used to move variables to different positions in a data frame, and the slice() function can be used to select rows by their position.
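
The helpers named above can be sketched on one small table. This assumes a dplyr version of 1.0.0 or later (for relocate() and across()); the data and column names are made up.

```r
library(dplyr)

scores <- data.frame(
  student = c("Ana", "Ben", "Cy"),
  math    = c(82, 67, 91),
  art     = c(70, 88, 55)
)

renamed   <- rename(scores, pupil = student)        # new_name = old_name
moved     <- relocate(scores, art, .before = math)  # change column order
first_two <- slice(scores, 1:2)                     # rows by position
math_vec  <- pull(scores, math)                     # one column as a plain vector

# across() applies the same function to several columns at once.
halved <- mutate(scores, across(c(math, art), ~ .x / 2))

# if_else() handles two outcomes; case_when() handles several, checked in order.
graded <- mutate(
  scores,
  passed = if_else(math >= 70, "yes", "no"),
  band   = case_when(
    math >= 90 ~ "high",
    math >= 70 ~ "mid",
    TRUE       ~ "low"
  )
)
```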

How to import Data into R

The following table lists functions that come in handy when you want to import data into R:

| Function | What It Does | Example |
| --- | --- | --- |
| read.table() | Reads data laid out in columns separated by a delimiter, usually a comma or a tab. You can specify the separator yourself, along with other arguments that describe the data you want R to read. | read.table(file="myfile", sep="\t", header=FALSE) |
| read.csv() | A simplified version of read.table() with defaults preset for comma-separated values (CSV) files. CSV files are plain-text tables, commonly exported from spreadsheet programs such as MS Excel. | read.csv(file="myfile") |
| read.csv2() | A variant of read.csv() preset for files where the separator is a semicolon and the comma serves as the decimal point. | read.csv2(file="myfile", header=FALSE) |
| read.delim() | Reads delimited files. The default separator is a tab. | read.delim(file="myfile", header=TRUE) |
| scan() | Gives you finer control over how the data is read, which is useful when the data in question is not tabular. | scan("myfile", skip=1, nmax=10) |
| readLines() | Reads a text file line by line and returns the lines as a character vector. | readLines("myfile") |
| read.fwf() | Reads data in fixed-width format, that is, data where each column has a fixed number of characters. | read.fwf("myfile", widths=c(1,2,3)) |
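
To try several of these readers without needing a data file of your own, the sketch below first writes small temporary files and then reads them back. The file contents and variable names are made up for illustration.

```r
# Write a small tab-separated file to a temporary location.
tab_file <- tempfile(fileext = ".txt")
writeLines(c("id\tname\tscore", "1\tAna\t82", "2\tBen\t67"), tab_file)

# read.table() with an explicit separator and a header row.
dat_tab <- read.table(file = tab_file, sep = "\t", header = TRUE)

# read.delim() assumes tab separation and a header row by default.
dat_delim <- read.delim(file = tab_file)

# The same data as a comma-separated file, read with read.csv().
csv_file <- tempfile(fileext = ".csv")
write.csv(dat_tab, csv_file, row.names = FALSE)
dat_csv <- read.csv(file = csv_file)

# readLines() returns the raw lines as a character vector.
raw_lines <- readLines(tab_file)

# scan() reads non-tabular values; here, numbers separated by spaces.
num_file <- tempfile()
writeLines("3 1 4 1 5", num_file)
nums <- scan(num_file, quiet = TRUE)
```

All three table readers return the same two-row data frame here; they differ only in their default separator and header settings.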

The foreign package adds functions for reading files written by other statistical software. Once you have loaded it with library(foreign), you gain access to the functions listed below, along with the purpose each one serves:


| Function | What It Does | Example |
| --- | --- | --- |
| read.spss() | Takes the name of an SPSS file as its argument and reads the file into R. | read.spss("myfile") |
| read.dta() | Takes the name of a file in Stata binary format and reads it into R. | read.dta("myfile") |
| read.xport() | Takes the name of a SAS export file and reads the file into R. | read.xport("myfile") |


Different data types and the basic manipulation of the tables

1. Three data types matter most when you are programming in R: numeric, character, and factor. You can check whether a value has a given type, or convert it to that type, with functions such as is.factor() and as.factor(), respectively.

2. If you import a table in which a variable contains one or more character entries, versions of R before 4.0.0 convert that variable to a factor by default; from R 4.0.0 onward it stays a character vector unless you set stringsAsFactors = TRUE. Either way, you can still force the data to numeric with the command as.numeric(as.character(dat1$VAR1)).

3. The command names(dat1) = c("ID", "X", "Y", "Z") renames the variables in your dataset. Keep in mind that the vector length should match the number of variables you have; otherwise, you will run into an error.

4. The command fix(dat2) opens your data in a spreadsheet-style editor, where you can edit a cell by double-clicking it.

5. If your table contains only numeric values, you can transpose it. Use dat2 = t(dat1), and dat2 will contain the transpose of dat1 (all the rows turned into columns). Note that t() returns a matrix rather than a data frame.
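
The conversions in the list above can be sketched as follows. The dat1 table is made up, and stringsAsFactors = TRUE is set explicitly so that the example behaves the same on old and new versions of R; fix() is left out because it opens an interactive editor.

```r
# Made-up table: VAR1 holds numbers that were stored as text.
dat1 <- data.frame(
  a    = 1:3,
  VAR1 = c("10", "20", "30"),
  b    = c(0.5, 1.5, 2.5),
  d    = c(7, 8, 9),
  stringsAsFactors = TRUE  # mimic the default of R versions before 4.0.0
)

is.factor(dat1$VAR1)  # TRUE

# Go factor -> character -> numeric; as.numeric() alone would return level codes.
dat1$VAR1 <- as.numeric(as.character(dat1$VAR1))

# Rename all four columns; the vector length must match the column count.
names(dat1) <- c("ID", "X", "Y", "Z")

# Transpose; the result is a matrix with the old column names as row names.
dat2 <- t(dat1)
dim(dat2)  # 4 rows, 3 columns
```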


Tips on how to create random data and how to do random sampling

1. The function rnorm(10) draws ten random samples from a normal distribution with a mean of zero and a standard deviation of 1.

2. The function runif(10) draws ten random samples from a uniform distribution between zero and one.

3. The call round(rnorm(10)*3+15) draws ten random samples from a normal distribution with a mean of 15 and a standard deviation of 3, and round() then rounds them to whole numbers.

4. The call round(runif(10)*5+15) returns random integers between 15 and 20. The underlying values are uniform, but after rounding the endpoints 15 and 20 come up half as often as the values in between. If every integer should be equally likely, use sample(15:20, 10, replace=TRUE) instead.

5. The call sample(c("A", "B", "C"), 10, replace=TRUE) draws a random sample of size ten, with replacement, from the vector passed as its first argument.
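
Here is a minimal sketch of the calls above. The set.seed() call and the variable names are additions for the example: fixing the seed makes the draws reproducible from run to run.

```r
set.seed(42)  # fix the seed so the draws are reproducible

z <- rnorm(10)   # standard normal: mean 0, sd 1
u <- runif(10)   # uniform between 0 and 1

# Same idea as round(rnorm(10) * 3 + 15), with the parameters named.
rounded_normal <- round(rnorm(10, mean = 15, sd = 3))

# Integers from 15 to 20, each equally likely.
ints <- sample(15:20, 10, replace = TRUE)

# Ten draws with replacement from a character vector.
letters_drawn <- sample(c("A", "B", "C"), 10, replace = TRUE)
```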


Tips on how to transform data that is inside the data table

1. The call dat2=transform(dat1, VAR1=VAR1*0.4) multiplies the values stored in VAR1 by 0.4 and assigns the result back to VAR1.

2. transform() can also create variables that depend on existing ones. The call dat2=transform(dat1, VAR2=VAR1*2) creates a new variable named VAR2, which contains the values of VAR1 multiplied by two.

3. You can also use transform() to modify values only where a condition holds. The call dat2=transform(dat1, VAR1=ifelse(VAR3=="Site 1", VAR1*0.4, VAR1)) multiplies VAR1 by 0.4 for the entries recorded at Site 1 and leaves VAR1 unchanged everywhere else.
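
The three transform() calls can be run on a small made-up table to see what each one changes. The values in dat1 are invented, and each result is stored under its own name so the outputs can be compared side by side.

```r
dat1 <- data.frame(
  VAR1 = c(10, 20, 30),
  VAR3 = c("Site 1", "Site 2", "Site 1")
)

# Overwrite VAR1 with a rescaled version.
scaled <- transform(dat1, VAR1 = VAR1 * 0.4)

# Add a new column VAR2 while keeping VAR1 as it is.
with_new <- transform(dat1, VAR2 = VAR1 * 2)

# Rescale VAR1 only in the rows where VAR3 is "Site 1".
site_only <- transform(dat1, VAR1 = ifelse(VAR3 == "Site 1", VAR1 * 0.4, VAR1))
```

In site_only, the middle row keeps its original value of 20 because it belongs to Site 2.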


Conclusion

The world of programming has seen a boom of languages over the past few years, and many of them focus their attention on one aspect of computing. Languages like R take a strongly statistical and data science-centric approach, mainly because of their built-in features.

While working in any programming language, having every command at your fingertips is not an easy task. This is where the R cheat sheet comes to the rescue. One thing to always remember is that the best R cheat sheet is the one that you create.

Profile

Rohit Sharma

Blog Author
Rohit Sharma is the Program Director for the UpGrad-IIIT Bangalore, PG Diploma Data Analytics Program.

Frequently Asked Questions (FAQs)

1. What is the meaning of c in the R programming language?

The c() function stands for "combine" in the R programming language. It combines the values passed to it into a single vector. A common use is inside square brackets to extract data from a table: a vector of rows before the comma extracts rows, a vector of columns after the comma extracts columns, and supplying both extracts specific rows and columns.

Here, you supply the row and column values from the dataset you are working with. Other than that, you can use the c() function to combine two different vectors.
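
A minimal sketch of both uses of c(), with a made-up data frame named dat:

```r
v <- c(1, 2, 3)
w <- c(v, c(4, 5))   # combine two vectors into one

dat <- data.frame(a = 1:4, b = 5:8, c = 9:12)

rows_1_3  <- dat[c(1, 3), ]          # rows 1 and 3, all columns
cols_a_c  <- dat[, c("a", "c")]      # all rows, columns a and c
sub_block <- dat[c(1, 3), c(1, 3)]   # rows 1 and 3 of columns 1 and 3
```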

2. What are R functions?

Functions are self-contained modules of code that perform a specific task. Usually, a function takes in a data structure such as a value, a data frame, or a vector, processes it, and returns a result. Arguments are passed to a function in parentheses to specify the requirements.

There are two types of functions in R: basic and user-defined. The basic functions are the ones that are already available in the R programming language and in the packages or libraries available for it. Each one serves a different purpose and completes a specific task. Some of the basic functions in R are sqrt(), round(), getwd(), etc. Since it is not possible to complete every action with basic functions, you can write user-defined functions of your own to perform customized tasks. These are especially useful when you have to perform the same action multiple times.
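
To make the distinction concrete, the sketch below calls two built-in functions and then defines one of its own. The function rescale01() is a hypothetical example written for this article, not part of base R.

```r
root    <- sqrt(16)            # built-in: square root
rounded <- round(3.14159, 2)   # built-in: round to two decimal places

# User-defined: rescale a numeric vector to the range 0 to 1.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)        # smallest and largest value
  (x - rng[1]) / (rng[2] - rng[1])
}

rescaled <- rescale01(c(2, 4, 6))   # 0.0 0.5 1.0
```

Once defined, rescale01() can be reused on any numeric vector instead of repeating the same arithmetic each time.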

3. What are some of the key features of the R programming language?

There are plenty of ways R can help out data analysts and data scientists. Some of its key features help it to stand out from the general crowd of statistical languages. The key features are strong graphical capabilities, the ability to perform complex statistical calculations, running code without the need of any compiler, data wrangling, data handling and storage capacities, and the ability to generate reports in the desired formats.
