As a programmer, you know that programs need variables to store data. Each variable refers to a reserved location in memory, so creating a variable means reserving some space in memory. Data structures organize this stored data so that a computer can use it efficiently.
Unlike popular programming languages such as C and Java, R does not require you to declare a variable's data type. Instead, variables are assigned R-objects (data structures), and the type of the assigned object becomes the type of the variable. R offers several data structures, but first, let's understand what data structures are!
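The short sketch below shows what "no declaration" means in practice. The variable name x is chosen purely for illustration; class() reports the class of whatever object the name currently holds.

```r
# No type declaration: the assigned object decides the type
x <- 13
print(class(x))   # "numeric"

# The same name can later hold a different kind of object
x <- "thirteen"
print(class(x))   # "character"
```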
What are Data Structures?
In R, a data structure is a container that holds multiple values. In R programming, you rarely work with a single value in isolation; it is usually more practical to group multiple numbers, words, or values of different types together. This is where data structures come into the picture: they group values together so that you can work with large amounts of data at once.
Data structures are built from data types, which define the kind of data stored in each value. For instance, the number 13 has a numeric data type, while “thirteen” has a character data type, also called a string.
Now that you’ve got a handle on this, let’s look at the different types of data structures.
Types of Data Structures
In order to make data analysis and operations easy and efficient, there are five major types of data structures in R programming.
Let’s take a look at each of them in detail.
- Vectors
Vectors are the most basic data structure in R. They come in two kinds: atomic vectors, which group multiple values of the same data type, and lists, which can group values of different data types. All vectors share three common properties:
- Type (what it is), returned by typeof()
- Length (number of elements), returned by length()
- Attributes (additional arbitrary metadata), returned by attributes()
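As a sketch, the three properties can be inspected on a small named vector. The vector scores and its names are hypothetical examples; the names are stored as an attribute, which is why attributes() returns them.

```r
# A named numeric vector (the names are stored as an attribute)
scores <- c(math = 90, art = 75, music = 82)

print(typeof(scores))      # "double": the type
print(length(scores))      # 3: the number of elements
print(attributes(scores))  # $names: "math" "art" "music"
```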
Atomic vectors hold values of a single data type, while lists can group different data types. There are four common types of atomic vectors (R also has two rarely used types, complex and raw):
- Numeric Data Type
- Integer Data Type
- Character Data Type
- Logical Data Type
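Here is one minimal example of each of the four types. The variable names are illustrative only. Note that typeof() reports numeric vectors as "double", and that integer literals need the L suffix.

```r
num <- c(1.5, 2, 3.25)       # numeric (stored as type "double")
int <- c(1L, 2L, 3L)         # integer: the L suffix marks integer literals
chr <- c("a", "b", "c")      # character
lgl <- c(TRUE, FALSE, TRUE)  # logical

# typeof() reports the internal type of each atomic vector
print(sapply(list(num, int, chr, lgl), typeof))
# "double" "integer" "character" "logical"
```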
You can create vectors using the c() function. For example, you can create a vector named ‘thisVector’ that contains all the numbers from 1 to 30.
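The vector described above can be created in one line. This is a sketch: the name thisVector comes from the text, and wrapping the sequence 1:30 in c() is optional, since 1:30 is already a vector. The evens vector is a hypothetical extra example.

```r
# 1:30 generates the integers from 1 to 30; c() combines values into a vector
thisVector <- c(1:30)

print(length(thisVector))  # 30
print(head(thisVector))    # 1 2 3 4 5 6

# c() can also combine individual values
evens <- c(2, 4, 6, 8)
print(evens)
```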
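To store character values in a vector, enclose each value in quotes (double quotes are conventional, though R also accepts single quotes).
Although c() accepts values of different types, an atomic vector cannot actually hold them: R converts every element to the most flexible type present, in the order logical, integer, numeric, character. Mixing numbers and strings, for example, turns every value into a character string, so it is best avoided.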
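A small sketch of both points: quoting character values, and the automatic conversion that happens when types are mixed. The vectors fruits and mixed are hypothetical examples.

```r
# Character values are enclosed in quotes
fruits <- c("apple", "banana", "cherry")
print(fruits)

# Mixing types forces every element to a common type
mixed <- c(1, "two", TRUE)
print(mixed)          # "1" "two" "TRUE"
print(typeof(mixed))  # "character"

# Without a string, numbers and logicals become numeric instead
print(c(1, TRUE, FALSE))  # 1 1 0
```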
As mentioned above, lists can contain elements of any type: strings, numbers, vectors, and even other lists. For example, you can create a list of 80 numbers, 30 words, and 42 vectors. Lists are created with the list() function.
Since lists can contain other lists, they are sometimes called recursive vectors. This is what sets them apart from atomic vectors.
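The sketch below builds a small list (the name myList and its contents are illustrative) holding a number, a string, a vector, and a nested list, then uses is.recursive() to show the difference from an atomic vector.

```r
# A list can mix a number, a string, a vector, and another list
myList <- list(42, "hello", c(1, 2, 3), list("nested", TRUE))

print(length(myList))             # 4 top-level elements
print(class(myList[[4]]))         # "list": a list inside a list
print(is.recursive(myList))       # TRUE
print(is.recursive(c(1, 2, 3)))   # FALSE: atomic vectors are not recursive
```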
- Factors
Simply put, a factor is a type of vector that can store only predefined values. Factors are primarily used to store categorical data, such as column values like “Male” and “Female” or “TRUE” and “FALSE”.
Internally, a factor stores integer codes, each mapped to a character label called a level. To create a factor, use the factor() function. Factors are most useful when a variable can take only a limited set of values and you know all of them in advance.
In R versions before 4.0.0, functions such as data.frame() and read.csv() automatically converted character vectors into factors. In those versions, you can pass stringsAsFactors = FALSE to suppress this and then convert selected character vectors to factors manually. Since R 4.0.0, stringsAsFactors defaults to FALSE.
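A minimal sketch of factors in action. The variables gender, size, and df are hypothetical. It shows the default alphabetical levels, the integer codes underneath, what happens to a value outside the predefined levels, and an explicit stringsAsFactors = TRUE so the result does not depend on the R version.

```r
# Create a factor; levels are sorted alphabetically by default
gender <- factor(c("Male", "Female", "Female", "Male"))
print(levels(gender))      # "Female" "Male"
print(as.integer(gender))  # 2 1 1 2: the underlying integer codes

# With explicit levels, values outside the predefined set become NA
size <- factor(c("small", "huge"), levels = c("small", "medium", "large"))
print(size)  # small <NA>

# stringsAsFactors = TRUE asks data.frame() to convert strings to factors
df <- data.frame(name = c("a", "b"), stringsAsFactors = TRUE)
print(is.factor(df$name))  # TRUE
```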
- Data Frames
This data structure represents data in tabular form to make data analysis easier. A data frame is a list of equal-length vectors, which gives it a two-dimensional structure: each column contains the values of one variable, and each row contains one value from each column.
Unlike matrices, data frames can store different data types in different columns, although each column holds a single type. However, every column must have the same number of elements. For example, if column 1 has 5 elements, column 2 must also have 5.
Data frames have some special characteristics:
- No column names should be left empty.
- Each row’s name must be unique.
- You can store numeric, factor, or character type data in a data frame.
- All columns must contain the same number of data elements.
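As an illustrative sketch, the data frame below (students and its columns are made-up data) has three equal-length columns of three different types. stringsAsFactors = FALSE is passed explicitly so the name column stays character in every R version.

```r
students <- data.frame(
  name   = c("Ana", "Ben", "Cara"),
  age    = c(21L, 23L, 22L),
  passed = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE  # keep names as character in every R version
)

print(dim(students))            # 3 3: rows, columns
print(sapply(students, class))  # each column has its own type

# Select the names of the rows where passed is TRUE
print(students[students$passed, "name"])  # "Ana" "Cara"
```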
Data imported into R with functions such as read.csv() and read.table() is stored as a data frame.
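To check this without needing a file on disk, the sketch below passes a small hypothetical CSV string through the text argument (which read.csv() forwards to read.table()).

```r
# The text argument supplies CSV data inline instead of reading a file
csv_text <- "id,city\n1,Paris\n2,Lima"
imported <- read.csv(text = csv_text)

print(is.data.frame(imported))  # TRUE
print(nrow(imported))           # 2
```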
- Matrices
The matrix data structure stands somewhere between vectors and data frames. Matrices are two-dimensional data sets that, like atomic vectors, can contain elements of only one data type. You can create a matrix using the matrix() function.
Syntax: matrix(data, nrow, ncol, byrow, dimnames)
- data = input elements as a vector
- nrow = number of rows
- ncol = number of columns
- byrow = if TRUE, fill the matrix row by row (the default, FALSE, fills it column by column)
- dimnames = a list of row and column names
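The sketch below uses every parameter listed above. The row and column names are arbitrary examples. The second matrix shows the default column-wise fill for comparison.

```r
m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))
print(m)
#    a b c
# r1 1 2 3
# r2 4 5 6

print(m["r2", "b"])  # 5: elements can be accessed by name

# The default byrow = FALSE fills column by column
print(matrix(1:6, nrow = 2))
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
```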
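A note on factors: even though factors look and behave like character vectors, they are in fact stored as integers. To convert a factor to strings, use as.character(). Some string functions, such as gsub() and grepl(), coerce factors to character automatically, but nchar() throws an error when given a factor.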
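A small sketch of these behaviours, using a hypothetical factor f. The nchar() call is wrapped in tryCatch() only so that the error can be observed without stopping the script.

```r
f <- factor(c("apple", "kiwi", "apple"))

print(typeof(f))        # "integer": factors are stored as integer codes
s <- as.character(f)    # explicit conversion to strings
print(nchar(s))         # 5 4 5

# grepl() coerces the factor to character on its own
print(grepl("app", f))  # TRUE FALSE TRUE

# nchar() refuses a factor; catch the error to show that it happens
failed <- tryCatch({ nchar(f); FALSE }, error = function(e) TRUE)
print(failed)           # TRUE
```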
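- Arrays
Arrays extend matrices to more than two dimensions; a matrix is simply the special case of an array with two dimensions. Like matrices, arrays hold elements of a single data type. Matrices are commonly used, while arrays with more than two dimensions are much rarer.
To create an array, use the array() function.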
Testing whether an object is a matrix or an array is simple: use the is.matrix() or is.array() function. Note that is.array() also returns TRUE for a matrix.
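As a sketch, the array a below (an arbitrary 2 x 3 x 4 example) can be thought of as four stacked 2 x 3 matrices, filled in column-major order. The two tests then behave as follows on a matrix and on a three-dimensional array.

```r
# A 2 x 3 x 4 array: four stacked 2 x 3 matrices
a <- array(1:24, dim = c(2, 3, 4))
print(dim(a))       # 2 3 4
print(a[1, 2, 3])   # 15, since elements are filled column-major

m <- matrix(1:6, nrow = 2)
print(c(is.matrix(m), is.array(m)))  # TRUE TRUE: a matrix is a 2-D array
print(c(is.matrix(a), is.array(a)))  # FALSE TRUE
```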
Here are some questions that you can try answering now that you’ve acquired sufficient knowledge about the data structures in R.
- What are the attributes of data frames?
- Can data frames contain 0 rows or columns?
- What are the different types of Atomic Vectors in R?
- What is the difference between Atomic Vectors and Lists?
- Create a 4 x 3 matrix in R.
Send your answers to us via email or write them in the comments below!
To use R effectively, you need a solid understanding of data types, data structures, and how they work. They are the basis of every operation in R. For instance, a common problem many programmers run into is unexpected object conversion, which a good knowledge of R objects helps you avoid. Note also that in R everything is an object, and every operation is carried out as a function call.
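To make "every operation is a function call" concrete, the sketch below calls the + and [ operators directly by their backtick-quoted names, which is ordinary R syntax.

```r
# Operators are functions and can be called with prefix syntax
print(`+`(2, 3))              # 5, same as 2 + 3
print(`[`(c(10, 20, 30), 2))  # 20, same as c(10, 20, 30)[2]

# Functions are objects too
print(class(sum))             # "function"
```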
Data structures in R can be classified in two ways. The first is by dimensionality: one-dimensional, two-dimensional, or n-dimensional. The second is by whether their elements are homogeneous or heterogeneous. All elements in a homogeneous structure must be of the same type, while a heterogeneous structure allows elements of different types. For example, atomic vectors (1-D), matrices (2-D), and arrays (n-D) are homogeneous, while lists (1-D) and data frames (2-D) are heterogeneous.
After learning the basics of data structures in R, you will find programming in R much easier. Data structures are the foundation of R, and the ones covered above (vectors, lists, factors, data frames, matrices, and arrays) are the most commonly used. Remember the characteristics of each type, and choose the one that fits the data you want to analyze and the operations you need to carry out.