R Data Frame
A data frame corresponds to what we commonly call a table.
A data frame is a data structure in the R language: a special kind of list whose elements are the columns of a two-dimensional table.
Each column in a data frame has a unique column name and all columns are of equal length. The data type within the same column must be consistent, but different columns can have different data types.
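The list-like nature described above can be checked directly. The following is a minimal sketch, assuming a small hypothetical data frame named df (the name and values are illustrative only):

```r
# Hypothetical two-column data frame, used only for illustration
df <- data.frame(
  name = c("a", "b", "c"),
  score = c(1.5, 2.5, 3.5)
)

print(is.list(df))   # a data frame is a list of columns
print(length(df))    # number of list elements, i.e. columns
print(dim(df))       # number of rows and columns

# Each column keeps its own type
col_classes <- sapply(df, class)
print(col_classes)
```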
To create a data frame in R, the data.frame() function is used. In current versions of R it has the following syntax:
data.frame(..., row.names = NULL, check.rows = FALSE,
check.names = TRUE, fix.empty.names = TRUE,
stringsAsFactors = FALSE)
- ...: Column vectors, which can be of any type (character, numeric, logical), typically given in the form tag = value, or simply value.
- row.names: Row names. The default is NULL. A single number or string names the column to use as row names; a vector of strings or numbers gives the row names directly.
- check.rows: Checks whether the rows are consistent in length and names.
- check.names: Checks whether the variable names in the data frame are valid and unique, and adjusts them if they are not.
- fix.empty.names: Determines whether unnamed arguments should be automatically named.
- stringsAsFactors: Logical value indicating whether character vectors should be converted to factors. The default was TRUE before R 4.0.0 and has been FALSE since R 4.0.0 (older versions show the default as default.stringsAsFactors()). Pass stringsAsFactors = TRUE to request the conversion.
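As a rough illustration of the arguments listed above, the sketch below sets row.names, stringsAsFactors, and check.names explicitly. The data frame names (staff, fixed, raw) and the values are hypothetical and chosen only for this example:

```r
# Hypothetical data, used only to illustrate the arguments
staff <- data.frame(
  Name = c("Zhang San", "Li Si"),
  Dept = c("Ops", "Tech"),
  row.names = c("r1", "r2"),   # explicit row names
  stringsAsFactors = TRUE      # convert character columns to factors
)
print(rownames(staff))
print(class(staff$Dept))

# check.names = TRUE (the default) repairs invalid column names;
# check.names = FALSE keeps them exactly as given
fixed <- data.frame("monthly salary" = c(1000, 2000))
raw <- data.frame("monthly salary" = c(1000, 2000), check.names = FALSE)
print(names(fixed))
print(names(raw))
```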
Below is an example of creating a simple data frame containing names, employee IDs, and monthly salaries:
Example
table = data.frame(
Name = c("Zhang San", "Li Si"),
ID = c("001", "002"),
Salary = c(1000, 2000)
)
print(table) # View the table data
Executing the above code outputs:
       Name  ID Salary
1 Zhang San 001   1000
2     Li Si 002   2000
The structure of the data frame can be displayed using the str() function:
Example
table = data.frame(
Name = c("Zhang San", "Li Si"),
ID = c("001", "002"),
Salary = c(1000, 2000)
)
# Get the structure
str(table)
Executing the above code outputs:
'data.frame': 2 obs. of  3 variables:
 $ Name  : chr  "Zhang San" "Li Si"
 $ ID    : chr  "001" "002"
 $ Salary: num  1000 2000
The summary() function can display a summary of the data frame:
Example
table = data.frame(
Name = c("Zhang San", "Li Si"),
ID = c("001", "002"),
Salary = c(1000, 2000)
)
# Display summary
print(summary(table))
Executing the above code outputs:
     Name                ID                Salary
 Length:2           Length:2           Min.   :1000
 Class :character   Class :character   1st Qu.:1250
 Mode  :character   Mode  :character   Median :1500
                                       Mean   :1500
                                       3rd Qu.:1750
                                       Max.   :2000
We can also extract specified columns:
Example
table = data.frame(
Name = c("Zhang San", "Li Si"),
ID = c("001", "002"),
Salary = c(1000, 2000)
)
# Extract specified columns
result <- data.frame(table$Name, table$Salary)
print(result)
Executing the above code outputs:
  table.Name table.Salary
1  Zhang San         1000
2      Li Si         2000
The following example shows the first two rows:
Example
table = data.frame(
Name = c("Zhang San", "Li Si", "Wang Wu"),
ID = c("001", "002", "003"),
Salary = c(1000, 2000, 3000)
)
print(table)
# Extract the first two rows
print("---Output the first two rows----")
result <- table[1:2,]
print(result)
Executing the above code outputs the following result:
       Name  ID Salary
1 Zhang San 001   1000
2     Li Si 002   2000
3   Wang Wu 003   3000
[1] "---Output the first two rows----"
       Name  ID Salary
1 Zhang San 001   1000
2     Li Si 002   2000
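Base R also provides head() and tail() for taking rows from the start or end of a data frame. The sketch below assumes a hypothetical data frame named employees and is one possible alternative to explicit row indexing:

```r
# Hypothetical data frame, used only for illustration
employees <- data.frame(
  Name = c("Zhang San", "Li Si", "Wang Wu"),
  ID = c("001", "002", "003"),
  Salary = c(1000, 2000, 3000)
)

first_two <- head(employees, 2)  # first two rows
last_one <- tail(employees, 1)   # last row

print(first_two)
print(last_one)
```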
We can read the data at specific rows and columns using [row, column] indexing, much like coordinates. Below, we read the 1st and 2nd columns of the 2nd and 3rd rows:
Example
table = data.frame(
Name = c("Zhang San", "Li Si", "Wang Wu"),
ID = c("001", "002", "003"),
Salary = c(1000, 2000, 3000)
)
# Read data from the 1st and 2nd columns of the 2nd and 3rd rows:
result <- table[c(2, 3), c(1, 2)]
print(result)
Executing the above code outputs the following result:
     Name  ID
2   Li Si 002
3 Wang Wu 003
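The same [row, column] form also accepts column names and logical conditions. The following sketch assumes a hypothetical data frame named employees; the 1500 threshold is arbitrary:

```r
# Hypothetical data frame, used only for illustration
employees <- data.frame(
  Name = c("Zhang San", "Li Si", "Wang Wu"),
  ID = c("001", "002", "003"),
  Salary = c(1000, 2000, 3000)
)

# Rows chosen by a logical condition, columns chosen by name
high_paid <- employees[employees$Salary > 1500, c("Name", "Salary")]
print(high_paid)

# A single cell: row 2 of the Salary column
one_cell <- employees[2, "Salary"]
print(one_cell)
```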
Expanding Data Frames
We can expand an existing data frame. Below is an example where we add a Department column:
Example
table = data.frame(
Name = c("Zhang San", "Li Si", "Wang Wu"),
ID = c("001", "002", "003"),
Salary = c(1000, 2000, 3000)
)
# Add a Department column
table$Department <- c("Operations", "Technology", "Editing")
print(table)
Executing the above code outputs the following result:
       Name  ID Salary Department
1 Zhang San 001   1000 Operations
2     Li Si 002   2000 Technology
3   Wang Wu 003   3000    Editing
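A new column can also be computed from existing ones. This is a small sketch with a hypothetical employees data frame; the AnnualSalary column and the factor of 12 are assumptions made for illustration:

```r
# Hypothetical data frame, used only for illustration
employees <- data.frame(
  Name = c("Zhang San", "Li Si", "Wang Wu"),
  Salary = c(1000, 2000, 3000)
)

# Derive a new column from an existing one
employees$AnnualSalary <- employees$Salary * 12
print(employees)
```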
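The cbind() function combines vectors column by column. Note that when every argument is a plain vector, the result is a matrix rather than a data frame, and mixed types are coerced to a common type (here the numbers become character strings):
Example
# Create vectors
sites <- c("Google", "tutorialpro", "Taobao")
likes <- c(222, 111, 123)
url <- c("www.google.com", "www.tutorialpro.org", "www.taobao.com")
# Combine the vectors column-wise (the result is a character matrix)
addresses <- cbind(sites, likes, url)
# View the result
print(addresses)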
Executing the above code outputs the following result:
     sites         likes url
[1,] "Google"      "222" "www.google.com"
[2,] "tutorialpro" "111" "www.tutorialpro.org"
[3,] "Taobao"      "123" "www.taobao.com"
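The quoted "222" values in the output above show that cbind() on plain vectors produced a character matrix. To keep each column's own type, the same vectors can be passed to data.frame() instead. The sketch below contrasts the two; the variable names as_matrix, as_frame, and extended are hypothetical:

```r
sites <- c("Google", "tutorialpro", "Taobao")
likes <- c(222, 111, 123)
url <- c("www.google.com", "www.tutorialpro.org", "www.taobao.com")

as_matrix <- cbind(sites, likes, url)      # character matrix
as_frame <- data.frame(sites, likes, url)  # data frame, likes stays numeric

print(is.matrix(as_matrix))
print(is.data.frame(as_frame))
print(class(as_frame$likes))

# When one argument is already a data frame, cbind() returns a data frame
extended <- cbind(as_frame, rank = 1:3)
print(is.data.frame(extended))
```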
To append the rows of one data frame to another, we can use the rbind() function. Both data frames must have the same column names:
Example
table = data.frame(
Name = c("Zhang San", "Li Si", "Wang Wu"),
ID = c("001", "002", "003"),
Salary = c(1000, 2000, 3000)
)
newtable = data.frame(
Name = c("Xiao Ming", "Xiao Bai"),
ID = c("101", "102"),
Salary = c(5000, 7000)
)
# Merge two data frames
result <- rbind(table, newtable)
print(result)
Executing the above code outputs the following result:
       Name  ID Salary
1 Zhang San 001   1000
2     Li Si 002   2000
3   Wang Wu 003   3000
4 Xiao Ming 101   5000
5  Xiao Bai 102   7000
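For data frames, rbind() matches columns by name rather than by position, and it stops with an error when the names differ. The sketch below illustrates both cases with hypothetical data (base_rows, extra_rows, and the Pay column are invented for this example):

```r
# Hypothetical data, used only for illustration
base_rows <- data.frame(
  Name = c("Zhang San", "Li Si"),
  Salary = c(1000, 2000)
)

# Same columns in a different order: rbind() matches them by name
extra_rows <- data.frame(Salary = 5000, Name = "Xiao Ming")
combined <- rbind(base_rows, extra_rows)
print(combined)

# Different column names: rbind() raises an error, caught here
mismatch <- tryCatch(
  rbind(base_rows, data.frame(Name = "Xiao Bai", Pay = 7000)),
  error = function(e) "error"
)
print(mismatch)
```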