R CSV File
As a professional statistics tool, R would be of little use if data could only be entered and exported by hand, so it supports reading and writing data in bulk from mainstream tabular storage formats such as CSV, Excel, and XML.
CSV Table Interaction
CSV (Comma-Separated Values, sometimes also referred to as Character-Separated Values because the delimiter can also be a character other than a comma) is a highly popular file format for tabular storage, suitable for storing medium or small-scale data.
Because most software supports this format, it is widely used for storing and exchanging data.
CSV is essentially text, and its file format is extremely simple: data is saved line by line in text, with each record being separated by a delimiter into fields, and each record having the same sequence of fields.
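As a small illustration of the delimiter point, the sketch below parses records that use a semicolon instead of a comma by passing the sep argument to read.csv(). The inline data and the variable name semi are made up for this example.

```r
# Hypothetical inline data: records separated by semicolons instead of commas
semi <- read.csv(text = "id;name;likes\n1;Google;111\n2;Taobao;333", sep = ";")

print(names(semi))  # column names taken from the header line
print(nrow(semi))   # number of records
```

Every record still has the same sequence of fields; only the separating character changes.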
Below is a simple sites.csv file (stored in the same directory as the test program):
id,name,url,likes
1,Google,www.google.com,111
2,tutorialpro,www.tutorialpro.org,222
3,Taobao,www.taobao.com,333
CSV uses commas to separate columns. If a field itself contains a comma, the entire field should be enclosed in double quotes.
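A minimal sketch of the quoting rule, assuming a made-up record whose name field contains a comma (the value "Google, Inc." and the variable name quoted are illustrative only):

```r
# Hypothetical record: the name field contains a comma, so it is wrapped in double quotes
quoted <- read.csv(text = 'id,name,likes\n1,"Google, Inc.",111',
                   stringsAsFactors = FALSE)

print(ncol(quoted))  # the comma inside the quotes does not create an extra column
print(quoted$name)   # the quotes themselves are not part of the value
```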
Note: When the text contains non-English characters, pay attention to the encoding used when saving the file. Since many computers commonly use UTF-8, I save the file as UTF-8.
Note: The CSV file should end with a line break, that is, the last record must be terminated by a newline; otherwise, the program will issue a warning message.
Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote, :
incomplete final line found by readTableHeader on 'sites.csv'
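One way to avoid this warning is to create the file from R itself. The sketch below writes the sample records with writeLines(), which ends every line, including the last one, with a newline. The temporary path is an assumption made so the example does not overwrite any real sites.csv.

```r
# Write the sample records to a temporary file (a stand-in for sites.csv)
path <- file.path(tempdir(), "sites.csv")
lines <- c(
  "id,name,url,likes",
  "1,Google,www.google.com,111",
  "2,tutorialpro,www.tutorialpro.org,222",
  "3,Taobao,www.taobao.com,333"
)
writeLines(lines, path)  # a newline is written after every element, including the last

data <- read.csv(path)
print(nrow(data))
```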
Reading CSV Files
Next, we can use the read.csv() function to read data from a CSV file:
Example
data <- read.csv("sites.csv", encoding="UTF-8")
print(data)
If the encoding argument is not set, read.csv() treats the text as being in the operating system's default encoding. On a Chinese-language version of Windows where the default has not been changed, that is typically GBK. It is therefore advisable to standardize on one text encoding to prevent garbled text.
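The following sketch shows the effect of the encoding argument on a file that contains one non-ASCII name. The file name utf8_sites.csv and its contents are assumptions for illustration. Note that encoding only marks the strings as UTF-8; read.csv() also has a separate fileEncoding argument that converts the file while reading, which is not shown here.

```r
# Assumption: a small UTF-8 file with one non-ASCII name, written to a temporary path
path <- file.path(tempdir(), "utf8_sites.csv")
writeLines(c("id,name", "1,Google", "2,caf\u00e9"), path, useBytes = TRUE)

# encoding = "UTF-8" marks the strings as UTF-8 without converting the file contents
d <- read.csv(path, encoding = "UTF-8", stringsAsFactors = FALSE)

print(Encoding(d$name))  # shows how each string is marked
print(nrow(d))
```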
Executing the above code outputs the result:
  id        name                 url likes
1  1      Google      www.google.com   111
2  2 tutorialpro www.tutorialpro.org   222
3  3      Taobao      www.taobao.com   333
The read.csv() function returns a data frame, which allows for convenient statistical processing of the data. The following example checks the number of rows and columns:
Example
data <- read.csv("sites.csv", encoding="UTF-8")
print(is.data.frame(data)) # Check if it is a data frame
print(ncol(data)) # Number of columns
print(nrow(data)) # Number of rows
Executing the above code outputs the result:
[1] TRUE
[1] 4
[1] 3
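A few other base R functions report the same structural information. The sketch below uses an inline copy of the sample data (passed through the text argument) so that it runs on its own:

```r
# Inline copy of the sample data so the sketch is self-contained
data <- read.csv(text = "id,name,url,likes
1,Google,www.google.com,111
2,tutorialpro,www.tutorialpro.org,222
3,Taobao,www.taobao.com,333")

print(dim(data))    # number of rows and columns in one call
print(names(data))  # column names
str(data)           # type of each column
```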
The following example calculates the maximum value in the likes field:
Example
data <- read.csv("sites.csv", encoding="UTF-8")
# Maximum likes
like <- max(data$likes)
print(like)
Executing the above code outputs the result:
[1] 333
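Other summary functions work the same way on a column. As a sketch, again with an inline copy of the sample data, the code below computes the minimum and mean of likes and uses which.max() to retrieve the whole row with the most likes (the variable name top is illustrative):

```r
# Inline copy of the sample data so the sketch is self-contained
data <- read.csv(text = "id,name,url,likes
1,Google,www.google.com,111
2,tutorialpro,www.tutorialpro.org,222
3,Taobao,www.taobao.com,333", stringsAsFactors = FALSE)

print(min(data$likes))   # smallest value in the likes column
print(mean(data$likes))  # average of the likes column

# which.max() returns the position of the maximum, which selects the full row
top <- data[which.max(data$likes), ]
print(top$name)
```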
We can also specify search conditions, similar to SQL WHERE clauses, to query data using the subset() function.
The following example searches for data where likes is 222:
Example
data <- read.csv("sites.csv", encoding="UTF-8")
# Data where likes is 222
retval <- subset(data, likes == 222)
print(retval)
Executing the above code outputs the result:
  id        name                 url likes
2  2 tutorialpro www.tutorialpro.org   222
Note: The equality condition uses ==.
Multiple conditions are combined with &. The following example searches for data where likes is greater than 1 and the name is tutorialpro:
Example
data <- read.csv("sites.csv", encoding="UTF-8")
# Data where likes are greater than 1 and the name is tutorialpro
retval <- subset(data, likes > 1 & name == "tutorialpro")
print(retval)
Executing the above code outputs the result:
  id        name                 url likes
2  2 tutorialpro www.tutorialpro.org   222
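The same filter can also be written with logical indexing, and conditions can be combined with | for "or". The sketch below is one possible variation on the example above, using an inline copy of the sample data; the variable names both and either are illustrative.

```r
# Inline copy of the sample data so the sketch is self-contained
data <- read.csv(text = "id,name,url,likes
1,Google,www.google.com,111
2,tutorialpro,www.tutorialpro.org,222
3,Taobao,www.taobao.com,333", stringsAsFactors = FALSE)

# Same condition as subset(data, likes > 1 & name == "tutorialpro"),
# written with a logical vector inside square brackets
both <- data[data$likes > 1 & data$name == "tutorialpro", ]
print(both)

# | means "or": rows where likes is 111 or 333
either <- subset(data, likes == 111 | likes == 333)
print(either)
```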
Saving as a CSV File
R can use the write.csv() function to save data as a CSV file.
Following the previous example, we will save the data with 222 likes to the file tutorialpro.csv:
Example
data <- read.csv("sites.csv", encoding="UTF-8")
# Data with 222 likes
retval <- subset(data, likes == 222)
# Write to a new file
write.csv(retval,"tutorialpro.csv")
newdata <- read.csv("tutorialpro.csv")
print(newdata)
Executing the above code outputs the following result:
  X id        name                 url likes
1 2  2 tutorialpro www.tutorialpro.org   222
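The X column holds the row names of the data frame: write.csv() writes them as an unnamed first column by default, and read.csv() names that column X when reading the file back. It can be omitted with the parameter row.names = FALSE: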
Example
data <- read.csv("sites.csv", encoding="UTF-8")
# Data with 222 likes
retval <- subset(data, likes == 222)
# Write to a new file
write.csv(retval,"tutorialpro.csv", row.names = FALSE)
newdata <- read.csv("tutorialpro.csv")
print(newdata)
Executing the above code outputs the following result:
  id        name                 url likes
1  2 tutorialpro www.tutorialpro.org   222
After executing this, we can see that the tutorialpro.csv file has been generated.
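To confirm the result from within R, a sketch like the following checks that the file exists and shows its raw text. It writes to a temporary directory (an assumption, so that nothing in the working directory is touched) and uses an inline copy of the sample data.

```r
# Inline copy of the sample data so the sketch is self-contained
data <- read.csv(text = "id,name,url,likes
1,Google,www.google.com,111
2,tutorialpro,www.tutorialpro.org,222
3,Taobao,www.taobao.com,333", stringsAsFactors = FALSE)

retval <- subset(data, likes == 222)

# Write to a temporary location instead of the working directory
out <- file.path(tempdir(), "tutorialpro.csv")
write.csv(retval, out, row.names = FALSE)

print(file.exists(out))  # check that the file was created
print(readLines(out))    # raw text of the file as written by write.csv()

newdata <- read.csv(out)
print(names(newdata))    # no extra X column, because row names were not written
```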