Easy Tutorial

R CSV File

As a professional statistical tool, R would be of limited use if data could only be typed in and exported by hand, so R can read data in bulk from mainstream tabular storage formats such as CSV, Excel, and XML (Excel and XML through add-on packages).

Working with CSV Tables

CSV (Comma-Separated Values, sometimes also referred to as Character-Separated Values because the delimiter can also be a character other than a comma) is a highly popular file format for tabular storage, suitable for storing medium or small-scale data.

Because almost all software supports this format, CSV is widely used for storing and exchanging data.

CSV is plain text, and the format is extremely simple: each line of the file holds one record, the fields within a record are separated by a delimiter, and every record has the same sequence of fields.
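Because CSV is plain text, read.csv() can also parse CSV content held in a character string through its text argument, which is handy for quick experiments. The sketch below uses two made-up sample records and shows that the sep argument is how you tell read.csv() about a delimiter other than a comma (here a semicolon):

```r
# CSV content held in a string: one record per line, fields separated by commas
csv_text <- "id,name,likes\n1,Google,111\n2,Taobao,333"

# read.csv() treats the first line as the header (header = TRUE is its default)
df <- read.csv(text = csv_text)
print(df)

# The same records with ";" as the delimiter; sep tells read.csv() what to split on
semi_text <- "id;name;likes\n1;Google;111\n2;Taobao;333"
semi <- read.csv(text = semi_text, sep = ";")

# Only the delimiter differed, so the two data frames should be identical
print(identical(df, semi))
```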

Below is a simple sites.csv file, saved in R's current working directory (shown by getwd()), which is where read.csv() looks for a bare file name such as "sites.csv":

id,name,url,likes
1,Google,www.google.com,111
2,tutorialpro,www.tutorialpro.org,222
3,Taobao,www.taobao.com,333
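If you would rather create the file from R than by hand, the following sketch writes the same four lines to sites.csv in the current working directory (it overwrites any existing file of that name). writeLines() ends every element, including the last one, with a newline, so the file also has the final newline that read.csv() expects:

```r
# Write the sample data line by line; writeLines() appends "\n" after each element
writeLines(c(
  "id,name,url,likes",
  "1,Google,www.google.com,111",
  "2,tutorialpro,www.tutorialpro.org,222",
  "3,Taobao,www.taobao.com,333"
), "sites.csv")

# Show the full path the file was written to and confirm that it exists
print(file.path(getwd(), "sites.csv"))
print(file.exists("sites.csv"))
```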

CSV uses commas to separate columns. If a field value itself contains a comma, the whole field must be enclosed in double quotes.
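The quoting rule can be seen in a round trip through write.csv() and read.csv(). This is a minimal sketch: the one-row data frame with the made-up value "Smith, John" and the temporary file are assumptions for illustration. write.csv() quotes text fields by default (quote = TRUE), so the embedded comma survives and is read back as part of a single field:

```r
# A one-row data frame whose name value contains a comma
df <- data.frame(id = 1, name = "Smith, John")

# Write it to a temporary file; text fields and the header are wrapped in double quotes
tmp <- tempfile(fileext = ".csv")
write.csv(df, tmp, row.names = FALSE)

# Inspect the raw text: the comma inside the name is protected by the quotes
raw_lines <- readLines(tmp)
print(raw_lines)

# Reading it back yields two columns, not three
back <- read.csv(tmp)
print(back$name)
```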

Note: If the file contains non-English text, pay attention to the encoding it is saved in. UTF-8 is the most widely used encoding, so the file here is saved as UTF-8.

Note: The file must end with a newline, i.e. the last record should be followed by an empty line; otherwise R issues a warning such as:

Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote,  :
  incomplete final line found by readTableHeader on 'sites.csv'
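The warning is easy to reproduce with a small temporary file, as in the sketch below (the two-line file content is made up for illustration). cat() writes its text exactly as given, so the first version of the file lacks the final newline; tryCatch() captures the warning message instead of printing it:

```r
tmp <- tempfile(fileext = ".csv")

# No "\n" after the last record, so the final line is incomplete
cat("id,name\n1,Google", file = tmp)

# Capture the warning message; if no warning were raised, "no warning" is returned
msg <- tryCatch({
  read.csv(tmp)
  "no warning"
}, warning = function(w) conditionMessage(w))
print(msg)

# Ending the file with a newline makes it read cleanly
cat("id,name\n1,Google\n", file = tmp)
fixed <- read.csv(tmp)
print(fixed)
```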

Reading CSV Files

Next, we can use the read.csv() function to read data from a CSV file:

Example

data <- read.csv("sites.csv", encoding="UTF-8")
print(data)

If no encoding is specified, read.csv() assumes the operating system's default text encoding. On a Chinese-language version of Windows, for example, the default is usually GBK, so a UTF-8 file may come out garbled; it is therefore best to use one encoding consistently and state it explicitly. Strictly speaking, encoding = "UTF-8" only marks the strings that are read in as UTF-8; to have R re-encode the file while reading it, use fileEncoding = "UTF-8" instead.

Executing the above code outputs the result:

  id        name                 url likes
1  1      Google      www.google.com   111
2  2 tutorialpro www.tutorialpro.org   222
3  3      Taobao      www.taobao.com   333
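read.csv() has two encoding-related arguments: encoding marks the strings it reads as being in a given encoding, while fileEncoding re-encodes the file as it is read. The sketch below exercises fileEncoding on a temporary file; the sample value "Caf\u00e9" is made up, and the example assumes R runs in a UTF-8 locale so the accented character can be represented after re-encoding:

```r
tmp <- tempfile(fileext = ".csv")

# Write a small UTF-8 file containing a non-ASCII character (é written as the escape \u00e9)
con <- file(tmp, open = "w", encoding = "UTF-8")
writeLines(c("id,name", "1,Caf\u00e9"), con)
close(con)

# fileEncoding tells read.csv() to decode the file from UTF-8 while reading it
data <- read.csv(tmp, fileEncoding = "UTF-8")
print(data$name)

# 4 characters: é was decoded as a single character rather than two bytes
print(nchar(data$name))
```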

The read.csv() function returns a data frame, which makes the data convenient to process statistically. The following example confirms that the result is a data frame and prints its number of columns and rows:

Example

data <- read.csv("sites.csv", encoding="UTF-8")

print(is.data.frame(data))  # Check if it is a data frame
print(ncol(data))  # Number of columns
print(nrow(data))  # Number of rows

Executing the above code outputs the result:

[1] TRUE
[1] 4
[1] 3
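A few other base functions describe a data frame at a glance. In this sketch the sites.csv data are supplied inline through the text argument (an assumption made so the code runs without the file); dim() returns rows and columns together, and str() lists each column's type:

```r
# Same data as sites.csv, supplied inline so the sketch runs on its own
sites_text <- c("id,name,url,likes",
                "1,Google,www.google.com,111",
                "2,tutorialpro,www.tutorialpro.org,222",
                "3,Taobao,www.taobao.com,333")
data <- read.csv(text = sites_text)

print(dim(data))    # number of rows and columns in one call
print(names(data))  # column names taken from the header line
str(data)           # type of each column plus a preview of its values
```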

The following example calculates the maximum value in the likes field:

Example

data <- read.csv("sites.csv", encoding="UTF-8")

# Maximum likes
like <- max(data$likes)
print(like)

Executing the above code outputs the result:

[1] 333
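max() returns only the value. To retrieve the whole record that holds it, which.max() gives the row index of the (first) maximum, which can then index the data frame. A sketch with the sites.csv data supplied inline for self-containment:

```r
# Same data as sites.csv, supplied inline so the sketch runs on its own
sites_text <- c("id,name,url,likes",
                "1,Google,www.google.com,111",
                "2,tutorialpro,www.tutorialpro.org,222",
                "3,Taobao,www.taobao.com,333")
data <- read.csv(text = sites_text)

# Row index of the largest likes value, used to pull out the full record
top <- data[which.max(data$likes), ]
print(top$name)  # "Taobao"

# Other summary functions work on a column in the same way
print(mean(data$likes))  # 222
print(summary(data$likes))
```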

The subset() function lets us query the data with conditions, much like a SQL WHERE clause.

The following example finds the rows where likes equals 222:

Example

data <- read.csv("sites.csv", encoding="UTF-8")

# Data where likes is 222
retval <- subset(data, likes == 222)
print(retval)

Executing the above code outputs the result:

  id        name                 url likes
2  2 tutorialpro www.tutorialpro.org   222

Note: Equality is tested with ==; a single = is not a comparison operator in R.
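The same filter can also be written with plain logical indexing, data[condition, ], where the empty position after the comma keeps all columns. A sketch with the sites.csv data supplied inline; one practical difference is that subset() drops rows whose condition evaluates to NA, whereas plain indexing returns a row of NAs for them:

```r
# Same data as sites.csv, supplied inline so the sketch runs on its own
sites_text <- c("id,name,url,likes",
                "1,Google,www.google.com,111",
                "2,tutorialpro,www.tutorialpro.org,222",
                "3,Taobao,www.taobao.com,333")
data <- read.csv(text = sites_text)

# subset() evaluates the condition using the data frame's columns
a <- subset(data, likes == 222)

# Equivalent logical indexing: rows where the condition is TRUE, all columns
b <- data[data$likes == 222, ]
print(b)
```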

Multiple conditions can be combined with & (logical AND). The following example finds the rows where likes is greater than 1 and name is tutorialpro:

Example

data <- read.csv("sites.csv", encoding="UTF-8")

# Data where likes are greater than 1 and the name is tutorialpro
retval <- subset(data, likes > 1 & name == "tutorialpro")
print(retval)

Executing the above code outputs the result:

  id        name                 url likes
2  2 tutorialpro www.tutorialpro.org   222
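Besides &, conditions can be joined with | (logical OR), and subset() also accepts a select argument that keeps only the listed columns. A sketch with the sites.csv data supplied inline; the thresholds 150 and 300 are arbitrary values chosen for illustration:

```r
# Same data as sites.csv, supplied inline so the sketch runs on its own
sites_text <- c("id,name,url,likes",
                "1,Google,www.google.com,111",
                "2,tutorialpro,www.tutorialpro.org,222",
                "3,Taobao,www.taobao.com,333")
data <- read.csv(text = sites_text)

# | means OR: rows where likes is below 150 or above 300
either <- subset(data, likes < 150 | likes > 300)
print(either$name)  # "Google" "Taobao"

# select keeps only the named columns of the matching rows
slim <- subset(data, likes > 150, select = c(name, likes))
print(slim)
```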

Save as CSV File

R can use the write.csv() function to save data as a CSV file.

Continuing the previous example, we save the rows with 222 likes to the file tutorialpro.csv:

Example

data <- read.csv("sites.csv", encoding="UTF-8")

# Data with 222 likes
retval <- subset(data, likes == 222)

# Write to a new file
write.csv(retval,"tutorialpro.csv")
newdata <- read.csv("tutorialpro.csv")
print(newdata)

Executing the above code outputs the following result:

  X id        name                 url likes
1 2  2 tutorialpro www.tutorialpro.org   222

The X column holds the row names of retval, which write.csv() writes to the file by default; they can be left out with the parameter row.names = FALSE:

Example

data <- read.csv("sites.csv", encoding="UTF-8")

# Data with 222 likes
retval <- subset(data, likes == 222)

# Write to a new file
write.csv(retval,"tutorialpro.csv", row.names = FALSE)
newdata <- read.csv("tutorialpro.csv")
print(newdata)

Executing the above code outputs the following result:

  id        name                 url likes
1  2 tutorialpro www.tutorialpro.org   222

After executing this, we can see that the tutorialpro.csv file has been generated.
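To see exactly what write.csv() put in the file, it can be read back as raw lines with readLines(). In this sketch a temporary path stands in for tutorialpro.csv and the data are supplied inline (both assumptions so it runs on its own). With row.names = FALSE, the header and text fields are quoted, numbers are not, and no row-name column is written:

```r
# Same data as sites.csv, supplied inline so the sketch runs on its own
sites_text <- c("id,name,url,likes",
                "1,Google,www.google.com,111",
                "2,tutorialpro,www.tutorialpro.org,222",
                "3,Taobao,www.taobao.com,333")
data <- read.csv(text = sites_text)
retval <- subset(data, likes == 222)

# A temporary path stands in for tutorialpro.csv
out <- tempfile(fileext = ".csv")
write.csv(retval, out, row.names = FALSE)

print(file.exists(out))
print(readLines(out))  # raw text of the header and the single data row
```

When the data contain non-English text, write.csv() also accepts fileEncoding (passed on to write.table()), for example fileEncoding = "UTF-8".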
