Factors store categorical data, that is, values that fall into a limited set of categories, such as gender (male or female) or age group (minor or adult).
In R, factors are created with the factor() function, which takes a vector as its input.
The syntax of the factor() function is as follows:
factor(x = character(), levels, labels = levels,
exclude = NA, ordered = is.ordered(x), nmax = NA)
Parameter descriptions:
x: A vector of data, usually with a small number of distinct values.
levels: The level values; if omitted, the sorted distinct values of x are used.
labels: Labels for the levels; if omitted, the level values themselves are used.
exclude: Values to exclude when forming the set of levels (NA by default).
ordered: A logical value specifying whether the levels are ordered.
nmax: An upper bound on the number of levels.
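As a rough illustration of how these parameters interact, the sketch below uses a hypothetical vector of ratings. The data and the variable names (ratings, f_default, f_excluded, f_ordered) are assumptions made for this illustration only.

```r
# Hypothetical sample data, invented for illustration.
ratings <- c("low", "high", "medium", "high", "low")

# Default: the levels are the sorted distinct values of x.
f_default <- factor(ratings)
print(levels(f_default))

# exclude: values listed here are dropped from the levels,
# and the matching elements become NA.
f_excluded <- factor(ratings, exclude = "medium")
print(levels(f_excluded))
print(is.na(f_excluded))

# ordered = TRUE with explicit levels gives an ordered factor,
# which supports comparisons such as <.
f_ordered <- factor(ratings,
                    levels = c("low", "medium", "high"),
                    ordered = TRUE)
print(f_ordered < "high")
```

Passing levels explicitly is usually the safer choice for ordered data, because the default alphabetical order ("high", "low", "medium") rarely matches the intended ranking.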
The following example converts a character vector into a factor:
Example
# "男" means male, "女" means female
x <- c("男", "女", "男", "男", "女")
sex <- factor(x)
print(sex)
print(is.factor(sex))
Executing the above code outputs:
[1] 男 女 男 男 女
Levels: 男 女
[1] TRUE
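Once a factor exists, a few base R helpers can be used to inspect it. The sketch below assumes a hypothetical vector that uses the ASCII codes "m" and "f" in place of the strings above, so that the level order does not depend on the locale.

```r
# Hypothetical data: "m" and "f" stand in for the two categories.
x <- c("m", "f", "m", "m", "f")
sex <- factor(x)

print(levels(sex))      # the distinct categories, sorted
print(nlevels(sex))     # how many levels there are
print(as.integer(sex))  # integer codes that index into levels(sex)
print(table(sex))       # number of observations per level
```

The integer codes show how a factor is stored internally: each element holds the position of its value in levels(sex), not the string itself.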
The following example sets the factor levels explicitly to c('男','女') by passing the levels argument to factor():
Example
x <- c("男", "女", "男", "男", "女")
sex <- factor(x, levels = c('男', '女'))
print(sex)
print(is.factor(sex))
Executing the above code outputs:
[1] 男 女 男 男 女
Levels: 男 女
[1] TRUE
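The levels argument of factor() does more than fix the order of the levels. As a minimal sketch, assuming a hypothetical vector of survey answers (the data and names are invented for illustration), values that are not listed in levels become NA, while listed levels that never occur are still kept:

```r
# Hypothetical data containing a value ("x") that is not a declared level.
answers <- c("no", "yes", "x", "yes")

# Explicit levels fix the order and may include values that never occur.
f <- factor(answers, levels = c("yes", "no", "maybe"))

print(levels(f))  # order follows the levels argument, not the alphabet
print(is.na(f))   # the undeclared value "x" has become NA
print(table(f))   # the unused level "maybe" is counted as 0
```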
Factor Level Labels
Next, we use the labels parameter to assign a label to each factor level. The order of the values in labels must match the order of the values in levels, for example:
Example
sex <- factor(c('f', 'm', 'f', 'f', 'm'), levels = c('f', 'm'), labels = c('female', 'male'), ordered = TRUE)
print(sex)
Executing the above code outputs:
[1] female male female female male
Levels: female < male
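Because each label is matched to a level by position, the same pattern can recode short codes into readable names while also defining a ranking. The following is a sketch under assumed data: the size codes and the names codes and sizes are hypothetical.

```r
# Hypothetical size codes, invented for illustration.
codes <- c("M", "S", "L", "S")

# labels[i] replaces levels[i]; ordered = TRUE ranks small < medium < large.
sizes <- factor(codes,
                levels = c("S", "M", "L"),
                labels = c("small", "medium", "large"),
                ordered = TRUE)

print(as.character(sizes))  # the relabelled values
print(sizes >= "medium")    # comparisons follow the level order
print(max(sizes))           # the highest-ranked value present
```

If the two vectors were given in different orders, for example levels = c("S", "M", "L") with labels = c("large", "medium", "small"), R would not raise an error; the values would simply be mislabelled.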
Generating Factor Levels
The gl() function generates a factor by specifying the pattern of its levels. The syntax is as follows:
gl(n, k, length = n*k, labels = seq_len(n), ordered = FALSE)
Parameter descriptions:
n: The number of levels.
k: The number of times each level is repeated consecutively.
length: The length of the result; defaults to n*k.
labels: The labels for the levels; defaults to the integers 1 to n.
ordered: A logical value specifying whether the levels are ordered.
Example
v <- gl(3, 4, labels = c("Google", "tutorialpro", "Taobao"))
print(v)
Executing the above code outputs:
[1] Google Google Google Google tutorialpro tutorialpro tutorialpro tutorialpro Taobao Taobao
[11] Taobao Taobao
Levels: Google tutorialpro Taobao
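The length and ordered parameters of gl() are not exercised by the example above. The sketch below shows one way they might be used; the label strings and variable names are assumptions chosen for illustration.

```r
# length longer than n * k: the block pattern 1 1 1 2 2 2 is recycled.
a <- gl(2, 3, length = 8)
print(as.integer(a))

# k = 1 makes the levels alternate.
b <- gl(2, 1, length = 6, labels = c("ctrl", "treat"))
print(as.character(b))

# ordered = TRUE returns an ordered factor.
d <- gl(3, 2, labels = c("low", "mid", "high"), ordered = TRUE)
print(d)
print(is.ordered(d))
```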