MongoDB Concept Explanation
Regardless of the database we are learning, it is essential to understand the fundamental concepts. In MongoDB, the basic concepts are document, collection, and database. Below, we will introduce each of these concepts one by one.
The table below will help you understand some MongoDB concepts more easily:
SQL Term/Concept | MongoDB Term/Concept | Explanation/Description |
---|---|---|
database | database | Database |
table | collection | Database table/collection |
row | document | Data record row/document |
column | field | Data field/domain |
index | index | Index |
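table joins | (no direct equivalent) | Table joins. MongoDB usually embeds related data instead; since version 3.2, the $lookup aggregation stage can perform left outer joins. |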
primary key | primary key | Primary key, MongoDB automatically sets the _id field as the primary key |
Through the following example, we can also understand some MongoDB concepts more intuitively:
Database
Multiple databases can be created within a single MongoDB instance.
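If no database is specified, the mongo shell connects to the "test" database by default (see the "connecting to: test" line in the examples below). Data files are stored in the data directory, which is /data/db by default on Linux and macOS.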
A single MongoDB instance can hold multiple independent databases, each with its own collections and permissions. Different databases are stored in separate files.
The "show dbs" command displays a list of all databases.
$ ./mongo
MongoDB shell version: 3.0.6
connecting to: test
> show dbs
local 0.078GB
test 0.078GB
>
The "db" command displays the current database.
$ ./mongo
MongoDB shell version: 3.0.6
connecting to: test
> db
test
>
The "use" command switches to the specified database.
> use local
switched to db local
> db
local
>
In the example above, "local" is the database being switched to.
In the next section, we will detail the use of commands in MongoDB.
Database names are used to identify databases. Database names can be any UTF-8 string that meets the following conditions:
- Cannot be an empty string ("").
- Must not contain ' ' (space), '.', '$', '/', '\', or '\0' (null character).
- Should be all lowercase.
- Maximum of 64 bytes.
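The rules above can be checked before a database is created. The following is a minimal sketch in JavaScript (the mongo shell's language), written for plain Node.js; the function databaseNameProblems is hypothetical and encodes only the four rules listed here, not every restriction a particular server version or operating system may add. Note that the length limit is in bytes, so non-ASCII names reach it sooner than their character count suggests.

```javascript
// Hypothetical helper: check a proposed database name against the rules above.
// Returns a list of problems; an empty list means the name passes these checks.
function databaseNameProblems(name) {
  const problems = [];
  if (name === "") problems.push("name is empty");
  // Forbidden characters: space . $ / \ and the null character
  if (/[ .$\/\\\0]/.test(name)) problems.push("contains a forbidden character");
  // The list says names "should" be lowercase, so treat this as a problem too
  if (name !== name.toLowerCase()) problems.push("is not all lowercase");
  // The limit is measured in bytes of UTF-8, not in characters
  if (Buffer.byteLength(name, "utf8") > 64) problems.push("is longer than 64 bytes");
  return problems;
}

console.log(databaseNameProblems("inventory")); // []
console.log(databaseNameProblems("My.Shop"));   // reports the '.' and the uppercase letters
```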
Some database names are reserved for special purposes and can be accessed directly:
- admin: From a permissions perspective, this is the "root" database. If a user is added to this database, the user automatically inherits permissions for all databases. Certain server-side commands can only be run from this database, such as listing all databases or shutting down the server.
- local: This database is never replicated and can be used to store collections that should exist only on a single server.
- config: When MongoDB is used for sharding, the config database is used internally to store sharding-related information.
Document
A document is a set of key-value pairs, stored as BSON. Documents in the same collection do not need to have the same fields, and a field shared by several documents can hold a different data type in each one. This is a significant difference from relational databases and a prominent feature of MongoDB.
A simple document example is as follows:
{"site":"www.tutorialpro.org", "name":"tutorialpro.org"}
The table below lists the corresponding terms between RDBMS and MongoDB:
RDBMS | MongoDB |
---|---|
Database | Database |
Table | Collection |
Row | Document |
Column | Field |
Table join | Embedded document |
Primary key | Primary key (MongoDB uses the _id field) |
Database server: mysqld / Oracle | mongod |
Database client: mysql / sqlplus | mongo |
It is important to note:
- The key/value pairs in a document are ordered.
- The values in a document can be not only strings in double quotes but also other data types (or even entire embedded documents).
- MongoDB is both type-sensitive and case-sensitive: 18 and "18" are different values, and "age" and "Age" are different keys.
- MongoDB documents cannot have duplicate keys.
- The keys in a document are strings. With few exceptions, keys can use any UTF-8 characters.
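A toy equality matcher makes the type and case sensitivity concrete. This is a simplified model in plain JavaScript, not MongoDB's query engine: matchesEquality is a hypothetical name, it only handles exact key/value equality, and real MongoDB does compare its different numeric types (such as int and double) by value.

```javascript
// Simplified model of an equality filter such as db.col.find({ age: 18 }):
// a document matches when every filter key exists under exactly the same
// (case-sensitive) name and holds a strictly equal value of the same type.
function matchesEquality(doc, filter) {
  return Object.keys(filter).every(
    (key) => Object.prototype.hasOwnProperty.call(doc, key) && doc[key] === filter[key]
  );
}

const people = [
  { name: "Ann", age: 18 },   // age is a number
  { name: "Bob", age: "18" }, // age is a string, so it is a different value
  { Name: "Cid", age: 18 },   // "Name" is a different key from "name"
];

console.log(people.filter((d) => matchesEquality(d, { age: 18 })).length);     // 2
console.log(people.filter((d) => matchesEquality(d, { name: "Cid" })).length); // 0
```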
Document key naming conventions:
- Keys cannot contain '\0' (null character). This character is used to denote the end of a key.
- '.' and '$' have special meanings and can only be used in specific contexts.
- By convention, keys that start with an underscore "_" are reserved (this is not strictly enforced).
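These conventions can be applied to a whole document, including embedded documents. The sketch below is plain JavaScript with a hypothetical keyWarnings function; it assumes the common reading of the rules (a key must not start with "$" or contain "."), and it exempts _id because that key is MongoDB's own primary key.

```javascript
// Hypothetical helper: apply the key naming conventions above to a document,
// recursing into embedded documents, and return a list of warnings.
function keyWarnings(doc, path = "") {
  const warnings = [];
  for (const key of Object.keys(doc)) {
    const where = path === "" ? key : `${path}.${key}`;
    if (key.includes("\0")) {
      warnings.push(`${where}: contains the null character`);
    }
    // '$' at the start of a key and '.' anywhere in it have special meanings
    if (key.startsWith("$") || key.includes(".")) {
      warnings.push(`${where}: uses '$' or '.'`);
    }
    // A leading '_' is reserved by convention; _id is the primary key itself
    if (key.startsWith("_") && key !== "_id") {
      warnings.push(`${where}: starts with '_'`);
    }
    const value = doc[key];
    const isEmbedded =
      value !== null && typeof value === "object" &&
      !Array.isArray(value) && !(value instanceof Date);
    if (isEmbedded) {
      warnings.push(...keyWarnings(value, where));
    }
  }
  return warnings;
}

const sample = { _id: 1, site: "www.tutorialpro.org", "a.b": 1, meta: { $bad: 2, _tmp: 3 } };
console.log(keyWarnings(sample).length); // 3
```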
Collection
A collection is a group of MongoDB documents, similar to a table in an RDBMS (Relational Database Management System). Collections exist within a database and do not enforce a fixed structure, so you can insert documents of different formats and types into the same collection, although in practice the documents in a collection are usually related to each other.
For example, we can insert documents with different data structures into a collection:
{"site":"www.baidu.com"}
{"site":"www.google.com","name":"Google"}
{"site":"www.tutorialpro.org","name":"tutorialpro.org","num":5}
A collection is created automatically when the first document is inserted into it.
Valid Collection Names
- Collection names cannot be empty strings "".
- Collection names cannot contain the \0 character (null character), which indicates the end of the collection name.
- Collection names cannot start with "system.", as this is a reserved prefix for system collections.
- User-created collection names must not contain the reserved character "$". Some drivers do accept "$" in collection names because certain system-generated collections contain it, but unless you are accessing such a system-created collection, do not include "$" in a name.
Example (querying a collection named col):
db.col.findOne()
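The collection name rules can be checked the same way. This is a minimal sketch in plain JavaScript; collectionNameProblems is a hypothetical helper that covers only the four rules listed above.

```javascript
// Hypothetical helper: check a proposed collection name against the rules above.
function collectionNameProblems(name) {
  const problems = [];
  if (name === "") problems.push("name is empty");
  if (name.includes("\0")) problems.push("contains the null character");
  // "system." is reserved for MongoDB's own system collections
  if (name.startsWith("system.")) problems.push("uses the reserved 'system.' prefix");
  if (name.includes("$")) problems.push("contains the reserved character '$'");
  return problems;
}

console.log(collectionNameProblems("col")); // []
```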
Capped Collections
Capped collections are fixed-size collections.
They offer high performance and age out data like a circular queue: once the collection is full, the oldest documents (by insertion order) are overwritten first, similar to the RRD (round-robin database) concept.
Capped collections automatically maintain the insertion order of objects. They are ideal for logging functionality. Unlike standard collections, you must explicitly create a capped collection and specify its size in bytes. The storage space for the collection is pre-allocated.
Capped collections store documents in insertion order on disk, ensuring that the positions of documents on disk remain unchanged when updated, as long as the updated document does not exceed the size of the original document.
Because capped collections append documents in insertion order instead of using an index to decide where each document goes, inserts are very efficient. MongoDB's replication operation log, oplog.rs in the local database, is a capped collection.
Note that the specified storage size includes the database header information.
db.createCollection("mycoll", {capped:true, size:100000})
- You can add new objects to a capped collection.
- You can update documents, but an update must not increase a document's storage size; if it does, the update fails.
- You cannot delete individual documents from a capped collection; use the drop() method to remove the entire collection.
- After deletion, you must explicitly recreate the collection.
- On a 32-bit machine, the maximum size of a capped collection is 1e9 (1x10^9) bytes.
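The rules above can be modeled with a small in-memory class. This is an illustrative sketch in plain JavaScript, not how MongoDB implements capped collections: CappedCollectionModel is a hypothetical name, and document size is approximated by the UTF-8 length of the JSON text rather than the real BSON size.

```javascript
// In-memory model of the capped-collection rules above (not MongoDB's code).
// Sizes are approximated by the UTF-8 length of the JSON text, which is NOT
// how MongoDB measures BSON document size.
class CappedCollectionModel {
  constructor(maxBytes) {
    this.maxBytes = maxBytes; // fixed size declared at creation time
    this.docs = [];           // always kept in insertion order
    this.usedBytes = 0;
  }

  static sizeOf(doc) {
    return Buffer.byteLength(JSON.stringify(doc), "utf8");
  }

  insert(doc) {
    const size = CappedCollectionModel.sizeOf(doc);
    if (size > this.maxBytes) {
      throw new Error("document is larger than the whole collection");
    }
    // Once full, the oldest documents are overwritten first (circular-queue behavior)
    while (this.usedBytes + size > this.maxBytes) {
      const oldest = this.docs.shift();
      this.usedBytes -= CappedCollectionModel.sizeOf(oldest);
    }
    this.docs.push(doc);
    this.usedBytes += size;
  }

  // Updates are allowed only if the document does not grow
  update(index, newDoc) {
    const oldSize = CappedCollectionModel.sizeOf(this.docs[index]);
    const newSize = CappedCollectionModel.sizeOf(newDoc);
    if (newSize > oldSize) {
      throw new Error("update would increase the document's size");
    }
    this.docs[index] = newDoc;
    this.usedBytes += newSize - oldSize;
  }
  // Deliberately no method for deleting a single document.
}

const eventLog = new CappedCollectionModel(60);
for (let i = 1; i <= 5; i++) eventLog.insert({ msg: `event ${i}` });
console.log(eventLog.docs.map((d) => d.msg).join(", ")); // event 3, event 4, event 5
```

Each {"msg":"event N"} document is 17 bytes of JSON, so a 60-byte budget holds three of them; inserting the fourth and fifth evicts the two oldest, which is the queue-like aging described above.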
Metadata
Information about a database is stored in collections in the system namespace:
dbname.system.*
In MongoDB, the namespace <dbname>.system.* includes special collections that contain various system information, as follows:
Collection Namespace | Description |
---|---|
dbname.system.namespaces | Lists all namespaces. |
dbname.system.indexes | Lists all indexes. |
dbname.system.profile | Contains database profile information. |
dbname.system.users | Lists all users who can access the database. |
local.sources | Contains server information and status for replication slaves (legacy master/slave replication). |
There are restrictions on modifying objects in system collections.
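You can insert into system.indexes to create an index, but otherwise the information in these collections cannot be modified directly (the dropIndexes command updates the related information automatically). system.users can be modified, and system.profile can be dropped. Note that system.namespaces and system.indexes belong to the legacy MMAPv1 storage engine; with WiredTiger, use the listCollections and listIndexes commands instead.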
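Each name in the table is a namespace of the form <dbname>.<collection>. Because database names cannot contain ".", the first dot separates the two parts, while the collection part may contain further dots (as in system.users or oplog.rs). A sketch in plain JavaScript, with parseNamespace as a hypothetical helper:

```javascript
// Hypothetical helper: split a namespace into database and collection names.
// Database names cannot contain '.', so the FIRST dot is the separator.
function parseNamespace(ns) {
  const dot = ns.indexOf(".");
  if (dot <= 0 || dot === ns.length - 1) {
    throw new Error(`not a <dbname>.<collection> namespace: ${ns}`);
  }
  const db = ns.slice(0, dot);
  const collection = ns.slice(dot + 1);
  // Collections whose names start with "system." hold system information
  return { db, collection, isSystem: collection.startsWith("system.") };
}

const nsInfo = parseNamespace("mydb.system.users");
console.log(nsInfo.db, nsInfo.collection, nsInfo.isSystem); // mydb system.users true
```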
MongoDB Data Types
The following table lists commonly used data types in MongoDB.
Data Type | Description |
---|---|
String | String. The most commonly used data type for storing data. In MongoDB, only UTF-8 encoded strings are valid. |
Integer | Integer value, used for storing whole numbers. BSON has both a 32-bit integer type and a 64-bit integer type (long). |
Boolean | Boolean value. Used for storing boolean values (true/false). |
Double | Double precision floating-point value. Used for storing floating-point values. |
Min/Max keys | Special values that compare lower (MinKey) or higher (MaxKey) than all other BSON (binary JSON) values. |
Array | Used to store arrays or lists or multiple values under a single key. |
Timestamp | BSON timestamp, used mainly by MongoDB internally (for example, in the replication oplog); see the Timestamp section below. |
Object | Used for embedded documents. |
Null | Used to create null values. |
Symbol | Symbol. This data type is essentially equivalent to the string type, but it is generally used for languages that adopt special symbol types. |
Date | Date and time, stored as the number of milliseconds since the Unix epoch. You can specify your own date and time by creating a Date object and passing in the year, month, and day. |
Object ID | Object ID. Used to create the ID of a document. |
Binary Data | Binary data. Used to store binary data. |
Code | Code type. Used to store JavaScript code within a document. |
Regular expression | Regular expression type. Used to store regular expressions. |
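To connect the table with everyday values, the sketch below maps JavaScript values to the type aliases accepted by MongoDB's $type query operator ("double", "string", "bool", and so on). It is a rough, hypothetical helper in plain JavaScript: plain numbers are reported as "double" because the mongo shell stores them as doubles unless they are wrapped in NumberInt() or NumberLong(), and mapping BigInt to "long" and Uint8Array to "binData" is an assumption made for illustration.

```javascript
// Rough, hypothetical mapping from JavaScript values to $type aliases.
function bsonTypeAlias(value) {
  if (value === null) return "null";
  if (typeof value === "string") return "string";
  if (typeof value === "boolean") return "bool";
  if (typeof value === "number") return "double";   // shell default for numbers
  if (typeof value === "bigint") return "long";     // illustrative assumption
  if (typeof value === "function") return "javascript";
  if (Array.isArray(value)) return "array";
  if (value instanceof Date) return "date";
  if (value instanceof RegExp) return "regex";
  if (value instanceof Uint8Array) return "binData"; // includes Node Buffers
  if (typeof value === "object") return "object";   // embedded document
  throw new Error(`no BSON equivalent assumed for ${typeof value}`);
}

const samples = ["hi", 3.5, true, [1, 2], new Date(0), /ab+c/, { a: 1 }];
console.log(samples.map(bsonTypeAlias).join(" ")); // string double bool array date regex object
```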
Here are explanations of several important data types.
ObjectId
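An ObjectId works like a unique primary key: it can be generated quickly and sorts roughly by creation time. It consists of 12 bytes, with the following meanings:
- The first 4 bytes are the creation time as a Unix timestamp in seconds (UTC, which is 8 hours behind Beijing time).
- The next 3 bytes are the machine identifier.
- The following 2 bytes are the process ID (PID).
- The last 3 bytes are a counter that starts at a random value.
In newer MongoDB versions, the machine identifier and PID are replaced by a 5-byte random value generated once per process; the timestamp and counter are unchanged.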
Every document stored in MongoDB must have an _id key. Its value can be of any type other than an array, and by default it is an ObjectId.
Since the ObjectId contains the creation timestamp, you do not need to save a timestamp field for your document. You can obtain the creation time of the document using the getTimestamp function:
> var newObject = ObjectId()
> newObject.getTimestamp()
ISODate("2017-11-25T07:21:10Z")
Converting ObjectId to a string:
> newObject.str
5a1919e63df83ce79df8b38f
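Because the layout is fixed, the creation time can also be recovered outside the shell by reading the first 4 bytes of the hex string. A minimal sketch in plain JavaScript, using the ObjectId string shown above; parseObjectId is a hypothetical helper, and its field names follow the 4/3/2/3 byte layout listed earlier.

```javascript
// Hypothetical helper: split a 24-character ObjectId hex string into its
// byte fields and decode the creation time from the first 4 bytes.
function parseObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("an ObjectId is 12 bytes, written as 24 hex characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // bytes 0-3: Unix time in seconds
  return {
    createdAt: new Date(seconds * 1000), // JavaScript Dates use milliseconds
    machine: hex.slice(8, 14),           // bytes 4-6
    pid: hex.slice(14, 18),              // bytes 7-8
    counter: hex.slice(18, 24),          // bytes 9-11: counter starting at a random value
  };
}

const parsed = parseObjectId("5a1919e63df83ce79df8b38f");
console.log(parsed.createdAt.toISOString()); // 2017-11-25T07:21:10.000Z
```

The decoded time agrees with the getTimestamp() result shown earlier for the same ObjectId.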
String
BSON strings are UTF-8 encoded.
Timestamp
BSON has a special timestamp type for internal use by MongoDB, which is not related to the regular date type. The timestamp value is a 64-bit value, where:
- The first 32 bits are a time_t value (the number of seconds since the Unix epoch).
- The last 32 bits are an incrementing ordinal number within a second.
In a single mongod instance, timestamp values are usually unique.
In a replica set, the oplog has a ts field. The value in this field uses the BSON timestamp to represent the operation time.
The BSON timestamp type is intended for internal MongoDB use. In most application development, use the BSON Date type instead.
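The 64-bit layout described above can be illustrated with BigInt arithmetic: the seconds go in the high 32 bits and the ordinal in the low 32 bits, so comparing the packed values orders timestamps by second first and ordinal second. This is a sketch in plain JavaScript; packTimestamp and unpackTimestamp are hypothetical helpers, not MongoDB APIs.

```javascript
// Sketch of the BSON timestamp layout: high 32 bits = seconds since the Unix
// epoch, low 32 bits = an ordinal that increments within that second.
function packTimestamp(seconds, ordinal) {
  for (const part of [seconds, ordinal]) {
    if (!Number.isInteger(part) || part < 0 || part > 0xffffffff) {
      throw new RangeError("each half must be an unsigned 32-bit integer");
    }
  }
  return (BigInt(seconds) << 32n) | BigInt(ordinal);
}

function unpackTimestamp(value) {
  return { seconds: Number(value >> 32n), ordinal: Number(value & 0xffffffffn) };
}

const tsA = packTimestamp(1511594470, 1);
const tsB = packTimestamp(1511594470, 2);
const tsC = packTimestamp(1511594471, 0);
console.log(tsA < tsB && tsB < tsC);        // true: ordered by seconds, then ordinal
console.log(unpackTimestamp(tsB).ordinal);  // 2
```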
Date
Represents the number of milliseconds since the Unix epoch (January 1, 1970). The date type is signed; negative numbers indicate dates before 1970.
> var mydate1 = new Date() // Greenwich Mean Time
> mydate1
ISODate("2018-03-04T14:58:51.233Z")
> typeof mydate1
object
> var mydate2 = ISODate() // Greenwich Mean Time
> mydate2
ISODate("2018-03-04T15:00:45.479Z")
> typeof mydate2
object
Dates created this way are of the date type and can use methods from the Date type in JavaScript.
To get a string representation of a date:
> var mydate1str = mydate1.toString()
> mydate1str
Sun Mar 04 2018 14:58:51 GMT+0000 (UTC)
> typeof mydate1str
string
Or call Date() without new, which returns the current date and time as a string:
> Date()
Sun Mar 04 2018 15:02:59 GMT+0000 (UTC)
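Since the shell's Date is the standard JavaScript Date, the millisecond representation and the signed range can be seen directly in plain Node.js. The sketch below uses Date.UTC to build a date from year, month, and day without local-time surprises; remember that the month argument is zero-based. The variable names are illustrative only.

```javascript
// A Date is a signed count of milliseconds since the Unix epoch (UTC).
const epoch = new Date(0);
console.log(epoch.toISOString()); // 1970-01-01T00:00:00.000Z

// Date.UTC(year, monthIndex, day, hours, minutes): monthIndex 6 means July
const beforeEpoch = new Date(Date.UTC(1969, 6, 20, 20, 17));
console.log(beforeEpoch.getTime() < 0); // true: dates before 1970 are negative
console.log(beforeEpoch.toISOString()); // 1969-07-20T20:17:00.000Z

// Rebuilding a Date from its millisecond count gives the same instant
const fromMillis = new Date(beforeEpoch.getTime());
console.log(fromMillis.getTime() === beforeEpoch.getTime()); // true
```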