MongoDB Aggregation
Aggregation in MongoDB is primarily used for processing data (such as calculating averages, sums, etc.) and returning the computed results.
It is somewhat similar to the count(*)
statement in SQL.
aggregate() Method
The aggregation method in MongoDB uses aggregate()
.
Syntax
The basic syntax format for the aggregate()
method is as follows:
>db.COLLECTION_NAME.aggregate(AGGREGATE_OPERATION)
Example
The data in the collection is as follows:
{
_id: ObjectId(7df78ad8902c)
title: 'MongoDB Overview',
description: 'MongoDB is no sql database',
by_user: 'tutorialpro.org',
url: 'http://www.tutorialpro.org',
tags: ['mongodb', 'database', 'NoSQL'],
likes: 100
},
{
_id: ObjectId(7df78ad8902d)
title: 'NoSQL Overview',
description: 'No sql database is very fast',
by_user: 'tutorialpro.org',
url: 'http://www.tutorialpro.org',
tags: ['mongodb', 'database', 'NoSQL'],
likes: 10
},
{
_id: ObjectId(7df78ad8902e)
title: 'Neo4j Overview',
description: 'Neo4j is no sql database',
by_user: 'Neo4j',
url: 'http://www.neo4j.com',
tags: ['neo4j', 'database', 'NoSQL'],
likes: 750
},
Now, we calculate the number of articles written by each author using aggregate()
, with the result as follows:
> db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : 1}}}])
{
"result" : [
{
"_id" : "tutorialpro.org",
"num_tutorial" : 2
},
{
"_id" : "Neo4j",
"num_tutorial" : 1
}
],
"ok" : 1
}
>
The above example is similar to the SQL statement:
select by_user, count(*) from mycol group by by_user
In the example above, we grouped the data by the by_user
field and calculated the sum of the by_user
field with the same value.
The following table shows some aggregation expressions:
Expression | Description | Example |
---|---|---|
$sum | Calculates the total sum. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : "$likes"}}}]) |
$avg | Calculates the average value. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$avg : "$likes"}}}]) |
$min | Gets the minimum value corresponding to all documents in the collection. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$min : "$likes"}}}]) |
$max | Gets the maximum value corresponding to all documents in the collection. | db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$max : "$likes"}}}]) |
$push | Adds values to an array without checking for duplicate values. | db.mycol.aggregate([{$group : {_id : "$by_user", url : {$push: "$url"}}}]) |
$addToSet | Adds a value to an array unless the value is already present, in which case it does nothing. | db.mycol.aggregate([{$group : {_id : "$by_user", url : {$addToSet : "$url"}}}]) |
$first | Gets the first document according to the sort order. | db.mycol.aggregate([{$group : {_id : "$by_user", first_url : {$first : "$url"}}}]) |
$last | Gets the last document according to the sort order. | db.mycol.aggregate([{$group : {_id : "$by_user", last_url : {$last : "$url"}}}]) |
Concept of Pipelines
Pipelines in Unix and Linux are generally used to pass the output of the current command as the input to the next command.
MongoDB's aggregation pipeline passes the MongoDB documents through a series of pipeline stages that process the documents. Pipeline operations can be repeated.
Expressions: Process input documents and produce output. Expressions are stateless and only process the current aggregation pipeline's documents; they cannot process other documents.
Here we introduce a few commonly used operations in the aggregation framework:
$project: Modifies the structure of the input documents. It can be used to rename, add, or remove fields, as well as to create computed fields and nested documents.
$match: Filters data to output only documents that meet the specified conditions. $match uses MongoDB's standard query operations.
$limit: Limits the number of documents returned by the MongoDB aggregation pipeline.
$skip: Skips the specified number of documents in the aggregation pipeline and returns the remaining documents.
$unwind: Deconstructs an array field in the documents into multiple documents, each containing a single value from the array.
$group: Groups documents in the collection, useful for aggregating results.
$sort: Sorts the input documents and outputs them.
$geoNear: Outputs documents ordered by proximity to a geographical location.
Examples of Pipeline Operators
- $project Example
db.article.aggregate( { $project : { title : 1 , author : 1 , }} );
This will result in documents containing only the _id, title, and author fields. By default, the _id field is included. To exclude the _id field:
db.article.aggregate(
{ $project : {
_id : 0 ,
title : 1 ,
author : 1
}}
);
- $match Example
db.articles.aggregate( [ { $match : { score : { $gt : 70, $lte : 90 } } }, { $group: { _id: null, count: { $sum: 1 } } } ] );
$match is used to retrieve records with a score greater than 70 and less than or equal to 90, then sends the matching records to the next stage, $group, for processing.
- $skip Example
db.article.aggregate( { $skip : 5 });
After the $skip pipeline operator processes the documents, the first five documents are "filtered" out.