❮ Mongodb Regular Expression Mongodb Remove ❯

MongoDB Aggregation

Aggregation in MongoDB is primarily used for processing data (such as calculating averages, sums, etc.) and returning the computed results.

It is somewhat similar to the count(*) statement in SQL.

aggregate() Method

The aggregation method in MongoDB uses aggregate().

Syntax

The basic syntax format for the aggregate() method is as follows:

>db.COLLECTION_NAME.aggregate(AGGREGATE_OPERATION)

Example

The data in the collection is as follows:

{
   _id: ObjectId(7df78ad8902c)
   title: 'MongoDB Overview', 
   description: 'MongoDB is no sql database',
   by_user: 'tutorialpro.org',
   url: 'http://www.tutorialpro.org',
   tags: ['mongodb', 'database', 'NoSQL'],
   likes: 100
},
{
   _id: ObjectId(7df78ad8902d)
   title: 'NoSQL Overview', 
   description: 'No sql database is very fast',
   by_user: 'tutorialpro.org',
   url: 'http://www.tutorialpro.org',
   tags: ['mongodb', 'database', 'NoSQL'],
   likes: 10
},
{
   _id: ObjectId(7df78ad8902e)
   title: 'Neo4j Overview', 
   description: 'Neo4j is no sql database',
   by_user: 'Neo4j',
   url: 'http://www.neo4j.com',
   tags: ['neo4j', 'database', 'NoSQL'],
   likes: 750
},

Now, we calculate the number of articles written by each author using aggregate(), with the result as follows:

> db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : 1}}}])
{
   "result" : [
      {
         "_id" : "tutorialpro.org",
         "num_tutorial" : 2
      },
      {
         "_id" : "Neo4j",
         "num_tutorial" : 1
      }
   ],
   "ok" : 1
}
>

The above example is similar to the SQL statement:

select by_user, count(*) from mycol group by by_user

In the example above, we grouped the data by the by_user field and calculated the sum of the by_user field with the same value.

The following table shows some aggregation expressions:

Expression	Description	Example
$sum	Calculates the total sum.	db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : "$likes"}}}])
$avg	Calculates the average value.	db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$avg : "$likes"}}}])
$min	Gets the minimum value corresponding to all documents in the collection.	db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$min : "$likes"}}}])
$max	Gets the maximum value corresponding to all documents in the collection.	db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$max : "$likes"}}}])
$push	Adds values to an array without checking for duplicate values.	db.mycol.aggregate([{$group : {_id : "$by_user", url : {$push: "$url"}}}])
$addToSet	Adds a value to an array unless the value is already present, in which case it does nothing.	db.mycol.aggregate([{$group : {_id : "$by_user", url : {$addToSet : "$url"}}}])
$first	Gets the first document according to the sort order.	db.mycol.aggregate([{$group : {_id : "$by_user", first_url : {$first : "$url"}}}])
$last	Gets the last document according to the sort order.	db.mycol.aggregate([{$group : {_id : "$by_user", last_url : {$last : "$url"}}}])

Concept of Pipelines

Pipelines in Unix and Linux are generally used to pass the output of the current command as the input to the next command.

MongoDB's aggregation pipeline passes the MongoDB documents through a series of pipeline stages that process the documents. Pipeline operations can be repeated.

Expressions: Process input documents and produce output. Expressions are stateless and only process the current aggregation pipeline's documents; they cannot process other documents.

Here we introduce a few commonly used operations in the aggregation framework:

$project: Modifies the structure of the input documents. It can be used to rename, add, or remove fields, as well as to create computed fields and nested documents.
$match: Filters data to output only documents that meet the specified conditions. $match uses MongoDB's standard query operations.
$limit: Limits the number of documents returned by the MongoDB aggregation pipeline.
$skip: Skips the specified number of documents in the aggregation pipeline and returns the remaining documents.
$unwind: Deconstructs an array field in the documents into multiple documents, each containing a single value from the array.
$group: Groups documents in the collection, useful for aggregating results.
$sort: Sorts the input documents and outputs them.
$geoNear: Outputs documents ordered by proximity to a geographical location.

Examples of Pipeline Operators

$project Example

db.article.aggregate(
    { $project : {
        title : 1 ,
        author : 1 ,
    }}
);

This will result in documents containing only the _id, title, and author fields. By default, the _id field is included. To exclude the _id field:

db.article.aggregate(
    { $project : {
        _id : 0 ,
        title : 1 ,
        author : 1
    }}
);

$match Example

db.articles.aggregate( [
                        { $match : { score : { $gt : 70, $lte : 90 } } },
                        { $group: { _id: null, count: { $sum: 1 } } }
                       ] );

$match is used to retrieve records with a score greater than 70 and less than or equal to 90, then sends the matching records to the next stage, $group, for processing.

$skip Example

db.article.aggregate(
    { $skip : 5 });

After the $skip pipeline operator processes the documents, the first five documents are "filtered" out.

❮ Mongodb Regular Expression Mongodb Remove ❯