Ubuntu Manpage: mongoc_aggregate - Aggregation Framework Examples

Provided by: libmongoc-doc_1.23.1-1build1_all

NAME

       mongoc_aggregate - Aggregation Framework Examples

       This document provides a number of practical examples that display the capabilities of the
       aggregation framework.

       The Aggregations using the Zip Codes Data Set examples uses a publicly available data  set
       of  all  zipcodes  and  populations  in  the  United  States. These data are available at:
       zips.json.

REQUIREMENTS

       Let's check if everything is installed.

       Use the following command to load zips.json data set into mongod instance:

          $ mongoimport --drop -d test -c zipcodes zips.json

       Let's use the MongoDB shell to verify that everything was imported successfully.

          $ mongo test
          connecting to: test
          > db.zipcodes.count()
          29467
          > db.zipcodes.findOne()
          {
                "_id" : "35004",
                "city" : "ACMAR",
                "loc" : [
                        -86.51557,
                        33.584132
                ],
                "pop" : 6055,
                "state" : "AL"
          }

AGGREGATIONS USING THE ZIP CODES DATA SET

       Each document in this collection has the following form:

          {
            "_id" : "35004",
            "city" : "Acmar",
            "state" : "AL",
            "pop" : 6055,
            "loc" : [-86.51557, 33.584132]
          }

       In these documents:

       • The _id field holds the zipcode as a string.

       • The city field holds the city name.

       • The state field holds the two letter state abbreviation.

       • The pop field holds the population.

       • The loc field holds the location as a [latitude, longitude] array.

STATES WITH POPULATIONS OVER 10 MILLION

       To get all states with a population greater than 10 million, use the following aggregation
       pipeline:

       aggregation1.c

          #include <mongoc/mongoc.h>
          #include <stdio.h>

          static void
          print_pipeline (mongoc_collection_t *collection)
          {
             mongoc_cursor_t *cursor;
             bson_error_t error;
             const bson_t *doc;
             bson_t *pipeline;
             char *str;

             pipeline = BCON_NEW ("pipeline",
                                  "[",
                                  "{",
                                  "$group",
                                  "{",
                                  "_id",
                                  "$state",
                                  "total_pop",
                                  "{",
                                  "$sum",
                                  "$pop",
                                  "}",
                                  "}",
                                  "}",
                                  "{",
                                  "$match",
                                  "{",
                                  "total_pop",
                                  "{",
                                  "$gte",
                                  BCON_INT32 (10000000),
                                  "}",
                                  "}",
                                  "}",
                                  "]");

             cursor = mongoc_collection_aggregate (
                collection, MONGOC_QUERY_NONE, pipeline, NULL, NULL);

             while (mongoc_cursor_next (cursor, &doc)) {
                str = bson_as_canonical_extended_json (doc, NULL);
                printf ("%s\n", str);
                bson_free (str);
             }

             if (mongoc_cursor_error (cursor, &error)) {
                fprintf (stderr, "Cursor Failure: %s\n", error.message);
             }

             mongoc_cursor_destroy (cursor);
             bson_destroy (pipeline);
          }

          int
          main (void)
          {
             mongoc_client_t *client;
             mongoc_collection_t *collection;
             const char *uri_string =
                "mongodb://localhost:27017/?appname=aggregation-example";
             mongoc_uri_t *uri;
             bson_error_t error;

             mongoc_init ();

             uri = mongoc_uri_new_with_error (uri_string, &error);
             if (!uri) {
                fprintf (stderr,
                         "failed to parse URI: %s\n"
                         "error message:       %s\n",
                         uri_string,
                         error.message);
                return EXIT_FAILURE;
             }

             client = mongoc_client_new_from_uri (uri);
             if (!client) {
                return EXIT_FAILURE;
             }

             mongoc_client_set_error_api (client, 2);
             collection = mongoc_client_get_collection (client, "test", "zipcodes");

             print_pipeline (collection);

             mongoc_uri_destroy (uri);
             mongoc_collection_destroy (collection);
             mongoc_client_destroy (client);

             mongoc_cleanup ();

             return EXIT_SUCCESS;
          }

       You should see a result like the following:

          { "_id" : "PA", "total_pop" : 11881643 }
          { "_id" : "OH", "total_pop" : 10847115 }
          { "_id" : "NY", "total_pop" : 17990455 }
          { "_id" : "FL", "total_pop" : 12937284 }
          { "_id" : "TX", "total_pop" : 16986510 }
          { "_id" : "IL", "total_pop" : 11430472 }
          { "_id" : "CA", "total_pop" : 29760021 }

       The above aggregation pipeline is build from two pipeline operators: $group and $match.

       The  $group  pipeline  operator  requires  _id  field where we specify grouping; remaining
       fields specify how to generate composite value and must use one of the  group  aggregation
       functions:  $addToSet,  $first,  $last, $max, $min, $avg, $push, $sum. The $match pipeline
       operator syntax is the same as the read operation query syntax.

       The $group process reads all documents and for each state it creates a separate  document,
       for example:

          { "_id" : "WA", "total_pop" : 4866692 }

       The total_pop field uses the $sum aggregation function to sum the values of all pop fields
       in the source documents.

       Documents created by $group are piped to the $match  pipeline  operator.  It  returns  the
       documents with the value of total_pop field greater than or equal to 10 million.

AVERAGE CITY POPULATION BY STATE

       To  get  the  first  three  states  with the greatest average population per city, use the
       following aggregation:

          pipeline = BCON_NEW ("pipeline", "[",
             "{", "$group", "{", "_id", "{", "state", "$state", "city", "$city", "}", "pop", "{", "$sum", "$pop", "}", "}", "}",
             "{", "$group", "{", "_id", "$_id.state", "avg_city_pop", "{", "$avg", "$pop", "}", "}", "}",
             "{", "$sort", "{", "avg_city_pop", BCON_INT32 (-1), "}", "}",
             "{", "$limit", BCON_INT32 (3) "}",
          "]");

       This aggregate pipeline produces:

          { "_id" : "DC", "avg_city_pop" : 303450.0 }
          { "_id" : "FL", "avg_city_pop" : 27942.29805615551 }
          { "_id" : "CA", "avg_city_pop" : 27735.341099720412 }

       The above aggregation pipeline is build from three pipeline operators: $group,  $sort  and
       $limit.

       The first $group operator creates the following documents:

          { "_id" : { "state" : "WY", "city" : "Smoot" }, "pop" : 414 }

       Note, that the $group operator can't use nested documents except the _id field.

       The second $group uses these documents to create the following documents:

          { "_id" : "FL", "avg_city_pop" : 27942.29805615551 }

       These  documents  are  sorted  by the avg_city_pop field in descending order. Finally, the
       $limit pipeline operator returns the first 3 documents from the sorted set.

AUTHOR

       MongoDB, Inc

COPYRIGHT

       2017-present, MongoDB, Inc