Tuesday, April 01, 2014

MongoDB Aggregation Query for Justin John Mathews

Recently on LinkedIn, Justin John Mathews asked a question about being able to return data in a certain format. The input and output records are below. Input
db.users.insert({ courseId: 1, stDt: new Date(2014, 01, 01), endDt: new Date(2014, 01, 20), active: false });
db.users.insert({ courseId: 1, stDt: new Date(2014, 01, 25), endDt: new Date(2014, 02, 10), active: false });
db.users.insert({ courseId: 1, stDt: new Date(2014, 02, 25), endDt: new Date(2014, 03, 10), active: true });
db.users.insert({ courseId: 1, stDt: new Date(2014, 02, 28), endDt: new Date(2014, 06, 10), active: true });
db.users.insert({ courseId: 1, stDt: new Date(2014, 02, 25), endDt: new Date(2014, 02, 30), active: false });
db.users.insert({ courseId: 1, stDt: new Date(2014, 05, 25), endDt: new Date(2014, 10, 30), active: false });
db.users.insert({ courseId: 1, stDt: new Date(2013, 10, 01), endDt: new Date(2014, 08, 10), active: true });
db.users.insert({ courseId: 1, stDt: new Date(2014, 09, 01), endDt: new Date(2014, 11, 30), active: false });
Output
{
    "result" : {
        true : [
            { endDt: new Date(2014, 03, 10), active: true },
            { endDt: new Date(2014, 06, 10), active: true },
            { endDt: new Date(2014, 08, 10), active: true },
        ],
        false: [
            { endDt: new Date(2014, 01, 20), active: false },
            { endDt: new Date(2014, 02, 10), active: false },
            { endDt: new Date(2014, 02, 30), active: false },
            { endDt: new Date(2014, 10, 30), active: false },
            { endDt: new Date(2014, 11, 30), active: false }
        ]
    }
}
The code below is one possible solution. I'm not necessarily happy with everything yet, but it does produce the same results as requested, with two small exceptions. First, the code below displays the ISODate instead of the JavaScript date method which I'm assuming is what should have been in the request as you can't return what's shown above. The other small difference is that since I'm using the aggregation framework with MongoDB 2.4.9, the result is a key of result which contains an array of subdocuments as it's value. There were quite a bit of hoops I jumped through to get this solution, and it could probably be cleaned up quite a bit, but I thought I would throw this post together to let him see that it can be done with a single command if you're patient enough to keep tweaking your pipeline. ;)
db.users.aggregate([

    // Project only what we need.
    {
        "$project": {
            "endDt": 1,
            "active": 1,
            "_id": 0
        }
    },

    // Group to true and false buckets. This will give us arrays with null.
    // We'll remove them in a bit.
    {
        "$group": {
            "_id": "$active",
            "true": {
                "$addToSet": {
                    "$cond": [
                        {"$eq": ["$active", true]},
                        "$endDt",
                        null
                    ]
                }
            },
            "false": {
                "$addToSet": {
                    "$cond": [
                        {"$eq": ["$active", false]},
                        "$endDt",
                        null
                    ]
                }
            }
        }
    },

    // We need to unwind our arrays so we can build them back up without the
    // "null" array.
    {"$unwind": "$true"},
    {"$unwind": "$false"},

    // Project out the values. This will give both a true and false key for
    // each item. This builds our arrays up with the proper endDt and
    // active values.
    {
        "$project": {
            "true": {"endDt": "$true", "active": "$_id"}, 
            "false": {"endDt": "$false", "active": "$_id"}
        }
    },

    // Project out a single value to clean up the "issue" a couple steps above.
    {
        "$project": {
            "value": {
                "$cond": [
                    {"$eq": ["$_id", true]},
                    "$true",
                    "$false"
                ]
            }
        }
    },

    // Group things up again to rebuild our arrays.
    // This adds a single "null" entry that will need to be cleaned up.
    {
        "$group": {
            "_id": null,
            "true": {
                "$addToSet": {
                     "$cond": [
                         {"$eq": ["$_id", true]},
                         "$value",
                         null
                     ]
                }
            },
            "false": {
                "$addToSet": {
                    "$cond": [
                        {"$eq": ["$_id", false]},
                        "$value",
                        null
                    ]
                }
            }
        }
    },

    // Unwind our arrays again so we can clean up one more time.
    {"$unwind": "$true"},
    {"$unwind": "$false"},

    // Match only documents where true and false are not null.
    {
        "$match": {
            "true": {"$ne": null},
            "false": {"$ne": null}
        }
    },

    // Sort our items so we can add the to the array in the correct order.
    // I'm not sure why it has to be descending order, but it works.
    {
        "$sort": {
            "true.endDt": -1,
            "false.endDt": -1
        }
    },

    // Group again to build our array.
    {
        "$group": {
            "_id": null,
            "true": {"$addToSet": "$true"},
            "false": {"$addToSet": "$false"}
        }
    },

    // Once again project out just the fields we need
    {
        "$project": {
            "true": 1,
            "false": 1,
            "_id": 0
        }
    }
])
Here are the returned results in MongoDB 2.2.x - 2.4.x:
{
    "result" : [
        {
            "true" : [
                {
                    "endDt" : ISODate("2014-04-10T06:00:00Z"),
                    "active" : true
                },
                {
                    "endDt" : ISODate("2014-07-10T06:00:00Z"),
                    "active" : true
                },
                {
                    "endDt" : ISODate("2014-09-10T06:00:00Z"),
                    "active" : true
                }
            ],
            "false" : [
                {
                    "endDt" : ISODate("2014-02-20T07:00:00Z"),
                    "active" : false
                },
                {
                    "endDt" : ISODate("2014-03-10T06:00:00Z"),
                    "active" : false
                },
                {
                    "endDt" : ISODate("2014-03-30T06:00:00Z"),
                    "active" : false
                },
                {
                    "endDt" : ISODate("2014-11-30T07:00:00Z"),
                    "active" : false
                },
                {
                    "endDt" : ISODate("2014-12-30T07:00:00Z"),
                    "active" : false
                }
            ]
        }
    ],
    "ok" : 1
}
And in the soon to be released version 2.6.0, the results look like the following:
{
    "true" : [
        {
            "endDt" : ISODate("2014-04-10T06:00:00Z"),
            "active" : true
        },
        {
            "endDt" : ISODate("2014-07-10T06:00:00Z"),
            "active" : true
        },
        {
            "endDt" : ISODate("2014-09-10T06:00:00Z"),
            "active" : true
        }
    ],
    "false" : [
        {
            "endDt" : ISODate("2014-02-20T07:00:00Z"),
            "active" : false
        },
        {
            "endDt" : ISODate("2014-03-10T06:00:00Z"),
            "active" : false
        },
        {
            "endDt" : ISODate("2014-03-30T06:00:00Z"),
            "active" : false
        },
        {
            "endDt" : ISODate("2014-11-30T07:00:00Z"),
            "active" : false
        },
        {
            "endDt" : ISODate("2014-12-30T07:00:00Z"),
            "active" : false
        }
    ]
}

3 comments:

  1. Thank you Doug...

    I will never go in this way unless you provided it.

    Earlier, I tried by changing the result structure, so that I can retrieve the results. The following query is used to retrieve the results.

    db.users.aggregate({'$group' : {'_id' : '$active', 'endDt' : {'$addToSet' : { 'endDt': '$endDt', 'active': '$active'}}});

    I will try to explain Second Case mentioned in post:

    Does following document be valid?

    { "result" : { doc:
    { endDt: new Date(2014, 01, 20), active: false },
    { endDt: new Date(2014, 02, 10), active: false },
    { endDt: new Date(2014, 02, 30), active: false },
    [
    { endDt: new Date(2014, 03, 10), active: true },
    { endDt: new Date(2014, 06, 10), active: true },
    { endDt: new Date(2014, 08, 10), active: true }
    ],
    { endDt: new Date(2014, 10, 30), active: false },
    { endDt: new Date(2014, 11, 30), active: false }
    } }

    Here "active" field record to "false" will have object for each "endDt" field, but the "active" field to true records are placed as array at the position of last "endDt"(new Date(2014, 08, 10)) field which have "active" set to "true".

    ReplyDelete
  2. Hi Justin,

    Unfortunately that's still not an valid document as your array of "actives" has no name, and you've got a list of subdocuments and only one of those has a key name. You could assign each group to a buckets for "past", "current" and "upcoming" which would look similar to the following.

    {
    "result": {
    "past" : [
    { "endDt" : ISODate("2014-02-20T07:00:00Z"), "active" : false },
    { "endDt" : ISODate("2014-03-10T06:00:00Z"), "active" : false },
    { "endDt" : ISODate("2014-03-30T06:00:00Z"), "active" : false }
    ],
    "current" : [
    { "endDt" : ISODate("2014-04-10T06:00:00Z"), "active" : true },
    { "endDt" : ISODate("2014-07-10T06:00:00Z"), "active" : true },
    { "endDt" : ISODate("2014-09-10T06:00:00Z"), "active" : true }
    ],
    "upcoming": [
    { "endDt" : ISODate("2014-11-30T07:00:00Z"), "active" : false },
    { "endDt" : ISODate("2014-12-30T07:00:00Z"), "active" : false }
    ]
    }

    Is that what you're looking for? Also should a class that's listed as "active": false show up in the middle array?

    ReplyDelete
  3. Yes. To an extend it is correct. I think, you will try to sort the records according to "endDt".
    But, it won't be correct scenario, as there can be end dates('endDt') later than "ISODate('2014-12-30T07:00:00Z')" with "active: "true".

    I have another field called startDate('startDt') in each record. Each document that comes in between start date and end date is "current", start date and end date are over as "past" and start and end date are future dates then "upcoming". I added "active" in order to make the querying fast. Now, I think there is no need to have it as "active" there.

    Can you please give correct way to deal with this?

    ReplyDelete