Friday, May 27, 2016

My Journey with MongoDB

In 2011, I had been a SQL Server guy (development, DBA work and warehousing) for quite some time, and I never set out to find a new data store. I was working as a DBA at the time, and we were asked to look into a new data storage technology called Cassandra. Back then you had to build things manually, which was outside of what our team did, so we never went anywhere with it.

That quick look, however, got me looking at other database technologies, and that's when I came across MongoDB. This was around the time version 1.8 came out. I started looking at MongoDB to see what it was all about. I was still stuck in my relational ways, but I thought there was something to this new tech. I went out and bought Kristina Chodorow's MongoDB: The Definitive Guide and Kyle Banker's MongoDB in Action. After reading through these books and playing around, my mind started to grasp the concepts and began to break free from the confines of the relational world. I started seeing how denormalizing the data into a single document (storing duplicate data in multiple documents) made sense: you could store all the data that belongs together and wouldn't have to pay the overhead of joining tables. The more I worked with MongoDB, the more it made sense.

Shortly after this, another team in the company was looking at using MongoDB as the backend data store for a project they were working on. The goal was to build something similar to Google Analytics that we could put on our clients' websites and build custom tracking for them. Since I wanted some practical experience with the technology, I moved over to that team and spent a month building out a prototype of the system. Unfortunately, the developer I was working with decided to go in a different direction, and the project was scrapped.

This had given me a taste of what MongoDB could do, and I kept working with it in a personal capacity for a while. At the end of 2011, MongoDB University released its first two courses, and I decided to take M201. I wasn't sure how involved the courses would be, so I only wanted to take one at a time. After I passed that course, I took M101 (later renamed M101P) and passed it as well. After I completed M101, I reached out to Andrew Erlichson about becoming a TA for the courses. I was brought on and spent the next two years supporting the M101J, M101JS and M201 courses at various times. The education team was a pleasure to work with, and I will always be grateful to them for letting me spend that time with them. I also learned a lot from the students along the way. For my work with the team, they honored me by giving me the first MongoDB DBA certification, back in 2013.

After two years of helping out on the education team, I made the hard choice to step away from my role as a TA. My available free time had shrunk, and I felt that I wasn't giving the students (and the other TAs) the time and attention the role required. However, I wasn't done with MongoDB by any means.

About four months later, I was contacted about joining the MongoDB Advocacy Hub, and I have been enjoying helping to promote MongoDB in various ways. It's been great working with the community team members and getting to know them. I'm slowly getting into blogging now, with encouragement from the marketing and community team members at MongoDB. For this work, I was honored to be chosen as the MongoDB Giant of the Month for January 2016, and I was recently selected to join the MongoDB Masters Program for 2016.

Most recently, I have taken over as organizer of the Denver/Boulder MongoDB Meetup Group. This has been a big change for me, as I'm more comfortable working behind the scenes than organizing meetups and being front and center. It has only been a few months since I started this leg of my journey, but I've met some wonderful people, and I'm glad to see the local community coming out and learning from our speakers. If you're ever in the area, stop on in.

My journey with MongoDB has taken me down several different paths, and I'm glad to have traveled them all. I'm not sure where the adventure will take me in the future, but I know that I'll meet great people along the way and will never be alone.

It's interesting to look back and realize that I came across the product by chance. Had we not taken a look at Cassandra back then, who knows whether I would have paid attention to MongoDB. I have stayed with it for five years now because of the value and ease of working with the product, and because of the great staff and community the company has assembled and built.

Wednesday, May 25, 2016

Five Easy Ways to Get Started With MongoDB

1. Download the community edition and play
The community edition is freely available, and there are installers for all major operating systems (Linux, OS X, Windows and Solaris), so getting a system up and running is quick and easy. It should only take a few minutes to install MongoDB, and then you're ready to start playing and learning how to work with the fastest-growing and most popular NoSQL database.

2. Work through the getting started guide
MongoDB provides a getting started guide for the `mongo` shell, Python, Java, C++, NodeJS and C#. Working through the examples provided there will get you up and going with the basics in no time. From there you'll be ready to move on to more involved examples and walkthroughs that you might find on the internet.

3. Take a course at MongoDB University
MongoDB has the best set of online learning courses out there. You can currently learn to code against MongoDB in four different languages: Python, Java, NodeJS and .NET. There are two DBA courses covering basic and advanced topics, and an advanced-level course on cluster management. These courses are seven weeks long, with the exception of the cluster management course, which is only two weeks. Having been through most of these courses, I wholeheartedly recommend them. They are well done, and the forums are moderated by a great group of online TAs with deep knowledge of the respective course topics.

4. Read Kristina Chodorow's MongoDB: The Definitive Guide
If you only read one book on MongoDB, this is the book to read. Even though the second edition came out in 2013 and covers MongoDB 2.4, you will still learn a lot by reading it. It is also a must-read for anyone studying for the certification exams. Rumor has it that a third edition is in the works, but I've not seen anything definitive on that. Once it comes out, I'll definitely be grabbing a copy as quickly as I can.

5. Join your local MongoDB user group
Perhaps the most fun way to learn MongoDB, and to move from the basics to more advanced topics, is to join your local MongoDB User Group (MUG). These groups are frequented by very passionate MongoDB users, and you'll find talks ranging from beginner to advanced topics. Just as important as the talks are the conversations you'll have and the friendships you'll make with your fellow community members.

MongoDB makes it easy for anyone to get started with their database, and they have built a great community to help answer any questions new users might have. Very few products have that kind of community support and buy-in. Beyond the community, the employees I have met and worked with are some of the nicest people around. They've always made time to help me through problems without ever making me feel rushed. If MongoDB fits your use case, definitely install it and put it through its paces; you might be surprised at how easy and quick it is to get things running.

Monday, February 15, 2016

Getting distinct values from MongoDB

MongoDB has a function called distinct that allows you to get a list of all the distinct values for a single key in a given collection. Let's say that you wanted to see all of the distinct values for the key OffenseTeam in a collection called pbp_2015. You would use the following statement:

db.pbp_2015.distinct("OffenseTeam")

The result of this statement is an array with each distinct value for the given key. Your results would look something like this:

[
  "",
  "WAS",
  "BUF",
  "CLE",
  "NYJ",
  ⋮
  "JAC",
  "MIA",
  "SEA",
  "GB",
  "IND",
  "STL"
]

Note: results truncated for brevity.
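To make the semantics concrete, here is a minimal sketch in plain JavaScript (runnable with Node.js) that mimics what distinct returns for an in-memory array of documents. It is a simplification, not how the server implements the command: the helper name `distinctValues` and the sample documents are hypothetical, and array-valued fields and dotted paths, which the real command does handle, are ignored here.

```javascript
// Simplified, in-memory illustration of db.collection.distinct(key).
// Assumptions: documents are flat objects; array-valued fields and
// dotted paths are NOT expanded the way the real command does.
function distinctValues(docs, key) {
  const seen = new Set(); // values already returned
  const result = [];
  for (const doc of docs) {
    if (!(key in doc)) continue; // this sketch skips documents without the key
    const value = doc[key];
    if (!seen.has(value)) {
      seen.add(value);
      result.push(value); // first-seen order; no sorting is applied
    }
  }
  return result;
}

// Hypothetical documents shaped like a few rows of pbp_2015.
const plays = [
  { OffenseTeam: "WAS", Quarter: 1 },
  { OffenseTeam: "BUF", Quarter: 1 },
  { OffenseTeam: "WAS", Quarter: 2 },
  { OffenseTeam: "", Quarter: 2 },
  { Quarter: 3 }, // no OffenseTeam field at all
];

console.log(distinctValues(plays, "OffenseTeam")); // [ 'WAS', 'BUF', '' ]
```

An empty string is a value like any other, which is why "" shows up as one of the entries in the real results above.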

You can also provide a query to this function to view the distinct values for a subset of the data.

db.pbp_2015.distinct("OffenseTeam", {"PlayType": "FIELD GOAL", "Quarter": 4, "Down": {"$ne": 4}})

This shows which teams attempted a field goal in the fourth quarter of a game when they were not facing fourth down. The results show that 9 teams did this in the 2015 NFL season.

[
  "JAC",
  "BAL",
  "NO",
  "TB",
  "CHI",
  "KC",
  "OAK",
  "NYJ",
  "MIN"
]
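The query acts as a filter that runs first: only matching documents feed the distinct list. The plain JavaScript sketch below writes that query as an ordinary predicate function; `distinctWhere`, `isLateNonFourthDownFieldGoal` and the sample rows are hypothetical. One detail worth mirroring is that $ne also matches documents that have no Down field at all, and `doc.Down !== 4` behaves the same way because a missing property reads as `undefined`.

```javascript
// Distinct values of `key`, restricted to documents that pass `predicate`.
// A hand-rolled stand-in for db.pbp_2015.distinct(key, query).
function distinctWhere(docs, key, predicate) {
  const seen = new Set();
  for (const doc of docs) {
    if (predicate(doc) && key in doc) seen.add(doc[key]);
  }
  return [...seen]; // a Set keeps insertion order
}

// Predicate equivalent of
// {"PlayType": "FIELD GOAL", "Quarter": 4, "Down": {"$ne": 4}}
const isLateNonFourthDownFieldGoal = (doc) =>
  doc.PlayType === "FIELD GOAL" && doc.Quarter === 4 && doc.Down !== 4;

// Hypothetical sample rows.
const plays = [
  { OffenseTeam: "JAC", PlayType: "FIELD GOAL", Quarter: 4, Down: 3 },
  { OffenseTeam: "JAC", PlayType: "FIELD GOAL", Quarter: 4, Down: 2 },
  { OffenseTeam: "BAL", PlayType: "FIELD GOAL", Quarter: 4, Down: 4 }, // fourth down: excluded
  { OffenseTeam: "NO", PlayType: "FIELD GOAL", Quarter: 2, Down: 3 },  // second quarter: excluded
  { OffenseTeam: "TB", PlayType: "PASS", Quarter: 4, Down: 1 },        // not a field goal: excluded
];

console.log(distinctWhere(plays, "OffenseTeam", isLateNonFourthDownFieldGoal)); // [ 'JAC' ]
```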

While this is great for getting distinct values for a single field, what do you do if you need to get a list of the distinct combination of two or more fields?

One option is to use the aggregation framework to get this information.

The following is an example:

db.pbp_2015.aggregate(
    [
        {
            "$match": {
                "PlayType": "FIELD GOAL",
                "Down": {"$ne": 4},
                "Quarter": 4
            }
        },
        {
            "$project": {
                "OffenseTeam": 1,
                "DefenseTeam": 1,
                "_id": 0
            }
        },
        {
            "$group": {
                "_id": {
                    "OffenseTeam": "$OffenseTeam",
                    "DefenseTeam": "$DefenseTeam"
                }
            }
        },
        {
            "$sort": {
                "_id": 1
            }
        }
    ]
)

The above aggregation is similar to the prior distinct command: it uses the same match criteria ({"PlayType": "FIELD GOAL", "Down": {"$ne": 4}, "Quarter": 4}). This time, however, we want to see not only the offensive team but the defensive team as well. Since those are the only fields we care about, we $project them out. Next we $group on OffenseTeam and DefenseTeam by combining them into a compound _id, which yields one document per distinct pair. Finally, we $sort the documents on _id to make the data easier to read.
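For readers who think more easily in ordinary code, the sketch below walks through the same four stages over an in-memory array in plain JavaScript. It illustrates the idea rather than how the server executes a pipeline; `distinctPairs` and the sample rows are hypothetical. The compound _id is turned into a string with `JSON.stringify` so it can serve as a `Map` key, and the final sort compares OffenseTeam first and then DefenseTeam, following the field order of the _id document, which is how MongoDB compares embedded documents.

```javascript
// In-memory walk-through of $match -> $project -> $group -> $sort.
function distinctPairs(docs, matches) {
  const groups = new Map(); // stringified _id -> { _id: {...} }
  for (const doc of docs) {
    if (!matches(doc)) continue; // $match
    // $project + $group: keep only the two fields, as a compound _id
    const id = { OffenseTeam: doc.OffenseTeam, DefenseTeam: doc.DefenseTeam };
    const groupKey = JSON.stringify(id);
    if (!groups.has(groupKey)) groups.set(groupKey, { _id: id });
  }
  // $sort {_id: 1}: compare OffenseTeam, then DefenseTeam
  const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);
  return [...groups.values()].sort(
    (x, y) =>
      cmp(x._id.OffenseTeam, y._id.OffenseTeam) ||
      cmp(x._id.DefenseTeam, y._id.DefenseTeam)
  );
}

// Same filter as {"PlayType": "FIELD GOAL", "Down": {"$ne": 4}, "Quarter": 4}
const isLateNonFourthDownFieldGoal = (doc) =>
  doc.PlayType === "FIELD GOAL" && doc.Quarter === 4 && doc.Down !== 4;

// Hypothetical sample rows; the last one repeats an existing pair.
const plays = [
  { OffenseTeam: "NO", DefenseTeam: "DAL", PlayType: "FIELD GOAL", Quarter: 4, Down: 3 },
  { OffenseTeam: "BAL", DefenseTeam: "SD", PlayType: "FIELD GOAL", Quarter: 4, Down: 2 },
  { OffenseTeam: "NO", DefenseTeam: "ATL", PlayType: "FIELD GOAL", Quarter: 4, Down: 3 },
  { OffenseTeam: "NO", DefenseTeam: "DAL", PlayType: "FIELD GOAL", Quarter: 4, Down: 1 },
];

const pairs = distinctPairs(plays, isLateNonFourthDownFieldGoal);
console.log(pairs.map((p) => `${p._id.OffenseTeam}/${p._id.DefenseTeam}`));
// [ 'BAL/SD', 'NO/ATL', 'NO/DAL' ]
```

The duplicate NO/DAL row collapses into a single group, which is exactly what makes $group usable as a multi-key distinct.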

The results look like the following:

{
  "waitedMS": NumberLong("0"),
  "result": [
    {"_id": {"OffenseTeam": "BAL", "DefenseTeam": "PIT"}},
    {"_id": {"OffenseTeam": "BAL", "DefenseTeam": "SD"}},
    {"_id": {"OffenseTeam": "BAL", "DefenseTeam": "STL"}},
    {"_id": {"OffenseTeam": "CHI", "DefenseTeam": "DET"}},
    {"_id": {"OffenseTeam": "CHI", "DefenseTeam": "OAK"}},
    {"_id": {"OffenseTeam": "JAC", "DefenseTeam": "BAL"}},
    {"_id": {"OffenseTeam": "JAC", "DefenseTeam": "IND"}},
    {"_id": {"OffenseTeam": "KC", "DefenseTeam": "CHI"}},
    {"_id": {"OffenseTeam": "MIN", "DefenseTeam": "CHI"}},
    {"_id": {"OffenseTeam": "NO", "DefenseTeam": "ATL"}},
    {"_id": {"OffenseTeam": "NO", "DefenseTeam": "DAL"}},
    {"_id": {"OffenseTeam": "NO", "DefenseTeam": "NYG"}},
    {"_id": {"OffenseTeam": "NYJ", "DefenseTeam": "NE"}},
    {"_id": {"OffenseTeam": "OAK", "DefenseTeam": "DEN"}},
    {"_id": {"OffenseTeam": "TB", "DefenseTeam": "HOU"}}
  ],
  "ok": 1
}

Here you can see that although only 9 teams kicked a field goal in the fourth quarter when they weren't facing fourth down, several of those teams (BAL, CHI, JAC and NO) did so against more than one opponent, and therefore in more than one game.
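To double-check that reading, the plain JavaScript sketch below tallies the 15 pairs copied from the aggregation output above per offensive team. Inside MongoDB, the same tally could be done by appending a second group stage such as {"$group": {"_id": "$_id.OffenseTeam", "opponents": {"$sum": 1}}}; this offline version is only a check, and `opponentsPerTeam` is a hypothetical helper.

```javascript
// The 15 (OffenseTeam, DefenseTeam) pairs from the aggregation output above.
const pairs = [
  ["BAL", "PIT"], ["BAL", "SD"], ["BAL", "STL"],
  ["CHI", "DET"], ["CHI", "OAK"],
  ["JAC", "BAL"], ["JAC", "IND"],
  ["KC", "CHI"], ["MIN", "CHI"],
  ["NO", "ATL"], ["NO", "DAL"], ["NO", "NYG"],
  ["NYJ", "NE"], ["OAK", "DEN"], ["TB", "HOU"],
];

// Count opponents per offensive team (each pair is already distinct).
function opponentsPerTeam(pairList) {
  const counts = new Map();
  for (const [offense] of pairList) {
    counts.set(offense, (counts.get(offense) || 0) + 1);
  }
  return counts;
}

const counts = opponentsPerTeam(pairs);
const repeatTeams = [...counts].filter(([, n]) => n > 1).map(([team]) => team);

console.log(counts.size); // 9
console.log(repeatTeams); // [ 'BAL', 'CHI', 'JAC', 'NO' ]
```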

As you can see, the aggregation framework lets you get the distinct values over multiple keys. This information can be used for a variety of purposes, such as the question we answered above. Another use is to gauge the selectivity of the keys in an index.
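As a rough illustration of the selectivity idea, one common heuristic is the ratio of distinct values (or distinct combinations of values) to the total number of documents: the closer it is to 1, the more a single index key narrows a lookup. The sketch computes that ratio in plain JavaScript over an in-memory array; `keySelectivity` and the sample rows are hypothetical. In the shell, the same numbers could come from the length of the distinct result (or the number of documents the aggregation returns) divided by db.pbp_2015.count().

```javascript
// Ratio of distinct key combinations to documents: 1.0 means every
// document has a unique combination; values near 0 mean many repeats.
function keySelectivity(docs, keys) {
  if (docs.length === 0) return 0;
  const combos = new Set(
    docs.map((doc) => JSON.stringify(keys.map((k) => doc[k])))
  );
  return combos.size / docs.length;
}

// Hypothetical sample rows.
const plays = [
  { OffenseTeam: "NO", DefenseTeam: "DAL", Quarter: 4 },
  { OffenseTeam: "NO", DefenseTeam: "ATL", Quarter: 4 },
  { OffenseTeam: "NO", DefenseTeam: "DAL", Quarter: 1 },
  { OffenseTeam: "BAL", DefenseTeam: "SD", Quarter: 4 },
];

console.log(keySelectivity(plays, ["Quarter"]));                    // 0.5
console.log(keySelectivity(plays, ["OffenseTeam", "DefenseTeam"])); // 0.75
```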

Data for the samples can be found here. The data was cleaned up using code from this MongoDB blog post.