How To Analyze MongoDB Data In Tableau

The speed, flexibility and ease of setup of NoSQL-style databases have all contributed to the rapid rise in popularity of MongoDB in recent years.

Despite its advantages in speed and flexibility, though, MongoDB can still be a bit of a pain when it comes to actually analyzing your data, because so many of the analytic approaches and techniques used by data analysts and data scientists currently depend on the more structured, tabular format common to traditional relational databases.

At Panoply, we get a lot of customers who are looking for ways to make analyzing their MongoDB data easier, especially when it needs to be integrated with the rest of their data in an analytics warehouse.

Today, we’re going to explore how you can use Panoply to collect, process and prepare your MongoDB-stored data for analysis in just a few steps by taking advantage of our automated data warehousing approach, and we’ll do it using Tableau, one of the most popular BI suites.

If you want to follow along, we’ll be using a pre-made dataset of city inspection data available from Github that you can load into your own MongoDB instance before starting–or just use your own data. Learn more from MongoDB Training

Pulling data from MongoDB into your Panoply Data Warehouse

First, you’ll need to connect your MongoDB instance to Panoply (note: if you’ve already done this, feel free to skip ahead to a later section). From your Panoply dashboard, click on Data Sources in the left pane, then hit Add Data Source in the upper right:

Which will take you into the data source selection section. MongoDB will be under Databases (in the leftmost column):

Selecting MongoDB will open a new pane where you can enter your connection details, which will look something like this:

Note that you’ll need a couple different pieces of information to connect Panoply to your MongoDB database:

  • Username: the username you use to connect to your MongoDB database remotely
  • Password: the password associated with that username
  • Host: the address where your MongoDB database is hosted
  • Port: the port your MongoDB database uses for connections
  • Database name: the name of your MongoDB database. for more MongoDB Certification

As you can see in the screengrab above, you’ll need to plug all that information into a MongoDB connection URI, which will have the general form of:

mongodb://[username:password@]host1:[port1],[...hostN:portN]][/[database][?options]]

Or, more concretely, it should look something like this:

mongodb://myDBReader:D1fficultP%40ssw0rd@mongodb0.example.com:27017/admin

Note that Panoply encrypts all login information provided by our users. Once you’ve entered your URI, you can expand the Advanced Options section and set your preferences. In the Advanced Options section, you’ll be able to set the following:

  • Destination: allows you to give a name to the table that will be created in Panoply after importing, or a table-specific prefix if you’re planning on importing multiple tables
  • Primary Key: allows you to set a primary key that will act as a unique identifier for each record in your table(s), and will also allow you to link related data across tables. If you already have a field in your tables called “id” or something similar, Panoply will use it as a primary key if no other parameter is set. If you don’t set a primary at all, you won’t see any deduplication in your data imports, so every single piece of data from the collection run will be added to your tables–you might not want that. Since your are using MongoDB, the default will be _id.
  • Incremental Key: make sure to set this up if you’re planning on making multiple, recurring pulls from this data source. Doing so will make future imports more efficient, as Panoply will only collect data that has been updated since the last time you pulled from that source. Note that incremental keys will only work properly if you are pulling from a single table with this connector.
  • Exclude: you can use this field to exclude specific data elements from your import. If there are specific types of data (e.g. irrelevant or sensitive data) that you would like to exclude from your Panoply data warehouse, you can use this setting to manage that. You can exclude nested fields with dot notation. So if my “users” top level object had a field in it called “email”, you could exclude that by putting “users.email” in the exclude section in Panoply. get additional info from Tableau Training

Once you’ve set all that up, click Collect. The Data sources – MongoDB pane will go gray while the process is underway, and a green status bar will appear at the top of the pane. You don’t need to stay and watch it, though.

Panoply is designed to allow for multiple, parallel data collection processes, so you can move on to your next data source–and the one after that–while your MongoDB data is being ingested.

Panoply will send you an alert when your data collection run is finished, but you can also monitor progress directly or cancel jobs in the Jobs pane.

Once all your data collection is finished, you can head to the Tables pane and get a bird’s eye view of all your fresh, processed data. If you used the same sample data we did, your data should have gone from a collection called Inspections looking something like this:

{"_id":{"$oid":"56d61033a378eccde8a8354f"},"id":"10021-2015-ENFO","certificate_number":9278806,"business_name":"ATLIXCO DELI GROCERY INC.","date":"Feb 20 2015","result":"No Violation Issued","sector":"Cigarette Retail Dealer - 127","address":

{"city":"RIDGEWOOD","zip":11385,"street":"MENAHAN ST","number":1712}}

{"_id":{"$oid":"56d61033a378eccde8a83550"},"id":"10057-2015-ENFO","certificate_number":6007104,"business_name":"LD BUSINESS SOLUTIONS","date":"Feb 25 2015","result":"Violation Issued","sector":"Tax Preparers - 891","address":{"city":"NEW YORK","zip":10030,"street":"FREDERICK DOUGLASS BLVD","number":2655}}

{"_id":{"$oid":"56d61033a378eccde8a83551"},"id":"10084-2015-ENFO","certificate_number":9278914,"business_name":"MICHAEL GOMEZ RANGHALL","date":"Feb 10 2015","result":"No Violation Issued","sector":"Locksmith - 062","address":{"city":"QUEENS VLG","zip":11427,"street":"214TH ST","number":8823}}

Connecting Tableau to your MongoDB data

Now that your data is all loaded into Panoply and organized into standardized columnar tables, you’ll find it’s a lot easier to analyze with a BI tool like Tableau. Connecting Tableau to your Panoply data warehouse is a pretty simple process, since it uses Tableau’s native Redshift connector. learn more info from MongoDB Online Training

Finding your Panoply connection parameters

Before you start, though, you’ll need to gather your connection details from Panoply. You can find this by clicking Connect on the navigation pane in your Panoply dashboard, which will open a new pane with all of the relevant connection details, like so:

Setting up the connection in Tableau

Now open a new Tableau session, and start a new connection by going to Data > New Data Source from the menu:

Choose Amazon Redshift. This will open a dialog where you can enter the connection details you got from Panoply:

Once Tableau is connected to your Panoply data warehouse, you will be able to select which data tables you want to use for analysis. Select the Public schema on the left hand side, then drag the tables you’d like to analyze to the top top pane (indicated by the text “Drag tables here”).

You should see the tables you selected laid out in a relational graph now. Since we used the city inspections dataset from the Github collection mentioned above, our data from Panoply now looks like this in Tableau:

Analyzing your MongoDB data in Tableau

You’ve now created an analysis pipeline by linking your MongoDB database to your Panoply instance and are all set up to analyze your MongoDB in Tableau. Let’s take a look at the dataset we pulled from Github.

A brief glance at the raw JSON shows that it’s got records of New York City inspections from all over the city representing a number of different business types. It also shows whether a violation was issued after the inspection, but it doesn’t show the type of violation.

Let’s start by looking at the geographical distribution of violations using one of Tableau’s pretty mapping features. Once you’ve connected to your Panoply data source, you can head to your first worksheet.

At this point, Tableau has probably even popped up a little flag in the lower left corner of your screen to help you along:

To get in-depth knowledge, enroll for a live free demo on Tableau Online Training

Leave a comment

Design a site like this with WordPress.com
Get started