Using MongoDB with Community Data Access (CDA)

I’ve covered how to get data directly from MongoDB directly into Pentaho Reports.  But what if you want to get them in a dashboard built with Community Dashboard Framework (CDF)?  That’s actually about as simple as getting it into a Pentaho report.

CDF comes with a related technology known as Community Data Access (CDA).  In general, CDA is a technology that abstracts access to data and provides a consistent approach to data access for CDF dashboards.  Among it’s other nice features are that it supports a wide variety of data sources including scripting.  Currently the scripting data source only supports Beanshell, but that’s good enough for communicating with MongoDB.

I’ve used the same database and collection from the reporting example, so you can find the source there.  I won’t go into all of the details of the CDA since you can find those at the WebDetails site.  The significant pieces are that you need to have a Connection of type “scripting.scripting” and then a DataAccess of type “scripting”.  As you can see, the query script is essentially the same as the one for reporting except that it’s in the Beanshell syntax.

<?xml version="1.0" encoding="UTF-8"?>
<CDADescriptor>
 <DataSources>
 <Connection id="mongodb" type="scripting.scripting">
 <Language>beanshell</Language>
 <InitScript/>
 </Connection>
 </DataSources>
 <DataAccess access="public" cache="true" cacheDuration="3600" connection="mongodb" id="mongodb-sales" type="scriptable">
   <Name>Sales via MongoDB</Name>
   <Columns/>
   <Parameters/>
   <Query><![CDATA[
import com.mongodb.*;
import org.pentaho.reporting.engine.classic.core.util.TypedTableModel;
Mongo mongo = new Mongo();
db = mongo.getDB("pentaho");
sales = db.getCollection("sales");
String [] columnNames = {"Region", "Year", "Q1", "Q2", "Q3", "Q4"};
Class[] columnTypes = {String.class, Integer.class, Integer.class, Integer.class, Integer.class, Integer.class};
TypedTableModel model = new TypedTableModel(columnNames, columnTypes);
docs= sales.find();
while (docs.hasNext()) {
 doc = docs.next();
 model.addRow(new Object[] { doc.get("region"), doc.get("year"), doc.get("q1"), doc.get("q2"), doc.get("q3"), doc.get("q4")});
}
docs.close();
return model;
 ]]></Query>
 </DataAccess>
</CDADescriptor>

Now that the CDA data access has been defined you can use the CDA previewer to view the results.  Assuming you’ve used the same data you should see something similar to the following:

Example of the data from MongoDB via CDA.

At this point the data is available for any dashboard that has permission to access the data.

 


Adding Parameters to the Pentaho MongoDB Report

In the previous post, I showed how to create a Pentaho report with MongoDB as a data source using Groovy Scripting.  That’s pretty handy, but a lot of users want to be able to have parameters they can use to filter the date.  Adding parameters is pretty easy, but not well documented, so here’s how it’s done.  These instructions assume you are fairly familiar with Pentaho Report Designer.  If you need the data file and instruction on setting up the data, see the previous post.

To make it easy for the user, let’s create a query to use for the parameter values.  You could do another MongoDB query, but I’ll just use a table to keep things simple.  Under the data source tab add a new data source of type Table, create a query, and then add values for each of the regions: North, South, East West.

Regions for the parameter values.

Next, create a new parameter named Region that the user will select the Region to filter on.  The setting should look like the following:

Region parameter settings

Now you’re ready to use the parameters in your query.  Since the query is a Groovy script you have to use a special call rather than the traditional ${parameter} approach that you might be used to in SQL queries.  Specifically you will use the rowData.get("parameter-name"). The following is the new version of the query:

import com.mongodb.*
import org.pentaho.reporting.engine.classic.core.util.TypedTableModel

def mongo = new Mongo()
def db = mongo.getDB(“pentaho”)
def sales = db.getCollection(“sales”)

def columnNames = [“Region”, “Year”, “Q1”, “Q2”, “Q3”, “Q4”] as String[]
def columnTypes = [String.class, Integer.class, Integer.class, Integer.class, Integer.class, Integer.class] as Class[]

TypedTableModel model = new TypedTableModel(columnNames, columnTypes)

def docs = sales.find(new BasicDBObject(“region”, dataRow.get(“Region”)))
while (docs.hasNext()) {
def doc = docs.next()
model.addRow([ doc.get(“region”), doc.get(“year”), doc.get(“q1”), doc.get(“q2”), doc.get(“q3”), doc.get(“q4”) ] as Object[])
}
docs.close()

model

Download the report description to see all the details. When run it looks like the following:

Example report with queries.


Creating Pentaho Reports from MongoDB


So you’ve made the move and started using MongoDB to store unstructured data.  Now your users want to create reports on the MongoDB databases and collections.  One approach is to use a Kettle transformation that retrieves data from MongoDB for reports.  This approach is documented on the Pentaho Wiki.  However, I want to use the MongoDB database directly without dealing with Spoon and Kettle transformations.  Fortunately Pentaho Reporting also supports scripting with Groovy built in.  This tutorial will show you how to create a report against MongoDB data using the Java drivers and Groovy scripting.

Pre-Conditions
You should already have mongodb installed and accessible.  I’m running on the same machine with the default settings, so vary the code as needed for your configuration.
You also need to put the mongo-java-driver-2.7.2.jar file in the libraries for Report Designer and the BA Server
$PENTAHO_HOME/design-tools/Pentaho Report Designer.app/lib/
$PENTAHO_HOME/server/biserver-ee/tomcat/webapps/pentaho/WEB-INF/lib
Restart the app and BA Server if they are running to pick up the new .jar files.
Setting Up
The first thing you need is some data.  I’ve created an input file of sales by region and year to use as an example.  Download and import the data using the mongoimport command:
> mongoimport -d pentaho -c sales data.json
Verify that the data has been successfully imported by opening the mongo shell and using the following commands:
> use pentaho
> db.sales.find();
You should see a list of documents that were added.
Creating the Report
  1. Using Pentaho Report Designer, create a new report.
  2. Add a data source and choose Advanced -> Scriptable
  3. Select groovy as the language and click the (+) for a new query
  4. Enter the following code as the script

import com.mongodb.*
import org.pentaho.reporting.engine.classic.core.util.TypedTableModel
def mongo = new Mongo()
def db = mongo.getDB(“pentaho”)
def sales = db.getCollection(“sales”)

def columnNames = [“Region”, “Year”, “Q1”, “Q2”, “Q3”, “Q4”] as String[]
def columnTypes = [String.class, Integer.class, Integer.class, Integer.class, Integer.class, Integer.class] as Class[]

TypedTableModel model = new TypedTableModel(columnNames, columnTypes)

def docs = sales.find()
while (docs.hasNext()) {
def doc = docs.next()
model.addRow([ doc.get(“region”), doc.get(“year”), doc.get(“q1”), doc.get(“q2”), doc.get(“q3”), doc.get(“q4”) ] as Object[])
}
docs.close()

model

This will read the data from MongoDB and return the table model needed by the reporting engine.
From here it’s just standard report generation and publishing, which is described in the Pentaho documentation.
Go ahead and download the report definition.  You’ll need to have mongod running and on the default (or change your script).  The results look something like the following:

Sales by Region from MongoDB