Creating Pentaho Reports from MongoDB


So you’ve made the move and started using MongoDB to store unstructured data.  Now your users want to create reports on the MongoDB databases and collections.  One approach is to use a Kettle transformation that retrieves data from MongoDB for reports.  This approach is documented on the Pentaho Wiki.  However, I want to use the MongoDB database directly without dealing with Spoon and Kettle transformations.  Fortunately Pentaho Reporting also supports scripting with Groovy built in.  This tutorial will show you how to create a report against MongoDB data using the Java drivers and Groovy scripting.

Pre-Conditions
You should already have mongodb installed and accessible.  I’m running on the same machine with the default settings, so vary the code as needed for your configuration.
You also need to put the mongo-java-driver-2.7.2.jar file in the libraries for Report Designer and the BA Server
$PENTAHO_HOME/design-tools/Pentaho Report Designer.app/lib/
$PENTAHO_HOME/server/biserver-ee/tomcat/webapps/pentaho/WEB-INF/lib
Restart the app and BA Server if they are running to pick up the new .jar files.
Setting Up
The first thing you need is some data.  I’ve created an input file of sales by region and year to use as an example.  Download and import the data using the mongoimport command:
> mongoimport -d pentaho -c sales data.json
Verify that the data has been successfully imported by opening the mongo shell and using the following commands:
> use pentaho
> db.sales.find();
You should see a list of documents that were added.
Creating the Report
  1. Using Pentaho Report Designer, create a new report.
  2. Add a data source and choose Advanced -> Scriptable
  3. Select groovy as the language and click the (+) for a new query
  4. Enter the following code as the script

import com.mongodb.*
import org.pentaho.reporting.engine.classic.core.util.TypedTableModel
def mongo = new Mongo()
def db = mongo.getDB(“pentaho”)
def sales = db.getCollection(“sales”)

def columnNames = ["Region", "Year", "Q1", "Q2", "Q3", "Q4"] as String[]
def columnTypes = [String.class, Integer.class, Integer.class, Integer.class, Integer.class, Integer.class] as Class[]

TypedTableModel model = new TypedTableModel(columnNames, columnTypes)

def docs = sales.find()
while (docs.hasNext()) {
def doc = docs.next()
model.addRow([ doc.get("region"), doc.get("year"), doc.get("q1"), doc.get("q2"), doc.get("q3"), doc.get("q4") ] as Object[])
}
docs.close()

model

This will read the data from MongoDB and return the table model needed by the reporting engine.
From here it’s just standard report generation and publishing, which is described in the Pentaho documentation.
Go ahead and download the report definition.  You’ll need to have mongod running and on the default (or change your script).  The results look something like the following:

Sales by Region from MongoDB

About these ads

13 Comments on “Creating Pentaho Reports from MongoDB”

  1. anonymousbi says:

    Great Tutorial Bill! @pentaho_fan

  2. szabodata says:

    Excellent article Bill ! Since this is mongoDB, I’d remind users to add something to limit the rows to avoid memory blowout. For human readable reports, it is reasonable to limit the rows.

  3. mdahlman says:

    I think your premise that reporting without using any ETL is quite interesting in many use cases. And I really like your step-by-step tutorial with examples. My background is with Jaspersoft… so of course I run the risk of starting a content-less discussion that goes, “I’m better.” “No, me.” “No, I am.” etc.

    But… have you looked at iReport for a requirement like this? I think you’ll find that it’s better suited to this requirement in a pretty fundamental way. You can write a query that’s a single line rather than 12 lines of Groovy:
    { collectionName : ‘sales’ }

    More importantly from a practical standpoint, it figures out the field names for you. It figures out the data types. It pools connections when you deploy the report to the server, etc.

    Anyway, if you’re interested to take a look, I think your opinion could be very interesting for readers.

    • billdback says:

      I think it’s always useful to look at a variety of tools. However, the purpose of this particular post was to show how to do something specific with Pentaho Report Designer as a contrast to using a Kettle based approach.

  4. Simon says:

    thank you…is it possible to allow SalveOk in the mongodb connection as it does not seem possible for Pentaho to connect to mongo replica. Also, what do you mean by pick up the new .jar file for mongo…I placed it in the file path you specified, anything else? thanks!!

    • billdback says:

      You should be able to do whatever the driver you are using supports. I haven’t personally tried this.

      Yes – I meant put the jar file in the classpath so it is discovered on startup.

  5. [...] used the same database and collection from the reporting example, so you can find the source there.  I won’t go into all of the details of the CDA since you [...]

  6. Bin says:

    Any idea on how to externalize mongodb connection from script, so that It can be shared by multiple users?

    I opened a ticket #27809 for this but pentaho support said it is ot of their scope

    • billdback says:

      I haven’t tried it so this is pure speculation. However, you could try creating a global MongoDB connection pool on server startup. Then pass a handle to the session using the parameters as describe in my previous post. It would be interesting to know if such an approach works outside of theory.

      • Bin Lin says:

        - I created a Mongo object (which has internal pool) on server startup
        - When user login, this Mongo object is saved in a user session attribute
        - On PRD, created a hidden parameter (with Value type as Object) in report following your instruction in another post to pass in the session variable:
        … =ENV(“session:mongopool”)
        - Tried to access the session variable in groovy:
        def mongo = dataRow.get(“mongopool”)

        However what dataRow.get() returned appear to be always a string representation of Mongo object. I am not sure how to rebuild Mongo object from it, or is there another way to access it as Mongo object?

  7. juansierrapons says:

    Hi,

    I have the mongodb server in a remote server. How can I use grovvy to get connected to the remote server? I can not see any option in the Pentaho Report Designer.

    PS: For the moment I can tunneling the port from remote to localhost. But this is a workaround, not a proper solution.

    Thank for your time

    Best regards


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 294 other followers