Getting creative with holidays in dimensions

I was recently working with a client and saw an interesting approach to a classic problem: dealing with holidays as a dimension.  Now maybe this is a common solution, but I hadn’t seen it before, so I thought I’d share.  The same solution could be used for similar problems as well.

The goal is to be able to analyze the impact of holidays on various measures.  Imagine you are analyzing sales or hours worked.  Knowing if a particular day is a holiday is pretty important to understanding spikes in the data.

A first approach might be to have the holiday as a property in the date dimension.  That only works if every fact that points back to a particular date shares the same holidays.  That isn’t even true within the United States, where some states have their own holidays, much less on a global scale.

So what this client did was solve the problem at the ETL level.  For each fact, they check the client calendar to see whether the data is for a holiday and then set the result in the fact table as a degenerate dimension.  You could have a separate dimension as well, but they decided to avoid the join that Mondrian would generate.  Just make sure to index that column so you don’t do a table scan when grabbing the members.
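The ETL step described above can be sketched roughly as follows.  This is only an illustration in Python, not the client’s actual code: the calendar structure, the client ids, the column names, and the "Holiday" / "Non-Holiday" member values are all assumptions made up for the example.

```python
from datetime import date

# Hypothetical per-client holiday calendars. A real ETL job would load
# these from wherever the client calendars are actually maintained.
CLIENT_HOLIDAYS = {
    "client_a": {date(2013, 7, 4), date(2013, 12, 25)},
    "client_b": {date(2013, 12, 25), date(2013, 12, 26)},
}


def holiday_flag(client_id, fact_date):
    """Return the degenerate-dimension value for one fact row."""
    holidays = CLIENT_HOLIDAYS.get(client_id, set())
    return "Holiday" if fact_date in holidays else "Non-Holiday"


def tag_facts(rows):
    """Return copies of the fact rows with an is_holiday column added."""
    tagged = []
    for row in rows:
        flagged = dict(row)
        flagged["is_holiday"] = holiday_flag(row["client_id"], row["date"])
        tagged.append(flagged)
    return tagged


# The same calendar date is a holiday for one client but not the other.
facts = [
    {"client_id": "client_a", "date": date(2013, 7, 4), "sales": 1200},
    {"client_id": "client_b", "date": date(2013, 7, 4), "sales": 900},
]
tagged_facts = tag_facts(facts)
```

Because the flag is stored on each fact row, two facts that share a date can still carry different holiday values, which is exactly what a property on the date dimension cannot express.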

A simple solution to what at first might appear to be a complex problem.  I like that.

 


Focus! It’s good for you.

Somewhat random post.  I saw a quote I’ve long loved attributed to Steve Jobs: “Deciding what not to do is as important as deciding what to do.”  I don’t think I can really improve on that statement. So maybe I can share what I do to stay focused.  For the free-wheeling types out there, this may seem a bit O-C, but it works for me.

I split my focus areas into four categories that I think are important:

  1. Work / Professional
  2. Family / Home
  3. Learning
  4. Health and Fitness

These are the categories that I feel are very important.  Feel free to have your own.  Many might add “spiritual”, but I lump that in with Health and Fitness.

Within each of these categories I create 2-3 goals with some being more important than others.  For example, I’m trying to get into shape for a major bike ride this month.  I’d also like to drop a few pounds, so that’s secondary.

I won’t go into detail on how I deal weekly with the goals, since that’s not the point of this post.  What this does do is allow me to evaluate new ideas and things I want to do.  If the idea fits a goal already on the list, then I can do something with it.  If not, then I put it on my backlog to address in the future.

I’ve been using this technique for a few years now and it works really well.  It helps me stay focused on a few things I can accomplish, while allowing me the flexibility to change and focus on new things.  Definitely something worth considering if you feel overwhelmed or that you are having problems accomplishing your goals.

 


Why I Blog

I thought I’d take a bit and describe why I bother to blog (occasionally) when I could be doing something else.
There are really three reasons.
  1. I like to share things that I figure out that may not be so obvious to others.  I like to experiment and play around with technologies, and when I learn something new and non-obvious it seems nice to share.
  2. It makes a great resource for me to go back to.  I’ve been asked questions about some of the things I’ve blogged about and I can point people to my blog.  Sometimes the solution to a problem is complex enough that you can’t really remember all the steps you used, esp. when you solved the problem six months ago.
  3. It’s a good form of professional self-promotion.  I suppose there are some professionals out there that don’t use their blogs in part to help establish their credibility.  But I don’t think I’ve met any.  I could probably go on and on about why pros should blog, but will leave it for now.
How I pick topics
The topics I choose to blog about aren’t totally random.  Sometimes they are something new I figured out solving a problem for myself or a client.  Sometimes they happen to be about an area I want to learn more about.  And sometimes (especially lately) they are to promote my book.
The effort to blog
Blogging is not effortless.  Even a quick post, such as this one, will often take an hour of my personal time.  I will organize my thoughts, put together an outline, write a draft, and then review until I’m happy.  Finally I’ll enter it into WordPress and add the appropriate links and such.
Technical blog posts can take hours or days to put together.  I have to review the topic, experiment and figure things out.  Then I can go through all the work to create a post.
What’s in it for the reader
For the reader, it’s my hope that some of these may address a specific issue or open you to new ideas or ways of doing something.  But you’ll notice that didn’t make my list of why I blog.  It’s just icing on the cake if someone finds this useful.  If I only posted stuff I thought had a high probability of being useful, I’d be paralyzed and post nothing.  So I post what interests me.
Which brings me to comments
I appreciate comments, esp. those that say they learned something or found a post useful.  That provides incentive to keep posting even if it isn’t the reason I post.
I also don’t mind questions specific to the topic or pointing out alternatives or corrections.  That helps other readers as well.  And when I have the time I’ll go back and correct errors (I have the MongoDB close issue on my list).
But I occasionally get comments totally unrelated to the topic of a post and/or asking for me to help someone with something, usually involving quite a bit of work.  These I don’t care for. It’s like asking your accountant to review a financial deal in their spare time.
That hour to do a blog post?  That’s time I could be doing something fun or interesting that I enjoy.  I could be playing a game with my kids, I could be reading a good book, I could be out on my bike, I could be taking a nap – you get the picture.  So you can imagine where spending an hour or more providing free tech support to someone on my personal blog falls on the list.
Hey, that’s not very nice!
Maybe, but it’s my personal time we’re talking about, so how I choose to spend it is up to me.  I have a very demanding job, am wrapping up a book, have a ton of new things to learn, and a family I like to spend time with.  There are plenty of forums, wikis, IRC chats, Google hangouts, etc. that you can turn to for support.  Use one of those and you’ll get a slew of people ready to help out if you ask nicely (and do a Google search first).  Maybe I’ll see you on one of those and answer your question.
Thank you for being considerate.

Mondrian in Action Update

It’s hard to believe it’s been almost a year since we announced that a book on Mondrian was in the works.  But we’re finally getting to the point where it feels like it’s almost finished.  We are getting ready to go into the final series of chapter reviews (11 total).

It’s still going to be a few more months as we finish up the appendices and indexes and update based on reviews and then the production guys make it look nice and finished.  We also know that by the time the book is published some pretty big things are likely to have happened in the Mondrian technology sphere, like 4.0 actually being released and Pentaho 5.0 hopefully being released as well.  But that’s the drawback to technical books. (It gives me interesting stuff to blog about.)

On the whole, I’m very happy with what we’ve put together.  We’ve managed to put a lot of information into the book.  So much so that we’re now looking for ways to trim back to get within our allotted page count.  This book will be a great one to give to anyone who wants to learn about Mondrian and doesn’t want to visit a whole bunch of different sites and blogs.  I hope you enjoy it and find it useful.

 


Using the AnalyzerBusinessGroup annotation in Pentaho Analyzer

A quiet, maybe too quiet, new feature of Analyzer in Pentaho 4.8 was the addition of the AnalyzerBusinessGroup annotation.  This annotation lets you specify that a measure should go into a specific group rather than be lumped in with all of the other measures.  If you have just a few measures, it’s not that big a deal.  But many users have a lot of measures that can be categorized, and it would be nicer to have them in separate groups.  I have not tried this with dimensions, but if you define them correctly it seems that it would be overkill.  I also suspect that Mondrian 4’s Measure Groups will make this obsolete, but don’t know that for a fact.

Using the feature is very simple.  Just add an annotation to the measure and specify the AnalyzerBusinessGroup.  For example:

<Annotation name="AnalyzerBusinessGroup">Orders</Annotation>

and

<Annotation name="AnalyzerBusinessGroup">Prices</Annotation>

results in the following (using the Steel Wheels example):

Analyzer Groups
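
If you have many measures to tag, the annotation can be added with a script instead of by hand.  The following is a rough Python sketch, assuming a Mondrian 3 style schema in which each Annotation sits inside an Annotations wrapper under the Measure.  The schema fragment, cube name, and column names are hypothetical stand-ins, not the real Steel Wheels schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema fragment, used only to keep the example self-contained.
SCHEMA = """
<Schema name="SteelWheels">
  <Cube name="SteelWheelsSales">
    <Measure name="Quantity" column="QUANTITYORDERED" aggregator="sum"/>
    <Measure name="Sales" column="TOTALPRICE" aggregator="sum"/>
  </Cube>
</Schema>
"""


def set_business_group(schema_root, measure_name, group):
    """Add or update the AnalyzerBusinessGroup annotation on one measure."""
    for measure in schema_root.iter("Measure"):
        if measure.get("name") != measure_name:
            continue
        annotations = measure.find("Annotations")
        if annotations is None:
            # Assumption: Annotations should be the first child of Measure.
            annotations = ET.Element("Annotations")
            measure.insert(0, annotations)
        for annotation in annotations.findall("Annotation"):
            if annotation.get("name") == "AnalyzerBusinessGroup":
                break  # reuse the existing annotation
        else:
            annotation = ET.SubElement(
                annotations, "Annotation", name="AnalyzerBusinessGroup"
            )
        annotation.text = group
        return measure
    raise KeyError(f"no measure named {measure_name!r}")


root = ET.fromstring(SCHEMA)
set_business_group(root, "Quantity", "Orders")
set_business_group(root, "Sales", "Prices")
schema_xml = ET.tostring(root, encoding="unicode")
```

Calling the function a second time for the same measure updates the existing annotation rather than adding a duplicate.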


Getting the PostgreSQL installer out of quarantine on OS X Mountain Lion

In a noble attempt to start playing around with Pentaho 5.0 (aka Sugar), I downloaded the CI version and PostgreSQL, since it’s replaced MySQL as of 4.8 as the default database for the repository.  After dutifully reading the README file and changing some shared memory settings (don’t skip this step), rebooting, and remounting the install image, I kicked off the install app.  Nothing.  No message, no popup, NOTHING.  Except for a not so helpful message in the Console that indicated one of the install files was in quarantine because of TextEdit.

After a bit of searching, I found a helpful site that told me how to turn off quarantine (search for “quarantine”).  It turns out there is a handy bash command to turn off the quarantine: “$ xattr -r -d com.apple.quarantine <file-name>”.  The ‘-r’ recurses if this is a directory.  So, I cd’d to /Volumes/PostgreSQL 9.2.2-1 and ran “$ xattr -r -d com.apple.quarantine postgresql-9.2.2-1-osx.app”, since a .app file is really a directory.  And …. it failed because it’s a mounted disk without write permissions.

That should have been obvious to me, but it’s late.  So I copied the .app file to ~/Downloads and ran the command again.  After a quick return I ran the install app with no further problems.
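
The two steps above (copy the bundle somewhere writable, then clear the attribute) could be scripted along these lines.  This is just a sketch in Python: the function names are made up for the example, and the xattr call itself only works on OS X.

```python
import shutil
import subprocess
from pathlib import Path


def quarantine_command(app_path):
    """Build the xattr call that strips the quarantine attribute recursively."""
    return ["xattr", "-r", "-d", "com.apple.quarantine", str(app_path)]


def copy_and_release(app_on_image, writable_dir):
    """Copy the .app bundle off the read-only image, then clear quarantine."""
    source = Path(app_on_image)
    target = Path(writable_dir) / source.name
    # A .app bundle is a directory, so it needs a recursive copy.
    shutil.copytree(source, target, symlinks=True)
    # Runs xattr on the writable copy; raises if the command fails.
    subprocess.run(quarantine_command(target), check=True)
    return target
```

On a Mac it would be called as copy_and_release("/Volumes/PostgreSQL 9.2.2-1/postgresql-9.2.2-1-osx.app", Path.home() / "Downloads"), after which the copied installer can be launched from ~/Downloads.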

Now to do all the cool stuff I had originally planned before getting sidetracked.


Mondrian in Action Discount!

What’s better than getting an early release copy of Mondrian in Action?  How about getting it for 1/2 off!  All the same Mondrian goodness, but for only half the price.  Just hop over to the Manning site and use the discount code dotd1101au.

Hurry, though.  This code is only good on November 1st from 12am to 11:59pm EDT.

And while you’re at it, go see what Julian Hyde, author of Mondrian and co-author of Mondrian in Action, has to say.

Remember – 50% off November 1st only with code dotd1101au.


Mondrian in Action – Early Access Edition Now Available

It’s official!  Mondrian in Action is now available as an Early Access Edition from Manning.  You can order it on the book’s Manning page, and the first three chapters are available now.  I can tell you that about half of the others are really close as well.

So why should you order an early release version of the book?  Here are a few reasons why I think it’s a good idea.
First, if you are new to business analytics, the first three chapters are available and provide a great overview of what Mondrian is and how it can be used.
Second, you don’t have to wait until next spring to get all of the details on Mondrian 4.  You can already download the software from GitHub, so why not get the book that goes with it as it’s written?  There are a number of big changes coming in Mondrian 4 that make it better and different than previous versions.
Finally, and I think most importantly, you will be able to influence the book and make it great.  The authors all know Mondrian, so we naturally think we are doing a great job of explaining it.  But you might disagree and can offer ways to explain it better.  There is an author’s forum, which we will be monitoring regularly, that allows you to interact with us about the book.  Our ideal is that people will read the book, apply the techniques we describe, and give us feedback before the presses start rolling.
In the end Mondrian in Action is only successful if it helps you, the reader.  And we’d really like to get your feedback to make sure that happens.

Changing the JVM Memory Settings for Pentaho Design Tools on OS X

Background

Pentaho ships a number of design tools, such as Pentaho Report Designer, Pentaho Data Integration, and others for Linux, Windows, and OS X.  While the apps appear to be native, they are Java applications that have been wrapped to run in the native environment.

One of the key things users often want to change is the JVM memory settings.  These settings indicate how much memory a process should start with and, more importantly, the maximum amount it should use.  If you allocate too little, then it’s possible to hit an OutOfMemoryError and the application stops functioning.  This post will show you how to modify the settings.

Finding the File to Modify

The file we want to modify is called Info.plist.  This file contains settings for the application that is being executed and is a standard file in an OS X application.  OS X packages applications into an application bundle with an extension of .app.  This is really a directory, but Finder and other tools make it appear as if it’s a single file.

There are two ways to get access to the file.  The first is to use Finder.  Find the application file in the design-tools and sub-folders under Pentaho or wherever you installed them.  The application will show up as a single file.  Now right-click on the file and select “Show Package Contents”.  This will open a new view in Finder with all of the contents of the file.

Once you are viewing the contents of the package, select the Contents folder and you should see the Info.plist file.  This is an XML file, so you can edit it with any text editor.  If you double-click it, OS X will attempt to open it with Xcode if you have that installed.  I recommend using a different text editor, such as TextMate or TextEdit.

Right click on the Info.plist file and select Open With… and then choose the editor of your choice.  You may have to select Other… and then choose the app if the one you want isn’t displayed.  The file should open up in your text editor.

The alternative to Finder is to use the Terminal.  This is my preferred approach, but it should only be used by those who are comfortable working from the command line.  You will need a text editor that you are also comfortable using, such as vi or emacs.  I personally prefer vi.

Navigate to the same folder.  The Terminal doesn’t treat the .app file as anything other than a folder, so basic usage of the cd command is enough.  Once you are in the proper directory, edit the file.

Changing the File

Now that you have the file open in an editor, you can make the needed change.  Near the bottom of the file you should find a declaration similar to the following:

<key>VMOptions</key>
<string> -Xms512m -Xmx1024m</string>

These are the settings that get passed to the JVM when the application starts.  The -Xms setting is the initial heap size, the amount of memory claimed at startup.  The -Xmx setting is the maximum heap size; once the application needs more than this, errors start happening.  In this example I have 512MB on startup and a maximum of 1024MB (1GB).  Note that these are increased from the defaults that I started with.

It’s customary to specify the memory in megabytes, using a value such as 256m, where the m suffix means megabytes.  It is also customary, although not required, to use powers of 2, e.g. 256, 512, 1024, etc.  The mx value must never be smaller than the ms value.  I personally make it twice as large.

Once you’ve set the values you want, save the file and then restart the application if it is running.  Assuming you increased the values, you should now have more memory.
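
If you would rather script the change than edit by hand, here is a rough sketch using Python’s standard plistlib module.  The sample plist is a cut-down, hypothetical stand-in for a real Info.plist: the surrounding Java dictionary and the 256m/512m starting values are assumptions for the example, and the helper function names are made up.

```python
import plistlib
import re


def set_heap_options(vm_options, initial_mb, maximum_mb):
    """Rewrite existing -Xms/-Xmx flags in a VMOptions string."""
    if maximum_mb < initial_mb:
        raise ValueError("-Xmx must not be smaller than -Xms")
    # Only flags already present are rewritten; nothing new is appended.
    updated = re.sub(r"-Xms\S+", f"-Xms{initial_mb}m", vm_options)
    updated = re.sub(r"-Xmx\S+", f"-Xmx{maximum_mb}m", updated)
    return updated


def update_vm_options(node, initial_mb, maximum_mb):
    """Walk a parsed plist, update every VMOptions string, return the count."""
    changed = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "VMOptions" and isinstance(value, str):
                node[key] = set_heap_options(value, initial_mb, maximum_mb)
                changed += 1
            else:
                changed += update_vm_options(value, initial_mb, maximum_mb)
    elif isinstance(node, list):
        for item in node:
            changed += update_vm_options(item, initial_mb, maximum_mb)
    return changed


# Hypothetical, cut-down Info.plist content used only for this example.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
  <key>Java</key>
  <dict>
    <key>VMOptions</key>
    <string> -Xms256m -Xmx512m</string>
  </dict>
</dict>
</plist>"""

info = plistlib.loads(SAMPLE)
changed = update_vm_options(info, 512, 1024)
new_plist = plistlib.dumps(info)
```

For a real application you would read Contents/Info.plist with plistlib.load, apply update_vm_options, and write it back with plistlib.dump.  Keep a backup copy of the original file first.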


Using MongoDB with Community Data Access (CDA)

I’ve covered how to get data from MongoDB directly into Pentaho Reports.  But what if you want to get it into a dashboard built with Community Dashboard Framework (CDF)?  That’s actually about as simple as getting it into a Pentaho report.

CDF comes with a related technology known as Community Data Access (CDA).  In general, CDA is a technology that abstracts access to data and provides a consistent approach to data access for CDF dashboards.  Among its other nice features is support for a wide variety of data sources, including scripting.  Currently the scripting data source only supports Beanshell, but that’s good enough for communicating with MongoDB.

I’ve used the same database and collection from the reporting example, so you can find the source there.  I won’t go into all of the details of CDA since you can find those at the WebDetails site.  The significant pieces are that you need to have a Connection of type “scripting.scripting” and then a DataAccess of type “scriptable”.  As you can see, the query script is essentially the same as the one for reporting except that it’s in the Beanshell syntax.

<?xml version="1.0" encoding="UTF-8"?>
<CDADescriptor>
  <DataSources>
    <Connection id="mongodb" type="scripting.scripting">
      <Language>beanshell</Language>
      <InitScript/>
    </Connection>
  </DataSources>
  <DataAccess access="public" cache="true" cacheDuration="3600" connection="mongodb" id="mongodb-sales" type="scriptable">
    <Name>Sales via MongoDB</Name>
    <Columns/>
    <Parameters/>
    <Query><![CDATA[
import com.mongodb.*;
import org.pentaho.reporting.engine.classic.core.util.TypedTableModel;

// Connect to the local MongoDB instance and open the sales collection
Mongo mongo = new Mongo();
db = mongo.getDB("pentaho");
sales = db.getCollection("sales");

// Describe the columns of the result set
String[] columnNames = {"Region", "Year", "Q1", "Q2", "Q3", "Q4"};
Class[] columnTypes = {String.class, Integer.class, Integer.class, Integer.class, Integer.class, Integer.class};
TypedTableModel model = new TypedTableModel(columnNames, columnTypes);

// Copy each document into a row of the table model
docs = sales.find();
while (docs.hasNext()) {
  doc = docs.next();
  model.addRow(new Object[] { doc.get("region"), doc.get("year"), doc.get("q1"), doc.get("q2"), doc.get("q3"), doc.get("q4")});
}

// Release the cursor and the connection before returning the data
docs.close();
mongo.close();
return model;
    ]]></Query>
  </DataAccess>
</CDADescriptor>

Now that the CDA data access has been defined you can use the CDA previewer to view the results.  Assuming you’ve used the same data you should see something similar to the following:

Example of the data from MongoDB via CDA.

At this point the data is available for any dashboard that has permission to access the data.

 

