Getting creative with holidays in dimensions
Posted: December 7, 2013 Filed under: Analytics, Business Intelligence, Dimensions, Tips | Tags: analysis, date, Degenerate dimension, dimension, Dimension table, Extract transform load, Fact table
I was recently working with a client and saw an interesting approach to a classic problem: dealing with holidays as a dimension. Maybe this is a common solution, but I hadn’t seen it before, so I thought I’d share. The same approach could be used for similar problems as well.
The goal is to be able to analyze the impact of holidays on various measures. Imagine you are analyzing sales or hours worked. Knowing if a particular day is a holiday is pretty important to understanding spikes in the data.
A first approach might be to make the holiday a property of the date dimension. That only works if every fact that points to a particular date shares the same holidays. This isn’t even true within the United States, where some states have their own holidays, much less on a global scale.
So this client solved the problem at the ETL level. For each fact, they check the client calendar to see whether the data is for a holiday, then set the result in the fact table as a degenerate dimension. You could use a separate dimension instead, but they decided to avoid the join that Mondrian would generate. Just make sure to index that column so you don’t do a table scan when grabbing the members.
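The ETL step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the client's actual process: the calendar contents, the field names (client_id, sale_date, is_holiday), and the Y/N encoding are all hypothetical.

```python
from datetime import date

# Hypothetical per-client holiday calendars. In a real ETL job these would be
# loaded from whatever calendar source the client maintains.
CLIENT_HOLIDAYS = {
    "client_a": {date(2013, 7, 4), date(2013, 12, 25)},
    "client_b": {date(2013, 12, 25), date(2013, 12, 26)},
}


def flag_holiday(fact, calendars=CLIENT_HOLIDAYS):
    """Return a copy of the fact row with an is_holiday degenerate dimension."""
    holidays = calendars.get(fact["client_id"], set())
    flagged = dict(fact)  # leave the incoming row untouched
    flagged["is_holiday"] = "Y" if fact["sale_date"] in holidays else "N"
    return flagged


# Two facts for the same date: a holiday for one client but not for the other.
facts = [
    {"client_id": "client_a", "sale_date": date(2013, 7, 4), "amount": 120.0},
    {"client_id": "client_b", "sale_date": date(2013, 7, 4), "amount": 95.0},
]
flagged_facts = [flag_holiday(f) for f in facts]
```

Because the flag is resolved per fact rather than per date, the same calendar day can be a holiday for one client and an ordinary day for another, which is exactly what a property on the date dimension cannot express.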
A simple solution to what at first might appear to be a complex problem. I like that.
Using the AnalyzerBusinessGroup annotation in Pentaho Analyzer
Posted: February 2, 2013 Filed under: Analytics, Analyzer, Mondrian, Tips | Tags: Annotation, pentaho
A quiet, maybe too quiet, new feature of Analyzer in Pentaho 4.8 was the addition of the AnalyzerBusinessGroup annotation. This annotation lets you specify that a measure should go into a specific group rather than be lumped in with all the other measures. If you have just a few measures, it’s not that big a deal. But many users have a lot of measures that can be categorized, and it is nicer to have them in separate groups. I have not tried this with dimensions, but if you define them correctly it seems that it would be overkill. I also suspect that Mondrian 4’s Measure Groups will make this obsolete, but I don’t know that for a fact.
Using the feature is very simple. Just add an annotation to the measure and specify the AnalyzerBusinessGroup. For example:
<Annotation name="AnalyzerBusinessGroup">Orders</Annotation>
and
<Annotation name="AnalyzerBusinessGroup">Prices</Annotation>
results in the following (using the Steel Wheels example):
Mondrian in Action Discount!
Posted: October 31, 2012 Filed under: Analytics, Business Intelligence, Mondrian, Uncategorized
What’s better than getting an early release copy of Mondrian in Action? How about getting it for 1/2 off! All the same Mondrian goodness, but for only half the price. Just hop over to the Manning site and use the discount code dotd1101au.
Hurry, though. This code is only good on November 1st from 12am to 11:59pm EDT.
And while you’re at it, go see what Julian Hyde, author of Mondrian and co-author of Mondrian in Action, has to say.
Remember – 50% off November 1st only with code dotd1101au.
Getting session variables when using the PRD scriptable data source
Posted: October 2, 2012 Filed under: Analytics, Business Intelligence, Reporting, Uncategorized | Tags: pentaho, report designer, scritable, session
In the comments on a previous post I was asked how to use a session variable as a parameter when using the scriptable data source in Pentaho Report Designer. I finally figured it out and thought I’d share with everyone. The solution applies to all scriptable data sources, not just those that use MongoDB, so I’ll leave that complexity out of this discussion.
Pre-conditions
Before this will work, you need some way of setting the session variable that you want to use. You can create an action sequence that runs when a user logs in or, if you are using single sign-on, you can set it during the login process. These approaches are described elsewhere, so I won’t go into how to set the session variable. For the sake of this example, let’s assume you have somehow set a session variable called “Region” that holds the region that applies to the user: North, South, East, or West.
Create a Report Parameter
The first thing to do is to create a report parameter that will get the session value. Then set the Name and Value Type as appropriate. The key step is to set the Default Value Formula to =ENV("session:Region"). The ENV function will get the session value for the attribute with the name “Region”. You should also set the parameter to be Hidden by checking the box, although while testing it can be handy to leave it unchecked. Note that if you preview in Report Designer the parameter will have no value (there are ways to set it), so a default value can be handy. I don’t recommend deploying to production with a valid default, though.
The following figure shows getting the Region value from the session.
Using the Parameter
Using the parameter from your script is simple. The scriptable interface provides a dataRow object with a .get(String name) method to get the value. So, to get the value of Region at run time, use the following line (in Groovy):
def region = dataRow.get("Region")
Then just use the value in the script.
Mondrian and Pentaho Analytics – A book is in the works
Posted: April 20, 2012 Filed under: Analytics, Business Intelligence, Mondrian | Tags: Julian Hyde, pentaho
One of the biggest challenges many new Pentaho users face is finding good documentation. A variety of documentation exists, but it’s scattered among various sources, such as the Pentaho Knowledge Base, the Mondrian site, the Pentaho wiki, and a variety of blogs. While much of this content is good, wouldn’t it be nice to have one place to go for all things Mondrian? I’m happy to say that a book is on the way.
It’s still very early in the process, but Julian Hyde, Nick Goodman, and I are collaborating on a book that covers the nuts and bolts of Mondrian, including:
- Setting up and running Mondrian
- Creating schemas
- Scaling Mondrian
- Security
- Integrating Mondrian with applications and enterprise tools
The as-yet-unnamed book will be available from Manning, and we hope to have some early access versions available this summer. Early feedback is always welcome so we can make this the best book possible.
If there are specific topics you’d like to see in the book, please mention them in the comments.
Using Memcached with Pentaho Analysis EE
Posted: October 11, 2011 Filed under: Analytics, Business Intelligence
Background
Mondrian Caching
- Query cache – This cache holds the data for the current query being executed. Once the query is done, the cache releases the data.
- Local cache – This cache resides in the VM with Mondrian. It uses weak links to the segments being stored. When the garbage collector is run, these cached segments can be garbage collected, meaning the next query that could have used the cached data will have to go back to the database and perform all the calculations. Experience has shown that segments in the local cache are kept around for 2-3 hours before they are usually GCed, but there is no guarantee.
- External segment cache – This cache is the subject of this post. The external cache keeps segments using one of the approaches mentioned above. While these segments can eventually be removed to free up space for new segments, the fact that they can be made very large means that they will be kept around for longer and they are not automatically garbage collected.
Caching with Memcached
Configuration of Pentaho Analysis EE
Installing and Configuring Memcached on Ubuntu
$ sudo apt-get install memcached
- -m <number> This setting tells memcached how much RAM to use. This is the maximum it will use for stored data before it starts dropping data. The memcached process itself will use somewhat more than this, so you don’t want to assign it all of the RAM. However, if the node isn’t doing much else, then I’d use most of the available RAM.
- -l <IP address> By default this is set to 127.0.0.1. The memcached comment indicates that this means it will listen on all IP addresses. However, I was unable to get it to work without changing this value to the static IP of the machine. Various forum posts also indicated that using the localhost address causes memcached not to listen for external requests.
- -p <port> The default port is 11211. There is no reason to change it unless you are running multiple memcached instances on the same machine or some other process is using this port.
- -v, -vv, -vvv These flags cause memcached to be verbose by increasing levels. These are commented out. In a production system I would leave them commented out, at least after initial use, since they could cause the logs to get quite large. However, I like using them during initial setup and test. They indicate that there is activity, which can be used to confirm that Mondrian is properly configured.
That’s really all there is to configuring memcached. Restart the machines and use “$ ps aux” to verify that memcached is running. You can also use “$ netstat -l” to make sure the process is listening on the correct port.
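Beyond ps and netstat, one more way to confirm that memcached is reachable from another machine is to send it the stats command from its text protocol and read the reply. The following Python sketch does that with a plain socket. The function names are my own, and the host you pass in is whatever static IP you configured with -l.

```python
import socket


def parse_stats(raw):
    """Parse a memcached 'stats' reply (bytes) into a dict of name -> value."""
    stats = {}
    for line in raw.decode("ascii", errors="replace").split("\r\n"):
        if line == "END":
            break
        parts = line.split(" ", 2)  # expected shape: STAT <name> <value>
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def memcached_stats(host, port=11211, timeout=2.0):
    """Send 'stats' to a memcached server and return the parsed reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"stats\r\n")
        raw = b""
        # The reply is a series of STAT lines terminated by END.
        while not raw.endswith(b"END\r\n"):
            chunk = conn.recv(4096)
            if not chunk:
                break
            raw += chunk
    return parse_stats(raw)
```

Calling memcached_stats with the server’s address from the Mondrian machine should return a dictionary of counters. If the connection is refused or times out, the -l and -p settings are the first things to check.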