Saturday, May 9, 2009

Pentaho Kettle doesn't cut it - update

Admittedly I'm a Pentaho newbie, trying to evaluate the software and do some 'simple' tasks. The demos and samples worked well, but when put to the test, Kettle just didn't add enough value as an ETL tool to warrant its use. To be fair, I have used WebMethods and some other tools, but have usually found it simpler to just write the code. There always seems to be something unique enough that the tool can't handle it where writing the code works, or worse yet, a defect that you can't work around.
In the case of Kettle, I have a problem I can't work around. It's open source, so in theory I can debug the original code and create a patch. Unfortunately, I was trying Kettle because I thought it would be simple and I was in a hurry. So I don't have time to debug at that level, since writing the code might take as long as writing this blog entry. Because I didn't buy the Pentaho version, I don't have a support option.
Here is what I wanted to do: open a file that contained a single string compressed with gzip. That string contained XML, and I wanted to put that XML into a database. Seemed simple. I used the load file and gzip decompress steps, did a preview of my decompressed string, and it seemed to look good. I had to send it to JavaScript to make a modification to the XML before I could parse it. That is where the trouble started. The resulting string was not whole; it had parts missing. I couldn't figure out why, but Kettle was truncating my string at about 4600 characters when it really has closer to 8600. I tried setting the length of the field to 10000, but that did not work. I gave it 4 hours of debugging and some time on IRC, but there is no obvious solution yet.
Reading the 'documentation' (I use that term loosely) did not help with this problem. In fact, even for open source documentation, there were a lot of issues. I understand that the wiki is going through some changes, so stuff is moving around in an attempt to make the documentation better. Clearly the code is moving faster than the documentation. So for now I'm going to post the issue to the forum and write my own code, because I have a job that needs this done!
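For comparison, here is a minimal sketch of what the hand-written version of this task might look like in Python. It is not the code I actually used; the function name `load_gzipped_xml`, the `records` table, and the choice of SQLite are all assumptions made for illustration, and it assumes the XML is already well formed with one level of child elements under the root.

```python
import gzip
import sqlite3
import xml.etree.ElementTree as ET


def load_gzipped_xml(path, conn):
    """Decompress a gzip file holding one XML string and store its child elements.

    Hypothetical helper: the table layout (tag, value) is an assumption,
    not the schema from the original job.
    """
    # Read the whole decompressed payload as text; no length limit is applied.
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        xml_text = fh.read()

    # Parse the XML; this raises ET.ParseError if it is not well formed.
    root = ET.fromstring(xml_text)

    conn.execute("CREATE TABLE IF NOT EXISTS records (tag TEXT, value TEXT)")
    rows = [(child.tag, (child.text or "").strip()) for child in root]
    conn.executemany("INSERT INTO records (tag, value) VALUES (?, ?)", rows)
    conn.commit()

    # Return the decompressed length so it can be compared with what a tool reports.
    return len(xml_text), len(rows)
```

Returning the decompressed length makes it easy to check whether the full string (closer to 8600 characters in my case) actually arrived, which is exactly the thing I could not confirm inside the tool.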

Update:
I wanted to update this post based on what I did to resolve the issues I was having. I did join the IRC channel, which I would recommend, but that did not solve the problem. The string problem, so far as I can tell, is truly there. What I did to solve my particular challenge was make the XML well formed via a bash script before loading it into Kettle, and then use the XML input functions. That worked like a champ in under 5 minutes. That said, Kettle was easily able to deal with the XML and create the output I wanted.
Looking forward, I will continue to work with the Pentaho toolset. Hopefully the work on the documentation by the community will continue to pay off for the tool. I have, for my part, tried to update the Pentaho wiki when I could. I will continue to do that when I can, or post what I can here. This is not a simple project to document: it changes quickly, and getting started is a considerable undertaking.
To summarize, I'd say that if you are working on a straightforward task, the tool performs well. The debug features and the documentation are two areas that require improvement for this to be a commercial-class tool, in my opinion. If I had to prioritize, improvements to the debug features would probably be the most helpful.
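The pre-processing idea can be sketched as follows. My actual fix was a bash script that I have not reproduced here; this is a Python stand-in, and the specific repair it performs (wrapping a multi-element fragment in a single root element) is an assumption chosen for illustration, not necessarily what my data needed. The names `make_well_formed`, `prepare_for_kettle`, and the default `root` tag are hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET


def make_well_formed(fragment, root_tag="root"):
    """Return the fragment unchanged if it parses; otherwise wrap it in one root element."""
    try:
        ET.fromstring(fragment)
        return fragment
    except ET.ParseError:
        # Assumed repair: several top-level elements with no common root.
        wrapped = "<{0}>{1}</{0}>".format(root_tag, fragment)
        ET.fromstring(wrapped)  # raises ParseError if wrapping did not help
        return wrapped


def prepare_for_kettle(gz_path, out_path):
    """Decompress, repair, and write a plain XML file that an XML input step can read."""
    with gzip.open(gz_path, "rt", encoding="utf-8") as src:
        fragment = src.read().strip()
    fixed = make_well_formed(fragment)
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.write(fixed)
    return fixed
```

The point of the design is to hand the tool a file it can read with its standard XML input, so no string manipulation has to happen inside the transformation itself.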

2 comments:

  1. Well, I did tell you the solution today on IRC. It was working fine for me; I don't know how you were doing it, but it was not a big deal. I do agree that documentation is a problem, but if you put some effort into explaining the topic (which you did), you can get good help from the community in the forums and on IRC.

  2. Somewhat agree - changing the size of the field in Kettle did not make the string problem go away. That said, it really wasn't Kettle's fault that I had to attempt to manipulate the string in this way. Once the XML was corrected, everything worked great. The help in IRC is appreciated!
