Dear Kettle fans,

As expected, there was a lot of interest in cloud computing at the MySQL conference last week.  It felt really good to be able to pass the Bayon Technologies white paper around to friends, contacts and analysts.  It’s one thing to demonstrate a certain scalability on your blog; it’s another entirely to have a smart man like Nicholas Goodman do the math.

Sorting massive amounts of rows is a hard problem to take on.  Making it scale on low-cost EC2 instances is interesting as it proves a certain level of scalability.  Nick ran 40 EC2 nodes in parallel to do the work and saw that it was good.  450,000 rows/s for US$4.00/hour is not bad. Note: the tests sort 300M (scale factor 50), 600M (scale factor 100) and 1.8B (scale factor 300) line-item rows from TPC-H respectively.
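
For a rough back-of-the-envelope feel for those numbers, here is a minimal Python sketch. It only does arithmetic on the figures quoted above. The function names are made up for illustration, and it assumes the 450,000 rows/s rate holds constant for every data set size, which the white paper summary here does not claim. It also ignores the fact that instances are billed per started hour.

```python
# Figures quoted in the post; everything derived from them is illustrative.
ROWS_PER_SECOND = 450_000      # aggregate sort throughput quoted above
COST_PER_HOUR_USD = 4.00       # quoted hourly price for the whole cluster
NODES = 40                     # number of EC2 nodes used in the test

# TPC-H line-item row counts quoted in the post, keyed by scale factor.
DATASETS = {50: 300_000_000, 100: 600_000_000, 300: 1_800_000_000}


def rows_per_dollar(rows_per_second: float, cost_per_hour: float) -> float:
    """Rows sorted per dollar spent at a sustained throughput (hypothetical helper)."""
    return rows_per_second * 3600 / cost_per_hour


def sort_estimate(rows: int, rows_per_second: float, cost_per_hour: float):
    """Return (seconds, dollars) to sort `rows`, assuming a constant rate."""
    seconds = rows / rows_per_second
    return seconds, seconds / 3600 * cost_per_hour


if __name__ == "__main__":
    rpd = rows_per_dollar(ROWS_PER_SECOND, COST_PER_HOUR_USD)
    print(f"{rpd:,.0f} rows per dollar")
    print(f"{COST_PER_HOUR_USD / NODES:.2f} USD per node-hour")
    for scale_factor, rows in DATASETS.items():
        secs, usd = sort_estimate(rows, ROWS_PER_SECOND, COST_PER_HOUR_USD)
        print(f"SF {scale_factor}: {secs / 60:.1f} min, about ${usd:.2f}")
```

Treat the per-data-set times as an upper-level estimate only; the real runs are described in the white paper itself.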

For certain, the paper seemed to make it easier for me to point to PDI scalability and it opened some doors for further testing on big iron at Sun Microsystems.  It was great to talk to so many people.  I even walked up to the Amazon Web Services booth at the expo to ask about the performance bottleneck in the EBS that was exposed by the white paper.  “It’s being worked on” was the reply :-)

The most interesting thing about the PDI cloud integration work is that there don’t seem to be a lot of other ETL tool vendors doing it.  In fact, after a Google or 2 I could only find Informatica with a SaaS (not even IaaS) offering and I kinda doubt that closed source software is a good match for cloud computing.

So I went out there and did a presentation on the subject to explain to people how they would set it up for themselves.  The open source way is to not only do the marketing but to allow people to run their own tests and see for themselves.  That way you get valuable feedback to improve your offering.

Here is a copy of the presentation I gave: Cloud Computing with MySQL and Kettle.

I thought it was a good session although for once I didn’t get “The Question”, you know the one where people ask me how Kettle is different from Talend and where I get to comment on their lack of scalability.  Oh well, I guess you can’t win them all :-)

Finally, people have been asking me about integration with both SQLStream on the one hand and MapReduce/Hadoop/Hive/HDFS on the other hand.  I’m happy to say that the former is in progress and that I’ve started talks with the fine folks from Cloudera to get started on the latter.  I simply loved Aaron Kimball’s tutorial @ MySQL Conf on the MapReduce subject and think that there is a lot of potential for integration with PDI to make us scale even better.

Until next time,

Matt

Dear Kettle & MySQL fans!

I’m really looking forward to going to the MySQL User Conference next week, not just because I’m speaking in 2 sessions again, but perhaps also because these are “interesting” times for MySQL and Sun Microsystems.  Pivotal times it would seem.

Here are the 2 sessions I’m going to do:

  • Cloud Computing with MySQL and Kettle : I’m particularly happy that MySQL accepted this session: it will demonstrate how easy it has become to do cloud computing exercises with tools like MySQL and Kettle.

So please drop in on our sessions and join the fun.  2 years ago my sessions drew quite a crowd and so I hope that this is again the case.  Pentaho is a sponsor of the event and even has a booth (#308) on the main show floor.  You can find me there to chat on Tuesday & Wednesday afternoon (1pm-4:30pm).  I’ll be there together with a group of people from Pentaho including Julian Hyde, James Dixon, Lars Nordwal, Lance Walter, Matt Papertsian & Jared Cornelius.

On Thursday I’ll be visiting the sages from SQLStream in the morning to talk about integrating their technology to create truly real-time data integration solutions without the need to fork over insane amounts of money.  Later that day we’ll all go see John Sichi’s session at the nearby (same building) Percona Performance Conference.

See you soon!

Matt

With all the traveling I forgot to blog about the Pentaho Data Integration 3.2.0-RC1 release.

Grab the goodies on Sourceforge!

Until next time,
Matt

…is a session being held at EDGEucate 2009 in Plano, Texas, USA between 28th and 30th October. Even though I can’t be there in person, it will be fascinating to understand what CA Gen can do here.

In the post “Cloud Computing” I wondered if, and indeed how, CA Gen could be used to deploy into a cloud infrastructure. Since CA Gen’s major benefit is “Write Once, Deploy Many”, this could become “Write Once, Deploy Many (oh, and into a cloud as well)”. This session may well give me the answer.

Leveraging all of what’s been written in CA Gen over the years into a cloud would seal it as “probably the best development tool in the world”. I’m not sure what other development tools could take one piece of code and use it on a mainframe batch environment, through client/server, through Java, through to a cloud deployment.

My supposition is that it could be deployed into a cloud using an extension of the component-based-development features that it already has, simply wrapping up the component in a custom framework. We shall wait and see.


AbendIT.com
Abend IT has launched!

What is Abend IT? A Dutch service provider based in Utrecht, specialising in CA Gen (but other technologies too).

Who is Abend IT? Willem Slob and Martijn van den Burg.

How to contact them? http://www.abendit.com

They’ve got a really good “spin” for their site: some simulated Gen code spelling out their name, address and contact details. Wish I’d thought of that!


I’ve collected some bits of gentalk.biz which are associated with so-called “social media” or “interactive media” and put them on this new “Interactive Media Page”. There’s a new URL for it: http://interactive.gentalk.biz

There, you can easily follow gentalk on Twitter, join Second Life, get RSS feeds of both the blog and the forum, and more.

More will be added soon, to make it easy for you to get the gentalk.biz information quickly.

If anyone would like a specific feed, drop me a comment and with any luck it will be done for you.


ARIKAN Productivity Group (APG) is the latest official partner of CA – their ModelCVS technology has been stamped as CA Smart.

APG specializes in CA Gen. Their technologies contribute to the success of CA Gen in software modernization, state-of-the-art tool integration with Eclipse Modeling Framework, and additional generations like UML2 diagrams.
Read the full testimonial here at CA’s partners site.


With the announcement here and here that CA are supporting Amazon’s EC2 (Elastic Compute Cloud), one wonders how easy (if at all) it would be to deploy a CA Gen application into such an infrastructure.

I’ll do some reading and find out, since the whole concept of cloud computing is of interest.

It’s good to see that such infrastructures are being supported by large vendors, and CA in particular.

Will CA support Microsoft’s Azure platform, I wonder?


Register for EDGEucate 2009 here and EDGE EMEA 2009 here

Hurry, as there are discounts for early registration!


The latest edition of the IET newsletter “SpotlIET” is available for download, as is the new edition of the Jumar Solutions SUMMIT newsletter.

IET focus on the release of GuardIEn 7.8, specifically the Release/Environment Preparation wizard together with more exciting new features. Jumar look at the new Worldwide Vendor Services Agreement with CA, and how it will positively affect CA’s ability to service customers around the world at the drop of a hat with Jumar’s extensive knowledge and resources.

These are exciting times within the Gen community, since both companies will be participating in the EDGE conferences, AND the Gen r8 Beta programme.
