"An extraordinary thinker and strategist" "Great knowledge and a wealth of experience" "Informative and entertaining as always" "Captivating!" "Very relevant information" "10 out of 7 actually!" "In my over 20 years in the Analytics and Information Management space I believe Alan is the best and most complete practitioner I have worked with" "Surprisingly entertaining..." "Extremely eloquent, knowledgeable and great at joining the topics and themes between presentations" "Informative, dynamic and engaging" "I'd work with Alan even if I didn't enjoy it so much." "The quintessential information and data management practitioner – passionate, evangelistic, experienced, intelligent, and knowledgeable" "The best knowledgeable, enthusiastic and committed problem solver I have ever worked with" "His passion and depth of knowledge in Information Management Strategy and Governance is infectious" "Feed him your most critical strategic challenges. They are his breakfast." "A rare gem - a pleasure to work with."

Wednesday 27 November 2013

Bill Inmon agrees with Aussie-based Scotsman!

Big ambitions for Big Data? Be prepared for Big Problems...

I was excited to be able to attend the recent Enterprise IQ Future Enterprise summit held in Sydney. The event was up to Enterprise IQ's usual high standard, with a mix of great keynote speakers, breakout sessions, discussions and exhibitors.

Martin Rennhackkamp has already posted an excellent overall summary of the general proceedings (saving me the trouble, thanks Martin!), so in one respect it only remains for me to offer congratulations and thanks to Daniel McMurray and the team for having me involved.

However, I cannot go without exploring one aspect of the conference in some more detail. Bill Inmon's keynote speech was of course much anticipated, and I would be pretty surprised if anyone reading this blog was unaware of Bill's work and his impact on IT and Information Management. (If you've been under a rock for the last twenty-odd years, Bill is generally heralded as "The Father of Data Warehousing", and together with the contributions of Ralph Kimball, his seminal works have laid the foundations for an entire industry and thousands of careers, including my own.)

Now, I had never seen Bill present before, so I don't know if his contribution was typical or if he was just having a particularly "engaged" day (jet-lag probably contributes!). But I was totally unprepared for, and blown away by, the strength of opinion that he offered on the current wave of "Big Data".

Boy oh boy, did he launch into one! I think it's fair to say that Bill Inmon is not a fan of what is currently going on in the Information Management sector, and with Big Data in particular. It was certainly gratifying to find that most of Bill's points pretty much aligned with my own points of view on Big Data (obviously he's a smart fellow...)

Observations that resonated with me included:

  • The current technologies are generally not living up to the hype.
  • "Big Data" vendors are not engaging with (and don't understand) business problems.
  • Data processing methods based on programming-intensive techniques (Hadoop/MapReduce etc.) are not extensible or flexible enough.
  • The dependencies on "Data Scientists" are unsustainable and not scalable. Where are all these in-demand gurus coming from? 
  • There is an invalid assumption within semantic and natural language processing that context can be inferred from the words alone. 
For me, it was this last point that was most revelatory. 

Bill highlighted that while "traditional" data warehousing approaches create and imply context and meaning for the data by means of a structured data model, the "Big Data" approach does not impose such structure and a lot more work is required in order to contextualise the data, perform text disambiguation and make it usable (machine readable). 

Aspects that need to be considered in the data preparation and parsing steps are:
  • Defined taxonomies and ontologies.
  • Homographic resolution (words that are written the same, but have different meanings).
  • Deriving meaning of terms based on their textual proximity to other items.
  • Document metadata.
  • Acronym resolution.
  • Inference of additional or missing information from surrounding content.
  • Interpretation and decryption of encoded data streams.
Bill is clearly of the view that these functions can be systemised, configured and made repeatable. Indeed, they must be moved away from custom data processing and into encoded, re-usable tools and data products if we are to really start harnessing the benefits that Big Data promises. This is the direction that Bill is taking with his company Forest Rim and their Textual ETL tool. 
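
To make the idea of "configured and repeatable" a little more concrete, here is a minimal Python sketch of two of the steps listed above: acronym resolution and homographic resolution based on textual proximity. To be clear, this is my own toy illustration and not how Textual ETL (or any other product) works. The `ACRONYMS` and `HOMOGRAPHS` dictionaries, the `WINDOW` size and the `resolve` function are all hypothetical names and values I have made up for the example.

```python
import re

# Hypothetical, hand-written configuration (invented for illustration only).
ACRONYMS = {
    "ETL": "extract transform load",
    "NLP": "natural language processing",
}

# Each homograph maps its candidate senses to cue words that suggest that sense.
HOMOGRAPHS = {
    "bank": {
        "financial institution": {"loan", "account", "deposit"},
        "river edge": {"river", "water", "fishing"},
    },
}

WINDOW = 3  # assumed number of tokens either side that count as "nearby"


def tokenize(text):
    """Split text into alphabetic word tokens, keeping the original case."""
    return re.findall(r"[A-Za-z]+", text)


def resolve(text):
    """Return (token, meaning) pairs for known acronyms and homographs."""
    tokens = tokenize(text)
    results = []
    for i, token in enumerate(tokens):
        # Acronyms are matched case-sensitively against the lookup table.
        if token in ACRONYMS:
            results.append((token, ACRONYMS[token]))
            continue
        senses = HOMOGRAPHS.get(token.lower())
        if senses is None:
            continue
        # Collect the lower-cased tokens within WINDOW positions of this one.
        nearby = {t.lower() for t in tokens[max(0, i - WINDOW): i + WINDOW + 1]}
        # Pick the sense whose cue words overlap most with the nearby tokens.
        best = max(senses, key=lambda s: len(senses[s] & nearby))
        if senses[best] & nearby:
            results.append((token, best))
        else:
            # No cue word nearby: flag it rather than guess.
            results.append((token, "unresolved"))
    return results


if __name__ == "__main__":
    print(resolve("We went fishing on the river bank after the ETL job"))
    print(resolve("The bank was closed"))
```

The point of the sketch is that the meaning lives in the configuration tables rather than in bespoke code: adding a new acronym or a new sense is a data change, not a programming job. It also shows the limits of proximity alone. "The bank was closed" comes back as "unresolved", because the words by themselves don't carry enough context.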

I await further developments with anticipation. Given his track record of predicting market trends ahead of the curve, I guess I'll return the favour and say that on this occasion, I agree with Bill's point of view...!


Footnote:
My thanks also to Neil Currie and his team at QFire Software for inviting me to be a guest at their post-conference networking event. Over a quiet drink or two, I was delighted to have the opportunity to meet Bill in person, enjoy panoramic views of Sydney Harbour and exchange a few views and opinions (mostly on the relative merits of living in Sydney versus Bill's home of Colorado!).
