A funny thing happened on the way to Hadoop adoption. Several industry leaders agreed to a core set of common components to help enterprises adopt value-added solutions above the open-source core with confidence and greater ease of use, and an interesting backlash has emerged from parts of the open-source community, who view the move as a ‘power grab,’ or ‘by the vendor, for the vendor.’ The theory is that by establishing a common core set of Hadoop components in this “Open Data Platform,” or ODP, these vendors (Hortonworks, Pivotal and IBM, primarily) are essentially stripping choice from the world and making it a very bland place, while heralds of freedom (primarily Cloudera) steadfastly defend us from predatory vendors. As I said, it is an interesting debate. Who’s right? Who’s wrong? Do we even care?
To set the stage: open source has created the bold, wonderful and transformative world that we live in today. Hadoop has revolutionized our approach to dealing with data. We can now gain insights where once we were blind, and we can leverage a breadth of data that has opened our minds to the possibility of solving many of the world’s largest problems. With these open-source components, GE seeks to improve energy efficiency globally, while John Deere seeks to improve crop yields. Beyond being commercially good for these companies, this has opened opportunities to understand customers for better marketing and to deal with fraud and security concerns. These uses are a bit more ‘big brother’ in nature, but they solve key problems for enterprises and consumers. Companies serve their customers better, resources are consumed more efficiently, ne’er-do-wells are intercepted, and, yes, profits improve.
To achieve these goals, there is a great and growing sea of Hadoop components. As an abbreviated list, the Hadoop ecosystem is now composed of Pig, Hive, HBase, Storm, Spark, Sqoop, Flume, Kafka, YARN, Falcon, Knox, Ranger and many more. For companies without large teams dedicated to this diverse and growing ecosystem, the challenge revolves around integration: managing all of the builds, versions and matching components effectively to solve the problems the business faces. Without a team of experts, these companies must rely on vendors with specialized expertise, and a healthy market has emerged to supply that talent. Additionally, vendors such as SAS, HP, IBM, Microsoft, Teradata and many other analytics vendors build value-added software on top of these core components. As such a vendor, which pieces and parts, at which versions, do you place your bets on? What does your testing look like if you plan to support multiple versions of every component in an effort to serve a broad market? To solve this complexity, vendors like Cloudera, Hortonworks, Pivotal, IBM and others release packages of integrated components to make things easier for customers to consume. As an example, Cloudera Enterprise packages a version of HDFS, MapReduce, HBase, Impala, Cloudera Search, Cloudera Navigator and Apache Spark (to name a few). This integrated package makes life easier for an enterprise that contracts with Cloudera for the expertise to service this complex stack: “Don’t assemble the components yourself; Cloudera has already done it for you.” The downside is that it locks the vendor or customer who builds software on top of the stack into that vendor’s view of the world. Hortonworks, IBM and Pivotal (and many others) are all game competitors who could serve the same customer, but their underlying open-source selections may differ (Kafka, Knox, Ranger, etc.), and even the components they share are likely on different versions and mixes. Could a customer swap out the Enterprise Data Hub open-source components for divergent builds? Perhaps. Does the new vendor support the same versions? Do we have the expertise to pull this all apart and put it together again without breaking any of the integrations we now depend on? And if we do, why are we paying anyone else to service this stack?
The challenge, then, is that once I have placed a bet on a particular vendor, I am placing a bet on a particular bundle of components. The theory behind ODP is that if we aggregate a common core set of components between industry leaders, then perhaps a customer can make a bet on a bundle that many vendors support. A customer could build on Pivotal’s stack (and value-added management, analytics libraries, WAN and eventing capabilities), but if they become sold on something IBM is doing, they are not locked in at the big data stack level. The ODP bundle supporting the solutions you build is consistent below your company’s/vendor’s capabilities. I saw comments from a detractor implying that this is similar to McDonald’s, In-N-Out Burger and Burger King agreeing to build the same burger. While a compelling visual, it is more like those vendors agreeing that they will all use beef for their burgers. The analogy is limited, of course, in that I wouldn’t be looking to swap the beef from one vendor’s burger into another, but the point is that what differentiates the vendors is service to their customers and the value-added preparation of the core components (management, analytical engines, dashboards, libraries of machine learning algorithms, etc.). At the next level down, contributors to the underlying open-source components differentiate themselves by winning adoption from the largest set of customers.
So, does this mean customers no longer have choice? Hardly. If a vendor builds to the ODP platform, it simply implies that a common set of core components, at X versions, on Y release schedules, are certified. Can you assemble them without a participating ODP vendor? Of course. Is open-source innovation at the individual project level stifled? Why would it be? If more companies adopt the open-source components because of this approach, I would expect increased investment, not less. Besides, ODP is not a ‘fixed’ ecosystem. Innovators will build additional compelling capabilities, and ODP will flex and morph.
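As a sketch of what “certified against a common core at X versions” could mean mechanically, here is a hypothetical compatibility check in Python. The spec contents, version bounds and vendor stacks are invented for illustration; they are not the actual ODP certification rules:

```python
# Hypothetical "common core" spec: component -> (min, max) version.
# Components are plausible core candidates; the bounds are made up.
CORE_SPEC = {
    "hdfs":   ("2.6", "2.8"),
    "yarn":   ("2.6", "2.8"),
    "ambari": ("2.0", "2.2"),
}

def version_tuple(v):
    """Turn '2.7' into (2, 7) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))

def is_core_compatible(distro):
    """A distro passes if every core component falls inside the spec's range."""
    for name, (low, high) in CORE_SPEC.items():
        v = distro.get(name)
        if v is None:
            return False  # missing a required core component
        if not (version_tuple(low) <= version_tuple(v) <= version_tuple(high)):
            return False  # version outside the agreed common range
    return True

vendor_a = {"hdfs": "2.7", "yarn": "2.7", "ambari": "2.1", "kafka": "0.8"}
vendor_b = {"hdfs": "2.5", "yarn": "2.7", "ambari": "2.1"}
print(is_core_compatible(vendor_a))  # True: extras beyond the core are fine
print(is_core_compatible(vendor_b))  # False: hdfs below the common floor
```

Note that vendor_a ships Kafka on top of the core and still passes: the check constrains only the shared base, which is the whole point — differentiation happens above the core, not inside it.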
Ultimately, the jury is still out. A really good friend of mine challenged me recently with a question: “Isn’t this just by the vendors, for the vendors?” That really made me reconsider my position, until I realized that he is exactly right. It is geared to solving the challenges of vendors. But who are the vendors? The list includes IBM, Pivotal and Hortonworks, but also GE, Dell, Comcast, Verizon, AT&T, Silicon Valley start-ups, SaaS vendors and the hundreds of other companies that seek to build value-added capabilities that leverage big data stacks.
Undoubtedly, some customers/vendors will build up their own expertise and manage their own integration and operational challenges with regard to these components. This is a great approach if your company differentiates itself through these efforts. Everyone else, the customers who do not differentiate their brand on the minutiae of operating the big data stack, will have a choice to bet on vendors that support ODP or not, and plenty of criteria will impact that decision-making process. Regardless, ODP is not the tyrannical takeover bid its opponents portray it to be. Innovation will still proceed at the individual project level, and I expect projects to come in and out of the ODP, but what a change it could bring to the industry if it does make adoption less risky!
Is ODP evil? Is it good? Neither? Ultimately, you (the industry) will decide … and perhaps you’ll even comment!