Don’t Be Fooled by Hadoop’s Complexity: A Guide on How to Avoid Big Data’s Pitfalls


The following is the second installment in a three-part series from BlueData on the complexity of Big Data and Hadoop implementation. Check out their previous post, entitled “Big Data, Big Opportunity Cost”.

Join us next week (April 8th), when BlueData’s co-founder and CEO, Kumar Sreekanti, discusses “Hadoop-as-a-Service: Cost, Agility & Results that enterprises should expect and demand”.

In the meantime, check out Technavio’s report on the Global Hadoop as a Service Market.


Last week, my colleague Greg Kirchoff outlined the state of the union with Big Data in the enterprise – and it is not pretty. While the possibilities with Big Data are tantalizing, the reality is that current approaches have left business users frustrated and IT teams overwhelmed with the complexity.

Enterprises of all sizes have identified a clear need to streamline their Big Data initiatives. This is an open challenge to the Big Data community.

Technavio’s March 9th article, “Big Data Demand Boosts Hadoop as a Service,” outlines a trend among SMEs that could be the foundation for new approaches for enterprises of all sizes.

Talk to any data analyst (or scientist) in any large enterprise and they will confirm their frustration with on-premises Hadoop. Dig deeper and you will find that the consumption model for Big Data in public clouds like Amazon, Azure, and Google Compute appeals to them in terms of agility and time to insight.

So the obvious question becomes: why not bring the Hadoop-as-a-Service approach (i.e., an Amazon EMR-like experience) on-premises and attack the common pitfalls of Big Data that are the root cause of complexity and cost?


Bringing this proven Hadoop-as-a-Service approach to on-premises systems not only addresses the common pitfalls but also ensures that enterprises can quickly, cost-effectively, and consistently derive value from their growing data sets.

Key benefits include:

Make Big Data available to everyone. As more stakeholders within an organization realize the potential of Big Data and demand access to the company’s data resources, the Big Data infrastructure must be flexible enough to accommodate all of their needs.

  • Simplify the provisioning of Big Data jobs so that non-experts can build and manage their own jobs without requiring the cost and expertise of dedicated Big Data infrastructure specialists (a minimal job sketch follows this list).
  • Permit the simultaneous provisioning and running of multiple clusters so that users throughout the organization may create their own Big Data jobs as needed instead of waiting weeks or months.
  • Permit the use of any Big Data application (e.g. Hadoop, Spark, BI/ETL tools). Most Big Data infrastructures restrict an enterprise’s choice of applications to those provided by a specific vendor’s Hadoop distribution. A truly flexible infrastructure allows an organization to take advantage of any Big Data application available across the open source community.
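
For illustration, here is a minimal sketch of the kind of self-service job a non-expert might build and run once a cluster has been provisioned. It uses PySpark; the cluster URL and the dataset path are hypothetical placeholders, not part of any specific product.

    from pyspark.sql import SparkSession

    # Connect to a self-provisioned cluster; the master URL is a hypothetical placeholder.
    spark = (SparkSession.builder
             .appName("self-service-sales-rollup")
             .master("spark://analytics-cluster.example.internal:7077")
             .getOrCreate())

    # Read a sample dataset (hypothetical path) and compute a simple aggregate.
    orders = spark.read.csv("/data/samples/orders.csv", header=True, inferSchema=True)
    orders.groupBy("region").sum("amount").show()

    spark.stop()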

Separate compute and storage. Infrastructure must be flexible enough to allow an enterprise to fully disconnect analytical processing from data storage. This sort of separation allows an organization to:              

  • Access data directly from the organization’s enterprise storage systems and avoid the expensive and time-consuming step of copying data into HDFS prior to running any analytics (see the sketch after this list).
  • Keep sensitive data within an organization’s secure enterprise storage systems.
  • Independently scale compute (CPU) and storage on an as-needed basis.
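
As a rough sketch of what this separation looks like in practice, the PySpark snippet below reads data in place from enterprise storage rather than copying it into HDFS first. The NFS mount point and the S3-compatible bucket are hypothetical, and the s3a:// path assumes the appropriate Hadoop connector is configured on the compute cluster.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("compute-storage-separation").getOrCreate()

    # With compute and storage separated, jobs point straight at enterprise storage.
    # Both URIs are hypothetical; the s3a:// path assumes an S3-compatible object
    # store and that the hadoop-aws connector is available on the cluster.
    events_nfs = spark.read.parquet("file:///mnt/enterprise-nas/events/2015/")
    events_s3 = spark.read.parquet("s3a://analytics-bucket/events/2015/")

    # No upfront copy into HDFS: analytics run directly against the source systems.
    print(events_nfs.count(), events_s3.count())

    spark.stop()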

Take advantage of Big Data virtualization. When virtualization is combined with the performance of an adaptable Big Data framework for Hadoop-as-a-Service on-premises, you can streamline operations and lower costs.

  • Experience enterprise-grade data security not available through an IaaS or PaaS running in a public cloud.
  • Quickly scale operations as needed to take advantage of dynamic market conditions.
  • Realize significant capital expenditure savings.              

In conclusion, I leave you with this final word. Hadoop’s complexity shouldn’t play an April Fools’ joke on your on-premises Big Data initiatives. You can realize Amazon EMR-like simplicity while increasing performance, security, agility, and speed, and unleash Big Data’s value to everyone within your organization.