Self-Service Data Presentation: Data Quality, Lineage and Cataloging

Posted by Adam Diaz on Oct 21, 2016 11:26:14 AM

When an organization has mastered automated data ingest and the appropriate application of metadata, a number of additional concerns must be addressed when using data at scale. These include data quality, data lineage, and a searchable data catalog. Quality and lineage information are what make that catalog effective and useful, and the data catalog is, in turn, the foundation of self-service capability for a business-facing data presentation and transformation layer.

Topics: Big Data, Data Lake 360

Zaloni, Dell EMC Collaboration for an End-to-End Big Data Solution

Posted by Tony Fisher, SVP of Strategy and Business Development on Oct 20, 2016 12:33:38 PM

The Dell EMC™ Analytic Insights Module (AIM) brings together best-of-breed technologies in data management, storage, security, governance and visualization in an all-in-one package that makes it easier and more cost-efficient for more businesses to finally get the value from big data they’ve been looking for.

Topics: Announcements

Zaloni Zip: A Breakdown of Data Lifecycle Management

Posted by Parth Patel on Oct 17, 2016 2:41:00 PM

Data Lifecycle Management optimizes HDFS utilization by leveraging the tiered storage provided by Hadoop. You can optimize big data storage based on how frequently data is used, thereby reducing storage costs. With tiered storage, infrequently used data files are stored on nodes with higher storage density, lower compute power, and lower cost.
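As a rough illustration of the tiering idea, here is a minimal Python sketch, assuming a policy keyed only on a file's last access time. The thresholds, the example paths, and the helper names `choose_storage_policy` and `tiering_commands` are hypothetical and are not part of Zaloni's DLM. The policy names HOT, WARM and COLD, and the `hdfs storagepolicies` and `hdfs mover` commands, come from Hadoop's archival storage feature. The sketch only builds the command strings; it does not run them.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds: tune these to your own access patterns.
HOT_WINDOW = timedelta(days=30)
WARM_WINDOW = timedelta(days=180)


def choose_storage_policy(last_access: datetime, now: datetime) -> str:
    """Map a file's last access time to one of HDFS's built-in storage policies."""
    age = now - last_access
    if age <= HOT_WINDOW:
        return "HOT"   # all replicas on DISK
    if age <= WARM_WINDOW:
        return "WARM"  # one replica on DISK, the rest on ARCHIVE
    return "COLD"      # all replicas on ARCHIVE (dense, low-compute, low-cost nodes)


def tiering_commands(path: str, policy: str) -> list[str]:
    """Build the CLI calls that tag a path with a policy and migrate its existing blocks."""
    return [
        f"hdfs storagepolicies -setStoragePolicy -path {path} -policy {policy}",
        f"hdfs mover -p {path}",
    ]


if __name__ == "__main__":
    now = datetime(2016, 10, 17)
    # Hypothetical directories and last-access dates.
    files = {
        "/data/raw/clicks/2016-10": datetime(2016, 10, 15),
        "/data/raw/clicks/2016-06": datetime(2016, 6, 1),
        "/data/raw/clicks/2015-01": datetime(2015, 1, 31),
    }
    for path, last_access in files.items():
        for cmd in tiering_commands(path, choose_storage_policy(last_access, now)):
            print(cmd)
```

Setting a storage policy only governs where new blocks are placed; the mover is what relocates blocks that already exist, which is why the sketch emits both commands for each path.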

Topics: Big Data, Bedrock, DLM, Zaloni Zip

6 Big Data Transformation Strategies in Telecommunications

Posted by Rituraj Sarma on Oct 14, 2016 1:27:29 PM

People are constantly connected to their networks through voice, text, and other smartphone interactions. This means that telecommunications companies (telecoms) have access to huge quantities of data and are metaphorically “sitting on a gold mine.” To extract that value, they need to mine both structured and unstructured data for deeper, real-time insight into customer behavior, including service usage patterns, preferences, and interests. Here are some of the savviest big data strategies for meeting these needs, which traditional storage and analytics approaches cannot provide.

Topics: Big Data

What's to Love about Bedrock 4.2 and Mica 1.2

Posted by Scott Gidley on Oct 12, 2016 8:24:11 AM

While we were at Strata + Hadoop World New York last month, we announced two new releases: Bedrock 4.2 and Mica 1.2. We’re thrilled about these releases because they further our mission to bring the data lake approach into the mainstream by providing critical data governance, security controls and an intuitive UI.

Topics: Announcements, Product Updates, Bedrock, Mica

Part Two: Migrating On-Premises Data Lakes to Cloud

Posted by Kannan Rajagopalan on Oct 10, 2016 3:34:08 PM

Migration Objectives

In the first blog of this series, we discussed some of the key drivers for a Cloud Data Lake such as:

Topics: Hadoop Expert, Data Lake, Cloud

Pig vs. Hive: Is There a Fight?

Posted by Monoj Gogoi on Oct 5, 2016 2:57:20 PM

Pig and Hive were created out of enterprises’ need to work with huge amounts of data without having to write complex MapReduce code. Although both were born of necessity, they have come a long way and can now run even on top of other big data processing engines, such as Spark. Both components of the Hadoop ecosystem provide a layer of abstraction over these underlying execution engines. Hive was created to give people something that looked like SQL and would ease the transition from RDBMSs. Pig takes a more procedural approach; it was created so that people could manipulate data without writing MapReduce.

Topics: Hadoop Expert, Big Data

Machine Learning: How to Master the Basics and Transform your Dataset

Posted by Jean Georges Perrin on Sep 28, 2016 9:18:56 AM

You might be familiar with the various number puzzles on LinkedIn. Although some people complain that they clutter their LinkedIn news feed (e.g. “This isn’t Facebook!”), the puzzles are designed to engage your intelligence and challenge your neurons.

Topics: Data Science, Machine Learning

Part One: Data Lakes in the Cloud

Posted by Kannan Rajagopalan on Sep 26, 2016 9:11:55 AM

Topics: Hadoop Expert, Data Lake, Cloud

Chlorine for your Data Swamp: Four Key Areas for Automation

Posted by Adam Diaz on Sep 22, 2016 3:10:38 PM

Maybe we’re talking more about algaecide than chlorine, but microbiology aside, a data lake often gets rather cloudy and disorganized shortly after being opened for use. Hadoop’s promise of schema-on-read lures many in, but it often ends up forcing a soul-searching reevaluation of one’s data management principles, not to mention a new strategy (and cost) for cleaning up a swampy data lake.

Topics: Hadoop Expert, Big Data, Data Lake