5 Guidelines for Building a Successful Data Catalog

Posted by Greg Wood on Feb 21, 2017 8:17:35 AM

At times, the search for a perfect data catalog can seem like finding the hay in a needle stack (not only difficult, but painful!). Each stakeholder has equally demanding and disparate requirements for success. Where business analysts want a slick, refined, and easily navigated UI with simple export capabilities, data scientists might refuse to accept anything that does not allow custom-tailored queries, connections to their favorite notebook, and unburdened access to all of the data that has ever existed in the data lake. Meanwhile, the security group wants none of this! Exposing the data at all is a non-starter.

Read More

Topics: Hadoop, Big Data Ecosystem, Data Management, Data Governance

New Releases of Bedrock and Mica Expand Data Lake Beyond Hadoop

Posted by Kelly Schupp, VP, Data-Driven Marketing on Feb 9, 2017 9:33:09 AM

With our latest Bedrock and Mica updates, we’re pushing the boundaries of what has up until now typically defined a data lake: Hadoop. Why are we moving in this direction? Because it makes sense for our clients, who need a solution to centralize management of data from siloed data systems, legacy databases and hybrid architectures. Our solutions support the concept of a data lake beyond Hadoop to encompass a more holistic, enterprise-wide approach. By constructing a “logical” data lake architecture versus a physical one, we can give companies transparency into all of their data regardless of its location, enable application of enterprise-wide governance capabilities, and allow for expanded, controlled access for self-serve business users across the organization.

Read More

Topics: Hadoop, Big Data Ecosystem, Bedrock, Zaloni News, Data Lake, Data Management, Mica, Data Governance, Metadata Management

Up Your Game: How to Rock Data Quality Checks in the Data Lake

Posted by Adam Diaz on Feb 7, 2017 2:52:06 PM

Common sense tells us one can’t use data unless its quality is understood. Data quality checks are critical for the data lake – but it’s not unusual for companies to gloss over this process in the rush to move data into less-costly, scalable Hadoop storage, especially during initial adoption. After all, isn't landing data in Hadoop with little definition of schema or data quality what Hadoop is all about? Once data lands in a raw zone, however, the reality quickly sets in that for the data to be useful, both structure and data quality must be applied. Defining data quality rules becomes particularly important depending on what sort of data you’re bringing into the data lake; for example, large volumes of data from machines and sensors. Validation is essential for such data because it comes from an external environment and probably hasn’t gone through any quality checks.
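The idea of applying quality rules to raw sensor data before promoting it can be sketched in a few lines. This is a minimal illustration, not anything from Bedrock; the field names and thresholds are hypothetical:

```python
# Minimal sketch of data quality rules applied to sensor records in a
# raw zone, before promoting them to a curated zone. Field names and
# thresholds are illustrative only.

RULES = {
    "device_id": lambda v: isinstance(v, str) and v != "",
    "temperature_c": lambda v: isinstance(v, (int, float)) and -40 <= v <= 125,
    "timestamp": lambda v: isinstance(v, int) and v > 0,
}

def check_record(record):
    """Return the list of rule names this raw record fails."""
    return [field for field, rule in RULES.items()
            if field not in record or not rule(record[field])]

def split_by_quality(records):
    """Partition raw records into (clean, rejected-with-reasons)."""
    clean, rejected = [], []
    for rec in records:
        failures = check_record(rec)
        if failures:
            rejected.append((rec, failures))
        else:
            clean.append(rec)
    return clean, rejected
```

Keeping rejected records alongside the reason they failed, rather than silently dropping them, is what makes the curated zone trustworthy: the rules double as documentation of what "clean" means for each feed.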

Read More

Topics: Hadoop, Big Data Ecosystem, Bedrock, Data Lake Solutions, Data Warehouse, Data Lake, Metadata Management

How Data Lakes Work

Posted by Ben Sharma on Jan 26, 2017 8:06:19 AM

Excerpt from ebook, Architecting Data Lakes: Data Management Architectures for Advanced Business Use Cases, by Ben Sharma and Alice LaPlante.

Read More

Topics: Hadoop, Ben Sharma, Big Data Ecosystem, Data Warehouse, Data Lake, Data Management

The Executive Guide to Data Warehouse Augmentation

Posted by Rajesh Nadipalli on Jan 19, 2017 3:12:13 PM

The traditional data warehouse (DW) is constrained in terms of storage capacity and processing power. That’s why the overall footprint of the data warehouse is shrinking as companies look for more efficient ways to store and process big data. Many companies still use data warehouses effectively for complex data analytics. However, creating a hybrid architecture – migrating storage and large-scale or batch processing to a data lake – lets companies save on storage and processing costs and get more value from their data warehouse for business intelligence activities.

Read More

Topics: Hadoop, Data Warehouse, Data Lake, Data Management

Tips for Trouble-Free Data Lake Ingestion

Posted by Adam Diaz on Jan 12, 2017 3:49:32 PM

Data ingestion is about much more than getting your data into the data lake. Think about it this way: designing your ingestion process is like setting up a digital “factory,” with inputs and expected outputs. That’s why when the data comes in, you have to be able to monitor whether your factory is delivering outputs reliably and consistently. You need to be able to direct the data to the right place as it is ingested, and move it along the “assembly line.” Also, you need to know in real time when something breaks down and diagnose it accurately, so that you can get your widget-making processes up and running again, fast.
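The "factory" idea above comes down to two mechanics: route each incoming file to the right landing zone, and surface failures the moment they happen rather than discovering them downstream. A minimal sketch, with hypothetical zone names and file types:

```python
# Illustrative sketch of ingestion as a "factory": route incoming
# files to a landing zone, and report unroutable files immediately
# so monitoring can alert on them. Zones and suffixes are made up.

ROUTES = {".csv": "raw/structured", ".json": "raw/semistructured"}

def route(filename):
    """Pick a landing zone for a file, or raise so monitoring fires."""
    for suffix, zone in ROUTES.items():
        if filename.endswith(suffix):
            return zone
    raise ValueError(f"no route for {filename}")

def ingest(filenames):
    """Return (routed, failed) so a monitor can alert on failures."""
    routed, failed = {}, []
    for name in filenames:
        try:
            routed[name] = route(name)
        except ValueError:
            failed.append(name)
    return routed, failed
```

The point of returning the failures explicitly is the "know in real time when something breaks" requirement: an alert on a non-empty failure list tells you which part of the assembly line stopped, and why.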

Read More

Topics: Hadoop, Big Data Ecosystem, Data Lake, Data Management

Data Management and Governance in the Data Lake

Posted by Ben Sharma on Jan 10, 2017 9:25:20 AM

Excerpt from ebook, Architecting Data Lakes: Data Management Architectures for Advanced Business Use Cases, by Ben Sharma and Alice LaPlante.

Read More

Topics: Hadoop, Ben Sharma, Big Data Ecosystem, Data Lake, Data Management, Data Governance

Managing Memory is Easier Using YARN

Posted by Adam Diaz on Dec 20, 2016 8:41:36 AM

There is a long list of items that can be tuned in Hadoop, but understanding how each daemon uses memory is fundamental to effective tuning. Daemons launch JVMs (Java Virtual Machines) that consume memory and can, in turn, launch further JVMs that consume memory of their own. With all the moving parts Hadoop presents, however, it's not always easy to profile memory use across the cluster.
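The nesting of JVMs is why YARN memory math matters: each container is a JVM whose heap must fit inside its YARN allocation. A back-of-the-envelope sketch of that arithmetic, following common Hadoop 2.x sizing guidance (the 0.8 heap ratio and the example numbers are rules of thumb, not prescriptions):

```python
# Back-of-the-envelope YARN container memory math for one NodeManager.
# Property names are standard Hadoop 2.x settings; the 0.8 heap ratio
# is a common rule of thumb to leave headroom for JVM overhead.

def container_settings(node_memory_mb, reserved_mb, containers):
    """Suggest per-container memory and JVM heap for one node."""
    usable_mb = node_memory_mb - reserved_mb   # leave room for OS and daemons
    container_mb = usable_mb // containers     # YARN allocation per container
    heap_mb = int(container_mb * 0.8)          # JVM heap inside the allocation
    return {
        "yarn.nodemanager.resource.memory-mb": usable_mb,
        "yarn.scheduler.minimum-allocation-mb": container_mb,
        "mapreduce.map.memory.mb": container_mb,
        "mapreduce.map.java.opts": f"-Xmx{heap_mb}m",
    }
```

For example, a 64 GB node with 8 GB reserved and 14 containers yields 4096 MB containers with roughly 3.2 GB heaps. If the `-Xmx` heap is set larger than the container allocation, YARN will kill the container for exceeding its limit, which is one of the most common tuning mistakes.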

Read More

Topics: Hadoop, Big Data Ecosystem

Validating Data in the Data Lake: Best Practices

Posted by Tony Fisher, SVP of Strategy and Business Development on Dec 15, 2016 1:27:33 PM

Can you trust the data in your data lake? Many companies are guilty of dumping data into the data lake without a strategy for keeping track of what’s being ingested. This leads to a murky, swampy repository. If you don’t have transparency into your lake so that you can feel confident using the data, what’s the point of deploying a data lake in the first place?

Read More

Topics: Hadoop, Big Data Ecosystem, Bedrock, Data Lake, Data Governance

Zaloni Zip: Building a Modern Data Lake Architecture Pt. 2

Posted by Rajesh Nadipalli on Dec 13, 2016 11:27:48 AM

In the last video, we looked at the pain points of traditional data warehouse architecture and the high-level architecture of a next-generation data lake based on Hadoop.

In this video, I will discuss the key components you need to build a new architecture.

Read More

Topics: Hadoop, Big Data Ecosystem, Data Warehouse, Zaloni Zip, Data Lake, Data Management