Guest Column: The Complexities Of Governance In The Age Of Data 2.0

The following is a guest column from Sharmila Mulligan, CEO and founder of ClearStory Data.

By Sharmila Mulligan

In the late 1990s, organizations struggled to solve the problem of who gets access to more information, more applications and more data. As enterprises began amassing more data and making it available to employees, partners and customers, security considerations escalated.

We saw rapid innovation in security technologies. The first single sign-on applications emerged, together with many other authentication technologies. We also saw the first appointments of Chief Security Officers (CSOs), who helped these essential applications proliferate in the enterprise with beneficial results.

Identity management emerged as a new category, led by startups such as Oblix, headed by Gordon Eubanks, the former CEO of Symantec (Oracle acquired Oblix in 2005). And so began the delivery of new suites of software designed to authenticate identity, especially in Web services deployments.

These ID management startups managed to stay above the turmoil in tech markets. By 2000, industry watchers such as Elise Ackerman of the San Jose Mercury News described these early identity checkers that controlled "who gets access to what" as dot-com bust survivors.

We now face a new challenge when it comes to the proliferation of data across the enterprise. Making data accessible to those who need it is absolutely necessary to realize the promise of big data analytics. However, it comes with a challenge that's leading to the appointment of yet another C-level position, the Chief Data Officer (CDO).

Based on our work with Fortune 1000 companies during the past year, and the buzz around "data lakes" at the recent Strata + Hadoop World in San Jose, there's a conundrum that lies ahead in this new Data 2.0 Age: how to widen data access in the enterprise while complying with stricter data governance rules.

The complexities can be seen in the reality of data lakes -- where all types of data are collected in a massive data hub from a variety of sources. This mix of data creates many data governance issues. These include governing what's relevant for analysis; what's not relevant but should be stored for compliance reasons; how fresh the data is; where it came from; when it was updated; who has access to it; who gets to "see" the real information in it; who is excluded; what data should be masked but accessible in an aggregate form; and how the governance process will be managed and audited to ensure all the data is used appropriately.
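To make these questions concrete, here is a minimal sketch of how a few of them might be recorded and enforced for a single data set. It is an illustration only, not a description of any particular product: the names `DatasetPolicy`, `view_record`, and every field and role shown are hypothetical, and a real data lake would rely on its platform's own security and catalog tooling.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class DatasetPolicy:
    """Hypothetical governance record for one data set in a data lake."""
    name: str
    source: str                    # where the data came from
    last_updated: datetime         # when it was last updated
    relevant_for_analysis: bool    # False means kept for compliance only
    readers: set = field(default_factory=set)         # roles that see real values
    masked_readers: set = field(default_factory=set)  # roles that see masked values
    masked_fields: set = field(default_factory=set)   # fields hidden from masked readers

    def access_level(self, role):
        """Return 'full', 'masked', or 'denied' for the given role."""
        if role in self.readers:
            return "full"
        if role in self.masked_readers:
            return "masked"
        return "denied"

    def is_fresh(self, now, max_age):
        """True if the data was updated within max_age of 'now'."""
        return now - self.last_updated <= max_age


def view_record(policy, role, record, audit_log):
    """Return the record as the role may see it, logging every attempt."""
    level = policy.access_level(role)
    audit_log.append((policy.name, role, level))  # audited even when denied
    if level == "denied":
        raise PermissionError(f"{role} may not read {policy.name}")
    if level == "full":
        return dict(record)
    # Masked readers get the record with sensitive fields hidden.
    return {k: ("***" if k in policy.masked_fields else v)
            for k, v in record.items()}
```

In this sketch, every access attempt lands in the audit log, including the ones that are refused, which is what makes the governance process itself reviewable later.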

The number of governance considerations is immense and we are only at the start of a growing list.

We know three things for sure: more data is good, speed of data accessibility is important, and the democratization of data held in data lakes is essential to making better business decisions. Realizing this, however, will require elevating the importance of a strict data governance model.

This task is a large undertaking and is the responsibility of Chief Data Officers. Once again it's about access and security, but with a more complex technical challenge because data is bigger, more fluid, and ever changing.

The overall Data 2.0 mission has these considerations:

- The selection of the data lake platform itself, which could be Apache Hadoop or a Hadoop distribution from Hortonworks or Cloudera. This becomes the central repository for data streamed in from various sources in different formats, which can be combined for deeper data intelligence. While business users can examine, dive into, or preview information in data lakes, the new CDO challenge will be securing and governing data usage so that only the right people get approved access to particular sets of data.

- The desire of business users in different departments to be more self-reliant and to explore data themselves, because they're the domain experts. That requires fast, intuitive tools that surface insights without requiring IT expertise, so business leaders can get better answers, explore data analysis, and collaborate with peers in real time.

- The blending and harmonization of diverse data sets from diverse sources and formats -- be it structured, unstructured or semi-structured data -- to reach a holistic insight that answers bigger, deeper business questions. This process includes otherwise hard-to-capture and hard-to-wrangle data coming from an exploding variety of devices and smart sensors connected to an emerging Internet of Everyday Things.

The companies and industries moving the fastest to realize the Data 2.0 advantages are those that aggressively compete against their peers day-to-day. They're accessing more data, faster, as a means of staying ahead of competitors. These companies include retailers, consumer packaged goods companies, pharmaceuticals, media and entertainment companies, insurance, automotive, and every other industry where the consumer now has all the power and every day is about attracting and keeping customers.


Sharmila Mulligan is the CEO and founder of ClearStory Data and has spent more than 18 years building software companies in a variety of markets. She has been EVP at Netscape, Kiva Software, AOL, Opsware, and Aster Data, helping to build multi-billion dollar IT markets. She is on the board of several start-ups, an advisor to numerous companies, and an active investor in early-stage companies.