Exploiting the power of data

Paul Boughton

Energy businesses are reliant on very large datasets and powerful database applications to perform many of their core business operations. Tim Butchart reports.

Oil and gas companies are among the world's largest producers of data, derived from diverse activities in exploration, seismic analysis, refining and other highly specialised operations, as well as distribution and marketing - plus the operational activities of any major enterprise. So it's hardly surprising that energy businesses have been among the first to shoulder the enormous burden of keeping pace with the data explosion.

With data volumes growing exponentially and backup windows narrowing, data protection and backup become ever more challenging. Analysts at IDC estimate that between 2005 and 2020, worldwide digital data will have grown from 130 exabytes to 40,000 exabytes, with the total volume of global data roughly doubling every two years from now until 2020. As a result, these analysts believe that large industrial enterprises will have to invest 40% more in IT equipment each year over the same period to cope with the growth. With investment at these levels, it's vital to get purchasing decisions right.
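Those figures are worth a quick sanity check (the arithmetic below is illustrative, not IDC's own):

    40,000 EB / 130 EB ≈ 308 ≈ 2^8.3, i.e. about 8.3 doublings over 15 years
    doubling time ≈ 15 / 8.3 ≈ 1.8 years

That is consistent with a doubling roughly every two years - and a reminder of just how steep growth at that rate is.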

Energy businesses are reliant on very large datasets and powerful database applications to perform many of their core business operations - everything from integrated Enterprise Resource Planning (ERP) suites to email systems. To keep mission-critical and production data protected, it's important to identify the qualities essential in a future-proofed data protection system: a robust backup and recovery strategy; a solid plan for dealing with persistent data growth; greater control over ever richer datasets; interoperability with existing investments; and the ability to meet shrinking backup windows while making life less stressful for managers and administrators.

With ever richer datasets expanding exponentially, IT and data managers need to find innovative ways to add capacity and performance so they can reliably back up and protect critical information. But they must achieve this without shackling datacentres further - adding device after device to already-crowded racks simply makes the data sprawl worse.

Data sprawl

Datacentre sprawl is a phenomenon characterised by a poorly planned infrastructure of physical storage equipment that lacks long-term efficiency or a coherent data protection strategy. Typically, a sprawling storage environment builds up around under-used server racks that waste space, time and energy.

Sprawl is crippling for any business, but it has its greatest impact on enterprises with the largest data demands - such as oil and gas companies. It prevents the effective protection, management and storage of data, while inhibiting business growth and, in some cases, leaving critical IT systems vulnerable to total failure or outage.

Without intelligent planning, an energy business can find its IT infrastructure reduced to an intimidating, wire-crossed jungle of physical storage equipment. Often these environments have simply been allowed to expand organically, with additional repositories added in panicked responses to rising data demands.

Under the pressure of unprecedented data growth, it is understandable that high-volume storage servers have been added to many business datacentres as quick fixes, without thought for a long-term solution. Unfortunately, in many cases these early fire-fighting measures are now preventing businesses from pausing and restructuring.

Data explosion debris

At the height of the data explosion, many businesses accepted sprawl as the only viable response to runaway growth. For many enterprises, however, the problem reached critical mass when energy bills skyrocketed, physical space filled to capacity and green mandates changed the way businesses had to address their energy spending. IT over-complexity destroys data visibility, and redundant, end-of-life technology soon becomes difficult to identify in a maze of heterogeneous hardware.

In the highly competitive energy sector, the issues associated with datacentre sprawl are numerous and critical. Lost or irretrievable data undermines productivity and reliability, resulting in financial losses, incomplete projects, duplicated effort and an inability to consistently meet stringent data protection service level agreements (SLAs). Worse still is the damage to business reputation as valued utility customers, prospects and auditors sense mounting turmoil.

Symptoms of data sprawl first become visible internally, as IT managers notice that reports have grown erratic or inaccurate. Soon, business planning becomes difficult and budgets come under strain as funds are funnelled into bolt-on storage technologies. Eventually, data crashes can no longer be addressed through system restoration, and business continuity and disaster recovery plans become ineffectual and impractical.

What is needed is a data protection approach designed specifically to re-energise and transform the backup environments of the largest businesses. With it, energy companies can quickly transform their backup and restore performance without disrupting their existing backup infrastructure, and gain greater flexibility in their data protection environment.

Planning for the future

Many of the data protection methods promoted in the market today merely amplify these problems of data bloat and sprawl. What enterprise data managers are crying out for are storage regimes that are agile, fit for purpose and able to do more with tight budgets, fewer personnel and reduced space, power and cooling requirements.

The first line of attack is a comprehensive audit of all information resources across the enterprise's entire data management network. Such a snapshot gives IT managers a clear picture of what data is stored where, and lets them reassess the under-use or redundancy of rarely accessed devices. A complete system analysis allows for the construction of data maps, which establish basic transparency and enable IT managers to audit their existing storage devices for efficiency and purpose. Accurate mapping of data storage also provides a foundation for updated contingency plans in the event of a data crash.
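To make the idea concrete, here is a minimal sketch of what such an audit might look like in practice. It is illustrative only: the inventory file, column names and thresholds below are assumptions for the example, not a reference to any particular tool.

import csv
from datetime import datetime, timedelta

# Illustrative thresholds - real values would come from the audit's SLAs.
STALE_AFTER = timedelta(days=180)   # 'rarely accessed' cut-off (assumed)
LOW_UTILISATION = 0.20              # 'wasting space' cut-off (assumed)

def load_data_map(path):
    """Read a storage inventory export into a basic 'data map'.
    Expected (hypothetical) columns: device, site, capacity_tb,
    used_tb, last_accessed (ISO date)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def audit(devices, now=None):
    """Yield devices that are under-used or rarely accessed."""
    now = now or datetime.now()
    for d in devices:
        utilisation = float(d["used_tb"]) / float(d["capacity_tb"])
        idle = now - datetime.fromisoformat(d["last_accessed"])
        if utilisation < LOW_UTILISATION or idle > STALE_AFTER:
            yield d["device"], d["site"], f"{utilisation:.0%} used", f"idle {idle.days}d"

if __name__ == "__main__":
    for row in audit(load_data_map("storage_inventory.csv")):
        print(*row, sep="\t")

Even a crude report like this gives managers the transparency to decide which devices are candidates for consolidation or decommissioning.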

Then data needs to be migrated from end-of-life devices to more efficient systems, opening up additional space by decommissioning outdated architecture and allowing for data consolidation on reliable and stable purpose-built technology.

The goal is to achieve rapid backups and instant restores that put managers back in control of massive, growing volumes of data. That means effortlessly scalable single-system architectures built for large enterprise data environments, combined with smarter, faster data deduplication methods. Modular data storage architecture must be coupled with an innovative 'content-aware' approach to deduplication, so that enterprise data managers can add capacity and performance as their needs grow, rather than simply throwing more devices into the datacentre.

Data deduplication

The answer is to deduplicate data in multiple parallel streams and across multiplexed data volumes. Business databases typically store data in small segments of just a few kilobytes, which inline, hash-based deduplication technologies cannot hope to process without putting the brakes on backup performance (or simply leaving large volumes un-deduplicated). 'Byte-differential' deduplication is different: it finds every byte of duplicate data without slowing backup or recovery performance, writing a complete backup to disk while applying forward-referencing, so the most recent backup remains intact for fast restores.
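For contrast, the following sketch shows the conventional hash-based approach in its simplest form: fixed-size blocks are fingerprinted, and a block is stored only the first time its fingerprint is seen. Byte-differential deduplication itself is proprietary, so this is only the baseline technique it improves on; the block size and hashing choices here are illustrative.

import hashlib
import io

BLOCK_SIZE = 4096  # illustrative; databases often store data in segments this small

def deduplicate(stream):
    """Split a byte stream into fixed-size blocks and yield
    (fingerprint, block) pairs; block is None for duplicates,
    which would be stored as references rather than data."""
    seen = set()
    while True:
        block = stream.read(BLOCK_SIZE)
        if not block:
            break
        digest = hashlib.sha256(block).hexdigest()
        if digest in seen:
            yield digest, None       # duplicate: reference only
        else:
            seen.add(digest)
            yield digest, block      # new data: store the block

if __name__ == "__main__":
    # Toy backup image: four blocks, three of them identical.
    image = io.BytesIO(b"A" * 8192 + b"B" * 4096 + b"A" * 4096)
    results = list(deduplicate(image))
    stored = sum(1 for _, b in results if b is not None)
    print(f"stored {stored} of {len(results)} blocks")  # stored 2 of 4 blocks

Note how every block must be hashed and looked up inline - exactly the per-segment overhead that, the article argues, slows backups when databases write in kilobyte-scale segments.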

This fresh approach enables data managers to recapture capacity while controlling exponential data growth efficiently. With an enterprise-class data protection system built for the job, they can quickly demonstrate the performance needed to meet aggressive backup and restore targets, with the detailed reporting and management statistics at their fingertips to justify the transformation to directors, customers and other stakeholders.

Keeping data flowing

IT teams must implement sound growth plans based on a single system of data management and storage protection, eliminating the organic growth of limited-capacity physical storage structures.

Purpose-built modular hardware delivers high processing power coupled with large storage capacity, and is well suited to incremental, ordered infrastructure growth. Continued monitoring and maintenance of an organised, grid-based layout of physical storage devices then allows IT heads to apply high-ratio deduplication technology to the management of content and data storage.

And once datacentre sprawl has finally been tamed, IT managers should mandate clear processes of auditable data erasure to put the brakes on their own data explosion.

Any IT professional at a large energy company whose board backs their vision of data protection renewal is well on track to regain control of the datacentre. The final piece of the jigsaw is a broader cultural shift: accepting the new roadmap for coping with data growth, and working with technology teams and business owners to ensure future scalability and success.

Tim Butchart is senior vice president, Sepaton, Heathrow, UK.
