Aerospace data scientists and the future of data-driven business

Paul Boughton

The decreasing costs of electronic sensors have resulted in most aircraft manufacturers fitting them in almost every subsystem. Collection of the 'engineering' data alone is enough to require enormous computational power and storage. Sergio Pepe reports.
Data. Very few words can be so abstract and yet so meaningful at the same time.
It has been defined as information, knowledge and wisdom, and the relationship between these definitions remains a matter of intense debate in academia and industry alike. Philosophical discussions aside, in practical terms, a company’s focus should be on the path from data to understanding.
Recent announcements from companies like Google, Amazon and Apple show how much they are willing to do to get to know their customers better. In return for services users give away some personal information. Besides the standard questions like name, age and gender, geographical data is collected all the time. Online tools such as email and calendars disclose yet more information to service providers. It seems a fair exchange for the convenience of having one's information available everywhere.
The proliferation of 'cloud' offerings shows how many others want a piece of the action. Such companies are thriving by properly analysing mountains of data. Their business models depend on them understanding their users’ needs in order to make more profit from them.
Most will agree that data alone does not provide the answers. It needs processing to become useful. To be fully understood and to influence business decisions, the processed data, let's call it information, must be available as quickly as possible and presented in a clear way. Conveying a clear message in a visual form is normally the preferred means of catching the human brain's attention.
Anscombe's quartet comprises four datasets that have identical simple statistical properties, yet appear very different when graphed.
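The point can be checked numerically. The following is a minimal sketch in Python, using only the standard library, that computes the usual summary statistics for the four datasets published by Anscombe in 1973. The function and variable names are chosen here for illustration only.

```python
import statistics

# Anscombe's quartet (Anscombe, 1973). Datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarise(xs, ys):
    """Return rounded summary statistics and the least-squares line for one dataset."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    return {
        "mean_x": round(mean_x, 2),
        "var_x": round(statistics.variance(xs), 2),   # sample variance
        "mean_y": round(mean_y, 2),
        "var_y": round(statistics.variance(ys), 1),
        "pearson_r": round(sxy / (sxx * syy) ** 0.5, 2),
        "slope": round(slope, 2),
        "intercept": round(mean_y - slope * mean_x, 2),
    }


for name, (xs, ys) in QUARTET.items():
    print(name, summarise(xs, ys))
```

At this level of rounding the four printed rows are identical, which is exactly why the numbers alone are misleading: only a plot reveals that one set is roughly linear, one is curved, and two are dominated by a single outlying point.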
The decreasing costs of electronic sensors have resulted in most aircraft manufacturers fitting them in almost every subsystem. Collection of the 'engineering' data alone is enough to require enormous computational power and storage. Bring in the 'business' side of things, with complex company structures and even more complex supply chains, and the IT requirements start to increase exponentially.
As the required sensor components and connections become cheaper, organisations collect and store more and more data. The amount collected soon outpaces the rate at which it can be analysed and understood. Throwing more computing power at the data may not be enough to solve the problem. Inevitably, some organisations will find themselves collecting ever more data without the means to properly analyse and understand it.
The sheer scale and scope of the data being collected make it very difficult to fully appreciate what information it may contain. This problem is further aggravated as new data streams are added and as their nature changes.
As Ben Fry[i], author of 'Visualizing Data', explained in his PhD dissertation: "The amount of data necessitates new software-based tools, and its complexity requires extra consideration be taken in its visual representation in order to highlight features in order of their importance, reveal patterns in the data, and simultaneously show features of the data that exist across multiple dimensions." In this context, any device capable of collecting data could be useful, in one way or another, to everyone in the business.
No matter what the business function, it is highly likely that performance may be significantly improved through the use of advanced analytical techniques.
Aerospace companies have long been collecting and, to some extent, processing data such as Health & Usage Monitoring (HUM), operational information, stock levels, and vendor and supply chain data, although these are normally managed separately depending on the organisational structure in place.
While some effort has been invested in collection and processing, cross-referencing is often attempted with less than optimal tools. Information visualisation, part of the later stages of the process, is normally considered only as an afterthought.
The industry is necessarily conservative in many ways and this, combined with poor awareness, is a major factor behind this. However, there are grounds for hope. Davenport and Harris[ii] suggest that most large organisations have the desire to become more analytical but lack the will and know-how. The proliferation of infographics - quick and clear visual representations of complex information - can also trigger interest from top managers.
As the energy sector becomes more sophisticated, it begins to experience the problems of data overload and information under-achievement seen in other sectors. However, because so many systems are essentially ‘green field’ in nature, there is less justification for failure to implement adequate data analysis capability in this area.
Hal Varian, Google’s chief economist, said recently: “find something where you provide a scarce, complementary service to something that is getting ubiquitous and cheap. So what’s getting ubiquitous and cheap? Data. And what is complementary to data? Analysis.”
The most difficult problem is finding how to extract meaningful information from the available data. When considered individually, areas such as statistics, data mining, graphic design and information visualisation are disjointed parts of the solution. If executed separately by different teams, the result will be even more fragmented as each one waits for another.
The speed with which businesses must react in a modern market means that delayed decisions almost certainly lead to lost opportunities and profits. In business areas that truly require informed decisions, the usual tools (such as Excel) have limitations. Important (and interesting) analyses are missed because ideas are not communicated clearly.
By combining the full set of relevant skills in a new class of practitioners whose expertise ranges from traditional computer science and mathematics to art, it is possible to achieve vastly superior results. Such practitioners will be far more likely to have a clear view of the relevant methods and to understand the steps necessary to reach a solution to the problem in question.
Describing the skills and disciplines involved, Ben Fry proposes the following:
1. Computer Science - acquire and analyse data
2. Mathematics, Statistics, & Data Mining - filter and mine
3. Graphic Design - represent and refine
4. Infovis and Human-Computer Interaction (HCI) - interaction
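To make the hand-offs between these disciplines concrete, here is a minimal sketch in Python of the first three stages chained together by one person: acquire, filter and mine, then represent. The sensor log, the function names and the 1.5 x median threshold are all invented for illustration; this is not Fry's code, nor the system described later in this article, and the interaction stage is left out.

```python
import csv
import io
import statistics

# Hypothetical sensor log, invented for illustration: flight hour and vibration level.
RAW_LOG = """hour,vibration
1,0.42
2,0.40
3,
4,0.44
5,0.41
6,0.95
7,0.43
"""


def acquire(text):
    """Acquire: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def filter_rows(rows):
    """Filter: drop rows with a missing reading and convert the rest to numbers."""
    return [(int(r["hour"]), float(r["vibration"])) for r in rows if r["vibration"]]


def mine(readings, factor=1.5):
    """Mine: flag readings above an assumed threshold of factor x median."""
    median = statistics.median(v for _, v in readings)
    return [(hour, value, value > factor * median) for hour, value in readings]


def represent(flagged, width=40):
    """Represent: draw one text bar per reading and mark the flagged ones."""
    top = max(value for _, value, _ in flagged)
    lines = []
    for hour, value, is_flagged in flagged:
        bar = "#" * round(width * value / top)
        note = "  <-- check" if is_flagged else ""
        lines.append(f"hour {hour:>2} {bar} {value:.2f}{note}")
    return "\n".join(lines)


flagged = mine(filter_rows(acquire(RAW_LOG)))
print(represent(flagged))
```

Because every stage is a plain function that consumes the previous stage's output, a single practitioner can change the filtering rule or the display without waiting on another team, which is the argument made above.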
The creation of such capable specialists is no more than another step in technological evolution. Over time, they will continue to narrow the gaps and smooth the joins between the various stages in the transformation of data into insight.
Here at Critical Software Technologies, we have been experiencing this evolution on one project since 2006. The team consists of people from backgrounds that include computer science, avionics, and mechanical and telecommunications engineering.
Through training and management, this disparate group has, over the years, built up a common set of skills that includes mastery of tools and techniques for Extracting, Transforming and Loading (ETL) data, web design and development, and the creation of artistic concepts that produce appealing and useful displays of information.
Initially the team was not expected to come up with an integrated product, but that is what they have developed. They started small and worked through many iterations, using an agile process to address a manageable set of needs at a time.
The breakdown of tasks and the team's skill set made it possible to develop the project in a manner that gave full control of a given module to a single person. This resulted in a valuable data product that analyses a huge base of otherwise not very useful data.
Each member of the team is now responsible for understanding and building subsystems that not only produce in seconds reports that would normally have taken days, but also provide new (often interactive) visual displays of information that have already been proven to save our customers money and time.
By contrast, dividing the work between separate teams with complementary skill sets leads to each person dealing with an isolated part of the problem, and something can be lost at each handover.
Starting with a partial set of skills, computer scientists can be taught the visual design principles necessary for data representation; avionic engineers can learn about the relevant computer science; and the non-mathematicians can become better informed about the statistics and mathematics needed to derive and convey information effectively.
The techniques themselves are not new, but their isolation within individual fields has prevented them from being used as a whole, since it is rare for someone to possess the required background in each.
It is also evident that these are not lesser roles that can be filled by non-specialised IT people.
As Mike Loukides[iii] says in his O’Reilly Radar report 'What is Data Science?': "[data scientists] are inherently interdisciplinary. They can tackle all aspects of a problem, from initial data collection and data conditioning to drawing conclusions."
Recently, the Wall Street Journal reported that, according to LinkedIn, the percentage of job starters with titles related to analytics and data science on its professional networking site jumped more than 40 per cent from 2009 to 2010. And while that number is expected to grow further, it is still unlikely to be enough to meet demand.
Aerospace companies will therefore have to compete for the services of these key people, which should be good news for companies like Critical Software that are among those leading the way.
At Critical Software we have defined a role for Aerospace Data Scientists and a process highlighting how and when they should be employed. The role spans all areas of the traditional product development cycle.
Future data-driven business depends on data scientists' ability to apply creativity and aesthetic sensibilities to a challenge, along with the statistical understanding and models that enable enterprise software and decision-support systems to turn data into game-winning insight.

References:
[i] Ben Fry, 2004. "Computational information design". Massachusetts Institute of Technology.
[ii] Davenport and Harris, 2007. "Competing on Analytics". Harvard Business School Press.
[iii] Mike Loukides, 2010. "What is Data Science?". O’Reilly Radar Report.
[iv] Toby Segaran and Jeff Hammerbacher (Eds), 2009. "Beautiful Data". O'Reilly.

Sergio Pepe is Consultant Engineer, Critical Software Technologies, Chilworth, Southampton, UK.

Sergio Pepe is a technology enthusiast passionate about design and the web. He specialises in data analysis and the display of quantitative information, with particular attention to the way graphs, tables and illustrations are displayed in web-based environments. Most of his past projects involved HTML, jQuery (and other JavaScript libraries) and CSS.

Fig.1. Anscombe's quartet when plotted.
