Reviews and Excerpts

A Global Approach to Data Value Maximization. Integration, Machine Learning, Multimodal Analysis

Author: Paolo Dell’Aversana
Publisher: Cambridge Scholars Publishing (June, 2019)

Excerpt: Preface

Over the past thirty years, I have worked on many projects involving the acquisition, processing and interpretation of geophysical data. Using seismic, electromagnetic and gravity data, I have developed and applied approaches and algorithms for the modeling and inversion of multidisciplinary geophysical measurements. The output of these methods commonly consists of “Earth models” describing the spatial distribution of various physical parameters, such as seismic velocity, electrical resistivity, density, fluid saturation, and porosity.

Using large datasets, I have frequently applied Data Science and Machine Learning approaches to support and improve my integrated workflows. Sometimes, colleagues, researchers and managers have applied the results of my work to improve their geological models and/or their decision-making processes. Indeed, a robust model helps in making key decisions, such as where to drill a new exploration well.

Unfortunately, geophysical and geological models are often affected by uncertainties and ambiguities, including, of course, the models I produce myself. One of the main reasons for this intrinsic indeterminacy is that the exploration target is frequently located several kilometers deep in the Earth’s crust, below complex geological sequences. This often happens, for instance, in hydrocarbon exploration. Consequently, the geophysical response measured at the surface can be characterized by a low signal-to-noise ratio. When geoscientists try to retrieve Earth models from that response, the measurement uncertainties propagate from data space to model space, negatively affecting the reliability of the Earth models. These models represent “interpretations” rather than objective information. For that reason, the Earth disciplines are a typical example of interpretative science. In other words, geoscientists can produce (very) different Earth models and different interpretations starting from the same experimental observations. The differences depend on many factors, not confined to data quality: personal technical background, individual experience and sensitivity, skill in using technology to enhance the signal and reduce the noise, and so forth.
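As a minimal illustration of this propagation (a toy Python sketch with a hypothetical linear forward operator, not an example taken from the book), the fragment below inverts the same small problem under many noise realizations; the spread of the recovered parameters shows how errors in data space map into errors in model space.

```python
import numpy as np

# Toy linear forward problem: d = G @ m, where m holds two hypothetical
# Earth-model parameters and G is an arbitrary forward operator.
rng = np.random.default_rng(0)
G = np.array([[1.0, 0.5],
              [0.3, 1.0],
              [0.8, 0.8]])
m_true = np.array([2.0, 1.5])
d_clean = G @ m_true

# Monte Carlo propagation: invert many noisy realizations of the data
# (least squares) and inspect the spread of the recovered models.
noise_std = 0.2                      # low signal-to-noise scenario
estimates = []
for _ in range(1000):
    d_noisy = d_clean + rng.normal(0.0, noise_std, size=d_clean.shape)
    m_est, *_ = np.linalg.lstsq(G, d_noisy, rcond=None)
    estimates.append(m_est)

estimates = np.array(estimates)
print("true model:       ", m_true)
print("mean of estimates:", estimates.mean(axis=0))
print("std of estimates: ", estimates.std(axis=0))  # model-space uncertainty
```

The noisier the data, the wider the spread of the recovered parameters: this is precisely the sense in which an Earth model is an “interpretation” rather than objective information.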

In these respects, the geosciences are not very different from many medical disciplines where, for instance, physicians must formulate a diagnosis based on multidisciplinary observations affected by large uncertainties. Ultimately, both geoscientists and physicians must make crucial decisions in uncertain domains.

Over the years, geoscientists, like physicians, have learned how to manage experimental errors and model uncertainties. However, many methodological questions behind their interpretative work remain open.

First, how much do they really understand about the data that they use?

Second, do they properly understand the meaning of the models that they retrieve from their data?

Third, how can they extract the maximum informative value from both data and models?

Fourth, how can they optimize the decision-making process in uncertain domains, using the entire value of the information in both data and model space?

Of course, the above questions are not restricted to the domain of the Earth or medical disciplines. The problem of properly understanding data and models, and of exploiting their full informative value, extends to all the main scientific fields. Unfortunately, that problem often remains unsolved: we do not use information correctly because we understand it only partially, and we waste a great part of its potential value.

The “Gap of Understanding” (“GOU” could be a nice acronym) is related in some way to the number (and the relevance) of obscure steps in the workflow through which we move from data to models and, finally, from models to decisions.

Even assuming that we carry out our data analysis and interpretation in the most honest and scrupulous way, one problem remains in the background: data complexity.

The point is that the rapid growth of information and the intrinsic complexity of many modern databases often require extraordinary efforts to explore the entire volume of information and to maximize its true value. Besides the volume of Big Data, there are additional important aspects to take into account: variety, veracity, velocity, validity and volatility. In fact, data complexity increases not only with data volume, but also with the heterogeneity of the information, the non-linearity of the relationships, and the rate at which the data flow changes.

As I said, all that complexity can be a problem. Sometimes we think we can solve it simply by ignoring it. We tend to simplify. Unfortunately, excessive simplification can lead us towards wrong Earth models, wrong medical diagnoses, and wrong financial predictions. Ultimately, that simplistic approach drives us towards wrong decisions. On the other hand, complexity often represents an opportunity rather than a problem. Complexity, if properly managed and correctly understood, can trigger positive changes and innovative ideas. It is an intellectual, scientific and technical challenge.

This book is a systematic discussion of methods and techniques for winning that challenge. The final objective of the following chapters is to introduce algorithms, methods and approaches for extracting the maximum informative value from complex information.

As I said, dealing with “Big Data” and with complex integrated workflows is the normal scenario in many Earth disciplines, especially in the case of extensive industrial applications. For this reason, the book starts from the domain of the geosciences, where I have developed my professional experience. However, the discussion is not confined to applications in geology and geophysics. It extends into other scientific areas, such as medical disciplines and various engineering sectors. As in the geosciences, scientists and professionals in these fields are continually faced with the problem of how to get the maximum value from their datasets. That objective can be achieved using a multitude of approaches.

In the book, algorithms, techniques and methods are discussed in separate chapters, but within the frame of the same unitary view. These methods include data fusion and quantitative approaches to model integration, multimodal data analysis in different physical domains, audio-video display of data through advanced “sonification” techniques, multimedia machine learning, and hybrid methods of data analysis.
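To give a flavour of what “sonification” means in practice, here is a minimal Python sketch of parameter-mapping sonification applied to an arbitrary synthetic series (it is not the specific method developed in the book): each data value is mapped to an audible pitch and the result is written to a WAV file, so that trends and anomalies in the data can literally be heard.

```python
import numpy as np
import wave

# Hypothetical data series to sonify (e.g., a well log or signal amplitude curve).
rng = np.random.default_rng(1)
data = np.sin(np.linspace(0, 6 * np.pi, 60)) + 0.1 * rng.normal(size=60)

# Parameter-mapping sonification: low values -> low pitch, high values -> high pitch.
f_min, f_max = 220.0, 880.0
norm = (data - data.min()) / (data.max() - data.min())
freqs = f_min + norm * (f_max - f_min)

sample_rate = 44100
tone_duration = 0.1                  # seconds of audio per data sample
t = np.linspace(0, tone_duration, int(sample_rate * tone_duration), endpoint=False)
audio = np.concatenate([np.sin(2 * np.pi * f * t) for f in freqs])

# Write a 16-bit mono WAV file so the data can be listened to.
pcm = (audio / np.abs(audio).max() * 32767).astype(np.int16)
with wave.open("sonified_data.wav", "wb") as wav_file:
    wav_file.setnchannels(1)
    wav_file.setsampwidth(2)
    wav_file.setframerate(sample_rate)
    wav_file.writeframes(pcm.tobytes())
```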

Finally, human cognition is also taken into account as a key factor in enhancing the informative value of data and models. Indeed, the basic intuition that inspired me to write this book is that information is like an empty box if we do not extract any coherent significance from it. This can be a geological, medical, or financial type of significance, depending on the field of application. In other words, the value of information increases if we understand its deep meaning. That intuitive principle is true in science as well as in ordinary life. We can effectively estimate the value of information only after understanding its significance. Consequently, the problem of maximizing the value of information becomes a more general problem: maximizing our capability to extract significance from the information itself.

This methodological approach requires that the human sciences be involved in the workflow. In particular, it is important to clarify the concept of “significance of information”. This concept is extremely complex and has occupied philosophers and scientists for many centuries. For that reason, I have tried to summarize the “question of significance” in different parts of the book, explaining my point of view about it. Especially in the final part, I discuss how modern neurosciences, cognitive disciplines and epistemology can contribute to the process of maximizing the value of information through the analysis of its semantic aspects.[1]

A multitude of examples, tutorials and real case histories are included in each chapter, to support the theoretical discussion with experimental evidence. Finally, I have included a set of appendices at the end of the book, in order to provide some insight into the mathematical aspects not explicitly discussed in the chapters.

Given the multidisciplinary approach used in this book, I hope it can engage the interest of a large audience. This should include geophysicists, geologists, seismologists, volcanologists and data scientists. Moreover, researchers in other areas, such as medical diagnostic disciplines, cognitive sciences and the health industry, can find interesting ideas in the following chapters. No specific background is required to grasp my key messages. In fact, this book is aimed mainly at introducing novel ideas and new research directions rather than exhaustively covering specialist topics. Consequently, I have often preferred to discuss the technical details in the appendices, in order to make the discussion more fluid and readable. Furthermore, I have provided the main references and many suggested readings at the end of each chapter for those interested in exploring a specific subject further.

In summary, the only fundamental requirement for deriving benefit from this book is to read it with an open mind, with the curiosity to investigate the fascinating links between disciplines commonly considered independent.

Download a 30-page extract from Cambridge Scholars

—————————————————-

[1] In the linguistic field, semantics is the study of meaning. In the semiotic field, it deals with the relations between signs and what they denote. In this book, I use the term “semantic” in a very general sense, to denote the meaning of words, sentences, concepts, and information in general.