With the onset of massive cosmological data collection through surveys such as the Sloan Digital Sky Survey (SDSS), galaxy classification has largely been accomplished with the help of citizen science communities such as Galaxy Zoo. Seeking the wisdom of the crowd for such Big Data processing has proved extremely beneficial. However, an analysis of one of the Galaxy Zoo morphological classification data sets has shown that a significant majority of all classified galaxies are labelled as “Uncertain”. This book reports on how to use data mining, more specifically clustering, to identify galaxies for which the public showed some degree of uncertainty as to whether they belong to one morphology type or another. The book shows the importance of transitions between different data mining techniques in an insightful workflow. It demonstrates that clustering makes it possible to identify discriminating features in the analysed data sets, using a novel feature selection algorithm called Incremental Feature Selection (IFS). The book then applies two state-of-the-art classification techniques, Random Forests and Support Vector Machines, to validate the acquired results. It concludes that the vast majority of these galaxies are, in fact, of spiral morphology, with a small subset potentially consisting of stars, elliptical galaxies or galaxies of other morphological variants.
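The idea of growing a set of discriminating features one at a time can be illustrated with a generic greedy forward selection loop. This is a minimal sketch, not the book's IFS algorithm: it assumes exactly two labelled groups (for example, cluster assignments produced by an earlier clustering step), and both helpers, `separation_score` and `incremental_feature_selection`, are hypothetical. The score used here is simply the distance between the two group centroids divided by the mean within-group spread.

```python
from math import sqrt
from statistics import mean


def separation_score(rows, labels, features):
    """Hypothetical criterion for a two-group split: distance between the
    group centroids divided by the mean distance of points to their own
    centroid, using only the chosen feature indices. Higher is better."""
    groups = sorted(set(labels))
    if len(groups) != 2:
        raise ValueError("this sketch assumes exactly two groups")
    # Project every row onto the chosen features, grouped by label.
    points = {
        g: [[row[f] for f in features] for row, lab in zip(rows, labels) if lab == g]
        for g in groups
    }
    centroids = {g: [mean(col) for col in zip(*pts)] for g, pts in points.items()}
    between = sqrt(sum((a - b) ** 2 for a, b in zip(centroids[groups[0]], centroids[groups[1]])))
    within = mean(
        sqrt(sum((x - c) ** 2 for x, c in zip(p, centroids[g])))
        for g, pts in points.items()
        for p in pts
    )
    return between / (within + 1e-9)  # small epsilon avoids division by zero


def incremental_feature_selection(rows, labels, n_features, score=separation_score):
    """Greedy forward selection: repeatedly add the single feature that most
    improves the score, and stop as soon as no remaining feature improves it."""
    selected, best = [], float("-inf")
    remaining = list(range(n_features))
    while remaining:
        cand_score, cand = max((score(rows, labels, selected + [f]), f) for f in remaining)
        if cand_score <= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected, best


# Toy data (invented for illustration): feature 0 separates the two groups,
# feature 1 is noise, and feature 2 separates them only weakly.
rows = [(0.0, 5.0, 1.0), (0.2, 1.0, 1.2), (0.1, 3.0, 0.8),
        (1.0, 4.0, 1.5), (1.1, 2.0, 1.9), (0.9, 0.0, 1.4)]
labels = [0, 0, 0, 1, 1, 1]
selected, best = incremental_feature_selection(rows, labels, n_features=3)
print(selected)  # [0]
```

Greedy forward selection is only one way to grow a feature set incrementally; it is cheap but can miss features that help only in combination. In a workflow like the one described, the selected features would then be checked with separate classifiers such as Random Forests or Support Vector Machines.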
Big Data in Radio Astronomy: Scientific Data Processing for Advanced Radio Telescopes provides the latest research developments in big data methods and techniques for radio astronomy. Providing examples from such projects as the Square Kilometre Array (SKA), the world’s largest radio telescope, which is expected to generate over an exabyte of data every day, the book offers solutions for coping with the challenges and opportunities presented by the exponential growth of astronomical data. Presenting state-of-the-art results and research, this book is a timely reference for both practitioners and researchers working in radio astronomy, as well as students looking for a basic understanding of big data in astronomy. The book bridges the gap between radio astronomy and computer science; covers the observation lifecycle as well as data collection, processing and analysis; presents state-of-the-art research and techniques in big data related to radio astronomy; and uses real-world examples such as the Square Kilometre Array (SKA) and the Five-hundred-meter Aperture Spherical radio Telescope (FAST).
Knowledge Discovery in Big Data from Astronomy and Earth Observation: Astrogeoinformatics bridges the gap between astronomy and geoscience in the context of applications, techniques and key principles of big data. Machine learning and parallel computing are becoming increasingly cross-disciplinary as big data becomes commonplace. This book provides insight into the common workflows and data science tools used for big data in astronomy and geoscience. After establishing the similarities in data gathering, pre-processing and handling, it illustrates the data science aspects in the context of both fields. Software, hardware and algorithms for big data are addressed. Finally, the book offers insight into the emerging science that combines data and expertise from both fields to study the effect of the cosmos on the Earth and its inhabitants. The book addresses astronomy and geosciences in parallel from a big data perspective; includes introductory information, key principles, applications and the latest techniques; and is supported by chapters oriented toward computing and information science that introduce the necessary knowledge in these fields.
As the availability of high-throughput data-collection technologies, such as information-sensing mobile devices, remote sensing, internet log records, and wireless sensor networks, has grown, science, engineering, and business have rapidly moved from striving to develop information from scant data to a situation in which the amount of information exceeds a human's ability to examine, let alone absorb, it. Data sets are increasingly complex, which can aggravate problems such as missing information and other quality issues, data heterogeneity, and differing data formats. The nation's ability to make use of data depends heavily on the availability of a workforce that is properly trained and ready to tackle high-need areas. Training students to exploit big data requires experience with statistical analysis, machine learning, and computational infrastructure that permits the real problems associated with massive data to be revealed and, ultimately, addressed. Analysis of big data requires cross-disciplinary skills, including the ability to make modeling decisions while balancing trade-offs between optimization and approximation, all while being attentive to useful metrics and system robustness. To develop those skills in students, it is important to identify whom to teach, that is, the educational background, experience, and characteristics of a prospective data-science student; what to teach, that is, the technical and practical content that should be taught to the student; and how to teach, that is, the structure and organization of a data-science program. Training Students to Extract Value from Big Data summarizes a workshop convened in April 2014 by the National Research Council's Committee on Applied and Theoretical Statistics to explore how best to train students to use big data.
The workshop explored the need for such training and the curricula and coursework it should include. One impetus for the workshop was the current fragmented view of what is meant by analysis of big data, data analytics, or data science. New graduate programs are introduced regularly, and each has its own notion of what those terms mean and, most importantly, of what students need to know to be proficient in data-intensive work. This report provides a variety of perspectives on those elements and on their integration into courses and curricula.
"This book discusses the exponential growth of information size and the innovative methods for data capture, storage, sharing, and analysis for big data"--Provided by publisher.
Statistical Foundations of Data Science gives a thorough introduction to commonly used statistical models and contemporary statistical machine learning techniques and algorithms, along with their mathematical insights and statistical theories. It aims to serve as a graduate-level textbook and a research monograph on high-dimensional statistics, sparsity and covariance learning, machine learning, and statistical inference. It includes ample exercises involving both theoretical studies and empirical applications. The book begins with an introduction to the stylized features of big data and their impacts on statistical analysis. It then introduces multiple linear regression and expands the techniques of model building via nonparametric regression and kernel tricks. It provides a comprehensive account of sparsity exploration and model selection for multiple regression, generalized linear models, quantile regression, robust regression, and hazards regression, among others. High-dimensional inference is also thoroughly addressed, as is feature screening. The book also provides a comprehensive account of high-dimensional covariance estimation and of learning latent factors and hidden structures, as well as their applications to statistical estimation, inference, prediction and machine learning problems. It also gives a thorough introduction to statistical machine learning theory and methods for classification, clustering, and prediction. These include CART, random forests, boosting, support vector machines, clustering algorithms, sparse PCA, and deep learning.
This book constitutes the refereed proceedings of the First International Conference on Big Scientific Data Management, BigSDM 2018, held in Beijing, China, in November/December 2018. The 24 full papers presented together with 7 short papers were carefully reviewed and selected from 86 submissions. Topics included application cases in big scientific data management, paradigms for enhancing scientific discovery through big data, data management challenges posed by big scientific data, machine learning methods to facilitate scientific discovery, science platforms and storage systems for large-scale scientific applications, data cleansing and quality assurance of science data, and data policies.
The volume of data being collected in solar astronomy has increased exponentially over the past decade, and we are entering the age of petabyte-scale solar data. Deep learning has proved an invaluable tool for efficiently extracting key information from massive solar observation data, for tasks such as data archiving and classification, object detection and recognition. Astronomical study starts with imaging from recorded raw data, followed by image processing, such as image reconstruction, inpainting and generation, to enhance imaging quality. This book studies deep learning for solar image processing. First, image deconvolution is investigated for aperture synthesis imaging. Second, image inpainting is explored to repair solar images that are over-saturated because the light intensity exceeds the threshold of the optical lens. Third, image translation among UV/EUV observations of the chromosphere and corona, Hα observations of the chromosphere, and magnetograms of the photosphere is realized using GANs, demonstrating a powerful ability to transfer image domains across multiple wavebands and different observation devices. This can compensate for gaps in observation time or missing wavebands. In addition, time series models such as LSTM are used to forecast solar bursts and solar activity indices. This book presents a comprehensive overview of deep learning applications in solar astronomy. It is suitable for students and young researchers majoring in astronomy or computer science, especially those pursuing interdisciplinary research between the two.
Big data and machine learning are driving the Fourth Industrial Revolution. With the age of big data upon us, we risk drowning in a flood of digital data. Big data has now become a critical part of both the business world and daily life, and the synthesis and synergy of machine learning and big data has enormous potential. Big data and machine learning are projected not only to maximize citizen wealth but also to promote societal health. As big data continues to evolve and the demand for professionals in the field increases, access to the most current information about the concepts, issues, trends, and technologies in this interdisciplinary area is needed. The Encyclopedia of Data Science and Machine Learning examines current, state-of-the-art research in the areas of data science, machine learning, data mining, and more. It provides an international forum for experts within these fields to advance the knowledge and practice in all facets of big data and machine learning, emphasizing emerging theories, principles, models, processes, and applications to inspire and circulate innovative findings into research, business, and communities. Covering topics such as benefit management, recommendation system analysis, and global software development, this expansive reference provides a dynamic resource for data scientists, data analysts, computer scientists, technical managers, corporate executives, higher-education students and educators, government officials, researchers, and academicians.
This two-volume set, LNAI 12798 and 12799, constitutes the thoroughly refereed proceedings of the 34th International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2021, held virtually and in Kuala Lumpur, Malaysia, in July 2021. The 87 full papers and 19 short papers presented were carefully reviewed and selected from 145 submissions. The IEA/AIE 2021 conference continued the tradition of emphasizing applications of intelligent systems to solve real-life problems in all areas. These areas include the following. Part I, Artificial Intelligence Practices: knowledge discovery and pattern mining; artificial intelligence and machine learning; semantic, topology, and ontology models; medical and health-related applications; graphic and social network analysis; signal and bioinformatics processing; evolutionary computation; attack security; natural language and text processing; fuzzy inference and theory; and sensor and communication networks. Part II, From Theory to Practice: prediction and recommendation; data management, clustering and classification; robotics; knowledge-based and decision support systems; multimedia applications; innovative applications of intelligent systems; CPS and industrial applications; defect, anomaly and intrusion detection; financial and supply chain applications; Bayesian networks; BigData and time series processing; and information retrieval and relation extraction.
This revelatory exploration of big data, which refers to our newfound ability to crunch vast amounts of information, analyze it instantly and draw profound and surprising conclusions from it, discusses how it will change our lives and what we can do to protect ourselves from its hazards. 75,000 first printing.