Month: April 2018

New paper showing how a fundamental law of universal computation also applies to weaker forms of computation and how this can evaluate the effectivity of measures of complexity

Previously referred to as ‘miraculous’ in the scientific literature because of its powerful properties and its wide application as an optimal solution to the problem of induction/inference, (approximations to) Algorithmic Probability (AP) and the associated Universal Distribution are (or should be) of the greatest importance in science. Here we investigate the emergence, the rates of emergence and convergence, and the Coding theorem-like behaviour of AP in Turing-subuniversal models of computation. We investigate empirical distributions of computing models in the Chomsky hierarchy. We introduce measures of algorithmic probability and algorithmic complexity based upon resource-bounded computation, in contrast to previously thoroughly investigated distributions produced from the output distribution of Turing machines. This approach allows for numerical approximations to algorithmic (Kolmogorov-Chaitin) complexity-based estimations at each of the levels of a computational hierarchy. We demonstrate that all these estimations are correlated in rank and that they converge both in rank and values as a function of computational power, despite fundamental differences between computational models. In the context of natural processes that operate below the Turing universal level because of finite resources and physical degradation, the investigation of natural biases stemming from algorithmic rules may shed light on the distribution of outcomes. We show that up to 60% of the simplicity/complexity bias in distributions produced even by the weakest of the computational models can be accounted for by Algorithmic Probability in its approximation to the Universal Distribution.

Coding theorem-like behaviour and emergence of the Universal Distribution. Correlation in rank (distributions were sorted in terms of each other) of empirical output distributions as compared to the output distribution of TM(5, 2). A progression towards greater correlation is noticed as a function of increasing computational power. Bold black labels are placed at their Chomsky level and gray labels are placed within the highest correlated level. Shannon entropy and lossless compression (Compress) distribute values below or at about the first 2 Chomsky types, as expected. It is not surprising to see the LBA with runtime 107 further deviate in ranking, because LBA after 27 steps produced the highest frequency strings, which are expected to converge faster. Eventually LBA 107 (which is none other than TM(4,2)) will converge to TM(5,2). An empirical bound of non-halting models seems to be low LBA even when increasing the number of states (or symbols for CA).
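
As a rough illustration of the Coding theorem-like relation mentioned in the abstract, where the complexity of a string is estimated as the negative base-2 logarithm of how often a computing model produces it, a minimal Python sketch follows. The function name and the toy sample are hypothetical and are not taken from the paper; the paper's own estimates come from exhaustive enumeration of machines, which this sketch does not attempt.

```python
import math
from collections import Counter


def coding_theorem_estimates(outputs):
    """Map each output string to -log2 of its empirical frequency.

    Frequent outputs receive low complexity estimates and rare outputs
    receive high ones, in the spirit of K(s) ~ -log2 m(s).
    """
    counts = Counter(outputs)
    total = sum(counts.values())
    return {s: -math.log2(c / total) for s, c in counts.items()}


# Hypothetical toy sample of machine outputs (an assumption for illustration).
sample = ["0"] * 8 + ["1"] * 4 + ["01"] * 2 + ["0110"] * 2
estimates = coding_theorem_estimates(sample)

# Print strings from the simplest (most frequent) to the most complex.
for string, bits in sorted(estimates.items(), key=lambda kv: kv[1]):
    print(f"{string!r}: {bits:.2f} bits")
```

Ranking strings by these estimates under two different models and comparing the rankings is one simple way to look at the kind of rank correlation the abstract describes.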

Source: www.tandfonline.com

Spatial diffusion and churn of social media

Innovative ideas, products or services spread on social networks that, in the digital age, are maintained to a large extent via telecommunication tools such as emails or social media. One of the intriguing puzzles in social contagion under such conditions is the role of physical space. Nor is it understood how geography influences the disappearance of products at the end of their life-cycle. In this paper, we utilize a unique dataset compiled from a Hungarian on-line social network (OSN) to uncover novel features in the spatial adoption and churn of digital technologies. The studied OSN was established in 2002 and failed in international competition about a decade later. We find that early adopter towns churn early, while individuals tend to follow the churn of nearby friends and are less influenced by the churn of distant contacts. An agent-based Bass Diffusion Model describes the process by which the product gets adopted in the overall population. We show the limitations of the model regarding the spatial aspects of diffusion and identify the directions of model corrections. Assortativity of adoption time, urban scaling of adoption over the product life-cycle and a distance decay function of diffusion probability are the main factors that spatial diffusion models need to account for.
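
For readers unfamiliar with the Bass Diffusion Model named in the abstract, the sketch below shows its standard aggregate, well-mixed form in Python, not the agent-based network version the authors use. In each step, the remaining non-adopters adopt with probability p (innovation) plus q times the current adopted fraction (imitation). The function name and all parameter values are assumptions chosen for illustration only.

```python
def bass_adoption_curve(market_size, p, q, steps):
    """Return cumulative adopters after each step of a discrete Bass model.

    p is the innovation coefficient, q the imitation coefficient.
    This is the aggregate form; it ignores geography and network structure.
    """
    adopters = 0.0
    curve = []
    for _ in range(steps):
        # Adoption hazard grows with the fraction that has already adopted.
        hazard = p + q * adopters / market_size
        adopters += hazard * (market_size - adopters)
        curve.append(adopters)
    return curve


# Illustrative parameter values (assumptions, not fitted to the OSN data).
curve = bass_adoption_curve(market_size=1000.0, p=0.03, q=0.38, steps=30)
print(f"adopters after 30 steps: {curve[-1]:.1f}")
```

The resulting curve is S-shaped: slow at first, fastest once imitation dominates, then saturating. The spatial effects the abstract lists, such as distance decay, are exactly what this plain version leaves out.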

Spatial diffusion and churn of social media
Balázs Lengyel, Riccardo Di Clemente, János Kertész, Marta C. González

Source: arxiv.org

CITIZEN DATA SCIENCE FOR SOCIAL GOOD IN COMPLEX SYSTEMS

The confluence of massive amounts of openly available data, sophisticated machine learning algorithms and an enlightened citizenry willing to engage in data science presents novel opportunities for crowdsourced data science for social good. In this submission, I present vignettes of data science projects that I have been involved in and which have impact in various spheres of life and on social good. Complex systems are all around us: from social networks to transportation systems, cities, economies and financial markets. Understanding these complex systems may lead to solutions for problems ranging from famines, global crises and poverty to climate change and sustainable living despite over-population. Big data and citizen data science allow unprecedented computational power and collective intelligence to be brought to bear on fundamental challenges facing humanity like poverty, diseases, famines and developmental challenges.

CITIZEN DATA SCIENCE FOR SOCIAL GOOD IN COMPLEX SYSTEMS

Soumya Banerjee
INDECS 16(1), 88-91, 2018
DOI 10.7906/indecs.16.1.6

Source: indecs.eu

Success in books: a big data approach to bestsellers

Reading remains the preferred leisure activity for most individuals, continuing to offer a unique path to knowledge and learning. As such, books remain an important cultural product, consumed widely. Yet, while over 3 million books are published each year, very few are read widely and fewer than 500 make it to the New York Times bestseller lists. And once there, only a handful of authors can command the lists for more than a few weeks. Here we bring a big data approach to book success by investigating the properties and sales trajectories of bestsellers. We find that there are seasonal patterns to book sales with more books being sold during holidays, and even among bestsellers, fiction books sell more copies than nonfiction books. General fiction and biographies make the list more often than books of any other genre, and the higher a book’s initial place in the rankings, the longer the book stays on the list as well. Looking at patterns characterizing authors, we find that fiction writers are more productive than nonfiction writers, commonly achieving bestseller status with multiple books. Additionally, there is no gender disparity among bestselling fiction authors, but in nonfiction, most bestsellers are written by male authors. Finally, we find that there is a universal pattern to book sales. Using this universality we introduce a statistical model to explain the time evolution of sales. This model not only reproduces the entire sales trajectory of a book but also predicts the total number of copies it will sell in its lifetime, based on its early sales numbers. The analysis of the bestseller characteristics and the discovery of the universal nature of sales patterns with its driving forces are crucial for our understanding of the book industry, and more generally, of how we as a society interact with cultural products.

Success in books: a big data approach to bestsellers
Burcu Yucesoy, Xindi Wang, Junming Huang and Albert-László Barabási
EPJ Data Science 2018 7:7
https://doi.org/10.1140/epjds/s13688-018-0135-y

Source: epjdatascience.springeropen.com

Optimal diversification strategies in the networks of related products and of related research areas

Countries and cities are likely to enter economic activities that are related to those that are already present in them. Yet, while these path dependencies are universally acknowledged, we lack an understanding of the diversification strategies that can optimally balance the development of related and unrelated activities. Here, we develop algorithms to identify the activities that are optimal to target at each time step. We find that the strategies that minimize the total time needed to diversify an economy target highly connected activities during a narrow and specific time window. We compare the strategies suggested by our model with the strategies followed by countries in the diversification of their exports and research activities, finding that countries follow strategies that are close to the ones suggested by the model. These findings add to our understanding of economic diversification and also to our general understanding of diffusion in networks.

Optimal diversification strategies in the networks of related products and of related research areas
Aamena Alshamsi, Flávio L. Pinheiro & Cesar A. Hidalgo

Nature Communications volume 9, Article number: 1328 (2018)
doi:10.1038/s41467-018-03740-9

Source: www.nature.com