Digital cameras face a major limitation: the image and video representation inherited from film cameras prevents them from capturing the rapidly changing photonic world. Here we present vidar, a bit sequence array in which each bit represents whether the accumulation of photons has reached a threshold, to record and reconstruct the scene radiance at any moment. Using only consumer-level CMOS sensors and integrated circuits, we have developed a vidar camera that is 1,000× faster than conventional cameras. By treating vidar as spike trains analogous to those in biological vision, we have further developed a spiking neural network-based machine vision system that combines the speed of machines with the mechanisms of biological vision, achieving high-speed object detection and tracking 1,000× faster than human vision. We demonstrate the utility of the vidar camera and the super vision system in an assistant referee and target pointing system. Our study is expected to fundamentally revolutionize the concepts of image and video and related industries, including photography, movies, and visual media, and to usher in a new era of spiking neural network-enabled, speed-free machine vision.
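As an illustration of the bit-threshold principle described above, here is a minimal Python sketch (parameter names such as `theta` and the reconstruction rule are our own simplifications; the actual sensor circuit is more involved): each simulated pixel integrates photon flux and emits a 1 whenever the accumulator crosses the threshold, and radiance can then be recovered from the spike rate.

```python
def vidar_bits(photon_flux, n_steps, theta=8.0):
    """Simulate one vidar pixel: accumulate photon flux each time step,
    emit a 1 and subtract the threshold whenever the accumulator reaches it."""
    acc, bits = 0.0, []
    for _ in range(n_steps):
        acc += photon_flux
        if acc >= theta:
            bits.append(1)
            acc -= theta
        else:
            bits.append(0)
    return bits

def reconstruct_flux(bits, theta=8.0):
    """Estimate radiance from the firing rate: flux ~ theta * (#spikes / #steps)."""
    return theta * sum(bits) / len(bits)
```

With a constant flux of 2.0 photons per step and a threshold of 8.0, the pixel spikes every fourth step, and the rate-based reconstruction recovers the flux exactly.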
Event cameras, as bio-inspired vision sensors, have shown great advantages in high dynamic range and high temporal resolution in vision tasks. Asynchronous spikes from event cameras can be depicted using marked spatiotemporal point processes (MSTPPs). However, how to measure the distance between asynchronous spikes in the MSTPPs remains an open issue. To address this problem, we propose a general asynchronous spatiotemporal spike metric for event cameras that considers both spatiotemporal structural properties and polarity attributes. Technically, a conditional probability density function is first introduced to describe the spatiotemporal distribution and polarity prior in the MSTPPs. In addition, a spatiotemporal Gaussian kernel is defined to capture the spatiotemporal structure, transforming discrete spikes into a continuous function in a reproducing kernel Hilbert space (RKHS). Finally, the distance between asynchronous spikes can be quantified by the inner product in the RKHS. The experimental results demonstrate that the proposed approach outperforms state-of-the-art methods and achieves a significant improvement in computational efficiency. In particular, it better depicts changes involving spatiotemporal structural properties and polarity attributes.
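A minimal sketch of a kernel-based spike distance of this flavor, assuming events of the form (x, y, t, polarity) and isotropic Gaussian bandwidths (the paper's exact kernel and its conditional-density weighting are not reproduced here): each event set is embedded in an RKHS as a polarity-weighted sum of Gaussians, and the distance follows from inner products.

```python
import numpy as np

def st_kernel(s1, s2, sigma_xy=1.0, sigma_t=1.0):
    """Spatiotemporal Gaussian kernel between two events (x, y, t, p)."""
    dx2 = (s1[0] - s2[0])**2 + (s1[1] - s2[1])**2
    dt2 = (s1[2] - s2[2])**2
    return np.exp(-dx2 / (2 * sigma_xy**2) - dt2 / (2 * sigma_t**2))

def inner(A, B):
    """RKHS inner product of two event sets; polarities weight each pair."""
    return sum(a[3] * b[3] * st_kernel(a, b) for a in A for b in B)

def spike_distance(A, B):
    """Induced distance: ||phi(A) - phi(B)|| in the RKHS."""
    return np.sqrt(inner(A, A) + inner(B, B) - 2 * inner(A, B))
```

Identical event sets have distance zero, while events that differ in location or polarity are pushed apart, which is the behavior the metric is designed to capture.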
Images of visual scenes comprise essential features important for visual cognition of the brain. The complexity of visual features spans different levels, from simple artificial patterns to natural images of diverse scenes. Much work has focused on using stimulus images to predict neural responses; however, it remains unclear how to extract features from neuronal responses. Here we addressed this question by leveraging two-photon calcium imaging data recorded from the visual cortex of awake macaque monkeys. With stimuli including various categories of artificial patterns and diverse natural images, we employed a deep neural network decoder inspired by image segmentation techniques. Consistent with the notion of sparse coding for natural images, a few neurons with stronger responses dominated the decoding performance, whereas decoding artificial patterns required a large number of neurons. When decoding natural images with a model pre-trained on artificial patterns, salient features of natural scenes could be extracted, as well as the conventional category information. Altogether, our results offer a new perspective on studying neural encoding principles using reverse-engineering decoding strategies.
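To illustrate the sparse-coding observation that a few strongly responding neurons can dominate decoding, here is a toy numpy sketch on synthetic data (all names, sizes, and the linear decoder are illustrative, not the paper's model): a handful of neurons carry the stimulus signal, and decoding from only the top-ranked neurons suffices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_trials = 200, 100

# Hypothetical data: only the first 5 neurons carry the stimulus signal
stimulus = rng.normal(size=n_trials)
responses = rng.normal(size=(n_trials, n_neurons)) * 0.1
responses[:, :5] += stimulus[:, None]

# Rank neurons by response strength and decode from the top few only
strength = np.abs(responses).mean(axis=0)
top = np.argsort(strength)[::-1][:5]
w, *_ = np.linalg.lstsq(responses[:, top], stimulus, rcond=None)
pred = responses[:, top] @ w
corr = np.corrcoef(pred, stimulus)[0, 1]
```

Despite discarding 97.5% of the population, the decoder built on the few dominant neurons reconstructs the stimulus almost perfectly, mirroring the asymmetry reported for natural images.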
Fog computing has become an effective paradigm for real-time applications in the IoT area, enabling task offloading at network edge devices. In particular, many emerging vehicular applications require real-time interaction between terminal users and computation servers, which can be implemented in a fog-based architecture. However, applying fog computing in vehicular networks remains challenging due to the high mobility of vehicles and the uneven distribution of vehicle density, which may cause performance degradation such as unbalanced workload and unexpected task failure. In this article, we investigate a new service scenario of task offloading under a three-layer service architecture, where the resources of vehicular fog (VF), fog server (FS), and central cloud (CC) are utilized cooperatively. On this basis, we formulate the probabilistic task offloading (PTO) problem by synthesizing task transmission, computation, and result retrieval, as well as characterizing the heterogeneity of computation servers. The objective of the PTO is to minimize the weighted sum of execution delay, energy consumption, and payment cost. To solve the PTO problem, we propose a comprehensive task offloading algorithm, called ADMM-PSO, that combines the alternating direction method of multipliers (ADMM) and particle swarm optimization (PSO). The basic idea of ADMM-PSO is to divide the PTO problem into multiple unconstrained subproblems and reach the optimal solution through an iterative coordination process. In each iteration, the solution is obtained by solving each subproblem with the PSO and is updated based on a designed rule, which converges to the optimal solution when the stop criterion is satisfied. Finally, we build a simulation model and implement the proposed algorithm for performance evaluation. The simulation results demonstrate the superiority of the proposed algorithm over a wide range of service scenarios.
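The weighted-sum objective at the heart of the PTO problem can be illustrated with a toy sketch (the weights and per-server numbers below are hypothetical, and the full ADMM-PSO decomposition is not reproduced): each candidate server in the three-layer architecture is scored by a weighted sum of delay, energy, and payment, and the cheapest option wins.

```python
def offload_cost(delay, energy, payment, w=(0.5, 0.3, 0.2)):
    """Weighted-sum objective of a (simplified) task offloading decision."""
    return w[0] * delay + w[1] * energy + w[2] * payment

# Hypothetical per-server estimates: (delay in s, energy in J, payment in cents)
servers = {
    "VF": (0.8, 1.2, 0.0),   # vehicular fog: slower, but no payment
    "FS": (0.3, 0.9, 2.0),   # fog server: fast, moderate payment
    "CC": (0.6, 0.7, 5.0),   # central cloud: distant and paid
}
best = min(servers, key=lambda s: offload_cost(*servers[s]))
```

Changing the weight vector shifts the decision, which is exactly why the paper treats the trade-off among delay, energy, and payment as a single tunable objective.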
Neuronal circuits formed in the brain are complex, with intricate connection patterns. Such complexity is also observed in the retina, which has a relatively simple neuronal circuit. A retinal ganglion cell (GC) receives excitatory inputs from neurons in upstream layers that drive it to fire spikes. Analytical methods are required to decipher these components in a systematic manner. Recently, a method called spike-triggered non-negative matrix factorization (STNMF) was proposed for this purpose. In this study, we extend the scope of the STNMF method. Using retinal GCs as a model system, we show that STNMF can detect various computational properties of upstream bipolar cells (BCs), including the spatial receptive field, temporal filter, and transfer nonlinearity. In addition, we recover synaptic connection strengths from the weight matrix of STNMF. Furthermore, we show that STNMF can separate the spikes of a GC into a few subsets, each contributed by one presynaptic BC. Taken together, these results corroborate that STNMF is a useful method for deciphering the structure of neuronal circuits.
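A minimal numpy sketch of the factorization step at the core of STNMF, using standard multiplicative updates (the full pipeline, including construction of the spike-triggered stimulus ensemble from recordings, is not reproduced here): the ensemble matrix is factored as V ≈ W·H, where rows of H play the role of candidate subunit filters and W gives the per-spike weights used to assign spikes to subunits.

```python
import numpy as np

def stnmf(spike_triggered, k, n_iter=200, seed=0):
    """Non-negative factorization V ~ W @ H of a spike-triggered stimulus
    ensemble (spikes x pixels) via multiplicative updates (Lee-Seung style).
    Rows of H: candidate subunit filters; columns of W: per-spike weights."""
    rng = np.random.default_rng(seed)
    V = np.maximum(spike_triggered, 0)       # NMF requires non-negative data
    n, m = V.shape
    W = rng.random((n, k))
    H = rng.random((k, m))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H
```

On exactly low-rank non-negative data the updates recover an accurate factorization; on real spike-triggered ensembles the recovered modules are interpreted as bipolar-cell subunits.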
A crucial question in data science is how to extract meaningful information embedded in high-dimensional data. Such information is often mapped to a low-dimensional space with a set of features that can represent the original data at different levels. Wavelet analysis is a pervasive method for decomposing time-series signals into a few levels with detailed temporal resolution. However, the wavelets after decomposition are intertwined and can be over-represented across levels for each sample and across different samples within one population. In this work, using simulated spikes, experimental neural spikes and calcium imaging signals, and human electrocorticographic signals, we leveraged conditional mutual information between wavelets for feature selection. The meaningfulness of the selected features was verified by decoding stimulus or condition from dynamic neural responses. We demonstrate that decoding with only a small set of these features can achieve high accuracy. These results provide a new way of using wavelet analysis to extract essential features of the dynamics of spatiotemporal neural data, which in turn can support novel machine learning model designs built on representative features.
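A simplified sketch of information-based wavelet-feature selection: a histogram mutual-information estimate combined with a greedy relevance-minus-redundancy criterion serves as a stand-in for the paper's conditional-mutual-information computation (function names, binning, and the scoring rule are our own assumptions).

```python
import numpy as np

def mi(x, y, bins=8):
    """Histogram estimate of the mutual information between two 1-D arrays."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def select_features(F, y, k):
    """Greedily pick k columns of F most informative about y,
    penalizing redundancy with the features already chosen."""
    chosen = []
    for _ in range(k):
        scores = []
        for j in range(F.shape[1]):
            if j in chosen:
                scores.append(-np.inf)
                continue
            red = max((mi(F[:, j], F[:, c]) for c in chosen), default=0.0)
            scores.append(mi(F[:, j], y) - red)
        chosen.append(int(np.argmax(scores)))
    return chosen
```

Applied to a feature matrix where a single wavelet column carries the condition signal, the greedy criterion finds it first, which is the behavior a small, decodable feature set relies on.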
Clinical studies sometimes encounter truncation by death, rendering some outcomes undefined. Statistical analysis based solely on observed survivors may give biased results because the characteristics of survivors differ between treatment groups. Under principal stratification, the survivor average causal effect has been proposed as a causal estimand defined for always-survivors. However, this estimand is not identifiable when there is unmeasured confounding between the treatment assignment and the survival or outcome process. In this paper, we consider the comparison between an aggressive treatment and a conservative treatment under monotonicity on survival. First, we show that the survivor average causal effect on the conservative treatment is identifiable based on a substitutional variable under appropriate assumptions, even when the treatment assignment is not ignorable. Next, we propose an augmented inverse probability weighting (AIPW) type estimator for this estimand with double robustness. Finally, large-sample properties of this estimator are established. The proposed method is applied to investigate the effect of allogeneic stem cell transplantation types on leukemia relapse.
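For context, the generic AIPW template for a treated-arm mean (standard textbook notation; this is not the paper's exact SACE estimator, which additionally handles survival and the substitutional variable) takes the doubly robust form

```latex
\hat{\mu}_1^{\mathrm{AIPW}}
  = \frac{1}{n}\sum_{i=1}^{n}\left[
      \frac{Z_i Y_i}{\hat{\pi}(X_i)}
      - \frac{Z_i - \hat{\pi}(X_i)}{\hat{\pi}(X_i)}\,\hat{m}_1(X_i)
    \right],
```

where $Z_i$ is the treatment indicator, $Y_i$ the outcome, $\hat{\pi}(X_i)$ an estimated propensity score, and $\hat{m}_1(X_i)$ an estimated outcome regression. The estimator is consistent if either nuisance model is correctly specified, which is the double robustness property the proposed SACE estimator shares.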
Deep learning has been successfully applied to predicting asset prices from financial time series data. However, image-based deep learning models, which excel at extracting spatial information from images, have not been fully explored in financial applications. Here we propose a new model, the channel and spatial attention convolutional neural network (CS-ACNN), for price trend prediction that takes arbitrary images constructed from financial time series data as input. The model incorporates attention mechanisms between convolutional layers to focus on the areas of each image that are most relevant to price trends. CS-ACNN outperforms benchmarks on exchange-traded fund (ETF) data in terms of both classification metrics and investment profitability, achieving out-of-sample Sharpe ratios ranging from 1.57 to 3.03 after accounting for transaction costs. In addition, we confirm that models trained on images constructed with our methodology outperform models based on traditional time series data. Finally, the model learns visual patterns that are consistent with traditional technical analysis, providing an economic rationale for the learned patterns and allowing investors to interpret the model.
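A simplified numpy sketch of the two attention mechanisms named in the model (the shapes, gating choices, and squeeze-and-excite-style channel branch are our assumptions; the abstract does not specify the exact architecture): channel attention reweights whole feature channels, while spatial attention highlights chart regions relevant to the trend.

```python
import numpy as np

def channel_attention(fmap, W1, W2):
    """Channel attention on a (C, H, W) feature map: global-average-pool
    each channel, pass through a tiny 2-layer MLP, and rescale channels
    by the resulting sigmoid gates."""
    z = fmap.mean(axis=(1, 2))                              # squeeze: (C,)
    g = 1 / (1 + np.exp(-(W2 @ np.maximum(W1 @ z, 0))))     # excite: (C,)
    return fmap * g[:, None, None]

def spatial_attention(fmap):
    """Spatial attention: gate each location by a sigmoid of its
    channel-wise mean, emphasizing informative image regions."""
    s = fmap.mean(axis=0)                                   # (H, W)
    gate = 1 / (1 + np.exp(-s))
    return fmap * gate[None, :, :]
```

Because both gates lie in (0, 1), attention only attenuates; in the full model these blocks sit between convolutional layers so that training learns which chart areas and channels to keep.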