Iliski, a software for robust calculation of transfer functions

Understanding the relationships between biological processes is paramount to unravel pathophysiological mechanisms. These relationships can be modeled with Transfer Functions (TFs), with no need of a priori hypotheses as to the shape of the transfer function. Here we present Iliski, a software dedicated to TFs computation between two signals. It includes different pre-treatment routines and TF computation processes: deconvolution, deterministic and non-deterministic optimization algorithms that are adapted to disparate datasets. We apply Iliski to data on neurovascular coupling, an ensemble of cellular mechanisms that link neuronal activity to local changes of blood flow, highlighting the software benefits and caveats in the computation and evaluation of TFs. We also propose a workflow that will help users to choose the best computation according to the dataset. Iliski is available under the open-source license CC BY 4.0 on GitHub (https://github.com/alike-aydin/Iliski) and can be used on the most common operating systems, either within the MATLAB environment, or as a standalone application.


26
Modelling and understanding of the relationship between complex and intermingled biological 27 signals is often difficult, particularly when the drivers of the signals are unknown. The problem 28 of the relationship between two time series can be address using deconvolution, which 29 provides Transfer Functions (TFs) representative of the system processing on the input signal 30 to generate the output signal 1 . Extracting the transfer function of a neuron or neurons is widely 31 used in neurobiological studies 2-9 , and is widely used to solve general problems in signal 32 analysis, such as predicting the output of complex electrical circuits 10 or other industrial 33 systems for which a proper model is very complex due to multiple processes working in 34 parallel 11 . In brain imaging based on blood flow dynamics, transfer functions are classically 35 used to lump the multitude of cellular and molecular processes linking neural activation to 36 changes in blood flow. This coupling between neural activity and hemodynamics is known as 37 neurovascular coupling (NVC) 12 . While there are many successful phenomenological models of 38 NVC 3,13-17 , most physiology-based models of neurovascular coupling 18-21 focus on a single 39 mechanism. As NVC is mediated through multiple processes, a more integrated approach is 40 necessary. NVC has often been assessed with deconvolution 8,22 , either in the frequency domain 41 or with matrix-based approaches, like Toeplitz matrices 23 . While these approaches allow the 42 unbiased extraction of the TF, these deconvolution methods suffer from sensitivity to noise, 43 affecting the quality of the computed TFs. Reducing the noise (or bandwidth) of the signals 44 improves the estimate of the TF. Alternatively, one can opt for optimization of known functions 45 or a kernel of functions 7,9 . The first option may lead to information loss, e.g. in cases where the 46 noise is not well characterized. Sophisticated smoothing methods partially prevent this loss, 47 like Savitzky-Golay filter, or noise modelling as proposed by Seghouane and colleagues 24 . The 48 second option relies on parametric functions to find the TF best linking the input to the output 49 signals. The transfer function for neural activity to hemodynamic signals has been canonically 50 Iliski can be used either as a suite of functions or through a Graphical User Interface (Fig 1A). 74 Functions are grouped according to the analysis workflow to keep the interface simple. Fig 1B  75 shows the general purpose of Iliski. 76 77 3.1. Data loading and pre-treatment 78 We propose two input files format: either plain text files or HDF5 data, the latter being an open-79 source file format with advanced database features. As experimental acquisitions are prone to 80 multiple component noise, we provided, as an option to the analysis workflow, smoothing 81 (Savitzky-Golay method) and median filter functions, to exclude outliers. The input and output 82 signals are interpolated to a chosen time interval ( ). Both signals can be cut between two 83 given time points to study continuous recordings while computing TFs on chunks of signal (Fig  84   1C).

Post-computation 106
The results structure is arranged to be as informative as possible while avoiding useless 107 repetition of data. Iliski allows for loading previously computed results structures to check 108 them again. After TF computation, results structure can be saved either as XLS file, readable by 109 any Excel-like software, or as a MAT-file (MATLAB formatted binary file format), but it is also 110 available in Matlab workspace to be exported in various data formats by the user. 111

112
Iliski is accessible both as a GUI and as a set of functions to be used in scripts. It has been 113 developed using Matlab R2018a, with the following dependencies: Optimization Toolbox, 114 Signal Processing Toolbox and Global Optimization Toolbox. 115 Common user errors are thoroughly prevented by various messages and fail safes. In parallel, 116 all errors are treated and saved in a LOG file, to allow for efficient bug-fixing by any developer. 117 We purposely kept just a few parameters to modify through the GUI, with the goal of providing 118 an easy-to-use tool to people not used to these functions. In most cases, Matlab default 119 parameters of each deconvolution/optimization function worked well with our data, and we 120 believe that it can be extended to many biological datasets. However, a user skilled with Matlab 121 and optimization algorithms can easily modify the parameters of each function used.

127
Here we present the use of Iliski to find the best mathematical representation of neurovascular 128 coupling, an ensemble of cellular mechanisms that links brain activation to local increases of 129 blood flow. Neural activity is reported by GCaMP6f 26 , a calcium-sensitive protein expressed in 130 specific neurons. Blood flow is quantified by measuring red blood cells velocity changes in 131 capillaries 27 . 132 Several deconvolution and function optimization algorithms are provided. Choosing the 133 algorithm(s) and settings to compute a TF that gives faithful and robust predictions is not 134 always a straightforward task. It must be done according to the data features. Here we use some 135 data from our published study on neurovascular coupling 25 to point out how TFs change with 136 different algorithms and settings, and we show the critical points in the usage of non-137 deterministic methods. Finally, we propose a step-by-step guide to optimize the best TF on 138 practical situations. neuronal (Ca 2+ ) and vascular (red blood cells velocity) activations recorded in a mouse upon 142 odor application. Our example data display unavoidable and complex noise coming from many 143 sources: the biological system, the optical setup, the electronics, etc. Deconvolution with 144 Fourier or Toeplitz approaches predicts the vascular responses very well for a given data set. 145 However, the high-frequency noise is amplified by deconvolution 24 and transmitted to the TF, 146 the predictions are not robust across data sets and the actual dynamics of neurovascular 147 coupling is completely hidden in the TF noise (Fig 2). In this example, we show what we regard 148 as a typical case of overfitting. The TF is capturing the high frequency noise of the system 149 Below is the one -driven function we used with our data. 157 Where p1 ,.., p4 are the parameters to optimize and H is the Heaviside function to include a time-159 shift parameter (p3) which, in some cases, significantly improved the prediction and is a known 160 biological phenomenon to consider 29 . Its four parameters are not all independent from one 161 another, e.g. 1 , 2 and 4 all impact the TF amplitude. This inter-dependency between the 162 parameters brings an ill-posed optimization problem with multiple local minima of the cost 163 function, the sum of the residual squares, in the 4D space of the parameters. 164 A derivative-free optimization method (provided by Matlab and used previously 6 ) is provided 165 by the fminsearch function in Matlab. This approach on our data produced a TF with more than 166 one underivable point that is not representative of the smooth dynamic of neurovascular 167

coupling. 168
Another common option is provided by Quasi-Newton optimization algorithms: for this 169 approach too, we tested an unconstrained built-in method (fminunc function, Matlab). This 170 prediction is, overall, as good as with fminsearch (Fig 2, Pearson coefficients, fminunc vs.  171 fminsearch: 0.95 vs. 0.96), but the onset phase is not properly fit. Moreover, although not 172 evident from the plot, the optimized time shift was negative (-120 ms), implying that the onset 173 of the vascular response precedes the neuronal activation. 174 All the optimization methods tested above are deterministic, meaning that repeating them with 175 the same initial parameters will bring the same result. The pitfall of these methods when 176 applied to ill-posed problem is that optimization process will get attracted to the nearest local 177 minimum, regardless of the many other deeper minima, which may be far away in the 178 parameters space. In other words, deterministic algorithms are sensitive to the initial 179 parameters set before starting the optimization. Note that such bonds can be set through the Iliski GUI for any constrainable algorithm.  Fig 3A). However, as in the example of Fig 2, deterministic algorithms are prone to 196 biologically inconsistent TFs (Fig 3B). The non-deterministic, Simulated Annealing algorithm 197 with subsequent iterations method allows to efficiently exclude these TFs and obtain the best to obtain a good TF. In our experience, starting the optimization with a 'bad' TF -whose shape 207 is different from what is expected for the processed dataset -helps to collect more local minima 208 in a pool of optimization runs. For example, in our previous study 25 , we proposed iterations of 209 50 runs and started with the initial values of the standard TF (one HRF) which, peaking at 5 210 seconds, turned to be much slower than any of the optimized TFs. The sequence of 50-runs 211 iterations stopped when, within an iteration, no clear improvement was found in the optimized 212 TF 25 . On average, 2 iterations were sufficient to get a stable TF with Pearson coefficient above 213 0.9. Here, we investigated if a higher number of runs is beneficial to the detection of the 214 minimum of the cost function and if it prevents the need for further iterations. We compared 215 50 and 200 runs with single and double iterations, in cascade (Fig 4A). In a mouse dataset, we 216 observed a non-significant trend towards more scattered TFs shapes for computation using 50 217 runs versus 200 runs (1-way ANOVA, F(3, 16) = 2.086, p = 0.14, Fig 4B). Similarly, the quality 218 of the TFs did not significantly improve with increasing runs (1-way ANOVA, F (3, 16) = 2.299, 219 p = 0.12). As a result, TFs with fast dynamics (peaking within 1 and 2 sec), was a common 220 feature independently of the adopted protocol (Fig 4C). In a dataset from another mouse (Fig  221   4D,E) 229 We provide a decision diagram to choose the best approach to compute a TF based on the 230 features of the user's dataset (Fig 5). Nonetheless, we believe it is always a good choice to test 231 different approaches before making the final choice. 232 233

234
Iliski provides a user friendly, interactive and rich in option software for quickly testing 235 different methods and settings to compute TFs between biological events. In addition to its 236 standard integrated functions, it also allows for user-defined functions of any number of 237 parameters. Using data from the NVC field, we demonstrate how critical is the choice of the 238 method for computing TFs and the caveats of some parameters such as the number of iterations 239 necessary to non-deterministic algorithms. Note that we did not report the influence of 240 smoothing, interpolation, fitting and cost functions choice which are also known to affect the 241 final result. The use of multimodal datasets, i.e. neuronal calcium signal, measurements of 242 vascular responses at both the microscopic and mesoscopic scales enabled us to demonstrate 243 that NVC is represented by a similar TF which is much faster than the classical HRF, a finding 244 which is getting accepted in the field of brain imaging based on blood flow 5,25,30,31 . 245 The usage of Iliski are many; although it has been conceived for biological data, there is no limitation to load any discretized signal, serving as a tool for fast testing different approaches for TFs computation. (C) Iliski workflow is modular so that signal pre-processing is optional and functions to compute TFs can be modified by the user preserving the I/O modules. Odor stimulation elicited a neuronal response in the Olfactory Bulb of a mouse, reported by a calcium-dependent fluorescent signal (in blue, left panel), providing the input of TF computation. Output is given by the vascular response, measured as the change in speed of red blood cells flowing inside a capillary proximal to the recorded neuronal activation (in yellow, right panel). Both experimental data have been resampled at 50ms and used to compute a set of TFs (in orange) either with direct deconvolution approaches (Fourier or Toeplitz methods, middle-upper panel TFs) or with 1- function optimization performed by 3 different algorithms (middle-lower panel TFs). Complex TFs bring accurate prediction but amplify the noise of the data used to deconvolve them, with a consequent loss of robustness on other datasets. Smoother TFs are less accurate on the training dataset, but much robust when applied to test datasets. (A) 3 algorithms were compared in terms of the residuals of the cost function of the optimized TF on 7 mice datasets (Derivative free algorithm failed in optimizing a TF in a mouse). No significant difference was found across the 3 methods. (B) However, simulated annealing was the only approach to provide TFs consistent with the nature of biological data (TF with no more than 1 non-derivable point), while both the other deterministic methods run into inconsistent TFs in roughly 60% of the cases.