The Neural Engine: A Reprogrammable Low Power Platform for Closed-loop Optogenetics


Abstract—Brain-machine interfaces (BMIs) hold great potential for treating neurological disorders such as epilepsy. Technological progress is allowing a shift from open-loop, pacemaker-class intervention towards fully closed-loop neural control systems. Low-power programmable processing systems are therefore required which can operate within the 2 °C thermal window for medical implants and maintain long battery life. In this work, we have developed a low-power neural engine with an optimized set of algorithms which can operate under a power cycling scheme. We have integrated our system with a custom-designed brain implant chip and demonstrated closed-loop modulation of neural activity in in-vitro and in-vivo brain tissue: the local field potentials can be modulated in the required central frequency ranges. In addition, in-vivo experiments on a freely moving non-human primate (24 hours) and a rodent (1 hour) were performed to show reliable recording performance. The overall system consumes only 2.93 mA during operation at a biological recording sampling rate of 50 Hz (the lifespan is approximately 56 hours). A library of algorithms has been implemented for detection, suppression and optical intervention to allow for exploratory applications in different neurological disorders. Thermal experiments demonstrated that operation creates minimal heating, with battery performance exceeding 24 hours on a freely moving rodent. This technology therefore shows great capability for both in-vitro/in-vivo neuroscience applications and medical implantable processing units.

The work in this paper is a result of the CANDO project (www.cando.ac.uk), which was directly supported by the EPSRC (NS/A000026/1) and the Wellcome Trust (REF 102037/2/13/2).

Junwen Luo, Dimitris Firflionis, Ahmed Soltan, Reza Ramezani, Richard Bailey, Enrique Escobedo-Cousin, Anthony O’Neill and Patrick Degenaar are with the School of Engineering, Newcastle University, Newcastle upon Tyne, NE1 7RU, U.K. ([dimitrios.firflionis; reza.ramezani; Richard.Bailey2; enrique.escobedo-cousin; anthony.oneill; patrick.degenaar]@ncl.ac.uk). Junwen Luo now is a Research Scientist at computing technology lab, Alibaba Group, Sunnyvale, U.S (junwen.luo@alibaba-inc.com). Ahmed Soltan now is an assistant professor, NISC group, Nile University (A.Soltan@nu.edu.eg). Wei Xu, Mark Turnbull, Darren Walsh and Andrew Jackson are with the Institute of Neuroscience, Newcastle University, Newcastle upon Tyne NE2 4HH, U.K ([Wei.xu; mark.turnbull; andrew.jackson, darren.walsh]@ncl.ac.uk).

Ahmad Shah Idil and Nick Donaldson are with the Department of Medical Physics and Biomedical Engineering, University College London WC1E, 6BT U.K. (l.a.shahidil; nickd@medphys.ncl.ac.uk).

Yan Liu and Tim Constadinou are with the Centre for Bio-Inspired Technology, Department of Electrical and Electronic Engineering, Imperial College London, SW7 2AZ London, U.K. (e-mail: yan.liu06@imperial.ac.uk; t.constadinou@imperial.ac.uk).

Copyright (c) 2017 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to pubs-permissions@ieee.org

Fig. 1. A conceptual description of the developed neural co-processor. It is aimed for a closed-loop processing with electrical recording and optical stimulation. The processing has four stages: recording interrupt, closed-loop algorithm, optical converter and data storage.

I. INTRODUCTION

The very first implantable pacemaker was introduced by Senning and Elmqvist in 1958 [1] for use in cardiac synchronization. Later, in 1963, Bekthereva [2] demonstrated the first chronic implant in the human brain, though the field of Deep Brain Stimulation (DBS) is largely attributed to the work of both the Benabid and Blond, and Sigfried groups in 1991 [1]. The architecture of early systems was largely analogue with simple open-loop oscillatory functions. Since then, modern medical devices have developed digital control units with data logging and wireless communication capabilities. Nevertheless, such systems still largely implement open-loop oscillatory stimuli.

More recently, the development of the Reactive Neural Stimulator [3] demonstrates a direction of travel towards closed-loop systems which can provide targeted intervention based on specific neural activity. With the advent of human trials of optogenetics, initially in the retina [4][5], we could soon see the development of interference-free binomial control with electrical recording and optical stimuli.

To achieve this, there needs to be a platform which can implement closed-loop stimuli within the constraints of both battery operation and thermal output. For the latter, the surface temperature rise of the implant should not exceed 2 °C to stay within the regulatory limits [6]. For the former, the largest rechargeable medical-grade batteries at the time of writing are between 300 and 350 mAh (e.g. EaglePicher Contego 325). As such, the target average current draw needs to be 7 mA or less, with a recharge cycle of 2 days or longer. Furthermore, a similar figure is obtained if we assume batteries need to last at least five years over a recommended 1000 recharge cycles.
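The current budget quoted above follows from simple arithmetic, which can be sketched as below. This is only an illustrative calculation: the function name is hypothetical, the 325 mAh capacity is taken from the battery example in the text, and ideal (lossless, full-depth) discharge is assumed.

```python
def max_average_current_ma(capacity_mah: float, hours_between_charges: float) -> float:
    """Largest average current (mA) that a battery of the given capacity can
    sustain for the given interval, assuming ideal full discharge."""
    if capacity_mah <= 0 or hours_between_charges <= 0:
        raise ValueError("capacity and interval must be positive")
    return capacity_mah / hours_between_charges


# Case 1: a 325 mAh cell recharged every 2 days (48 h).
budget_two_days_ma = max_average_current_ma(325.0, 48.0)

# Case 2: five years of service spread over 1000 recharge cycles.
hours_per_cycle = 5 * 365 * 24 / 1000  # 43.8 h between recharges
budget_five_years_ma = max_average_current_ma(325.0, hours_per_cycle)

print(f"2-day recharge:        {budget_two_days_ma:.2f} mA")
print(f"5 years / 1000 cycles: {budget_five_years_ma:.2f} mA")
```

Both cases give a budget of roughly 7 mA, which is why the two requirements lead to a similar target.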

In order to reach human trials, proposed clinical architectures need to be verified in in-vitro/in-vivo neuroscience experiments. In particular, testing is required in freely moving non-human primates (NHP) and/or rodent models. Such experiments require a system with remarkably similar attributes to a clinical device. The physical size of a neural co-processor (embedded control system) needs to be 2 cm x 2 cm or better to mount in a head cradle (NHP) or backpack (rodent). The space available for the developed neural co-processor (embedded control system), in a medical control unit (chest unit) after the battery is similar. The battery pack is also perhaps surprisingly similar. The maximum capacity of clinical grade rechargeable batteries is similar to that of two CR2025 class watch batteries, which have a feasible weight (5.2 g) for both NHP and rat models. To achieve 48-hour recording at this capacity (minimum clinical target), a current consumption of around 7-8 mA or better is required.

Other academic groups and industry have also been developing implantable control systems to meet these specifications. The technologies used to achieve this can be classified into three groups: (i) custom Application-Specific Integrated Circuits (ASICs) with inbuilt digital processing; (ii) reconfigurable logic modules such as Field-Programmable Gate Arrays (FPGAs) or Complex Programmable Logic Units; and (iii) general purpose microcontroller (MCU) systems.

These technologies are complementary rather than mutually exclusive [7]. Dedicated ASIC implementations can be low power provided they are implemented on a sufficiently advanced technology node. For example, Liu et al. [8] presented a bidirectional Brain-Machine Interface (BMI) ASIC for closed-loop neuroscience research which could perform neural feature extraction. They demonstrated operation with a total power requirement of 8 mW (2.16 mA). Similarly, Kassiri et al. [9] introduced an inductively-powered seizure-predicting ASIC-based microsystem for monitoring and treatment of intractable epilepsy which operated at a total power of 2.78 mW (1.52 mA). However, these DSP functions cannot be reconfigured, which means they are primarily useful when an intervention algorithm has already been identified and only operational variables need to be changed. For neuroscience research or immature clinical interventions, such a lack of flexibility is undesirable.

An alternative to having fixed algorithms in silicon is to configure multiple independent filters which can be called in sequence, or to utilize reconfigurable digital logic. Gagnon-Turcotte et al. [10] utilized option (ii) and developed a wireless headset for high resolution 32-channel electrical recording in tandem with optical stimulation. Their system utilized a low-power FPGA-based DSP for a digital spike detector and a wavelet data compression module. Such systems can be efficient for high power parallel processing but need special tuning for use in low power systems due to high leakage power. At a spike rate of 45 s⁻¹, the whole system consumes 119 mW (32 mA) when all channels are active. For a like-for-like comparison, the processing-only part of the FPGA consumed 47.2 mW (12.75 mA).

The third option is to utilize a general purpose microcontroller unit. These are very flexible with significant peripheral functionality (e.g. SPI, ADC/DAC, timers). Such units perform computation in software, thus taking multiple clock cycles. This allows for more complex computation such as recursive functionality and multitasking with peripherals, but at the cost of computational efficiency. However, such microcontrollers are typically available in advanced deep sub-micron technologies and have significant power cycling architectures. S. Zanos et al. [11] previously designed an MCU-based system, the NeuroChip-2, for freely-moving NHP experiments which could perform closed-loop electrical stimulation and recording. Their system utilized commercial PSoC (Programmable System-on-Chip) development boards and therefore required a minimum power of 284 mW (78.8 mA).

Another example is the Cortex M4 which we adopted in this work. This platform consumes a nominal 14 mA current at clock frequency 40 MHz under full CPU load. However, with advanced power management, it is possible to define power modes at each stage in the computational cycle so that full power is only utilized in short bursts. Other operations can be performed operating in various levels from deep sleep to fully awake.

In this work, we have created a low-power flexible neural engine specifically for computing optogenetic closed-loop interfaces, as shown in Fig. 1. It has one custom-designed embedded hardware/software (H/W) processing system and integrates with an ASIC-based circuit for electrical recording and optical stimulation. We have also incorporated communication protocols to make it compatible with both commercial data acquisition systems such as Intan and our previously published ASIC [12]. The remainder of this paper is structured as follows: Section II describes the hardware configuration; Sections III and IV introduce the software architecture and the BMI processing toolbox. The system processing has four stages: recording interrupt, closed-loop algorithm, optical converter and data storage. Sections V and VI present the in-vitro/in-vivo neuroscience experimental configurations and results, as well as the power consumption and thermal analysis. For comparison with past systems, it operates at an average of 2.93 mA under an optimised power cycling scheme. Finally, discussion and conclusions are given in Sections VII and VIII: the system has an architecture which could also progress towards clinical use, and has great potential for animal behavioural control.

II. HARDWARE OVERVIEW

The hardware architecture is shown in Fig. 2. It consists of two parts: an embedded microcontroller-based neural co-processor, and the previously designed ASIC-based neural interface [12] integrated within a headstage board.

The embedded processor is an ARM Cortex-M4 microcontroller (MCU) with a DSP function block. Specifically, we used an MK22FN512VLH12 MCU (120 MHz, 512 KB flash memory, 128 KB RAM) in a 64-pin LQFP package. This is employed to implement the closed-loop algorithms as well as communication with a head-stage chip which performs the signal acquisition and driving. The MCU was implemented on a 25 x 22 mm printed circuit board (PCB) as shown in Fig. 3(a). The PCB also contains a low dropout (LDO) regulator (TLV70233DBVR) to provide a consistent 3.3 V supply at up to 330 mA from a 3.7 V lithium-ion portable battery, and a synchronous boost converter (LTC3525ESC6-5#TRMPBF) which provides a consistent 5 V supply boosted from the regulator. This latter supply is required to power the optical stimulation circuits in the ASIC-based headstage as defined in our prior work [13][14]. In addition, there is an indicator LED (CLV1A-FKB-CJ1M1F1BB7R453 full-colour LED) and a micro-SD card slot for long term data recording (the micro-SD can address up to 64 Gb). For external testing and MCU programming, a Multilink FX programmer with a JTAG connector was used. Additionally, there is a UART port for connecting to a computer for real-time debugging. Finally, a 10-pin flexible PCB cable (FPC) connector provides the link between the embedded controller and the head stage. This is adaptable to commercial systems, but in this case it is used with our previously described neural interface chip [12].

The rodent head-stage board is shown in Fig. 3(b). The head stage has dimensions of 10 x 10 mm, with 8 pins for stimulation and recording. It should be noted that we originally tried a backpack configuration. However, if the weight is sufficiently low, a head mount is superior as the rodent cannot scratch it off. Fig. 3(c) shows the NHP head mount configuration. In this case, the system is mounted within a crown unit which is attached to the NHP head. The crown protects the electronics from the various vigorous activities of the NHP.

III. SOFTWARE ARCHITECTURE

The state machine operation of the software is shown in Fig. 4. The basic computing mechanism is as follows: after the chip

Fig. 5. The BMI processing toolbox has three calculation stages: detection, suppression and optical conversion. The detection stage has three functions (a-c): simple thresholding, line length, and bandpass with integral. The suppressor has a phase shift algorithm and a PID linear controller (d-e). The optical conversion stage has three functions (f-h): maximum light intensity, Pulse Amplitude Modulation (PAM) and Pulse Width Modulation (PWM). Detailed simulation results are given in the text and in Supplement S1.

**TABLE I**

BMI TOOLBOX ALGORITHMS

<table>
<thead>
<tr>
<th>Stage</th>
<th>Algorithm</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Detection</td>
<td>(a) Simple threshold</td>
<td>Use time domain signal amplitudes as a detection criterion.</td>
</tr>
<tr>
<td>Detection</td>
<td>(b) Line length</td>
<td>Calculate the rate of change of data points to detect abrupt signals.</td>
</tr>
<tr>
<td>Detection</td>
<td>(c) Bandpass with integral</td>
<td>Filter signals with a bandpass filter and integral functions.</td>
</tr>
<tr>
<td>Suppress</td>
<td>(d) Phase shift</td>
<td>Phase shift signals by a given number of degrees within a frequency band.</td>
</tr>
<tr>
<td>Suppress</td>
<td>(e) PID</td>
<td>Use a PID controller to process signals.</td>
</tr>
<tr>
<td>Optical converter</td>
<td>(f) Threshold-max</td>
<td>Deliver the maximum light intensity when signals are above thresholds.</td>
</tr>
<tr>
<td>Optical converter</td>
<td>(g) Pulse amplitude modulation</td>
<td>Convert algorithm outputs into LED DAC values using pulse amplitude modulation.</td>
</tr>
<tr>
<td>Optical converter</td>
<td>(h) Pulse width modulation</td>
<td>Convert algorithm outputs into LED DAC values using pulse width modulation.</td>
</tr>
</tbody>
</table>

initialization, the system is configured into sleep mode and the recording interrupt is reset. In this mode most of the system peripherals are disabled except for a low power mode timer (LPT). The system remains in sleep mode until the LPT counter equals the biological recording period. A recording interrupt is then triggered to wake the system into normal run mode. The system sends a recording request to the ASIC-based head-stage via a serial peripheral interface (SPI). After a short recording delay (e.g. 45 μs), a 16-bit data word containing the LFP sample is received. A recording is therefore requested every time the microcontroller is interrupted. The interrupt occurs periodically, with a period set by the neural recording sampling frequency (e.g. 500 Hz), so the neural recording is effectively continuous with discrete time steps. A detection algorithm decides whether the recorded LFPs belong to abnormal activity. When abnormal activity is detected, a suppressor is executed, and its output is translated into stimulation commands by an optical converter. The head-stage then controls the light intensity according to the received stimulation commands. In addition, the received data are stored directly on an SD card or sent to a PC via UART (for debugging purposes). After that, the system is configured into sleep mode again to wait for the next interrupt. The SPI communication details are described in previous work [12]. In particular, we designed a BMI toolbox for three-stage processing: detection, suppression and optical conversion. The details are described in Section IV. The timing diagram of the power cycling technique is shown at the bottom of Fig. 4: a red dashed line indicates a recording interrupt, sleep modes are shown as green areas, and normal run modes as grey areas. The duration of a normal run mode depends on the computational load, while the duration of the sleep mode is set by the biological recording period. The details of the power cycling approach are given in Supplement S2.
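The wake, process and sleep cycle described above can be illustrated with a small software model. This is a minimal sketch and not the firmware used in this work: the class name, the callback signatures, the 0.5 detection threshold, the sign-inverting suppressor and the 8-bit command range are all hypothetical placeholders, and the SPI transfer and SD card write are replaced by plain Python values.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class NeuralEngineSim:
    """Toy model of the loop: sleep -> interrupt -> record -> detect ->
    suppress -> convert -> store -> sleep."""
    detect: Callable[[float], bool]      # detection stage
    suppress: Callable[[float], float]   # suppression stage
    convert: Callable[[float], int]      # optical conversion stage
    log: List[float] = field(default_factory=list)
    state: str = "SLEEP"

    def on_recording_interrupt(self, sample: float) -> int:
        """Handle one timer interrupt and return the stimulation command."""
        self.state = "RUN"
        command = 0  # LED stays off unless abnormal activity is detected
        if self.detect(sample):
            command = self.convert(self.suppress(sample))
        self.log.append(sample)  # stands in for the SD card or UART write
        self.state = "SLEEP"     # wait for the next interrupt
        return command


# Hypothetical stage functions chosen only to exercise the loop.
engine = NeuralEngineSim(
    detect=lambda x: abs(x) > 0.5,
    suppress=lambda x: -x,
    convert=lambda u: min(255, int(abs(u) * 255)),
)
commands = [engine.on_recording_interrupt(s) for s in [0.1, 0.8, -0.9, 0.2]]
print(commands)
```

Keeping the three stages as interchangeable callables mirrors the toolbox idea: a different detector, suppressor or converter can be swapped in without touching the cycle itself.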

**IV. BMI PROCESSING TOOLBOX**

The BMI processing toolbox has three stages: detection, suppression and optical conversion. Each stage has different algorithms, as illustrated in Fig. 5. Brief descriptions of each algorithm are presented in Table I, and the detailed results of each algorithm are described in Supplement S1.

The detection stage is a pre-processing stage to determine whether to intervene or not. We include three techniques: simple thresholding [15], line length [16] and bandpass with integral function. These address signal amplitude, signal change rate and frequency content, respectively. A testing chirp signal is generated which contains blocks of $V_{pp}$
The simple thresholding technique has minimal latency and has been widely used in neuroscience [17][18] and neurotechnology applications [19]. It is an ideal candidate for a pre-processing stage or an open-loop stimulus. One typical application is to compare recorded signals with the biological experimental noise level to enhance biological information processing [20]. Other model-based criteria such as visual threshold circular signals [21] and position [22] can also be considered as threshold factors. The example in Fig. 5(a) shows a comparison between the absolute threshold (red dashed line) and neural signals, in which signals above the threshold value are considered abnormal activity. A detailed example is illustrated in Fig. S1-1(b).
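As an illustration, simple thresholding can be sketched in a few lines. The function name is hypothetical, and comparing the magnitude of each sample against the threshold is an assumption made here; a one-sided comparison would be equally consistent with the description.

```python
from typing import List, Sequence


def threshold_detect(samples: Sequence[float], threshold: float) -> List[bool]:
    """Flag each sample whose magnitude exceeds the threshold.

    One comparison per sample, so the decision is available as soon as
    the sample arrives (no window or history is needed).
    """
    return [abs(x) > threshold for x in samples]


flags = threshold_detect([0.1, -0.7, 0.6, 0.2], threshold=0.5)
print(flags)
```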

The line length technique [16] is shown in Fig. 5(b). It is sensitive to the change rates of amplitudes in the time domain, and has already been widely applied on epilepsy detections [23][24]. It follows the relation in equation 1:

$$LL(n) = \frac{1}{K} \sum_{k=n-N}^{n} \mathrm{abs}[x(k-1) - x(k)] = \frac{L(n)}{K} \tag{1}$$

where $LL(n)$ is the normalized line length value at discrete time index $n$, $L(n)$ is the running sum of distances between successive points within the sliding window of size $N$, $x(k)$ is the data sequence value at the $k$-th sample, $K$ is the normalization constant, $N$ is the sliding window length, and $abs$ stands for absolute value. The window length and the value of $K$ depend on signal behaviour and experimental protocols. The LL simulation results in Fig. S1-1(c) demonstrate that the algorithm output depends not only on abrupt changes in amplitude but also on frequency variations.
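Equation (1) can be evaluated sample by sample with a running sum, so the cost per sample does not grow with the window length. The sketch below is illustrative only: the function name is hypothetical, and at the start of the record, where fewer than $N+1$ differences exist, it simply sums the differences available so far (an assumption, since the start-up behaviour is not specified in the text).

```python
from collections import deque
from typing import List, Sequence


def line_length(x: Sequence[float], window: int, k_norm: float) -> List[float]:
    """Normalized line length LL(n) for every sample index n.

    Follows Eq. (1): the sum of abs(x(k-1) - x(k)) for k = n-N .. n,
    divided by the normalization constant K.
    """
    diffs = deque()   # differences currently inside the sliding window
    running = 0.0     # L(n), updated incrementally
    out = []
    for n in range(len(x)):
        if n >= 1:
            d = abs(x[n - 1] - x[n])
            diffs.append(d)
            running += d
        if len(diffs) > window + 1:       # keep at most N+1 terms
            running -= diffs.popleft()
        out.append(running / k_norm)
    return out


ll = line_length([0, 1, 0, 1, 0, 0, 0, 0], window=1, k_norm=2.0)
print(ll)
```

The output rises while the signal alternates and falls back to zero once it goes flat, which is the behaviour exploited for detecting abrupt activity. For long floating-point records the running sum can accumulate rounding error, so a periodic recomputation may be preferable.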


Fig. 6. System experimental setup. (a) In-vitro closed-loop processing platform. The recording optrode is a NeuroNexus 16-channel array probe and the stimulation site is an optical fibre. (b) A rodent carrying the developed system in a freely moving environment (the holder was custom-made using a 3D printer). Fig. 6(b) also shows the mechanical setup of the system: a custom-designed optrode (3 mm) with electrodes can be fully implanted in a rodent brain; the optrode has 8 pins which mate with the Neural Engine head-stage pinouts, and the LED is on the shaft. (c) A primate control system on the head for freely moving recording experiments. There are three components of the system as labelled: 1) the brain prototype; 2) a system holder with the neural engine; 3) a battery. The overall experimental picture is shown at the top right of (c).

Fig. 8. (A) In-vitro closed-loop LFP modulation: (a) LFP recording results; (b) and (c) optical stimulation commands and phase condition lines; (d) wavelet transform power analysis; (e)-(g) three different cases in detail. (B) Freely moving rodent recording results: the top panel displays LFP activity in the time domain and the bottom panel shows the PSD analysis of the LFP data. (C) 24-hour freely moving primate LFP recording: the top panel displays the 24-hour LFP data in the time domain and the bottom panel shows the PSD analysis of the 24-hour data. The different primate LFP oscillations are also labelled in the bottom panel.

the developed technology has reasonable frequency classification behaviour.

The second stage is the suppression functions. This includes a phase shift algorithm and a PID controller. The phase shift algorithm is described in Fig. 5(d). This algorithm determines the intensity of the signal at a given frequency and then returns a corresponding signal with a phase shift. The relation is given in equation (5) below:

\[ y = e^{-kt} \cos(2\pi f t + \varphi) \tag{5} \]

where \( k \) is a decay constant, \( f \) is the central frequency and \( \varphi \) is the phase shift (in degrees). A case study of the phase shift algorithm with a central frequency of 5 Hz and a 180-degree shift is given in Fig. S1-2(b).
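For illustration, the response in equation (5) can be evaluated directly. This sketch only computes the decaying, phase-shifted cosine; it does not include the step that estimates the signal intensity at the target frequency. The function name and the decay constant used in the example are hypothetical, and the phase is converted from degrees to radians before use.

```python
import math


def phase_shift_response(t: float, k: float, f: float, phi_deg: float) -> float:
    """Evaluate y = exp(-k t) * cos(2 pi f t + phi) from Eq. (5).

    t       time in seconds
    k       decay constant in 1/s
    f       central frequency in Hz
    phi_deg phase shift in degrees
    """
    phi = math.radians(phi_deg)
    return math.exp(-k * t) * math.cos(2.0 * math.pi * f * t + phi)


# 5 Hz central frequency with a 180-degree shift, sampled every 2 ms.
samples = [phase_shift_response(n * 0.002, k=1.0, f=5.0, phi_deg=180.0)
           for n in range(5)]
print(samples[0])
```

With a 180-degree shift the response starts in anti-phase with an unshifted cosine, which is the basis for using it as a suppressing waveform.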

The other suppressor technique, based on our previous work [25][7] and designed neural mass [26], is a PID control system. This is illustrated in Fig. 5(e) and described by equation (6) below:

\[ u(t) = k_p\, e(t) + k_i \int_0^t e(\tau)\, d\tau + k_d\, \frac{e(t) - \dot{e}}{dt} \tag{6} \]

where \( k_p \), \( k_i \) and \( k_d \) are the proportional, integral and derivative gains; \( e \) is the error signal calculated from the reference signal and the current output; \( \dot{e} \) is the previous error signal; and \( dt \) is the integration step. The developed control system can also be further evolved into adaptive control schemes [27] for advanced operation. Results for a system with the parameter setting \( k_p = 1 \), \( k_i = 10 \) and \( k_d = 0 \) are shown in Fig. S1-2(c).
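A discrete-time form of the PID relation can be sketched as follows. The class name is hypothetical, the error is taken as reference minus measurement (the sign convention is an assumption), the integral is approximated by a rectangular sum, and the 2 ms step in the example is borrowed from the stimulation update interval reported in Section VI.

```python
class DiscretePID:
    """Discrete PID: proportional term on the current error, integral as a
    running rectangular sum, derivative as a backward difference against
    the previous error."""

    def __init__(self, kp: float, ki: float, kd: float, dt: float) -> None:
        if dt <= 0:
            raise ValueError("dt must be positive")
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, reference: float, measurement: float) -> float:
        error = reference - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)


# Gains from the case study in the text: kp = 1, ki = 10, kd = 0.
pid = DiscretePID(kp=1.0, ki=10.0, kd=0.0, dt=0.002)
u1 = pid.update(reference=1.0, measurement=0.0)
u2 = pid.update(reference=1.0, measurement=0.0)
print(u1, u2)
```

With a constant error the output grows by `ki * error * dt` per step, showing the integral term accumulating.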

The last stage is optical conversion, which converts the numerical outputs of the suppressor into control commands for the implantable LEDs, and thus into real-world optical intensities and waveforms. These need to take into account light penetration [6] and the performance of cells expressing Channelrhodopsin-2 (ChR2) [28]. The efficiency of the LED drive circuit [12] and the electro-thermal-optical characteristics of the LEDs and optrode also need to be taken into account [6].

Three essential converting methods are therefore implemented in the toolbox: 1) a maximum light intensity method with a threshold; 2) a Pulse Amplitude Modulation (PAM) and 3) a Pulse Width Modulation (PWM) technique. The maximum light intensity approach is described in Figure 5 (f). It simply provides a pulse of maximum intensity when the intervention is beyond a certain threshold.

The other two techniques modulate in the intensity and time domains respectively, as presented in Figure 5 (g-h). PAM translates LFP values linearly into LED DAC values, and PWM translates LFP values into corresponding LED pulse widths.
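The three conversion methods can be sketched as simple mappings. The function names, the 8-bit DAC range, the 2000 μs pulse period and the clipping of negative inputs to zero are assumptions made for illustration and are not specified in the text.

```python
def threshold_max(u: float, threshold: float, dac_max: int = 255) -> int:
    """Maximum intensity when the input exceeds the threshold, else off."""
    return dac_max if u > threshold else 0


def pam(u: float, u_max: float, dac_max: int = 255) -> int:
    """Pulse amplitude modulation: linear map from input to DAC code."""
    level = min(max(u / u_max, 0.0), 1.0)   # clip to [0, 1]
    return round(level * dac_max)


def pwm(u: float, u_max: float, period_us: float = 2000.0) -> float:
    """Pulse width modulation: linear map from input to pulse width in
    microseconds, at a fixed (maximum) amplitude."""
    level = min(max(u / u_max, 0.0), 1.0)   # clip to [0, 1]
    return level * period_us


print(threshold_max(0.6, 0.5), pam(0.2, 1.0), pwm(0.25, 1.0))
```

PAM varies the light intensity of each pulse, whereas PWM keeps the intensity fixed and varies how long the LED is on within each update period.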

V. EXPERIMENTAL SETUP

Optogenetics is a relatively new tool that uses genetic engineering to express light-sensitive ion channels (e.g. channelrhodopsin) and/or pumps (e.g. halorhodopsin) on the cell membrane. It is therefore possible to use optical stimuli to activate or inhibit the activity of a neuron.

The recent Allergan-Retrosense trial (US Clinical Trial identifier: NCT02556736) demonstrates that channelrhodopsin-2 can be delivered to the human retina using adeno-associated viral (AAV) vectors. Ingusci [29] provides a review of alternative approaches using lentiviral vectors. In this case, we used channelrhodopsin-2 delivered into animals via AAV.

A. In-vitro experimental setup

Rodent brain slices were cut using a 5100mz vibratome (Campden Instruments). The slices were then transferred to a brain tissue interface and holding chamber and incubated at room temperature until electrophysiological recording. During recordings, the slices were held at 32.5 °C and perfused with oxygenated ACSF (in mM: 126 NaCl, 24 NaHCO₃, 1.2 MgSO₄, 1.2 CaCl₂, 10 glucose, 3 KCl and 1.25 NaH₂PO₄), which also contained 4-aminopyridine (4-AP; 200 µM) to induce epileptiform activity. As Fig. 6(a) depicts, all in-vitro electrophysiological recordings were performed using an interface recording chamber. We utilized a commercial 16-channel linear multi-electrode array probe (NeuroNexus Technologies: A16x1-2mm-100-177 probes – shanks are 100 µm apart; recording site area on each shank). The stimulation output of the developed ASIC-based headstage was a 0-5 V voltage signal that drove a blue LED light source (473 nm, M470F1; Thorlabs) coupled to a 200 µm diameter optical fibre (M89L01–200; Thorlabs). In the freely moving setup we recorded onto an SD card. The data transmission latency is 20 μs per cycle (Table II), which is significantly less than the real-time constraint of 2 ms. For this in-vitro case, we transmitted the data directly to a PC via UART communication to allow for real-time display and analysis.

Fig. 9. In-vivo closed-loop processing results. (a) Overlay of stimulation onto filtered data. Different colours represent optical stimulation commands at different phases (from 0 deg to 315 deg). The algorithm is ON for 5 seconds and OFF for 5 seconds. (b) Effects of the stimulus in in-vivo closed-loop processing. The LFPs are displayed using a violin plot for each stimulus condition. The mean value is marked with a black line and the median value with a red line.

**B. Freely-moving rodent setup**

Two male Sprague Dawley rats, 3 to 6 months old, were used in the electrophysiological recording experiments. The data presented in this study were acquired while the animals were placed in an enclosed home cage during a 2-3 h sleep session. The animals were given no tasks and naturally fell asleep on their own. The behavioural procedure and the electrophysiological recordings were performed under UK Home Office licenses and were approved by the Newcastle University Animal Welfare and Ethics Review Board. A custom-designed system holder was mechanically fixed on the rodent's head (as per Fig. 6(b)). The developed system was connected to the implantable 130 kΩ electrodes [30] and enclosed in this holder. Data were recorded using an SD card on the control board. Fig. 6(b4) shows an exemplar custom-designed optrode with electrodes which can be fully implanted in a rodent brain. The optrode has a shaft length of 3 mm and 2 electrodes, which can integrate with the Neural Headstage pinouts.

Briefly, the fabrication of the custom probes was as follows: a silicon wafer was coated with 1 µm of silicon dioxide via chemical vapour deposition (CVD). Ti/Au/Ti (20/200/20 nm) metallisation was deposited by evaporation and then patterned by photolithography followed by a three-stage wet etch: NH$_4$OH and H$_2$O$_2$ for the Ti and K$_2$Fe(CN)$_6$, Na$_2$S$_2$O$_3$ and CS(NH$_2$)$_2$ for the Au. The metal patterns were then covered by a matching top layer of silicon dioxide insulation. Contact windows in the top oxide were opened by a Deep Reactive Ion Etch (DRIE) using a mixture of Ar and SF$_6$. The individual optrodes were then separated from the wafer using a deep reactive ion etch step to cut through the silicon. A gold wire was bonded into each recording site before the shank was coated in medical grade silicone encapsulation (MED6015), after which the gold wire was cut to open a conductive path through the passivation layer.

**C. Freely moving NHP setup**

Experiments were approved by the local ethics committee and performed under appropriate UK Home Office licenses in accordance with the Animals (Scientific Procedures) Act 1986. Recordings were made from a female macaque with implanted micro-electrodes in the primary motor cortex. The animal was implanted with custom arrays comprising 12 moveable 50 µm diameter tungsten micro-wires (of impedance ~200 kΩ at 1 kHz) using the same technique as previously described [31]. There were six components of the system: (1) an ASIC recording head-stage; (2) a system holder; (3) a processing unit; (4) a recording pinout; (5) a reference pinout and (6) a battery connector. The developed system was integrated with a custom-designed holder which was placed on the macaque's head as shown in Fig. 6(c). The data were recorded using the SD card.

**D. In-vivo closed-loop processing**

As shown in Figure 7, mice weighing 40 g were anaesthetized via inhalation of isoflurane (confirmed by the absence of the pedal withdrawal reflex). After a sufficient depth of anaesthesia was achieved, the animal was fixed in a stereotaxic frame (Kopf, Tujunga, CA, USA). A heating pad with feedback temperature control via a rectal probe (Harvard Apparatus, Holliston, MA, USA) maintained the core temperature of the mouse at 32°C. A skin incision was made in
TABLE II
SYSTEM SPECIFICATIONS

<table>
<thead>
<tr>
<th>Overview</th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>Control unit(size)</td>
<td>25mm×22mm</td>
</tr>
<tr>
<td>Head-stage(size)</td>
<td>10mm×10mm</td>
</tr>
<tr>
<td>Control unit(weight)</td>
<td>4.1g</td>
</tr>
<tr>
<td>Head-stage(weight)</td>
<td>1g</td>
</tr>
<tr>
<td>Recording channel</td>
<td>4</td>
</tr>
<tr>
<td>Stimulus channel</td>
<td>8</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Speed</th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>Recording</td>
<td>45μs</td>
</tr>
<tr>
<td>Stimulation</td>
<td>150μs</td>
</tr>
<tr>
<td>Algorithm processing</td>
<td>3-270μs</td>
</tr>
<tr>
<td>Data logger</td>
<td>19.5μs</td>
</tr>
<tr>
<td>UART transmission</td>
<td>22μs</td>
</tr>
</tbody>
</table>

Power

| Control unit              | 0.63mA            |
| Head-stage                | 1mA               |
| SD card                   | 1mA               |
| PMU                       | 0.3mA             |
| Overall                   | 2.93mA            |

The system recording frequency is setup at 50Hz, and clock frequency is at 40MHz.

the scalp before the peristeum was retracted to expose the bregma. A craniotomy was drilled above the parietal cortex of the left hemisphere. Optrodes were implanted into the left hemisphere. Two custom designed optrodes (previously described) one for electrical recording and one for optical stimulation were placed in corresponding places. The data was transmitted via UART communication in real-time.

VI. RESULTS

A. Closed-loop modulation of local field potential activity

A phase shift and pulse amplitude modulation (PAM) algorithm was implemented as a test case. Specifically, we ran the algorithm with phase shifts from 0 to 315 degrees relative to a target frequency. The experiment was performed in-vitro as described above. Recorded signals were first filtered at a central frequency of \( f = 6 \) Hz (the experimental LFP central frequency) and then shifted by the pre-defined phase. The shifted signal was then compared with a threshold value \( V_{th} = 0 \) V. In the final stage, the result was translated by the PAM algorithm into LED driving voltages to control the light sources.
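The phase shift and threshold steps can be illustrated with a minimal C sketch. It is not the authors' implementation: it assumes the phase shift is realised as a whole-sample delay of the already filtered signal, and the function names `phase_to_delay_samples` and `stim_gate` are hypothetical. The 500 Hz sample rate used in the usage note below is likewise an assumption (one sample per 2 ms command update).

```c
#include <stddef.h>

/* Number of whole samples that delays a sinusoid of frequency f0_hz by
 * phase_deg degrees when sampled at fs_hz (rounded to nearest sample).
 * Assumption: the phase shift is implemented as a pure delay. */
size_t phase_to_delay_samples(double phase_deg, double fs_hz, double f0_hz)
{
    double samples_per_cycle = fs_hz / f0_hz;
    double delay = samples_per_cycle * (phase_deg / 360.0);
    return (size_t)(delay + 0.5);
}

/* Stimulation gate: returns 1 (light on) when the delayed, filtered
 * sample exceeds the threshold v_th, otherwise 0 (light off).
 * filtered[n - 1] is the newest sample. */
int stim_gate(const double *filtered, size_t n, size_t delay, double v_th)
{
    if (n == 0 || delay >= n) {
        return 0; /* not enough history for the requested delay */
    }
    return filtered[n - 1 - delay] > v_th ? 1 : 0;
}
```

For a 6 Hz target sampled at an assumed 500 Hz, a 180 degree shift corresponds to a delay of 42 samples under this model.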

The optical stimulation command was updated every 2 ms in our experiments, with an amplitude determined by the implemented closed-loop algorithm. For each phase shift, the algorithm on/off period was 10 seconds. The results are shown in Fig. 8A: (a) shows the recorded LFPs; (b) and (c) show the optical stimulation commands and the phase condition lines, respectively; and (d) shows a wavelet transform power analysis. It can clearly be seen that, depending on the phase shift, the LFP signal oscillates at specific frequencies within the 10-15 Hz range of interest (indicated by the red dashed arrows). Three cases are displayed in detail in (e)-(g): (e) shows the LFP oscillations under a 45 degree shift, (f) under a 180 degree shift, and (g) under a 270 degree shift. These results illustrate that the developed system can modulate LFP activity within a given frequency band in a closed loop.

B. Freely-moving rodent recording experiment

Data recorded during a freely moving rodent experiment are displayed in Fig. 8B: (top) LFP data in the time domain and (bottom) the frequency domain obtained by Power Spectral Density (PSD) analysis. Three types of signal are visible. At the very beginning there is 50 Hz noise from the experimental set-up stage, when the recording pin was floating in air. The other two types are spindle oscillations (light sleep) at 10-12 Hz and delta oscillations (deep sleep) at 0-5 Hz. These LFP oscillations match the sleep-wake behaviour of the rodent during the experiment.

C. Freely-moving NHP recording experiment

A 24-hour recording from a freely moving non-human primate is shown in Fig. 8C: the top panel displays the LFP data in the time domain, and the bottom panel shows the PSD analysis. The data clearly show the differences between the sleeping and waking states of the primate. During 11:00-13:00, there are many large-amplitude signals (over 1 mV) introduced by movement, which are due to electrode movement as well as connection wire issues. During 13:00-17:00, the data display beta oscillations with broadband artefacts, which indicates that the primate was awake. During 17:00-05:00, the LFPs show regular delta oscillations (0-5 Hz) and spindle oscillations (10-12 Hz), which indicates that the primate was in light and deep sleep. After 05:00, the sleep patterns gradually disappeared and the LFPs returned to the beta oscillations (20 Hz) of the awake condition. In addition, the movement-induced signals can be filtered out since their characteristics (e.g. frequency range and amplitude) differ clearly from those of normal LFP oscillations.

D. In-vivo closed-loop processing

The LFPs and the closed-loop optical stimulation commands of the in-vivo experiment are presented in Figure 9(a), which overlays the stimulation pattern defined by the closed-loop algorithm onto the filtered data. The central frequency of the algorithm was set to 10 Hz (with a 5-15 Hz bandpass filter). The bandpass range was chosen with regard to the correlation between the recorded input data and the output stimulation. The algorithm was ON for 5 seconds and OFF for 5 seconds. Each colour corresponds to a different phase in the kernel of the algorithm (from 0 degrees to 315 degrees). This gives eight discrete phases, which repeat cyclically. Figure 9(b) shows the effect of the stimulus in closed-loop processing. The LFP data are displayed as violin plots for each stimulus phase condition. Compared with the Alg-off violin shape, there are clear modifications of the neural patterns in all phase-shift conditions, since violin shapes

E. System specifications

The system specification is shown in Table II. The head-stage measures 10 mm × 10 mm and weighs 1 g, and the control unit measures 25 mm × 22 mm and weighs 4.1 g. The recording and stimulation latencies are 45 μs and 150 μs (per request), respectively. Depending on the implemented algorithm, the latency of closed-loop processing varies from 3 μs to 270 μs. The latencies of the data logger (SD card) and the universal asynchronous receiver-transmitter (UART) are 19.5 μs and 22 μs, respectively.

The average current consumption of the control unit is 0.63 mA with a recording frequency of 50 Hz and an MCU clock frequency of 40 MHz. The head-stage consumes approximately 1 mA [12], and the SD card and power management unit (PMU) consume 1 mA and 0.3 mA, respectively. A single LED can draw up to 5 mA (at a 5 V supply), though we typically use much less to avoid adverse thermal effects. However, the average LED power depends strongly on the frequency of adverse activity in the tissue and the threshold for intervention. As such, the power budget for optical stimulation would need to be calculated separately for each application.
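As a rough illustration of how such an application-specific budget could be estimated, the sketch below sums the component currents from Table II and adds a time-averaged LED term. The LED duty cycle, the number of LEDs and the battery capacity are hypothetical inputs that the paper does not specify, and the function names are illustrative only.

```c
/* Component currents in mA, taken from Table II. */
#define I_CONTROL_UNIT_MA 0.63
#define I_HEADSTAGE_MA    1.0
#define I_SD_CARD_MA      1.0
#define I_PMU_MA          0.3

/* Baseline system current without optical stimulation. */
double baseline_current_ma(void)
{
    return I_CONTROL_UNIT_MA + I_HEADSTAGE_MA + I_SD_CARD_MA + I_PMU_MA;
}

/* Adds the time-averaged LED current: n_leds LEDs, each drawing
 * led_peak_ma while on, switched on for a fraction led_duty (0..1)
 * of the time. All three arguments are application-specific assumptions. */
double total_current_ma(int n_leds, double led_peak_ma, double led_duty)
{
    return baseline_current_ma() + n_leds * led_peak_ma * led_duty;
}

/* Battery life in hours for a given capacity in mAh (hypothetical input). */
double battery_life_hours(double capacity_mah, double current_ma)
{
    return capacity_mah / current_ma;
}
```

With no stimulation the sum reproduces the 2.93 mA overall figure in Table II; one LED at the 5 mA maximum with an assumed 10% duty cycle would add 0.5 mA on average.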

The power cycling performance of the system is shown in Fig. 10(A): the embedded controller consumes 12 mA (with peak currents of 14 mA) in normal run mode and only approximately 0.3 mA in sleep mode. In this case the normal run period is 475 μs, and the controller spends the rest of each cycle in sleep mode. The details of the power cycling approach are described in Supplement S2.
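The effect of power cycling on the average current can be approximated with a simple duty-cycle model. The sketch below is a first-order estimate under stated assumptions: one wake-up per recorded sample, and no allowance for mode transition time or peripheral activity. The function name `avg_current_ma` is hypothetical.

```c
/* Time-averaged current of a power-cycled MCU.
 * i_run_ma / i_sleep_ma : current in run and sleep mode (mA)
 * t_run_us              : time spent in run mode per cycle (us)
 * f_rec_hz              : recording frequency, assuming one wake-up
 *                         per sample (Hz) */
double avg_current_ma(double i_run_ma, double i_sleep_ma,
                      double t_run_us, double f_rec_hz)
{
    double period_us = 1e6 / f_rec_hz;
    double duty = t_run_us / period_us;
    if (duty > 1.0) {
        duty = 1.0; /* run time fills the whole period: no sleep at all */
    }
    return duty * i_run_ma + (1.0 - duty) * i_sleep_ma;
}
```

With the figures quoted above (12 mA run, 0.3 mA sleep, 475 μs run period, 50 Hz recording) this model gives about 0.58 mA, the same order as the 0.63 mA reported for the control unit. It also shows why the average current grows linearly with the recording frequency until the run time fills the whole period.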

F. System thermal experiment

Implantable electronics have been constrained by battery capacity and thermal issues [6], which places significant limits on processing and stimulation. Our work faces very similar constraints, such as 24-hour battery replacement. A thermal experiment was therefore carried out to investigate the thermal performance of the developed embedded system (at a room temperature of 25.1°C). For comparison, the system was run for an hour in normal run mode and in power cycling mode. The results are shown in Fig. 10(B): in normal run mode, the device temperature gradually rose to 27.2°C after 10 minutes and then remained at this value, whereas in power cycling mode the system stayed at the room temperature of 25.1°C. This indicates that the developed technology is a promising solution to the thermal issues of implantable electronics.

VII. DISCUSSION

A. Power cycling technology

We investigated the power cycling performance of the neural engine over a biological recording frequency range of 1 Hz to 2000 Hz. Two sets of algorithms from our BMI processing toolbox were used to represent computationally light and computationally heavy loads (9 μs and 333 μs, respectively). The light case is a simple threshold with maximum intensity, and the heavy case is a bandpass, phase shift and PAM chain. The results are displayed in Fig. 10(C).

In general, the current consumption increases linearly with the biological recording frequency. Specifically, the developed neural co-processor consumes less than 1 mA when the recording frequency is below 100 Hz. This demonstrates that for low-frequency recording applications such as brain field signals (e.g. below 50 Hz) [32], the developed technique achieves a significant power reduction, approaching the level of ASIC circuits (e.g. less than 5 mW) [8]. However, when a recording frequency above 100 Hz is required for fast signals such as action potentials, the developed technology shows power performance similar to that of standard embedded hardware. Meanwhile, the computational load has a significant effect on power consumption (at a recording frequency of 1000 Hz, the best case consumes 0.42 mA while the worst case consumes 2.6 mA). Fig. 10(D) displays the energy consumption of each algorithm. This can be further optimized by reducing the window size and the number of kernel taps, changing the data type from floating point to integer, and using inline functions.
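One of the optimizations mentioned above, moving from floating point to integer arithmetic, can be sketched for an FIR filter kernel. The example below is an illustration rather than the toolbox code: it assumes Q15 coefficients (scaled by 2^15) and 16-bit input samples, and the name `fir_q15` is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* One output sample of an FIR filter using integer arithmetic only.
 * taps_q15 : coefficients scaled by 2^15 (e.g. 0.5 -> 16384)
 * history  : the n_taps most recent input samples, history[0] newest
 * The result is saturated to the int16_t range. */
int16_t fir_q15(const int16_t *taps_q15, const int16_t *history, size_t n_taps)
{
    int64_t acc = 0; /* wide accumulator so the sum cannot overflow */
    for (size_t i = 0; i < n_taps; ++i) {
        acc += (int32_t)taps_q15[i] * (int32_t)history[i];
    }
    acc /= 32768; /* remove the 2^15 coefficient scaling */
    if (acc > INT16_MAX) {
        acc = INT16_MAX;
    }
    if (acc < INT16_MIN) {
        acc = INT16_MIN;
    }
    return (int16_t)acc;
}
```

A two-tap averaging filter with both coefficients equal to 16384 (0.5 in Q15) applied to the samples 200 and 100 returns 150. On a core without a floating-point unit, a kernel of this form avoids software floating-point emulation, at the cost of coefficient quantization.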

B. The comparison to the other works

Table III lists several similar works developed recently. Liu et al. [8] and Kassiri et al. [9] developed ultra-low power BMI systems with elegant ASIC designs consuming 8 mW and 2.8 mW, respectively. Both showed valid in-vivo data, with AP detection and phase synchronization, respectively. In terms of energy and physical dimensions, these systems are very well developed.

---

**TABLE III**

**COMPARISON WITH OTHER WORKS**

<table>
<thead>
<tr>
<th>System</th>
<th>Recording/Stim channels</th>
<th>Computing flexibility</th>
<th>Weight/size (control unit)</th>
<th>Weight/size (head-stage)</th>
<th>Power consumption*</th>
<th>Data transmission</th>
<th>Techniques</th>
<th>DSP functions</th>
<th>Optical stimulus</th>
<th>Freely moving Experiments#</th>
</tr>
</thead>
<tbody>
<tr>
<td>This work</td>
<td>2/8</td>
<td>YES</td>
<td>4.1g/25mm x 22mm</td>
<td>1g/10mm x 10mm</td>
<td>2.9mA (9.7mW)</td>
<td>UART/SD card</td>
<td>ASIC+MCU</td>
<td>BMI processing tool box</td>
<td>YES</td>
</tr>
<tr>
<td>Gagnon, 2017</td>
<td>32/32</td>
<td>NO</td>
<td>2.8g/17mm x 18mm</td>
<td>1.9g/10mm x 10mm</td>
<td>32mA (119mW)</td>
<td>Wireless</td>
<td>ASIC+FPGA</td>
<td>Adaptive threshold &amp; Wavelet compression</td>
<td>YES</td>
</tr>
<tr>
<td>Liu, 2017</td>
<td>16/16</td>
<td>NO</td>
<td>18g/13mm x 20mm</td>
<td>5g/10mm x 10mm</td>
<td>2.1mA (8mW)</td>
<td>Wireless</td>
<td>ASIC+MCU</td>
<td>AP detection &amp; PID</td>
<td>NO</td>
</tr>
<tr>
<td>Kassiri, 2017</td>
<td>24/24</td>
<td>NO</td>
<td>6.2g/20mm x 20mm</td>
<td>2g/10mm x 10mm</td>
<td>1.5mA (2.8mW)</td>
<td>Wireless</td>
<td>ASIC+FPGA</td>
<td>Phase synchronization</td>
<td>NO</td>
</tr>
<tr>
<td>Zanos, 2011</td>
<td>3/3</td>
<td>YES</td>
<td>0.035g x 55mm</td>
<td>0.35g x 10mm</td>
<td>78.8mA (284mW)</td>
<td>SD card</td>
<td>ASIC+MCU</td>
<td>None</td>
<td>NO</td>
</tr>
</tbody>
</table>

* The power consumption data in this table do not include implantable LEDs. #: The freely moving experiments include both primates and rodents.

However, fixed CMOS circuits are not reprogrammable, which may not be ideal for practical medical applications. Zanos et al. [11] designed an MCU-based system for freely moving primate experiments. By taking advantage of a commercial PSoC development board, the algorithm could be updated. However, the overall power consumption was high at 284 mW. Similarly, Gagnon-Turcotte et al. demonstrated a system with strong computing scalability, 32 recording/stimulation channels and a comprehensive wavelet data compression technique [10]. Its power consumption is still relatively high at 119 mW. Translating these devices into clinical applications would therefore require further power optimization to avoid battery life and thermal issues. Meanwhile, [10] and our own system employ optogenetic stimulation [33], while the others use traditional electrical stimulation, which cannot be genetically targeted [34]. In freely moving experiments, our system shows reliable LFP recordings in both primates and rodents. In the in-vitro experiments, the developed system demonstrates closed-loop neuromodulation that includes both stimulation and recording. Among the systems compared, the technology presented in this paper is therefore the only one that combines ultra-low power and computing flexibility with valid freely moving LFP recording data. This embedded-ASIC hardware may inform the next generation of BMIs [35][36][37][38].

It should also be noted that we used an SD card because our biological teams found this more convenient in this instance.

Finally, other techniques have shown various capabilities in the optogenetics field. Tae-il Kim [39] developed a multifunctional optrode with injectable light sources, detectors, sensors and other components that can be placed at a precise location in the deep brain. This technique enables optical, thermal and electrophysiological studies in a freely moving environment. Yu [40] provided a wirelessly controlled, implantable, micro-LED based optical neural interface for behaving animals, which is of interest for animal behaviour research in neuroscience.

C. The applications

The BMI processing toolbox was developed for a variety of neuroscience applications and has a three-stage algorithm configuration. In the detection stage, a simple threshold algorithm has low detection accuracy but short latency, which makes it suitable for noise detection or explicit time-domain processing. A bandpass filter passes only the signals of interest to the next stage, at the cost of a considerable processing delay (149 µs with 101 taps at a sampling frequency of 250 Hz); it is therefore suitable for frequency-domain applications. A line length method can be applied in either the time or the frequency domain since it is sensitive to abrupt signal variations. The phase shift algorithm could help to investigate several fundamental neuroscience mechanisms such as neural coherence [41][42] and synchronization [43][44]. A standard PID controller has been implemented for the control or modification of abnormal activity, in particular for epilepsy [45][46]. The final optical converter stage is determined by the required suppression/activation level as well as by opsin expression, opto-electronics design and LED performance.

It should be noted that the parameters of each algorithm are determined experimentally. Long-term experimental follow-up will therefore be needed to explore these parameter ranges for different applications.

Two case studies were used to demonstrate the range of applications of the system. The power consumption of benchmark 1 (best case) and benchmark 2 (worst case) is 1.38 mW and 8.5 mW, respectively, at a recording frequency of 500 Hz (without the extra averaging function). The neural engine can therefore be reconfigured according to experimental requirements such as the closed-loop algorithms and their parameters.

D. Future work of in-vivo closed loop processing

The current in-vitro and in-vivo work is sufficient to demonstrate that the platform is capable of electrical recording, closed-loop processing and optical stimulation in a form factor that can be deployed in-vivo. For near-future in-vivo freely moving closed-loop experiments, the optrode LED bonding technique and encapsulation should be tailored to alleviate artefacts and leakage current issues. More importantly, a system-level approach needs to be defined to evaluate animal behaviour in relation to closed-loop processing. Based on this, the BMI toolbox of the platform, as well as the mechanical setup (e.g. grounding, optrode locations), should be adjusted to maximise system performance.

VIII. CONCLUSION

In this work, we have developed a neural engine for closed-loop optogenetic processing. First, an in-vitro experiment was performed to demonstrate closed-loop modulation of LFPs at specific frequencies. In-vivo experiments in both a non-human primate and a rodent showed reliable recordings. In addition, a thermal experiment and a dynamic current analysis were carried out to illustrate the ultra-low-power computing performance needed for a long lifetime.

The major contributions are as follows: 1) A novel embedded-ASIC hardware architecture has been developed for closed-loop optogenetic applications, which shows both ultra-low power performance and strong computing flexibility; we also demonstrate optical closed-loop neuromodulation. 2) An in-vitro and in-vivo closed-loop processing platform has been developed to meet the requirements of various neuroscience and neurotechnology applications. In particular, a tailor-designed BMI toolbox has been implemented, which gives the system a wider application scope. 3) An in-vivo recording platform has been developed for freely moving primates and rodents. Electrical recording, modulation of activity through optogenetic stimulation and closed-loop processing have all been demonstrated together in an in-vitro preparation. We have also tested the same hardware in-vivo to show that the form factor is appropriate. In the near future we will carry out in-vivo freely moving closed-loop experiments and focus on research into animal behaviour control using the developed closed-loop system.

ACKNOWLEDGEMENT

We thank the Wellcome Trust (102037/Z/13/Z) and the Engineering and Physical Sciences Research Council (EPSRC NS/A000026/1) for funding the Controlling Abnormal Network Dynamics using Optogenetics (CANDO) project. We would also like to thank Jeffrey Warren, Paul Killan and Darren Mackie for their help with the PCB design, fabrication and assembly.

REFERENCES


[23] K. Jansen and L. Lagae, “Cardiac changes in


Supplementary S1 BMI toolbox processing

The BMI toolbox processing has three stages: detection, suppressor and optical converter. Each stage contains a variety of algorithms. This supplement provides detailed simulation results for each algorithm in both the time and frequency domains.

1. Stage 1: Detection

The detection stage has three algorithms: thresholding, line length and a bandpass filter. Fig. S1-1 displays the results of the three algorithms in both the time domain and the frequency domain (wavelet transformation).

A test signal was generated that contains blocks of $V_{pp} = 20$ mV, $V_{pp} = 100$ mV and $V_{pp} = 180$ mV. Each block is a chirp signal that sweeps 1 Hz - 5 Hz - 1 Hz (Fig. S1-1 a(1)), so the signal encodes information changes in both the time and frequency domains. The magnitude scalogram is shown in Fig. S1-1 a(2). For each algorithm, the signals were pre-stored in a waveform generator (Keysight 33500B) for testing.

Fig. S1-1 b(1) displays the results of the simple threshold algorithm. The threshold is set to 0 V, and signal amplitudes above 0 V are considered abnormal activity. The wavelet transformation of the result is shown in Fig. S1-1 b(2). Applying the simple threshold changes the signal in two major ways. Firstly, the absolute amplitudes are reduced to a value determined by the threshold. Secondly, second and third harmonics appear in the frequency domain. Fig. S1-1 c(1-2) shows the Line Length (LL) results, with the LL entropy and threshold lines labelled. Only signals whose LL entropy output is above the threshold are considered abnormal activity. The black lines indicate the LL outputs, and the corresponding frequency-domain results are illustrated in Fig. S1-1 c(2). Some wavelet transformation results are not displayed because they fall outside the effective area (indicated by the white dashed line). Fig. S1-1 d(1-2) depicts the bandpass (3-7 Hz) filter results. Only signal frequencies in the range 3-7 Hz are passed to the next processing stage; the rest are strongly attenuated in the time domain. The filter has 100 taps at a recording frequency of 50 Hz.
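The line length detector described above can be sketched as follows. This uses the common textbook definition of line length (the sum of absolute differences between consecutive samples over a sliding window), which is an assumption about the toolbox; the parameter K given in the Fig. S1-1 caption is not modelled, and the function names are hypothetical.

```c
#include <stddef.h>

/* Line length of the last `window` samples of x (x[n - 1] is the newest):
 * the sum of absolute differences between consecutive samples. */
double line_length(const double *x, size_t n, size_t window)
{
    if (n < 2) {
        return 0.0;
    }
    if (window > n) {
        window = n;
    }
    double sum = 0.0;
    for (size_t i = n - window + 1; i < n; ++i) {
        double d = x[i] - x[i - 1];
        sum += (d < 0.0) ? -d : d;
    }
    return sum;
}

/* Returns 1 when the line length exceeds the threshold (activity flagged
 * as abnormal), otherwise 0. */
int line_length_detect(const double *x, size_t n, size_t window,
                       double threshold)
{
    return line_length(x, n, window) > threshold ? 1 : 0;
}
```

Because the measure grows with both amplitude and frequency, a burst of large, fast fluctuations raises the output even when the mean amplitude is unchanged, which is why it responds to abrupt signal variations.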

2. Stage 2: Suppressor

The suppressor stage has two algorithms: a bandpass filter with a given bandwidth and a PID control system. Fig. S1-2 displays the results of the two algorithms in the time, phase (FFT analysis) and frequency (wavelet transformation) domains.

A chirp signal (sweeping from 1 Hz to 10 Hz) was generated and pre-stored in a waveform generator (Keysight 33500B) for testing. Fig. S1-2 a(1-3) shows the generated signal in the time, frequency and phase domains. The outputs of the phase shift algorithm (central frequency 5 Hz, phase shift 180 degrees) are displayed in Fig. S1-2 (b): only signals around 5 Hz are passed, and the phase is shifted by 180 degrees as shown in Fig. S1-2 (b3). Signal frequencies outside 3-7 Hz are strongly attenuated, as Fig. S1-2 (b2) shows. Fig. S1-2 c(1-3) displays the results of a custom PID controller with the parameters $k_p = 1$, $k_i = 10$ and $k_d = 0$. The signal phases are strongly disturbed by the integrating behaviour of the system, as shown in Fig. S1-2 (c3).

Fig. S1-1: The detection algorithm results. a(1-2) shows the input signals in the time and frequency domains. b(1-2) shows the simple threshold results in the time and frequency domains; the absolute threshold value is 0 V. c(1-2) depicts the line length results in the time and frequency domains. The LL entropy outputs and threshold are labelled by the red line. The window size is N = 100, K = 30, and the LL output threshold is 60. d(1-2) illustrates the bandpass filter results in the time and frequency domains. The filter has 100 taps at a recording frequency of 50 Hz.

Fig. S1-2: a(1-3) displays the test chirp signal (1-10 Hz); b(1-3) shows the phase shift algorithm results. The central frequency is 5 Hz, $k$ is 12.5 and $\phi$ is 180 degrees. The number of filter taps is optimized at 50; c(1-4) illustrates the PID controller results. The parameters are $k_p = 1$, $k_i = 10$ and $k_d = 0$. The reference signal is set to 0, which represents the neural network at a standstill.
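For readers unfamiliar with the PID controller used in the suppressor stage, a minimal discrete-time sketch is given below. It is a generic textbook form (rectangular integration, backward-difference derivative), not the authors' custom implementation; the type and function names are hypothetical, and the sampling interval `dt` is an input that the supplement does not state.

```c
/* Discrete PID controller state. */
typedef struct {
    double kp, ki, kd;   /* proportional, integral, derivative gains */
    double setpoint;     /* reference value */
    double integral;     /* accumulated error multiplied by dt */
    double prev_error;   /* error at the previous step */
} pid_ctrl;

void pid_init(pid_ctrl *p, double kp, double ki, double kd, double setpoint)
{
    p->kp = kp;
    p->ki = ki;
    p->kd = kd;
    p->setpoint = setpoint;
    p->integral = 0.0;
    p->prev_error = 0.0;
}

/* One controller update; dt is the sampling interval in seconds (> 0). */
double pid_step(pid_ctrl *p, double measurement, double dt)
{
    double error = p->setpoint - measurement;
    p->integral += error * dt;
    double derivative = (error - p->prev_error) / dt;
    p->prev_error = error;
    return p->kp * error + p->ki * p->integral + p->kd * derivative;
}
```

With the gains quoted above ($k_p = 1$, $k_i = 10$, $k_d = 0$) and a reference of 0, a constant measurement of 1 sampled every 20 ms produces outputs of about -1.2 and then -1.4 on successive steps. The output keeps growing in magnitude for a constant error, which is the integrating behaviour referred to in the text.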

3. Stage 3: Optical Converter

The optical converter stage has three algorithms: maximum threshold, Pulse Amplitude Modulation (PAM) and Pulse Width Modulation (PWM). Fig. S1-3 shows the simulation results of each algorithm.

Fig. S1-3(a) provides a pulse of maximum intensity (60 mW/mm$^2$) when the intervention signal exceeds a threshold of 140. Fig. S1-3(b) and Fig. S1-3(c) display the PAM and PWM results, where each output is linearly transformed into a pulse amplitude (100-500) and a pulse width (0-20 ms), respectively.

Fig. S1-3: (a) Maximum intensity results, where the threshold is 140 and the maximum intensity is 60 mW/mm$^2$; (b) the pulse amplitude modulation results, where the outputs are linearly transformed into LED DAC values between 100 and 200; (c) the pulse width modulation results, where the outputs are linearly transformed into pulse widths between 0 and 20 ms.
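The three optical converter mappings can be sketched in C as below. This is an illustrative sketch under stated assumptions: the control value is assumed to be clamped to a range [0, u_max] before the linear mapping, and all function names and the choice of `u_max` are hypothetical.

```c
#include <stdint.h>

/* Maximum-threshold converter: full output when the control value u is
 * above the threshold, otherwise off. */
uint16_t max_threshold_out(double u, double threshold, uint16_t max_out)
{
    return (u > threshold) ? max_out : 0;
}

/* Clamp u to [0, u_max] and map it linearly onto [lo, hi]. */
static double linear_map(double u, double u_max, double lo, double hi)
{
    if (u < 0.0) {
        u = 0.0;
    }
    if (u > u_max) {
        u = u_max;
    }
    return lo + (hi - lo) * (u / u_max);
}

/* PAM: control value -> LED DAC code in [dac_min, dac_max], rounded. */
uint16_t pam_dac_code(double u, double u_max,
                      uint16_t dac_min, uint16_t dac_max)
{
    return (uint16_t)(linear_map(u, u_max, dac_min, dac_max) + 0.5);
}

/* PWM: control value -> pulse width in ms in [0, width_max_ms]. */
double pwm_width_ms(double u, double u_max, double width_max_ms)
{
    return linear_map(u, u_max, 0.0, width_max_ms);
}
```

For example, with a DAC range of 100 to 200 a half-scale control value maps to a DAC code of 150, and with a 20 ms maximum width a quarter-scale control value maps to a 5 ms pulse.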
Supplementary S2 Low power cycling technology [1] [2]

One of the most important factors in developing an implantable device is power consumption. It is vital that the operating time of the device is significantly longer than the time it takes to recharge, so that the patient is not restricted in daily needs and activities. This can be achieved by developing efficient hardware with low power consumption, by powering up parts of the hardware only when needed and by implementing energy harvesting techniques. This document describes the implementation of power cycling on the microcontroller core used, in other words, keeping the microcontroller powered when performing specific tasks and placing it into a low power mode otherwise.

1. Specifications

<table>
<thead>
<tr>
<th>Mode</th>
<th>Normal</th>
<th>VLPS</th>
</tr>
</thead>
<tbody>
<tr>
<td>Current consumption</td>
<td>11.96 mA</td>
<td>0.2 mA</td>
</tr>
<tr>
<td>Transition period enter (from normal)</td>
<td>-</td>
<td>20 us</td>
</tr>
<tr>
<td>Transition period exit (to normal)</td>
<td>-</td>
<td>20 us</td>
</tr>
</tbody>
</table>

The data were measured on the developed embedded ASIC system using a DC power analyzer (Agilent Technologies N6705B).

2. Power mode transition diagram

Figure S2-1 illustrates all the possible modes and transitions between them. The modes and transitions followed in this case have been highlighted in blue and red accordingly.

3. Low power technique overview

In order to achieve high performance within a constrained energy budget, a power cycling technique was used. More specifically, the Microcontroller Unit (MCU) cycles between the normal RUN mode and the Very Low Power Stop (VLPS) mode, achieving a significant reduction of static and dynamic power consumption. This is mainly achieved by disabling some of the peripherals and clocks on the MCU. Figure S2-2 illustrates the active (blue) and disabled (orange) peripherals when the MCU is placed in VLPS mode.

4. VLPS Methodology

4.1. Before Entering VLPS mode

Prior to entering the VLPS mode, the corresponding registers have to be set and the appropriate clocks selected. In this work the LPTMR timer module was chosen as the wake-up unit. An interrupt occurring every 5 ms wakes up the MCU, placing it back in normal RUN mode. The LPTMR module was clocked by the LPO. Figure S2-3 illustrates an example diagram of all the available clocks in the MCU. For the LPO to be selected as the LPTMR clock source, the PCS bits in the LPTMR_PSR register had to be set to 0b01. This has to be done while the LPTMR module is disabled.

There are three other clocks that can be chosen as an input to the LPTMR module. MCG (grey) is the Multipurpose Clock Generator and is the main clock module. The MCG clock is disabled when the MCU is in VLPS mode (except for when it is in debug mode), so the MCGIRCLK could not be used. As for the system oscillator, it provides the OSCERCLK_UNDIV clock, which comes from external crystal circuit or directly from EXTAL. In this work no external circuitry or input to the EXTAL pin was used, hence this clock could not be used. Finally, the RTC clock could have been made available through the ERCLK32K and selected, but again no external crystal circuit was used in this work. The System Integration Module (SIM) is responsible for controlling the clock selection and distribution of each clock in the VLPS mode.

Figure S2-1: Diagram of all power modes and transitions. The modes and transitions that have been utilised in this work are highlighted in blue and red respectively. When the system is powered up it enters RUN mode through a normal BOOT. Then the system cycles through the RUN and VLPS mode as described in section 4.

**Figure S2-2:** The figure illustrates all the available peripherals in the MCU. When the system is in RUN mode all the peripherals (both orange and blue) are active. After entering VLPS mode only the blue peripherals remain active and all the peripherals with an orange background are disabled.

**Figure S2-3:** Example clock diagram of the MCU. The LPO timer is chosen as the interrupt source. The lines highlighted in red illustrate which parts of the hardware are required for the LPO to be selected.

The next step is to allow the MCU to enter the VLPS mode. This is achieved by setting the AVLP bit in the SMC_PMPROT register.

```
SMC_PMPROT = SMC_PMPROT_AVLP_MASK;
```

Then, the VLPS mode is selected by setting the STOPM bits in the SMC_PMCTRL register to 0b010.

```
SMC_PMCTRL &= ~SMC_PMCTRL_STOPM_MASK;
SMC_PMCTRL |= SMC_PMCTRL_STOPM(2);
```

The final initialisation step, after the stop mode selection, is to set the SLEEPDEEP bit to 1.

```
SCB_SCR |= SCB_SCR_SLEEPDEEP_MASK;
```

4.2. Entering VLPS mode

After following the correct initialisation the Wait For Interrupt (WFI) instruction is executed, which causes immediate entry to the VLPS mode.

```
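/* WFI instruction will start entry into STOP mode */
#ifdef CMSIS
    __wfi();          /* compiler intrinsic for the WFI instruction */
#else
    __asm("WFI");     /* inline assembly fallback when no intrinsic is used */
#endif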
```
4.3. Exiting VLPS mode

While in VLPS mode the MCU can be woken up by an interrupt caused by the LPTMR module, provided that the correct initialisation and the appropriate clock source have been chosen, as described in 4.1.

References
[1] “Power Management for Kinetis MCUs When and how to use Kinetis low-power modes”, Document Number: AN4503, Rev. 2, 04/2015