A Survey of Emerging Interconnects for On-chip Efficient Multicast and Broadcast in Many-cores

Ammar Karkar¹,², Terrence Mak³, Kin-Fai Tong⁴, and Alex Yakovlev¹

¹School of Electrical and Electronic Engineering, Newcastle University, UK, Email: {a.j.m.karkar, alex.yakovlev}@newcastle.ac.uk
²IT Research Centre, University of Kufa, Iraq, Email: ammar.karkar@uokufa.edu.iq
³Electronics and Computer Science, University of Southampton, UK, Email: tmak@ecs.soton.ac.uk
⁴Department of Electrical and Electronic Engineering, University College London, UK, Email: k.tong@ucl.ac.uk

Abstract—Networks-on-chip (NoC) have emerged to tackle different on-chip communication challenges and can satisfy different demands in terms of performance, cost and reliability. Currently, metal-based interconnects are reaching their performance limits given relentless technology scaling. In particular, a performance bottleneck has emerged due to the bandwidth demands of multicast and broadcast communication. As a result, various state-of-the-art architectures based on emerging interconnects, including optics and radio frequency (RF), have been proposed as alternatives. This article presents a comprehensive survey of these interconnect fabrics and discusses their current and future potential as well as their obstacles. It aims to help the research community make better use of the merits of these on-chip interconnects and address the challenges involved. New interconnect technologies, such as optical interconnects, wireless NoC (WiNoC), RF transmission lines (RF-I) and surface wave interconnects (SWI), are discussed, evaluated and compared. Consequently, these emerging interconnects can continue to provide the cost efficiency and performance demanded by future many-core processors and high performance computing.

Keywords—on-chip interconnects, network-on-chip, surface-wave, optical interconnects, wireless interconnects, transmission-lines, multicast, chip multiprocessors.

I. INTRODUCTION

Due to growing market demands, integrated circuit technology processes are scaling rapidly, causing an intensification of current and future systems-on-chip (SoC) in terms of transistor density and functional complexity. As a result, the number of integrated intellectual property (IP) cores inside a single SoC has increased dramatically, leading the research community [1], [2] and industry [3] to adopt NoC (networks-on-chip) as the underlying communication structure.

This is especially true for chip multiprocessors (CMPs), which were introduced to provide near-linear performance improvements when complexity increases (Pollack’s rule), while maintaining lower power and frequency budgets [4]. CMP performance and power consumption depend on both the NoC and the cache coherence protocol. Cache coherence protocols depend on a range of multicast (one-to-many, or 1-to-M for short) or broadcast (1-to-all) communication patterns [5], [6]. This type of traffic is projected to scale in terms of destinations, burstiness, and spatial distribution as the number of cores scales [7]. For instance, broadcast-based cache coherence protocols produce a relatively high multicast ratio of up to 52.4% of the total packet injection rate (PIR) [6], [5]. This could be catastrophic for global coherence and NoC performance unless the interconnect fabric supports 1-to-M communication. Therefore, there is a need to remove these constraints and improve performance by proposing interconnect architectures that support 1-to-M communication.

Relevant NoC studies have struggled to achieve 1-to-M latency and energy close to wire-latency and wire-energy [6], [5]. This will not be sufficient in the near future given the projected issues with regular metal-based NoC since these interconnect fabrics struggle to match the required scalability, especially for global communication in terms of latency and energy ($J/b$) [8], [9]. Some studies have proposed 3D-integration to ease the global communication issues by reducing the NoC hop-count. However, although promising, this technology faces various technical challenges such as process control requirements, wafer thinning, low TSV capacitance and design challenges [10], [8], [9]. 3D-integration will not be discussed further here, because it is beyond the scope of this research.

Thus, these wiring challenges have inspired many researchers to look for alternative interconnects, such as radio frequency (RF) interconnects (RF-I) [11], [12], [13], [14], wireless NoC (WiNoC) [15], [16], [10], [17], [18], [19], optical interconnects (ONoC) [20], [21], [22], [23] and surface wave interconnects (SWI) [24], [25], [26], [27]. However, these types of interconnects face various challenges due to their complexity, power consumption and/or area overheads. This paper discusses the merits and drawbacks of these interconnects from a system-level design point of view. In addition, the focus is on the multicast architectures enabled by these interconnects, since multicast is a crucial requirement for CMPs ($\sim$100 cores) and for future many-cores ($\sim$1000 cores). To the best of the authors' knowledge, there has been no comprehensive review of emerging interconnects that focuses on multicast support. Moreover, most survey papers discuss only one type of interconnect [21], [15] or a subset of emerging interconnects [10]. In contrast, this paper provides a comprehensive view of the current status of on-chip communication enabled by these emerging interconnects. The contributions of this article are:

- A comprehensive view of current knowledge of the merits and drawbacks of emerging interconnects, especially for interconnect architectures that support multicast. This can inspire research that utilizes their advantages and addresses their challenges.

- A system-level comparison of these promising types of interconnects, especially in terms of how well they match communication functionality requirements and of the challenges facing their underlying technologies.

II. BACKGROUND

This section discusses the projected issues with regular wires and highlights the multicast requirements for future many-core systems.

A. Wire Issues

For decades, on-chip interconnects have relied only on the regular metal wire, which transmits the signal by charging and discharging the whole wire. These wires, also known as resistive-capacitive (RC) lines, provide a cheap and easy-to-implement communication medium. Although the interconnect fabric has changed from bus to NoC [1], [2], the underlying medium is still the same. Wires have met the performance, power consumption, and area overhead requirements for intra-chip communication for many technology generations. However, with the continuous scaling of CMOS technology, the projections for global wire communication do not seem promising.

Even though the global wiring length might remain the same or increase slightly, wire thickness and spacing have been decreasing continuously as technology scales down. This increases wire resistance and capacitance [8], [9]. Consequently, wire delay increases, because it is directly proportional to the product of wire resistance and capacitance [28]. Fig. 1 shows the increasing gap between gate delay and wire delay [8]. Moreover, Ho et al. predicted that the global and semi-global wiring delay for delivering 50% of the signal might increase exponentially [29]. Local wiring does not have this problem because, unlike global wiring, its length decreases with technology scaling, as shown in Fig. 1.

This latency will decrease the single-wire bandwidth and the overall interconnect throughput. The attempt to keep wire dimensions (thickness and spacing) constant regardless of technology scaling is known as fat wires. This approach has a serious drawback: it reduces the number of bits per unit area, so the aggregate bandwidth could be reduced more severely than by the delay that results from shrinking the wire geometry [29]. However, industry now uses mixed wire geometries in different layers of the IC, chosen according to wire length and functionality, to mitigate the delay problem [9]. Other solutions, such as introducing new conductor and dielectric materials with better physical characteristics [9] or using repeaters [30], [8], could postpone the problem for a few years but will be unable to meet future demands. For instance, some studies show that introducing repeaters for global and semi-global wires mitigates the delay problem by making the delay rise linearly with technology scaling [30], [8].
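
The contrast between unrepeated and repeated wires can be illustrated with a minimal sketch. It assumes the common first-order model in which the 50% delay of a distributed RC line is about 0.38 times the product of its total resistance and capacitance, and it treats each repeater as a fixed added delay. The per-millimetre values, the segment length and the repeater delay are hypothetical numbers chosen only for illustration; they are not taken from the cited studies.

```python
def unrepeated_delay_s(r_ohm_per_mm, c_farad_per_mm, length_mm):
    """Approximate 50% delay of a distributed RC wire: 0.38 * R_total * C_total."""
    r_total = r_ohm_per_mm * length_mm
    c_total = c_farad_per_mm * length_mm
    return 0.38 * r_total * c_total


def repeated_delay_s(r_ohm_per_mm, c_farad_per_mm, length_mm,
                     segment_mm, repeater_delay_s):
    """Delay of the same wire cut into equal segments, one repeater per segment."""
    n_segments = max(1, round(length_mm / segment_mm))
    segment_length = length_mm / n_segments
    per_segment = unrepeated_delay_s(r_ohm_per_mm, c_farad_per_mm, segment_length)
    return n_segments * (per_segment + repeater_delay_s)


# Hypothetical values, for illustration only.
R_PER_MM = 1000.0     # ohm per mm
C_PER_MM = 0.2e-12    # farad per mm
T_REPEATER = 20e-12   # seconds, assumed intrinsic repeater delay

for length in (5, 10, 20):
    plain = unrepeated_delay_s(R_PER_MM, C_PER_MM, length)
    repeated = repeated_delay_s(R_PER_MM, C_PER_MM, length, 1.0, T_REPEATER)
    print(f"{length:>2} mm: unrepeated {plain * 1e9:.2f} ns, "
          f"repeated {repeated * 1e9:.2f} ns")
```

Under these assumptions, doubling the wire length quadruples the unrepeated delay but only doubles the repeated delay, at the price of the repeaters' own power and area.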

On the other hand, power consumption is also a crucial issue facing regular metal interconnects. Magen et al. predict that on-chip interconnects will consume up to 80% of chip power [31]. This is mainly due to the projected increase in global wire power dissipation and the decrease in gate power consumption. In addition, the use of repeaters to handle the delay issue increases power consumption even further. Therefore, some studies have been conducted to manage repeater placement and minimize the number and size of repeaters with an acceptable delay penalty [30]. The drawback of such solutions is that neither the power nor the latency is optimal.

B. Multicast Requirements

In the literature, NoCs conventionally treat 1-to-M traffic patterns as queued unicast traffic, which is referred to as software multicast [2]. This basic handling has a dramatic effect on the NoC, for the following reasons: (1) it increases congestion and thus creates a bottleneck at the source node of this traffic, including its router, network interface and links; (2) it causes poor quality of service (QoS) due to the queueing of repeated unicast packets on the same communication fabric; and (3) it increases power consumption, because the same data are retransmitted to different destinations. As a result, even a small percentage of 1-to-M traffic has severe effects on NoC performance and cost, as shown in Fig. 2b. Moreover, the number of destinations, burstiness, and spatial distribution of multicast traffic have been found to be proportional to the number of cores [7].
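
The cost of software multicast can be made concrete with a small counting sketch. It assumes a 2D mesh with dimension-ordered (XY) routing and compares queued unicasts against an idealised multicast in which routers replicate the packet so that each shared link is traversed only once. The function names and the idealised tree model are illustrative assumptions rather than a description of any specific router in the cited work, and queueing effects are ignored.

```python
def xy_path_links(src, dst):
    """Return the directed links on the XY (dimension-ordered) route from src to dst."""
    (x, y), (dst_x, dst_y) = src, dst
    links = []
    while x != dst_x:                      # travel along X first
        next_x = x + (1 if dst_x > x else -1)
        links.append(((x, y), (next_x, y)))
        x = next_x
    while y != dst_y:                      # then along Y
        next_y = y + (1 if dst_y > y else -1)
        links.append(((x, y), (x, next_y)))
        y = next_y
    return links


def software_multicast_cost(src, dests):
    """Queued unicasts: one injection per destination, every link traversal counted."""
    injections = len(dests)
    traversals = sum(len(xy_path_links(src, d)) for d in dests)
    return injections, traversals


def tree_multicast_cost(src, dests):
    """Idealised multicast: one injection, each shared link traversed once."""
    used_links = set()
    for d in dests:
        used_links.update(xy_path_links(src, d))
    return 1, len(used_links)


MESH = 6                                   # 6 x 6 mesh, as in Fig. 2b
SRC = (0, 0)
DESTS = [(x, y) for x in range(MESH) for y in range(MESH) if (x, y) != SRC]

print("software multicast:", software_multicast_cost(SRC, DESTS))
print("idealised tree multicast:", tree_multicast_cost(SRC, DESTS))
```

For a corner-to-all broadcast in this mesh, the queued-unicast approach injects 35 packets at the source and causes 180 link traversals, whereas the idealised tree needs a single injection and 35 link traversals.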

Cache coherence protocols depend on a range of 1-to-M communication patterns, such as multicasting invalidation requests (directory-based protocols) and broadcasting ordering tokens (broadcast-based protocols) [5], [6]. Broadcast-based cache coherence protocols such as token coherence incur lower hardware overhead and delay than directory-based protocols, whose directory storage scales with the number of cores, and they offer relatively low latency compared to other cache coherence protocols [6], [32]. However, in these protocols the ratio of multicast traffic to the total PIR is relatively high, ranging from ~5% to 52.4% [6], [5]. For instance, Fig. 2a shows multicast ratios for a set of standard benchmark applications from PARSEC.

Fig. 2: (a) The non-trivial 1-to-M traffic percentage according to our simulation of a range of CMP benchmark applications (from PARSEC and SPLASH2) with the MESI cache coherence protocol; (b) our 6 × 6 regular mesh NoC simulations with random traffic, and with random traffic plus a small percentage (5%) of multicast or broadcast. The introduction of multicast or broadcast leads to severe deterioration in performance in terms of latency and saturation PIR.

TABLE I: Summary of reported key features for implementation of integrated optical interconnects.

<table>
<thead>
<tr>
<th>Reference</th>
<th>Technology node</th>
<th>Gbps</th>
<th>pJ/bit</th>
</tr>
</thead>
<tbody>
<tr>
<td>Meade et al. [39]</td>
<td>180nm</td>
<td>5</td>
</tr>
<tr>
<td>Dong et al. [22]</td>
<td>20</td>
<td>-</td>
</tr>
<tr>
<td>Cunningham et al. [40]</td>
<td>40nm and 130nm</td>
<td>10</td>
</tr>
<tr>
<td>Zheng et al. [41]</td>
<td>40nm</td>
<td>10</td>
</tr>
</tbody>
</table>

The interconnect infrastructure should support 1-to-M communication to cope with future many-core requirements, as mentioned earlier. Therefore, although optical interconnects do not offer a natural fanout feature, many studies have proposed optical interconnect architectures for multicast. These studies suggest either free-space or waveguided optical interconnects. Free-space optical interconnects direct the signal using chip-surface devices such as micro-lenses, micro-mirrors, diffractive optical elements (DOEs), laser sources and photo-detectors (PDs) [42], [21]. However, the few studies that have investigated free-space optics have considered it for clock distribution only, not for data multicast [21].

On the other hand, waveguided optical interconnects have been thoroughly investigated and many state-of-the-art architectures have been proposed. These vary in topology and the on-chip devices that support them. For example, the tree-topology requires splitters and combiners to fork and join the optical signals [43], as shown in Fig. 3a. Another example is a bus-based topology that utilizes wavelength-division-multiplexing (WDM) and then uses a bank of microring modulators, which can be configured to listen to a selected channel.
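
The power cost of fanout in a tree topology can be estimated with a short sketch. It assumes a balanced binary tree of 1-to-2 splitters and uses the 3 dB per-splitter loss quoted in Table IV; waveguide, coupling and modulator losses are ignored. The receiver sensitivity used in the example is a hypothetical value, not a figure from the cited work.

```python
import math


def splitter_levels(n_destinations):
    """Levels of 1-to-2 splitters in a balanced binary tree reaching n destinations."""
    if n_destinations < 1:
        raise ValueError("need at least one destination")
    return math.ceil(math.log2(n_destinations))


def split_loss_db(n_destinations, loss_per_splitter_db=3.0):
    """Splitting loss seen at each leaf of the tree, in dB."""
    return splitter_levels(n_destinations) * loss_per_splitter_db


def launch_power_dbm(n_destinations, receiver_sensitivity_dbm,
                     loss_per_splitter_db=3.0):
    """Minimum source power so that every leaf still meets the receiver sensitivity."""
    return receiver_sensitivity_dbm + split_loss_db(n_destinations,
                                                    loss_per_splitter_db)


SENSITIVITY_DBM = -20.0   # hypothetical photo-detector sensitivity

for n in (4, 16, 64, 1024):
    print(f"{n:>4} destinations: {split_loss_db(n):.0f} dB split loss, "
          f"launch power >= {launch_power_dbm(n, SENSITIVITY_DBM):.0f} dBm")
```

Under these assumptions the splitting loss grows with the logarithm of the destination count, so every doubling of the fanout requires doubling the launched optical power.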

B. Challenges

Despite all the previously mentioned merits, optical interconnects face significant challenges, mainly in terms of complexity, thermal regulation, and power budget requirements [21], [42], [44]. In terms of power consumption, there is debate over whether optical interconnects will reduce or increase overall power consumption. Many optimistic researchers [20], [45] argue that the absence of resistive loss, together with the assumption that quantum sources and detectors can be used in the future, could lead to lower J/bit than regular metal wires. In contrast, pessimistic researchers question the potential power savings unless these interconnects are used for relatively long communication distances, since currently proposed optical devices are power-hungry [23], [9]. Moreover, researchers have yet to tackle the extra power requirements for scalable multicast, since current devices attenuate the signal significantly.

Optical interconnects face another major challenge, which is complexity. This is due to the fact that they need expensive, area-hungry and sometimes non-CMOS devices to transfer signals from electrical to optical form and to route the optical signal [20], [21]. The main devices are laser sources, photo-detectors, modulators/filters, waveguides, and laser-waveguide couplers in the case of off-die laser sources. Also, depending on the interconnect architecture, other optical devices might be needed, such as nanoscale mirrors, micro-lenses, photonic switching elements and splitters/combiners. Some of these devices, such as laser sources, might need to be placed off-chip [23], [9]. This creates issues with manufacturing complexity (such as packaging and pin number requirements) and high coupling losses that might dominate the power consumption budget [9]. However, advances in More-than-Moore options, represented by silicon photonics devices, have almost eliminated the CMOS compatibility challenge.

IV. WIRELESS INTERCONNECTS (WiNoC)

RF-based interconnects such as wireless interconnects or wireless NoC (WiNoC) appear to be a cost-effective alternative compared to optical interconnects [15], [16], [10]. This is due to the fact that RF circuitry is compatible with CMOS technology and therefore less area- and power-hungry. Many studies have proposed WiNoC solutions as either supplementary [17], [18], [19] or possible replacement [48] interconnects for regular wire-based NoCs. This type of interconnect basically converts the electrical signal into an electromagnetic (EM) signal using an integrated transceiver and antenna. This EM signal propagates in one hop via free space to the surrounding nodes in the coverage area at nearly the speed of light. In terms of physical channel bandwidth, predictions show an increase in transistor switching speed as CMOS technology scales down. This would enable the use of higher carrier frequencies [49], [8], [50]. As a result, a wide spectrum of frequencies up to the terahertz (THz) range is possible, which is necessary to allow multi-channel realization on this shared medium [15]. Moreover, these high frequencies allow a smaller integrated antenna. Table II reviews examples from the literature of implemented integrated wireless communication systems. This wide range of studies shows the level of technology maturity of this type of interconnect.
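
The link between carrier frequency and antenna size can be illustrated with a rough sketch. It assumes a simple quarter-wavelength element, whose length is the wavelength in the surrounding medium divided by four. The relative permittivity parameter is a placeholder for the effective on-chip medium and is an assumption; real integrated antennas, such as zigzag designs, will deviate from this first-order estimate.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def quarter_wave_length_mm(carrier_hz, relative_permittivity=1.0):
    """Length of a quarter-wavelength element at the given carrier frequency, in mm."""
    wavelength_m = SPEED_OF_LIGHT_M_S / (carrier_hz * math.sqrt(relative_permittivity))
    return wavelength_m / 4.0 * 1e3


for carrier in (60e9, 300e9, 1e12):
    print(f"{carrier / 1e9:>6.0f} GHz: "
          f"{quarter_wave_length_mm(carrier):.3f} mm in free space")
```

In this model the element length is inversely proportional to the carrier frequency, which is why moving towards THz carriers shrinks the antenna footprint.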

A. Multicast Architectures

WiNoCs have a naturally scalable fanout capability, which makes them attractive for 1-to-M enabled interconnect architectures. As a result, many studies have suggested the WiNoC for CMPs with multicast requirements [16], [57], [58]. However, the WiNoC fanout capability depends on the antenna radiation pattern and the coverage distance, which is up to 23 mm [15]. This limit is due to the high power dissipation of the RF signal during free-space propagation, which leads to a low ratio of coverage distance to power. Therefore, the design of the transceiver power amplifier and the antenna should take into consideration the required distance and the directions of the destinations. For instance, some studies have proposed run-time tunable transmitting power based on the required destination [59].

In terms of connectivity, most researchers have proposed virtual 1-to-all connectivity for each RF-transmitter node in the wireless interconnect layer [48], [16]. This does not mean that all the nodes are able to communicate with all other nodes simultaneously. Rather, these RF-transmitter nodes compete for the shared medium. Therefore, contention is a main challenge for such architectures. The other type of multicast architecture depends on NoC clustering, where each cluster, either statically or dynamically, listens on a specific carrier frequency [58], [57], as shown in Fig. 4. This clustering should mitigate contention, but it increases reconfigurability and routing complexity.

B. Challenges

WiNoC technology is considered to be one of the most mature emerging interconnect types, since many implementations of WiNoC components such as integrated antennas and transceivers have been presented in the literature [60], [48], [53], [61]. However, WiNoC still faces some challenges. For instance, researchers are finding it difficult to design an antenna with wide frequency bandwidth, low power dissipation, large coverage area and small area overhead [15]. Firstly, the WiNoC channel bandwidth is limited by the antenna operational frequency ($F_c$, the central resonance frequency) and the 3 dB bandwidth ($B$). For example, the transmission gain (S21) of the 0.38 mm zigzag antenna shows a $B$ of around 15 GHz [62]. The antenna percentage bandwidth ($B_r$) is inversely proportional to the operational frequency, as shown by the equation:

$$B_r = \frac{F_2 - F_1}{F_c} \times 100\%$$

where $F_1$, $F_2$ are the starting and ending frequencies of the 3 dB bandwidth. For example, the zigzag antenna mentioned earlier has $B_r = 27\%$ [62], [15]. Thus, the WiNoC link might require a cluster of antennas with different central frequencies and design characteristics in order to collectively provide the required frequency range. Other solutions include the use of antennas with high operational frequencies, such as in the THz range, where they would consume less area and have wider frequency bandwidth [59]. However, these solutions waste a large part of the limited frequency spectrum, which is governed by the CMOS technology cut-off frequency. A further option is a time multiplexing approach [63], which obviously reduces the throughput of the channel. In terms of area overheads, integrated antennas are considered to be area-hungry passive components [15], [63]. However, antenna dimensions are inversely proportional to the operational frequency. Therefore, with the scaling down of technology and the realization of THz operation, the area overhead could be effectively reduced [15], [63], [60]. Other solutions for antennas include the use of carbon nanotubes [64] or planar graphene [16]. These techniques could improve power and area budgets and might allow, to some extent, a configurable resonance frequency. However, the implementation challenges of these technologies have yet to be addressed.
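
The percentage-bandwidth relation can be checked numerically with a brief sketch. It assumes that $F_c$ lies at the arithmetic midpoint of the 3 dB band, which the text does not state explicitly, and the function names are illustrative. With the reported $B$ of about 15 GHz and $B_r = 27\%$, the relation implies a centre frequency of roughly 55.6 GHz; this is a derived value, not one quoted from [62].

```python
def percentage_bandwidth(f_low_hz, f_high_hz):
    """Percentage bandwidth of a band, taking F_c as the midpoint of the band."""
    f_centre = (f_low_hz + f_high_hz) / 2.0
    return (f_high_hz - f_low_hz) / f_centre * 100.0


def centre_frequency_hz(bandwidth_hz, percentage):
    """Centre frequency implied by a 3 dB bandwidth and a percentage bandwidth."""
    return bandwidth_hz / (percentage / 100.0)


B_HZ = 15e9          # 3 dB bandwidth reported for the zigzag antenna
B_R_PERCENT = 27.0   # percentage bandwidth reported for the same antenna

f_c = centre_frequency_hz(B_HZ, B_R_PERCENT)
f_low, f_high = f_c - B_HZ / 2.0, f_c + B_HZ / 2.0

print(f"implied centre frequency: {f_c / 1e9:.1f} GHz")
print(f"round-trip percentage bandwidth: {percentage_bandwidth(f_low, f_high):.1f} %")
```

The same absolute bandwidth at a higher centre frequency gives a smaller percentage bandwidth, which is the inverse proportionality noted in the text.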

Other challenges facing WiNoC are related to channel reliability. Noise from nearby circuitry could be injected into the transceivers or the antenna [60]. However, previous studies show that the effective isotropic radiated power (EIRP) has almost negligible effects on adjacent circuits such as DRAMs [65] and analog-to-digital converters [66]. Moreover, many studies have addressed how to alleviate channel interference and error rates by adjusting the transmitter power, in other words by adjusting the signal-to-noise ratio (SNR) [67], [18]. In addition, the antenna is influenced by the chip packaging [60]. Therefore, these issues need to be carefully considered in transceiver and antenna designs.

V. TRANSMISSION LINES (RF-I)

The other alternative to electromagnetic free space signal propagation is waveguided propagation via transmission lines (TLs), which is known as RF-I [49], [12], [13], [11], [14]. These types of interconnects are similar to the WiNoC in terms of CMOS compatibility, signal velocity close to the speed of light, low global communication energy and high throughput compared to regular wires. As a result, many studies propose RF-I as a supplementary interconnect for the metal wire [49], [11]. Moreover, some studies have even discussed the possibility of replacing metal wire with RF-I [12]. These studies utilize the RF-I either as a special-purpose interconnect [12] or as general-purpose express links [49], [11]. In terms of RF-I maturity, on-chip RF-I implementations have been demonstrated in many studies [67], [14], [68], [69]. Table III presents key features of some recent on-chip implementations of RF-I from the literature. Moreover, high-end chips that utilize global transmission lines for clock distribution already exist [70].

TABLE III: Examples of reported implementations of integrated transmission lines along with their key features for a single link.

<table>
<thead>
<tr>
<th>Reference</th>
<th>Technology node</th>
<th>Gb/s</th>
<th>pJ/bit</th>
<th>TL</th>
</tr>
</thead>
<tbody>
<tr>
<td>Chang et al. [67]</td>
<td>180nm</td>
<td>4-20 (predicted)</td>
<td>CPW</td>
</tr>
<tr>
<td>Chang et al. [69]</td>
<td>90nm</td>
<td>5</td>
<td>CPW</td>
</tr>
<tr>
<td>Hsu et al. [14]</td>
<td>90nm</td>
<td>-</td>
<td>modified CPW</td>
</tr>
<tr>
<td>Ito et al. [68]</td>
<td>90nm</td>
<td>8</td>
<td>0.3-0.9 CPS</td>
</tr>
</tbody>
</table>

RF-Is require an integrated transceiver, similar to WiNoC, to convert the electrical signal into an RF signal. However, instead of an antenna, the RF-I uses on-chip transmission lines as waveguides to propagate the signal. Consequently, the RF-I signal suffers less power dissipation, so less power consumption is required. There are three main types of on-chip TLs [12], [71]: the microstrip line (MSL), the coplanar waveguide (CPW), and the differential line or coplanar strips (CPS), as shown in Fig. 5. The MSL is known for its simplicity compared to the CPS and CPW, while the latter two show better robustness against crosstalk, especially at mm-wave frequencies [12]. Moreover, the CPS is known for its higher interconnect density compared to the CPW [12].

RF-I inherits the same limitation as the WiNoC in terms of the cut-off frequency of the CMOS technology. However, designers have the option of providing more than one shared medium by adding more TLs. This would increase the aggregate data bandwidth [12], [49]. Moreover, unlike the WiNoC, the frequency spectrum of RF-I is not limited by the resonance frequency of the antenna and \( B_r \).

A. Multicast Architectures

Although the RF-I has a low ratio of power dissipation to signal propagation distance, RF-I-based multicast architectures face several challenges. For instance, forking in an RF-I tree topology requires stubs, which introduce an impedance discontinuity. Therefore, a careful matching circuit design is required at the end of each stub [28]. This increases design complexity, especially if the stub lengths and the distribution of forking points are non-uniform. Therefore, to avoid using a tree of TLs, many designs have proposed a worm or cycle layout of these thick wires that passes through all the nodes, as shown in Fig. 6.
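
The effect of an unmatched fork can be quantified with the standard reflection coefficient of a transmission line, $\Gamma = (Z_L - Z_0)/(Z_L + Z_0)$. The sketch below assumes lossless lines, a purely resistive characteristic impedance and a fork into two identical, matched branches, so that the incoming line sees the two branches in parallel. The 50 Ω value is a hypothetical example and is not taken from the cited designs.

```python
def parallel(*impedances_ohm):
    """Equivalent impedance of several impedances connected in parallel."""
    return 1.0 / sum(1.0 / z for z in impedances_ohm)


def reflection_coefficient(z_load_ohm, z0_ohm):
    """Voltage reflection coefficient seen by a line of impedance z0 into z_load."""
    return (z_load_ohm - z0_ohm) / (z_load_ohm + z0_ohm)


Z0 = 50.0                                   # hypothetical characteristic impedance
junction = parallel(Z0, Z0)                 # two identical branches at the fork
gamma = reflection_coefficient(junction, Z0)
reflected_fraction = abs(gamma) ** 2        # share of incident power reflected
per_branch_fraction = (1.0 - reflected_fraction) / 2.0

print(f"load seen at the fork: {junction:.1f} ohm")
print(f"reflection coefficient: {gamma:.3f}")
print(f"power reflected: {reflected_fraction:.3f}, "
      f"delivered per branch: {per_branch_fraction:.3f}")
```

In this idealised case one ninth of the incident power is reflected at every unmatched fork, which illustrates why matching circuits are needed and why reflections accumulate as the number of forks grows.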

Fig. 5: Structure of the main three types of the transmission lines: (a) microstrip line (MSL), (b) differential line or coplanar strips (CPS), and (c) coplanar waveguide (CPW).

Fig. 6: Examples of some RF-I multicast architectures [11], [12].

This layout involves another set of challenges, such as nontrivial area overheads, signal decay and signal latency. Firstly, in terms of area overhead, signal distribution in RF-I is limited to the nodes that the transmission lines pass by. As a result, the worm or cycle layout of these thick wires should go through almost every tile in the chip [49], [11], [12]. This might add nontrivial area overheads and on-chip routing issues, because the pitch of the TLs (width and spacing) is relatively large. Secondly, this layout might mitigate but not eliminate the impedance discontinuity. Therefore, multicasting the signal to many destinations is not scalable because, with each drop point, the signal decay, latency and signal reflections increase unless careful matching circuits are designed [14], [28].

The second main challenge concerns the area overhead and interconnect density. These TLs are fabricated using the upper layer of CMOS metal wires because of the thickness required. These large-dimension wires have low resistance. However, they have large capacitance and therefore require a wider inter-metal dielectric to control parasitic effects [13]. Moreover, some studies propose inserting a metal pattern underneath the transmission lines in a multi-layer design to reduce parasitic effects and cross-talk [14]. These costly wires might need to span the whole chip in worm or cycle layouts, as mentioned earlier. Thus, TLs require significant performance improvements to justify this cost.

The third main challenge is the limited number of drop points, which raises the question of scalability in many-core processors with thousands of cores [15], [57]. These drop points are necessary to make good use of these costly wires by carrying multiple frequency channels rather than using many segments of these costly wires [12], [49], in addition to providing the fanout feature mentioned earlier. Therefore, many researchers have tried to mitigate the multi-drop scalability and TL discontinuity issues [12], [49], [68].

VI. SURFACE WAVE INTERCONNECTS (SWI)

The surface wave (SW), or Zenneck surface wave, is a heterogeneous electromagnetic (EM) wave supported by a metal-dielectric surface. The designed surface is a waveguide that traps the EM signal in a two-dimensional medium instead of three-dimensional free space. As a result, the E-field decay rate in the SWI from the source horizontally along the boundary is around \((1/\sqrt{d})\), as shown in Fig. 7, where \(d\) is the distance from the source [72]. This feature allows the SWI to offer relatively linear J/bit over this short distance, compared to the steep scaling of regular global buffered wire interconnects. The surface should be engineered by altering its dimensions and choosing the materials of the conductor and/or dielectric so that the characteristic impedance \((Z_0)\) is around \((10 + j300)\ \Omega\). Thus, the surface medium can consist of either a dielectric-coated conductor layer or a corrugated conductor surface [73], [72].
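
The difference between the two decay laws can be illustrated with a small sketch. It assumes ideal power-law field decay, \(1/\sqrt{d}\) along the surface for the SWI and \(1/d\) for free-space propagation as summarized in Table IV, and it ignores material losses, launching efficiency and antenna patterns. The 1 mm reference distance is a hypothetical normalization point, and the sample distances are the coverage figures quoted in this survey for WiNoC (23 mm) and SWI (10 cm).

```python
import math


def field_decay_db(distance_mm, reference_mm, exponent):
    """Field attenuation in dB relative to the reference, for E proportional to d**-exponent."""
    return 20.0 * exponent * math.log10(distance_mm / reference_mm)


REFERENCE_MM = 1.0   # hypothetical normalization distance

for distance in (23.0, 100.0):
    surface = field_decay_db(distance, REFERENCE_MM, 0.5)
    free_space = field_decay_db(distance, REFERENCE_MM, 1.0)
    print(f"{distance:>5.0f} mm: surface wave {surface:.1f} dB, "
          f"free space {free_space:.1f} dB")
```

Under these assumptions the surface wave loses half as many decibels as free-space propagation over any given distance, which is consistent with the larger coverage reported for the SWI.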

On the other hand, maximum transmission into the SW occurs when the incoming wave is incident at or close to the Brewster angle, where reflections are minimized. Therefore, a transducer linked to the transceiver needs to be integrated to launch the signal into the surface [73]. For omni-directional transmission, this can be as simple as a coaxial-to-waveguide flange [72]. It could also be a dipole or monopole for omni-directional communication, with a parallel plate waveguide [74]. In the 3D EM simulation model shown in Fig. 7, an inverted quarter-wavelength monopole was used in experiments and simulation [73]. The transducer layer can be fabricated separately and then connected to the integrated transceiver using flip-chip bonding and through-silicon vias (TSVs). Recently, a laboratory demonstration has been presented in which data were transferred using two coaxial waveguide transducers and a purpose-designed corrugated aluminium sheet as the wave-supporting surface [73].

A. Multicast Architectures

The SWI offers natural and efficient fanout. For instance, the E-field decay rate in the SW from the source horizontally along the boundary should be around \((1/\sqrt{d})\), as mentioned earlier. On the other hand, vertically, the decay is exponential away from the boundary. This allows less power dissipation for far larger coverage areas than the regular WiNoC, since the signal propagates up to 10 cm in the SWI [25] compared with 23 mm in the WiNoC [15], [63]. This is because RF wireless signals are dissipated via the antennas and free space. However, both WiNoC and SWI signals are transmitted in all directions (over the surface for the SWI) at a speed close to the speed of light, if we assume that the WiNoC antenna radiation pattern is circular \((360^\circ)\). Thus, the SWI can fan out the signal across the chip in one clock cycle with competitive levels of power consumption and circuit complexity compared to other emerging interconnects [25]. As a result, some recent studies have proposed the SWI for NoC-based CMP multicast architectures [26], [27], [75], [76], as shown in Fig. 8.
TABLE IV: Summary comparison of key features for current and emerging on-chip interconnects.

<table>
<thead>
<tr>
<th>Features</th>
<th>Metal wire [8], [29]</th>
<th>Transmission lines (RF-I) [49], [12]</th>
<th>Wireless interconnect (WiNoC) [15], [17], [8]</th>
<th>Optical interconnect [21], [20], [45], [8]</th>
<th>Surface wave interconnect (SWI) [25], [27], [26]</th>
</tr>
</thead>
<tbody>
<tr>
<td>Power</td>
<td>Dynamic power that is proportional to the wire capacitance and voltage.</td>
<td></td>
<td>High free space power dissipation.</td>
<td>High power consumption.</td>
<td>Power consumption is relatively tolerable.</td>
</tr>
<tr>
<td>Signal Decay</td>
<td>Limited by latency, which increases exponentially without repeaters.</td>
<td>Low signal decay and dissipation.</td>
<td>High decay, inversely proportional to distance.</td>
<td>Very low signal decay and dissipation inversely proportional to square root of distance.</td>
<td></td>
</tr>
<tr>
<td>Reliability</td>
<td>Possible cross-talk exists.</td>
<td>Cross-talk exists (capacitive and inductive coupling).</td>
<td>Noise coupling to the antenna and possibility of multi-path interference.</td>
<td>High signal integrity.</td>
<td>Less subject to noise coupling.</td>
</tr>
<tr>
<td>Fan-out</td>
<td>Needs extra power for a multi-drop bus (stubs), which also lowers propagation velocity.</td>
<td>Stubs cause impedance discontinuity, which leads to signal reflection.</td>
<td>Limited only by the coverage area of the transmitted signal.</td>
<td>Requires optical splitters and combiners that attenuate the optical signal (3 dB per splitter).</td>
<td>Limited only by the coverage area of the transmitted signal.</td>
</tr>
<tr>
<td>Bandwidth</td>
<td>Limited by interconnect delay; thus, bit rate is dependent on distance.</td>
<td>Limited by the transistor cut-off frequency of the process technology; the achievable rate is currently 100 to 200 Gbps.</td>
<td>Limited by the transistor cut-off frequency of the process technology; the achievable rate is currently 100 to 200 Gbps.</td>
<td>Very large bandwidth with multi-wavelength capability up to 500 Gbps.</td>
<td>Limited by the transistor cut-off frequency of the process technology; the achievable rate is currently 100 to 200 Gbps.</td>
</tr>
<tr>
<td>Complexity</td>
<td>Needs repeaters for cross-chip communication, which consume transistors and vias and restrict floor planning. However, it is still the cheapest and simplest interconnect.</td>
<td>Medium complexity; requires: (1) integrated transceiver, (2) wide, thick wires and spacing (12-45 µm), (3) possibly shielding wires and planes to overcome coupling, (4) matching circuits in case of a forking path.</td>
<td>Medium complexity; requires: (1) integrated transceiver, (2) integrated antenna or cluster of antennae, depending on the required bandwidth and the operational frequency.</td>
<td>High complexity, and some devices are not CMOS compatible; requires: (1) laser source, (2) photo detectors, (3) modulators and filters, (4) waveguide, (5) laser-waveguide couplers in case of an off-die laser source, (6) nanoscale mirrors, (7) splitters/combiners.</td>
<td>Medium complexity; requires: (1) integrated transceiver, (2) integrated designed surface, (3) integrated transducer.</td>
</tr>
</tbody>
</table>
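
To illustrate the optical fan-out entry in Table IV, the following sketch estimates the splitting loss accumulated when one optical source is fanned out to several receivers. It assumes a balanced binary tree of 1x2 splitters, each costing the 3 dB quoted in the table, and it ignores waveguide propagation, coupling, and insertion losses. The tree topology and all names in the code are assumptions for illustration, not a description of any specific ONoC design.

```python
SPLITTER_LOSS_DB = 3.0  # per-splitter loss quoted in Table IV


def splitter_stages(destinations: int) -> int:
    """Stages of 1x2 splitters needed to reach `destinations` receivers.

    Assumes a balanced binary splitter tree (hypothetical topology).
    Equivalent to ceil(log2(destinations)), computed with integers only.
    """
    if destinations < 1:
        raise ValueError("at least one destination is required")
    return (destinations - 1).bit_length()


def fanout_loss_db(destinations: int,
                   loss_per_splitter_db: float = SPLITTER_LOSS_DB) -> float:
    """Splitting loss in dB seen by each receiver at the leaves of the tree."""
    return splitter_stages(destinations) * loss_per_splitter_db


def remaining_power_fraction(destinations: int) -> float:
    """Fraction of the launched optical power arriving at each receiver."""
    return 10.0 ** (-fanout_loss_db(destinations) / 10.0)


if __name__ == "__main__":
    for n in (2, 16, 64, 256):
        print(f"{n:4d} receivers: {fanout_loss_db(n):5.1f} dB loss, "
              f"{remaining_power_fraction(n):.4f} of launched power each")
```

Under these assumptions, the loss grows with the logarithm of the number of receivers, so a broadcast to 64 cores already costs 18 dB per receiver before any other loss is counted. The WiNoC and SWI entries in the same row have no such per-destination component.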

B. Challenges

The SWI is considered to be one of the newest emerging interconnects. Therefore, realizing the potential of this technology requires research to tackle a set of design and implementation challenges at different levels before it can be utilized in future NoCs. Firstly, in terms of component integration, the realization of the SWI requires 3D integration techniques, such as TSV and flip-chip bonding, to link the transceiver to the transducer. These 3D integration techniques are an active research area and face a number of manufacturing and factory integration challenges, such as advanced process control requirements, wafer thinning, achieving low TSV capacitance, and design challenges [8]. However, great progress is being achieved in these areas, and a number of solutions can be offered for each problem [8].

Secondly, in terms of communication and RF engineering, careful consideration is required when designing the integration of the transceiver, the surface, and the transducer. Otherwise, the SWI may pick up noise signals from nearby integrated devices such as power distribution networks, processing elements, and other interconnect components. This interference could affect either the transceiver or the waveguide surface. The impact on the transceiver can be addressed using techniques similar to those in WiNoC, which were mentioned earlier. However, the signal-to-noise ratio (SNR) requirement is less demanding because the SWI has less signal power dissipation. In terms of interference affecting the designed surface, two points make significant interference unlikely. The first is the spacing and isolation between the surface and the integrated circuits. The second is that any RF signal is reflected unless it is incident at or close to the Brewster angle [25], [73].

VII. COMPARATIVE SUMMARY

Table IV presents a summary comparison of key features that will be crucial in future interconnect architectures. Power consumption is the main limitation for future interconnects, especially given projections showing that interconnect fabrics might consume a non-trivial percentage of total chip power [31]. As shown in Table IV, RF-based interconnects that use waveguides have relatively low power consumption since they neither require power-hungry devices nor involve high power dissipation. In terms of signal decay and reliability, the signal integrity of optical interconnects is superior to that of other interconnects. The second best after ONoC in terms of reliability is the SWI. This is because, unlike the RF-I, the designed surface waveguide is almost immune to interference from nearby circuitry. On the other hand, the WiNoC and SWI show remarkable natural fanout features compared to other emerging interconnects. As mentioned earlier, this feature is crucial for scalable architectures in future many-core processors, especially since the PIR, size, and hotspot-creating capability of 1-to-M and 1-to-all traffic could increase as the number of cores increases.

With the projected scaling in the number of CMP cores and the volume of their communication, interconnect bandwidth is considered one of the main requirements of future many-core processors. All RF-based interconnects are limited by the cut-off frequency of CMOS technology. However, the cut-off frequency will continue scaling with technology. On the other hand, as mentioned earlier, the antenna operational frequency and relative bandwidth further limit the data bandwidth of WiNoC channels. For instance, the 0.38 mm zigzag antenna has a transmission gain (S21) that determines B to be around 15 GHz [15], as shown in Fig. 9b. In contrast, Fig. 9a shows the SWI transmission gain (S21) with a much wider frequency spectrum [73], [72]. Optical interconnects, in turn, surpass other emerging interconnects in terms of aggregated bandwidth. However, their complexity, due to the many non-CMOS-compatible and/or expensive devices, makes them a costly solution, as shown in Table IV, unlike RF-based emerging interconnects such as WiNoC, RF-I, and SWI.

VIII. CONCLUSION

This paper has presented a set of radical solutions in terms of on-chip interconnects to meet future demands. These interconnect fabrics have been discussed in terms of future on-chip interconnect requirements at a system level of abstraction, such as bandwidth, reliability, power consumption, and fanout. The latter feature has been the main focus, since providing multicast communication is one of the crucial demands on interconnect fabrics for future many-core systems. Based on this comprehensive review, it is concluded that the RF-based interconnects proposed so far, such as the WiNoC, RF-I and SWI, might be cost-effective solutions for the near future compared to optical interconnects. Moreover, although all RF-based types seem very promising, the WiNoC and SWI seem to have more potential for multicast architectures due to their merits in terms of fanout. In addition, the SWI is superior to WiNoC in terms of power dissipation and offers a wider frequency spectrum, whereas WiNoC technology is more mature than the newer SWI. As a result, further research is required to harvest the potential and overcome the challenges of all of these emerging interconnects as we enter the many-core era.

ACKNOWLEDGEMENT

This work was partly supported by EPSRC, Programme Grant PRiME (EP/K034448/1). Moreover, the first author would like to thank the HCED in Iraq for financing his Ph.D.

REFERENCES


**Kin-Fai Tong** received the BEng(Hons) and PhD degrees in Electronic Engineering from the City University of Hong Kong. He worked as an Expert researcher in the National Institute of Information and Communications Technology (NICT), Japan, where his main research focused on photonic-integrated millimetre-wave planar antennas for Gbits wireless communication systems. Dr Tong is now a senior lecturer at the Department of Electronic and Electrical Engineering, University College London (UCL). As early as 1994, he was credited as one of the first to introduce the idea of integrating microstrip patch antennas into mobile phones. Moreover, he pioneered the development of Finite Difference Time Domain (FDTD) models for investigating the ultra-wideband behaviour of U-slot microstrip patch antennas. These works have been cited more than 800 times by peer researchers. Dr Tong has been a TPC member, session organiser and chairman of many international antennas and microwaves conferences. He has co-authored two book chapters on planar antenna designs and is author or co-author of over 90 publications.

**Professor Alex Yakovlev** DSc, FIET, SMIEEE (AY, UoN) founded and leads the MicroSystems Research Group, and co-founded the Asynchronous Systems Laboratory at Newcastle University. He was awarded an EPSRC Dream Fellowship in 2011-13. He has published 8 edited and co-authored monographs and more than 300 papers in academic journals and conferences, most of which are in the area of concurrent and asynchronous systems. He has chaired program committees of several international conferences in this area, including the IEEE Int. Symposium on Asynchronous Circuits and Systems (ASYNC), Petri nets (ICATPN), Applications of Concurrency to Systems Design (ACSD), and he has been Chairman of the Steering committee of the Conference on Application of Concurrency to System Design since 2001. He has been principal investigator on more than 25 research grants and supervised 40 PhD students.