Fig. 1. Hardware architecture trend in I/O subsystems.



A clockless crossbar switch helps improve embedded system designs

There is a rich and long-standing debate in the literature over the advantages and disadvantages of asynchronous logic. Asynchronous merely means not synchronous, and there are many different types of asynchronous (or clockless) design styles.

Fulcrum identified the I/O subsystems of networking equipment as a good technology and market fit for a blended synchronous-asynchronous IC. The functions in networking subsystems can be sufficiently complex that they involve many chips or even many circuit boards working together. It is often impractical to make these whole systems synchronous.

Fig. 1 shows a hardware architecture trend in I/O subsystems of networking equipment. Up until the Gigabit generation of networking equipment, it was common to build I/O subsystems based on bus architectures. The data rates of these systems are limited by electrical load from bus fan-out and bus-clock distribution to multiple chips on the board.

As market demand drove data rates higher, higher speed point-to-point protocols emerged, first based on LVDS and then eventually on SERDES technology, and the bus architectures became less common. The connectivity of devices on the board has become more limited. Currently, it is most common for new networking ICs to have a single, or at most two, dual-simplex interfaces. This limits the connectivity to a cascade of devices.

The emergence of standard products instead of ASICs to meet advanced networking functions, such as framers, network processors (NPUs), embedded processors, 'look-aside' devices, and traffic managers, affects a designer's interconnect choice. When a system architecture must use standard devices, the device count is often higher and the traffic flows through the devices are more complex.

In synchronous systems, a significant percentage of the clock period is dedicated to margin during which there is no useful logical computation. Examples are clock skew and jitter margin, manufacturing margin, and design margin. Design margin can be the largest; it is the difference between the worst-case paths in the clock domain, which define the clock period, and the average-case paths.

In contrast, delay-insensitive asynchronous systems do not require the logic to have settled by a certain time, so no margin is needed in the cycle time and the system runs at average-case performance. In 0.13µm and smaller technology nodes, the average-case performance is significantly better than the worst-case performance.
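The margin argument can be put into numbers with a short sketch. The path delays and the skew and jitter margin below are hypothetical values chosen only for illustration, and the model ignores handshake overhead in the asynchronous case; it is not a measurement of any real circuit.

```python
# Hypothetical path delays (ns) within one clock domain; illustrative only.
path_delays_ns = [0.8, 1.0, 1.1, 1.3, 2.0]
# Assumed fixed allowance for clock skew and jitter (not from the article).
skew_jitter_margin_ns = 0.2


def synchronous_period(delays, margin):
    """The clock period must cover the slowest path plus fixed margin."""
    return max(delays) + margin


def average_case_cycle(delays):
    """A delay-insensitive stage completes when its own logic settles,
    so over many operations the cycle time tends to the average delay."""
    return sum(delays) / len(delays)


sync_ns = synchronous_period(path_delays_ns, skew_jitter_margin_ns)
async_ns = average_case_cycle(path_delays_ns)
# Design margin as defined above: worst-case path minus average-case path.
design_margin_ns = max(path_delays_ns) - async_ns

print(f"synchronous period : {sync_ns:.2f} ns")
print(f"average-case cycle : {async_ns:.2f} ns")
print(f"design margin      : {design_margin_ns:.2f} ns")
```

With these made-up numbers the single slow path sets the synchronous period, while the average-case cycle is governed by the typical paths.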

Fulcrum's clockless technology is a form of quasi-delay insensitive design that relies on four-phase handshake signalling. The handshake runs over an asynchronous channel consisting of dual-rail data wires and an acknowledge wire.
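As a rough behavioural illustration of such a channel, the sketch below models one transfer as four steps: data becomes valid, acknowledge rises, data returns to neutral, acknowledge falls. The rail ordering, the function names and the sequential software model are assumptions made for this example; they do not describe Fulcrum's circuits.

```python
NEUTRAL = (0, 0)  # both rails low: no data on the wire pair


def encode_bit(bit):
    """Dual-rail code as (true_rail, false_rail); one rail high when valid.
    The rail ordering is an assumption of this sketch."""
    return (1, 0) if bit else (0, 1)


def decode_bit(rails):
    """Recover the bit from a valid rail pair."""
    if rails == (1, 0):
        return 1
    if rails == (0, 1):
        return 0
    raise ValueError("rails are neutral or invalid")


def is_valid(word):
    """True when every rail pair carries a data value."""
    return all(rails in ((1, 0), (0, 1)) for rails in word)


def is_neutral(word):
    """True when every rail pair has returned to the neutral state."""
    return all(rails == NEUTRAL for rails in word)


def transfer(bits):
    """Model one four-phase handshake; return (received_bits, trace)."""
    trace = []
    ack = 0
    # Phase 1: the sender drives valid dual-rail data.
    data = [encode_bit(b) for b in bits]
    trace.append("data valid")
    # Phase 2: the receiver sees valid data, captures it, raises acknowledge.
    assert is_valid(data) and ack == 0
    received = [decode_bit(rails) for rails in data]
    ack = 1
    trace.append("ack high")
    # Phase 3: the sender sees acknowledge and returns the data to neutral.
    data = [NEUTRAL] * len(bits)
    trace.append("data neutral")
    # Phase 4: the receiver sees neutral data and lowers acknowledge.
    assert is_neutral(data) and ack == 1
    ack = 0
    trace.append("ack low")
    return received, trace


received, trace = transfer([1, 0, 1, 1])
print(received)
print(trace)
```

Because validity is carried by the data wires themselves, the receiver needs no clock edge to know when the word has arrived.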

Network switching

Fulcrum's PivotPoint chip is a blended synchronous-asynchronous system-on-a-chip with both synchronous and asynchronous methodologies used in the chip, depending on the region of the circuit. This is the first generally available commercial product in which the majority of the circuitry is high-performance asynchronous logic. The PivotPoint chip switches the SPI-4.2 protocol for 10Gbit/s networking applications. The protocol interface logic for SPI-4.2, JTAG, and a generic CPU bus are synchronous, while the internal switching element and channelised FIFO elements are asynchronous.

To serve as a blade-level interconnect, PivotPoint was designed to meet a number of requirements:

No head-of-line (HOL) blocking. Blocking occurs when one packet cannot make progress and it blocks unrelated packets behind it.

Channelised interfaces. Virtual channels, sometimes called ports, on the interface are used to transport separated traffic flows. Virtual channels have distinct hardware resources, buffers and flow control mechanisms, but share a common physical interface. A channelised device is said to be free of HOL blocking if a blocked packet only disrupts packets behind it on the same channel.

Channel mapping. Any channel on any interface may be mapped to any channel on any other interface. This allows complex flows through multiple devices: packets may skip devices and packets may flow back to devices that they have already visited.

Low-latency cut-through switching. Introducing a switch device into an architecture will increase the latency on all flows that pass through the switch. A switch that disregards the latency budget may break the architecture. Cut-through switching is where a packet may leave the switching device before the whole packet has entered the switch. Low latency requirements translate into cut-through switching requirements for large packets.

Separate clock domains. Point-to-point protocols such as SPI-4.2 are only synchronous from one chip to another. It is often not feasible to use the same clock reference across multiple point-to-point links, leaving the links in different phase domains.
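The channel mapping and HOL-free requirements can be illustrated with a small software model. The class name, the method names and the idea of one queue per egress channel are assumptions made for this sketch; the article does not describe PivotPoint's internal data structures at this level.

```python
from collections import deque


class ChannelisedSwitch:
    """Toy model: one queue per egress channel and a configurable channel map."""

    def __init__(self, channel_map):
        # channel_map: (in_port, in_channel) -> (out_port, out_channel)
        self.channel_map = dict(channel_map)
        self.queues = {dst: deque() for dst in self.channel_map.values()}
        self.blocked = set()  # egress channels currently flow-controlled

    def enqueue(self, in_port, in_channel, packet):
        """Place a packet on the queue of its mapped egress channel."""
        dst = self.channel_map[(in_port, in_channel)]
        self.queues[dst].append(packet)

    def set_blocked(self, out_port, out_channel, blocked):
        """Apply or release flow control on one egress channel."""
        if blocked:
            self.blocked.add((out_port, out_channel))
        else:
            self.blocked.discard((out_port, out_channel))

    def drain(self):
        """Send at most one packet from every unblocked egress channel."""
        sent = []
        for dst, queue in self.queues.items():
            if dst not in self.blocked and queue:
                sent.append((dst, queue.popleft()))
        return sent


# Two channels arrive on the same physical interface (port 0).
switch = ChannelisedSwitch({(0, 0): (1, 3), (0, 1): (2, 0)})
switch.enqueue(0, 0, "A")           # arrives first, destined for a blocked channel
switch.enqueue(0, 1, "B")           # arrives second on a different channel
switch.set_blocked(1, 3, True)
while_blocked = switch.drain()      # "B" still makes progress
switch.set_blocked(1, 3, False)
after_release = switch.drain()      # "A" leaves once flow control is released
print(while_blocked, after_release)
```

Packet "B" is not held up by packet "A" even though both arrived on the same physical interface, which is the property the requirements describe: a blocked packet only disrupts packets on its own channel.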

The architecture of PivotPoint is partitioned into three structures: a system-on-a-chip (SoC) interconnect, a SPI-4 protocol block, and a FIFO and transaction manager. Fulcrum built the SoC interconnect in a previous project, and the SPI-4.2 protocol block is an implementation of the commercial standard in which the SPI-4 queues are located outside of the block.

The FIFO manager provides the queuing and bridges the SPI-4 protocol block and the SoC crossbar, and adds the necessary logic to meet the features of PivotPoint, such as SPI-4.2 flow-controlled queues, port mapping, non-HOL blocking, and flow-control linking between the chips' ingress and egress SPI-4.2 domains. The architecture is partitioned to take advantage of both synchronous and asynchronous design styles.

From a protocol perspective, SoC interconnects are generally difficult to clock-gate, so Fulcrum developed a low-level SoC interconnect to handle this, called Nexus. The interconnect was made generic in anticipation that it would be used in simple systems, or as a low-level building block for more complex systems, in which additional logic is added in later design phases.

At the lowest protocol levels, SPI-4 is fully synchronous: transmission occurs at a fixed frequency, data and status channels are at locked multiples, transmission is uninterruptible within a segment, and there is a well-defined, deterministic protocol for how bits on one side of the interface translate into bits on the other at each clock. For these reasons, the SPI-4 interface logic for PivotPoint was built in a standard synchronous ASIC flow.

We decided to implement the FIFO managers with asynchronous logic. The rationale is similar to that of the Nexus. The system environment fundamentally imposes non-determinism on the switch, and that non-determinism is exposed at the crossbar. In fact, if the FIFO managers were synchronous, many of the advantages of implementing the Nexus with asynchronous logic would be lost, since low latency and high-speed circuitry need to extend into the FIFOs.

PivotPoint uses asynchronous SRAM. This is a controversial design for a small fabless semiconductor company, given the analogue challenges of designing memory and the prevalence of vendor-provided memory IP in the industry. SRAM must be dense, and a constant concern of asynchronous design is area overhead.

However, it was argued above that the high-speed, low-latency circuitry must not just be implemented in the crossbar, but extend all the way into the memories, to establish the needed overspeed of the FIFO managers. The memory uses the foundry-developed SRAM state bit, enjoying foundry-optimised area and yield.

Fig. 2 shows the PivotPoint architecture. The chip consists of a datapath and a control path. The datapath has six SPI-4 interfaces, six FIFO managers, and the Nexus crossbar interconnect. Routing decisions in the crossbar are made through arbitration of requests that come from the datapath SPI-4 units. The control path has a generic CPU bus interface, the JTAG interface, and a control distribution network, organised with a packetised, serial protocol.
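To make the idea of request arbitration concrete, the sketch below resolves competing requests for the same crossbar output. The article does not state which arbitration policy the Nexus uses; round-robin is chosen here purely as a familiar example, and all names are hypothetical.

```python
NUM_PORTS = 6  # matches the six SPI-4 interfaces of the datapath


class RoundRobinArbiter:
    """One arbiter per output port; the policy is an assumption of this sketch."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.last = num_inputs - 1  # so that input 0 has first priority

    def grant(self, requesters):
        """Grant the first requester after the last winner, or None."""
        for offset in range(1, self.num_inputs + 1):
            candidate = (self.last + offset) % self.num_inputs
            if candidate in requesters:
                self.last = candidate
                return candidate
        return None


def arbitrate(arbiters, requests):
    """requests maps input port -> requested output port.
    Returns a dict mapping each requested output to the granted input."""
    by_output = {}
    for src, dst in requests.items():
        by_output.setdefault(dst, set()).add(src)
    return {dst: arbiters[dst].grant(srcs) for dst, srcs in by_output.items()}


arbiters = [RoundRobinArbiter(NUM_PORTS) for _ in range(NUM_PORTS)]
# Inputs 0 and 1 both want output 3; input 2 wants output 5.
first = arbitrate(arbiters, {0: 3, 1: 3, 2: 5})
# The same contention in the next round: the other input now wins output 3.
second = arbitrate(arbiters, {0: 3, 1: 3})
print(first, second)
```

Requests for different outputs are granted independently, so only inputs contending for the same output wait on each other.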

The control path and the datapath connect only through control status registers during operation, and through the scan points during test and debug. The control path configures the channel map, but is not involved in the routing decisions of SPI-4 segments.

Though the circuit has 14 separate clock domains, it is largely asynchronous. The chip (Fig. 3) has about 32 million transistors, and 83 per cent of them are in asynchronous logic domains. This includes the Nexus and high-performance memory. It has 192KB of memory. The synchronous interfaces receive data at up to 450MHz, twice the rate of the internal synchronous datapaths, whose logic has been synthesised to 225MHz. The asynchronous circuitry operates above 600MHz in the SRAMs and above 1GHz in the crossbar in a standard 0.13µm process at 1.0V of operation.

Uri Cummings is Co-Founder & Vice-President of Product Development, Fulcrum Microsystems, Burbank, Ca, USA. www.fulcrummicro.com