# Achieving Less Than 2% 3-σ Mismatch with Minimum Channel-Length CMOS Devices

Vishal Gupta, *Student Member, IEEE*, and Gabriel A. Rincón-Mora, *Senior Member, IEEE*

Georgia Tech Analog and Power IC Design Lab

Abstract-Offset and speed are critical yet conflicting design parameters in high-speed amplifiers and comparators, especially those used to process the characteristically high-frequency, low-amplitude signals of today's wireless transceiver systems. As device area is decreased to reduce parasitic capacitances and hence achieve higher bandwidth, random mismatches inherently increase. The proposed Survivor strategy circumvents this trade-off by fabricating a number of small-geometry device pairs on-chip (each of which has high bandwidth), having the IC self-select the best-matched pair during start-up and/or power-on-reset events, and using that pair in critical portions of the circuit. In experiments conducted on a prototype fabricated in a 0.6µm CMOS technology, a mirror using the best-matched, minimum channel-length pair chosen from a bank of 32 pairs (6µm/0.6µm) had a 3-σ offset (1.94%) similar to that of a mirror using 48µm/4.8µm devices (1.91%), while offering a bandwidth 64 times higher ($BW_{6/0.6} \approx 64\,BW_{48/4.8}$).

Index Terms-offset, mismatch, calibration, trimming.

## I. INTRODUCTION

Maintaining system bandwidth while obtaining low offsets is a major design challenge in high-speed electronics, where critical differential pairs and current mirrors are designed with large-geometry devices to mitigate the effects of random process- and package-induced mismatches [1-2]. Unfortunately, the parasitic poles introduced by these large-area components impose low-bandwidth and long-settling-time limits on the system [3-5]. Other than increasing device size, trimming and dynamic switching are viable but often less appealing options.

Trimming with fuses, lasers, and floating-gate devices [6-8] is typical in precision analog electronics. Trimming is performed either before packaging, which is more cost-effective, or after, which also addresses package-shift effects. Although its effectiveness cannot be denied, the increase in manufacturing time and equipment costs (e.g., laser trimmers) is often prohibitive for low-cost solutions [6, 9].

The popularity of switching techniques like dynamic element matching (DEM) is growing because of their effectiveness in the absence of trimming. In these schemes, mismatch effects are reduced by periodically interchanging the electrical positions of critical devices (with a clock and a simple switching network), thereby averaging their mismatch effects equally over the relevant circuit nodes [9-12]. The associated switching noise, however, is a significant drawback: the fluctuations induced by periodically switching devices and interchanging offsets can severely degrade system performance. As a result, these strategies require large capacitors, which not only limit bandwidth but also impose severe silicon real-estate demands on the system.

G. A. Rincón-Mora is with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0250 (email: rincon-mora@ece.gatech.edu).

In light of these costly and noisy alternatives, the Survivor technique is proposed because of its ability to achieve high bandwidth, good matching performance, and low noise at low cost. The following sections detail the basic strategy, its implementation, experimental results, and final conclusions.

## II. PROPOSED SURVIVOR STRATEGY

### A. Basic Concept

The heart of the Survivor strategy lies in identifying the best-matched pair of devices from a bank of minimum-feature-size transistors during start-up and/or power-on-reset events (Fig. 1), an approach similar in philosophy to [13], except that it requires neither an accurate reference and DAC nor a memory bank. The best pair is then placed in critical sections of a circuit, like the input stage and load mirrors of an operational amplifier. Well-matched, minimum-sized devices yield both high bandwidth and high accuracy.



**Fig. 1. Block diagram of the Survivor strategy.**

A bank of pairs, each assigned a unique digital code, is fabricated on-chip. Every time the system starts up or resets, a digital engine connects two pairs from this bank to a high-resolution current comparator through a set of switches. The comparator then determines which of the two connected pairs has the higher offset (worse mismatch). The digital engine processes the comparator's output to discard the pair with the higher offset (the *loser*) and connects another pair from the bank in its place. This new pair is then compared with the *winner* of the previous cycle, and so on. After all pairs in the bank have been processed, the winner of the last cycle (the *survivor*) is the pair with the least mismatch.
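The elimination loop just described can be sketched as a behavioral Python model. This is an illustration under stated assumptions, not the on-chip logic: each pair's offset is treated as a known number, and `resolution` is a hypothetical parameter standing in for the comparator's finite resolution (if the two offset magnitudes differ by no more than this, the incumbent winner is kept). A bank of n pairs always takes n - 1 comparisons.

```python
from typing import Sequence, Tuple


def survivor_index(offsets: Sequence[float], resolution: float = 0.0) -> Tuple[int, int]:
    """Winner-stays selection over a bank of pairs (behavioral sketch).

    offsets    -- hypothetical signed offset of each pair, in any unit
    resolution -- hypothetical comparator resolution: if the two |offsets|
                  differ by no more than this, the incumbent winner is kept
    Returns (index of the survivor, number of comparisons performed).
    """
    if not offsets:
        raise ValueError("the bank must contain at least one pair")
    winner = 0          # the first pair starts as the incumbent
    comparisons = 0
    for challenger in range(1, len(offsets)):
        comparisons += 1
        # The comparator flags the pair with the larger |offset| as the loser.
        if abs(offsets[winner]) - abs(offsets[challenger]) > resolution:
            winner = challenger   # incumbent discarded, challenger survives
        # otherwise the challenger is discarded and the next pair is loaded
    return winner, comparisons


# Usage with four made-up offsets: pair 2 survives after 3 comparisons.
print(survivor_index([3.0, -1.5, 0.4, -2.2]))   # -> (2, 3)
```

Keeping the incumbent on a near-tie mirrors the fact that a real comparator cannot resolve offsets closer than its resolution; in that case the choice between the two pairs is immaterial.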

Manuscript received March 30, 2006.

V. Gupta is with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0250 (phone: 404-894-1299; e-mail: vishalg@ece.gatech.edu).

### B. Circuit

Comparator: The most important component of the Survivor scheme is the comparator, because its resolution determines the matching performance of the winner of every cycle and, ultimately, of the surviving pair. The proposed comparator (Fig. 2) is a variation of the differential difference amplifier discussed in [14] and comprises accurate current mirror MP1,2, well-matched current sources I<sub>BIAS1,2,3</sub>, gain stage MP3 and INV, and a transition-detect block. The two pairs of devices under test are first placed in Positions A and B and fed to accurate current mirror MP1,2. The offsets of these two pairs ($\Delta I_1$ and $\Delta I_2$) determine the state of inverter INV (i.e., V<sub>OUT</sub> is low if MN12 and MN22, when connected together, conduct more current than their respective counterparts). In the second phase, the connectivity of one of the pairs is reversed: a resulting state change implies that the offset of this reversed pair is dominant ($|\Delta I_2| > |\Delta I_1|$); otherwise, no state change occurs. This result is then used to select which pair to discard, freeing its position for the next pair in the bank.
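The two-phase test can be checked with a small idealized model. It assumes (an interpretation, not a statement from the circuit description) that the comparator sees the sum $\Delta I_1 + \Delta I_2$ in the first phase and, once the pair at Position B is reversed, the difference $\Delta I_1 - \Delta I_2$. The two decisions then differ exactly when $(\Delta I_1 + \Delta I_2)(\Delta I_1 - \Delta I_2) = \Delta I_1^2 - \Delta I_2^2 < 0$, that is, when $|\Delta I_2| > |\Delta I_1|$. The function names are hypothetical.

```python
from typing import Tuple


def phase_states(delta_i1: float, delta_i2: float) -> Tuple[bool, bool]:
    """Idealized comparator decision in each phase (True = net current > 0).

    Assumption: in phase 1 the comparator sees the sum of the two pair
    offsets; after the pair at Position B is reversed it sees A minus B.
    """
    phase1 = (delta_i1 + delta_i2) > 0
    phase2 = (delta_i1 - delta_i2) > 0
    return phase1, phase2


def position_b_is_worse(delta_i1: float, delta_i2: float) -> bool:
    """Transition detect: latch phase 1, XOR it with phase 2.

    (a + b) and (a - b) differ in sign exactly when (a + b)(a - b) = a^2 - b^2
    is negative, i.e. when |b| > |a|, so a toggle flags Position B as the loser.
    Exact ties (a net current of zero) are outside this idealized model.
    """
    latched, phase2 = phase_states(delta_i1, delta_i2)
    return latched != phase2   # XOR of two booleans


print(position_b_is_worse(1.0, -3.0))   # |dI2| > |dI1|: toggles -> True
print(position_b_is_worse(-3.0, 1.0))   # |dI1| > |dI2|: no toggle -> False
```

The test never needs the sign or magnitude of either offset, only whether the output toggles, which is why a single XOR on two latched bits is enough.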



The accuracy of this circuit is key and depends on the matching performance of I<sub>BIAS1,2,3</sub> and devices MP1,2,3. The input-referred offset resulting from a current-density mismatch in MP3, which depends on how well MP3 matches MP1,2 and I<sub>BIAS3</sub> matches I<sub>BIAS1,2</sub>, is small because it is divided by MP3's transconductance and by the voltage gain of the first stage, which is on the order of 30-40 dB [3]. Offsets in mirror devices MP1,2 and current sources I<sub>BIAS1,2</sub>, however, are virtually unattenuated when referred to the input, which is why a dynamic element-matching scheme is used for both sets of devices. DEM nearly eliminates their mismatch effects by exposing the offset equally to both sides of the mirror (MP1,2) and to both current sources (I<sub>BIAS1,2</sub>). This is achieved by exchanging the connectivity of MP1,2 and I<sub>BIAS1,2</sub> at every clock cycle and thereby averaging their effects to zero over time. This averaging (low-pass filter) function is performed by capacitor C<sub>M</sub>, whose Miller effect enhances its filtering capability [10].

A large filter capacitance, on the one hand, improves the offset-canceling capability of DEM but, on the other, increases the comparator's propagation delay, that is, the processing time of each comparison and, consequently, the system's overall start-up time. As a result, the DEM frequency should be high, but not so high that switching noise (charge injection and clock feed-through) degrades the matching accuracy of the mirror and the comparator [10]. In simulations, the proposed comparator displayed a worst-case resolution of 300 µV and a settling time of 200 µs with a 25 kHz DEM frequency and a 10 pF filter capacitor. Since DEM is engaged only while the system is in start-up or reset mode, however, its disadvantages (switching noise and low bandwidth) have no effect on system performance during normal operation, when the best-matched pair (i.e., the survivor) has already been selected and the comparator is off.
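The capacitance trade-off can be made concrete with a deliberately simple model, which is an assumption and not an extraction of the prototype's 10 pF Miller-compensated filter: the mirror error flips between $+\delta$ and $-\delta$ at every half period of the DEM clock ($T = 1/f_{DEM}$) and is smoothed by a single pole with time constant $\tau$. In steady state the filtered residual averages to zero with a peak of $\delta \tanh\left(T/(4\tau)\right)$, while settling to within a fraction $\varepsilon$ of a step takes $\tau \ln(1/\varepsilon)$. A larger $\tau$ therefore shrinks the ripple but lengthens every comparison. Function names and the $\tau$ values below are hypothetical.

```python
import math


def dem_residual(delta, f_dem, tau, periods=400, steps_per_half=100):
    """Samples of the filtered DEM error over the last simulated period.

    Model (assumed): the mirror error flips between +delta and -delta every
    half period of the DEM clock and is smoothed by a single-pole low-pass
    filter with time constant tau. The update uses the exact discretization
    of a first-order filter for an input held constant over each step.
    """
    half = 0.5 / f_dem
    dt = half / steps_per_half
    a = math.exp(-dt / tau)
    y = 0.0
    last = []
    for _period in range(periods):
        last = []
        for sign in (1.0, -1.0):
            for _step in range(steps_per_half):
                y = a * y + (1.0 - a) * sign * delta
                last.append(y)
    return last


def peak_ripple(delta, f_dem, tau):
    """Steady-state peak residual of the same model: delta * tanh(T / (4 tau))."""
    return delta * math.tanh(1.0 / (4.0 * f_dem * tau))


def settling_time(tau, eps):
    """Time for a single-pole filter to settle within a fraction eps of a step."""
    return tau * math.log(1.0 / eps)


# Hypothetical time constants at the 25 kHz DEM frequency quoted in the text.
for tau in (20e-6, 200e-6, 2e-3):
    print(f"tau = {tau * 1e6:6.0f} us: ripple/delta = {peak_ripple(1.0, 25e3, tau):.4f}, "
          f"0.1% settling = {settling_time(tau, 1e-3) * 1e3:.2f} ms")
```

The simulated samples from `dem_residual` and the closed-form `peak_ripple` agree, which is a quick sanity check of the model rather than of the circuit.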

The transition-detect block detects a change of state by storing the result of the first-phase comparison in a 1-bit latch before the onset of the second phase. The result of the second phase is then compared with that of the first through the XOR gate: if the two states differ, the output of the gate is high, which is the code for a state reversal.

Digital Controller: The block diagram of the Survivor system is presented in Fig. 3. For simplicity, the switching network used to implement DEM is omitted and only 4 pairs of devices are used in the bank. Each comparison cycle consists of four phases synchronized by a clock-driven shift register. In the first phase, the output of the comparator is stored in latch D-LTCH1. On the falling edge of this phase, switch network S1 interchanges the terminals of the pair at Position B. The new output of the comparator, produced during this second phase, is compared with the contents of the latch via the XOR gate, as described in the comparator section.

In the third phase, the output of the XOR gate is sampled by latch D-LTCH2, which drives demultiplexer DEMUX. DEMUX is fed by an M-bit counter (CTR), whose output corresponds to one of the 2<sup>M</sup> pairs in the bank. When CTR increments on the falling edge of this phase, the code of the next sequential pair in the bank is routed through DEMUX to either decoder DEC A or DEC B (which control the switch networks of Positions A and B, respectively) in the fourth phase, replacing the code of the loser and leaving the winner's code intact. During start-up, the phase generator is initialized to Phase 4, CTR is set to the code of the second pair, and D-LTCH2 is set to 1 (to allow the counter to drive one of the decoders), which places the first two pairs of the bank in Positions A and B. Since the counter increments monotonically and its output connects a pair to only one of the two positions, a pair can never be connected to both positions at once.
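The phase sequence above can be traced with a behavioral sketch of the controller. It follows the described start-up state (Phase 4, CTR holding the second pair's code, D-LTCH2 preset to 1) and routes the counter to DEC B when D-LTCH2 is 1, otherwise to DEC A. Two points are assumptions, not stated in the text: DEC A comes out of reset holding pair 0, and the comparator obeys the idealized sum/difference model. The function `run_controller` and the offsets used are hypothetical.

```python
from typing import List, Sequence, Tuple


def run_controller(offsets: Sequence[float]) -> Tuple[int, List[Tuple[int, int]]]:
    """Behavioral sketch of the four-phase Survivor controller.

    offsets[k] is a hypothetical signed offset of pair k. Returns the
    survivor's code and the (DEC A, DEC B) codes in force during each
    comparison cycle.
    """
    n = len(offsets)
    if n < 2:
        raise ValueError("need at least two pairs to compare")
    # Start-up state from the text: Phase 4, CTR = code of the second pair,
    # D-LTCH2 = 1. Assumption: DEC A comes out of reset holding pair 0.
    dec_a, dec_b = 0, -1
    ctr, ltch2 = 1, 1
    history = []
    while True:
        # Phase 4: DEMUX routes the CTR code to DEC B if D-LTCH2 = 1, else DEC A.
        if ltch2:
            dec_b = ctr
        else:
            dec_a = ctr
        if dec_a == dec_b:
            raise RuntimeError("a pair was connected to both positions")
        history.append((dec_a, dec_b))
        # Phase 1: D-LTCH1 stores the comparator state (offsets add).
        ltch1 = (offsets[dec_a] + offsets[dec_b]) > 0
        # Phase 2: S1 reverses the pair at Position B; the XOR flags a toggle.
        xor = ltch1 != ((offsets[dec_a] - offsets[dec_b]) > 0)
        # Phase 3: D-LTCH2 samples the XOR (1 means the pair at B lost).
        ltch2 = 1 if xor else 0
        if ctr == n - 1:
            # Last pair processed: the pair that did not lose is the survivor.
            return (dec_a if xor else dec_b), history
        ctr += 1  # CTR advances to the next sequential pair


survivor, cycles = run_controller([3.0, -1.5, 0.4, -2.2])
print(survivor, cycles)   # -> 2 [(0, 1), (2, 1), (2, 3)]
```

Because the counter only moves forward and each new code overwrites exactly one decoder, the `RuntimeError` branch is never reached. That is the property the text argues for.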

### C. Simulations

The system shown in Fig. 3 was simulated with BSIM3 models of AMI's $0.6\mu m$ CMOS process for the transistors and switches and with AHDL macromodels for the digital blocks; the bank consisted of eight pairs. An artificial input-referred offset was added to each pair, as shown in Table 1. Stepping sequentially through Table 1, the Survivor strategy should retain the lower-offset pair of each comparison, so that the winner progresses through Pairs 1, 2, 3, and finally 5, which has the lowest offset. The simulation results of Fig. 4, which show the binary code of the winner of each cycle, corroborate this.



Fig. 4. Simulation results showing the digital code of the winner of each cycle, with convergence to Pair 101 (Pair 5).

## III. MEASUREMENTS

### A. Prototype

Mirrors are widely used analog building blocks (found in almost all comparators and amplifiers), and their matching performance improves with increasing device area, which inherently degrades bandwidth. To evaluate how effectively the Survivor strategy improves the 3-σ matching performance of a mirror built with minimum-channel-length (0.6µm) devices having a width-to-length (W/L) ratio of 10, an IC prototype was fabricated in AMI's 0.6µm CMOS process, measured, and evaluated (the die photograph is shown in Fig. 5).

The functions of the counter, which generates the digital code of the next pair in the sequence, and of the demultiplexer, which routes this code to decoder DEC A or DEC B based on the comparator's output, were carried out manually for ease of testing. Along with these decoders, a third decoder, DEC M, was fabricated to connect the survivor pair, once determined, in a current-mirror configuration through another switching network so that the offset of the survivor could be measured. Finally, mirrors using devices with the same W/L ratio (10) as the candidate pairs but 3x, 5x, 8x, and 10x the channel length (i.e., 18µm/1.8µm, 30µm/3.0µm, 48µm/4.8µm, and 60µm/6.0µm) were also fabricated on the IC to gauge the bandwidth advantage of the surviving small-geometry mirror. The experiments were performed on 30 samples to increase the statistical validity of the measurements.

### B. Test Setup and Procedure

In the test setup (Fig. 6), the switching frequency of clock CLK1, which drives DEM, was set to 25 kHz, and that of CLK2, which swaps the terminals of the pair at Position B, to 5 kHz. The inputs of DEC\_A and DEC\_B were set with external single-pole double-throw (SPDT) switches, and the output of the comparator was monitored with an oscilloscope. If the output toggled with CLK2 (i.e., the pair at Position B had the higher offset), DEC\_B was reprogrammed with the code of the next pair in the sequence. If the output remained unchanged (i.e., the pair at Position A had the higher offset), DEC\_A was reprogrammed with the code of the next pair. This procedure was repeated until the last digital code was reached (Pair 31: 11111). The winner of this last comparison is the survivor.



Fig. 6. Experimental setup.

DEC\_A and DEC\_B were then disabled and the survivor code was programmed into the inputs of DEC\_M, which connected this pair in a current-mirror configuration. The offset of this current mirror was determined with a semiconductor parameter analyzer by forcing a known current of 15 µA into the input of the mirror and measuring its output current to extract the offset (mismatch) of the pair. The drain-source voltages of the mirror pair were equalized to eliminate the effects of channel-length modulation on offset. Finally, DEC\_M was disabled and the offsets of the four large-geometry mirrors were measured.
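The offset extraction described above reduces to comparing the measured output current with the forced input current, and the 3-σ figures reported below are statistics of such offsets across samples. A minimal sketch: the function names and the 15.03 µA reading are hypothetical, and using the sample standard deviation of the signed offsets as σ is an assumption, since the estimator is not stated.

```python
import statistics
from typing import Sequence


def mirror_offset_percent(i_in: float, i_out: float) -> float:
    """Mirror mismatch in percent: (I_out - I_in) / I_in * 100."""
    return 100.0 * (i_out - i_in) / i_in


def three_sigma_percent(offsets_percent: Sequence[float]) -> float:
    """3-sigma spread of offsets across samples.

    Assumption: sigma is the sample standard deviation of the signed offsets.
    """
    return 3.0 * statistics.stdev(offsets_percent)


# Hypothetical reading: 15 uA forced into the input, 15.03 uA measured at the output.
print(f"{mirror_offset_percent(15e-6, 15.03e-6):.2f} %")   # -> 0.20 %
```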

### C. Experimental Results

To verify whether the Survivor strategy indeed converged on the pair with the best matching performance, the offsets of all pairs in the banks of 5 samples were measured by connecting each pair in a current-mirror configuration. The measured offsets of one such sample are presented in Table 2, showing Pair 19 as the best-matched pair, and Fig. 7 shows how the experimental code progression of the Survivor strategy converges on Pair 19, the survivor. Measured over 30 samples in this technology, the 3-σ offset of a single minimum-channel-length pair was 22.2%, whereas that of the survivor was 1.9% (Fig. 8), roughly an order-of-magnitude improvement.

Fig. 9 illustrates how the statistical matching performance of the survivor relates to the number of pairs placed in the bank and how it compares with that of a single but larger pair; the 3-σ range for a 95% confidence interval is also shown, and it can be narrowed by increasing the number of samples. The results show that the survivor of 32 (6/0.6) pairs displays the matching performance of a (48/4.8) pair while retaining the speed of a (6/0.6) pair, which amounts to a 64x bandwidth improvement (within the 95% confidence range, the survivor performs at worst like a pair with 5x and at best like a pair with 10x the dimensions of a (6/0.6) pair). Decreasing the geometry of a single pair from (60/6.0) to (6/0.6) degrades its matching performance from 1.72% to 22.23% (Table 3).

TABLE 2. MEASURED OFFSETS OF CURRENT-MIRROR PAIRS IN A SAMPLE IC.

| Pair | Offset [%] | Pair | Offset [%] | Pair | Offset [%] | Pair | Offset [%] |
|------|------------|------|------------|------|------------|------|------------|
| 0    | 13.2       | 8    | 2.6        | 16   | 9.7        | 24   | 5.1        |
| 1    | 6.7        | 9    | 10.3       | 17   | 6.8        | 25   | 8.3        |
| 2    | 2.9        | 10   | 17.6       | 18   | 8.1        | 26   | 15.9       |
| 3    | 0.8        | 11   | 7.1        | 19   | 0.2        | 27   | 5.5        |
| 4    | 3.8        | 12   | 1.9        | 20   | 8.1        | 28   | 1.8        |
| 5    | 4.1        | 13   | 0.5        | 21   | 4.0        | 29   | 9.9        |
| 6    | 4.7        | 14   | 1.5        | 22   | 16.0       | 30   | 4.3        |
| 7    | 5.8        | 15   | 4.3        | 23   | 2.0        | 31   | 7.3        |
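
Replaying the measured offsets of Table 2 through the winner-stays rule gives a quick consistency check with the survivor reported in the text. This sketch assumes an ideal comparator (it always discards the pair with the larger listed offset magnitude); `TABLE2` and `winner_progression` are hypothetical names.

```python
# Measured offsets [%] of the 32 pairs in one sample IC, from Table 2.
TABLE2 = [13.2, 6.7, 2.9, 0.8, 3.8, 4.1, 4.7, 5.8,
          2.6, 10.3, 17.6, 7.1, 1.9, 0.5, 1.5, 4.3,
          9.7, 6.8, 8.1, 0.2, 8.1, 4.0, 16.0, 2.0,
          5.1, 8.3, 15.9, 5.5, 1.8, 9.9, 4.3, 7.3]


def winner_progression(offsets):
    """Winner after each of the len(offsets) - 1 comparisons (ideal comparator)."""
    winner = 0
    progression = []
    for challenger in range(1, len(offsets)):
        if abs(offsets[challenger]) < abs(offsets[winner]):
            winner = challenger
        progression.append(winner)
    return progression


prog = winner_progression(TABLE2)
print(f"survivor: pair {prog[-1]} ({prog[-1]:05b}) after {len(prog)} comparisons")
# -> survivor: pair 19 (10011) after 31 comparisons
```

Under this assumption the winner changes only at Pairs 1, 2, 3, 13, and 19, and the last of these matches the Pair 19 survivor.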



Fig. 7. Experimental code progression of the IC with the current-mirror devices depicted in Table 2.





Fig. 8. [...] and the (6/0.6) survivor out of 32 pairs.

From Fig. 9, it can also be seen that decreasing the number of pairs in the bank from 32 to 1 degrades the matching performance of the survivor from 1.94% to 22.23%. The survivor of 4 pairs, for example, outperforms a single pair by a factor of nearly 4x, while the survivor of 32 outperforms a single pair by roughly 11x (22.23%/1.94% ≈ 11.5). The number of pairs required to meet an offset specification depends on the inherent offset of the individual pairs, as derived in the Appendix.
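The trend in Fig. 9 can be explored with a small Monte Carlo sketch. Its assumptions are idealized and are not taken from the measurements: the pair offsets are independent zero-mean Gaussians, and the comparator is perfect, so the survivor is always the pair with the smallest |offset|. Because it ignores comparator resolution and measurement error, it shows the direction of the trend rather than predicting the measured 4x and 11x improvements. The function name and parameters are hypothetical.

```python
import random
import statistics


def survivor_sigma_ratio(n_pairs: int, trials: int = 5000, seed: int = 1) -> float:
    """Monte Carlo estimate of sigma(survivor) / sigma(single pair).

    Idealized assumptions: pair offsets are independent, zero-mean Gaussians
    with unit sigma, and the comparator is perfect, so the survivor is the
    pair with the smallest |offset|. Because sigma of a single pair is 1, the
    returned value is also the ratio of the two 3-sigma offsets.
    """
    rng = random.Random(seed)
    survivors = []
    for _ in range(trials):
        bank = [rng.gauss(0.0, 1.0) for _ in range(n_pairs)]
        survivors.append(min(bank, key=abs))   # keep the survivor's sign
    return statistics.pstdev(survivors)


for n in (1, 4, 32):
    print(f"{n:2d} pairs: 3-sigma ratio ~ {survivor_sigma_ratio(n):.3f}")
```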

The number of pairs in the bank is ultimately limited by die area and start-up time, both of which increase with the number of pairs to be compared before the system starts up. The cost of silicon real estate in today's volume-driven CMOS technologies, however, tends to be lower than that of trimming test time, especially if post-package trimming is adopted to mitigate the adverse piezo effects of packaging on matching. In the 0.6µm CMOS prototype, the bank of 32 pairs, the three 5-bit decoders, and the switching network occupied 0.78 mm<sup>2</sup>, and the overall scanning time, which is the time required to converge to the surviving pair, was 24.8 ms (31 comparisons at 800 µs per comparison with a 5 kHz four-phase shift-register clock). While this delay has a significant impact on systems and subsystems that must start in less than 15-100 ms (e.g., hard-disk drives and various power supplies), it adds minimal overhead to portable devices like cellular phones and MP3 players, which take seconds to start.

**Fig. 9. Statistical experimental offset performance of the Survivor scheme and a series of single but larger-geometry pairs (95% confidence interval).**
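The scanning time quoted above follows from simple arithmetic: a bank of n pairs needs n - 1 comparisons, and each comparison lasts four periods of the shift-register clock. A minimal sketch, where the function name and the 64-pair case are illustrative:

```python
def scan_time(n_pairs: int, f_clk: float, phases: int = 4) -> float:
    """Start-up scan time in seconds.

    A bank of n_pairs needs n_pairs - 1 comparisons, and each comparison
    lasts `phases` periods of the shift-register clock f_clk.
    """
    if n_pairs < 1:
        raise ValueError("the bank must contain at least one pair")
    return (n_pairs - 1) * phases / f_clk


print(f"{scan_time(32, 5e3) * 1e3:.1f} ms")   # 31 x 800 us -> 24.8 ms
print(f"{scan_time(64, 5e3) * 1e3:.1f} ms")   # hypothetical 64-pair bank -> 50.4 ms
```

Doubling the bank roughly doubles the start-up delay. This linear cost has to be weighed against the diminishing offset improvement shown in Fig. 9.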

TABLE 3. EXPERIMENTAL OFFSET PERFORMANCE OF A SINGLE DEVICE PAIR WITH VARIOUS WIDTH-TO-LENGTH DIMENSIONS.

| W/L [μm/μm]                   | 6/0.6 | 18/1.8 | 30/3.0 | 48/4.8 | 60/6.0 |
|-------------------------------|-------|--------|--------|--------|--------|
| Normalized Area               | 1     | 9      | 25     | 64     | 100    |
| 3-σ Offset [%]                | 22.23 | 7.09   | 3.14   | 1.91   | 1.73   |
| 3-σ Offset of Survivor of 32 (6/0.6) Pairs [%] | 1.94 | – | – | – | – |

## IV. CONCLUSION

The IC prototype of the proposed Survivor strategy reliably converged on the best-matched pair of a bank of 32 pairs, yielding the 3-σ matching performance of (48/4.8) MOSFETs while retaining the bandwidth of (6/0.6) transistors, which amounts to a 64x bandwidth increase. This improvement is cost-effective because the test time associated with trimming is avoided altogether and the switching noise associated with DEM is confined to start-up, before the circuit enters normal operation. The primary trade-off is silicon real estate: the area used by the bank of devices and the relevant control circuits (0.78 mm<sup>2</sup> in the prototyped 0.6µm CMOS system) versus the number of fuses or EEPROM circuitry needed to trim a (6/0.6) device to the matching performance of a (48/4.8) transistor. Even if the proposed scheme demands more silicon area, its cost is easier to absorb than test time in today's increasingly dense and volume-driven CMOS technologies. In summary, the Survivor strategy is a cost-effective method for reducing random process- and package-induced mismatch in common yet critical analog building blocks and, when applied periodically (e.g., during power-on-reset events), for reducing drift over time (which could result from thermal cycling and extended operation).

## APPENDIX

If n, R, and r denote, respectively, the number of device pairs in the bank of a sample, the $3-\sigma$ offset of these n pairs, and the desired $3-\sigma$ offset of the survivor pairs of N samples, the probability that a pair within a sample has an offset lower than or equal to r is

$$p = \varphi\left(\frac{r}{R}\right) = \frac{1}{2} \operatorname{erf}\left(\frac{1}{\sqrt{2}} \frac{r}{R}\right), \tag{1}$$

where $\varphi()$ represents the normal distribution function. As a result, the probability that *none* of the n pairs within a sample has the required offset is $(1-p)^n$. If P is the probability of finding *at least one pair* within the desired resolution range in each sample, then

$$P = 1 - (1-p)^n \Longrightarrow n = \frac{\log(1-P)}{\log(1-p)}. \tag{2}$$

For the prototype, P was 0.9 and r/R was set to 0.1. The probability of finding a pair with the targeted offset can be set higher when more die area for device pairs and more start-up time are allowed, which is reasonable in systems that require both high accuracy and 100 ms or more to start.
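Equations (1) and (2) can be evaluated directly. The sketch below implements them exactly as written, with p taken from Eq. (1), and rounds n up, because the bank must hold a whole number of pairs. The function names are hypothetical.

```python
import math


def prob_within(r_over_R: float) -> float:
    """Eq. (1): p = (1/2) erf((1/sqrt(2)) * r/R)."""
    return 0.5 * math.erf(r_over_R / math.sqrt(2.0))


def pairs_required(P: float, r_over_R: float) -> int:
    """Eq. (2), rounded up: smallest integer n with 1 - (1 - p)^n >= P."""
    if not 0.0 < P < 1.0:
        raise ValueError("P must lie strictly between 0 and 1")
    if r_over_R <= 0.0:
        raise ValueError("r/R must be positive")
    p = prob_within(r_over_R)
    return math.ceil(math.log(1.0 - P) / math.log(1.0 - p))


# Prototype values quoted in the text: P = 0.9, r/R = 0.1.
print(pairs_required(0.9, 0.1))
```

Raising P or lowering r/R increases n, and with it the die area and start-up time discussed in Section III-C.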

## REFERENCES

- [1] M. J. M. Pelgrom, et al., "Matching properties of MOS transistors," *IEEE J. of Solid-State Circuits*, vol. 24, pp. 1433-1440, Oct. 1989.
- [2] A. Hastings, *The Art of Analog Layout*, Prentice Hall, 2000, ISBN: 0130870617.
- [3] P. Gray, P. Hurst, S. Lewis, and R. Meyer, *Analysis and Design of Analog Integrated Circuits*, John Wiley & Sons, ISBN: 0471321680.
- [4] G. Zhang, S. Saw, J. Liu, S. Sterrantino, D. K. Johnson, and S. Jung, "An accurate current source with on-chip self-calibration circuits for low-voltage current mode differential drivers," *IEEE Trans. on Circuits and Systems I*, vol. 53, pp. 40-47, Jan. 2006.
- [5] P. R. Kinget, "Device mismatch and tradeoffs in the design of analog circuits," *IEEE J. of Solid-State Circuits*, vol. 40, pp. 1212-1224, June 2005.
- [6] G. A. Rincón-Mora, *Voltage References*, IEEE Press, John Wiley & Sons, Inc., 2002, ISBN: 0471143367.
- [7] P. Deluca, "A review of thirty-five years of laser trimming with a look to the future," in *Proc. IEEE*, vol. 90, no. 10, pp. 1614-1619, Oct. 2002.
- [8] E. Sackinger and W. Guggenbuhl, "An analog trimming circuit based on a floating gate device," *IEEE J. of Solid-State Circuits*, vol. 23, pp. 1437-1440, Dec. 1988.
- [9] A. Bakker and J. H. Huijsing, "A low-cost high-accuracy CMOS smart temperature sensor," in *Proc. European Solid-State Circuits Conf.*, Duisburg, 1999, pp. 302-305.
- [10] A. Bakker and J. H. Huijsing, "A CMOS nested-chopper instrumentation amplifier with 100-nV offset," *IEEE J. of Solid-State Circuits*, vol. 35, pp. 1877-1883, Dec. 2000.
- [11] C. C. Enz and G. C. Temes, "Circuit techniques for reducing the effects of op-amp imperfections: Autozeroing, correlated double sampling, and chopper stabilization," in *Proc. IEEE*, vol. 84, Nov. 1996.
- [12] L. Hebrard, J. B. Kammerer, and F. Braun, "A chopper stabilized biasing circuit suitable for cascaded wheatstone-bridge-like sensors," *IEEE Trans. on Circuits and Systems I*, vol. 52, pp. 1653-1665, Aug. 2005.
- [13] M. P. Flynn, C. Donovan, and L. Sattler, "Digital calibration incorporating redundancy of Flash ADCs," *IEEE Trans. on Circuits and Systems II*, vol. 50, pp. 205-213, May 2003.
- [14] E. Sackinger and W. Guggenbuhl, "A versatile building block: the CMOS differential difference amplifier," *IEEE J. of Solid-State Circuits*, vol. SC-22, pp. 287-294, Apr. 1987.