# A 0.2–1.45-GHz Subsampling Fractional-*N* Digital MDLL With Zero-Offset Aperture PD-Based Spur Cancellation and *In Situ* Static Phase Offset Detection

Somnath Kundu, Student Member, IEEE, Bongjin Kim, Member, IEEE, and Chris H. Kim, Senior Member, IEEE

Abstract—A digital fractional-N subsampling multiplying delay-locked loop is proposed in this paper. A zero phase-offset latch-based aperture phase detector is introduced in a reference spur cancellation loop to precisely cancel any static phase offset (SPO) between the injected reference and the digitally controlled oscillator (DCO) phases. An in situ detection scheme is employed to directly measure this phase offset accurately by obviating the requirement of a high-speed off-chip measurement setup. Moreover, a mathematical expression is derived for the calculation of reference spur generated from a given SPO. A uniformly distributed switched capacitor-based DCO frequency tuning achieves highly linear gain. The chip prototype is fabricated in a 1.2-V supply, 65-nm LP CMOS technology and covers an output frequency range of 0.2-1.45 GHz while occupying a core area of 0.054 mm<sup>2</sup>. Measured phase noise at 1.4175 GHz is -95 dBc/Hz at 100-kHz offset, which is 9 dB lower than in phase-locked loop mode of operation.

Index Terms—Aperture phase detector (APD), digitally controlled oscillator (DCO), fractional-N, multiplying delay-locked loop (MDLL), phase-locked loop (PLL), reference spur, static phase offset (SPO), subsampling.

## I. INTRODUCTION

The design of highly digital phase-locked loop (PLL) architectures [1]–[4] is gaining traction in nanoscale CMOS processes by obviating the need for an area-consuming analog loop filter and circumventing the voltage headroom issue of the charge pump (CP). Other benefits of the digital implementation include immunity to process, voltage, and temperature (PVT) variations, easier portability across technology nodes, and flexibility in performance optimization by reconfiguring the loop parameters. A classical digital implementation replaces the phase-frequency detector (PFD) and the CP present in an analog PLL with a time-to-digital

Manuscript received July 28, 2016; revised October 8, 2016 and November 10, 2016; accepted December 4, 2016. Date of publication January 4, 2017; date of current version March 3, 2017. This paper was approved by Associate Editor Kenichi Okada.

S. Kundu and C. H. Kim are with the Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455 USA (e-mail: kundu006@umn.edu).

B. Kim is with the Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455 USA, and also with the Department of Electrical Engineering, Stanford University, Stanford, CA 94305 USA.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSSC.2016.2638432

converter (TDC). The digital loop filter (DLF), unlike an analog one, can be realized in a compact area and the loop parameters can easily be tuned. However, the fundamental limitation of any PLL in achieving low phase noise or jitter is the loop bandwidth, which cannot exceed one tenth of the reference frequency in order to satisfy the discrete-time stability limit, also known as Gardner's criterion. As the noise of the voltage-controlled oscillator (VCO) is high-pass filtered by the PLL loop, this sets a limit on the maximum VCO phase noise suppression. To overcome this drawback, the multiplying delay-locked loop (MDLL) [5]-[8] and the injection-locked PLL, which operates similarly to an MDLL [9]-[11], have recently been explored as alternatives. Fig. 1 shows the block diagram of an MDLL. The multiplexer in the VCO periodically replaces the output edge (OUT) with the clean reference edge (REF). This periodic replacement of OUT with REF prevents jitter accumulation over multiple reference cycles and suppresses the VCO phase noise beyond the PLL bandwidth, as shown in Fig. 1 (right).

In spite of its superior noise performance, one major drawback of an MDLL is the reference spur that is generated at the MDLL output due to the static phase offset (SPO) between the REF and the OUT edge. Several efforts have been made in previous studies [5]-[8], [13] to cancel the SPO. The circuit technique employed in [8] uses a sampling phase detector (PD) along with different analog voltage offset cancellation schemes (e.g., autozeroing and chopper stabilization) to minimize SPO, while [13] uses a self-correcting CP. However, these sophisticated analog design techniques are limited to analog PLLs. The reference spur cancellation technique proposed in [5] relies on correlated double sampling, but it requires a high-resolution and high-linearity gated ring oscillator-based TDC that increases the design complexity and the power consumption. A sense amplifier flip-flop-based 1-b TDC is used in [6] to minimize the phase offset, but this is applicable to differential architectures only. Reference [7] implements reference spur correction in a fractional-N MDLL using a coarse digital-to-time converter (DTC) followed by two fine DTCs. Another limitation of all previous implementations is that the SPO is measured off-chip from the spur in the output frequency spectrum using a dedicated high-frequency measurement setup, such as high-frequency probes or packages, off-chip drivers, connectors,

0018-9200 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.



Fig. 1. Block diagram of MDLL circuit. VCO phases are periodically replaced by the clean reference clock phase. This prevents the VCO jitter from accumulating over multiple reference cycles, suppressing VCO phase noise with frequencies beyond the PLL bandwidth.

and spectrum analyzer. Each of these components introduces some inaccuracy in the measurement. Moreover, the measured spur in frequency domain needs to be converted to the time domain to estimate the SPO present in the circuit.

In this paper, we propose a fractional-N digital MDLL with a reference spur cancellation loop that precisely aligns the REF and the digitally controlled oscillator (DCO) edges utilizing a DTC and a zero-offset aperture PD (APD) [12]. An *in situ* offset detection circuit is also employed to measure the phase offset accurately in the time domain without relying on high-speed off-chip measurements. Furthermore, we derive a mathematical expression to calculate the reference spur generated at the output spectrum for a given SPO, and evaluate it over a wide range of SPO values. Fractional frequency multiplication is achieved by periodic phase rotation of multiple DCO phases, which is similar to the injection locking technique proposed in [9]. A subsampling PLL architecture, first introduced in [14], directly samples the output of the VCO without any frequency divider in the feedback path. It thereby achieves a high PD gain that reduces the in-band phase noise and power consumption. A digital version of this is realized in [15]. The subsampling method is utilized in this paper for a fractional-N MDLL implementation. The rest of this paper is organized as follows. Section II describes the reference spur issue in an MDLL along with the mathematical details for calculating the reference spur generated from SPO. The proposed reference spur cancellation technique and the in situ offset detection circuit are explained in Sections III and IV, respectively. Circuit implementation details of the MDLL are described in Section V, followed by measurement results in Section VI. Finally, Section VII concludes this paper.

## II. REFERENCE SPUR ISSUE IN MDLL

While providing superior phase noise performance compared with a traditional PLL, an MDLL suffers from the reference spur issue due to the SPO between the injected reference edge and the DCO edge. As explained in Fig. 2, one contributor to this offset in bang-bang PD-based digital MDLLs [6] is the setup time of the D flip-flop (DFF) $(\Delta T_1)$ used for phase comparison. The delay of the frequency divider $(\Delta T_2)$ in the feedback path increases this offset further. As a result, a fixed offset $(\Delta T)$ is generated between the



Fig. 2. PD inherent offset and feedback divider delay generate a static offset between the DCO phase and the injected reference. This SPO creates reference spurs at the output of the MDLL.

reference and the DCO edge under the phase-locked condition. In MDLL operation, when the reference is inserted into the ring oscillator path, it modulates the DCO period to $T + \Delta T$ instantaneously, creating a deterministic jitter of $\Delta T$ (*T* is the output period when there is no SPO). This behavior repeats in every reference cycle. The additional $\Delta T$ in one clock cycle is compensated by the next $N-1$ cycles, assuming a frequency multiplication factor of *N*.

The expression for the reference spur at the output spectrum generated by the $\Delta T$ offset is derived next. As discussed earlier and also evident from Fig. 3, the $\Delta T$ offset makes the first MDLL output period $T + \Delta T$. The remaining $N-1$ periods in every reference cycle are adjusted to $T_e = T - (\Delta T/(N-1))$ to maintain the phase relationship between the reference and the output. Since the pattern repeats in every reference period, $T_{\text{ref}}$, we consider each MDLL output pulse separately within one reference cycle and convolve it with a train of impulses of period $T_{\text{ref}}$ to represent the periodic nature of the MDLL output. Starting with the first MDLL pulse $(x_1(t))$, which stays at 1 for the duration $T/2 + \Delta T$, the output after convolution with the impulse train is

$$y_1(t) = x_1(t) * \sum_{k=-\infty}^{+\infty} \delta(t - kT_{\text{ref}}).$$
 (1)



Fig. 3. Calculation of the output reference spur for a given phase offset between the VCO and the injected reference. Fourier transform of each output pulse in a given reference period can be utilized to calculate the spur accurately.

Fourier transformation of  $y_1(t)$  to convert the signal into frequency domain gives

$$\begin{aligned} Y_{1}(\omega) &= X_{1}(\omega) \frac{1}{T_{\text{ref}}} \sum_{k=-\infty}^{+\infty} \delta\left(\omega - \frac{2\pi k}{T_{\text{ref}}}\right) \\ &= \left(\frac{1 - e^{-j\omega\left(\frac{T}{2} + \Delta T\right)}}{j\omega T_{\text{ref}}}\right) \sum_{k=-\infty}^{+\infty} \delta\left(\omega - \frac{2\pi k}{T_{\text{ref}}}\right). \quad (2) \end{aligned}$$

$X_1(\omega)$ is a sinc function having nulls at the multiples of $1/(T/2 + \Delta T)$ and it is sampled at an interval of $1/T_{\text{ref}}$, as represented in Fig. 3. Similarly, the convolution of the second pulse $(x_2(t))$ of width $T_e/2$ with the same impulse train, time-shifted by $T + \Delta T$, results in

$$Y_2(\omega) = \left(\frac{1 - e^{-j\omega T_e/2}}{j\omega T_{\text{ref}}}\right) e^{-j\omega(T + \Delta T)} \sum_{k=-\infty}^{+\infty} \delta\left(\omega - \frac{2\pi k}{T_{\text{ref}}}\right).$$
(3)

The Fourier transform of $x_2(t)$ has nulls at the multiples of $2/T_e$, and the $T + \Delta T$ time shift in the impulse train introduces an additional phase factor of $e^{-j\omega(T+\Delta T)}$ in the expression of $Y_2(\omega)$. Using the same procedure, the impulse train for the third pulse will be time shifted by $T_e + T + \Delta T$ and so on. Since all the remaining $N - 1$ pulses after the first pulse in every reference cycle have the same pulsewidth

of  $T_e/2$ , their Fourier function will be the same as obtained in (3), except the phase factor. Therefore, the Fourier function of the *m*th pulse where m = 2, 3, ... N can be expressed as

$$Y_{m}(\omega) = \left(\frac{1 - e^{-j\omega T_{e}/2}}{j\omega T_{\text{ref}}}\right) e^{-j\omega[(m-2)T_{e} + (T + \Delta T)]} \times \sum_{k=-\infty}^{+\infty} \delta\left(\omega - \frac{2\pi k}{T_{\text{ref}}}\right).$$
(4)

The complete expression for the MDLL output can be calculated by adding the Fourier expression of all the pulses in one reference cycle and this is as follows:

$$Y(\omega) = Y_1(\omega) + \sum_{m=2}^{N} Y_m(\omega).$$
 (5)

Using (2) and (4), we get

$$Y(\omega) = \frac{1}{j\omega T_{\text{ref}}} \left[ 1 - e^{-j\omega \left(\frac{T}{2} + \Delta T\right)} + \left(1 - e^{-j\omega \frac{T_e}{2}}\right) \times \sum_{m=2}^{N} e^{-j\omega[(m-2)T_e + (T + \Delta T)]} \right] \times \sum_{k=-\infty}^{+\infty} \delta\left(\omega - \frac{2\pi k}{T_{\text{ref}}}\right)$$
(6)



Fig. 4. (a) Output reference spur calculated for a 100-ps phase offset in a 1-GHz MDLL output with N = 10. (b) Reference spur plot for a time offset ranging from 1 to 300 ps and the impact of frequency multiplication factor on calculated spur level which is not captured in the approximate formula of (11) and [13].

$$Y(\omega) = \frac{1}{j\omega T_{\text{ref}}} \left[ 1 - e^{-j\omega \left(\frac{T}{2} + \Delta T\right)} + \left(1 - e^{-j\omega \frac{T_e}{2}}\right) \times e^{-j\omega (T + \Delta T)} \left(\frac{1 - e^{-j\omega (N-1)T_e}}{1 - e^{-j\omega T_e}}\right) \right] \times \sum_{k=-\infty}^{+\infty} \delta\left(\omega - \frac{2\pi k}{T_{\text{ref}}}\right).$$
(7)

Since  $T_{ref} = NT$ , (7) can be simplified to

$$Y(k) = \frac{1}{j2\pi k} \left[ 1 - e^{-j\frac{2\pi k}{N} \left(\frac{1}{2} + \frac{\Delta T}{T}\right)} + \left(1 - e^{-j\frac{\pi k}{N}\frac{T_e}{T}}\right) \times e^{-j\frac{2\pi k}{N} \left(1 + \frac{\Delta T}{T}\right)} \frac{1 - e^{-j\frac{2\pi k(N-1)}{N}\frac{T_e}{T}}}{1 - e^{-j\frac{2\pi k}{N}\frac{T_e}{T}}} \right].$$
 (8)

From the above-mentioned equation, the fundamental frequency component ( $f_{out} = N/T_{ref}$ ) of the output can be obtained by calculating the magnitude of Y(k) for k = N. The ratio of the frequency component at two sidebands, i.e., for  $k = N \pm 1$  to the fundamental component gives the reference spur at the MDLL output and it is expressed as

$$\operatorname{spur}_{\operatorname{MDLL}}(f_{\operatorname{out}} \pm f_{\operatorname{ref}}) = 20 \log \left( \frac{|Y(k = N \pm 1)|}{|Y(k = N)|} \right).$$
(9)

Under the assumption of $\Delta T \ll T$ in (8), the magnitudes of the fundamental and the sidebands can be approximated as

$$|Y(k=N)| \approx \frac{1}{\pi} \quad \text{and} \quad |Y(k=N\pm 1)| \approx \frac{1}{\pi} \frac{\Delta T}{T}. \quad (10)$$

This simplifies the expression for reference spur to

$$\operatorname{spur}_{\operatorname{MDLL}}(f_{\operatorname{out}} \pm f_{\operatorname{ref}}) \approx 20 \log \left(\frac{\Delta T}{T}\right).$$
 (11)

The above-mentioned equation matches the expression derived in [13]. However, the assumption is valid only when the SPO is small compared with the MDLL output period, which may not always be the case. Furthermore, the spur is also a function of $N$, since $N$ determines how frequently the MDLL output period is disturbed by the SPO. A smaller $N$ should therefore produce a higher spur level. This property is also not captured in (11). An example is shown in Fig. 4(a) for an MDLL output frequency of 1 GHz, $N = 10$, and $\Delta T = 100$ ps.



Fig. 5. Reference spur cancellation technique. The DTC delay can precisely align the injected reference and the DCO phase. Accurate cancellation requires a zero-offset PD.

The reference spur calculated using (8) at $f_{out} - f_{ref}$ and $f_{out} + f_{ref}$ is -19.8 and -18 dB, respectively. The difference between the spur levels in the two sidebands is due to the contribution of the higher order harmonics (i.e., at $2f_{out}$, $3f_{out}$, and so on) present in the output square wave. In this case, the 9th and 11th harmonics of the reference spur generated from the second harmonic of the output ($2f_{out}$) overlap with the two sidebands of the fundamental, causing the mismatch in the spur levels. The analysis in [13] assumes the output to be a sinusoidal signal, neglecting the impact of higher order harmonics. The reference spur calculated using (11) is -20 dB. Fig. 4(b) shows the calculated reference spur when $\Delta T$ is varied from 1 to 300 ps and the spur for different values of $N$ for $\Delta T = 10$ ps. As expected, the approximation is valid only when $\Delta T$ is very small with respect to $T$ and $N$ is sufficiently high.
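The calculation in this section can be sketched numerically. The following minimal Python sketch sums the per-pulse terms of (6) at the harmonic frequencies $\omega = 2\pi k/T_{\text{ref}}$, which is the same quantity as (8) but avoids the 0/0 form of the closed-form ratio when $\Delta T$ approaches zero, and then applies (9) and (11). The function and parameter names (`harmonic`, `spur_dbc`, `spur_approx_dbc`, `n_mult`, `t_out`, `delta_t`) are illustrative assumptions and do not come from the paper.

```python
import cmath
import math


def harmonic(k, n_mult, t_out, delta_t):
    """Fourier coefficient Y(k) of the MDLL output.

    Sums the per-pulse terms of (6) at omega = 2*pi*k/T_ref with
    T_ref = N*T. Requires n_mult >= 2 and k != 0.
    """
    t_ref = n_mult * t_out
    t_e = t_out - delta_t / (n_mult - 1)  # period of the remaining N-1 cycles
    w = 2.0 * math.pi * k / t_ref
    # First pulse: high for T/2 + dT, starting at t = 0.
    total = 1.0 - cmath.exp(-1j * w * (t_out / 2.0 + delta_t))
    # Pulses m = 2..N: high for T_e/2, starting at (m-2)*T_e + (T + dT).
    for m in range(2, n_mult + 1):
        start = (m - 2) * t_e + (t_out + delta_t)
        pulse = 1.0 - cmath.exp(-1j * w * t_e / 2.0)
        total += pulse * cmath.exp(-1j * w * start)
    return total / (1j * w * t_ref)


def spur_dbc(n_mult, t_out, delta_t, side=+1):
    """Reference spur per (9); side=+1 or -1 selects f_out + f_ref or f_out - f_ref.

    delta_t must be nonzero, otherwise the sideband is zero and log10 fails.
    """
    carrier = abs(harmonic(n_mult, n_mult, t_out, delta_t))
    sideband = abs(harmonic(n_mult + side, n_mult, t_out, delta_t))
    return 20.0 * math.log10(sideband / carrier)


def spur_approx_dbc(t_out, delta_t):
    """Small-offset approximation (11)."""
    return 20.0 * math.log10(delta_t / t_out)


# Example inputs from the text: 1-GHz output, N = 10, 100-ps offset.
lower = spur_dbc(10, 1e-9, 100e-12, side=-1)
upper = spur_dbc(10, 1e-9, 100e-12, side=+1)
approx = spur_approx_dbc(1e-9, 100e-12)
```

For a small offset and a large multiplication factor, `spur_dbc` should approach `spur_approx_dbc`. The text quotes -19.8 and -18 dB for the example inputs; this sketch is not guaranteed to reproduce those figures digit for digit.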

## III. REFERENCE SPUR CANCELLATION USING ZERO-OFFSET APD

The main source of SPO, as already explained, is the delay mismatch between the phase detection path and the reference injection path. Therefore, additional delay in one of the paths can mitigate this mismatch. In this paper, a DTC is utilized for this purpose, as shown in Fig. 5. Referring to Fig. 2, a $\Delta T$ offset is created between the edges of the REF and OUT signals, which causes the reference spur. In the timing diagram of Fig. 5, REF is delayed by the DTC to generate REF', which is then compared with DIV by the PD, so the $\Delta T$ offset now appears between the edges of REF' and OUT. If the DTC delay is precisely set to $\Delta T$, REF can be perfectly aligned with OUT, canceling the spur. However, complete cancellation of the spur is practically impossible for two reasons. First, the DTC resolution is limited and other parasitic mismatches are present in the circuit. Second, coupling through the parasitic capacitances and power supply induced noise during MUX switching also appear as a reference spur at the MDLL output.

Since the DTC resolution and the phase detection path delay $(\Delta T)$ vary with PVT conditions, a spur cancellation loop is proposed that adjusts the DTC delay each time before the main MDLL operation. The loop consists of a PD, a digital accumulator, and a DTC. The PD compares the phases of the REF and the OUT edges and controls the DTC code to match the DTC delay with $\Delta T$. Once the DTC codes settle, the loop is disabled and the REF is injected into the DCO to start the MDLL operation. If a significant delay mismatch arises from voltage or temperature variation while the MDLL is operating, the process must be restarted so that the loop can readjust the DTC codes to the new voltage and temperature condition.
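The behavior of such a loop can be illustrated with a minimal behavioral sketch. It assumes an ideal, noiseless 1-b PD whose decision simply indicates whether the DTC delay is still shorter than the path offset, a digital accumulator that steps the code by one LSB per reference cycle, and a linear DTC. The 6-b code width and 1.5-ps/LSB step are the values quoted elsewhere in this paper; the function name `run_spur_cancellation`, the PD polarity, and the cycle count are illustrative assumptions.

```python
def run_spur_cancellation(offset_ps, lsb_ps=1.5, bits=6, cycles=200):
    """Behavioral model of the PD -> accumulator -> DTC loop.

    offset_ps is the path delay mismatch (Delta T) the DTC must match.
    Returns the final DTC code and the code history.
    """
    max_code = (1 << bits) - 1
    code = 0
    history = []
    for _ in range(cycles):
        dtc_delay = code * lsb_ps
        # Ideal 1-b PD decision: is the DTC delay still too short?
        too_short = dtc_delay < offset_ps
        # Accumulator: step the code up or down by one LSB.
        code += 1 if too_short else -1
        # Clamp to the available DTC code range.
        code = min(max(code, 0), max_code)
        history.append(code)
    return code, history


final_code, code_history = run_spur_cancellation(offset_ps=40.0)
residual_ps = final_code * 1.5 - 40.0
```

In this idealized model the code ramps toward the target and then toggles between the two codes that bracket the offset, so the residual is bounded by one LSB. A real loop would also see PD offset and noise, which this sketch ignores.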

However, any inherent offset present in the PD will directly appear at the output as SPO. To address this issue, a zero-offset APD is implemented as shown in Fig. 6(a). A NAND-gate SR-latch is utilized to compare the phases of two input clocks without introducing any offset. Since a latch is sensitive to both the rising and the falling edges of the input signals, an aperture selection block is placed before the latch to capture only the rising edges for phase detection. One out of five DCO phases ($\Phi 0-\Phi 4$) is selected at a time by the enable signals S0-S4. The SR-latch is followed by a DFF that stores the detected value for the reference period. Depending on the latch output state, the DFF either samples 1 or resets to 0. Fig. 6(b) shows the states of the internal nodes of the APD when the DCO phase leads the reference. Although the APD has no offset under nominal conditions, process mismatch can introduce some phase offset in the latch. Therefore, we performed 1000-run Monte Carlo mismatch simulations by sweeping the time difference between the two input clock edges and counting the number of occurrences of 1 at the APD output. An rms phase offset of 4.5 ps is obtained after Gaussian curve fitting on the simulated result, as shown in Fig. 6(c). The layout of the APD is made symmetric to minimize any additional systematic offset due to parasitic mismatches. Note that using the APD in the main phase locking path would not cancel the spur, as the delay in the feedback path would still be present, generating significant SPO. Furthermore, the APD only works when the two input clock phases are within the aperture window, which is not guaranteed if it is used in the main path.
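The sweep-and-count procedure described above can be mimicked with a simple statistical sketch. It assumes that each Monte Carlo run behaves like an ideal PD with one fixed, Gaussian-distributed input-referred offset, and it feeds the 4.5-ps rms value in as an assumed input rather than deriving it from a circuit. Instead of a Gaussian curve fit, it estimates the spread from the moments of the differenced count curve, which is a simpler substitute. All names (`apd_offset_sigma`, `sweep_ps`, `seed`) are illustrative.

```python
import random


def apd_offset_sigma(sigma_ps=4.5, runs=1000, sweep_ps=range(-30, 31), seed=1):
    """Estimate the rms input offset from a swept count-of-ones curve."""
    rng = random.Random(seed)
    # One random input-referred offset per Monte Carlo run.
    offsets = [rng.gauss(0.0, sigma_ps) for _ in range(runs)]
    sweep = list(sweep_ps)
    # Fraction of runs whose PD output is 1 at each swept time difference.
    ones = [sum(1 for o in offsets if dt > o) / runs for dt in sweep]
    # Differencing the cumulative curve gives a histogram of the offsets;
    # its first and second moments give the mean and rms spread.
    mean = 0.0
    second = 0.0
    for i in range(1, len(sweep)):
        weight = ones[i] - ones[i - 1]
        mid = 0.5 * (sweep[i] + sweep[i - 1])
        mean += weight * mid
        second += weight * mid * mid
    return (second - mean * mean) ** 0.5


estimated_sigma_ps = apd_offset_sigma()
```

The sweep range must be wide enough to cover essentially all offsets, otherwise the histogram weights do not sum to one and the estimate is biased low.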

The DTC is implemented by tuning a switched-capacitor array connected as the load of an inverter-based delay chain. Its resolution of 1.5 ps/LSB is sufficient, considering that the spur cancellation loop resolution is primarily dominated by the offset of the APD.

## IV. IN SITU OFFSET DETECTION CIRCUIT

Phase offset in an MDLL is conventionally measured from the reference spur in the output frequency spectrum. Equation (8) in Section II can be used to estimate  $\Delta T$  from the measured spur. However, a high frequency off-chip test setup may introduce measurement error. For example, a 1-dB error in spur measurement translates into 11% error in the offset. Therefore, we propose an *in situ* scheme to measure SPO accurately in time domain. Fig. 7 shows the schematic of the proposed offset detection circuit. The programmable delay



Fig. 6. (a) Zero-offset PD implementation utilizing a latch. Aperture selection block captures only the rising edges of two input clocks for phase comparison. (b) Example timing diagram of latch internal nodes when VCO phase arrives earlier than reference. (c) Monte Carlo simulation result to estimate the input offset due to device mismatch.



Fig. 7. Proposed in situ SPO measurement circuit based on error rate calculation. Counter selection block selects a given output period at a time in every reference cycle.

block generates a variable delay $(T_P)$, close to the time period of the input clock ($T_{\text{CKMDLL}}$). The DFF, which acts as a PD, compares $T_{\text{CKMDLL}}$ with $T_P$ and generates an error pulse at the output when $T_P$ is larger than $T_{\text{CKMDLL}}$. The error rate is calculated by measuring $T_{\text{CKMDLL}}$ and the average time period of the error output, i.e., avg($T_{\text{BER}}$) [16]. A 10-b counter is used to divide the output frequency when the error rate is high. An error rate plot can be obtained by sweeping $T_P$. The transition from low error rate to high error rate happens when $T_P$ is near $T_{\text{CKMDLL}}$, capturing the time period of the input clock. This property is utilized to measure SPO. Using this circuit, the period of every Nth clock cycle of the MDLL can be measured separately. The counter selection block selects a particular MDLL period in every reference cycle. In Fig. 8(a), for N = 4, S0 selects the first clock period to measure the error rate of the previous cycle, and thereby the low to high error rate transition happens near $T - \Delta T/3$. Similarly, for S1 selection, the transition happens near $T + \Delta T$. Since only the first clock period is different from the remaining periods in a reference cycle, the plot for S1 selection will be skewed relative to the others (i.e., S0, S2, and S3). The amount of skew, which is equal to the time period difference between

the first period and the remaining periods, is $\frac{N}{N-1}\Delta T$ [shown in Fig. 8(b)]. Upon $\Delta T$ cancellation, S1 aligns with the others, eliminating any skew. Avg($T_{\text{BER}}$) is calculated off-chip using an oscilloscope. As the error output frequency is very low after the 10-b counter, the measurement setup does not involve any high-frequency signals.
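The relationship between the measured skew and the SPO can be checked with a short sketch. It models the error rate of a selected period as a Gaussian cumulative distribution (an assumption standing in for real jitter), locates the 50% transition for the realigned period and for one of the remaining periods, and inverts the $\frac{N}{N-1}\Delta T$ skew. It also evaluates how a spur measurement error in dB maps to an offset error under the small-offset relation (11). All function names, the slot numbering (slot 0 is the realigned period, which is not the same as the S0 to S3 select signals), and the jitter value are illustrative assumptions.

```python
from statistics import NormalDist


def slot_period(slot, t_out, delta_t, n_mult):
    """Period of one output cycle; slot 0 is the reference-realigned one."""
    if slot == 0:
        return t_out + delta_t
    return t_out - delta_t / (n_mult - 1)


def transition_point(period, jitter_rms, sweep):
    """First swept T_P at which the modeled error rate reaches 50%."""
    dist = NormalDist(mu=period, sigma=jitter_rms)
    for t_p in sweep:
        # An error is flagged when T_P exceeds the clock period.
        if dist.cdf(t_p) >= 0.5:
            return t_p
    return None


def spo_from_skew(skew, n_mult):
    """Invert skew = N/(N-1) * Delta T."""
    return skew * (n_mult - 1) / n_mult


def offset_error_for_spur_error(db_error):
    """Relative Delta T error for a spur error in dB, assuming (11)."""
    return 10 ** (db_error / 20.0) - 1.0


# Example in picoseconds: N = 4, T = 1000 ps, Delta T = 30 ps.
sweep_ps = [950 + 0.5 * i for i in range(201)]
first = transition_point(slot_period(0, 1000.0, 30.0, 4), 2.0, sweep_ps)
other = transition_point(slot_period(1, 1000.0, 30.0, 4), 2.0, sweep_ps)
recovered_spo_ps = spo_from_skew(first - other, 4)
```

With these inputs the two transitions are 40 ps apart, which is 4/3 of the 30-ps offset. Under (11), a 1-dB under-reading of the spur corresponds to roughly an 11% under-estimate of the offset, and a 1-dB over-reading to roughly a 12% over-estimate.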

Fig. 9(a) shows the implementation of the programmable delay generation circuit. Delay stages are made differential to minimize supply noise sensitivity. 8-b switched capacitors perform coarse delay tuning to cover a wide input clock frequency range, while the supply of the delay line $(Vdd_d)$ is varied for fine delay tuning. To measure the absolute delay, the delay stages are connected in a ring oscillator configuration by setting EN\_RO = 1 and the oscillation time period is calculated. Measured delays from the implemented test chip for different Vdd\_d values are shown in Fig. 9(b), achieving a resolution of 3.5 ps/mV.

## V. MDLL IMPLEMENTATION DETAILS

Fig. 10 shows the complete block diagram of the proposed MDLL. Due to subsampling operation, a separate frequency-locking path comprised of a fractional FD and a digital



Fig. 8. (a) Timing diagram and error rate plot of S0 and S1 selection cases. Transition from low to high error rate happens near  $T - \Delta T/3$  and  $T + \Delta T$  for S0 and S1 selection, respectively. (b) Before spur cancellation, error rate plot of S1 selection will be skewed by  $N/(N-1)*\Delta T$  where N is the frequency multiplication factor.



Fig. 9. (a) Programmable delay circuit implementation. (b) Measured delay versus Vdd\_d plot.

integrator is employed to set the operating frequency of the MDLL. The integer and fractional portions of the frequency multiplication factor are set by the INT $\langle 7:0 \rangle$ and FRAC $\langle 1:0 \rangle$ control signals, respectively. A 5:1 multiplexer (MUX) and selection logic block in the fractional FD periodically selects one out of five phases of the DCO, without creating a glitch during phase transition, to achieve the desired fractional frequency ratio at multiples of 1/5. After frequency locking, the integrator output is stored and the feedback path is disabled, turning on the phase locking path. A DFF in the phase locking path acts as a digital 1-b subsampler. It directly subsamples the high-frequency DCO output with the input reference clock and adjusts the DCO frequency by increasing or decreasing the DLF codes. Upon phase lock, the reference and the DCO rising edges appear within the time window of the APD (i.e., when any one of S0-S4 is 1) and the reference spur cancellation circuit cancels any SPO present between the

reference and the DCO phase by tuning the 6-b DTC delay. Once the DTC codes settle and the SPO is canceled, this loop is disabled, storing the DTC codes. The reference injection path is then turned on for the MDLL operation. Note that the phase-locking path remains active to track any phase drift of the DCO due to PVT variations.
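The fractional ratio set in the frequency-locking path relies on periodic rotation among the five DCO phases. The arithmetic behind this can be sketched as follows, under the assumption that the rising edges of phase $i$ occur at $iT/5$ plus integer multiples of $T$ and that the selected phase advances by a fixed number of steps per reference cycle. The actual phase ordering in the ring and the mapping from the FRAC control bits to the step count are not specified here, so `frac_steps` and both function names are hypothetical.

```python
def selected_phase_sequence(frac_steps, n_phases=5, cycles=10):
    """Index of the DCO phase selected at each successive reference edge."""
    return [(n * frac_steps) % n_phases for n in range(cycles)]


def edges_align(n_int, frac_steps, n_phases=5, cycles=50):
    """Check that a ratio of n_int + frac_steps/n_phases keeps the selected
    phase edge aligned with every reference edge.

    Time is counted in units of T/n_phases so all arithmetic is exact.
    """
    t_ref = n_int * n_phases + frac_steps  # reference period in T/n_phases units
    for n in range(cycles):
        t = n * t_ref                        # time of the n-th reference edge
        phase = (n * frac_steps) % n_phases  # phase selected for this edge
        # Assumed: phase i has rising edges at i + j*n_phases, integer j.
        if (t - phase) % n_phases != 0:
            return False
    return True


sequence = selected_phase_sequence(frac_steps=1, cycles=6)
aligned = edges_align(n_int=8, frac_steps=1)
```

Advancing one phase step per reference cycle corresponds to a multiplication factor of the integer part plus 1/5 in this model, and the same check holds for the other multiples of 1/5.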

The implementation of the multiplexed ring-DCO that realigns the DCO phase with the reference and the fractional FD in the frequency-locking path are discussed in the following.

#### A. Reference Realigned DCO

Fig. 11(a) shows the schematic of the reference realigned DCO. Each stage of the five-stage ring oscillator consists of an inverter and an MUX. When the MUX selection goes to 1, the clean edge of the reference is inserted into the ring oscillator path. Since the fractional N is generated by the periodic



Fig. 10. Block diagram of the subsampling fractional-N digital MDLL with the proposed zero-offset aperture PD-based spur cancellation loop and *in situ* SPO measurement circuit.



Fig. 11. (a) Reference realigned DCO schematic with distributed switched-capacitor branches for linear frequency tuning. (b) Measured DCO frequency verifying linear tuning characteristics.

rotation of the DCO phases for phase detection, the appropriate DCO phase needs to be replaced by the reference. For example, when the $\Phi 0$ phase is selected by the 5:1 MUX that feeds the 1-b subsampling PD (SSPD), S0 becomes 1 for a short duration, replacing $\Phi 0$ with the reference in the DCO loop. The same signals (S0–S4) that enable the APD are also used here for MUX selection. Each inversion stage of the ring oscillator consists of 16 parallel tristate inverters enabled by the coarse tuning codes (Coarse $\langle 3:0 \rangle$) to achieve a wide tuning range. 10-b binary switched-capacitor branches are used for fine frequency tuning.

The frequency resolution is improved by utilizing the drain junction of a minimum sized pMOS transistor as a unit switched-capacitor element [2]. All 1024 such elements are uniformly distributed across the five-inverter stages to achieve good frequency linearity. A completely symmetric layout strategy is also incorporated to minimize device-to-device mismatch. Measured frequency tuning characteristic in Fig. 11(b) verifies the high linearity of the DCO while achieving 65-kHz/LSB resolution. The APD compares the phase of the reference at the point of injection into the DCO loop (REF<sub>i</sub>)



Fig. 12. Fractional FD with glitch free DCO phase transition for precise frequency locking. The MUX selection logic ensures no glitches are present in the DCO output.



Fig. 13. Measured error rate from the *in situ* offset detection circuit in MDLL mode (before and after spur cancellation), and in PLL mode. Reference spur for an 800-MHz clock using a 100-MHz reference is also shown.

with the DCO internal phases ($\Phi 0-\Phi 4$), and thereby any delay in the reference injection path is taken into account by the DTC code during reference spur cancellation. The replica path for the reference matches the rise time of REF<sub>i</sub> with $\Phi 0-\Phi 4$, so that the APD can precisely detect the offset without any dependence on its threshold crossing. The power supply noise sensitivity is minimized with an on-chip low dropout regulator (LDO) for the DCO supply. Although the supply noise within the PLL bandwidth can be automatically tracked by the loop itself, high-frequency noise beyond the PLL bandwidth is suppressed by the LDO. Therefore, a higher LDO bandwidth (about ten times the



Fig. 14. Measured spur levels over supply variation to verify the effectiveness of the proposed spur cancellation loop.



Fig. 15. Measured output spectrum and MDLL fractional spur before and after reference spur cancellation.



Fig. 16. Measured phase noise in PLL and MDLL mode of operation. MDLL shows 16 and 9 dB lower phase noise than PLL at 100-kHz offset frequency in integer and fractional mode, respectively.

PLL bandwidth) is essential for better power supply noise suppression.

#### B. Fractional Frequency Detector

The fractional FD for a divider-less PLL is implemented in [17] using a high-resolution TDC and a counter to detect the fractional and integer portions, respectively. The TDC improves the frequency resolution but increases the design complexity and power consumption. Fractional frequency detection in [4] and [9] is performed using a walking-one phase selector and a fractional sampler, respectively. In this paper, as shown in Fig. 12, a wide detection range edge counter [18] counts the number of DCO rising edges (DCO\_OUT) within a reference period and a 5:1 MUX selects DCO phases periodically to generate different fractions. The edge counter comprises an 8-b full adder-based high-speed synchronous counter, triggered by the DCO rising edges. The counter outputs (A7–A0) are sampled and stored in every reference cycle. In order to avoid metastability, the reference clock (REF) is resynchronized to the falling edge of the DCO before sampling. Register 1 stores the most recent value of the counter, while register 2 stores the value of the previous cycle. The number of DCO edges in any given reference cycle is obtained by subtracting the values stored in the two registers, and it is compared with INT $\langle 7:0 \rangle$ so that under



|                                     |          | PLL mode                                 | MDLL mode |
|-------------------------------------|----------|------------------------------------------|-----------|
| Technology                          |          | CMOS 65 nm, 1.2 V                        |           |
| Output frequency                    |          | Integer: 1.4 GHz<br>Fraction: 1.4175 GHz |           |
| Frequency range                     |          | 0.2–1.45 GHz                             |           |
| DCO type                            |          | 5-stage ring oscillator                  |           |
| In-band PN<br>(dBc/Hz @ 100 kHz)    | Integer  | -92                                      | -108      |
|                                     | Fraction | -87                                      | -96       |
| Integ. RMS jitter<br>(10 kHz–10 MHz) | Integer  | 8.1 ps                                   | 2 ps      |
|                                     | Fraction | 11.7 ps                                  | 2.8 ps    |
| Power<br>(mW)                       | DCO      | 4.5                                      |           |
|                                     | Total    | 8.0                                      |           |
| FOM<br>(dB)                         | Integer  | -212.8                                   | -225      |
|                                     | Fraction | -209                                     | -222      |
| Area                                | Core     | 180 µm × 300 µm                          |           |
|                                     | Total    | 300 µm × 400 µm                          |           |

Fig. 17. Test chip micrograph and performance summary.

|                                         | This<br>Work                      | Song [20]<br>ISSCC'15              | Deng [19]<br>ISSCC'15             | Marucci [7]<br>ISSCC'14           | Liu [4]<br>ISSCC'14             | Jang [3]<br>ISSCC'13               | Park [9]<br>ISSCC'12            |
|-----------------------------------------|-----------------------------------|------------------------------------|-----------------------------------|-----------------------------------|---------------------------------|------------------------------------|---------------------------------|
| Architecture                            | MDLL<br>w/o divider               | FF-PLL\*<br>w/ divider             | Soft IL-PLL\*\*<br>w/ divider     | MDLL<br>w/ divider                | PLL<br>w/ divider               | PLL<br>w/ divider                  | IL-PLL\*\*<br>w/o divider       |
| Process                                 | 65nm                              | 14nm                               | 65nm                              | 65nm                              | 20nm                            | 28nm                               | 65nm                            |
| Output frequency<br>/Range (GHz)        | 1.4175<br>/(0.2 –1.45)            | 1<br>/(0.032–2)                    | 1.5222<br>/(0.8–1.7)              | 1.651<br>/(1.6–1.9)               | 1.2487<br>/(0.025–1.6)          | 0.962<br>/(0.032 – 2)              | 0.581<br>/(0.58–0.611)          |
| Ref. frequency<br>(MHz)                 | 87.5                              | 32                                 | 380                               | 50                                | 25                              | 30                                 | 32                              |
| Power (mW)                              | 8                                 | 2.1                                | 3                                 | 3                                 | 2.5                             | 5.3                                | 10.5                            |
| Spur (dBc)                              | -45                               | N/A                                | -63                               | -47                               | N/A                             | N/A                                | N/A                             |
| Freq. resolution                        | 17.5MHz                           | 2MHz                               | N/A                               | 0.19kHz                           | 0.78MHz                         | 0.94MHz                            | 1MHz                            |
| Intg. RMS jitter<br>(ps)<br>Intg. range | 2.8 (0.39%)<br>(10kHz –<br>10MHz) | 18.8 (1.88%)<br>(1kHz –<br>100MHz) | 3.6 (0.55%)<br>(1kHz –<br>100MHz) | 1.4 (0.23%)<br>(30kHz –<br>30MHz) | 28 (3.5%)<br>(20kHz –<br>40MHz) | 19.3 (1.85%)<br>(20kHz –<br>40MHz) | 8 (0.46%)<br>(100Hz –<br>40MHz) |
| FoM\*\*\* (dB)                          | -222                              | -211                               | -224                              | -232                              | -207                            | -207                               | -211                            |
| Area (mm²)                              | 0.054                             | 0.009                              | 0.048                             | 0.4                               | 0.012                           | 0.026                              | 0.083                           |

\*FF = feed-forward. \*\*IL = injection-locked. \*\*\*FoM = $20\log_{10}(\sigma/1\,\text{s}) + 10\log_{10}(P/1\,\text{mW})$.

Fig. 18. Performance comparison with other state-of-the-art fractional-N ring oscillator-based frequency synthesizers.

frequency-locked condition, the 8-b output ($D_{OUT}$) settles to 0. FRAC $\langle 1:0 \rangle$ controls the fractional part by changing the order of the DCO phase selection. As an example, to generate a fraction of 1/5, $\Phi_0$ is selected in the first reference cycle, $\Phi_1$ in the second, $\Phi_2$ in the third, and so on. However, during a DCO phase transition, unwanted glitches can appear, as evident from the timing diagram in Fig. 12. These glitches alter the counter output, locking the loop to an undesired frequency. To avoid this, the phase transition must happen when both phases are at the same logic level (both 0 or both 1). For example, during the transition from $\Phi_0$ to $\Phi_1$, the MUX selection (SEL) should change between the rising edge of $\Phi_1$ and the falling edge of $\Phi_0$. A "MUX Selection Logic" block resynchronizes REF with the appropriate DCO phase and generates SEL using a 5-b ring counter.
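The edge-counting and phase-rotation behavior described above can be illustrated with a short behavioral model. The following Python sketch is a simplified illustration, not the implemented circuit: it assumes ideal, equally spaced DCO phases, ignores MUX glitches and metastability, and models the FRAC setting as a hypothetical `frac_steps` parameter (the number of one-fifth-period phase steps applied per reference cycle). All function and variable names are illustrative.

```python
from fractions import Fraction
from math import floor


def fd_outputs(ratio, int_word, frac_steps, n_cycles, n_phases=5, counter_bits=8):
    """Behavioral model of the edge-counter frequency detector.

    ratio      : f_DCO / f_REF as a Fraction (assumed constant)
    int_word   : integer part of the target ratio (models INT)
    frac_steps : phase steps per reference cycle (hypothetical model of FRAC)
    Returns the detector output (edges counted minus int_word) per cycle.
    """
    modulus = 1 << counter_bits      # the 8-b counter wraps at 256
    reg2 = 0                         # counter value sampled in the previous cycle
    outputs = []
    for k in range(1, n_cycles + 1):
        # Accumulated phase (in DCO cycles) of the MUX output at reference
        # edge k: each phase step delays the selected clock by 1/n_phases cycle.
        theta = ratio * k - Fraction(frac_steps * k, n_phases)
        reg1 = floor(theta) % modulus          # counter value sampled now
        edges = (reg1 - reg2) % modulus        # wrap-safe register subtraction
        outputs.append(edges - int_word)
        reg2 = reg1
    return outputs


# 1.4175 GHz from an 87.5-MHz reference: ratio 16.2, INT = 16, fraction 1/5
locked = fd_outputs(Fraction(14175, 875), int_word=16, frac_steps=1, n_cycles=20)
print(locked)
```

Under these assumptions, the modeled output is 0 in every reference cycle when the DCO runs at exactly 16.2 times the reference, including the cycle in which the 8-b counter wraps. A DCO that runs faster than the target produces a positive average output.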

### VI. MEASUREMENT RESULTS

The proposed MDLL is realized in a 1.2-V, 65-nm LP CMOS process. Fig. 13 shows the measured error rate plot obtained from the *in situ* detection block by varying the programmable delay, $T_p$, for an output frequency of 800 MHz while using a 100-MHz input reference. When the spur cancellation loop is inactive, the error plot for the S1 selection is skewed by 131 ps relative to the others. This corresponds to an SPO of 115 ps. Upon activation of the spur cancellation loop, the skew is reduced to only 6 ps, which is contributed by the small offset present in the APD due to process mismatch. As expected, the error rate plot for the PLL does not show any noticeable skew. These time-domain measurement results are compared with the frequency-domain reference spur measured at 900 MHz in the output spectrum, which shows -23 and -47 dBc of reference spur before and after cancellation, respectively, while it is -48 dBc in PLL mode. The spur calculated theoretically from the SPO measured by the *in situ* detection circuit and (11) is -20.7 and -47.5 dBc before and after cancellation, respectively, which is close to the measured spur. To verify the effectiveness of the spur cancellation circuit, the output spur is measured while varying the MDLL supply, and the results are shown in Fig. 14.

Output frequency spectra are shown in Fig. 15 (left), comparing the performance of the PLL and MDLL operation modes at 1.4175 GHz for an input reference of 87.5 MHz. The output reference spur at multiples of $f_{\text{REF}}/5 = 17.5$ MHz is plotted in Fig. 15 (right) before and after reference spur cancellation. The reference spur is -35 dBc before cancellation and reduces to -45 dBc after cancellation. Fig. 16 shows the measured phase-noise plot for both integer and fractional modes at output frequencies of 1.4 and 1.4175 GHz, respectively. At a 100-kHz offset, the MDLL shows about 16 and 9 dB lower phase noise than a PLL under identical operating conditions in integer and fractional modes, respectively. The increase in phase noise in fractional-N mode is due to the imbalance among the multiple DCO phases. In addition, the presence of SPO in the MDLL changes the DCO operating frequency, which translates into phase errors during periodic phase rotation. Fig. 17 shows the chip micrograph with a performance summary table. The overall chip area is 0.12 mm<sup>2</sup>, of which the MDLL/PLL core is only 0.054 mm<sup>2</sup>. The output frequency range is 0.2–1.45 GHz with a resolution of 17.5 MHz when an 87.5-MHz reference is used. Total core power consumption is 8 mW at 1.4 GHz. Fig. 18 compares the performance of this paper with other state-of-the-art inductor-less fractional-N frequency synthesizers.
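As a quick consistency check, the figure of merit defined in the footnote of Fig. 18 can be evaluated from the reported jitter and power. The sketch below assumes base-10 logarithms, RMS jitter expressed in seconds, and power expressed in watts; the function name `fom_db` is illustrative only.

```python
import math


def fom_db(jitter_rms_s, power_w):
    """FoM = 20*log10(sigma / 1 s) + 10*log10(P / 1 mW), in dB."""
    jitter_term = 20.0 * math.log10(jitter_rms_s / 1.0)   # sigma normalized to 1 s
    power_term = 10.0 * math.log10(power_w / 1e-3)        # P normalized to 1 mW
    return jitter_term + power_term


# Fractional-N MDLL mode: 2.8-ps RMS jitter at 8-mW total power
print(round(fom_db(2.8e-12, 8e-3), 1))

# Integer-N MDLL mode: 2-ps RMS jitter at 8-mW total power
print(round(fom_db(2.0e-12, 8e-3), 1))
```

With the values from Fig. 17, the two results come out close to the -222 and -225 dB listed for the fractional and integer MDLL modes, respectively.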

### VII. CONCLUSION

A fractional-N subsampling digital MDLL is presented that eliminates the reference spur utilizing a DTC and a zero-offset APD. An *in situ* detection circuit measures the SPO of the MDLL very precisely in the time domain without requiring any high-speed off-chip measurement setup. This paper also addresses the reference spur issue in an MDLL-based clock generation circuit, deriving a mathematical model to estimate the reference spur due to SPO. A wide frequency range ring DCO achieves good linearity by utilizing uniformly distributed switched-capacitor elements for frequency tuning and a completely symmetric layout design approach. Finally, the proposed concepts are verified with the measurement results obtained from a prototype chip implemented in a 65-nm LP CMOS technology. The phase noise measurement shows about 9-dB additional noise suppression in the MDLL compared with a PLL at 1.4175 GHz.

### REFERENCES

- [1] D. Tasca, M. Zanuso, G. Marzin, S. Levantino, C. Samori, and A. L. Lacaita, "A 2.9–4.0-GHz fractional-N digital PLL with bang-bang phase detector and 560-fs<sub>rms</sub> integrated jitter at 4.5-mW power," *IEEE J. Solid-State Circuits*, vol. 46, no. 12, pp. 2745–2758, Dec. 2011.
- [2] N. August, H.-J. Lee, M. Vandepas, and R. Parker, "A TDC-less ADPLL with 200-to-3200 MHz range and 3 mW power dissipation for mobile SoC clocking in 22 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, Feb. 2012, pp. 246–247.

- [3] T.-K. Jang *et al.*, "A 0.026 mm<sup>2</sup> 5.3 mW 32-to-2000 MHz digital fractional-N phase locked-loop using a phase-interpolating phase-to-digital converter," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, Feb. 2013, pp. 254–255.
- [4] J. Liu *et al.*, "A 0.012 mm<sup>2</sup> 3.1 mW bang-bang digital fractional-N PLL with a power-supply-noise cancellation technique and a walking-one-phase-selection fractional frequency divider," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, Feb. 2014, pp. 268–269.
- [5] B. M. Helal, M. Z. Straayer, G.-Y. Wei, and M. H. Perrott, "A highly digital MDLL-based clock multiplier that leverages a self-scrambling time-to-digital converter to achieve subpicosecond jitter performance," *IEEE J. Solid-State Circuits*, vol. 43, no. 4, pp. 855–863, Apr. 2008.
- [6] A. Elshazly, R. Inti, B. Young, and P. K. Hanumolu, "Clock multiplication techniques using digital multiplying delay-locked loops," *IEEE J. Solid-State Circuits*, vol. 48, no. 6, pp. 1416–1428, Jun. 2013.
- [7] G. Marucci, A. Fenaroli, G. Marzin, S. Levantino, C. Samori, and A. L. Lacaita, "A 1.7 GHz MDLL-based fractional-N frequency synthesizer with 1.4ps RMS integrated jitter and 3 mW power using a 1b TDC," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, Feb. 2014, pp. 360–361.
- [8] P. C. Maulik and D. A. Mercer, "A DLL-based programmable clock multiplier in 0.18-μm CMOS with -70 dBc reference spur," *IEEE J. Solid-State Circuits*, vol. 42, no. 8, pp. 1642–1648, Aug. 2007.
- [9] P. Park, J. Park, H. Park, and S. Cho, "An all-digital clock generator using a fractionally injection-locked oscillator in 65 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, Feb. 2012, pp. 336–337.
- [10] D. Park and S. Cho, "A 14.2 mW 2.55-to-3 GHz cascaded PLL with reference injection and 800 MHz delta-sigma modulator in 0.13 μm CMOS," *IEEE J. Solid-State Circuits*, vol. 47, no. 12, pp. 2989–2998, Dec. 2012.
- [11] A. Musa, W. Deng, T. Siriburanon, M. Miyahara, K. Okada, and A. Matsuzawa, "A compact, low-power and low-jitter dual-loop injection locked PLL using all-digital PVT calibration," *IEEE J. Solid-State Circuits*, vol. 49, no. 1, pp. 50–60, Jan. 2014.
- [12] S. Kundu, B. Kim, and C. H. Kim, "A 0.2-to-1.45 GHz subsampling fractional-N all-digital MDLL with zero-offset aperture PD-based spur cancellation and *in-situ* timing mismatch detection," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, Jan./Feb. 2016, pp. 326–327.
- [13] S. L. J. Gierkink, "Low-spur, low-phase-noise clock multiplier based on a combination of PLL and recirculating DLL with dual-pulse ring oscillator and self-correcting charge pump," *IEEE J. Solid-State Circuits*, vol. 43, no. 12, pp. 2967–2976, Dec. 2008.
- [14] X. Gao, E. A. M. Klumperink, M. Bohsali, and B. Nauta, "A low noise sub-sampling PLL in which divider noise is eliminated and PD/CP noise is not multiplied by N<sup>2</sup>," *IEEE J. Solid-State Circuits*, vol. 44, no. 12, pp. 3253–3263, Dec. 2009.
- [15] Z. Ru, P. Geraedts, E. Klumperink, X. He, and B. Nauta, "A 12 GHz 210 fs 6 mW digital PLL with sub-sampling binary phase detector and voltage-time modulated DCO," in *IEEE Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2013, pp. 194–195.
- [16] D. Jiao, B. Kim, and C. H. Kim, "Design, modeling, and test of a programmable adaptive phase-shifting PLL for enhancing clock data compensation," *IEEE J. Solid-State Circuits*, vol. 47, no. 10, pp. 2505–2516, Oct. 2012.
- [17] E. Temporiti, C. Weltin-Wu, D. Baldi, R. Tonietto, and F. Svelto, "A 3 GHz fractional all-digital PLL with a 1.8 MHz bandwidth implementing spur reduction techniques," *IEEE J. Solid-State Circuits*, vol. 44, no. 3, pp. 824–834, Mar. 2009.
- [18] V. Kratyuk, P. K. Hanumolu, U.-K. Moon, and K. Mayaram, "Frequency detector for fast frequency lock of digital PLLs," *Electron. Lett.*, vol. 43, no. 1, pp. 1–2, Jan. 2007.
- [19] W. Deng *et al.*, "A 0.048 mm<sup>2</sup> 3 mW synthesizable fractional-N PLL with a soft injection-locking technique," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, Feb. 2015, pp. 252–253.
- [20] M. Song, T. Kim, J. Kim, W. Kim, S.-J. Kim, and H. Park, "A 0.009 mm<sup>2</sup> 2.06 mW 32-to-2000 MHz 2nd-order $\Delta\Sigma$ analogous bang-bang digital PLL with feed-forward delay-locked and phase-locked operations in 14 nm FinFET technology," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, Feb. 2015, pp. 266–267.



**Somnath Kundu** (S'13) received the B.E. degree in electronics and telecommunication engineering from Jadavpur University, Kolkata, India, in 2008, and the M.S. (Research) degree in electrical engineering from IIT Delhi, New Delhi, India, in 2012. He is currently pursuing the Ph.D. degree in electrical engineering with the University of Minnesota, Minneapolis, MN, USA, with a focus on digital intensive mixed-signal circuit design, such as clock generators, analog-to-digital converters, and voltage regulators.

He was an Analog Design Engineer with STMicroelectronics, Greater Noida, India, from 2008 to 2012, where he was involved in transmitter, phase-locked loop, and bias design for different high-speed serial link IPs. He was an Intern with Xilinx and Rambus in 2014 and 2015, respectively. He also joined the Circuit Research Lab, Intel, Hillsboro, OR, USA, as an Intern, in 2015.

Mr. Kundu was a recipient of the Best Student Paper Award in the 2013 IEEE International Conference on VLSI Design.



**Bongjin Kim** (S'03–M'10) received the B.S. and M.S. degrees in electrical engineering from the Pohang University of Science and Technology, Pohang, South Korea, in 2004 and 2006, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Minnesota, Minneapolis, MN, USA, in 2014.

He was with System LSI, Samsung Electronics, Yongin, South Korea, from 2006 to 2010, where he performed research on the clock generator circuits for high-speed serial interface PHY transceivers.

He joined Wireless Business, Texas Instruments, Dallas, TX, USA, as an Intern, in 2012. He joined the Mixed-Signal Communication IC Design Group, IBM T. J. Watson Research Center, Yorktown Heights, NY, USA, as a Research Summer Intern, in 2013. He was an Engineering Intern and a Senior Technical Staff Member with Memory and Interface Division, Rambus Inc., Sunnyvale, CA, USA, from 2014 to 2016, where he was involved in the research of 28/56 G serial link circuits and microarchitectures. He is currently a Post-Doctoral Research Fellow in electrical engineering with Stanford University, Stanford, CA, USA. His current research interests include the development of next-generation high-speed communication circuits, systems, and their design methodologies.



**Chris H. Kim** (M'04–SM'10) received the B.S. and M.S. degrees from Seoul National University, Seoul, Korea, and the Ph.D. degree from Purdue University, Lafayette, IN, USA.

He was with Intel Corporation, where he performed research on variation-tolerant circuits, on-die leakage sensor design, and crosstalk noise analysis. He joined the Electrical and Computer Engineering Faculty, University of Minnesota, Minneapolis, MN, USA, in 2004, where he is currently a Professor. He has authored or co-authored over 200 journal and conference papers. His current research interests include digital, mixed-signal, and memory circuit design in silicon and nonsilicon (organic TFT and spin) technologies.

Dr. Kim was a recipient of the SRC Technical Excellence Award, the Council of Graduate Students Outstanding Faculty Award, the NSF CAREER Award, a McKnight Foundation Land-Grant Professorship, the 3M Non-tenured Faculty Award, DAC/ISSCC Student Design Contest Awards, the IBM Faculty Partnership Awards, the IEEE Circuits and Systems Society Outstanding Young Author Award, and the ISLPED Low Power Design Contest Awards. He has served as the Technical Program Committee Chair for the 2010 International Symposium on Low Power Electronics and Design.