# 5 Gb/s Burst-Mode Clock Phase Aligner with (64, 57) Hamming Codes for GPON Applications

Ming Zeng, Bhavin J. Shastri, Nicholas Zicha, Michael Vander Schueren, and David V. Plant

Photonic Systems Group, Department of Electrical and Computer Engineering, McGill University, 3480 University Street, Montreal, Quebec, Canada, H3A 2A7. Tel: 514-398-8053, Fax: 514-398-5208, E-mail: ming.zeng@mail.mcgill.ca

Abstract—We demonstrate a 5 Gb/s burst-mode receiver (BMRx) featuring automatic phase acquisition using a clock phase aligner, and forward-error correction using (64, 57) Hamming codes. This BMRx provides instantaneous (0-bit) phase acquisition with packet-loss ratio  $<10^{-6}$  and bit-error rate  $<10^{-10}$  for any phase step  $(\pm 2\pi$  rads) between consecutive packets. Our design is based on an oversampling local oscillator operated at twice the bit rate and a phase picking algorithm.

*Index Terms*—Burst-mode receiver, clock and data recovery, clock phase aligner, forward-error correction, passive optical networks.

## I. INTRODUCTION

Passive optical networks (PONs) are an emerging multiaccess network technology that provides a low-cost method of deploying fiber-to-the-home. Fig. 1 shows an example of a gigabit PON (GPON). In the upstream direction, the network is multipoint-to-point: using time-division multiple access (TDMA), multiple optical network units (ONUs) transmit bursty data to the optical line terminal (OLT). Due to optical path differences, packets can vary in phase and amplitude. To deal with these variations, the OLT requires a burst-mode receiver (BMRx). The BMRx front-end is responsible for amplitude recovery, whereas clock and data recovery (CDR) is performed with phase acquisition by a clock phase aligner (CPA). This paper focuses on the CPA aspect of the BMRx.



Fig. 1. A generic PON showing our work in context (APD: avalanche photodiode; TIA: transimpedance amplifier; BM-LA: burst-mode limiting amplifier; Des: deserializer; DSP: digital signal processing).

The most important characteristic of the burst-mode CPA (BM-CPA) is its phase acquisition time, which must be as short as possible. In [1], the authors proposed a BM-CPA at 622 Mb/s by making use of an oversampling CDR operated at twice the bit rate and a phase picking algorithm. In this paper, we demonstrate a 5 Gb/s BM-CPA that achieves instantaneous (0-bit) phase acquisition for any phase step ( $\pm 2\pi$  rads) between consecutive packets, with packet loss ratio (PLR)  $< 10^{-6}$  and bit-error rate (BER)  $< 10^{-10}$ . In addition, we achieve this in a much more cost-effective manner by employing a simple local oscillator (LO), thus eliminating the need for complex and expensive CDR circuits based on phase-locked loops (PLLs). Our receiver also features forward-error correction (FEC) using (64, 57) Hamming codes.

## II. BURST-MODE CLOCK PHASE ALIGNER

#### A. Overall Design

A block diagram of the BM-CPA is shown in Fig. 2. Either an LO or a CDR can be used, to generate a clock signal or to recover the clock from the incoming bursty data, respectively. The CDR/LO is followed by a 1:16 deserializer from Maxim-IC (MAX3995). The lower-rate parallel data is then brought onto a Virtex IV field-programmable gate array (FPGA) from Xilinx for further processing. On the board, it is first necessary to further parallelize the data and clock to a lower frequency that ensures proper synchronization and better stability of these signals before they are sent to the CPA for automatic phase acquisition. Thus, an integrated double-data-rate (DDR) 1:8 deserializer is implemented on the FPGA. The CPA can be turned ON or bypassed to operate in different modes for experimental purposes. The realigned data is then sent to the (64, 57) Hamming decoder, which can be turned ON for BER measurements with FEC. The decoder is followed by a custom-designed FPGA-based BER tester (BERT), which selectively performs BER and PLR measurements on the payload of the packets. Note that, while a conventional BERT can be used to make the BER measurements, it does not support PLR measurements on discontinuous, bursty data.



Fig. 2. Block diagram of BM-CPA.

#### B. Data Deserialization

The main challenge in designing gigabit-capable receivers based on FPGAs is the limited processing speed of digital logic on commercially available FPGAs. For example, the digital clock manager (DCM) module<sup>1</sup> on the FPGA, in essence a digital PLL, is limited to an operating range of 24 MHz to 500 MHz. The upper limit is 20 times lower than the targeted 10 Gb/s sampling rate ( $2\times$  oversampling of the 5 Gb/s data). Thus, two stages of deserialization are employed.

<sup>1</sup>A key design component that provides multiple phases of a source clock, and a zero propagation delay with low clock skew between the output clock signals distributed throughout the board.

The first deserialization stage is performed by the off-board 1:16 deserializer. The oversampled 10 Gb/s data and clock are deserialized to 34 parallel lines (16 differential data pairs and one differential clock pair), each at 625 Mb/s. These signals are then brought onto the FPGA board through low-voltage differential signaling. However, the 625 MHz clock is 1.25 times the maximum operating frequency of the DCM, which is 500 MHz. Thus, a clock divider is used to reduce the frequency of the received clock to 312.5 MHz. This clock signal is then fed to a DCM block for further clock distribution throughout the system.

The second deserialization stage is based on DDR signaling, and it is accomplished by a 1:8 deserializer designed and implemented on the FPGA. It uses the 312.5 MHz DCM output clock to sample the 625 Mb/s incoming data at both the rising and the falling clock edges. In this way, each data signal is separated into two data lines by a half-rate clock signal. The same clock is then used to demultiplex these two lines of data into an 8-bit data path. In summary, the 16 input data signals are deserialized to 128 data lines at  $\sim$ 78 Mb/s, which is eight times lower than 625 Mb/s. The advantage of this method is that the clock signal is well within the 24 MHz to 500 MHz operating range of the DCM, guaranteeing system synchronization while keeping the same harmonic content of the clock and data lines.
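The rate bookkeeping of the two deserialization stages can be checked with a few lines of arithmetic. The following is a minimal sketch only; the function name `deserialization_plan` and the dictionary keys are hypothetical and not part of the receiver design.

```python
from fractions import Fraction


def deserialization_plan(bit_rate_bps=5_000_000_000, oversampling=2):
    """Line counts and per-line rates for the two deserialization stages.

    The numbers follow the text (1:16 off-board stage, clock divided by 2,
    1:8 DDR stage on the FPGA); the names are illustrative assumptions.
    """
    sample_rate = Fraction(bit_rate_bps * oversampling)  # 10 Gb/s sampled stream
    stage1_lines = 16                                     # off-board 1:16 deserializer
    stage1_rate = sample_rate / stage1_lines              # 625 Mb/s per data line
    dcm_clock = stage1_rate / 2                           # 625 MHz clock divided to 312.5 MHz
    stage2_lines = stage1_lines * 8                       # 1:8 DDR deserializer per line
    stage2_rate = stage1_rate / 8                         # 78.125 Mb/s per data line
    return {
        "stage1_rate_bps": stage1_rate,
        "dcm_clock_hz": dcm_clock,
        "stage2_lines": stage2_lines,
        "stage2_rate_bps": stage2_rate,
    }


plan = deserialization_plan()
for key, value in plan.items():
    print(key, float(value))
```

The product of the final line count and the per-line rate equals the 10 Gb/s sampled stream, so no samples are dropped between stages.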

#### C. Clock and Phase Alignment

The CPA is based on a simple, fast, and effective algorithm. Since the data is sampled twice per bit, the odd and even samples (O and E, respectively, in Fig. 2), taken on alternate (odd and even) rising clock edges, are nominally identical. The odd samples are forwarded to path O and the even samples to path E. Each path has a byte synchronizer that is responsible for detecting the delimiter; it uses a payload detection algorithm to look for a preprogrammed delimiter. The two byte synchronizers attempt to detect the delimiter on the odd and even samples of the data, respectively. Regardless of the phase step between two consecutive packets ( $\pm 2\pi$  rads), at least one clock edge (odd or even) yields an accurate sample. The phase picker then uses feedback from the byte synchronizers to select the correct path of the two. The realigned data is then sent to the BERT.
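The phase picking idea can be illustrated in software. The sketch below is a behavioral model under simplifying assumptions, not the FPGA implementation: the function names (`split_paths`, `find_delimiter`, `phase_pick`) are hypothetical, the delimiter search is a plain linear scan, and path O is preferred when both paths match.

```python
def split_paths(samples):
    """Split a 2x-oversampled bit stream into the odd (O) and even (E) paths."""
    return samples[0::2], samples[1::2]


def find_delimiter(path, delimiter):
    """Return the index of the first bit after the delimiter, or None."""
    n = len(delimiter)
    for i in range(len(path) - n + 1):
        if path[i:i + n] == delimiter:
            return i + n
    return None


def phase_pick(samples, delimiter):
    """Select the path on which a byte synchronizer finds the delimiter.

    Returns (path_name, payload_bits); path_name is None if neither path
    contains the delimiter.
    """
    path_o, path_e = split_paths(samples)
    for name, path in (("O", path_o), ("E", path_e)):
        start = find_delimiter(path, delimiter)
        if start is not None:
            return name, path[start:]
    return None, []


# Toy example: the delimiter and payload patterns are arbitrary placeholders.
delimiter = [1, 0, 1, 1, 0, 0, 1, 0]
payload = [1, 1, 0, 1, 0, 0, 0, 1]
samples = [b for b in delimiter + payload for _ in range(2)]  # 2x oversampling
samples[0::2] = [0] * len(samples[0::2])  # model path O sampling on a data edge
print(phase_pick(samples, delimiter))
```

In the toy example the odd samples are corrupted to model sampling at the edge of the eye, so the picker falls back to path E and still recovers the payload.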

In essence, the BM-CPA supports three modes of operation: 1) conventional mode, essentially a SONET CDR; 2) burst mode with the CDR-CPA turned ON, with the CDR locking at twice the data rate; and 3) burst mode with the LO-CPA turned ON, with the LO locking at twice the data rate. These modes of operation are useful in measuring the relative performance.

#### D. Hamming Decoder

A Hamming code is a linear error-correcting code in which parity symbols are appended to the original data for error detection and correction [2]. More specifically, a (p, k) Hamming codeword has a length of p bits, of which k bits are information bits and p-k are check bits, also known as parity bits. A conventional Hamming code has a minimum distance of 3, which allows either single error correction or double error detection, but not both. The optimal minimum odd-weight-column single error correction double error detection (SEC-DED) code is a shortened version of the Hamming code. It has better performance, cost, and reliability than the conventional Hamming code [3]. The shortened Hamming code's parity-check matrix is constructed by deleting certain columns from the conventional Hamming parity-check matrix to ensure a minimum distance of 4. Thus, single error correction and double error detection can be performed at the same time. The reconstruction of the parity-check matrix also reduces the complexity of the decoder design and allows a more cost-effective hardware implementation.

We implement (64, 57) shortened Hamming codes in the receiver design. In this case, every 57 bits of data are concatenated with 7 bits of parity to make a codeword of 64 bits in length. The check bits are computed by XORing certain bits of the original data word.

The decoding process is done in three steps: syndrome generation, mask generation, and data correction. Fig. 3 shows the design of the decoder block. A 7-bit syndrome vector is generated by XORing certain bits in the received 57-bit data with their corresponding parity bits. When a single error is present in a codeword, the generated syndrome identifies the exact location of the erroneous bit. A codeword with an even number of errors generates a syndrome with an even number of 1's, which signals the decoder to leave the data as it is, with the errors uncorrected. If an odd number of errors greater than one occurs in the data, there are cases in which the generated syndrome pattern is outside the columns of the code's parity-check matrix. In these cases, no error correction is performed. In the cases where the generated syndrome pattern coincides with a column of the parity-check matrix, the decoder is forced to perform a miscorrection. However, the probability of occurrence of the latter case is very low [3]. During the mask generation step, the syndrome is used to create a mask through a look-up table (LUT). A mask is a 57-bit binary vector with value '1' at the erroneous bit location and value '0' for all other bits. During the data correction step, it is XORed with the received data to invert the erroneous bit.
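The three decoding steps can be sketched in software as follows. The parity-check matrix used in the receiver is not listed in the text, so this sketch assumes one plausible odd-weight-column construction: the 57 data columns are all 7-bit vectors of weight 3, 5, or 7, and the 7 check-bit columns are the weight-1 vectors. The column order and all names (`DATA_COLS`, `MASK_LUT`, `encode`, `decode`) are hypothetical. With this particular choice every odd-weight syndrome matches some column, so the "outside the columns" case described above does not arise in the sketch.

```python
from itertools import combinations

R, K = 7, 57  # check bits and data bits of the (64, 57) code

# Assumed columns for the 57 data bits: 7-bit vectors of weight 3, 5, or 7
# (35 + 21 + 1 = 57). The weight-1 vectors belong to the 7 check bits.
DATA_COLS = [sum(1 << i for i in ones)
             for w in (3, 5, 7)
             for ones in combinations(range(R), w)]

# Mask generation LUT: syndrome -> 57-bit mask with a single '1'.
MASK_LUT = {col: 1 << pos for pos, col in enumerate(DATA_COLS)}


def parity_of(data):
    """XOR the columns of all data bits that are set (data is a 57-bit int)."""
    parity = 0
    for pos, col in enumerate(DATA_COLS):
        if (data >> pos) & 1:
            parity ^= col
    return parity


def encode(data):
    """Return (data, parity) for a 57-bit data word."""
    return data, parity_of(data)


def decode(data, parity):
    """Syndrome generation, mask generation, and data correction."""
    syndrome = parity_of(data) ^ parity
    if syndrome == 0:
        return data, "clean"
    if bin(syndrome).count("1") % 2 == 0:
        return data, "detected"        # even number of errors: leave data as is
    mask = MASK_LUT.get(syndrome, 0)   # 0 when the error is in a check bit
    return data ^ mask, "corrected"


word, par = encode(0x123456789ABCDE)
print(decode(word ^ (1 << 20), par) == (word, "corrected"))
```

A dictionary stands in for the hardware LUT here; in the decoder of Fig. 3 the same mapping would be realized in logic.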



Fig. 3. Hamming decoder circuit diagram. (Reg: register)

## III. RESULTS AND DISCUSSION

Fig. 4 shows the experimental setup to measure the phase acquisition time of the BM-CPA in the three modes of operation. Bursty upstream PON traffic is generated by adjusting the phase  $-2\pi \le \Delta \varphi \le +2\pi$  rads, with 1-ps resolution, between alternating packets from two programmable ports of a pattern generator, which are then concatenated via a radio frequency power combiner (PC). These packets are formed from preamble bits, 36 delimiter bits,  $2^{15} - 1$  payload bits, and 48 comma bits. The preamble field is used to perform amplitude and phase recovery. We define the phase acquisition time as the number of preamble bits needed to achieve error-free operation (PLR  $< 10^{-6}$  and BER  $< 10^{-10}$ ) for any phase step  $|\Delta \varphi| \leq 2\pi$  rads between consecutive packets. The delimiter is a unique pattern that indicates the start of the packet and is used to perform byte synchronization. Likewise, the comma is a unique pattern that indicates the end of the payload. The payload is simply a nonreturn-to-zero  $2^{15}-1$  PRBS. The PLR and the BER are measured on the payload bits only.
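The packet structure described above can be mocked up as follows. This is an illustrative sketch only: the text does not give the PRBS generator polynomial or the delimiter and comma patterns, so the code assumes the common $x^{15} + x^{14} + 1$ polynomial and takes the 36-bit delimiter and 48-bit comma as caller-supplied placeholders. The names `prbs15` and `build_packet` are hypothetical.

```python
def prbs15(seed=0x7FFF):
    """One period (2**15 - 1 bits) of a PRBS from an assumed x^15 + x^14 + 1 LFSR."""
    state = seed & 0x7FFF
    if state == 0:
        raise ValueError("seed must be nonzero")
    bits = []
    for _ in range((1 << 15) - 1):
        new_bit = ((state >> 14) ^ (state >> 13)) & 1  # taps at stages 15 and 14
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7FFF
    return bits


def build_packet(preamble_len, delimiter, comma):
    """Concatenate preamble ("1010..."), delimiter, PRBS payload, and comma."""
    if len(delimiter) != 36 or len(comma) != 48:
        raise ValueError("expected a 36-bit delimiter and a 48-bit comma")
    preamble = [(i + 1) % 2 for i in range(preamble_len)]  # 1, 0, 1, 0, ...
    return preamble + list(delimiter) + prbs15() + list(comma)


# Placeholder patterns, not the ones used in the experiment.
packet = build_packet(0, [1, 0] * 18, [1, 1, 0, 0] * 12)
print(len(packet))
```

With a zero-length preamble, as in the burst-mode measurements, the packet consists of the delimiter, payload, and comma fields only.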



Fig. 4. Experimental setup (LPF: low-pass filter).

#### A. PLR Measurements

Fig. 5 shows the PLR performance of the system as a function of the phase difference between consecutive packets. Fig. 5(a) depicts the phase step response of the receiver with only the CDR and the CPA turned off (mode 1), for different preamble lengths at 1.25 Gb/s. The bell-shaped curve is centered at 400 ps because this is the half bit period, corresponding to the worst-case phase step ( $\pi$  rads), at which the CDR samples exactly at the edge of the eye diagram [1]. Preamble bits ("1010..." pattern) can be inserted at the beginning of the packets to help the CDR settle down and acquire lock. As the preamble length is increased, the PLR improves. We observe error-free operation (PLR <  $10^{-6}$ ) for any phase step after 28 preamble bits. However, the use of the preamble reduces the effective throughput and increases delay.

Fig. 5. PLR performance for the BM-CPA.

By switching on the burst-mode functionality of the receiver with the CPA (mode 2), as shown in Fig. 5(b), we observe error-free operation for any phase step with *no* preamble bits, allowing for instantaneous phase acquisition. We also plot the phase step response of the receiver at 2.5 Gb/s.<sup>2</sup> Again, as expected, with only the CDR, the curve is centered at 200 ps, as this is the half bit period corresponding to the worst-case phase step ( $\pi$  rads) at 2.5 Gb/s.
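The center of each bell-shaped curve follows directly from the bit rate. As a short worked check, writing $R_b$ for the bit rate, $T_b$ for the bit period, and $\Delta t_{\pi}$ for the time offset of a $\pi$-rad phase step (symbols introduced here for illustration only):

$$T_b = \frac{1}{R_b}, \qquad \Delta t_{\pi} = \frac{T_b}{2}$$

$$R_b = 1.25~\text{Gb/s}: \quad T_b = 800~\text{ps}, \quad \Delta t_{\pi} = 400~\text{ps}$$

$$R_b = 2.5~\text{Gb/s}: \quad T_b = 400~\text{ps}, \quad \Delta t_{\pi} = 200~\text{ps}$$

By the same arithmetic, the worst-case phase step at 5 Gb/s corresponds to 100 ps.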

By replacing the PLL-based CDR with the oversampling LO (mode 3), we obtain error-free operation for any phase step with no preamble bits for data rates up to 5 Gb/s, as demonstrated in Fig. 5(c). To the best of our knowledge, this is the first time that a BM-CPA has been successfully implemented without CDR circuitry. This novel design is simpler and cheaper, without any reduction in performance.

#### B. BER Measurements

Current PON systems employ Fabry-Perot (FP) lasers, a multi-longitudinal mode device, at the ONU to minimize the cost per subscriber. FP lasers provide the most cost-effective solution for meeting the PON requirements by having the optimum optical power required for a 20-km reach in a GPON uplink [4]. However, the BER performance of the optical system may be severely impaired by the mode-partition noise (MPN) of the FP laser coupled with the chromatic dispersion that exists in the transmission fiber [5].

<sup>2</sup>The CDR could not be tested for different preamble lengths at 5 Gb/s as this rate is not supported on commercially available SONET CDRs.

To study the impact of FEC on the optical budget of the GPON uplink, we plot the BER performance of the system with and without FEC, as a function of the received power as shown in Fig. 6. According to the G.984.2 standard, coding gain is defined as the difference in input power at the receiver with and without FEC at a BER =  $10^{-10}$ . With the implemented (64, 57) Hamming codes, we observe a coding gain of  $\sim$ 1.8 dB.



Fig. 6. BER performance for the BM-CPA.

It is interesting to compare the performance of the (64, 57) Hamming codes, a class of binary linear codes, with the well-known (255, 239) Reed-Solomon (RS) codes, a nonbinary subclass of multiple-error-correcting BCH codes [2]. The block-based RS codes, defined as RS(n,k), divide a codeword into n symbols of m bits each, of which k are data (uncoded) symbols. In a memoryless channel, the symbol-error rate after FEC,  $p_S^{FEC}$ , is given by [2]

$$p_S^{FEC} \approx \frac{1}{2^m - 1} \sum_{j=t+1}^{2^m - 1} j \binom{2^m - 1}{j} p_S^j (1 - p_S)^{2^m - 1 - j} \tag{1}$$

where  $t = \left\lfloor \frac{n-k}{2} \right\rfloor$  is the symbol-error correcting capability of the code,  $p_e$  is the BER before FEC, and  $p_S$  is the symbol-error rate which can be expressed, under the assumption of purely random bit errors, as

$$p_S = 1 - (1 - p_e)^m. \tag{2}$$

Again, assuming a memoryless channel and since we are using orthogonal signaling (ON-OFF keying), the BER after FEC is  $p_e^{FEC} \approx p_S^{FEC}/2$ . As shown in Fig. 6, the RS(255, 239) codes provide a coding gain of  $\sim$ 4 dB, which is 2.2 dB more than that achieved by the (64, 57) Hamming codes. This result is expected because RS codes have a better error-correcting capability than Hamming codes [2]. Note that, while Hamming codes correct bit errors, RS codes correct symbol errors. It should also be noted that the experimental results with Hamming codes have been compared with the theoretical performance of RS codes in a memoryless channel. In practice, the coding gain obtained by the RS codes in a GPON uplink will be lower because of the memory associated with the channel. Based on these initial results, we propose the use of binary codes for correcting errors that arise due to intrinsic effects of the channel (errors introduced at the receiver), in conjunction with nonbinary codes for errors that arise due to extrinsic effects of the channel (errors introduced by the fiber link). It has been shown in theory that such concatenated coding schemes achieve higher coding gains, and further relax the requirements and/or increase the optical link budget of GPONs. This work is subject to further experimentation<sup>3</sup>.
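The theoretical RS curve can be reproduced by evaluating (1) and (2) together with the approximation $p_e^{FEC} \approx p_S^{FEC}/2$. The following is a minimal numerical sketch under the same memoryless-channel assumption; the function name `rs_post_fec_ber` and its parameter names are hypothetical, and mapping the resulting BER to received optical power (as in Fig. 6) is outside its scope.

```python
from math import comb


def rs_post_fec_ber(p_e, n=255, k=239, m=8):
    """Approximate BER after RS(n, k) decoding for a pre-FEC BER p_e.

    Implements (2) for the symbol-error rate, (1) for the post-FEC
    symbol-error rate, and the halving approximation for the BER.
    """
    if n != 2 ** m - 1:
        raise ValueError("(1) assumes n = 2**m - 1 symbols per codeword")
    t = (n - k) // 2                    # symbol-error correcting capability
    p_s = 1.0 - (1.0 - p_e) ** m        # equation (2)
    total = 0.0
    for j in range(t + 1, n + 1):       # equation (1), j = t+1 ... 2**m - 1
        total += j * comb(n, j) * p_s ** j * (1.0 - p_s) ** (n - j)
    p_s_fec = total / n
    return p_s_fec / 2.0


for ber_in in (1e-3, 1e-4, 1e-5):
    print(ber_in, rs_post_fec_ber(ber_in))
```

For very small input BERs the terms of the sum fall below floating-point resolution, so a log-domain evaluation would be preferable if the curve is needed far below the range plotted.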

## IV. CONCLUSION

In conclusion, we successfully designed and implemented a 5 Gb/s BM-CPA based on a  $2\times$  oversampling LO and a phase picking algorithm. The receiver achieves instantaneous phase acquisition for any phase step between consecutive packets with PLR  $< 10^{-6}$ . The price to pay is faster electronics. However, our CDR-free BM-CPA greatly reduces the complexity of electronics, providing a cost-effective solution for GPON receivers. We note that a sensitivity penalty results from the quick extraction of the decision threshold and clock phase from a short preamble at the start of each packet [6]. However, by reducing the phase acquisition time, as demonstrated in this work, more bits are left for amplitude recovery, thus reducing the burst-mode sensitivity penalty. Alternatively, with the reduced number of preamble bits, more bits can be left for the payload, thereby increasing the information rate.

The receiver also features a (64, 57) Hamming decoder that achieves a 1.8-dB coding gain at a BER  $= 10^{-10}$ . The coding gain can be used to reduce the penalty due to MPN, reduce transmitter power, increase receiver sensitivity, achieve a longer physical reach, or support more splits per single PON tree. Off-the-shelf components for FEC are available for throughput below 1 Gb/s. However, as GPON data rates scale to 10 Gb/s and beyond, the power consumption and complexity of these FEC devices will be the main barrier to integrating them into optical communication systems at low cost.

## V. ACKNOWLEDGMENT

This work was supported by Bell Canada and the Natural Sciences and Engineering Research Council of Canada (NSERC) Industrial Research Chair (IRC).

## REFERENCES

- [1] J. Faucher, M. Y. Mukadam, A. Li, and D. V. Plant, "622/1244 Mb/s burst-mode CDR for GPONs," in *Proc. IEEE LEOS Annual Meeting*, Montreal, QC, Canada, Oct. 2006, Paper TuDD3.
- [2] S. Lin and D. J. Costello, *Error Control Coding*, 2nd ed. Prentice Hall, 2004.
- [3] M. Y. Hsiao, "A class of optimal minimum odd-weight-column SEC-DED codes," *IBM J. Res. Develop.*, pp. 395-401, July 1970.
- [4] *Gigabit-capable Passive Optical Networks: Physical Media Dependent layer specification*, ITU-T Recommendation G.984.2, 2003.
- [5] G. P. Agrawal, P. J. Anthony, and T.-M. Shen, "Dispersion penalty for 1.3-μm lightwave systems with multimode semiconductor lasers," *IEEE J. Lightw. Technol.*, vol. 6, no. 5, pp. 620-625, May 1988.
- [6] P. Ossieur, X.-Z. Qiu, J. Bauwelinck, and J. Vandewege, "Sensitivity penalty calculation for burst-mode receivers using avalanche photodiodes," *IEEE J. Lightw. Technol.*, vol. 21, no. 11, pp. 2565-2575, Nov. 2003.

<sup>3</sup>We are currently implementing RS(255, 239) and low-density parity check codes on 5 Gb/s and 10 Gb/s hardware.