
LPDDR4 in Agilex 3 FPGA

Presently (2025) there are two prospective buses for connecting DRAM devices in embedded systems:

  • DDR3L
  • LPDDR4 (and its lower-power variant LPDDR4X)

DDR3L supports the traditional DDR multidrop bus topology with VTT termination and multiple memory devices sharing the address/control bus. LPDDR4 is a completely different protocol and allows just a single memory device attached to the controller, with the advantage that no external termination is required (saving board space and power).

All other older SDRAM and DDR technologies are obsolete and going out of mass production.

LPDDR4 vs LPDDR4X

The difference between LPDDR4 and LPDDR4X devices is just the voltage levels (and thus power consumption). There is no difference in speed or bandwidth. The newer LPDDR4X uses lower voltage for data lines (DQ) – only 0.6V – to conserve power. Typical Micron devices (MT53E…) support both standards.

In LPDDR4 mode the voltage supply rails are:

  • VDD1 = 1.8V nominal
  • VDD2 = 1.1V nominal
  • VDDQ = 1.1V nominal

In LPDDR4X mode:

  • VDD1 = 1.8V nominal
  • VDD2 = 1.1V nominal
  • VDDQ = 0.6V nominal
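To get a feel for why the lower VDDQ matters, note that CMOS IO switching power scales roughly with the square of the voltage. This is only a first-order approximation (it ignores termination scheme and leakage), but it suggests the order of magnitude of the saving:

```python
# First-order approximation: IO switching power scales with VDDQ^2.
# (Real savings depend on termination scheme and leakage; this is a sketch.)
lpddr4_vddq = 1.1   # V, LPDDR4 data-line supply
lpddr4x_vddq = 0.6  # V, LPDDR4X data-line supply

power_ratio = (lpddr4x_vddq / lpddr4_vddq) ** 2
print(f"LPDDR4X DQ switching power vs LPDDR4: {power_ratio:.0%}")  # about 30%
```

In other words, the DQ switching power in LPDDR4X mode drops to roughly a third of the LPDDR4 value.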

Nevertheless, the Agilex 3 supports just the LPDDR4 bus, not the -X variant. Since the performance is identical and only the power consumption differs, this may not matter much in the end.

Internal Organization of LPDDR4 Device

For reference, the official Agilex 3 FPGA C-Series Development Kit uses the LPDDR4 Micron device MT53E512M32D1ZW-046 IT:B TR.

This device has a single-die (“D1”), dual-channel (2x16b = 32b) interface, organized as 512M addresses x 32 bits = 16Gbit (2GB) total capacity. (This Micron device supports both LPDDR4 and LPDDR4X per the “E” option.)

LPDDR4 is organized in channels, dies, and die density. The picture below shows the internal organization of the particular 16Gbit Micron device found in the Devkit. The outer black rectangle represents the package of the device. The next inner blue rectangle represents the single die in the package. Finally, the inner two narrow black rectangles represent two independent channels (A and B) that independently access their own 8Gbit arrays (blue-filled boxes).

Notice that channels A and B are totally independent of each other, including separate memory arrays (the bits). Each channel has its own chip-select (CS0), clock and clock-enable (CK, CKE) signals, control/address bus (CA, ODT) and 16-bit data bus (DMI, DQ, DQS). The only common signals are power, reset and the external ZQ calibration resistor. The total capacity, as mentioned previously, is 16Gbit = 2GB.

To get a higher memory capacity for the system, we must use a memory device with two dies in one package. (Remember that the LPDDR4 bus, in contrast with regular DDRx, does not support multiple memory devices on the same bus, so we cannot just add a second package to the PCB.) The organization of a 32Gbit (twice the capacity) dual-channel dual-die device is shown below (e.g. MT53E1G32D2FW-046):

Previously, we had a package which implemented two channels and two 8Gbit memory arrays on a single die. Now we have a package with two dies that each implement one channel and one 16Gbit memory array; in total we again have two channels, each with its own array, only double the size. Therefore, the outside package interface of the dual-channel dual-die device is the same as the dual-channel single-die one; the number of signals on the PCB is identical. Of course, the host controller must be configured to address the deeper memory arrays, but the DRAM devices are pin-to-pin compatible. The total capacity of this solution is 32Gbit = 4GB.

If 4GB is still not enough (of course it is not!), Micron can put four dies in one package: dual-channel quad-die device, shown below (e.g. MT53E2G32D4DT-046):

With four dies, things get complicated. From the outside the package is still dual-channel A/B, but internally it is in fact quad-channel a/b/c/d. To work around this, the channel signals are connected together inside the package (a||c -> A, b||d -> B), and there are two additional chip-select signals on the package, called CS1_A and CS1_B, that select between dies, plus two clock-enable signals CKE1_A/B. This paralleling of data channels is called dual-rank. The additional chip-selects and clock-enables must be routed on the PCB and must be supported by the host controller. Connecting data signals inside the package may create signal-integrity issues in some applications. The total capacity of this solution is 64Gbit = 8GB.
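The capacity arithmetic for all three package options boils down to the number of dies times the per-die density. A minimal sketch (the helper function is mine, not from any vendor tool):

```python
def package_capacity_gbyte(die_density_gbit: float, num_dies: int) -> float:
    """Total package capacity in gigabytes: dies x per-die density, 8 bits per byte."""
    return die_density_gbit * num_dies / 8

# The three Micron packages discussed above (per-die density is 16Gbit in each case):
assert package_capacity_gbyte(16, 1) == 2.0  # MT53E512M32D1ZW: single-die
assert package_capacity_gbyte(16, 2) == 4.0  # MT53E1G32D2FW:   dual-die
assert package_capacity_gbyte(16, 4) == 8.0  # MT53E2G32D4DT:   quad-die
```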

Memory Host Controller (EMIF) in Agilex 3

The Agilex 3 FPGA supports LPDDR4 via two built-in memory controllers, called EMIF: one is reserved for the Hard Processor System (HPS, the Cortex-A) and the other for the FPGA fabric. We will focus here on the EMIF reserved for the fabric.

The EMIF supports one or two LPDDR4 channels, and a 16- or 32-bit data bus. The 32-bit data bus is only allowed in the 1-channel configuration:

  • Number of Channels = 2 (dual): Data DQ Width = 16
  • Number of Channels = 1 (single): Data DQ Width = 16 or 32

As mentioned previously, the Agilex 3 Devkit board uses the 16Gbit dual-channel single-die device (MT53E512M32D1ZW-046, capacity 2GB). However, as you can see in the part of the board schematic below, the memory device is connected on the PCB in single-channel 32-bit mode. I highlighted channels A and B in green and orange:

The control signals of channels A and B are connected together, while the data signals of both channels form a 32-bit bus. The advantage of this organization is fewer PCB signals for the control bus, at the cost of worse signal integrity because that bus is multi-dropped (reminiscent of regular DDRx technology).

The EMIF IP core in Devkit’s demo FPGA design is configured the following way (highlighting changes from defaults):

  • Number of Channels = 1 — One channel of the EMIF controls both channels of the memory device.
  • Data DQ Width = 32 — 32-bit data bus.
  • Die Density = 8Gbits — Capacity of each memory die (in Gbits), per channel per die. For dual-die packages, this is the density of each die, not the density of the full package. Legal values are 1 to 16Gbits.
    [The Devkit's 16Gbit single die contains two 8Gbit arrays, one per channel (see above), so the per-channel-per-die density is 8Gbit.]
  • CS Width = 1 — Specifies the total number of CS pins used by each channel. Legal values are 1 and 2.
  • Memory Operating Frequency = 1066MHz — Specifies the frequency at which the memory interface will run. The value is specified in megahertz. Legal values are 800, 1066.667, 1333.333, 1600, 1866.667 and 2133.333.
    [In the case of this FPGA only the values 800MHz and 1066MHz are offered. The 1066MHz corresponds to DDR 2133MT/s and this is the fastest Agilex 3 supports.]
  • Reference Clock Frequency = 166.6MHz — Specifies the reference clock frequency for the EMIF IOPLL. This clock signal must be provided to the FPGA on specific differential pins – see below.
  • Pin Swizzle Map = BYTE_SWIZZLE_CH0=1,0,X,X,X,X,2,3; PIN_SWIZZLE_CH0_DQS0=0,1,3,2,6,7,4,5; PIN_SWIZZLE_CH0_DQS1=15,14,13,12,9,11,8,10; PIN_SWIZZLE_CH0_DQS2=19,20,21,18,17,16,23,22; PIN_SWIZZLE_CH0_DQS3=25,30,24,31,27,26,28,29 — The Quartus Prime software uses fixed pin placement for each supported memory standard, as described in the Pin Placement topic in the EMIF Architecture chapter. The Pin Swizzle Map allows you to specify the ordering of the physical connections of each DQ pin on the memory component to the FPGA. Incorrect information entered in the Pin Swizzle Map may cause calibration and/or synthesis errors.
    [This particular swizzle map is for the EMIF at Bank 2A as connected in the Devkit.]
  • tREFI = 3906.0 — Average refresh interval. The Micron datasheet specifies 3.904us.
  • tMRR = 8.0 — MODE Register Read (MRR) command period. The Micron datasheet specifies min 8 clock cycles.

All other parameters are at defaults.
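From these parameters we can estimate the theoretical peak bandwidth: LPDDR4 is double data rate, so the 1066.667MHz memory clock gives 2133MT/s, and with a 32-bit bus that is roughly 8.5GB/s. A minimal sketch (the function name is mine):

```python
def peak_bandwidth_gbyte_s(mem_clock_mhz: float, dq_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s: DDR means 2 transfers per clock cycle."""
    transfers_per_s = 2 * mem_clock_mhz * 1e6      # MHz clock -> transfers/s
    return transfers_per_s * (dq_width_bits / 8) / 1e9

print(round(peak_bandwidth_gbyte_s(1066.667, 32), 2))  # 8.53
```

Real-world throughput is of course lower because of refresh, bank conflicts and command overhead; this is only the bus ceiling.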

The picture below shows an auto-generated example system design in Quartus Platform Designer (QSYS) for the EMIF.

Yellow-marked conduits of the emif IP are the signals exported out of the platform design as FPGA pins. The following code shows the signal definition in Verilog, including the vector width of each signal:

output wire [0:0]  emif_io96b_lpddr4_0_mem_0_mem_cs,            // emif_io96b_lpddr4_0_mem_0.mem_cs
output wire [5:0]  emif_io96b_lpddr4_0_mem_0_mem_ca,            //                          .mem_ca
output wire [0:0]  emif_io96b_lpddr4_0_mem_0_mem_cke,           //                          .mem_cke
inout  wire [31:0] emif_io96b_lpddr4_0_mem_0_mem_dq,            //                          .mem_dq
inout  wire [3:0]  emif_io96b_lpddr4_0_mem_0_mem_dqs_t,         //                          .mem_dqs_t
inout  wire [3:0]  emif_io96b_lpddr4_0_mem_0_mem_dqs_c,         //                          .mem_dqs_c
inout  wire [3:0]  emif_io96b_lpddr4_0_mem_0_mem_dmi,           //                          .mem_dmi
output wire [0:0]  emif_io96b_lpddr4_0_mem_ck_0_mem_ck_t,       // emif_io96b_lpddr4_0_mem_ck_0.mem_ck_t
output wire [0:0]  emif_io96b_lpddr4_0_mem_ck_0_mem_ck_c,       //                          .mem_ck_c
output wire        emif_io96b_lpddr4_0_mem_reset_n_mem_reset_n, // emif_io96b_lpddr4_0_mem_reset_n.mem_reset_n
input  wire        emif_io96b_lpddr4_0_oct_0_oct_rzqin,         // emif_io96b_lpddr4_0_oct_0.oct_rzqin
input  wire        ref_clk_clk,                                 // ref_clk.clk (100MHz)

LPDDR4 in the Circuit

The schematic part below shows how LPDDR4 signals are connected to the Agilex 3 FPGA (click to enlarge):

The FPGA has multiple banks (of GPIO pins) and not every bank can be used for the EMIF. In Agilex 3 it is necessary to use HSIO (High-Speed IO) banks, of which there are two in the FPGA device used in the Devkit. The HSIO banks are named 2A and 3A. The EMIF for fabric memory is placed in 2A, and the EMIF for HPS (not discussed here) is placed in 3A.

The EMIF requires a reference clock signal from an external source, supplied at two differential pins in bank 2A – the pins are highlighted in yellow above. The frequency of the reference signal is configured in the “Reference Clock Frequency” parameter of the EMIF IP and is 166.67MHz in the Devkit.

Other pins in the bank 2A – not occupied by the EMIF – are free to use by other functions on the board. In this case they are used mainly by MIPI.

Maximum Configuration

The reference EMIF design in the Devkit demo package is configured for the 2GB RAM device by these parameters: Die Density = 8Gbit and CS Width = 1. By changing the IP parametrization to Die Density = 16Gbit and CS Width = 2, the EMIF can be reconfigured for the 8GB / 64Gbit quad-die RAM. After re-generating the HDL code in QSYS we get a slightly updated EMIF interface (in Verilog):

output wire [1:0]  emif_io96b_lpddr4_0_mem_0_mem_cs,            //  emif_io96b_lpddr4_0_mem_0.mem_cs
output wire [5:0]  emif_io96b_lpddr4_0_mem_0_mem_ca,            //                           .mem_ca
output wire [1:0]  emif_io96b_lpddr4_0_mem_0_mem_cke,           //                           .mem_cke
inout  wire [31:0] emif_io96b_lpddr4_0_mem_0_mem_dq,            //                           .mem_dq
inout  wire [3:0]  emif_io96b_lpddr4_0_mem_0_mem_dqs_t,         //                           .mem_dqs_t
inout  wire [3:0]  emif_io96b_lpddr4_0_mem_0_mem_dqs_c,         //                           .mem_dqs_c
inout  wire [3:0]  emif_io96b_lpddr4_0_mem_0_mem_dmi,           //                           .mem_dmi
output wire [0:0]  emif_io96b_lpddr4_0_mem_ck_0_mem_ck_t,       //    emif_io96b_lpddr4_0_mem_ck_0.mem_ck_t
output wire [0:0]  emif_io96b_lpddr4_0_mem_ck_0_mem_ck_c,       //                                .mem_ck_c
output wire        emif_io96b_lpddr4_0_mem_reset_n_mem_reset_n, // emif_io96b_lpddr4_0_mem_reset_n.mem_reset_n
input  wire        emif_io96b_lpddr4_0_oct_0_oct_rzqin,         //   emif_io96b_lpddr4_0_oct_0.oct_rzqin
input  wire        ref_clk_clk,                                 //   ref_clk.clk

The changes are in the wire widths of the CS and CKE signals: previously there was only one CS and one CKE; now there are two of each kind, with CS[1] and CKE[1] added. These two new signals must be mapped to the correct FPGA pins and routed on the PCB to the DRAM.

Inspecting the schematic diagram, we see that CS[0] is on pin 32 of LANE 2 (ball AF4), and CKE[0] is on pin 30 of LANE 2 (ball AH6):

Pins 31 and 33 of LANE 2 are not used so far. Checking the EMIF pin-mapping table, we see these pins are indeed reserved for CS[1] and CKE[1]:

Therefore: CS[1] is on pin 33 (ball AG4) and CKE[1] is on pin 31 (ball AG6).

PCB Layout

The PCB layout of the Devkit board is available as an Altium BRD file from Altera. The BRD file can be opened in the Altium Online Viewer. The board has 14 layers (!). The FPGA and DRAM are, curiously, located on the bottom side (layer 14). Their positions are marked in the following layout picture:

Routing between the FPGA and the LPDDR4 DRAM is placed on layers 10 to 14 (i.e. only 5 layers are used). The following animation shows the 5 layers. Layer 11 (fully) and layer 13 (mostly) are devoted to the ground plane. The solder balls of the FPGA (lower left corner) and the LPDDR4 DRAM (upper right corner) are visible in layer 14.

Note: all information presented in this page is based on analysis of public Altera documents.

Jarda
