iMX6 Baremetal Programming Notes

2020-05-31

/uploads/blog/2020/1590946909446-imx6_gpu_demo.jpg

Introduction

The bare-metal (BM) development described here is all based on the i.MX6 Platform SDK. Its latest revision was released in 2013, so there are some issues when compiling with gcc 9. All development was done on Ubuntu 20.04, targeting i.MX6D/Q. Most of it should apply to i.MX6S/DL as well, but it might require major changes for i.MX6UL/ULL/ULZ.

Compiling & Running

PlatformSDK's Makefile already has the command to generate the target image. Simply compiling it generates an image file that can be written to an SD card or loaded via uuu (the replacement for MfgTools).

It's possible to run something like sudo uuu application.bin to load the application into RAM and run it. Alternatively, the commands can be put into a script like the following:

uuu_version 1.2.135

SDP: boot -f path_to_the_application.bin
SDP: done

If loading via JTAG is needed, then a DDR initialization script for JTAG needs to be written separately. I haven't tried that yet.

Serial

On the EVK, the default debug serial port is UART4. If the target board doesn't use UART4, this needs to be changed. The definition is located in board/common/hardware_modules.c.

Memory Configuration

The configuration for the DDR controller is located in the DCD area of the bootable image. If the DDR used is different from the one on the EVK, then the DCD needs to be modified. This data is located in board/mx6dq/board/dcd.c. NXP's Programming Aid tool and DDR Stress Test tool can be used to generate a proper DDR configuration for the board.

Typically there is a non-cached memory area for DMA use. The memory mapping code is located in sdk/core/src/mmu.c, and the DMA memory configuration is located in apps/common/platform_init.c. The following is the memory partitioning I use on a board with 1GB RAM:

mmu_map_l1_range(0x10000000, 0x10000000, 0x20000000, kOuterInner_WB_WA, share_attr, kRWAccess); // First 512MB
mmu_map_l1_range(0x30000000, 0x30000000, 0x20000000, kNoncacheable, kShareable, kRWAccess); // Last 512MB
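
With a split like this, it can be handy to check that a buffer handed to a DMA engine really sits in the non-cached half. The helper below is only a sketch of that check: the function name and macros are made up for illustration and are not part of the SDK, and the window bounds are simply copied from the second mmu_map_l1_range call above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Non-cached window taken from the mapping above (last 512MB of a 1GB board).
 * These macro names are hypothetical, not SDK definitions. */
#define NONCACHED_BASE 0x30000000u
#define NONCACHED_SIZE 0x20000000u

/* Returns true if [addr, addr + len) lies entirely inside the non-cached window. */
static bool dma_buffer_is_noncached(uint32_t addr, uint32_t len)
{
    if (len == 0 || addr < NONCACHED_BASE)
        return false;

    uint32_t offset = addr - NONCACHED_BASE;

    /* Work with offsets so that addr + len can never wrap around 32 bits. */
    return offset < NONCACHED_SIZE && len <= NONCACHED_SIZE - offset;
}
```

A driver could assert on this before starting a transfer, which catches buffers that were accidentally allocated from the cached first half.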

HDMI output

1280x720 60Hz

The HDMI output example provided has some serious issues in its clock configuration. While it's set to 1080p60 (148.5MHz pixel clock), the PLL output clock is set to about 75MHz, which is closer to the frequency of 720p60 (74.25MHz). However, even if the resolution is set to 720p60, it still won't work, due to an error in the PLL code.

The code is located in board/common/board_hdmi.c. Based on the comment, PLL3 PFD1 is set to 445MHz and divided by 6 to get 74.17MHz, which is quite close to 74.25MHz. However, the actual clock is 454MHz instead of 445MHz, so the actual frequency is 454/6=75.7MHz, which is too high. I ended up setting PFD1 to 298MHz and dividing by 4 to get 74.48MHz. A better approach would be to use the dedicated video PLL provided by the i.MX6, rather than the limited PLL3 PFD1.

Here is the configuration I used:

void hdmi_clock_set(int ipu_index, uint32_t pclk)
{
    switch (pclk) {
    case 74250000:
        if (ipu_index == 1) {
            //clk output from 540M PFD1 of PLL3 
            HW_CCM_CHSCCDR.B.IPU1_DI0_CLK_SEL = 0;  // derive clock from divided pre-muxed ipu1 di0 clock
            HW_CCM_CHSCCDR.B.IPU1_DI0_PODF = 3; // div by 4
            HW_CCM_CHSCCDR.B.IPU1_DI0_PRE_CLK_SEL = 5;  // derive clock from 540M PFD
        }
        //config PFD1 of PLL3 to be 298MHz 
        BW_CCM_ANALOG_PFD_480_PFD1_FRAC(29);
        break;
    default:
        printf("the hdmi pixel clock is not supported!\n");
    }
}
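
The FRAC and divider values above can be found by brute force. The sketch below is not from the SDK; it assumes the PFD output is 480MHz*18/FRAC, that FRAC is valid from 12 to 35, and that the 3-bit PODF field divides by 1 to 8. All names in it are hypothetical.

```c
#include <stdint.h>

#define PLL3_HZ      480000000ULL /* PLL3 main output */
#define PFD_FRAC_MIN 12u          /* assumed valid PFD FRAC range */
#define PFD_FRAC_MAX 35u
#define PODF_DIV_MAX 8u           /* assumed: 3-bit PODF, divide by 1..8 */

typedef struct {
    uint32_t frac;    /* value for the PFD1 FRAC field */
    uint32_t div;     /* divider; the PODF register value is div - 1 */
    uint32_t pclk_hz; /* resulting pixel clock in Hz */
} pfd_choice_t;

/* PFD output frequency in Hz for a given FRAC value. */
static uint32_t pfd_output_hz(uint32_t frac)
{
    return (uint32_t)(PLL3_HZ * 18u / frac);
}

/* Try every FRAC/divider pair and keep the one closest to target_hz. */
static pfd_choice_t pfd_find_closest(uint32_t target_hz)
{
    pfd_choice_t best = { 0, 0, 0 };
    uint32_t best_err = UINT32_MAX;

    for (uint32_t frac = PFD_FRAC_MIN; frac <= PFD_FRAC_MAX; frac++) {
        for (uint32_t div = 1; div <= PODF_DIV_MAX; div++) {
            uint32_t hz = pfd_output_hz(frac) / div;
            uint32_t err = hz > target_hz ? hz - target_hz : target_hz - hz;
            if (err < best_err) {
                best_err = err;
                best = (pfd_choice_t){ frac, div, hz };
            }
        }
    }
    return best;
}
```

For a 74.25MHz target this search lands on FRAC 29 with a divide by 4, which is the pair used in hdmi_clock_set above.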

1024x768 60Hz

Since all the other examples use XGA-resolution LVDS output, I am adding an XGA mode to the HDMI driver so that they can run over HDMI output.

In the hdmi_test function, select DVI mode and DMT mode 16:

    myHDMI_info.video_mode->mCode = 16;
    myHDMI_info.video_mode->mHdmiDviSel = 0;

Then add the code to generate the 65MHz clock. I am using 480MHz*18/19/7=64.96MHz.

    case 65000000:
        if (ipu_index == 1) {
            //clk output from 540M PFD1 of PLL3 
            HW_CCM_CHSCCDR.B.IPU1_DI0_CLK_SEL = 0;  // derive clock from divided pre-muxed ipu1 di0 clock
            HW_CCM_CHSCCDR.B.IPU1_DI0_PODF = 6; // div by 7
            HW_CCM_CHSCCDR.B.IPU1_DI0_PRE_CLK_SEL = 5;  // derive clock from 540M PFD
        }
        //config PFD1 of PLL3 to be 454MHz 
        BW_CCM_ANALOG_PFD_480_PFD1_FRAC(19);
        break;

In the IPU driver (sdk/drivers/ipu/src/ips_disp_panel.c), add the timing for XGA 60Hz:

    {
     "HDMI XGA 60Hz",           // name
     HDMI_XGA60,                // panel id flag
     DISP_DEV_HDMI,             // panel type
     DCMAP_RGB888,              // data format for panel
     60,                        // refresh rate
     1024,                      // panel width
     768,                       // panel height
     65000000,                  // pixel clock frequency
     296,                       // hsync start width
     136,                       // hsync width
     24,                        // hsync back width
     32,                        // vsync start width
     3,                         // vsync width
     6,                         // vsync back width
     0,                         // delay from hsync to vsync
     0,                         // interlaced mode
     1,                         // clock selection, external
     0,                         // clock polarity
     1,                         // hsync polarity
     1,                         // vsync polarity
     1,                         // drdy polarity
     0,                         // data polarity
     &hdmi_xga60_init,
     &hdmi_xga60_deinit,
     },

Similarly, in the HDMI driver (sdk/drivers/hdmi/src/hdmi_common.c), add support for DMT mode 16:

        switch (vmode->mCode) {
        case 16:
            vmode->mHActive = 1024;
            vmode->mVActive = 768;
            vmode->mHBlanking = 320;
            vmode->mVBlanking = 38;
            vmode->mHSyncOffset = 24;
            vmode->mVSyncOffset = 3;
            vmode->mHSyncPulseWidth = 136;
            vmode->mVSyncPulseWidth = 6;
            vmode->mHSyncPolarity = vmode->mVSyncPolarity = FALSE;
            vmode->mInterlaced = FALSE;
            vmode->mPixelClock = 6500;
            break;
        }
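
As a sanity check on these numbers, the pixel clock a mode needs is just total pixels per frame times the refresh rate. The sketch below is a standalone helper, not SDK code, and the struct and function names are made up for illustration. It takes the active and blanking sizes from the DMT mode 16 entry above.

```c
#include <stdint.h>

/* Active and blanking sizes, in pixels (horizontal) and lines (vertical). */
typedef struct {
    uint32_t h_active, h_blanking;
    uint32_t v_active, v_blanking;
} video_timing_t;

/* Pixel clock in Hz needed to hit refresh_hz with the given timing. */
static uint32_t timing_pixel_clock_hz(const video_timing_t *t, uint32_t refresh_hz)
{
    uint32_t h_total = t->h_active + t->h_blanking;
    uint32_t v_total = t->v_active + t->v_blanking;
    return h_total * v_total * refresh_hz;
}

/* Refresh rate in millihertz that a given pixel clock actually produces.
 * The timing must be non-zero, otherwise this divides by zero. */
static uint32_t timing_refresh_millihz(const video_timing_t *t, uint32_t pclk_hz)
{
    uint64_t pixels_per_frame =
        (uint64_t)(t->h_active + t->h_blanking) * (t->v_active + t->v_blanking);
    return (uint32_t)((uint64_t)pclk_hz * 1000u / pixels_per_frame);
}
```

With 1024+320 by 768+38 the required clock comes out just under 65MHz, so the 64.96MHz generated from PFD1 gives a refresh rate a hair below 60Hz, which monitors generally tolerate.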

It's also recommended to add 65MHz clock support to the HDMI PHY driver (sdk/drivers/hdmi/src/hdmi_tx_phy.c), though the default configuration is good enough for it to work.

GPU

One thing to note is that the GPU is quite memory heavy. It's necessary to set the AXI QoS to ensure the IPU gets enough memory bandwidth for screen refresh. Set it according to the IPU in use:

    //IPU QoS
    HW_IOMUXC_GPR6_SET(0xFFFFFFFF); // IPU1 QoS
    //HW_IOMUXC_GPR7_SET(0xFFFFFFFF); // IPU2 QoS

PCIe

I am only going to discuss PCIe in EP mode. For RC mode, refer to the existing code.

About clock

For an EP device, the 125MHz/250MHz clock should be generated from the 100MHz reference clock provided by the host. However, the internal 125MHz PLL cannot take an external 100MHz clock as its input, so an external PLL chip is required.

The internal PLL is also not good enough to meet the Gen2 PCIe clock requirements. If Gen2 5GT/s operation is desired, an external PLL chip is needed.

About interrupt

In EP mode, it's possible to generate an interrupt on the RC from the EP (https://community.nxp.com/thread/321747), but not the other way around (https://community.nxp.com/thread/312874). Typically an EP device would have a doorbell or mailbox register that triggers the device to operate. Polling is one option, but if an interrupt is desired, one possible approach is to map a peripheral that's unrelated to PCIe onto the BAR. For example, map a timer there, and let the PCIe RC write to the timer to generate an interrupt on the i.MX6.
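
For the polling option, a minimal sketch could look like the following. The mailbox layout and names are purely hypothetical. It assumes the structure lives in a BAR-mapped, non-cached region, and that the RC writes the command word before it writes the doorbell.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mailbox layout placed in memory that is visible through a BAR. */
typedef struct {
    volatile uint32_t doorbell; /* RC writes a non-zero value to ring */
    volatile uint32_t command;  /* RC fills this in before ringing */
} ep_mailbox_t;

/* Check the doorbell once. Returns true and stores the command if it was rung. */
static bool ep_mailbox_poll(ep_mailbox_t *mb, uint32_t *command_out)
{
    if (mb->doorbell == 0)
        return false;

    *command_out = mb->command;
    mb->doorbell = 0; /* acknowledge so the RC can ring again */
    return true;
}
```

The main loop would call ep_mailbox_poll on every iteration. The timer trick described above replaces this loop with a real interrupt, at the cost of dedicating a peripheral to it.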

About EP register

When the link is interrupted (for example by a reset sent from the host), all EP registers are reset to their power-up values, including but not limited to: VID, PID, device class, BAR configurations, etc. As the host is free to issue a reset at any time, some workaround is needed here.

About BAR

BAR0 to 4 are not identical.

BAR0 and BAR1 form a 64bit BAR, type memory, prefetchable, mask 0xfffff, mask writable. BAR1 cannot work independently even if only 32bit mode is used.

BAR2: 32bit addressing, type memory, prefetchable, mask 0xfffff, mask writable.

BAR3: 32bit addressing, type IO, mask 0xff.

These are pretty much undocumented features. This information comes from https://community.nxp.com/thread/428633, and my experiments confirm it. It's not possible to configure the BARs freely as described by the reference manual.

Everything is fixed except the two writable masks. If BAR0 is set to 32bit, then BAR1 is automatically disabled. Or they can be combined as a 64bit BAR. The size of BAR2 is modifiable, but only in 32bit mode. BAR3 doesn't allow any modification.
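
Since the mask is the only thing that can be changed, it helps to be clear about how it relates to the BAR size: the size is mask + 1, so the size has to be a power of two. The helpers below are a small sketch of that relationship. The names are made up and nothing here touches the hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Size in bytes of a BAR with the given mask, e.g. 0xfffff gives 1MB. */
static uint32_t bar_size_from_mask(uint32_t mask)
{
    return mask + 1u;
}

/* Compute the mask for a requested size. Fails unless size is a power of two. */
static bool bar_mask_from_size(uint32_t size, uint32_t *mask_out)
{
    if (size == 0 || (size & (size - 1u)) != 0)
        return false;

    *mask_out = size - 1u;
    return true;
}
```

The default masks of 0xfffff and 0xff correspond to 1MB memory BARs and a 256-byte IO BAR, and a 64KB BAR needs a mask of 0xffff.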

Default BAR configuration, taken from the Linux kernel boot log on the host:

[    1.875727] pci 0000:04:00.0: reg 0x10: [mem 0xfaf00000-0xfaffffff 64bit pref]
[    1.883699] pci 0000:04:00.0: reg 0x18: [mem 0xfae00000-0xfaefffff pref]
[    1.891698] pci 0000:04:00.0: reg 0x1c: [io  0xe800-0xe8ff]
[    1.895718] pci 0000:04:00.0: reg 0x30: [mem 0xfbff0000-0xfbffffff pref]
[    1.903786] pci 0000:04:00.0: supports D1
[    1.907687] pci 0000:04:00.0: PME# supported from D0 D1 D3hot D3cold
[    1.911725] pci 0000:04:00.0: 2.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x1 link at 0000:00:1c.0 (capable of 4.000 Gb/s with 5 GT/s x1 link)

Set BAR0 to 64KB non-prefetchable memory. Only the device ID, class code, and subsystem ID writes are shown here:

    HW_PCIE_EP_DEVICEID.U = 0xDEADBEAF;
    HW_PCIE_RC_REVID.U |= (PCI_CLASS_MEMORY_RAM << 16);
    HW_PCIE_EP_SSID.U = 0xDEADBEEF;

About memory mapping

For an EP, the address is allocated by the RC. For example, BAR0 is allocated at 0xfaf00000 in the log above. Typically this should map to a fixed memory location in the EP device. The address translation is done by the iATU. RC-to-EP access uses inbound mapping, and EP-to-RC access uses outbound mapping.

For inbound mapping, the iATU can match on the BAR directly, using whatever address the RC assigned, so software doesn't need to monitor the BAR registers. The following function maps BAR0 to the specified address:

uint32_t pcie_map_inbound(uint32_t viewport, uint32_t tlp_type,
                          uint32_t addr_base_cpu_side, uint32_t bar)
{
    // configure as an inbound region
    HW_PCIE_PL_IATUVR_WR(BF_PCIE_PL_IATUVR_REGION_INDEX(viewport) |
            BF_PCIE_PL_IATUVR_REGION_DIRECTION(PCIE_IATU_VP_DIR_INBOUND));

    // configure target address
    HW_PCIE_PL_IATURLTA_WR(addr_base_cpu_side);
    HW_PCIE_PL_IATURUTA_WR(0);


    // configure TLP type
    HW_PCIE_PL_IATURC1_WR(BF_PCIE_PL_IATURC1_TYPE(tlp_type));

    // enable the region in BAR match mode for the requested BAR (pass 0 for BAR0)
    HW_PCIE_PL_IATURC2_WR(BF_PCIE_PL_IATURC2_BAR_NUMBER(bar) |
            BF_PCIE_PL_IATURC2_RESPONSE_CODE(0x0) |
            BF_PCIE_PL_IATURC2_MATCH_MODE(1) |
            BF_PCIE_PL_IATURC2_REGION_ENABLE(1));

    return addr_base_cpu_side;
}

Note that in pcie_common.h, the definitions of PCIE_IATU_VP_DIR_INBOUND and PCIE_IATU_VP_DIR_OUTBOUND are swapped.
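
To make the inbound mapping concrete, here is a small software model of what BAR match mode does to an address coming from the RC. It is only an illustration of the arithmetic, assuming the offset inside the BAR is carried over to the target address unchanged. The function name is made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of BAR-match inbound translation.
 * bar_base:    address the RC assigned to the BAR
 * bar_mask:    BAR mask (size - 1)
 * target_base: CPU-side address programmed into the iATU
 * Returns false if pci_addr does not fall inside the BAR. */
static bool inbound_translate(uint32_t bar_base, uint32_t bar_mask,
                              uint32_t target_base, uint32_t pci_addr,
                              uint32_t *cpu_addr_out)
{
    if (pci_addr < bar_base || pci_addr - bar_base > bar_mask)
        return false;

    /* Keep the offset within the BAR and rebase it onto the target. */
    *cpu_addr_out = target_base + (pci_addr - bar_base);
    return true;
}
```

With BAR0 at 0xfaf00000 and a target of 0x30000000, a host access to 0xfaf00010 ends up at 0x30000010 on the i.MX6 side.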

Initialization process

  1. Enable PCIe PLL (pcie_clk_setup)
  2. Start PCIe controller in EP mode (pcie_init)
  3. Write PCIe configurations (device ID, subsystem ID, device class, BAR, etc.)
  4. Enable PCIe link
  5. Setup iATU address mapping

Some of the steps could be swapped.

TBH there isn't much info about how to program this properly. Maybe look into the ADSP-SC589 Hardware Reference Manual?