A generic optimised 16-bit LCD adaptor for the Arduino

There are many TFT modules available on the market that are designed to connect directly to an MCU to provide a full colour graphical display. Just search ebay for “tft module” to see what I mean.

Unfortunately for Arduino users the majority of these modules expose a 16-bit interface and are designed to be connected to ARM devices. The way I see it there are two problems to be solved here in order to allow them to be connected to an Arduino. Firstly, they are always designed to be run at 3.3V and secondly the 16-bit data bus uses up a ton of GPIO pins.

I’m going to present a combination of hardware and software that solves both of those problems and also shows how you can achieve extremely high performance with a few driver optimisation tricks.

The hardware adaptor

I decided to design an adaptor that would feature 5V to 3.3V level conversion as well as a latch that would allow me to reduce the pins required by the data bus from 16 to 8 at the expense of an additional pin for controlling the latch. A total saving of 7 pins over the simplistic solution.

How does the latch work?

The latch works sort of like a small memory. In ‘transparent’ mode signals pass right through it and come straight out the other side. When you flip to ‘latched’ mode the latch ignores its inputs and continues to output the last values it received on those inputs.

I can use these properties of the latch to push out two 8-bit values in sequence and have the latch ‘expand’ them to 16-bits.

The diagram shows the two phases of the operation of the latch. In the first phase I push out the low 8 bits, which get routed through and around the latch to occupy both the low (which I want) and the high (which I don’t want) bytes of the output. In the second phase I lock the latch and then push out the high 8 bits. The latch ignores this new data and so the only route is around the latch into the correct position in the outputs.
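To make the two phases concrete, here’s a minimal desktop C++ model of the data path. It is only a sketch under the wiring described above: `LatchedBus` and `sendWord` are hypothetical names used for illustration and are not part of the driver.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of the adaptor's data path (not library code):
// an 8-bit transparent latch feeds the low byte of the LCD bus, while the
// MCU's 8 data lines are also wired straight to the high byte.
struct LatchedBus {
  bool transparent = true;   // true: latch passes its input, false: latch holds
  uint8_t held = 0;          // value currently presented at the latch outputs

  // Drive the 8 MCU data lines and return the 16 bits seen by the LCD.
  uint16_t drive(uint8_t mcuByte) {
    if (transparent)
      held = mcuByte;        // in transparent mode the latch follows its input
    // the high byte always bypasses the latch
    return static_cast<uint16_t>((mcuByte << 8) | held);
  }
};

// Send one 16-bit value using the two phases described above.
inline uint16_t sendWord(LatchedBus& bus, uint16_t value) {
  bus.transparent = true;
  bus.drive(static_cast<uint8_t>(value & 0xFF));        // phase 1: low byte lands on both halves
  assert(bus.held == (value & 0xFF));                   // the latch has captured the low byte
  bus.transparent = false;                              // lock the latch
  return bus.drive(static_cast<uint8_t>(value >> 8));   // phase 2: high byte goes around the latch
}
```

Calling `sendWord(bus, 0xF81F)` leaves 0xF81F on the modelled 16-bit output, while a single transparent write of 0xAB shows the unwanted intermediate state 0xABAB from the first phase.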

Another advantage of the latched design is that it dovetails nicely with the external memory (XMEM) peripheral that comes built in to the Arduino Mega 1280 and 2560. With a little bit of clever code I should be able to use the XMEM interface to drive the LCDs.

The schematic





The schematic pulls together the level conversion, latch, a connector designed to directly fit the Arduino Mega and an output connector for the 16-bit LCD. I’ll explain each sub-section of the schematic here.

The Arduino Mega Connector

This is a 10×2 male pin header with pin assignments designed to match the Arduino Mega’s XMEM interface that is exposed in the block of pins starting at 22. The letter ‘H’ in the net designators indicates that these are high level 5V signals. I should probably have used ‘T’ for TTL but there you go.

The data and control signals are all there, including CS (chip-select) that I map to ALE (address latch enable) on the XMEM interface even though this signal is rarely required by the LCD panels and can usually be just tied to ground.

The level converters

The SN54LVTH2245 octal bus transceiver from Texas Instruments is an ideal choice for level conversion in this design. The propagation delay at 3.3V is around 2ns which is very fast. The inputs and outputs are on opposite sides of the package which simplifies routing and it’s easy to find the device on the market.

The latch

The SN74LV573APW transparent D-Type latch, again from Texas Instruments, satisfies the speed requirements imposed by the XMEM interface and is easy to find on the market. Latches, like bus transceivers, are readily available in all shapes and sizes and there’s nothing to stop you using different parts as long as the specifications are similar to these.

The control header

This is a 2×2 pin male header that I’ll use to provide the vital power and ground supplies from the Arduino board. The two ‘user’ inputs U1 and U2 can be used to convert any arbitrary output signal from the Arduino from 5V down to 3.3V. I added these because I had some unused ports on the level converters and it did seem a shame to waste them. Later on you will see how I use one of these to provide a PWM backlight control to the LCD panels.

The output connector

This connector provides the output signals for the LCD. In my design I’ll be using a right-angled male header to simplify the rats nest of interconnecting wires that need to be connected here.

As well as the output signals there are pairs of power and ground outputs. You can’t have too many of those!

Bill of materials

Here’s a table showing the full bill of materials (BOM) for this design.

| Description | Designator | Footprint |
|---|---|---|
| Ceramic capacitor, 100nF | C1, C2, C3 | 0603 |
| Ceramic capacitor, 10µF | C4 | 0805 |
| Header, 13×2, right angle | P1 | HDR2X13H |
| Header, 10×2 | P2 | HDR2X10 |
| Header, 2×2 | P3 | HDR2X2 |
| SN74LV573APW transparent D-Type latch | U1 | TSOP-20 |
| SN54LVTH2245 octal bus transceiver | U2, U3 | SOIC-20 |


The PCB design








Manually routing the board didn’t present any real challenges and I was able to fit the whole design into a 45mm by 35mm PCB. Thankfully this time I’d remembered to check that the level converters had inputs and outputs on opposing sides of the package instead of adjacent to each other.

Building the board

Once the PCB layout was finalised and the Gerber CAM files exported, I uploaded them to ITead Studio’s manufacturing service and waited the requisite 2-3 weeks for the boards to arrive in the post.

As usual the quality is spot-on with no defects evident at all. This time I went with a blue solder mask so that it would match the PCB colour of the Arduino Mega that it’s designed to slot into.

Now it’s time to build it.

My procedure for building a PCB with surface mount components remains unchanged. Firstly I tin all the pads with a soldering iron and some highly active flux that makes for easy drag soldering across the small IC pads. Then I thoroughly clean all the flux residue off the board because the acids can attack the board over the long term.

Now the pads are tinned I use a hot-plate to reflow the ICs into place and touch up the reflowed joints afterwards under a binocular microscope. The discrete components are then reflowed into place with my Aoyue 852A hot air gun and finally the pin headers are soldered into place with an ordinary soldering iron. The final touch is to wash the boards in hot soapy water and leave to dry for 24 hours.

Now that I’ve got something that I can play with, it’s time to see if I can come up with some driver software that will do it justice. Something tells me I can…

The software drivers

My previous Nokia QVGA reverse engineering efforts that you can find documented extensively on this site have left me with a flexible and expandable driver that will make a good base from which to add support for this new adaptor.

The driver code is separated into access modes and panel drivers that are bound together with a graphics library at compile-time by using C++ templates. To support this new adaptor I will need to write at least one new access mode and as many panel drivers as are required to support the panels that I own.

Let’s get started by taking a look at the 16-bit XMEM access mode.

The 16-bit XMEM access mode

The XMEM interface built into the Arduino Mega MCU is designed to provide access to external memories with a 16-bit address bus and an 8-bit data bus. It just so happens that the protocol for accessing these memories is close enough to the 8080 protocol required by the LCDs that I can sometimes use it successfully. I say ‘sometimes’ because external factors such as the panel timing parameters do come into play.

So, when the Arduino Mega wants to write to an external memory it does the following:

  1. ALE is driven high.
  2. It drives the 16-bit address on to the bus where the lower 8 bits are multiplexed with the data lines.
  3. ALE is driven low, locking in the lower 8-bits of the address.
  4. It pulses the WR line from high to low and back to high again. Data is transferred on the rising edge of the pulse.

I can make use of this sequence to transfer 16 bits to the LCD in a single transaction that also includes the RS (register/data select) line. I will use the lower 8 bits of the 16-bit address to transfer the lower 8 bits of LCD data. I will use address line A8 to hold the RS signal and I’ll use the 8 data lines to transfer the upper 8 bits of LCD data. I will also set bit 15 of the address so that I’m guaranteed to be in the external memory range and will not collide with the internal 8KB of SRAM.
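The mapping above can be written down as a small address calculation. This is a sketch for a desktop compiler: `XmemWrite` and `encodeXmemWrite` are hypothetical names used only to illustrate how one 16-bit LCD value plus RS turns into a single XMEM write.

```cpp
#include <cassert>
#include <cstdint>

// One XMEM bus write: the address the AVR would drive and the byte it stores.
struct XmemWrite {
  uint16_t address;
  uint8_t data;
};

// Hypothetical helper mirroring the mapping in the text:
//   A0..A7 = low byte of LCD data, A8 = RS, A15 = 1 (external memory range),
//   data lines = high byte of LCD data.
inline XmemWrite encodeXmemWrite(uint16_t lcdValue, bool rs) {
  uint16_t address = 0x8000;                                   // bit 15 keeps us clear of internal SRAM
  if (rs)
    address = static_cast<uint16_t>(address | 0x0100);         // A8 carries RS (1 = data, 0 = command)
  address = static_cast<uint16_t>(address | (lcdValue & 0xFF)); // low byte rides on the address lines
  return XmemWrite{address, static_cast<uint8_t>(lcdValue >> 8)};
}
```

For example, writing the data value 0xF81F becomes a store of 0xF8 to address 0x811F, and the command 0x0022 becomes a store of 0x00 to address 0x8022.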

Let’s take a look at the pinout for this access mode:

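| Arduino | Port | Function |
|---|---|---|
| 22 | PA0 | D0/D8 |
| 23 | PA1 | D1/D9 |
| 24 | PA2 | D2/D10 |
| 25 | PA3 | D3/D11 |
| 26 | PA4 | D4/D12 |
| 27 | PA5 | D5/D13 |
| 28 | PA6 | D6/D14 |
| 29 | PA7 | D7/D15 |
| 35 | PC2 | /RESET |
| 37 | PC0 | RS |
| 39 | PG2 | /CS |
| 41 | PG0 | /WR |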


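Initialising the XMEM interface is a simple matter of writing the correct values to the two XMEM registers. Setting XMM2 and XMM1 restricts the high address byte to A8 and A9, which releases PC2..PC7 from the XMEM interface. PC2 is used for /RESET, so pins 30..34 (PC7..PC3) are freed up for GPIO.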

inline void Xmem16AccessMode::initialise() {

  // set up the reset pin
  pinMode(RESET_PIN,OUTPUT);
  digitalWrite(RESET_PIN,HIGH);

  // set up the xmem registers

  // free PC3..PC7 for user GPIO
  XMCRB=_BV(XMM1) | _BV(XMM2);    

  // enable xmem, no wait states
  XMCRA=_BV(SRE);
}

Because performance is absolutely paramount I wrote the data and command output functions in AVR assembly language.

inline void Xmem16AccessMode::writeData(uint8_t lo8,uint8_t hi8) {

  // this is equivalent to:
  // *reinterpret_cast<volatile uint8_t *>(0x8100 | lo8)=hi8;
  // this method costs 5 clock cycles

  __asm volatile("  ldi r27,0x81  \n\t"
                 "  mov r26,%0    \n\t"
                 "  st  X,%1      \n\t"
                 :: "d" (lo8), "d" (hi8)
                 : "r26", "r27");
}

In AVR assembly language the combination of r26 and r27 makes up the 16-bit X register that we use in the above code to send the 8-bit data value to the external memory. It’s also helpful that r26 and r27 are free for use in inline assembly code without fear of colliding with code emitted by the C++ compiler. Nevertheless I do declare them in the ‘clobber’ list at the end of the __asm section.

The method used to write a command is almost the same, except bit A8 (RS) is set to zero:


inline void Xmem16AccessMode::writeCommand(uint8_t lo8,uint8_t hi8) {

  // this is equivalent to: 
  // *reinterpret_cast<volatile uint8_t *>(0x8000 | lo8)=hi8;
  // this method costs 5 clock cycles

  __asm volatile("  ldi r27,0x80  \n\t"
                 "  mov r26,%0    \n\t"
                 "  st  X,%1      \n\t"
                 :: "d" (lo8), "d" (hi8)
                 : "r26", "r27");
}

To see this access mode in action we need a logic analyser so I hooked up the output of the adaptor to my Ant18e and captured the lower 8 data lines, RS and WR.





I can see from the section between the two timing markers that the AVR core pulses WR for exactly one clock cycle.

Some of the panels that I have are not happy with the tight timings provided by the XMEM interface, even when I add some address and data hold cycles, so for this reason I decided to branch out and build an optimised GPIO access mode. I’m glad that I did because in some areas the optimisations that I created resulted in a huge performance increase.

The 16-bit GPIO access mode

The 16-bit GPIO access mode emulates the automatic operation of the XMEM interface by using ordinary GPIO. There are advantages and disadvantages to this approach. Single writes are going to be a few cycles slower than the XMEM interface but for multiple writes of the same value I can optimise heavily to create a routine that will outperform anything else as long as the panel supports 16-bit (64K colour) mode. The other advantage is that I’m not restricted to the XMEM pins and I could even get it to work on the ordinary 32KB Arduino.

Gpio16LatchAccessMode is a template that takes a type containing my port and pin mappings as its parameter. For example, using the same ports and pins as Xmem16AccessMode I declare this type:

struct Gpio16LatchAccessModeXmemMapping {
  enum {

    // ports are the I/O index, not the physical address

    PORT_DATA  = 0x02,    // PORTA
    PORT_WR    = 0x14,    // PORTG
    PORT_RS    = 0x08,    // PORTC
    PORT_ALE   = 0x14,    // PORTG
    PORT_RESET = 0x08,    // PORTC

    // pins are the 0..7 port index, not the arduino numbers

    PIN_WR    = PIN0,
    PIN_RS    = PIN0,
    PIN_ALE   = PIN2,
    PIN_RESET = PIN2
  };
};

The Gpio16LatchAccessMode template looks like this. You can see that I also provide a concrete instantiation of the template using a typedef for easy use.

template<typename TPinMappings>
class Gpio16LatchAccessMode {

  protected:
    static uint8_t _streamIndex;
    static void initOutputHigh(uint8_t port,uint8_t pin);

  public:
    static void initialise();
    static void hardReset();

    static void writeCommand(uint8_t lo8,uint8_t hi8=0);
    static void writeCommandData(uint8_t cmd,uint8_t data);
    static void writeData(uint8_t lo8,uint8_t hi8=0);
    static void writeMultiData(uint32_t howMuch,uint8_t lo8,uint8_t hi8=0);
    static void writeStreamedData(uint8_t data);
};

typedef Gpio16LatchAccessMode<Gpio16LatchAccessModeXmemMapping> DefaultMegaGpio16LatchAccessMode;
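To illustrate how a mapping type is bound to the access mode at compile time, here’s a cut-down C++ stand-in that runs on a desktop compiler. Everything in it is hypothetical: `g_io` simulates the I/O registers, `ExampleMapping` copies the shape of the mapping struct with plain numbers, and `LatchAccessSketch` replaces the real inline assembly with ordinary C++.

```cpp
#include <cassert>
#include <cstdint>

// Simulated I/O register file so the sketch runs on a desktop compiler.
static uint8_t g_io[64];

inline void setBit(uint8_t port, uint8_t pin) {
  assert(port < 64 && pin < 8);
  g_io[port] = static_cast<uint8_t>(g_io[port] | (1u << pin));
}

inline void clearBit(uint8_t port, uint8_t pin) {
  assert(port < 64 && pin < 8);
  g_io[port] = static_cast<uint8_t>(g_io[port] & ~(1u << pin));
}

// Hypothetical mapping in the same shape as Gpio16LatchAccessModeXmemMapping.
struct ExampleMapping {
  enum {
    PORT_DATA = 0x02,
    PORT_WR   = 0x14,
    PORT_ALE  = 0x14,
    PIN_WR    = 0,
    PIN_ALE   = 2
  };
};

// C++-only stand-in for the access mode; the mapping is resolved at compile time.
template<typename TPinMappings>
struct LatchAccessSketch {
  static uint16_t lastWord;   // what the simulated LCD sampled on the rising /WR edge

  static void writeData(uint8_t lo8, uint8_t hi8) {
    setBit(TPinMappings::PORT_ALE, TPinMappings::PIN_ALE);     // ALE high: latch transparent
    g_io[TPinMappings::PORT_DATA] = lo8;
    const uint8_t latched = g_io[TPinMappings::PORT_DATA];     // latch output follows lo8
    clearBit(TPinMappings::PORT_ALE, TPinMappings::PIN_ALE);   // ALE low: latch holds lo8
    g_io[TPinMappings::PORT_DATA] = hi8;
    clearBit(TPinMappings::PORT_WR, TPinMappings::PIN_WR);     // /WR low
    setBit(TPinMappings::PORT_WR, TPinMappings::PIN_WR);       // /WR high: LCD samples the bus
    lastWord = static_cast<uint16_t>((g_io[TPinMappings::PORT_DATA] << 8) | latched);
  }
};

template<typename TPinMappings>
uint16_t LatchAccessSketch<TPinMappings>::lastWord = 0;

typedef LatchAccessSketch<ExampleMapping> ExampleAccess;
```

Because the ports and pins are enum constants inside the template parameter, a different board only needs a different mapping struct and a new typedef; nothing is looked up at run time.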

I’ll skip the initialise method because I’m sure you’re not particularly interested in code that just sets the direction and level of GPIO pins. Let’s get straight on to the writeData() and particularly the writeMultiData() methods. Firstly, writeData(), the method that writes out a single value:

template<class TPinMappings>
inline void Gpio16LatchAccessMode<TPinMappings>::writeData(uint8_t lo8,uint8_t hi8) {

  __asm volatile(
      "  sbi %1, %5   \n\t"     // ALE   = HIGH
      "  out %3, %7   \n\t"     // PORTA = lo8
      "  sbi %2, %6   \n\t"     // RS    = HIGH
      "  cbi %1, %5   \n\t"     // ALE   = LOW
      "  out %3, %8   \n\t"     // PORTA = hi8
      "  cbi %0, %4   \n\t"     // /WR   = LOW  (%4 is PIN_WR)
      "  sbi %0, %4   \n\t"     // /WR   = HIGH

      :: "I" (TPinMappings::PORT_WR),     // %0
         "I" (TPinMappings::PORT_ALE),    // %1
         "I" (TPinMappings::PORT_RS),     // %2
         "I" (TPinMappings::PORT_DATA),   // %3
         "I" (TPinMappings::PIN_WR),      // %4
         "I" (TPinMappings::PIN_ALE),     // %5
         "I" (TPinMappings::PIN_RS),      // %6
         "d" (lo8),                       // %7
         "d" (hi8)                        // %8
        );
}

Straightforward stuff. We just bit-bang the 8080 protocol with the additional overhead of the latch programming.

Now, it’s very common in graphics libraries to write out solid blocks of colour. Whether you’re clearing the screen, drawing rectangles or just straight lines it all comes down to setting an ‘output window’ on the display and then pumping out pixels.

The writeMultiData() method takes advantage of the fact that you only need to set the data and RS lines once and then you can toggle the /WR line as often as you need to in order to write out your block of pixels. This technique was first suggested to me by Andrew, the author of this instructable. Here’s my take on the method.

template<class TPinMappings>
inline void Gpio16LatchAccessMode<TPinMappings>::writeMultiData(uint32_t howMuch,uint8_t lo8,uint8_t hi8) {

  __asm volatile(
    "    sbi  %9, %7       \n\t"      // ALE   = HIGH
    "    out  %1, %2       \n\t"      // PORTA = lo8
    "    sbi  %3, %8       \n\t"      // RS    = HIGH
    "    cbi  %9, %7       \n\t"      // ALE   = LOW
    "    out  %1, %4       \n\t"      // PORTA = hi8

    "    clr  r20          \n\t"      // r20 = global interrupt status
    "    brid intdis%=     \n\t"      // branch if global interrupts are off
    "    cli               \n\t"      // disable global interrupts (because we cannot let PORTG get modified by an IRQ)
    "    inc  r20          \n\t"      // r20 (global interrupts disabled) = 1
    "intdis%=:             \n\t"
    "    in   r18, %0      \n\t"      // get PORT_WR to r18 and r19
    "    mov  r19, r18     \n\t"
    "    cbr  r18, %6      \n\t"      // clear WR in r18
    "    sbr  r19, %6      \n\t"      // set WR in r19
    "    cpi  %A5, 40      \n\t"      // if howMuch<40 then jump to lastlot
    "    cpc  %B5, r1      \n\t"
    "    cpc  %C5, r1      \n\t"
    "    brsh batchloop%=  \n\t"
    "    rjmp lastlot%=    \n\t"
    "batchloop%=:          \n\t"
    "    out  %0,  r18     \n\t"      // toggle /WR 40 times
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    subi %A5, 40      \n\t"        // subtract 40 from howMuch
    "    sbci %B5, 0       \n\t"
    "    sbci %C5, 0       \n\t"
    "    cpi  %A5, 40      \n\t"        // if howMuch >= 40 then go back for another batch
    "    cpc  %B5, r1      \n\t"
    "    cpc  %C5, r1      \n\t"
    "    brlo lastlot%=    \n\t"
    "    rjmp batchloop%=  \n\t"
    "lastlot%=:            \n\t"        // load index Z with the address of the end
    "    ldi  r31, pm_hi8(finished%=)   \n\t"
    "    ldi  r30, pm_lo8(finished%=)   \n\t"
    "    lsl  %A5          \n\t"        // multiply remaining by 2
    "    sub  r30, %A5     \n\t"        // subtract remaining*2 from Z
    "    sbci r31, 0       \n\t"
    "    ijmp              \n\t"        // jump to Z to finish off the writing
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "    out  %0,  r18     \n\t"
    "    out  %0,  r19     \n\t"
    "finished%=:           \n\t"
    "    cpi  r20, 0       \n\t"          // if global interrupts were enabled when we came in, restore them now
    "    breq skipinten%=  \n\t"
    "    sei               \n\t"
    "skipinten%=:          \n\t"

    :: "I" (TPinMappings::PORT_WR),     // %0
       "I" (TPinMappings::PORT_DATA),   // %1
       "d" (lo8),                       // %2
       "I" (TPinMappings::PORT_RS),     // %3
       "d" (hi8),                       // %4
       "d" (howMuch),                   // %C5,%B5,%A5
       "I" (_BV(TPinMappings::PIN_WR)), // %6
       "I" (TPinMappings::PIN_ALE),     // %7
       "I" (TPinMappings::PIN_RS),      // %8
       "I" (TPinMappings::PORT_ALE)     // %9
    : "r18", "r19", "r20", "r30", "r31"
  );
}

This is how it works.

  1. We program RS and set up the 16 data lines.
  2. If interrupts were enabled when we came in then I disable them. As well as ensuring a consistent timing this ensures that an ISR cannot change the port values that I set up in (3).
  3. The port holding WR is read and I cache this value in registers with the WR bit set and reset.
  4. If there are fewer than 40 pixels to write then go to step 7.
  5. Write out 40 pixels using consecutive out commands. This achieves an 8 megapixel/second fill rate.
  6. Subtract 40 from the pixel counter and if 40 or more remain then go back to step 5.
  7. Calculate an indirect jump into the trailing 39 out commands that will fill the remaining pixels and use ijmp to go there.
  8. If we disabled interrupts then restore them.
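The control flow of those steps can be sketched in plain C++ for a desktop compiler. This models only the counting, not the timing: `WrCounter`, `tailJumpOffsetWords` and `writeMultiDataSketch` are hypothetical names, and `pulseWr` stands in for the pair of `out` instructions that toggle /WR.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the bus: counts /WR pulses instead of toggling a pin.
struct WrCounter {
  uint32_t pulses = 0;
  void pulseWr() { ++pulses; }
};

constexpr uint32_t kBatch = 40;   // same unroll factor as the assembly

// How many instruction words to step back from the `finished` label so that
// exactly `remaining` pulses run: each pixel costs two one-word `out`s.
inline uint32_t tailJumpOffsetWords(uint32_t remaining) {
  assert(remaining < kBatch);
  return remaining * 2;
}

inline void writeMultiDataSketch(WrCounter& bus, uint32_t howMuch) {
  // Full batches: the assembly unrolls this inner loop into 80 `out` instructions.
  while (howMuch >= kBatch) {
    for (uint32_t i = 0; i < kBatch; ++i)
      bus.pulseWr();
    howMuch -= kBatch;
  }
  // Tail: the assembly jumps tailJumpOffsetWords(howMuch) words before the end
  // of a run of 39 unrolled pulses, which executes exactly `howMuch` of them.
  for (uint32_t i = 0; i < howMuch; ++i)
    bus.pulseWr();
}
```

A full QVGA clear of 76800 pixels is 1920 full batches with an empty tail, while a 39-pixel run skips the batch loop entirely and jumps 78 words back from the end.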

The number 40 was chosen somewhat arbitrarily as a trade-off between flash memory usage and performance. I tried values of 10, 20, 30 and 40 and stopped there as the increase in speed was getting smaller and smaller to the point of irrelevance.

Once again, to see the effect of this code a logic analyser is required. Hooking up the trusty Ant18e yielded these captures.








The first capture shows the effect of the optimised /WR toggling. As expected we are achieving a fill rate of 8 megapixels/second.

The second capture shows how much overhead is added by the code that subtracts 40 from the total and jumps back to execute the next block of 40 pixels. It is 1 microsecond.
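Those two figures are enough for a back-of-envelope estimate of a full-screen fill. The sketch below assumes a 16MHz system clock (the standard Arduino Mega crystal) and a 320×240 panel, and it ignores the one-off setup code; the constant and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumptions: 16 MHz clock, two single-cycle `out` instructions per pixel,
// and the measured 1 microsecond of bookkeeping between batches of 40.
constexpr double   kClockHz        = 16e6;
constexpr double   kCyclesPerPixel = 2.0;
constexpr double   kBatchOverheadS = 1e-6;
constexpr uint32_t kBatchPixels    = 40;

// Peak rate inside an unrolled batch, in pixels per second.
constexpr double peakFillRate() {
  return kClockHz / kCyclesPerPixel;
}

// Estimated time in seconds to fill `pixels` pixels of a single colour.
constexpr double fillTimeSeconds(uint32_t pixels) {
  return pixels / peakFillRate() + (pixels / kBatchPixels) * kBatchOverheadS;
}
```

Under those assumptions a 76800-pixel screen takes 9.6ms of /WR toggling plus 1.92ms of loop overhead, about 11.5ms in total, which works out at roughly 6.7 megapixels/second sustained.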

So is it worth this hand-optimisation? Yes it definitely is. In the following sections where I show the panel drivers that I’ve written, the accompanying videos are all running this driver. You can see how fast the screen clearing, rectangle drawing, gradient drawing and solid ellipse drawing operations are for yourself.

Optimising the line drawing algorithm

The first versions of my driver code featured the Extremely Fast Line Drawing Algorithm. It seemed to perform well and so I didn’t turn my attention to optimising it.

Recently, however, I was tipped off that the EFLA was suboptimal on an MCU without an intrinsic divide instruction and that the classic Bresenham would probably be better. Time for some profiling. I implemented the Bresenham algorithm using the pseudo-code in the Wikipedia article under the Simplification header.

My pseudo-random line drawing example code runs over a fixed 5 second period. These are the timings that I got on an ILI9325 panel in 64K colours and portrait mode:

| Algorithm | Lines drawn in 5 seconds |
|---|---|
| EFLA | 1055 |
| Bresenham | 1110 |


Sure enough Bresenham is slightly faster. Not much, but enough to get me thinking about whether it could be further optimised for this library. In producing an optimised implementation we rely on some basic facts:

  1. All the pixels in the line are the same colour.
  2. The panels provide ‘windowed’ output with auto-increment in the X-direction.
  3. The panels allow X and Y positions to be changed independently.

Here’s the resulting code:

template<class TDevice,class TAccessMode>
inline void GraphicsLibrary<TDevice,TAccessMode>::drawLine(const Point& p1,const Point& p2) const {

  // optimisation for straight lines. filling rectangles is much more efficient than plotting points

  if(p1.X==p2.X)
    fillRectangle(Rectangle(p1.X,Min(p1.Y,p2.Y),1,Abs(p2.Y-p1.Y)+1));
  else if(p1.Y==p2.Y)
    fillRectangle(Rectangle(Min(p1.X,p2.X),p1.Y,Abs(p2.X-p1.X)+1,1));
  else {

    int16_t x0,x1,y0,y1;

    x0=p1.X;
    y0=p1.Y;
    x1=p2.X;
    y1=p2.Y;

    if(x0>x1) {
    
      // the optimiser does this swap method faster than
      // the xor-trick

      int16_t t;

      t=x0;
      x0=x1;
      x1=t;

      t=y0;
      y0=y1;
      y1=t;
    }

    // calculate constants up-front

    int16_t dx=x1-x0;
    int16_t dy=Abs(y1-y0);
    int16_t sy=y0<y1 ? 1 : -1;
    int16_t mdy=-dy;
    int16_t err=dx-dy;
    bool xinc;

    // set the drawing rectangle that we need and plot the first point

    this->moveTo(x0,y0,this->getXmax(),this->getYmax());
    this->beginWriting();
    this->writePixel(_foreground);

    while(x0!=x1 || y0!=y1) {

      int16_t e2=2*err;

      if(e2>mdy) {

        err-=dy;
        x0++;

        // make a note that X has incremented

        xinc=true;
      }
      else
        xinc=false;       // nothing happened to X

      if(x0==x1 && y0==y1) {

        if(xinc) {

          // plot the pending X increment before returning

          this->writePixelAgain(_foreground);
          break;
        }
      }

      if(e2<dx) {
        err+=dx;
        y0+=sy;

        // Y has changed. We're going to have to do a complete
        // pixel write after we've moved the bare minimum of
        // window pointers

        if(xinc)
          this->moveX(x0,this->getXmax());

        this->moveY(y0,this->getYmax());

        this->beginWriting();
        this->writePixel(_foreground);
      }
      else {

        // Y has not changed, if X has changed then all we need
        // to do is push out another pixel

        if(xinc)
          this->writePixelAgain(_foreground);
      }
    }
  }
}

The algorithm should be recognisable from the Wikipedia article. I initially set a window that extends from the first point to the far right and bottom of the panel. This means that in cases where two horizontal pixels have to be written out together I can omit the cursor movement entirely because the panel has done it automatically for me.

When I do have to move the cursor I am careful to only move it in the horizontal or vertical direction that’s required.
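The cursor bookkeeping can be checked on a desktop with a mock panel. This is a sketch under an assumed panel model, not a real controller: in `MockPanel`, `moveX`/`moveY` store a start position, `beginWriting` loads the write cursor from it and each pixel write advances the cursor one step in X. All names here are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <set>
#include <utility>

typedef std::set<std::pair<int, int> > PixelSet;

// Assumed panel behaviour (see lead-in); counts how many cursor moves we pay for.
struct MockPanel {
  int startX = 0, startY = 0;   // position registers
  int curX = 0, curY = 0;       // internal write cursor
  int cursorMoves = 0;
  PixelSet lit;

  void moveX(int x) { startX = x; ++cursorMoves; }
  void moveY(int y) { startY = y; ++cursorMoves; }
  void beginWriting() { curX = startX; curY = startY; }
  void writePixel() { lit.insert(std::make_pair(curX, curY)); ++curX; }
};

// Reference: plain Bresenham, one point at a time. Requires x0 < x1, y0 != y1.
inline PixelSet plainLine(int x0, int y0, int x1, int y1) {
  PixelSet out;
  const int dx = x1 - x0, dy = std::abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
  int err = dx - dy;
  for (;;) {
    out.insert(std::make_pair(x0, y0));
    if (x0 == x1 && y0 == y1) break;
    const int e2 = 2 * err;
    if (e2 > -dy) { err -= dy; ++x0; }
    if (e2 < dx)  { err += dx; y0 += sy; }
  }
  return out;
}

// The optimisation from drawLine() replayed against the mock. Same preconditions.
inline void windowedLine(MockPanel& p, int x0, int y0, int x1, int y1) {
  const int dx = x1 - x0, dy = std::abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
  int err = dx - dy;
  p.moveX(x0); p.moveY(y0); p.beginWriting(); p.writePixel();
  while (x0 != x1 || y0 != y1) {
    const int e2 = 2 * err;
    const bool xinc = e2 > -dy;
    if (xinc) { err -= dy; ++x0; }
    if (xinc && x0 == x1 && y0 == y1) { p.writePixel(); break; }
    if (e2 < dx) {
      err += dx; y0 += sy;
      if (xinc) p.moveX(x0);     // only touch X when it really changed
      p.moveY(y0);
      p.beginWriting(); p.writePixel();
    } else if (xinc) {
      p.writePixel();            // cursor has already auto-incremented in X
    }
  }
}
```

Skipping `moveX` is safe here because a shallow line steps X on every iteration and a steep line steps Y on every iteration, so an X-only step and a Y-only step never occur in the same line.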

Running the line demo with this optimised implementation results in a score of 2060 lines, a solid 85% speed increase over the plain Bresenham implementation for very little effort. Here’s a chart showing the final scores:

The actual speed increase obtained is variable and the flatter the line the better the increase. My demo draws pseudo-random lines so the speed increase that I quote could be viewed as an average.

Now that I have a solid pair of access modes I can move on to the driver templates that call upon their services to perform primitive operations.

ILI9325

The ILI9325 is a very common QVGA controller found in many of the standalone panels that you see on the market as well as in panels that ship with MCU development boards. The ILI9325 panel that I have came packed with an STM32 development board, but there’s nothing to stop me detaching it and hooking it up to the Arduino as I’ve done here.

I’ve provided support for portrait and landscape orientations in the 64K (16-bit) colour mode. Hardware scrolling is supported in both orientations. To use the driver you need to include the header file.

#include "Generic16BitILI9325.h"

Typedefs are provided for all combinations of orientations and access modes. In all my examples I declare a panel using code similar to this:

typedef ILI9325_Landscape_64K_Gpio16Latch TftPanel;
//typedef ILI9325_Portrait_64K_Gpio16Latch TftPanel;
//typedef ILI9325_Landscape_64K_Xmem16 TftPanel;
//typedef ILI9325_Portrait_64K_Xmem16 TftPanel;
//TftPanel *tft=new TftPanel;

The commented out drivers are provided for reference. The first two are GPIO access modes and the last two are the XMEM modes.


My touch screen is looking a bit scuffed

Version 3.0.0 and above of the xmemtft library provides support for the ILI9325 and includes a suite of demos that you can use to get started. Here’s a video that shows the graphics library demo in action using the optimised GPIO driver.



HX8347A

The HX8347A controller from Himax is a 320×240 (QVGA) device commonly featured in the LCD boards that you can get on ebay.

I’ve provided support for portrait and landscape orientations in the 64K (16-bit) colour mode. Hardware scrolling is supported in both orientations. To use the driver you need to include the header file.

#include "Generic16BitHX8347A.h"

Typedefs are provided for all combinations of orientations and access modes. In all my examples I declare a panel using code similar to this:

typedef HX8347A_Landscape_64K_Gpio16Latch TftPanel;
//typedef HX8347A_Portrait_64K_Gpio16Latch TftPanel;
//typedef HX8347A_Landscape_64K_Xmem16 TftPanel;
//typedef HX8347A_Portrait_64K_Xmem16 TftPanel;
//TftPanel *tft=new TftPanel;

The commented out drivers are provided for reference. The first two are GPIO access modes and the last two are the XMEM modes.

Version 3.0.0 and above of the xmemtft library provides support for the HX8347A and includes a suite of demos that you can use to get started. Here’s a video that shows the graphics library demo in action using the optimised GPIO driver.



Get the source code from github

As of version 3.0.0 you can now find all the source code on github.com. If you’re interested in extending the library or just curious as to how it works then please feel free to get involved.

If you don’t have or want a github account then you can download the pre-packaged release from my downloads page.

License change

This release is licensed under the terms of the Apache License, version 2.0. Previous versions used the BSD license; the main reason for the change is the migration of the source code to GitHub. The Apache license preserves all the rights that the BSD license conveyed, formally recognises and protects the role of the contributor, and includes protection against patent abuse.

Some boards for sale

As usual this project has left me with a small surplus of boards that I’m going to sell on a first-come, first-served basis. All boards are fully constructed, just like in the pictures featured in this article. All you need to do is supply a TFT board with a 16-bit interface to connect to it. If your TFT supports the 64K colour mode then you can use the optimised GPIO driver featured in this article.

If your TFT controller is one of those featured in this article then it should work ‘out-of-the-box’; otherwise you will need to write the driver code yourself, which I’m happy to provide help and advice with.






  • CunningFellow

    WRT – "Some of the panels that I have are not happy with the tight timings provided by the XMEM interface, even when I add some address and data hold cycles so for this reason"

    Is it simply a case of a race? As per the 2nd step of the Asteroid Belt instructable I had to add a Schottky, cap and resistor to guarantee timing requirements were met on some (rats nest) layouts. It's only about 5ns between a pass and fail on those datasheets.

    I was a bit surprised that Bresenham only beat EFLA by such a small margin. I guess AVR-GCC is not as bad at divides as I suspected.

    For further optimizing – You can break the inner loop into 8 sectors. Then the little AVR can save a clock off each step (at the cost of some decisions at the start)

    Also "set window" is very expensive, so I actually add another 4 inner loops just for the H and V lines. That way each preamble is a "set write direction", which is only one register write, rather than a "set window", which is 5 register writes

    If you are comfortable linking ASM to your library code, you can have a look at my line draws (Not the version posted yet, the WIP version I have here)

    Apart from the small boost it would give you with the 8+4 sectored idea above, it actually gets a huge boost by dropping down to 8 bit math where it can. This is something that gets very messy to try to do in C.
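To make the 8+4 sectoring idea above concrete, here is a rough host-side sketch. It is not CunningFellow's code and not what xmemtft does: the names `axisLoop`, `octantLoop` and `linePoints` are hypothetical, it uses a plain Bresenham error term, and it collects points in a `std::vector` so it can be run on a PC. On an AVR each point would be a pixel write instead. All the direction decisions are made once at the start, then a loop specialised for that sector runs.

```cpp
#include <cstdint>
#include <cstdlib>
#include <utility>
#include <vector>

using Point = std::pair<int16_t, int16_t>;

// Dedicated loop for horizontal and vertical lines: no error term at all.
template <int SX, int SY>
void axisLoop(int16_t x, int16_t y, int16_t length, std::vector<Point>& out) {
  for (int16_t i = 0; i <= length; ++i) {
    out.emplace_back(x, y);
    x += SX;
    y += SY;
  }
}

// One instantiation per octant. SX, SY and STEEP are compile-time constants,
// so the compiler can strip the direction tests out of each generated loop.
// dx and dy are the absolute distances along each axis.
template <int SX, int SY, bool STEEP>
void octantLoop(int16_t x, int16_t y, int16_t dx, int16_t dy, std::vector<Point>& out) {
  const int16_t major = STEEP ? dy : dx;  // distance along the driving axis
  const int16_t minor = STEEP ? dx : dy;
  int16_t err = major / 2;
  for (int16_t i = 0; i <= major; ++i) {
    out.emplace_back(x, y);
    if (STEEP) y += SY; else x += SX;     // always step along the driving axis
    err -= minor;
    if (err < 0) {                        // occasionally step along the other axis
      if (STEEP) x += SX; else y += SY;
      err += major;
    }
  }
}

// Classify the line once, then hand over to the matching specialised loop.
std::vector<Point> linePoints(int16_t x0, int16_t y0, int16_t x1, int16_t y1) {
  std::vector<Point> out;
  const int16_t dx = static_cast<int16_t>(std::abs(x1 - x0));
  const int16_t dy = static_cast<int16_t>(std::abs(y1 - y0));
  const bool right = x1 >= x0;
  const bool down = y1 >= y0;

  if (dy == 0) {                          // 2 horizontal sectors
    if (right) axisLoop<1, 0>(x0, y0, dx, out);
    else       axisLoop<-1, 0>(x0, y0, dx, out);
  } else if (dx == 0) {                   // 2 vertical sectors
    if (down) axisLoop<0, 1>(x0, y0, dy, out);
    else      axisLoop<0, -1>(x0, y0, dy, out);
  } else if (dy > dx) {                   // 4 steep octants
    if (right) { if (down) octantLoop<1, 1, true>(x0, y0, dx, dy, out);
                 else      octantLoop<1, -1, true>(x0, y0, dx, dy, out); }
    else       { if (down) octantLoop<-1, 1, true>(x0, y0, dx, dy, out);
                 else      octantLoop<-1, -1, true>(x0, y0, dx, dy, out); }
  } else {                                // 4 shallow octants
    if (right) { if (down) octantLoop<1, 1, false>(x0, y0, dx, dy, out);
                 else      octantLoop<1, -1, false>(x0, y0, dx, dy, out); }
    else       { if (down) octantLoop<-1, 1, false>(x0, y0, dx, dy, out);
                 else      octantLoop<-1, -1, false>(x0, y0, dx, dy, out); }
  }
  return out;
}
```

Using templates gives the twelve separate inner loops without writing each one by hand. A hand-tuned assembly version could go further, for example by dropping to 8-bit arithmetic where the line is short enough.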

    • http://www.andybrown.me.uk Andy Brown

      Hi Andrew,

      It could be a race but I did take logic analyser measurements at a 500MHz sampling rate from the LCD end of the wires and the timing did look good. One thing that did concern me was that the ATmega only holds the address lines for 5ns after pulling the latch low, which is very tight. The other thing was that the screen corruption was completely predictable. I could do move, plot at position (x,y) and it would always be 100% correct. Then I could do move, plot (x+1,y) and it would 100% fail with the logic analyser showing nothing wrong. Very odd. Anyway the GPIO driver with its optimised fills and line drawing works much better overall I think.

      I'm very happy to include assembly language in the driver where it optimises the performance, there's quite a lot in there already! Feel free to drop me a link to your code when you're happy to release it, or you could always fork the github repo and plug it in yourself if you want.

  • http://bobdavis321.blogspot.com Robert davis

    Great Job!! I love how you also demonstrated several different LCDs. Thanks!

    • http://www.andybrown.me.uk Andy Brown

      Cheers, there are a few more I'd like to see drivers for, like the SSD1289 and SSD1963. Maybe later…

      • WRS

        The SSD1289 would be awesome. Purchased one from sainsmart and the redraw rate is so slow.

        • http://www.andybrown.me.uk Andy Brown

          It would be easy to port across the driver I wrote for the STM32 but I wouldn't be able to test it with the SSD1289 screen that I've got because the useless ebay seller wired it up in such a way as to make it incompatible with windowed graphics output. <sigh>

          • Paul

            I have the exact same display and I'm willing to give it a shot to port the driver to arduino. However, I know nothing about the STM32 except that it's probably too cheap for what you get and that I want one. Do you have any pointers on where to start? By the way, the adapter arrived today, thanks for the great service. :)

          • http://www.andybrown.me.uk Andy Brown

            Hi Paul, you could give the port a go if you like. The driver structure for stm32plus and the Arduino (xmemtft) are very similar indeed.

            Regarding the STM32. The best place to start is with one of ST Micro's 'discovery' boards, for example the STM32 F4 Discovery. They are very cheap, especially considering that the F4 is a real powerhouse of an MCU, and they come with a built-in debugger (ST-Link) so you don't need any additional hardware to do PC-based visual debugging with Eclipse.

          • Paul

            Thanks, actually I meant where to start with a driver port though. I'm only just breaking free from the arduino IDE in favor of avr studio, but porting a display driver is no small step for me, so I was hoping you could nudge me in the right direction. Of course, if I manage to get it working, I'll make the code available so it can be packed with the library.

  • Camilo

    Would it work with an Arduino Uno?

    • http://www.andybrown.me.uk Andy Brown

      Physically you could probably get it to work. A GPIO driver could achieve a high speed using all of port D for data with some of the other pins used for the control signals. I did write a driver for the Nokia displays that could be extended for the purpose:
      http://andybrown.me.uk/wk/2012/07/28/interfacing-

      The main problem is that 32KB of flash is going to be a tight fit for your program and any graphics that you need.
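As a rough illustration of the "all of port D for data" idea combined with the latch described in the article, here is a small host-side model. It is only a sketch under stated assumptions: `SimulatedAdaptor` and `write16` are hypothetical names, the struct members stand in for what would be PORTD and a couple of control pins on a real Uno, and no timing is modelled. Note also that on an Uno two of the port D pins are shared with the serial port.

```cpp
#include <cstdint>

// Host-side stand-in for the wiring: an 8-bit data port, a latch and a write strobe.
// On real hardware dataPort would be PORTD and the control lines would be other pins.
struct SimulatedAdaptor {
  uint8_t dataPort = 0;          // what the MCU is driving on its 8 data pins
  bool latchTransparent = true;  // true: latch output follows its input
  uint8_t latchOutput = 0;       // low byte as seen by the panel, via the latch
  uint16_t panelSampled = 0;     // value the panel captured on the last strobe

  void setData(uint8_t value) {
    dataPort = value;
    if (latchTransparent) latchOutput = value;  // passes straight through
  }
  void unlockLatch() {
    latchTransparent = true;
    latchOutput = dataPort;
  }
  void lockLatch() { latchTransparent = false; }  // output is now frozen
  void strobeWrite() {
    // The high byte bypasses the latch; the low byte comes from the latch output.
    panelSampled = static_cast<uint16_t>((dataPort << 8) | latchOutput);
  }
};

// Two-phase write: low byte through the latch, lock it, then high byte around it.
void write16(SimulatedAdaptor& bus, uint16_t value) {
  bus.unlockLatch();
  bus.setData(static_cast<uint8_t>(value & 0xFF));  // phase 1: low byte
  bus.lockLatch();
  bus.setData(static_cast<uint8_t>(value >> 8));    // phase 2: high byte
  bus.strobeWrite();
}
```

The point of the model is just the ordering of operations. A real driver would replace each method with one or two port writes.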

  • Jorge Jacome

    Hi Andy
    I have a TFT LCD module with the ILI9481, and I would like to start doing some programming with it.
    Can you please point me in the right direction, or give me step-by-step instructions on how to start the initialization process and then read and write data to and from the LCD?

    I have successfully learned to program a 20×4 LCD.
    Many thanks for your help and for the information you posted on the internet.

    regards
    Jorge Jacome
    Sydney

  • peter gims

    Hello Andy, I found your great webpage and read some of your interesting doings.

    I'm also working with an ILI9325 and it's working well for text and graphics. Vertical scrolling of the complete screen is working too.

    Do you know a way to scroll only a rectangular window area, or a horizontal band with a fixed top and bottom area, on a display with the ILI9325 controller?

    It looks very limited to me. They seem to have implemented rudimentary scrolling, but it's only enough for very specialised applications.

    regards, Pit

    • http://www.andybrown.me.uk Andy Brown

      Hi Pit,

      No, unfortunately not. Although the spec for many of these controllers allows you to specify a fixed top and bottom area, all those that I've tried have scrolled the entire display whatever you set for these values.