X65-SBC Video-Terminal for the Forth OF816 Interpreter

In the previous post I mentioned that I am working on a port of the 32-bit Forth interpreter OF816 to my X65-SBC, an 8/16-bit retro computer that I am building. The software runs on the X65-SBC’s 65C816 processor, but so far the user’s text input and output went over the USB/UART interface to a terminal emulator (e.g. PuTTY) on a host PC.

As the next logical step, shown in the demo below, I have implemented a video text terminal using the VERA chip and the VGA output of the X65-SBC. VERA is the computer’s video chip, implemented in an FPGA. It has 128 kB of internal VRAM and can be configured in various graphics modes, typically generating a 640×480-pixel screen. For the text terminal I configure VERA to display 80 columns by 60 rows of visible characters. Each character is 8×8 pixels, and each character can have its own foreground and background color, each chosen from 16 colors.

The following short demo shows the OF816 code running on the X65-SBC with the video terminal output on the VGA port (for the YouTube demo the VGA signal is captured on a PC and displayed in a live window). The text input to OF816 is (still) provided over the USB/UART (PuTTY); using the computer’s PS/2 keyboard is a work in progress for next time.


VERA Mode Configuration

VERA is configured for the 80×60 character text mode with this code snippet (65C02 assembly):

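    ; DCSEL=0, ADRSEL=0
    ; Enable output to VGA 640x480, enable Layer0
    lda   #TV_VGA | LAYER0_ENABLE
    sta   VERA_VIDEO_REG

    ; Characters are 8x8 pixels; the visible screen is 80 columns by 60 rows.
    ; The complete map is 128x128 characters, 8x8 font.
    ; Layer0 setup: Tile mode 1bpp, Map Width = 128 tiles, Map Height = 128 tiles
    ; ==> 16384 tiles, each 2B => 32768 B
    lda   #MAP_WH_128T << 6 | MAP_WH_128T << 4 | BPP_1
    sta   VERA_LAYER0_CONFIG_REG      ; register name assumed (VERA L0_CONFIG, $9F2D)

    ; map entries start at address 0 of VRAM, and occupy 32kB
    ; (the register takes address bits 16:9, which are all zero here)
    lda   #mapbase_va
    sta   VERA_LAYER0_MAPBASE_REG     ; register name assumed (VERA L0_MAPBASE, $9F2E)

    ; tile (font) starts at 32kB offset
    lda   #(tilebase_va >> 11) << 2
    sta   VERA_LAYER0_TILEBASE_REG    ; register name assumed (VERA L0_TILEBASE, $9F2F)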

The “map” size is 128×128 characters (tiles), but only 80×60 of them are visible on the screen. Using the registers VERA_LAYER0_HSCROLL_REG ($9F30) and VERA_LAYER0_VSCROLL_REG ($9F32) it is possible to smoothly scroll the 80×60 viewport over the larger 128×128 map. This feature is typically used in 2D scrolling games. VERA allows map widths and heights of 32, 64, 128, or 256 tiles. Tile width and height can each be configured to 8 or 16 pixels; for the text display we use 8×8-pixel tiles.
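The register values written by the configuration snippet can be checked on a host PC. The following is a minimal Python sketch, not part of the X65-SBC software: the function names are hypothetical, and the field encodings (size codes 0 to 3 for 32 to 256 tiles, colour depth code 0 for 1 bpp, map base holding address bits 16:9) are assumptions taken from the VERA register layout rather than from the author's include files.

```python
# Field encodings assumed from the VERA register layout; the Python names
# only mirror the assembly symbols and are not the author's definitions.
MAP_WH_CODES = {32: 0, 64: 1, 128: 2, 256: 3}  # tiles -> 2-bit size code
BPP_1 = 0                                       # colour depth code for 1 bpp


def layer_config(map_w_tiles, map_h_tiles, bpp=BPP_1):
    """Config byte: map height in bits 7:6, map width in bits 5:4, depth in bits 1:0."""
    return (MAP_WH_CODES[map_h_tiles] << 6) | (MAP_WH_CODES[map_w_tiles] << 4) | bpp


def mapbase_reg(vram_addr):
    """The map base register holds VRAM address bits 16:9."""
    return vram_addr >> 9


def tilebase_reg(vram_addr):
    """Address bits 16:11 go to bits 7:2; bits 1:0 left at 0 select 8x8 tiles."""
    return (vram_addr >> 11) << 2


def map_bytes(map_w_tiles, map_h_tiles):
    """Each map entry occupies 2 bytes of VRAM."""
    return map_w_tiles * map_h_tiles * 2


if __name__ == "__main__":
    print("L0 config  :", hex(layer_config(128, 128)))
    print("L0 mapbase :", hex(mapbase_reg(0x00000)))
    print("L0 tilebase:", hex(tilebase_reg(0x08000)))
    print("map size   :", map_bytes(128, 128), "bytes")
```

With the map at VRAM address 0 and the font at the 32 kB offset, the map of 128×128 entries exactly fills the space below the font.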

The memory requirements for the map are: 128 * 128 tiles = 16384 tiles. Each tile entry consumes 2 bytes of VRAM, for a total map size of 32768 B = 32 kB. VERA supports multiple tile modes that differ in colour depth and in the support for additional features (e.g. V-flip, H-flip). For the text display the most suitable mode is the “Tile mode 1 bpp (16 color text mode)”, as described in the VERA documentation.

In this mode the first byte of each map tile is the 8-bit character index (ASCII code), and the second byte contains a 4-bit background and a 4-bit foreground colour of the tile. To display text on the 80×60 character grid, you just set the character index and colours of the corresponding tiles.
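To make the map layout concrete, here is a small Python sketch that models VRAM as a byte array and writes text into it. It is only an illustration under stated assumptions: the helper names are hypothetical, the map is taken to be 128 tiles wide starting at VRAM address 0, and the colour byte is assumed to hold the background in the upper nibble and the foreground in the lower nibble.

```python
MAP_WIDTH = 128        # map width in tiles (the 128x128 map)
MAPBASE = 0x00000      # map starts at VRAM address 0
BYTES_PER_TILE = 2     # character index + colour byte


def tile_address(col, row, mapbase=MAPBASE):
    """VRAM address of the map entry at (col, row); rows are MAP_WIDTH tiles apart."""
    return mapbase + (row * MAP_WIDTH + col) * BYTES_PER_TILE


def tile_entry(char, fg, bg):
    """Two-byte map entry: character index, then colours (bg assumed in the high nibble)."""
    return bytes([ord(char) & 0xFF, ((bg & 0x0F) << 4) | (fg & 0x0F)])


def put_text(vram, col, row, text, fg=1, bg=0):
    """Write a string into the map, one tile per character."""
    for i, ch in enumerate(text):
        addr = tile_address(col + i, row)
        vram[addr:addr + BYTES_PER_TILE] = tile_entry(ch, fg, bg)


vram = bytearray(128 * 1024)           # model of the 128 kB VRAM
put_text(vram, 0, 1, "OK", fg=1, bg=6)
```

Note that consecutive screen rows are 256 bytes apart in VRAM, because the row stride follows the 128-tile map width and not the 80 visible columns.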

Font Data

The 8-bit character index in each tile points to an 8×8-pixel “picture” (glyph) that is drawn at the tile position. This is the font data, and the font needs to be loaded into VERA’s VRAM during the initialization phase. Each character in the font is 8×8 monochrome pixels, i.e. 8*8 = 64 bits, which is 8 bytes. The font has 256 characters (one per 8-bit index), so the font data is 256 * 8 = 2048 B = 2 kB in total. The picture below shows the first 576 bytes of the font that I am using:

In the picture I have highlighted the 8 bytes at positions 0x208 to 0x20F: 0x18 0x3C 0x66 0x7E 0x66 0x66 0x66 0x00. Since each character glyph takes 8 bytes of the bitmap, dividing 0x208 by 8 gives the character index (ASCII code) of the glyph: 0x208 / 8 = 0x41 = 65 decimal, which is the capital letter ‘A’. We can visually decode the glyph bitmap in a spreadsheet by converting each data byte to its binary representation and (for better visibility) replacing each ‘0’ with a ‘.’, as shown below. Now you can see the bitmap that VERA draws on the screen for each tile that references character index 65.
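The same decoding can be done with a few lines of Python instead of a spreadsheet. This is a minimal sketch using the eight bytes quoted above; the function names are hypothetical, and it assumes the most significant bit of each byte is the leftmost pixel.

```python
GLYPH_BYTES = 8  # each 8x8 monochrome glyph takes 8 bytes


def glyph_index(offset):
    """Character index of the glyph that starts at a given byte offset in the font."""
    return offset // GLYPH_BYTES


def render_glyph(rows):
    """One text line per glyph row: binary digits with every '0' shown as '.'."""
    return [format(b, "08b").replace("0", ".") for b in rows]


glyph_a = bytes([0x18, 0x3C, 0x66, 0x7E, 0x66, 0x66, 0x66, 0x00])

print(hex(glyph_index(0x208)), chr(glyph_index(0x208)))
for line in render_glyph(glyph_a):
    print(line)
```

The printed rows form the outline of the letter, with the all-zero last byte leaving a blank line between text rows.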

Loading of Font Data to VERA in 65C02 vs. 65C816 CPU

The 2 kB of font data must be loaded from RAM into VERA during each system initialization. Let’s compare how this can be done in the X65-SBC software, depending on whether the computer is equipped with the 8-bit 65C02 processor or the newer 16-bit 65C816 processor.

The first presented code snippet is for the 16-bit 65C816 processor:

.P816     ; this is code for the 65C816.
    ; configure addressing ptr at the font data (tilebase), autoincrement
    ; (address register names are assumed; VERA ADDRx_L/M/H at $9F20-$9F22)
    lda   #<tilebase_va
    sta   VERA_ADDRESS_REG
    lda   #>tilebase_va
    sta   VERA_ADDRESS_M_REG
    lda   #((tilebase_va >> 16) & 1) | (1 << 4)
    sta   VERA_ADDRESS_HI_REG

    ; configure CPU register widths:
    rep   #SHORT_I          ; 16-bit index regs X, Y
    sep   #SHORT_A          ; 8-bit memory and accu
    ; copy font data to VRAM
    ldx   #0                ; X := 0
loop_font_cp:
    lda   _font8x8,x        ; A := _font8x8[x]
    sta   VERA_DATA0_REG    ; store A to VERA VRAM
    inx                     ; X++
    cpx   #SIZEOF_font8x8   ; X compare with 2048
    bne   loop_font_cp      ; branch back if not equal.

After the VERA setup code at the beginning, which takes 6 instructions, the main copy loop (from ldx to bne) takes just 6 instructions. The loop body is executed 2048 times, once for each byte of font data.
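The copy loop relies on VERA's auto-incrementing data port: the address is set once, and every store to the data register advances it. The following Python sketch models that behaviour on a host PC. It is a simplified illustration, not VERA's actual implementation: the class and method names are hypothetical, and only the increment field values 0 (no increment) and 1 (step of one byte) are modelled.

```python
class VeraDataPort:
    """Toy model of a VRAM data port with an auto-incrementing address."""

    def __init__(self, vram_size=128 * 1024):
        self.vram = bytearray(vram_size)
        self.addr = 0
        self.increment = 0

    def set_address(self, low, mid, high):
        """Mimic the three address-register stores from the assembly setup."""
        field = high >> 4
        if field not in (0, 1):
            raise ValueError("this sketch only models increment fields 0 and 1")
        self.addr = ((high & 1) << 16) | (mid << 8) | low
        self.increment = field

    def write_data0(self, value):
        """Store one byte, then advance the address like the hardware port."""
        self.vram[self.addr] = value & 0xFF
        self.addr = (self.addr + self.increment) % len(self.vram)


def load_font(port, font, tilebase):
    """Host-side equivalent of the copy loop: one port write per font byte."""
    high = ((tilebase >> 16) & 1) | (1 << 4)   # address bit 16 + increment field 1
    port.set_address(tilebase & 0xFF, (tilebase >> 8) & 0xFF, high)
    for b in font:
        port.write_data0(b)


port = VeraDataPort()
font = bytes(range(256)) * 8        # 2048 bytes of stand-in font data
load_font(port, font, 0x8000)
```

Because the port advances by itself, the CPU loop only has to track the source position; the destination needs no pointer arithmetic at all.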

Now let’s have a look at the same program implemented in assembly for the 8-bit 65C02 processor:

    ; configure addressing ptr at the font data (tilebase), autoincrement
    ; (address register names are assumed; VERA ADDRx_L/M/H at $9F20-$9F22)
    lda     #<tilebase_va
    sta     VERA_ADDRESS_REG
    lda     #>tilebase_va
    sta     VERA_ADDRESS_M_REG
    lda     #((tilebase_va >> 16) & 1) | (1 << 4)
    sta     VERA_ADDRESS_HI_REG

    ; PTR (2 bytes in zero page) := address of the font data
    lda     #<_font8x8
    sta     PTR
    lda     #>_font8x8
    sta     PTR+1
    ldx     #0                    ; X := 0
loop_font_cp:
    ldy     #0                    ; Y := 0
loop_font_cp_inner:
    lda     (PTR),y               ; A := PTR[Y]
    sta     VERA_DATA0_REG        ; store A to VERA VRAM
    iny                           ; Y++
    cpy     #0                    ; compare Y to zero
    bne     loop_font_cp_inner    ; branch if not zero yet

    inc     PTR+1                 ; PTR := PTR + 256
    inx                           ; X++
    cpx     #SIZEOF_font8x8/256   ; compare X to 2048/256
    bne     loop_font_cp          ; branch if not equal yet.

Again, I have highlighted the copy loop. This time, however, we need two nested loops that use both the X and Y registers, plus an additional 2-byte variable PTR in RAM (in the zero page). Together, the 8-bit X and Y registers in effect implement a 16-bit loop counter.

The cause of this “complexity explosion” in the assembly code is the 8-bit width of all registers in the 65C02 CPU. To implement a loop that runs 2048 times, you need (for example) both the X and Y registers, and then you need an additional variable to keep track of the position in the source array (the processor can only use an 8-bit index, which is too small).
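The difference between the two loop structures can be mirrored in Python by masking the counters to the register widths. This is only a sketch of the control flow, with hypothetical function names; `ptr` stands in for the zero-page PTR variable, and the font length is assumed to be a non-zero multiple of 256.

```python
def copy_16bit_index(font):
    """65C816 style: a single 16-bit X register indexes the whole array."""
    out = bytearray()
    x = 0
    while True:
        out.append(font[x])            # lda _font8x8,x / sta VERA_DATA0_REG
        x = (x + 1) & 0xFFFF           # inx (16-bit)
        if x == len(font):             # cpx #SIZEOF_font8x8 / bne
            break
    return bytes(out)


def copy_8bit_index(font):
    """65C02 style: Y walks one 256-byte page, PTR and X step through the pages."""
    out = bytearray()
    ptr = 0                            # offset of the current page in the font
    x = 0
    while True:
        y = 0
        while True:
            out.append(font[ptr + y])  # lda (PTR),y / sta VERA_DATA0_REG
            y = (y + 1) & 0xFF         # iny wraps from 255 to 0
            if y == 0:                 # cpy #0 / bne inner
                break
        ptr += 0x100                   # inc PTR+1
        x = (x + 1) & 0xFF             # inx (8-bit)
        if x == len(font) // 256:      # cpx #SIZEOF_font8x8/256 / bne outer
            break
    return bytes(out)


font = bytes((i * 7) & 0xFF for i in range(2048))   # stand-in font data
assert copy_16bit_index(font) == copy_8bit_index(font) == font
```

Both variants produce the same byte stream; the 8-bit version just needs two counters and a page pointer to cover the 2048 bytes.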

In comparison, the 16-bit 65C816 CPU is much easier and more efficient to program in assembly language (at least in this case). The 2048-iteration loop runs with just the X register, which is now 16 bits wide, leaving the Y register unused, and uses a 16-bit indexed access (lda _font8x8,x) to fetch a byte of font data into the accumulator. This coding efficiency is the reason why I prefer using the X65-SBC with the 16-bit 65C816 CPU and do not look back at the 65C02.

In the end I am fairly satisfied with how the terminal output turned out. (Note the inconspicuous error message “Stack u/f” from OF816 below => there is more to explore here! 😎)

